Class UnicodeDecompressor

java.lang.Object
com.ibm.icu.text.UnicodeDecompressor
All Implemented Interfaces:
SCSU

public final class UnicodeDecompressor extends Object implements SCSU
A decompression engine implementing the Standard Compression Scheme for Unicode (SCSU) as outlined in Unicode Technical Report #6.

USAGE

The static methods on UnicodeDecompressor can be called directly to decompress simple strings:

  byte [] compressed = ... ; // get compressed bytes from somewhere
  String result = UnicodeDecompressor.decompress(compressed);
 

The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeDecompressor offers more powerful APIs allowing iterative decompression:

  // Decompress an array "bytes" of length "len" using a buffer of 512 chars
  // to the Writer "out"

  UnicodeDecompressor myDecompressor         = new UnicodeDecompressor();
  final int           BUFSIZE                = 512; // local constant; "static" is not legal on a local variable
  char []             charBuffer             = new char [ BUFSIZE ];
  int                 charsWritten           = 0;
  int []              bytesRead              = new int [1];
  int                 totalBytesDecompressed = 0;
  int                 totalCharsWritten      = 0;

  do {
    // do the decompression
    charsWritten = myDecompressor.decompress(bytes, totalBytesDecompressed, 
                                             len, bytesRead,
                                             charBuffer, 0, BUFSIZE);

    // do something with the current set of chars
    out.write(charBuffer, 0, charsWritten);

    // update the no. of bytes decompressed
    totalBytesDecompressed += bytesRead[0];

    // update the no. of chars written
    totalCharsWritten += charsWritten;

  } while(totalBytesDecompressed < len);

  myDecompressor.reset(); // reuse decompressor
 

Decompression is performed according to the standard set forth in Unicode Technical Report #6.
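
To make the scheme more concrete, the following is a minimal sketch of how SCSU single-byte mode maps bytes to characters. It is not the implementation used by UnicodeDecompressor: the class ScsuSketch and its decode method are hypothetical names introduced for illustration, and only three cases are handled (pass-through bytes, bytes 0x80 to 0xFF in the active dynamic window, and the SC0 to SC7 window-selection tags). It assumes the default dynamic window offsets and initial state described in Technical Report #6, and the sample byte sequences are the German and Russian samples given in that report.

```java
/** Illustrative subset of SCSU single-byte mode; hypothetical, not ICU code. */
final class ScsuSketch {
    // Default dynamic window offsets defined by the SCSU specification.
    private static final int[] DEFAULT_OFFSETS = {
        0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
    };

    static String decode(byte[] in) {
        StringBuilder out = new StringBuilder();
        int window = 0; // the initial state selects dynamic window 0
        for (byte raw : in) {
            int b = raw & 0xFF; // treat the byte as unsigned
            if (b >= 0x80) {
                // High bytes are offsets into the active dynamic window.
                out.append((char) (DEFAULT_OFFSETS[window] + (b - 0x80)));
            } else if (b >= 0x20 || b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0D) {
                // ASCII and the permitted control characters pass through.
                out.append((char) b);
            } else if (b >= 0x10 && b <= 0x17) {
                // SC0..SC7: select dynamic window 0..7.
                window = b - 0x10;
            } else {
                // Quoting, window definition, and Unicode mode are omitted here.
                throw new IllegalArgumentException(
                    String.format("tag 0x%02X is not handled by this sketch", b));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Latin-1 text needs no tags because window 0 starts at U+0080.
        byte[] german = {(byte) 0xD6, 0x6C, 0x20, 0x66, 0x6C, 0x69, 0x65, (byte) 0xDF, 0x74};
        // 0x12 (SC2) selects the Cyrillic window at U+0400.
        byte[] russian = {0x12, (byte) 0x9C, (byte) 0xBE, (byte) 0xC1,
                          (byte) 0xBA, (byte) 0xB2, (byte) 0xB0};
        System.out.println(decode(german).equals("\u00D6l flie\u00DFt"));
        System.out.println(decode(russian).equals("\u041C\u043E\u0441\u043A\u0432\u0430"));
    }
}
```

A full decoder must also handle quoting tags, window definition tags, and Unicode mode, and it must preserve state between calls when a tag sequence is split across input chunks, which is the role of the internal buffer fields listed below.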

  • Field Details

    • fCurrentWindow

      private int fCurrentWindow
      Alias to current dynamic window
    • fOffsets

      private int[] fOffsets
      Dynamic compression window offsets
    • fMode

      private int fMode
      Current compression mode
    • BUFSIZE

      private static final int BUFSIZE
      Size of our internal buffer
    • fBuffer

      private byte[] fBuffer
      Internal buffer for saving state
    • fBufferLength

      private int fBufferLength
      Number of bytes in our internal buffer
  • Constructor Details

    • UnicodeDecompressor

      public UnicodeDecompressor()
      Create a UnicodeDecompressor. Sets all windows to their default values.
  • Method Details

    • decompress

      public static String decompress(byte[] buffer)
      Decompress a byte array into a String.
      Parameters:
      buffer - The byte array to decompress.
      Returns:
      A String containing the decompressed characters.
    • decompress

      public static char[] decompress(byte[] buffer, int start, int limit)
      Decompress a byte array into a Unicode character array.
      Parameters:
      buffer - The byte array to decompress.
      start - The index of the first byte of the run to decompress.
      limit - The index one past the last byte of the run to decompress.
      Returns:
      A character array containing the decompressed characters.
    • decompress

      public int decompress(byte[] byteBuffer, int byteBufferStart, int byteBufferLimit, int[] bytesRead, char[] charBuffer, int charBufferStart, int charBufferLimit)
      Decompress a byte array into a Unicode character array. This function will either completely fill the output buffer, or consume the entire input.
      Parameters:
      byteBuffer - The byte buffer to decompress.
      byteBufferStart - The index of the first byte of the run to decompress.
      byteBufferLimit - The index one past the last byte of the run to decompress.
      bytesRead - A one-element array. If not null, on return its first element holds the number of bytes read from byteBuffer.
      charBuffer - A buffer to receive the decompressed data. This buffer must be at least two characters in size.
      charBufferStart - The index in charBuffer at which to begin writing decompressed data.
      charBufferLimit - The index in charBuffer one past the last position that may be written.
      Returns:
      The number of Unicode characters written to charBuffer.
    • reset

      public void reset()
      Reset the decompressor to its initial state.