Class CharsetRecog_2022

java.lang.Object
com.ibm.icu.text.CharsetRecognizer
com.ibm.icu.text.CharsetRecog_2022
Direct Known Subclasses:
CharsetRecog_2022.CharsetRecog_2022CN, CharsetRecog_2022.CharsetRecog_2022JP, CharsetRecog_2022.CharsetRecog_2022KR

abstract class CharsetRecog_2022 extends CharsetRecognizer
Class CharsetRecog_2022 is part of the ICU charset detection implementation. It is the superclass for the individual detectors for each of the detectable members of the ISO 2022 family of encodings. Those detector classes are nested within this class.
  • Constructor Details

    • CharsetRecog_2022

      CharsetRecog_2022()
  • Method Details

    • match

      int match(byte[] text, int textLen, byte[][] escapeSequences)
      Matching function shared among the ISO 2022 detectors (JP, CN and KR). Counts the legal and unrecognized escape sequences in the sample of text, and computes a score based on the total number of sequences and the proportion that fit the encoding.
      Parameters:
      text - the byte buffer containing text to analyse
      textLen - the size of the text, in bytes.
      escapeSequences - the byte escape sequences to test for.
      Returns:
      match quality, in the range of 0-100.
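
The counting and scoring described for match can be sketched roughly as follows. This is a minimal illustration, not the ICU source: the class name Iso2022MatchSketch, the helper startsWith, the threshold MIN_SEQUENCES, and the exact scoring formula are all assumptions chosen to show the idea of weighing recognized against unrecognized escape sequences and backing off when few sequences are seen.

```java
// Hypothetical sketch; names, threshold, and formula are assumptions, not the ICU code.
final class Iso2022MatchSketch {
    private static final byte ESC = 0x1b;          // every ISO 2022 escape sequence starts with ESC
    private static final int MIN_SEQUENCES = 5;    // assumed threshold for a "large enough" sample

    static int match(byte[] text, int textLen, byte[][] escapeSequences) {
        int hits = 0;    // escape sequences that are legal for this encoding
        int misses = 0;  // ESC bytes that start no known sequence
        int i = 0;
        while (i < textLen) {
            if (text[i] != ESC) {
                i++;
                continue;
            }
            int matchedLen = 0;
            for (byte[] seq : escapeSequences) {
                if (startsWith(text, textLen, i, seq)) {
                    matchedLen = seq.length;
                    break;
                }
            }
            if (matchedLen > 0) {
                hits++;
                i += matchedLen;   // skip past the recognized sequence
            } else {
                misses++;
                i++;
            }
        }
        if (hits == 0) {
            return 0;              // nothing recognized: no evidence for this encoding
        }
        // Proportion: 100 when every sequence is legal, 0 when half or fewer are.
        int quality = (100 * hits - 100 * misses) / (hits + misses);
        // Total number: reduce confidence when only a few sequences were seen.
        if (hits < MIN_SEQUENCES) {
            quality -= (MIN_SEQUENCES - hits) * 10;
        }
        return Math.max(0, Math.min(100, quality));
    }

    // True if seq occurs in text at offset, without reading past textLen.
    private static boolean startsWith(byte[] text, int textLen, int offset, byte[] seq) {
        if (seq.length == 0 || textLen - offset < seq.length) {
            return false;
        }
        for (int j = 0; j < seq.length; j++) {
            if (text[offset + j] != seq[j]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would pass the table of escape sequences that are legal for one encoding, for example ESC $ B and ESC ( B for ISO-2022-JP, together with the sample buffer and its length. Keeping the table as a parameter is what would let one routine serve all three nested detectors.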