Class CollationFCD

java.lang.Object
com.ibm.icu.impl.coll.CollationFCD

public final class CollationFCD extends Object
Data and functions for the FCD check fast path.

The fast path looks at a pair of 16-bit code units and checks whether there is an FCD boundary between them. There is a boundary if the first unit has a trailing ccc=0 (!hasTccc(first)), or the second unit has a leading ccc=0 (!hasLccc(second)), or both. When the fast path finds a possible non-boundary, the FCD check slow path looks at the actual sequence of FCD values.

This is a pure optimization. The fast path must find at least all possible non-boundaries. If the fast path is too pessimistic, it costs performance.

For a pair of BMP characters, the fast path tests are precise (1 bit per character).

For a supplementary code point, the two units are its lead and trail surrogates. We set hasTccc(lead)=true if any of its 1024 associated supplementary code points has lccc!=0 or tccc!=0. We set hasLccc(trail)=true for all trail surrogates. As a result, we leave the fast path if the lead surrogate might start a supplementary code point that is not FCD-inert. (So the fast path need not detect that there is a surrogate pair, nor look ahead to the next full code point.)

hasLccc(lead)=true if any of its 1024 associated supplementary code points has lccc!=0, for fast boundary checking between BMP and supplementary code points.

hasTccc(trail)=false: It should only be tested for unpaired trail surrogates, which are FCD-inert.
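The boundary rule described above can be sketched as follows. This is a minimal illustration, not the ICU implementation: the class `FcdFastPathSketch` and the predicates `toyHasLccc` and `toyHasTccc` are hypothetical stand-ins that cover only a toy subset of characters (the combining marks U+0300..U+0314, the precomposed letter U+00E9, and trail surrogates), whereas the real methods consult generated data tables.

```java
/** Illustrative sketch only; all names here are hypothetical. */
final class FcdFastPathSketch {
    // Toy stand-in for hasLccc(): a few combining marks, plus every trail
    // surrogate (the class description sets hasLccc(trail)=true for all of them).
    static boolean toyHasLccc(int unit) {
        return (unit >= 0x0300 && unit <= 0x0314)
                || (unit >= 0xDC00 && unit <= 0xDFFF);
    }

    // Toy stand-in for hasTccc(): the same combining marks, plus U+00E9,
    // whose canonical decomposition ends with the combining mark U+0301.
    static boolean toyHasTccc(int unit) {
        return (unit >= 0x0300 && unit <= 0x0314) || unit == 0x00E9;
    }

    // True if the fast path can prove an FCD boundary between the two units.
    // False means "possible non-boundary": the slow path has to decide.
    static boolean isFastPathBoundary(int first, int second) {
        return !toyHasTccc(first) || !toyHasLccc(second);
    }

    public static void main(String[] args) {
        System.out.println(isFastPathBoundary('a', 'b'));       // true
        System.out.println(isFastPathBoundary('e', 0x0301));    // true: 'e' has tccc=0
        System.out.println(isFastPathBoundary(0x00E9, 0x0301)); // false: slow path decides
        System.out.println(isFastPathBoundary(0x0301, 0xDC00)); // false: trail surrogate
    }
}
```

A `false` result does not mean the text violates FCD. It only means the cheap test could not rule that out, which is why a pessimistic fast path costs performance but never correctness.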
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final int[]
    lcccBits
     
    private static final byte[]
    lcccIndex
     
    private static final int[]
    tcccBits
     
    private static final byte[]
    tcccIndex
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    CollationFCD()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static boolean
    hasLccc(int c)
     
    static boolean
    hasTccc(int c)
     
    (package private) static boolean
    isFCD16OfTibetanCompositeVowel(int fcd16)
    Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results.
    (package private) static boolean
    maybeTibetanCompositeVowel(int c)
    Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results.
    (package private) static boolean
    mayHaveLccc(int c)
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • lcccIndex

      private static final byte[] lcccIndex
    • lcccBits

      private static final int[] lcccBits
    • tcccIndex

      private static final byte[] tcccIndex
    • tcccBits

      private static final int[] tcccBits
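This page does not document how the index and bits arrays are used. Given their byte[]/int[] types and the "1 bit per character" remark in the class description, one plausible reading is a two-stage bit set: the index array maps each block of 32 code units to a slot in the bits array, and one bit in that 32-bit word answers the test. The sketch below shows that general technique under this assumption. The class `TwoStageBitsSketch` and everything in it are hypothetical, and the actual table layout may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative two-stage bit set over 16-bit code units; names are hypothetical. */
final class TwoStageBitsSketch {
    final byte[] index = new byte[0x10000 >> 5]; // one entry per block of 32 code units
    final int[] bits;                            // one 32-bit word per distinct block pattern

    TwoStageBitsSketch(IntPredicate flag) {
        List<Integer> words = new ArrayList<>();
        words.add(0); // slot 0 is the shared all-zero word
        for (int block = 0; block < index.length; block++) {
            int word = 0;
            for (int bit = 0; bit < 32; bit++) {
                if (flag.test((block << 5) | bit)) {
                    word |= 1 << bit;
                }
            }
            int slot = words.indexOf(word); // reuse identical words
            if (slot < 0) {
                slot = words.size();
                words.add(word);
            }
            if (slot > 0xFF) {
                throw new IllegalStateException("too many distinct words for a byte index");
            }
            index[block] = (byte) slot;
        }
        bits = words.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Tests one code unit; c must be in the range 0..0xFFFF. */
    boolean test(int c) {
        int i = index[c >> 5] & 0xFF;
        return i != 0 && (bits[i] & (1 << (c & 0x1F))) != 0;
    }

    public static void main(String[] args) {
        // Toy predicate: only U+0300..U+0314 are flagged.
        TwoStageBitsSketch table = new TwoStageBitsSketch(c -> c >= 0x0300 && c <= 0x0314);
        System.out.println(table.test(0x0301));
        System.out.println(table.test('a'));
    }
}
```

Because most blocks of 32 code units contain no flagged character, they all share slot 0, so the bits array stays small while a lookup remains two array reads and a mask.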
  • Constructor Details

    • CollationFCD

      public CollationFCD()
  • Method Details

    • hasLccc

      public static boolean hasLccc(int c)
    • hasTccc

      public static boolean hasTccc(int c)
    • mayHaveLccc

      static boolean mayHaveLccc(int c)
    • maybeTibetanCompositeVowel

      static boolean maybeTibetanCompositeVowel(int c)
      Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results. This is a fast and imprecise test.
      Parameters:
      c - a code point
      Returns:
      true if c is U+0F73, U+0F75 or U+0F81 or one of several other Tibetan characters
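A fast and imprecise test of this kind can be written as a single mask-and-compare. The sketch below is one way to get the documented behavior and is an assumption, not necessarily the method's actual body: for code points in 0..0x10FFFF it accepts exactly the odd values in U+0F01..U+0FFF, which includes the three composite vowel signs and a number of other code points in the Tibetan block. The class and method names are hypothetical.

```java
/** Illustrative sketch only; names are hypothetical. */
final class TibetanVowelMaskSketch {
    // Keeps bits 8..20 and bit 0 of c. The comparison succeeds when
    // bits 8..20 equal 0x0F (the U+0Fxx range) and c is odd.
    static boolean maybeCompositeVowel(int c) {
        return (c & 0x1FFF01) == 0x0F01;
    }

    public static void main(String[] args) {
        System.out.println(maybeCompositeVowel(0x0F73)); // true
        System.out.println(maybeCompositeVowel(0x0F75)); // true
        System.out.println(maybeCompositeVowel(0x0F81)); // true
        System.out.println(maybeCompositeVowel(0x0F01)); // true: a false positive
        System.out.println(maybeCompositeVowel(0x0F72)); // false: even code point
    }
}
```

False positives are acceptable here because a caller can follow up with a precise check only for the few characters that pass.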
    • isFCD16OfTibetanCompositeVowel

      static boolean isFCD16OfTibetanCompositeVowel(int fcd16)
      Tibetan composite vowel signs (U+0F73, U+0F75, U+0F81) must be decomposed before reaching the core collation code, or else some sequences including them, even ones passing the FCD check, do not yield canonically equivalent results. They have distinct lccc/tccc combinations: 129/130 or 129/132.
      Parameters:
      fcd16 - the FCD value (lccc/tccc combination) of a code point
      Returns:
      true if fcd16 is from U+0F73, U+0F75 or U+0F81
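The lccc/tccc combinations named above can be checked with two integer comparisons. The sketch assumes that an FCD value packs lccc into the upper 8 bits and tccc into the lower 8 bits of a 16-bit value, so 129/130 becomes 0x8182 and 129/132 becomes 0x8184. That packing and the helper names `pack` and `isTibetanCompositeVowelFcd16` are assumptions made for illustration.

```java
/** Illustrative sketch only; names and the packing convention are assumptions. */
final class Fcd16Sketch {
    // Assumed packing: lccc in bits 15..8, tccc in bits 7..0.
    static int pack(int lccc, int tccc) {
        return (lccc << 8) | tccc;
    }

    // True only for the combinations 129/130 and 129/132.
    static boolean isTibetanCompositeVowelFcd16(int fcd16) {
        return fcd16 == 0x8182 || fcd16 == 0x8184;
    }

    public static void main(String[] args) {
        System.out.println(isTibetanCompositeVowelFcd16(pack(129, 130))); // true
        System.out.println(isTibetanCompositeVowelFcd16(pack(129, 132))); // true
        System.out.println(isTibetanCompositeVowelFcd16(pack(230, 230))); // false
    }
}
```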