Package com.ibm.icu.impl
Class UnicodeSetStringSpan
java.lang.Object
com.ibm.icu.impl.UnicodeSetStringSpan
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
private static final class UnicodeSetStringSpan.OffsetList
    Helper class for UnicodeSetStringSpan.
-
Field Summary
Fields
Modifier and Type, Field, Description
private boolean all
    Set up for all variants of span()?
static final int ALL
(package private) static final short ALL_CP_CONTAINED
    Special spanLength short values.
static final int BACK
static final int BACK_UTF16_CONTAINED
static final int BACK_UTF16_NOT_CONTAINED
static final int CONTAINED
static final int FWD
static final int FWD_UTF16_CONTAINED
static final int FWD_UTF16_NOT_CONTAINED
(package private) static final short LONG_SPAN
    The spanLength is >=0xfe.
private final int maxLength16
    Maximum lengths of relevant strings.
static final int NOT_CONTAINED
private UnicodeSetStringSpan.OffsetList offsets
    Span helper
private boolean someRelevant
    Are there strings that are not fully contained in the code point set?
private short[] spanLengths
    The lengths of span(), spanBack() etc.
private UnicodeSet spanNotSet
    Set for span(not contained).
private UnicodeSet spanSet
    Set for span().
strings
    The strings of the parent set.
static final int WITH_COUNT
-
Constructor Summary
Constructors
Constructor, Description
UnicodeSetStringSpan(UnicodeSetStringSpan otherStringSpan, ArrayList<String> newParentSetStrings)
    Constructs a copy of an existing UnicodeSetStringSpan.
UnicodeSetStringSpan(UnicodeSet set, ArrayList<String> setStrings, int which)
    Constructs for all variants of span(), or only for any one variant.
-
Method Summary
Modifier and Type, Method, Description
private void addToSpanNotSet(int c)
    Adds a starting or ending string character to the spanNotSet so that a character span ends before any string.
boolean contains(int c)
    For fast UnicodeSet::contains(c).
(package private) static short makeSpanLengthByte(int spanLength)
private static boolean matches16(CharSequence s, int start, String t, int length)
(package private) static boolean matches16CPB(CharSequence s, int start, int limit, String t, int tlength)
    Compare 16-bit Unicode strings (which may be malformed UTF-16) at code point boundaries.
boolean needsStringSpanUTF16()
    Do the strings need to be checked in span() etc.?
int span(CharSequence s, int start, UnicodeSet.SpanCondition spanCondition)
    Spans a string.
int spanAndCount(CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
    Spans a string and counts the smallest number of set elements on any path across the span.
int spanBack(CharSequence s, int length, UnicodeSet.SpanCondition spanCondition)
    Span a string backwards.
private int spanContainedAndCount(CharSequence s, int start, OutputInt outCount)
private int spanNot(CharSequence s, int start, OutputInt outCount)
    Algorithm for spanNot()==span(SpanCondition.NOT_CONTAINED). Theoretical algorithm: iterate through the string, and at each code point boundary, if the code point there is in the set, then return with the current position.
private int spanNotBack(CharSequence s, int length)
(package private) static int spanOne(UnicodeSet set, CharSequence s, int start, int length)
    Does the set contain the next code point? If so, return its length; otherwise return its negative length.
(package private) static int spanOneBack(UnicodeSet set, CharSequence s, int length)
private int spanWithStrings(CharSequence s, int start, int spanLimit, UnicodeSet.SpanCondition spanCondition)
    Synchronized method for complicated spans using the offsets.
-
Field Details
-
WITH_COUNT
public static final int WITH_COUNT
-
FWD
public static final int FWD
-
BACK
public static final int BACK
-
CONTAINED
public static final int CONTAINED
-
NOT_CONTAINED
public static final int NOT_CONTAINED
-
ALL
public static final int ALL
-
FWD_UTF16_CONTAINED
public static final int FWD_UTF16_CONTAINED
-
FWD_UTF16_NOT_CONTAINED
public static final int FWD_UTF16_NOT_CONTAINED
-
BACK_UTF16_CONTAINED
public static final int BACK_UTF16_CONTAINED
-
BACK_UTF16_NOT_CONTAINED
public static final int BACK_UTF16_NOT_CONTAINED
-
ALL_CP_CONTAINED
static final short ALL_CP_CONTAINED
Special spanLength short values (since Java has no unsigned byte type). All code points in the string are contained in the parent set.
-
LONG_SPAN
static final short LONG_SPAN
The spanLength is >=0xfe.
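The two special values above suggest that each string's span length is stored in a slot the size of an unsigned byte, held in a short. A minimal sketch of how such an encoding might work is shown here. The numeric values of both constants and the body of makeSpanLengthByte() are assumptions made for illustration; this page only states that LONG_SPAN means the length is >=0xfe.

```java
/** Illustrative sketch only; not the ICU implementation. */
final class SpanLengthByteSketch {
    // Assumed constant values. The page states only that LONG_SPAN means
    // "the spanLength is >= 0xfe"; the exact numbers here are assumptions.
    static final short ALL_CP_CONTAINED = 0xff;
    static final short LONG_SPAN = 0xfe;

    /** Stores short span lengths exactly and collapses longer ones to LONG_SPAN. */
    static short makeSpanLengthByte(int spanLength) {
        return spanLength < LONG_SPAN ? (short) spanLength : LONG_SPAN;
    }

    public static void main(String[] args) {
        System.out.println(makeSpanLengthByte(3));     // small length, stored exactly
        System.out.println(makeSpanLengthByte(0xfd));  // largest length stored exactly
        System.out.println(makeSpanLengthByte(1000));  // collapsed to LONG_SPAN
    }
}
```

Under this reading, a reader of the array that finds LONG_SPAN knows only a lower bound on the span length and would have to recompute the exact value when it matters.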
-
spanSet
Set for span(). Same as the parent set, but without strings.
-
spanNotSet
Set for span(not contained). Same as spanSet, plus characters that start or end strings.
-
strings
The strings of the parent set.
-
spanLengths
private short[] spanLengths
The lengths of span(), spanBack() etc. for each string.
-
maxLength16
private final int maxLength16
Maximum lengths of relevant strings.
-
someRelevant
private boolean someRelevant
Are there strings that are not fully contained in the code point set?
-
all
private boolean all
Set up for all variants of span()?
-
offsets
Span helper
-
-
Constructor Details
-
UnicodeSetStringSpan
Constructs for all variants of span(), or only for any one variant. Initializes as little as possible, for single use.
-
UnicodeSetStringSpan
public UnicodeSetStringSpan(UnicodeSetStringSpan otherStringSpan, ArrayList<String> newParentSetStrings)
Constructs a copy of an existing UnicodeSetStringSpan. Assumes which==ALL for a frozen set.
-
-
Method Details
-
needsStringSpanUTF16
public boolean needsStringSpanUTF16()
Do the strings need to be checked in span() etc.?
Returns:
true if strings need to be checked (call span() here), false if not (use a BMPSet for best performance).
-
contains
public boolean contains(int c)
For fast UnicodeSet::contains(c).
-
addToSpanNotSet
private void addToSpanNotSet(int c)
Adds a starting or ending string character to the spanNotSet so that a character span ends before any string.
-
span
Spans a string.
Parameters:
s - The string to be spanned
start - The index at which the span begins
spanCondition - The span condition
Returns:
the limit (exclusive end) of the span
-
spanWithStrings
private int spanWithStrings(CharSequence s, int start, int spanLimit, UnicodeSet.SpanCondition spanCondition)
Synchronized method for complicated spans using the offsets. Avoids synchronization for simple cases.
Parameters:
spanLimit - equal to spanSet.span(s, start, CONTAINED)
-
spanAndCount
public int spanAndCount(CharSequence s, int start, UnicodeSet.SpanCondition spanCondition, OutputInt outCount)
Spans a string and counts the smallest number of set elements on any path across the span.
For proper counting, we cannot ignore strings that are fully contained in code point spans.
If the set does not have any fully-contained strings, then we could optimize this like span(), but such sets are likely rare, and this is at least still linear.
Parameters:
s - The string to be spanned
start - The index at which the span begins
spanCondition - The span condition
outCount - The count
Returns:
the limit (exclusive end) of the span
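To illustrate why fully contained strings cannot be ignored when counting, here is a minimal sketch of a shortest-path count over code points and strings. It is not the ICU implementation: the class name, the use of a plain Set<Integer> and List<String> in place of a UnicodeSet, and the int[] return value are all assumptions made for the example, and it does not check surrogate-pair boundaries for string matches.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the ICU implementation. */
final class SpanAndCountSketch {
    /**
     * Returns {limit, count}: the farthest index reachable from start by
     * concatenating set elements, and the fewest elements needed to reach it.
     */
    static int[] spanAndCount(String s, int start, Set<Integer> codePoints, List<String> strings) {
        int n = s.length();
        int[] minCount = new int[n + 1];
        Arrays.fill(minCount, Integer.MAX_VALUE); // MAX_VALUE marks "not reachable"
        minCount[start] = 0;
        for (int i = start; i < n; i++) {
            if (minCount[i] == Integer.MAX_VALUE) {
                continue; // no path of set elements ends here
            }
            int next = minCount[i] + 1;
            int c = s.codePointAt(i);
            if (codePoints.contains(c)) {
                int j = i + Character.charCount(c);
                minCount[j] = Math.min(minCount[j], next);
            }
            for (String t : strings) {
                if (!t.isEmpty() && s.startsWith(t, i)) {
                    int j = i + t.length();
                    minCount[j] = Math.min(minCount[j], next);
                }
            }
        }
        int limit = n;
        while (minCount[limit] == Integer.MAX_VALUE) {
            limit--; // start itself is always reachable, so this terminates
        }
        return new int[] {limit, minCount[limit]};
    }

    public static void main(String[] args) {
        // "abc" is fully covered by the code points a, b, c, yet using it halves the count.
        int[] r = spanAndCount("abcabc", 0, Set.of((int) 'a', (int) 'b', (int) 'c'), List.of("abc"));
        System.out.println("limit=" + r[0] + " count=" + r[1]);
    }
}
```

In the example, spanning "abcabc" one code point at a time would take six elements, while two copies of the string "abc" cover the same span, so a count that skipped contained strings would be too high.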
-
spanContainedAndCount
-
spanBack
Span a string backwards.
Parameters:
s - The string to be spanned
spanCondition - The span condition
Returns:
The string index which starts the span (i.e. inclusive).
-
spanNot
Algorithm for spanNot()==span(SpanCondition.NOT_CONTAINED)
Theoretical algorithm:
- Iterate through the string, and at each code point boundary:
  + If the code point there is in the set, then return with the current position.
  + If a set string matches at the current position, then return with the current position.
Optimized implementation: (Same assumption as for span() above.)
Create and cache a spanNotSet which contains all of the single code points of the original set but none of its strings. For each set string add its initial code point to the spanNotSet. (Also add its final code point for spanNotBack().)
- Loop:
  + Do spanLength=spanNotSet.span(SpanCondition.NOT_CONTAINED).
  + If the current code point is in the original set, then return the current position.
  + If any set string matches at the current position, then return the current position.
  + If there is no match at the current position, neither for the code point there nor for any set string, then skip this code point and continue the loop. This happens for set-string-initial code points that were added to spanNotSet when there is not actually a match for such a set string.
Parameters:
s - The string to be spanned
start - The index at which the span begins
outCount - If not null: Receives the number of code points across the span.
Returns:
the limit (exclusive end) of the span
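The theoretical algorithm described for spanNot() can be sketched as follows. This is an illustration only and covers the unoptimized variant, not the cached spanNotSet loop. The class name and the plain Set<Integer> and List<String> parameters standing in for the set's code points and strings are assumptions, and string matches are not checked against surrogate-pair boundaries.

```java
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; not the ICU implementation. */
final class SpanNotSketch {
    /** Returns the first index at or after start where a set element begins, else s.length(). */
    static int spanNot(String s, int start, Set<Integer> codePoints, List<String> strings) {
        int pos = start;
        while (pos < s.length()) {
            int c = s.codePointAt(pos);
            if (codePoints.contains(c)) {
                return pos; // the code point here is in the set
            }
            for (String t : strings) {
                if (!t.isEmpty() && s.startsWith(t, pos)) {
                    return pos; // a set string matches here
                }
            }
            pos += Character.charCount(c); // advance by one whole code point
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(spanNot("xyzabc", 0, Set.of((int) 'q'), List.of("ab")));
    }
}
```

The optimized implementation described above avoids testing every string at every position: it first skips ahead with spanNotSet and only then checks the original set and the strings.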
-
spanNotBack
-
makeSpanLengthByte
static short makeSpanLengthByte(int spanLength)
-
matches16
-
matches16CPB
Compare 16-bit Unicode strings (which may be malformed UTF-16) at code point boundaries. That is, each edge of a match must not be in the middle of a surrogate pair.
Parameters:
s - The string to match in.
start - The start index of s.
limit - The limit of the subsequence of s being spanned.
t - The substring to be matched in s.
tlength - The length of t.
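The boundary rule described for matches16CPB() can be sketched like this: the code units must match, and neither edge of the match may fall between a high and a low surrogate. The class and method names are hypothetical, the sketch takes the length from t instead of a separate tlength parameter, and it adds bounds checks that the real method may not perform.

```java
/** Illustrative sketch only; not the ICU implementation. */
final class Matches16CpbSketch {
    /** True if t occurs at s[start..) within limit and neither edge splits a surrogate pair. */
    static boolean matchesAtBoundaries(CharSequence s, int start, int limit, String t) {
        int tlength = t.length();
        if (tlength == 0 || start < 0 || limit > s.length() || start + tlength > limit) {
            return false;
        }
        // Plain code unit comparison; works on malformed UTF-16 as well.
        for (int i = 0; i < tlength; i++) {
            if (s.charAt(start + i) != t.charAt(i)) {
                return false;
            }
        }
        int end = start + tlength;
        boolean startSplitsPair = start > 0
                && Character.isHighSurrogate(s.charAt(start - 1))
                && Character.isLowSurrogate(s.charAt(start));
        boolean endSplitsPair = end < limit
                && Character.isHighSurrogate(s.charAt(end - 1))
                && Character.isLowSurrogate(s.charAt(end));
        return !startSplitsPair && !endSplitsPair;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', one supplementary code point, 'b'
        System.out.println(matchesAtBoundaries(s, 1, s.length(), "\uD83D\uDE00"));
        System.out.println(matchesAtBoundaries(s, 2, s.length(), "\uDE00b"));
    }
}
```

The second call compares equal code unit by code unit, but its start index sits in the middle of the surrogate pair, so the sketch rejects it.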
-
spanOne
Does the set contain the next code point? If so, return its length; otherwise return its negative length.
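The return convention of spanOne() packs two answers into one int: the sign says whether the code point is contained, and the magnitude says how many UTF-16 code units to advance. A minimal sketch, assuming a plain Set<Integer> in place of a UnicodeSet and omitting the length parameter of the real method:

```java
import java.util.Set;

/** Illustrative sketch only; not the ICU implementation. */
final class SpanOneSketch {
    /** Returns +n if the code point at start is in the set, else -n, where n is its length in chars. */
    static int spanOne(Set<Integer> set, CharSequence s, int start) {
        int c = Character.codePointAt(s, start);
        int n = Character.charCount(c); // 1 for BMP code points, 2 for supplementary ones
        return set.contains(c) ? n : -n;
    }

    public static void main(String[] args) {
        Set<Integer> set = Set.of((int) 'a', 0x1F600);
        System.out.println(spanOne(set, "a", 0));
        System.out.println(spanOne(set, "b", 0));
        System.out.println(spanOne(set, "\uD83D\uDE00", 0));
    }
}
```

A caller can therefore advance by Math.abs(result) in either case and branch on the sign alone.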
spanOneBack
-