Module org.apache.lucene.codecs
Package org.apache.lucene.codecs.bloom
Class BloomFilteringPostingsFormat
java.lang.Object
org.apache.lucene.codecs.PostingsFormat
org.apache.lucene.codecs.bloom.BloomFilteringPostingsFormat
- All Implemented Interfaces:
NamedSPILoader.NamedSPI
A PostingsFormat useful for low doc-frequency fields such as primary keys. Bloom filters are maintained in a ".blm" file which offers "fast-fail" for reads in segments known to have no record of the key: a Bloom filter never reports a stored key as absent, so a negative answer lets the lookup stop without consulting the segment's postings. A choice of delegate PostingsFormat is used to record all other postings data.

A choice of BloomFilterFactory can be passed to tailor Bloom filter settings on a per-field basis. The default configuration is DefaultBloomFilterFactory, which allocates a ~8 MB bitset and hashes values using MurmurHash64. This should be suitable for most purposes.
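The "fast-fail" idea can be illustrated with a minimal, self-contained sketch. It is not Lucene code: `SimpleBloom` and `Segment` are hypothetical classes, and the bit positions are derived from `String.hashCode()` by double hashing as a stand-in for MurmurHash64. The point is only the control flow: when the filter says "absent", the segment is skipped without touching its (here, simulated) term dictionary.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of Bloom-filter "fast-fail" lookups across segments.
 * Hypothetical types: SimpleBloom and Segment are not Lucene classes, and the
 * double hashing of String.hashCode() below is a stand-in for MurmurHash64.
 */
public class FastFailDemo {

  /** A tiny Bloom filter: numHashes bit positions derived by double hashing. */
  static final class SimpleBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloom(int numBits, int numHashes) {
      this.bits = new BitSet(numBits);
      this.numBits = numBits;
      this.numHashes = numHashes;
    }

    private int index(String key, int i) {
      int h1 = key.hashCode();
      int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1; // odd second hash
      return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
      for (int i = 0; i < numHashes; i++) bits.set(index(key, i));
    }

    /** false means "definitely absent"; true means "maybe present". */
    boolean mightContain(String key) {
      for (int i = 0; i < numHashes; i++) {
        if (!bits.get(index(key, i))) return false;
      }
      return true;
    }
  }

  /** Hypothetical segment: a term-to-docId map guarded by a Bloom filter. */
  static final class Segment {
    final Map<String, Integer> postings = new HashMap<>();
    final SimpleBloom bloom = new SimpleBloom(1 << 16, 5);
    int expensiveLookups = 0; // how often the simulated term dictionary is consulted

    void add(String key, int docId) {
      postings.put(key, docId);
      bloom.add(key);
    }

    Integer lookup(String key) {
      if (!bloom.mightContain(key)) return null; // fast-fail: skip the dictionary
      expensiveLookups++;
      return postings.get(key);
    }
  }

  public static void main(String[] args) {
    Segment s1 = new Segment();
    Segment s2 = new Segment();
    s1.add("pk-1", 0);
    s2.add("pk-2", 0);

    // A primary key lives in at most one segment, so most segments can fast-fail.
    for (Segment s : List.of(s1, s2)) {
      System.out.println(s.lookup("pk-2"));
    }
  }
}
```

A false positive only costs one extra dictionary lookup, which still returns `null`; correctness never depends on the filter, only the amount of work does.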
The format of the blm file is as follows:
- BloomFilter (.blm) --> Header, DelegatePostingsFormatName, NumFilteredFields, Filter^NumFilteredFields, Footer (i.e. one Filter per filtered field)
- Filter --> FieldNumber, FuzzySet
- FuzzySet --> See FuzzySet.serialize(DataOutput)
- Header --> IndexHeader
- DelegatePostingsFormatName --> String. The name of a ServiceProvider-registered PostingsFormat
- NumFilteredFields --> Uint32
- FieldNumber --> Uint32. The number of the field in this segment
- Footer --> CodecFooter
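The record order above can be mirrored in a small round-trip sketch. This is an illustration under stated assumptions, not Lucene's serializer: the Header and Footer are hypothetical magic ints rather than IndexHeader/CodecFooter, each FuzzySet body is an opaque length-prefixed byte array rather than the real FuzzySet.serialize output, the delegate name is written with an int length prefix rather than Lucene's string encoding, and `java.io.DataOutputStream` is big-endian, which may not match the byte order of a given Lucene version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the .blm record order; all encodings here are hypothetical placeholders. */
public class BlmLayoutSketch {
  static final int FAKE_HEADER = 0xB10F; // placeholder for IndexHeader
  static final int FAKE_FOOTER = 0xF007; // placeholder for CodecFooter

  static byte[] write(String delegateName, Map<Integer, byte[]> filtersByField) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    out.writeInt(FAKE_HEADER);                         // Header
    byte[] name = delegateName.getBytes(StandardCharsets.UTF_8);
    out.writeInt(name.length);                         // DelegatePostingsFormatName (int-length-prefixed here)
    out.write(name);
    out.writeInt(filtersByField.size());               // NumFilteredFields
    for (Map.Entry<Integer, byte[]> e : filtersByField.entrySet()) {
      out.writeInt(e.getKey());                        // Filter: FieldNumber
      out.writeInt(e.getValue().length);               // Filter: FuzzySet (placeholder encoding)
      out.write(e.getValue());
    }
    out.writeInt(FAKE_FOOTER);                         // Footer
    out.flush();
    return bytes.toByteArray();
  }

  static Map<Integer, byte[]> read(byte[] file, StringBuilder delegateNameOut) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
    if (in.readInt() != FAKE_HEADER) throw new IOException("bad header");
    byte[] name = new byte[in.readInt()];
    in.readFully(name);
    delegateNameOut.append(new String(name, StandardCharsets.UTF_8));
    int numFilteredFields = in.readInt();
    Map<Integer, byte[]> filters = new LinkedHashMap<>();
    for (int i = 0; i < numFilteredFields; i++) {
      int fieldNumber = in.readInt();
      byte[] set = new byte[in.readInt()];
      in.readFully(set);
      filters.put(fieldNumber, set);
    }
    if (in.readInt() != FAKE_FOOTER) throw new IOException("bad footer");
    return filters;
  }

  public static void main(String[] args) throws IOException {
    Map<Integer, byte[]> filters = new LinkedHashMap<>();
    filters.put(0, new byte[] {1, 2, 3});
    filters.put(4, new byte[] {9});
    StringBuilder name = new StringBuilder();
    // "MyDelegate" is a hypothetical delegate format name.
    Map<Integer, byte[]> back = read(write("MyDelegate", filters), name);
    System.out.println(name + " " + back.keySet()); // MyDelegate [0, 4]
  }
}
```

Writing the delegate's name into the file is what lets a reader recover the delegate format by name at read time, since the Bloom layer itself stores only the filters.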
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
(package private) class
(package private) static class

Field Summary
Fields
Modifier and Type | Field | Description
static final String | BLOOM_CODEC_NAME
(package private) static final String | BLOOM_EXTENSION | Extension of Bloom Filters file
private final BloomFilterFactory | bloomFilterFactory
private final PostingsFormat | delegatePostingsFormat
static final int | VERSION_CURRENT
static final int | VERSION_START
Fields inherited from class org.apache.lucene.codecs.PostingsFormat
EMPTY

Constructor Summary
Constructors
Constructor | Description
BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat) | Creates Bloom filters for a selection of fields created in the index.
BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat, BloomFilterFactory bloomFilterFactory) | Creates Bloom filters for a selection of fields created in the index.

Method Summary
Modifier and Type | Method | Description
FieldsConsumer | fieldsConsumer(SegmentWriteState state) | Writes a new segment.
FieldsProducer | fieldsProducer(SegmentReadState state) | Reads a segment.
String | toString()
Methods inherited from class org.apache.lucene.codecs.PostingsFormat
availablePostingsFormats, forName, getName, reloadPostingsFormats

Field Details

BLOOM_CODEC_NAME
public static final String BLOOM_CODEC_NAME

VERSION_START
public static final int VERSION_START

VERSION_CURRENT
public static final int VERSION_CURRENT

BLOOM_EXTENSION
static final String BLOOM_EXTENSION
Extension of Bloom Filters file

bloomFilterFactory
private final BloomFilterFactory bloomFilterFactory

delegatePostingsFormat
private final PostingsFormat delegatePostingsFormat

Constructor Details

BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat, BloomFilterFactory bloomFilterFactory)
Creates Bloom filters for a selection of fields created in the index. This is recorded as a set of bitsets held as a segment summary in an additional "blm" file. This PostingsFormat delegates to a choice of delegate PostingsFormat for encoding all other postings data.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.
bloomFilterFactory - The BloomFilterFactory responsible for sizing Bloom filters appropriately.
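"Sizing Bloom filters appropriately" usually means trading bits for false-positive rate. The sketch below applies the textbook formulas a custom factory might use, where n is the expected number of unique terms in a field and p the target false-positive rate. `BloomSizing` is a hypothetical helper, not part of the BloomFilterFactory API, and it does not reproduce DefaultBloomFilterFactory's actual policy.

```java
/**
 * Textbook Bloom filter sizing (hypothetical helper, not a Lucene class):
 *   bits   m = -n * ln(p) / (ln 2)^2
 *   hashes k = (m / n) * ln 2
 *   expected false-positive rate = (1 - e^(-k n / m))^k
 */
public class BloomSizing {
  static long optimalBits(long expectedKeys, double falsePositiveRate) {
    return (long) Math.ceil(-expectedKeys * Math.log(falsePositiveRate) / (Math.log(2) * Math.log(2)));
  }

  static int optimalHashes(long expectedKeys, long bits) {
    return Math.max(1, (int) Math.round((double) bits / expectedKeys * Math.log(2)));
  }

  static double falsePositiveRate(long bits, long keys, int hashes) {
    return Math.pow(1 - Math.exp(-(double) hashes * keys / bits), hashes);
  }

  public static void main(String[] args) {
    long n = 1_000_000; // e.g. unique primary keys in one segment
    long m = optimalBits(n, 0.01);
    int k = optimalHashes(n, m);
    System.out.printf("bits=%d (~%.2f MiB), hashes=%d, fpp=%.4f%n",
        m, m / 8.0 / (1 << 20), k, falsePositiveRate(m, n, k));
  }
}
```

For one million keys at a 1% target this works out to roughly 9.6 million bits (a little over 1 MiB) and 7 hash functions, which is why a per-field factory can matter: a fixed allocation may be far larger than a small field needs.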

BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat(PostingsFormat delegatePostingsFormat)
Creates Bloom filters for a selection of fields created in the index. This is recorded as a set of bitsets held as a segment summary in an additional "blm" file. This PostingsFormat delegates to a choice of delegate PostingsFormat for encoding all other postings data. This choice of constructor defaults to the DefaultBloomFilterFactory for configuring per-field Bloom filters.
- Parameters:
delegatePostingsFormat - The PostingsFormat that records all the non-Bloom-filter data, i.e. postings info.

BloomFilteringPostingsFormat
public BloomFilteringPostingsFormat()

Method Details

fieldsConsumer
public FieldsConsumer fieldsConsumer(SegmentWriteState state) throws IOException
Description copied from class: PostingsFormat
Writes a new segment.
- Specified by:
fieldsConsumer in class PostingsFormat
- Throws:
IOException
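On the write side, the class documentation describes a wrapper: every field's postings go to the delegate, while Bloom filters are built only for a selection of fields. The sketch below shows that decorator shape with hypothetical stand-ins: `TermsWriter` plays the delegate's consumer, and a `Function<String, BitSet>` plays the factory, returning `null` for fields that should not be filtered (an assumption of this sketch). The single-hash 1024-bit filter is a toy, not a FuzzySet.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Write-side delegation sketch; every type here is a hypothetical stand-in. */
public class BloomWriteSketch {
  interface TermsWriter {
    void write(String field, List<String> terms);
  }

  /** Records what the delegate was asked to write; stands in for a real postings encoder. */
  static final class RecordingDelegate implements TermsWriter {
    final Map<String, List<String>> written = new LinkedHashMap<>();

    public void write(String field, List<String> terms) {
      written.put(field, new ArrayList<>(terms));
    }
  }

  static final class BloomWrappingWriter implements TermsWriter {
    private final TermsWriter delegate;
    private final Function<String, BitSet> filterForField; // null result = field not filtered
    final Map<String, BitSet> filters = new LinkedHashMap<>();

    BloomWrappingWriter(TermsWriter delegate, Function<String, BitSet> filterForField) {
      this.delegate = delegate;
      this.filterForField = filterForField;
    }

    public void write(String field, List<String> terms) {
      delegate.write(field, terms);                         // all postings data goes to the delegate
      BitSet bits = filterForField.apply(field);
      if (bits == null) return;                             // this field gets no Bloom filter
      for (String term : terms) {
        bits.set(Math.floorMod(term.hashCode(), 1024));     // toy single-hash filter over 1024 bits
      }
      filters.put(field, bits);                             // would later be serialized to the .blm file
    }
  }

  public static void main(String[] args) {
    RecordingDelegate delegate = new RecordingDelegate();
    BloomWrappingWriter writer = new BloomWrappingWriter(
        delegate, field -> field.equals("id") ? new BitSet(1024) : null);
    writer.write("id", List.of("pk-1", "pk-2"));
    writer.write("body", List.of("lucene", "bloom"));
    System.out.println(delegate.written.keySet() + " " + writer.filters.keySet()); // [id, body] [id]
  }
}
```

Keeping the filter logic in a wrapper means the delegate's on-disk format is untouched, which is what allows any registered PostingsFormat to serve as the delegate.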

fieldsProducer
public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException
Description copied from class: PostingsFormat
Reads a segment. NOTE: by the time this call returns, it must hold open any files it will need to use; otherwise, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. In that case the implementation should throw an IOException. IOExceptions are expected and will automatically cause a retry of the segment-opening logic with the newly revised segments.
- Specified by:
fieldsProducer in class PostingsFormat
- Throws:
IOException

toString
public String toString()
- Overrides:
toString in class PostingsFormat