Lucene MapPostingsFormat

This library implements a specialized PostingsFormat for Lucene indices. It can improve query-time performance when the following criteria are met:

The terms can be encoded as 64-bit numbers (i.e. as a long)
Every query specifies exact terms; wildcard searches are not supported

The term map data structure is tuned for query performance, but it can take a long time to build initially. Because the format can be used within a PerFieldPostingsFormat, it can be applied to specific fields of an index only, while other fields continue to use the default PostingsFormat.
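
As a rough illustration of the per-field idea, the sketch below picks a postings format name from the field name. It is a standalone model with no Lucene dependency: the format names and field names are hypothetical, and in a real codec this decision would sit inside an override of Lucene's PerFieldPostingsFormat.getPostingsFormatForField(String), with the name resolved through PostingsFormat.forName(String).

```java
import java.util.Set;

public class PerFieldSelectionSketch {
    // Hypothetical format names; neither is defined by this library.
    static final String MAP_FORMAT_NAME = "MyMapPostings";
    static final String DEFAULT_FORMAT_NAME = "DefaultPostings";

    // Hypothetical fields whose terms fit in a long and are only queried exactly.
    static final Set<String> MAP_FIELDS = Set.of("userId", "productId");

    // Returns the name of the postings format that a field should use.
    static String formatNameForField(String field) {
        return MAP_FIELDS.contains(field) ? MAP_FORMAT_NAME : DEFAULT_FORMAT_NAME;
    }

    public static void main(String[] args) {
        System.out.println(formatNameForField("userId")); // prints MyMapPostings
        System.out.println(formatNameForField("title"));  // prints DefaultPostings
    }
}
```

Keeping the set of map-backed fields small limits the extra build time and index size to the fields that benefit from exact-term lookups.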

Example:

The example directory contains a few simple indexing and searching examples that demonstrate usage. Instructions for running them can be found in BUILD.txt.

Usage:

MapPostingsFormat is an abstract class. To use it, create a subclass and implement the following method:

getFingerprint: returns a portion of a BytesRef term that uniquely identifies that term. In most situations, most bits of the long value turn out to be unnecessary for telling terms apart.

If you are using IndexedLongField, see org.rsankar.lucenecodecs.field.IndexedLongField.BytesRefToLong(BytesRef term) to convert a BytesRef to a long.
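
To make the fingerprint idea concrete, here is a minimal standalone sketch that decodes eight bytes into a long and keeps only its low bits. It does not use the library: the big-endian byte layout, the method names, and the choice of low bits are all assumptions made for illustration, and the actual getFingerprint signature and IndexedLongField encoding may differ.

```java
public class FingerprintSketch {
    // Decode 8 big-endian bytes starting at `offset` into a long.
    // The byte order is an assumption, not the library's documented encoding.
    static long bytesToLong(byte[] bytes, int offset) {
        long value = 0L;
        for (int i = 0; i < Long.BYTES; i++) {
            value = (value << 8) | (bytes[offset + i] & 0xFFL);
        }
        return value;
    }

    // Keep only the lowest `bits` bits of the value as its fingerprint.
    static long lowBitsFingerprint(long value, int bits) {
        if (bits <= 0 || bits > 64) {
            throw new IllegalArgumentException("bits must be in 1..64");
        }
        return bits == 64 ? value : value & ((1L << bits) - 1);
    }

    public static void main(String[] args) {
        byte[] term = {0, 0, 0, 0, 0, 0, 0x12, 0x34};
        long value = bytesToLong(term, 0);               // 0x1234 = 4660
        long fingerprint = lowBitsFingerprint(value, 8); // 0x34 = 52
        System.out.println(value + " -> " + fingerprint); // prints 4660 -> 52
    }
}
```

A truncated fingerprint like this only identifies terms uniquely if no two indexed terms share the same low bits, so the number of bits kept has to match the data in the field.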

Then, list the subclass in src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat so that Lucene can discover it.
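
Files under META-INF/services follow the standard Java service provider convention: one fully qualified class name per line, with `#` starting a comment. A registration file might look like the fragment below, where com.example.MyMapPostingsFormat is a hypothetical subclass name used only for illustration.

```text
# src/main/resources/META-INF/services/org.apache.lucene.codecs.PostingsFormat
# Hypothetical subclass of MapPostingsFormat; replace with your own class.
com.example.MyMapPostingsFormat
```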

Future work

Additional work that needs to be done includes:

Coming up with a better tuning algorithm to derive the best possible hash table size and hash function
Reducing space usage (while this library improves search performance, it also makes the index larger)

rishisankar/lucenecodecs
