Memory vs performance (and features) #43

faassen · 2025-01-28T14:06:27Z

I think it's neat we have multiple levels of memory size and features. I'm going to sketch out what we have now with some speculative extensions. The more memory you give, the more features and performance becomes available. The total overhead of this index is still pretty low, though I'm not sure "with fast text access" grows it beyond the size other non-succinct indexes can offer.

Smallest

RLFMIndex - smallest but slowest
FMndex

With locate()

RLFMIndex with quantized suffix array
FMIndex with quantized suffix array

With multi text support (text ids, \0 string)

We add the "Doc array". This supports multiple separate texts to be searched separated by \0.
This can be used with or without locate() support. This allows starts-with and the like, as well as text ids.

This supports a usage scenario where if you have an immutable data structure you can have all its strings backed by
fm-index and still support search.

This feature can be type-driven: we cannot construct multiple texts (and reject a text with a \0 in the middle?) without enabling this feature first. The APIs surrounding this can all be behind a special type just like locate is.

With fast text access

We store the original text along with the index (and suffix array). This takes more memory, but reconstruction of strings is much faster. This use case is also described by the XPath paper. Can be used with or without the doc array.

The APIs don't need any changing; existing APIs for searching text and multi text just become faster as they use the real string as a backend.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory vs performance (and features) #43

Memory vs performance (and features) #43

faassen commented Jan 28, 2025

Memory vs performance (and features) #43

Memory vs performance (and features) #43

Comments

faassen commented Jan 28, 2025

Smallest

With locate()

With multi text support (text ids, \0 string)

With fast text access