You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I think it's neat we have multiple levels of memory size and features. I'm going to sketch out what we have now with some speculative extensions. The more memory you give, the more features and performance becomes available. The total overhead of this index is still pretty low, though I'm not sure "with fast text access" grows it beyond the size other non-succinct indexes can offer.
Smallest
RLFMIndex - smallest but slowest
FMndex
With locate()
RLFMIndex with quantized suffix array
FMIndex with quantized suffix array
With multi text support (text ids, \0 string)
We add the "Doc array". This supports multiple separate texts to be searched separated by \0.
This can be used with or without locate() support. This allows starts-with and the like, as well as text ids.
This supports a usage scenario where if you have an immutable data structure you can have all its strings backed by
fm-index and still support search.
This feature can be type-driven: we cannot construct multiple texts (and reject a text with a \0 in the middle?) without enabling this feature first. The APIs surrounding this can all be behind a special type just like locate is.
With fast text access
We store the original text along with the index (and suffix array). This takes more memory, but reconstruction of strings is much faster. This use case is also described by the XPath paper. Can be used with or without the doc array.
The APIs don't need any changing; existing APIs for searching text and multi text just become faster as they use the real string as a backend.
The text was updated successfully, but these errors were encountered:
I think it's neat we have multiple levels of memory size and features. I'm going to sketch out what we have now with some speculative extensions. The more memory you give, the more features and performance becomes available. The total overhead of this index is still pretty low, though I'm not sure "with fast text access" grows it beyond the size other non-succinct indexes can offer.
Smallest
With locate()
With multi text support (text ids, \0 string)
We add the "Doc array". This supports multiple separate texts to be searched separated by \0.
This can be used with or without locate() support. This allows starts-with and the like, as well as text ids.
This supports a usage scenario where if you have an immutable data structure you can have all its strings backed by
fm-index and still support search.
This feature can be type-driven: we cannot construct multiple texts (and reject a text with a \0 in the middle?) without enabling this feature first. The APIs surrounding this can all be behind a special type just like locate is.
With fast text access
We store the original text along with the index (and suffix array). This takes more memory, but reconstruction of strings is much faster. This use case is also described by the XPath paper. Can be used with or without the doc array.
The APIs don't need any changing; existing APIs for searching text and multi text just become faster as they use the real string as a backend.
The text was updated successfully, but these errors were encountered: