multiple texts #33
base: master
@@ -24,6 +25,53 @@ pub trait SearchIndex: BackwardIterableIndex {
    {
        Search::new(self).search(pattern)
    }

    // If we created a HasDoc trait (or something better named) for those
It might also be an option to implement the new APIs on a new trait/struct such as `MultiTextSearchIndex` that wraps a `BackwardIterableIndex` instance.

- We can avoid adding `Doc` data to every FM-Index variant, like `FMIndex` and `RLFMIndex`.
- We can consider different APIs for the single-text search index `SearchIndex` and the multi-text search index `MultiTextSearchIndex`. For instance, the `locate` query in the former trait may just return pattern positions, whereas the latter may return positions together with a `TextId`.
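To make the wrapper idea concrete, here is a minimal sketch, assuming the wrapper only needs the start offset of each `\0`-terminated text to turn a global position into a `(TextId, offset)` pair. `Locate`, `NaiveIndex`, the `TextId` newtype, and all method names are hypothetical stand-ins, not the crate's API; `NaiveIndex` scans the raw text so the example runs without an actual FM-Index.

```rust
/// Hypothetical id of one text inside a multi-text index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TextId(pub usize);

/// Hypothetical stand-in for the single-text `locate` query: global positions only.
pub trait Locate {
    fn locate(&self, pattern: &[u8]) -> Vec<usize>;
}

/// Naive stand-in for an FM-Index so the sketch runs; it scans the whole text.
pub struct NaiveIndex {
    text: Vec<u8>,
}

impl NaiveIndex {
    pub fn new(text: Vec<u8>) -> Self {
        NaiveIndex { text }
    }
}

impl Locate for NaiveIndex {
    fn locate(&self, pattern: &[u8]) -> Vec<usize> {
        if pattern.is_empty() {
            return Vec::new(); // `windows(0)` would panic
        }
        self.text
            .windows(pattern.len())
            .enumerate()
            .filter(|(_, w)| *w == pattern)
            .map(|(i, _)| i)
            .collect()
    }
}

/// Hypothetical wrapper that adds text boundaries on top of any `Locate` index.
pub struct MultiTextSearchIndex<I> {
    inner: I,
    /// Sorted start offset of each text; `starts[0] == 0`.
    starts: Vec<usize>,
}

impl<I: Locate> MultiTextSearchIndex<I> {
    pub fn new(inner: I, concatenated: &[u8]) -> Self {
        let mut starts = vec![0];
        for (i, &b) in concatenated.iter().enumerate() {
            // A `\0` that is not the last byte begins the next text.
            if b == 0 && i + 1 < concatenated.len() {
                starts.push(i + 1);
            }
        }
        MultiTextSearchIndex { inner, starts }
    }

    /// Maps a global position to (text id, offset within that text).
    pub fn text_id(&self, pos: usize) -> (TextId, usize) {
        // Count the texts starting at or before `pos`; the last of them contains it.
        let id = self.starts.partition_point(|&s| s <= pos) - 1;
        (TextId(id), pos - self.starts[id])
    }

    /// Like the single-text `locate`, but each hit carries its `TextId`.
    pub fn locate(&self, pattern: &[u8]) -> Vec<(TextId, usize)> {
        self.inner
            .locate(pattern)
            .into_iter()
            .map(|p| self.text_id(p))
            .collect()
    }
}
```

Because the wrapper only consumes positions from the inner index, the same struct could in principle wrap `FMIndex` or `RLFMIndex` without either of them carrying `Doc` data.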
Yes, I think it would be nice to have a search index API that doesn't allow any `\0` (or only one at the end), and a multi-text index that accepts `\0` (or is constructed with a builder pattern) and offers the extended APIs.
So let's explore this. I'm still not entirely sure how to implement `Doc`, so I hope to get this into a shape where you can help me implement it.
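The rule for the single-text index could be checked up front, as in this minimal sketch, which assumes texts are byte slices and that a single trailing `\0` is the only one permitted. `validate_single_text` and `TextError` are hypothetical names, not part of the crate.

```rust
/// Hypothetical error for the single-text constructor.
#[derive(Debug, PartialEq, Eq)]
pub enum TextError {
    /// A `\0` appeared somewhere other than the final byte.
    InteriorNul { position: usize },
}

/// Hypothetical check: at most one `\0`, and only as the final byte.
pub fn validate_single_text(text: &[u8]) -> Result<(), TextError> {
    // Ignore one trailing terminator, if present.
    let body = match text.split_last() {
        Some((&0, rest)) => rest,
        _ => text,
    };
    match body.iter().position(|&b| b == 0) {
        Some(position) => Err(TextError::InteriorNul { position }),
        None => Ok(()),
    }
}
```

Texts that fail this check would instead go through the multi-text index or builder.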
Got it! I'll also try to implement the multi-text search based on the APIs you outlined.
As discussed in #24, FM-Index currently doesn't properly support multiple `\0`s in the text.
It would be nice to support this. This PR is a draft of what the API for this feature could look like:
- `TextBuilder`, making it easy to construct zero-separated texts.
- A `LocationInfo` struct that lets you obtain the text id for a location, as well as the text belonging to a zero-separated location.
- A `text` method to get the original text belonging to such a zero-separated location (both on the index and on the `LocationInfo`).
- Various new search methods that are boundary-aware. Each of them should forbid the use of `\0` in the query itself (maybe this should be forbidden in general; that depends on how well it works):
  - `contains` (which is basically the same as `search`)
  - `starts_with`
  - `ends_with`
  - `exact`
  - maybe lexicographical queries too, in the future.
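A minimal sketch of the intended semantics, assuming texts are byte strings each terminated by `\0`. `TextBuilder::add_text`, `Query`, and `matching_texts` are hypothetical names, and the queries scan each text naively rather than using backward search, so this only pins down what each query should return (the ids of matching texts), not how the index would compute it.

```rust
/// Hypothetical builder that joins texts, terminating each with `\0`.
#[derive(Default)]
pub struct TextBuilder {
    buf: Vec<u8>,
}

impl TextBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Appends one text plus its `\0` terminator; panics if the text contains `\0`.
    pub fn add_text(mut self, text: &[u8]) -> Self {
        assert!(!text.contains(&0), "texts must not contain \\0");
        self.buf.extend_from_slice(text);
        self.buf.push(0);
        self
    }

    /// Returns the zero-separated concatenation.
    pub fn build(self) -> Vec<u8> {
        self.buf
    }
}

/// The four boundary-aware query kinds proposed in this PR.
#[derive(Clone, Copy)]
pub enum Query {
    Contains,
    StartsWith,
    EndsWith,
    Exact,
}

/// Splits a zero-separated concatenation back into its texts.
fn texts(concatenated: &[u8]) -> Vec<&[u8]> {
    let mut parts: Vec<&[u8]> = concatenated.split(|&b| b == 0).collect();
    // The final terminator yields an empty trailing piece; drop it.
    if concatenated.last() == Some(&0) {
        parts.pop();
    }
    parts
}

/// Reference semantics only: returns the ids of matching texts by scanning
/// each one; a real index would answer these with backward search.
pub fn matching_texts(concatenated: &[u8], pattern: &[u8], query: Query) -> Vec<usize> {
    assert!(!pattern.contains(&0), "queries must not contain \\0");
    texts(concatenated)
        .into_iter()
        .enumerate()
        .filter(|(_, t)| match query {
            Query::Contains => {
                pattern.is_empty() || t.windows(pattern.len()).any(|w| w == pattern)
            }
            Query::StartsWith => t.starts_with(pattern),
            Query::EndsWith => t.ends_with(pattern),
            Query::Exact => *t == pattern,
        })
        .map(|(id, _)| id)
        .collect()
}
```

For instance, with the texts `banana`, `bandana`, and `ana`, `Contains` for `ana` matches all three texts, while `Exact` matches only the third.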
We could gate all these methods with a `HasDoc` trait so that they don't exist if you don't construct an index with a doc. That would allow us to implement this functionality on `FMIndex` only (or first) without worrying about `RLFMIndex` yet.
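One way to do the gating, sketched under the assumption that the index is generic over a doc parameter (`()` when absent, `Doc` when present). `Doc`, `new_multi`, `MultiTextSearch`, and `text_count` are hypothetical, and this toy `FMIndex` just stores the raw text instead of building an actual FM-Index.

```rust
/// Hypothetical: start offset of every `\0`-terminated text.
pub struct Doc {
    starts: Vec<usize>,
}

/// Hypothetical gate: only index types that carry a `Doc` implement this.
pub trait HasDoc {
    fn doc(&self) -> &Doc;
}

/// Toy index generic over its doc data; stores the raw text only.
pub struct FMIndex<D> {
    text: Vec<u8>,
    doc: D,
}

// Methods available on every variant, with or without a doc.
impl<D> FMIndex<D> {
    pub fn text_len(&self) -> usize {
        self.text.len()
    }
}

impl FMIndex<()> {
    /// Single-text index without boundary information.
    pub fn new(text: Vec<u8>) -> Self {
        FMIndex { text, doc: () }
    }
}

impl FMIndex<Doc> {
    /// Multi-text index that records where each `\0`-terminated text starts.
    pub fn new_multi(text: Vec<u8>) -> Self {
        let mut starts = vec![0];
        starts.extend(
            text.iter()
                .enumerate()
                // A `\0` that is not the last byte begins a new text.
                .filter(|&(i, &b)| b == 0 && i + 1 < text.len())
                .map(|(i, _)| i + 1),
        );
        FMIndex { text, doc: Doc { starts } }
    }
}

impl HasDoc for FMIndex<Doc> {
    fn doc(&self) -> &Doc {
        &self.doc
    }
}

/// Boundary-aware API, provided automatically for every `HasDoc` type.
pub trait MultiTextSearch: HasDoc {
    fn text_count(&self) -> usize {
        self.doc().starts.len()
    }
}

impl<T: HasDoc> MultiTextSearch for T {}
```

Calling `text_count` on an index returned by `FMIndex::new` fails to compile, since `FMIndex<()>` does not implement `HasDoc`; the same blanket impl would later cover `RLFMIndex` once it gains a `Doc` variant.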