-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: try upgrade regex-automata #3575
Conversation
Signed-off-by: tison <[email protected]>
Seems some internal impl issues exist; the following cases failed:
|
src/index/src/inverted_index/search/fst_apply/intersection_apply.rs
Outdated
Show resolved
Hide resolved
Signed-off-by: tison <[email protected]>
Signed-off-by: tison <[email protected]>
Signed-off-by: tison <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3575 +/- ##
==========================================
- Coverage 85.41% 84.88% -0.53%
==========================================
Files 911 917 +6
Lines 152425 152890 +465
==========================================
- Hits 130195 129784 -411
- Misses 22230 23106 +876 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job!
@zhongzc It just compiled and passed test. I suggest you do some end to end test for where you need it in the index functionality .. Also, to glue the regex-automata and fst together, "extra" checks are involved. I don't test the performance after this change. |
I'm not currently motivated to add automated testing to check the performance of fst matching regex statements. In my manual end-to-end bench, it is not a critical impact point.
Once we have e2e testing one day, I will add enough test cases for index |
@zhongzc Thank you! This makes sense to me. Could you add some other reviews to this PR? I don't know who is familiar with this logic. |
I feel like it’s more correct to write it like this, impl fst::Automaton for DfaFstAutomaton {
type State = StateID;
#[inline]
fn start(&self) -> Self::State {
let config = Config::new();
self.0.start_state(&config).unwrap()
}
#[inline]
fn is_match(&self, state: &Self::State) -> bool {
self.0.is_match_state(*state)
}
#[inline]
fn can_match(&self, state: &Self::State) -> bool {
!self.0.is_dead_state(*state)
}
#[inline]
fn accept_eof(&self, state: &StateID) -> Option<StateID> {
if self.0.is_match_state(*state) {
return Some(*state);
}
Some(self.0.next_eoi_state(*state))
}
#[inline]
fn accept(&self, state: &Self::State, byte: u8) -> Self::State {
if self.0.is_match_state(*state) {
return *state;
}
self.0.next_state(*state, byte)
}
} |
@zhongzc Cool. Let me try it out. |
Signed-off-by: tison <[email protected]>
cbb5854
to
d46b413
Compare
It works. Burnt gives a more concrete implementation in https://github.com/BurntSushi/aho-corasick/blob/56256dca1bcd2365fd1dc987c1c06195429a2e2c/src/transducer.rs, which handles Anchored and Unanchored separately. In our case, it should be always unanchored, meaning we don't ensure the patter match the whole string, but part of the string is OK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This refers to #3043.
Checklist