-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(search): Defragment search indices #6144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7665aaa to
f72b1e3
Compare
This comment was marked as resolved.
This comment was marked as resolved.
f5e8eb5 to
f50f2ac
Compare
|
augment review |
src/core/search/block_list.cc
Outdated
|
|
||
| template <typename T> struct CanDefragment<T, std::void_t<CanDefragmentT<T>>> : std::true_type {}; | ||
|
|
||
| template <typename T> inline constexpr bool CanDefragmentV = CanDefragment<T>::value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be much simpler with concepts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another option would be to just use is_same_v and check for SortedVector
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why at all? You can have an empty Defragment() inside CompressedSortedSet and implement it in your next PR. It's just a wrapper around vector<>, there's nothing complicated inside the compressed set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the idea was to introduce things gradually but I wasn't sure if the types that blocklist will be used with is fixed or can be extended in future, I think this got a little too complicated though, I made it simpler with overloaded methods instead of this code.
Now I see it is only used with CompressedSortedSet in addition to sorted vector. Maybe I will add a stub like you mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to add a stub method to compressed sorted set
c3a1962 to
836d336
Compare
|
augment review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review completed. No suggestions at this time.
Comment augment review to trigger a new review at any time.
0356eee to
c84c13f
Compare
Since the page usage object manages traversal during a defrag operation, a quota depletion check is added to it, where it compares its start time plus the quota against _now_, and stops if _now_ is in the past. Signed-off-by: Abhijat Malviya <[email protected]>
A couple of utility functions are added and search core lib linking is changed to prepare for defragment related code in upcoming changes. Signed-off-by: Abhijat Malviya <[email protected]>
c84c13f to
72c64aa
Compare
Signed-off-by: Abhijat Malviya <[email protected]>
Defragments a map like data structure, optionally dereferencing values before calling defragment method on them Signed-off-by: Abhijat Malviya <[email protected]>
Signed-off-by: Abhijat Malviya <[email protected]>
Signed-off-by: Abhijat Malviya <[email protected]>
Signed-off-by: Abhijat Malviya <[email protected]>
Signed-off-by: Abhijat Malviya <[email protected]>
Signed-off-by: Abhijat Malviya <[email protected]>
Signed-off-by: Abhijat Malviya <[email protected]>
72c64aa to
d171cb8
Compare
|
I think this is in a good place for a review now, some things might have to be handled in a follow up pr |
|
It can be split this into smaller PRs if it helps review, eg
|
It would've been easier if pushed the new changes as separate commits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍🏻 if you follow up later on the small stuff
Adds support for defragmenting tag indices. The following objects support defragmentation:
Containers if necessary, defragment containers if they support itentries_andsuffix_trie_, accesses values (which should beBlockList) and defragments them. If operation runs out of quota, store the key so that it can be used to resume defragmentation laterindices_andsort_indices_, storing keys if quota runs out.The quota is specified in terms of microseconds the entire operation can run for. The page usage object now stores the cycles at creation time. The quota is also stored, converted to cycles. The quota-depleted check is defined on this object, and checks whether more quota cycles have passed since object creation.
Each defragment operation first checks if quota has run out, if so it returns early. Some objects keep track of state so that if quota runs out, the next defragment starts from where the previous attempt left off.
The accuracy of defragmenting an object only once per cycle when working with FieldIndices or higher level is not guaranteed. For example where an object contains a flat hash map, if the defragmentation resumes from a key and new elements were added in the interim, the order of iteration may change, and an object which was defragmented before may be visited again at the expense of another object which was never defragmented.
Since this will happen only when new indices are added or fields are added to existing ones, this discrepancy should be acceptable.
The tag index will undergo frequent mutation, meaning that the rax tree lookup might fail when looking for the stored key to begin defragmentation. In this case
lower_boundis used to start defragmentation from the nearest larger key, because the previous blocklists have a high chance of already being defragmented.No state is maintained in sorted vector or blocklists, we also do not check for quota after moving each element of the sorted vector. This is because it is not possible to partially move a vector, it has to be a single operation.
The defragment process is also not hooked up to the defrag task yet to keep the PR small, this will be done in the next PR where
ShardDocIndices::Defragmentcan be called with a runtime based quota.