Skip to content

Conversation

@abhijat
Copy link
Contributor

@abhijat abhijat commented Dec 2, 2025

Adds support for defragmenting tag indices. The following objects support defragmentation:

  • SortedVector: if the memory address is in underutilized page, entries are reallocated and moved
  • BlockList: reallocate the vector of Containers if necessary, defragment containers if they support it
  • TagIndex: iterates over rax tree for entries_ and suffix_trie_, accesses values (which should be BlockList) and defragments them. If operation runs out of quota, store the key so that it can be used to resume defragmentation later
  • FieldIndices: defragment indices_ and sort_indices_, storing keys if quota runs out.
  • ShardDocIndices: defragment all indices, store key if quota runs out.

The quota is specified in terms of microseconds the entire operation can run for. The page usage object now stores the cycles at creation time. The quota is also stored, converted to cycles. The quota-depleted check is defined on this object, and checks whether more quota cycles have passed since object creation.

Each defragment operation first checks if quota has run out, if so it returns early. Some objects keep track of state so that if quota runs out, the next defragment starts from where the previous attempt left off.

The accuracy of defragmenting an object only once per cycle when working with FieldIndices or higher level is not guaranteed. For example where an object contains a flat hash map, if the defragmentation resumes from a key and new elements were added in the interim, the order of iteration may change, and an object which was defragmented before may be visited again at the expense of another object which was never defragmented.

Since this will happen only when new indices are added or fields are added to existing ones, this discrepancy should be acceptable.

The tag index will undergo frequent mutation, meaning that the rax tree lookup might fail when looking for the stored key to begin defragmentation. In this case lower_bound is used to start defragmentation from the nearest larger key, because the previous blocklists have a high chance of already being defragmented.

No state is maintained in sorted vector or blocklists, we also do not check for quota after moving each element of the sorted vector. This is because it is not possible to partially move a vector, it has to be a single operation.

The defragment process is also not hooked up to the defrag task yet to keep the PR small, this will be done in the next PR where ShardDocIndices::Defragment can be called with a runtime based quota.

@abhijat abhijat changed the title feat(core): Defragment search indices feat(search): Defragment search indices Dec 2, 2025
@abhijat abhijat force-pushed the abhijat/feat/defrag-search-indices branch 5 times, most recently from 7665aaa to f72b1e3 Compare December 2, 2025 09:33
@abhijat

This comment was marked as resolved.

@abhijat abhijat force-pushed the abhijat/feat/defrag-search-indices branch 18 times, most recently from f5e8eb5 to f50f2ac Compare December 4, 2025 09:22
@abhijat
Copy link
Contributor Author

abhijat commented Dec 4, 2025

augment review


template <typename T> struct CanDefragment<T, std::void_t<CanDefragmentT<T>>> : std::true_type {};

template <typename T> inline constexpr bool CanDefragmentV = CanDefragment<T>::value;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be much simpler with concepts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option would be to just use is_same_v and check for SortedVector

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why at all? You can have an empty Defragment() inside CompressedSortedSet and implement it in your next PR. It's just a wrapper around vector<>, there's nothing complicated inside the compressed set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the idea was to introduce things gradually but I wasn't sure if the types that blocklist will be used with is fixed or can be extended in future, I think this got a little too complicated though, I made it simpler with overloaded methods instead of this code.

Now I see it is only used with CompressedSortedSet in addition to sorted vector. Maybe I will add a stub like you mentioned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to add a stub method to compressed sorted set

@abhijat abhijat requested a review from dranikpg December 4, 2025 11:14
@abhijat abhijat force-pushed the abhijat/feat/defrag-search-indices branch 5 times, most recently from c3a1962 to 836d336 Compare December 8, 2025 06:03
@abhijat
Copy link
Contributor Author

abhijat commented Dec 8, 2025

augment review

Copy link

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. No suggestions at this time.

Comment augment review to trigger a new review at any time.

@abhijat abhijat force-pushed the abhijat/feat/defrag-search-indices branch 3 times, most recently from 0356eee to c84c13f Compare December 8, 2025 06:16
Since the page usage object manages traversal during a defrag operation,
a quota depletion check is added to it, where it compares its start time
plus the quota against _now_, and stops if _now_ is in the past.

Signed-off-by: Abhijat Malviya <[email protected]>
A couple of utility functions are added and search core lib linking is
changed to prepare for defragment related code in upcoming changes.

Signed-off-by: Abhijat Malviya <[email protected]>
@abhijat abhijat force-pushed the abhijat/feat/defrag-search-indices branch from c84c13f to 72c64aa Compare December 8, 2025 06:26
Defragments a map like data structure, optionally dereferencing values
before calling defragment method on them

Signed-off-by: Abhijat Malviya <[email protected]>
Signed-off-by: Abhijat Malviya <[email protected]>
@abhijat abhijat force-pushed the abhijat/feat/defrag-search-indices branch from 72c64aa to d171cb8 Compare December 8, 2025 06:32
@abhijat
Copy link
Contributor Author

abhijat commented Dec 8, 2025

I think this is in a good place for a review now, some things might have to be handled in a follow up pr

@abhijat abhijat requested review from dranikpg and romange December 8, 2025 07:50
@abhijat
Copy link
Contributor Author

abhijat commented Dec 8, 2025

It can be split this into smaller PRs if it helps review, eg

  • adding quota to page usage
  • adding defrag support to block list, sorted vector
  • everything in the middle

@dranikpg
Copy link
Contributor

dranikpg commented Dec 8, 2025

It can be split this into smaller PRs if it helps review, eg

  • adding quota to page usage
  • adding defrag support to block list, sorted vector
  • everything in the middle

It would've been easier if pushed the new changes as separate commits

Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏻 if you follow up later on the small stuff

@abhijat abhijat merged commit f31f3c9 into main Dec 8, 2025
10 checks passed
@abhijat abhijat deleted the abhijat/feat/defrag-search-indices branch December 8, 2025 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants