[8.19] Conditionally force sequential reading in LuceneSyntheticSourceChangesSnapshot (#128473) #128505

Merged


martijnvg
Member

Backports the following commits to 8.19:

…sSnapshot (elastic#128473)

Change `LuceneSyntheticSourceChangesSnapshot` to force sequential stored field reading when `index.codec` is `best_compression`.

In CCR benchmarks I see that we relatively often spend a lot of time decompressing the same stored field block over and over again when the doc ids are not dense. When a seqno range is requested, the corresponding doc id list is likely to contain gaps. However, most doc ids are monotonically increasing, so not reading sequentially harms performance. The reason we currently don't load sequentially is the logic in `StoredFieldLoader#hasSequentialDocs(...)`, which requires all requested doc ids to be monotonically increasing with no gaps (i.e. consecutive). In the case of `LuceneSyntheticSourceChangesSnapshot` with stored field best compression, that is too conservative. In practice, we end up decompressing a stored field block for each doc id we need to synthesize source for during recovery.

I think it makes sense to do sequential reading in this case, given that it is very likely that the requested doc id list will contain many monotonically increasing ranges. Note that the requested doc ids are always sorted in ascending order (this happens in `LuceneSyntheticSourceChangesSnapshot#transformScoreDocsToRecords(...)`).
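
To illustrate the reasoning above, here is a toy model, not the actual Elasticsearch or Lucene code. `BlockReadSketch`, `DOCS_PER_BLOCK`, `isDense` and `countDecompressions` are hypothetical names made up for this sketch. It assumes a non-sequential reader decompresses the containing block on every document read, while a sequential reader keeps the most recently decompressed block around. `isDense` is only a simplified stand-in for the kind of gap check described for `StoredFieldLoader#hasSequentialDocs(...)`.

```java
final class BlockReadSketch {
    /** Hypothetical number of documents per compressed stored field block. */
    static final int DOCS_PER_BLOCK = 4;

    /** Simplified gap check: true only when the ascending doc ids are consecutive. */
    static boolean isDense(int[] sortedDocIds) {
        return sortedDocIds.length > 0
            && sortedDocIds[sortedDocIds.length - 1] - sortedDocIds[0] == sortedDocIds.length - 1;
    }

    /**
     * Counts block decompressions for doc ids given in ascending order.
     * With sequential == true the last decompressed block is reused;
     * with sequential == false every document read decompresses its block again.
     */
    static int countDecompressions(int[] sortedDocIds, boolean sequential) {
        int decompressions = 0;
        int cachedBlock = -1;
        for (int docId : sortedDocIds) {
            int block = docId / DOCS_PER_BLOCK;
            if (sequential && block == cachedBlock) {
                continue; // block is still available from the previous read
            }
            decompressions++;
            cachedBlock = block;
        }
        return decompressions;
    }

    public static void main(String[] args) {
        // Ascending doc ids with gaps, as a seqno range request might produce.
        int[] docIds = {0, 1, 3, 4, 6, 7, 9, 10};
        System.out.println("dense: " + isDense(docIds));
        System.out.println("non-sequential decompressions: " + countDecompressions(docIds, false));
        System.out.println("sequential decompressions: " + countDecompressions(docIds, true));
    }
}
```

In this model the gaps make the strict check fail, so every document pays for a block decompression, even though the ids are ascending and a sequential reader would touch each block only once.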
@martijnvg martijnvg added :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. :StorageEngine/Logs You know, for Logs >enhancement auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport Team:Distributed Indexing Meta label for Distributed Indexing team Team:StorageEngine labels May 27, 2025
@elasticsearchmachine elasticsearchmachine merged commit 4db0f33 into elastic:8.19 May 27, 2025
15 checks passed
@martijnvg martijnvg deleted the backport/8.19/pr-128473 branch May 27, 2025 12:55
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport :Distributed Indexing/Recovery Anything around constructing a new shard, either from a local or a remote source. >enhancement :StorageEngine/Logs You know, for Logs Team:Distributed Indexing Meta label for Distributed Indexing team Team:StorageEngine v8.19.0
2 participants