
Conditionally force sequential reading in LuceneSyntheticSourceChangesSnapshot #128473


Conversation

@martijnvg (Member) commented May 26, 2025

Change LuceneSyntheticSourceChangesSnapshot to force sequential stored field reading when index.codec is best_compression.

In CCR benchmarks I see that relatively often we spend a lot of time decompressing the same stored field block over and over again when the doc ids are not dense. This is likely to happen when a seqno range is requested and the corresponding doc id list contains gaps. However, most doc ids are monotonically increasing, so not reading sequentially harms performance. The reason we currently don't read sequentially is the logic in StoredFieldLoader#hasSequentialDocs(...), which requires all requested doc ids to be monotonically increasing with no gaps. In the case of LuceneSyntheticSourceChangesSnapshot with stored field best compression that is too conservative.

For example: [34, 35, 36, 37, 38, 39, 40, 313, 314, 315, 316, 317, 595, 596, 597, 598, 599, 600, 601, 898, 899, 900, 901, 902, 903]

Or: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532]
(gap is after 471)
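
A minimal sketch of the kind of dense-and-ascending check described above (the method name and shape are illustrative, not the actual StoredFieldLoader#hasSequentialDocs(...) implementation). Both example lists above fail such a check because of a handful of gaps, even though most of each list is a long consecutive run:

```java
// Illustrative sketch only; the real check lives in StoredFieldLoader#hasSequentialDocs(...).
// Assumes the doc ids are distinct and sorted in ascending order, which holds here
// because the snapshot sorts the requested doc ids before loading them.
static boolean isDenseAndAscending(int[] docIds) {
    if (docIds.length < 2) {
        return true;
    }
    // No gaps iff the covered span equals the number of doc ids minus one.
    return docIds[docIds.length - 1] - docIds[0] == docIds.length - 1;
}
```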

In these cases we decompress an entire stored field block for each doc id, which results in the following CPU flame graph:

[CPU flame graph screenshot]

Full flame graph: baseline3_cpu_profile_10.html.zip

I think it makes sense to do sequential reading in this case, given that it is very likely that many of the requested doc ids will form monotonically increasing runs. Note that the requested doc ids are always sorted in ascending order (this happens in LuceneSyntheticSourceChangesSnapshot#transformScoreDocsToRecords(...)).
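
As a rough sketch of the conditional decision this PR describes, where usesBestCompression stands in for checking whether index.codec is best_compression (names and wiring are illustrative, not the merged code):

```java
// Illustrative sketch of the decision described above; not the merged implementation.
static boolean shouldForceSequentialReader(int[] sortedDocIds, boolean usesBestCompression) {
    if (usesBestCompression) {
        // With best_compression, decompressing a stored field block per doc id is
        // expensive, so read sequentially even if the sorted doc id list has gaps.
        return true;
    }
    // Otherwise keep the conservative behaviour: only read sequentially when the
    // doc ids form a single gapless ascending run.
    return sortedDocIds.length > 0
        && sortedDocIds[sortedDocIds.length - 1] - sortedDocIds[0] == sortedDocIds.length - 1;
}
```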

@martijnvg (Member, Author) commented May 27, 2025

Benchmarking this change using the elastic/logs track with CCR enabled shows a good improvement: https://esbench-metrics.kb.us-east-2.aws.elastic-cloud.com:9243/app/dashboards#/view/6c61280a-bd58-4c3a-8591-f56ec221c4f2?_g=h@c67153a
The left side shows the baseline graphs, the right side the contender graphs (this PR). Total time spent on reading is significantly lower with this change.

@martijnvg added the :Distributed Indexing/Recovery, :StorageEngine/Logs, and >enhancement labels May 27, 2025
@martijnvg marked this pull request as ready for review May 27, 2025 08:56
@elasticsearchmachine added the Team:StorageEngine and Team:Distributed Indexing labels May 27, 2025
@elasticsearchmachine (Collaborator) commented:
Hi @martijnvg, I've created a changelog YAML for you.

@elasticsearchmachine (Collaborator) commented:
Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine (Collaborator) commented:
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@tlrx self-requested a review May 27, 2025 09:05
@tlrx (Member) left a comment:
LGTM

@martijnvg (Member, Author) commented:
The Elasticsearch Serverless Checks failure is related to elastic/elasticsearch-serverless#3910.

@martijnvg added the auto-backport label May 27, 2025
@martijnvg merged commit 6a4a285 into elastic:main May 27, 2025 (17 of 18 checks passed)
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request May 27, 2025
…sSnapshot (elastic#128473)
@elasticsearchmachine (Collaborator) commented:
💚 Backport successful

Branch: 8.19

elasticsearchmachine pushed a commit that referenced this pull request May 27, 2025
…sSnapshot (#128473) (#128505)
Labels
auto-backport, :Distributed Indexing/Recovery, >enhancement, :StorageEngine/Logs, Team:Distributed Indexing, Team:StorageEngine, v8.19.0, v9.1.0