feat(pageserver): use vectored_get in collect_keyspace #10546
base: main
Conversation
Signed-off-by: Alex Chi Z <[email protected]>
7414 tests run: 7062 passed, 0 failed, 352 skipped (full report)
Flaky tests (7): Postgres 17, Postgres 14
Code coverage* (full report)
* collected from Rust tests only. The comment gets automatically updated with the latest test results.
3fb6e25 at 2025-01-28T21:18:10.839Z :recycle:
The IoConcurrency is too short-lived; better to use IoConcurrency::sequential() for now.

Also, this PR only touches the innermost loops. That's probably decent bang for the buck, but given the number of times we call this function in production, we could put in a little more effort. At a minimum, what about list_rels? Can we process all dbs in parallel?

I'm thinking a bigger rewrite of this function that makes it a pipeline could be a good compromise, like so: in all the places where we currently call get(), we'd instead submit the key into a utils::sync::spsc_fold that folds the KeySpaceRandomAccum. There's one other task / concurrent future that reads from the spsc_fold and executes it using get_vectored. Somehow we'd need to provide results for the not-innermost-loop reads. Maybe using oneshot channels registered in a HashMap or so?
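A minimal, self-contained sketch of that pipeline idea, under stated assumptions: Key, Value, and the get_vectored stub stand in for the pageserver's real key/value types and Timeline::get_vectored; a plain tokio mpsc channel stands in for utils::sync::spsc_fold, and the folding into a batch (the KeySpaceRandomAccum role) happens on the consumer side instead.

use std::collections::HashMap;
use tokio::sync::{mpsc, oneshot};

// Stand-ins for the pageserver's key and value types.
type Key = u64;
type Value = Vec<u8>;

// Stand-in for a real vectored read over a batch of keys.
async fn get_vectored(keys: &[Key]) -> HashMap<Key, Value> {
    keys.iter().map(|&k| (k, vec![0u8; 8])).collect()
}

// Request submitted by the keyspace walkers: a key to include in the batch,
// plus an optional oneshot sender for reads whose value is needed back
// (the "not-innermost-loop" reads mentioned above).
struct ReadRequest {
    key: Key,
    reply: Option<oneshot::Sender<Value>>,
}

// The concurrent future that drains the channel, folds requests into a
// batch, and executes each batch with one vectored read.
async fn reader_task(mut rx: mpsc::Receiver<ReadRequest>, batch_size: usize) {
    let mut batch: Vec<Key> = Vec::new();
    let mut waiters: HashMap<Key, oneshot::Sender<Value>> = HashMap::new();
    loop {
        let closed = match rx.recv().await {
            Some(req) => {
                if let Some(tx) = req.reply {
                    waiters.insert(req.key, tx);
                }
                batch.push(req.key);
                false
            }
            None => true,
        };
        if batch.len() >= batch_size || (closed && !batch.is_empty()) {
            for (key, value) in get_vectored(&batch).await {
                if let Some(tx) = waiters.remove(&key) {
                    let _ = tx.send(value);
                }
            }
            batch.clear();
        }
        if closed {
            break;
        }
    }
}

A producer would push ReadRequests from the walker loops and await the oneshot receiver only where the value is actually needed; the real implementation could rely on spsc_fold's folding behavior to coalesce keys instead of buffering a Vec.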
// Skip the vectored-read max key check by using `get_vectored_impl`.
let io_concurrency = IoConcurrency::spawn_from_conf(
    self.conf,
    self.gate
        .enter()
        .map_err(|_| CollectKeySpaceError::Cancelled)?,
);
I think IoConcurrency::sequential() mode is more appropriate here, for now. This io_concurrency that you're creating here (= the tokio task you'd be spawning here if concurrent IO is enabled) is too short-lived.
let mut buf = self.get(relsize_key, lsn, ctx).await?;
    relsize_keys_to_collect.add_key(relsize_key);
}
// Skip the vectored-read max key check by using `get_vectored_impl`.
We have that check for a reason: to limit memory consumption.
+1. Consider adding a Timeline helper that takes an IntoIterator<Key>, chunks it according to MAX_GET_VECTORED_KEYS, and returns a key/value iterator.
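A self-contained sketch of such a helper, with stand-in assumptions: Key, Value, the get_vectored stub, and the constant's value here are placeholders for the real pageserver API and its tuning; only the chunking shape is the point.

// Stand-ins for the pageserver's key/value types and vectored-read API.
type Key = u64;
type Value = Vec<u8>;

// Placeholder value; the real constant lives in the pageserver.
const MAX_GET_VECTORED_KEYS: usize = 32;

// Stand-in for one bounded vectored read.
async fn get_vectored(keys: &[Key]) -> Vec<(Key, Value)> {
    keys.iter().map(|&k| (k, vec![0u8; 8])).collect()
}

// Reads an arbitrary number of keys by issuing one vectored read per chunk
// of at most MAX_GET_VECTORED_KEYS keys, so no single call exceeds the
// memory bound that the max-key check is there to enforce.
async fn get_vectored_chunked(
    keys: impl IntoIterator<Item = Key>,
) -> Vec<(Key, Value)> {
    let keys: Vec<Key> = keys.into_iter().collect();
    let mut out = Vec::with_capacity(keys.len());
    for chunk in keys.chunks(MAX_GET_VECTORED_KEYS) {
        out.extend(get_vectored(chunk).await);
    }
    out
}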
Change LGTM overall.
Problem
This is the first patch for optimizing L0 compaction. As we know from the previous incident, the most time-consuming part of the L0 compaction code path is repartition, before the compaction actually starts. This patch is a very safe one that doesn't change any behavior except using vectored get: we refactor the collect_keyspace code to use vectored get so that retrieving all the data is faster even when we have hundreds of L0 layers piled up.

Next patch: repartition should read from the boundary of L0 and L1 instead of the latest data to further accelerate.
Summary of changes
Use vectored get in collect_keyspace.
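A minimal before/after sketch of what the change amounts to, using stand-in types and stubs (Key, Value, get, get_vectored) rather than the real Timeline API; the actual code accumulates keys in a KeySpaceRandomAccum and reads them through get_vectored_impl.

type Key = u64;
type Value = Vec<u8>;

// Stubs standing in for the single-key and vectored read paths.
async fn get(_key: Key) -> Value {
    vec![0u8; 8]
}
async fn get_vectored(keys: &[Key]) -> Vec<(Key, Value)> {
    keys.iter().map(|&k| (k, vec![0u8; 8])).collect()
}

// Before: collect_keyspace issued one get() per relation-size key, paying
// one full read path per key.
async fn relation_sizes_one_by_one(relsize_keys: &[Key]) -> Vec<(Key, Value)> {
    let mut out = Vec::new();
    for &key in relsize_keys {
        out.push((key, get(key).await));
    }
    out
}

// After: accumulate the keys first, then fetch them together with a vectored
// read (chunked in practice, per the review discussion above).
async fn relation_sizes_vectored(relsize_keys: &[Key]) -> Vec<(Key, Value)> {
    get_vectored(relsize_keys).await
}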