
pageserver: improve L0 compaction performance #10694

Closed · 12 of 14 tasks · Tracked by #10160

erikgrinaker opened this issue Feb 6, 2025 · 2 comments

Labels: a/performance (Area: relates to performance of the system), c/storage/pageserver (Component: storage: pageserver)
erikgrinaker commented Feb 6, 2025

L0 compaction is currently struggling to keep up with ingest workloads, causing high read amplification. This blocks removing L0 flush upload backpressure (`l0_flush_wait_upload`), and enabling L0 compaction backpressure (#5415) and parallel S3 uploads (#10096).

erikgrinaker added the a/performance and c/storage/pageserver labels on Feb 6, 2025
github-merge-queue bot pushed a commit that referenced this issue Feb 7, 2025
## Problem

L0 compaction can get starved by other background tasks. It needs to be
responsive to avoid read amp blowing up during heavy write workloads.

Touches #10694.

## Summary of changes

Add a separate semaphore for compaction, configurable via
`use_compaction_semaphore` (disabled by default). This is primarily for
testing in staging; it needs further work (in particular to split
image/L0 compaction jobs) before it can be enabled.
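
A minimal sketch of the dedicated-semaphore idea, assuming `tokio::sync::Semaphore`; the helper names, permit counts, and the `use_compaction_semaphore` wiring shown here are illustrative, not the pageserver's actual API.

```rust
use std::sync::OnceLock;
use tokio::sync::{Semaphore, SemaphorePermit};

// Hypothetical permit count; in the pageserver it would be derived from
// the background-task concurrency configuration.
const COMPACTION_PERMITS: usize = 4;

fn compaction_semaphore() -> &'static Semaphore {
    static SEM: OnceLock<Semaphore> = OnceLock::new();
    SEM.get_or_init(|| Semaphore::new(COMPACTION_PERMITS))
}

fn background_semaphore() -> &'static Semaphore {
    static SEM: OnceLock<Semaphore> = OnceLock::new();
    SEM.get_or_init(|| Semaphore::new(COMPACTION_PERMITS))
}

/// Hold the returned permit for the duration of a compaction iteration.
/// With the flag enabled, compaction no longer competes with every other
/// background loop for the shared background permits.
async fn acquire_compaction_permit(
    use_compaction_semaphore: bool,
) -> SemaphorePermit<'static> {
    let semaphore = if use_compaction_semaphore {
        compaction_semaphore()
    } else {
        background_semaphore()
    };
    semaphore.acquire().await.expect("semaphore closed")
}
```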
github-merge-queue bot pushed a commit that referenced this issue Feb 10, 2025
## Problem

The compaction loop currently runs periodically, so by default it can
wait up to 20 seconds before starting L0 compaction.

Also, when we later separate the semaphores for L0 compaction and image
compaction, we want to give up waiting for the image compaction
semaphore if L0 compaction is needed on any timeline.

Touches #10694.

## Summary of changes

Notify the compaction loop when an L0 flush (on any timeline) exceeds
`compaction_threshold`.

Also do some opportunistic cleanups in the area.
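
A rough sketch of the wake-up mechanism using `tokio::sync::Notify`; the threshold constant, function names, and the 20-second period are stand-ins for the real tenant configuration (`compaction_threshold` and the compaction period).

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Notify;

// Hypothetical threshold; the real value comes from the tenant's
// `compaction_threshold` setting.
const COMPACTION_THRESHOLD: usize = 10;

/// Called from the flush path after an L0 layer is flushed on any timeline.
fn maybe_wake_compaction(l0_count: usize, wake: &Notify) {
    if l0_count >= COMPACTION_THRESHOLD {
        // Wake the compaction loop immediately instead of letting it
        // sleep out the rest of its period.
        wake.notify_one();
    }
}

async fn compaction_loop(wake: Arc<Notify>) {
    loop {
        // Wait for either the regular period or an explicit wake-up,
        // whichever comes first.
        tokio::select! {
            _ = tokio::time::sleep(Duration::from_secs(20)) => {}
            _ = wake.notified() => {}
        }
        // ... run a compaction iteration ...
    }
}
```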
github-merge-queue bot pushed a commit that referenced this issue Feb 11, 2025
## Problem

Image compaction can starve out L0 compaction if a tenant has several
timelines with L0 debt.

Touches #10694.
Requires #10740.

## Summary of changes

* Add an initial L0 compaction pass, in order of L0 count.
* Add a tenant option `compaction_l0_first` to control the L0 pass
(disabled by default).
* Add `CompactFlags::OnlyL0Compaction` to run an L0-only compaction
pass.
* Clean up the compaction iteration logic.

A later PR will use separate semaphores for the L0 and image compaction
passes to avoid cross-tenant L0 starvation. That PR will also make image
compaction yield if _any_ of the tenant's timelines have pending L0
compaction to further avoid starvation.
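
A small sketch of how the initial L0 pass could order timelines, assuming "in order of L0 count" means most-indebted first; the `Timeline` struct and function name here are hypothetical.

```rust
// Hypothetical timeline handle; the real type carries much more state.
struct Timeline {
    name: String,
    l0_count: usize,
}

/// Order timelines for the initial L0-only pass: the timeline with the
/// most L0 layers (worst read amplification) is compacted first, and
/// timelines with no L0 debt are skipped entirely.
fn l0_pass_order(mut timelines: Vec<Timeline>) -> Vec<Timeline> {
    timelines.retain(|t| t.l0_count > 0);
    timelines.sort_by(|a, b| b.l0_count.cmp(&a.l0_count));
    timelines
}
```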
github-merge-queue bot pushed a commit that referenced this issue Feb 11, 2025
## Problem

When image compaction yields for L0 compaction, it may not immediately
schedule L0 compaction, because it just goes on to compact the next
pending timeline.

Touches #10694.
Requires #10744.

## Summary of changes

Extend `CompactionOutcome` with `YieldForL0` and `Skipped` variants, and
immediately schedule an L0 compaction pass in the `YieldForL0` case.
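
A sketch of the extended outcome handling; the `CompactionOutcome` variant names come from this PR, while `schedule_l0_pass` and the surrounding control flow are illustrative.

```rust
/// Sketch of the extended outcome type; variant names follow the PR,
/// everything around them is illustrative.
enum CompactionOutcome {
    Done,
    Pending,
    /// Compaction yielded because some timeline needs L0 compaction.
    YieldForL0,
    /// The timeline was skipped (e.g. it is shutting down).
    Skipped,
}

// Hypothetical scheduler hook.
fn schedule_l0_pass() {
    // ... enqueue an L0-only compaction pass ...
}

fn handle_outcome(outcome: CompactionOutcome) {
    match outcome {
        // Run the L0-only pass immediately; moving on to the next pending
        // timeline would defeat the purpose of yielding.
        CompactionOutcome::YieldForL0 => schedule_l0_pass(),
        CompactionOutcome::Done
        | CompactionOutcome::Pending
        | CompactionOutcome::Skipped => {}
    }
}
```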
github-merge-queue bot pushed a commit that referenced this issue Feb 12, 2025
## Problem

L0 compaction frequently gets starved out by other background tasks and
image/GC compaction. L0 compaction must be responsive to keep read
amplification under control.

Touches #10694.
Resolves #10689.

## Summary of changes

Use a separate semaphore for the L0-only compaction pass.

* Add a `CONCURRENT_L0_COMPACTION_TASKS` semaphore and
`BackgroundLoopKind::L0Compaction`.
* Add a setting `compaction_l0_semaphore` (default off via
`compaction_l0_first`).
* Use the L0 semaphore when doing an `OnlyL0Compaction` pass.
* Use the background semaphore when doing a regular compaction pass
(which includes an initial L0 pass).
* While waiting for the background semaphore, yield for L0 compaction if
triggered.
* Add `CompactFlags::NoYield` to disable L0 yielding, and set it for the
HTTP API route.
* Remove the old `use_compaction_semaphore` setting and
compaction-scoped semaphore.
* Remove the warning when waiting for a semaphore; it's noisy and we
have metrics.
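
A sketch of how these pieces could fit together, assuming tokio primitives; `acquire_for_pass`, the permit counts, and the `Notify`-based trigger are hypothetical stand-ins for the real `CONCURRENT_L0_COMPACTION_TASKS` semaphore and scheduling logic.

```rust
use std::sync::OnceLock;
use tokio::sync::{Notify, Semaphore, SemaphorePermit};

fn l0_semaphore() -> &'static Semaphore {
    // Hypothetical permit count standing in for CONCURRENT_L0_COMPACTION_TASKS.
    static SEM: OnceLock<Semaphore> = OnceLock::new();
    SEM.get_or_init(|| Semaphore::new(4))
}

fn background_semaphore() -> &'static Semaphore {
    static SEM: OnceLock<Semaphore> = OnceLock::new();
    SEM.get_or_init(|| Semaphore::new(4))
}

enum Acquired {
    Permit(SemaphorePermit<'static>),
    /// Abandon the wait so an L0-only pass can run first.
    YieldForL0,
}

async fn acquire_for_pass(l0_only: bool, no_yield: bool, l0_wake: &Notify) -> Acquired {
    if l0_only {
        // The L0-only pass has its own semaphore, so image/GC compaction
        // elsewhere cannot starve it.
        let permit = l0_semaphore().acquire().await.expect("semaphore closed");
        return Acquired::Permit(permit);
    }
    if no_yield {
        // e.g. compaction triggered via the HTTP API.
        let permit = background_semaphore().acquire().await.expect("semaphore closed");
        return Acquired::Permit(permit);
    }
    // While queued for the shared background semaphore, give up and yield
    // if L0 compaction is triggered in the meantime.
    tokio::select! {
        permit = background_semaphore().acquire() => {
            Acquired::Permit(permit.expect("semaphore closed"))
        }
        _ = l0_wake.notified() => Acquired::YieldForL0,
    }
}
```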
erikgrinaker commented Feb 12, 2025

We've implemented all the planned improvements, but haven't enabled them by default yet. I'll keep this open to verify in staging and roll out to production.

erikgrinaker commented
The planned work here is mostly complete; the production rollout is tracked in https://github.com/neondatabase/cloud/issues/24664.
