feat: add an in-place compress path #123

Open
a10y wants to merge 5 commits into develop from aduffy/safe-compress

Conversation

@a10y (Contributor) commented Aug 25, 2025

Using the FSST library currently requires an extra allocation and a copy:

  1. Allocate a Vec to pass to compress_into
  2. Copy that buffer into some target buffer

This is kind of annoying. What I really want to do is pre-allocate a buffer large enough to hold all of the compressed data, then do something like

let mut buffer = Vec::with_capacity(...);
let mut ptr = 0;

for string in strings {
    let written = compressor.compress_into(string, &mut buffer.spare_capacity_mut()[ptr..]);
    ptr += written;
}

This lets me compress a bunch of values directly into a single packed byte buffer, without an intermediate copy.
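To make the packing loop concrete, here is a self-contained sketch. The `compress_into_uninit` below is a hypothetical stand-in that merely copies bytes verbatim (the real FSST compressor emits symbol codes), so the capacity math is simplified; with a real compressor you'd reserve a worst-case bound instead.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for the compressor: copies bytes verbatim so the
// packing pattern can run standalone. Returns how many bytes were written.
fn compress_into_uninit(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> usize {
    let n = src.len().min(dst.len());
    for (d, &s) in dst.iter_mut().zip(&src[..n]) {
        d.write(s);
    }
    n
}

// Pack several inputs back-to-back into one pre-allocated Vec.
fn pack(strings: &[&[u8]]) -> Vec<u8> {
    // Capacity assumption: the stand-in never expands its input; a real
    // compressor needs a worst-case output bound here instead.
    let cap: usize = strings.iter().map(|s| s.len()).sum();
    let mut buffer: Vec<u8> = Vec::with_capacity(cap);
    let mut ptr = 0;
    for s in strings {
        let written = compress_into_uninit(s, &mut buffer.spare_capacity_mut()[ptr..]);
        ptr += written;
    }
    // SAFETY: the first `ptr` spare bytes were initialized by the loop above.
    unsafe { buffer.set_len(ptr) };
    buffer
}
```

The one remaining `unsafe` is the caller-side `set_len`, which is sound because every byte below `ptr` was initialized by the writes.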

The Fix

We shouldn't take a &mut Vec<u8> directly; instead we should take &mut [MaybeUninit<u8>], which can be backed by a Vec, Bytes, Buffer, or whatever other memory allocation we happen to get our hands on.
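To illustrate why the slice type is the more flexible contract, here is a sketch where a hypothetical `fill` writer (standing in for the compressor) consumes the same &mut [MaybeUninit<u8>] from two different backings:

```rust
use std::mem::MaybeUninit;

// Hypothetical writer: fills the slice with a constant byte and reports
// how much it wrote. Any backing that yields the slice type works.
fn fill(dst: &mut [MaybeUninit<u8>], byte: u8) -> usize {
    for d in dst.iter_mut() {
        d.write(byte);
    }
    dst.len()
}

// Backing 1: the spare capacity of a Vec<u8>.
fn via_vec(n: usize, byte: u8) -> Vec<u8> {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    let wrote = fill(&mut v.spare_capacity_mut()[..n], byte);
    // SAFETY: `fill` initialized the first `wrote` bytes of the spare capacity.
    unsafe { v.set_len(wrote) };
    v
}

// Backing 2: a plain heap slice of uninitialized bytes.
fn via_heap_slice(n: usize, byte: u8) -> usize {
    let mut raw = vec![MaybeUninit::<u8>::uninit(); n].into_boxed_slice();
    fill(&mut raw, byte)
}
```

The writer never cares which allocator or container produced the slice, which is exactly the decoupling the PR is after.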

This PR adds a new safe compression pathway that exposes compress_into_uninit and implements the hot loop using only safe code to boot.

I'm leaving the old unsafe compress_into in place so existing projects can keep using that interface, but I've updated the docs to indicate that compress_into_uninit is the new preferred pathway.
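For illustration only, one plausible back-compat shim could delegate the Vec-based call to the slice-based one. The names, the stand-in `compress_into_uninit`, and the 2x worst-case bound (FSST escapes cost two output bytes per input byte) are assumptions here, not the library's actual code:

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in so the sketch runs standalone; the real
// compress_into_uninit emits FSST codes, this one copies verbatim.
fn compress_into_uninit(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> usize {
    let n = src.len().min(dst.len());
    for (d, &s) in dst.iter_mut().zip(&src[..n]) {
        d.write(s);
    }
    n
}

// Sketch: the old Vec-based entry point delegating to the slice-based one.
// Reserving 2 * src.len() covers the assumed worst case of one escape
// marker plus one literal byte per input byte.
fn compress_into_vec(src: &[u8], out: &mut Vec<u8>) {
    out.reserve(2 * src.len());
    let start = out.len();
    let written = compress_into_uninit(src, out.spare_capacity_mut());
    // SAFETY: the first `written` spare bytes were initialized above.
    unsafe { out.set_len(start + written) };
}
```

Appending to a non-empty Vec also works, since the shim only ever extends past the existing length.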

Performance

Performance on an M4 Max, measured with the micro and compress benches, is more or less identical.

I have a long-term goal of eliminating most of the unsafe code in this repo (see #87); this brings us one step closer.

@a10y (Contributor, author) commented Aug 25, 2025

Local benchmark run on this branch (M4 Max)

aduffy@Andrews-MacBook-Pro /V/C/fsst (aduffy/safe-compress)> cargo bench --bench compress
   Compiling fsst-rs v0.5.3 (/Volumes/Code/fsst)
    Finished `bench` profile [optimized] target(s) in 0.60s
     Running benches/compress.rs (target/release/deps/compress-1ee906f2016f72c9)
Gnuplot not found, using plotters backend
dbtext/wikipedia/train-and-compress
                        time:   [8.1022 ms 8.1163 ms 8.1311 ms]
                        change: [-1.5404% -1.2911% -1.0334%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
dbtext/wikipedia/compress-only
                        time:   [7.4230 ms 7.4498 ms 7.4850 ms]
                        thrpt:  [364.76 MiB/s 366.48 MiB/s 367.80 MiB/s]
                 change:
                        time:   [-4.2437% -3.8784% -3.4581%] (p = 0.00 < 0.05)
                        thrpt:  [+3.5819% +4.0348% +4.4318%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
dbtext/wikipedia/decompress
                        time:   [900.59 µs 904.12 µs 907.38 µs]
                        thrpt:  [2.9384 GiB/s 2.9490 GiB/s 2.9605 GiB/s]
                 change:
                        time:   [+1.8525% +2.8612% +3.8656%] (p = 0.00 < 0.05)
                        thrpt:  [-3.7218% -2.7816% -1.8188%]
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

compressed dbtext/wikipedia 2862830 => 1640581B (compression factor 1.75:1)
dbtext/l_comment/train-and-compress
                        time:   [6.1031 ms 6.1369 ms 6.1756 ms]
                        change: [-2.3402% -1.6404% -0.7872%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
dbtext/l_comment/compress-only
                        time:   [5.4490 ms 5.4563 ms 5.4637 ms]
                        thrpt:  [479.30 MiB/s 479.94 MiB/s 480.59 MiB/s]
                 change:
                        time:   [-3.8527% -3.6184% -3.3808%] (p = 0.00 < 0.05)
                        thrpt:  [+3.4991% +3.7542% +4.0071%]
                        Performance has improved.
dbtext/l_comment/decompress
                        time:   [395.33 µs 398.69 µs 402.28 µs]
                        thrpt:  [6.3572 GiB/s 6.4144 GiB/s 6.4689 GiB/s]
                 change:
                        time:   [+2.7463% +4.6504% +6.6160%] (p = 0.00 < 0.05)
                        thrpt:  [-6.2055% -4.4437% -2.6729%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

compressed dbtext/l_comment 2745949 => 1018169B (compression factor 2.70:1)
dbtext/urls/train-and-compress
                        time:   [11.059 ms 11.080 ms 11.102 ms]
                        change: [-1.7492% -1.0009% -0.3482%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
dbtext/urls/compress-only
                        time:   [10.320 ms 10.350 ms 10.383 ms]
                        thrpt:  [581.23 MiB/s 583.04 MiB/s 584.77 MiB/s]
                 change:
                        time:   [-0.2435% +0.1008% +0.4676%] (p = 0.60 > 0.05)
                        thrpt:  [-0.4655% -0.1007% +0.2441%]
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking dbtext/urls/decompress: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
dbtext/urls/decompress  time:   [1.0093 ms 1.0159 ms 1.0238 ms]
                        thrpt:  [5.7565 GiB/s 5.8008 GiB/s 5.8389 GiB/s]
                 change:
                        time:   [-1.5426% -0.4588% +0.6629%] (p = 0.41 > 0.05)
                        thrpt:  [-0.6585% +0.4609% +1.5667%]
                        No change in performance detected.

compressed dbtext/urls 6327875 => 2856682B (compression factor 2.22:1)

@a10y (Contributor, author) commented Aug 25, 2025

Local benchmark run on develop (M4 Max)

aduffy@Andrews-MacBook-Pro /V/C/fsst_original (develop)> cargo bench --bench compress
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running benches/compress.rs (target/release/deps/compress-1ee906f2016f72c9)
Gnuplot not found, using plotters backend
dbtext/wikipedia/train-and-compress
                        time:   [8.7453 ms 8.7678 ms 8.7905 ms]
                        change: [-2.1238% -1.8275% -1.5178%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
dbtext/wikipedia/compress-only
                        time:   [8.0861 ms 8.1511 ms 8.2198 ms]
                        thrpt:  [332.15 MiB/s 334.95 MiB/s 337.64 MiB/s]
                 change:
                        time:   [-0.0001% +0.8594% +1.7039%] (p = 0.06 > 0.05)
                        thrpt:  [-1.6754% -0.8521% +0.0001%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
dbtext/wikipedia/decompress
                        time:   [895.29 µs 901.23 µs 906.37 µs]
                        thrpt:  [2.9416 GiB/s 2.9584 GiB/s 2.9781 GiB/s]
                 change:
                        time:   [-0.5790% +0.3510% +1.2569%] (p = 0.46 > 0.05)
                        thrpt:  [-1.2413% -0.3497% +0.5824%]
                        No change in performance detected.

compressed dbtext/wikipedia 2862830 => 1640581B (compression factor 1.75:1)
dbtext/l_comment/train-and-compress
                        time:   [6.3088 ms 6.3298 ms 6.3530 ms]
                        change: [-0.8264% -0.4224% +0.0215%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
dbtext/l_comment/compress-only
                        time:   [5.6181 ms 5.6270 ms 5.6365 ms]
                        thrpt:  [464.61 MiB/s 465.39 MiB/s 466.12 MiB/s]
                 change:
                        time:   [-1.5536% -1.2958% -1.0225%] (p = 0.00 < 0.05)
                        thrpt:  [+1.0330% +1.3128% +1.5781%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
dbtext/l_comment/decompress
                        time:   [380.76 µs 383.70 µs 386.40 µs]
                        thrpt:  [6.6184 GiB/s 6.6650 GiB/s 6.7165 GiB/s]
                 change:
                        time:   [-6.0954% -4.8400% -3.6035%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7382% +5.0862% +6.4910%]
                        Performance has improved.

compressed dbtext/l_comment 2745949 => 1018169B (compression factor 2.70:1)
dbtext/urls/train-and-compress
                        time:   [10.948 ms 10.977 ms 11.013 ms]
                        change: [-4.3787% -3.3474% -2.3954%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
dbtext/urls/compress-only
                        time:   [10.259 ms 10.279 ms 10.303 ms]
                        thrpt:  [585.75 MiB/s 587.10 MiB/s 588.26 MiB/s]
                 change:
                        time:   [-4.0140% -3.3222% -2.6764%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7500% +3.4363% +4.1819%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking dbtext/urls/decompress: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
dbtext/urls/decompress  time:   [1.0173 ms 1.0233 ms 1.0302 ms]
                        thrpt:  [5.7205 GiB/s 5.7592 GiB/s 5.7932 GiB/s]
                 change:
                        time:   [-5.4298% -3.0801% -0.9196%] (p = 0.01 < 0.05)
                        thrpt:  [+0.9281% +3.1780% +5.7416%]
                        Change within noise threshold.

compressed dbtext/urls 6327875 => 2856682B (compression factor 2.22:1)

@a10y a10y requested a review from Copilot August 25, 2025 18:30
@codspeed-hq bot commented Aug 25, 2025

Merging this PR will degrade performance by 16.92%

⚡ 2 improved benchmarks
❌ 10 regressed benchmarks
✅ 10 untouched benchmarks
⏩ 20 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation compress-only 12.8 ms 14.9 ms -14.36%
Simulation compress 6.6 ms 7.7 ms -14.17%
Simulation compress-only 31.8 ms 37.6 ms -15.34%
Simulation train-and-compress 15.9 ms 18.1 ms -11.89%
Simulation compress 3.9 ms 4.4 ms -11.62%
Simulation compress 2.2 ms 2.4 ms -10.48%
Simulation compress-hashtab 303.6 ns 274.4 ns +10.63%
Simulation train-and-compress 22.2 ms 25.6 ms -13.29%
Simulation train-and-compress 34.7 ms 40.5 ms -14.23%
Simulation compress-twobytes 238.6 ns 209.4 ns +13.93%
Simulation compress 11.4 ms 13.8 ms -16.92%
Simulation compress-only 18.2 ms 21.6 ms -15.76%

Comparing aduffy/safe-compress (6d19571) with develop (6303f97)


Footnotes

  [1] 20 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

Copilot AI left a comment

Pull Request Overview

This PR adds a new safe compression pathway that avoids intermediate allocations by working directly with uninitialized memory. The change allows users to pre-allocate a buffer and compress data directly into it without requiring two separate allocations and a copy operation.

  • Adds compress_into_uninit method that takes &mut [MaybeUninit<u8>] instead of &mut Vec<u8>
  • Implements a safe version of the compression hot loop using compress_word_safe
  • Updates benchmarks to use the new safe API while maintaining performance

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/lib.rs Adds new safe compression methods and updates existing compress method to use the new pathway
benches/micro.rs Updates micro benchmarks to use the new safe compression API
benches/compress.rs Updates compression benchmarks to use the new safe API


@a10y (Contributor, author) commented Aug 25, 2025

Wow codspeed did not like that

@spiraldb spiraldb deleted a comment from Copilot AI Aug 25, 2025
@spiraldb spiraldb deleted a comment from Copilot AI Aug 25, 2025
a10y added 4 commits March 24, 2026 11:14
Using the FSST library currently requires you to make two allocations:

1. Allocate a buffer to compress_into
2. Allocate a larger packed buffer

We shouldn't take a &mut Vec<u8> directly, since Vec isn't
splittable. We should instead be taking &mut [MaybeUninit<u8>], which
can be backed by Vec, Bytes, Buffer<u8> or whatever other memory
allocation we happen to get our hands on.

This PR adds a new safe compression pathway that exposes
`compress_into_uninit` and implements the hot loop using only safe code
to boot.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
This reverts commit 01ddd72.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/safe-compress branch from 5f10dd4 to 12c05ff Compare March 24, 2026 15:14
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/safe-compress branch from 12c05ff to 6d19571 Compare March 24, 2026 17:30