sort:Optimize sort collation for long lines #12144

Open
mattsu2020 wants to merge 2 commits into uutils:main from mattsu2020:fix_sort_performance

@mattsu2020
Contributor

What changed

  • Avoid precomputing ICU collation sort keys for lines larger than 1 MiB.
  • Store optional collation key ranges so very long lines can fall back to lazy locale comparison during sorting.
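The two changes above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the `Lines` struct, its field names, and the case-folding stand-in for an ICU collation key are all assumptions made so the example runs without external crates. Only the idea of an optional key range per line, with a fallback to `locale_cmp` when a key is missing, comes from the description.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Threshold from the PR description: lines larger than this get no precomputed key.
const MAX_PRECOMPUTED_LINE_LEN: usize = 1024 * 1024;

/// Hypothetical stand-in for locale-aware comparison (ASCII case-insensitive here).
fn locale_cmp(a: &str, b: &str) -> Ordering {
    a.bytes()
        .map(|c| c.to_ascii_lowercase())
        .cmp(b.bytes().map(|c| c.to_ascii_lowercase()))
}

/// Hypothetical container: every line has an optional range into a shared key buffer.
struct Lines<'a> {
    lines: Vec<&'a str>,
    key_ranges: Vec<Option<Range<usize>>>,
    key_buf: Vec<u8>,
}

impl<'a> Lines<'a> {
    fn new(input: &'a str, max_len: usize) -> Self {
        let lines: Vec<&'a str> = input.lines().collect();
        let mut key_buf = Vec::new();
        let mut key_ranges = Vec::with_capacity(lines.len());
        for line in &lines {
            if line.len() > max_len {
                // Very long line: skip key materialization entirely.
                key_ranges.push(None);
            } else {
                // Stand-in "collation key": the case-folded bytes of the line.
                let start = key_buf.len();
                key_buf.extend(line.bytes().map(|c| c.to_ascii_lowercase()));
                key_ranges.push(Some(start..key_buf.len()));
            }
        }
        Self { lines, key_ranges, key_buf }
    }

    fn compare(&self, i: usize, j: usize) -> Ordering {
        match (&self.key_ranges[i], &self.key_ranges[j]) {
            // Fast path: both keys were precomputed, so compare raw key bytes.
            (Some(a), Some(b)) => self.key_buf[a.clone()].cmp(&self.key_buf[b.clone()]),
            // At least one key is missing: compare the lines lazily.
            _ => locale_cmp(self.lines[i], self.lines[j]),
        }
    }

    fn sorted(&self) -> Vec<&'a str> {
        let mut idx: Vec<usize> = (0..self.lines.len()).collect();
        idx.sort_by(|&i, &j| self.compare(i, j));
        idx.into_iter().map(|i| self.lines[i]).collect()
    }
}

fn main() {
    let input = "banana\nCHERRY_LONG_LINE\nApple";
    for line in Lines::new(input, MAX_PRECOMPUTED_LINE_LEN).sorted() {
        println!("{line}");
    }
}
```

For the result to be correct, the key comparison and the lazy comparison must agree on ordering, which is why the stand-in key here is built from the same case folding that the stand-in `locale_cmp` applies.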

Why

Fixes #12138. In UTF-8 locales, sort previously precomputed an ICU collation key for every input line. For inputs made up of a small number of very large lines, such as 26 lines of 200 MiB each, generating and storing multi-GiB collation keys dominated the runtime.

Impact

Small and normal-sized lines keep the existing precomputed-key fast path. Very long lines skip the expensive key materialization and use locale_cmp when compared.

Validation

  • cargo check -p uu_sort
  • cargo test -p uu_sort
  • cargo test -p coreutils --test tests test_sort::test_default_unsorted_ints -- --exact
  • Compared output against GNU sort with cmp for 52 MiB and 130 MiB reproducer inputs.
  • Hyperfine on the issue-sized 5.1 GiB input with LC_ALL=en_US.UTF-8 --parallel 1 --buffer-size 8G:
    • uutils release: 5.054 s
    • GNU gsort 9.11: 33.685 s
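As a sanity check on the figures above, the arithmetic can be reproduced with a short sketch. The helper names are hypothetical; the inputs (26 lines of 200 MiB each, 5.054 s and 33.685 s) are taken from this description.

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Total input size in GiB for `lines` lines of `mib_per_line` MiB each.
fn input_size_gib(lines: u64, mib_per_line: u64) -> f64 {
    (lines * mib_per_line * MIB) as f64 / GIB as f64
}

/// How many times faster the candidate is than the baseline.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

fn main() {
    // 26 lines of 200 MiB each, as in the linked issue.
    println!("input size: {:.2} GiB", input_size_gib(26, 200));
    // GNU gsort time divided by uutils release time.
    println!("speedup over GNU: {:.2}x", speedup(33.685, 5.054));
}
```

The input works out to roughly 5.1 GiB, matching the size quoted for the benchmark, and the timings put the uutils release build at about 6.7 times faster than GNU sort on this workload.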

@mattsu2020 mattsu2020 changed the title from "[codex] Optimize sort collation for long lines" to "sort:Optimize sort collation for long lines" May 4, 2026
@mattsu2020 mattsu2020 marked this pull request as ready for review May 4, 2026 13:00
@github-actions

github-actions Bot commented May 4, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/date/resolution (fails in this run but passes in the 'main' branch)
Note: The gnu test tests/basenc/bounded-memory is now being skipped but was previously passing.
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

@codspeed-hq

codspeed-hq Bot commented May 4, 2026

Merging this PR will degrade performance by 23.24%

❌ 3 regressed benchmarks
✅ 308 untouched benchmarks
⏩ 46 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

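| Mode       | Benchmark              | BASE     | HEAD     | Efficiency |
|------------|------------------------|----------|----------|------------|
| Memory     | sort_german_de_locale  | 3.3 MB   | 4.3 MB   | -23.24%    |
| Simulation | sort_key_field[500000] | 767.8 ms | 804.6 ms | -4.57%     |
| Simulation | sort_ascii_utf8_locale | 15.4 ms  | 16.2 ms  | -4.83%     |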

Comparing mattsu2020:fix_sort_performance (23e4bb3) with main (c23dc67)

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@xtqqczze
Contributor

xtqqczze commented May 4, 2026

Out of interest, why choose 1 MiB as the limit, rather than something lower like u16::MAX?

@mattsu2020
Contributor Author

> Out of interest, why choose 1 MiB as the limit, rather than something lower like u16::MAX?

Measurements with a 64 KiB threshold showed performance at least equivalent to the 1 MiB threshold on the issue workload, so I will change the threshold to u16::MAX.
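For reference, the candidate thresholds compare as in the sketch below. The constant names and the `precompute_key` helper are hypothetical, and treating the threshold as inclusive is an assumption based on the description's wording ("lines larger than" the limit skip precomputation).

```rust
/// Threshold from the original PR description.
const ONE_MIB: usize = 1024 * 1024;
/// Threshold used for the follow-up measurement.
const SIXTY_FOUR_KIB: usize = 64 * 1024;
/// Proposed threshold: one byte below 64 KiB.
const U16_MAX_BYTES: usize = u16::MAX as usize;

/// Hypothetical check: precompute a collation key only for lines at or below the threshold.
fn precompute_key(line_len: usize, threshold: usize) -> bool {
    line_len <= threshold
}

fn main() {
    println!("1 MiB    = {ONE_MIB} bytes");
    println!("64 KiB   = {SIXTY_FOUR_KIB} bytes");
    println!("u16::MAX = {U16_MAX_BYTES} bytes");
    // A 100 KiB line keeps its key under the 1 MiB limit but not under u16::MAX.
    let len = 100 * 1024;
    println!("1 MiB limit precomputes: {}", precompute_key(len, ONE_MIB));
    println!("u16::MAX limit precomputes: {}", precompute_key(len, U16_MAX_BYTES));
}
```

u16::MAX is 65,535 bytes, so it sits one byte below the 64 KiB value that was measured and is 16 times smaller than the original 1 MiB limit.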

@xtqqczze
Contributor

xtqqczze commented May 4, 2026

@mattsu2020 Could you also add a benchmark (in separate PR)?

@mattsu2020
Contributor Author

> @mattsu2020 Could you also add a benchmark (in separate PR)?

Sure, I’ll keep this PR focused on the fix and open a separate PR adding a benchmark for long-line locale collation.

Development

Successfully merging this pull request may close these issues.

Case where GNU sort is 40 times faster than uutils

2 participants