@a10y (Contributor) commented Jan 12, 2026

Fleshes out the BufferHandle type more.

I hid the inner enum and introduced some extra methods to build/unwrap BufferHandles.

This removes the PrimitiveArray::as_slice method and replaces it with an into_buffer::<T> method. If the handle points to device memory, into_buffer::<T> may allocate a new buffer and copy into it; otherwise it returns a new Buffer that points to the existing host memory.
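
For readers unfamiliar with the pattern, here is a minimal, std-only sketch of an opaque handle with a hidden inner enum and a consuming conversion that copies only when the bytes are not already on the host. It is not the code in this PR: DeviceBuffer, Inner, new_host, new_device, is_on_host, and into_host are all hypothetical names, device memory is simulated with a Vec, and the sketch works on raw bytes rather than a typed into_buffer::<T>.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for memory on an accelerator. A real device buffer
/// would wrap a driver allocation; a Vec simulates it here.
#[derive(Clone, Debug)]
pub struct DeviceBuffer {
    bytes: Vec<u8>,
}

impl DeviceBuffer {
    /// Simulates a device-to-host transfer: allocates and copies.
    fn copy_to_host(&self) -> Arc<[u8]> {
        Arc::from(self.bytes.as_slice())
    }
}

// Private enum: callers cannot match on where the bytes live.
#[derive(Clone, Debug)]
enum Inner {
    Host(Arc<[u8]>),
    Device(DeviceBuffer),
}

/// Opaque handle; construction and unwrapping go through methods only.
#[derive(Clone, Debug)]
pub struct BufferHandle(Inner);

impl BufferHandle {
    pub fn new_host(bytes: Arc<[u8]>) -> Self {
        Self(Inner::Host(bytes))
    }

    pub fn new_device(buf: DeviceBuffer) -> Self {
        Self(Inner::Device(buf))
    }

    pub fn is_on_host(&self) -> bool {
        matches!(self.0, Inner::Host(_))
    }

    /// Host case: hands back the existing allocation without copying.
    /// Device case: allocates a new host buffer and copies into it.
    pub fn into_host(self) -> Arc<[u8]> {
        match self.0 {
            Inner::Host(bytes) => bytes,
            Inner::Device(dev) => dev.copy_to_host(),
        }
    }
}

fn main() {
    // Host-backed handle: the returned buffer is the same allocation.
    let host: Arc<[u8]> = Arc::from(vec![1u8, 2, 3]);
    let out = BufferHandle::new_host(Arc::clone(&host)).into_host();
    assert!(Arc::ptr_eq(&host, &out));

    // Device-backed handle: the contents are copied into a new host buffer.
    let dev = BufferHandle::new_device(DeviceBuffer { bytes: vec![4, 5, 6] });
    assert!(!dev.is_on_host());
    assert_eq!(&*dev.into_host(), &[4u8, 5, 6][..]);
}
```

Taking self by value lets the host case return the existing allocation as-is, so only the device case pays for an allocation and a copy.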

This PR is large, but much of the diff is just adding &'s and updating test code to use the buffer! macro.

@a10y force-pushed the bufferhandles branch 3 times, most recently from 67fabe3 to 5d5797b (January 12, 2026 23:26)
@a10y added the chore label (Release label indicating a trivial change) on Jan 12, 2026
@a10y marked this pull request as ready for review on January 12, 2026 23:32

codspeed-hq bot commented Jan 12, 2026

Merging this PR will degrade performance by 66.29%

❌ 42 regressed benchmarks
✅ 389 untouched benchmarks
⏩ 823 skipped benchmarks [1]

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| decompress[u8, (1000, 256)] | 8.6 µs | 9.7 µs | -11.18% |
| decompress[u16, (1000, 256)] | 9.9 µs | 11 µs | -10.63% |
| decompress_alp[f32, (1000, 0.0, 0.95)] | 9.3 µs | 13.3 µs | -29.78% |
| decompress_alp[f32, (1000, 0.0, 1.0)] | 9.2 µs | 13.1 µs | -30.1% |
| decompress_alp[f32, (1000, 0.01, 0.95)] | 12.5 µs | 17.5 µs | -28.18% |
| decompress_alp[f32, (10000, 0.01, 0.95)] | 32.4 µs | 88.4 µs | -63.39% |
| decompress_alp[f32, (1000, 0.01, 0.25)] | 12.7 µs | 18.1 µs | -29.72% |
| decompress_alp[f32, (1000, 0.01, 1.0)] | 12.4 µs | 17.9 µs | -31.05% |
| decompress_alp[f32, (10000, 0.1, 0.25)] | 35.1 µs | 90.6 µs | -61.27% |
| decompress_alp[f32, (1000, 0.1, 1.0)] | 13.4 µs | 19.1 µs | -30.18% |
| decompress_alp[f32, (10000, 0.0, 0.95)] | 27.9 µs | 82.7 µs | -66.23% |
| decompress_alp[f32, (1000, 0.1, 0.95)] | 14 µs | 18.8 µs | -25.45% |
| decompress_alp[f32, (1000, 0.1, 0.25)] | 12.6 µs | 17.8 µs | -29.27% |
| decompress_alp[f32, (10000, 0.0, 1.0)] | 27.8 µs | 82.4 µs | -66.27% |
| decompress_alp[f32, (10000, 0.0, 0.25)] | 27.9 µs | 82.8 µs | -66.24% |
| decompress_alp[f32, (10000, 0.1, 0.95)] | 42.2 µs | 97.7 µs | -56.83% |
| decompress_alp[f32, (10000, 0.01, 0.25)] | 31.7 µs | 87.8 µs | -63.89% |
| decompress_alp[f64, (10000, 0.01, 0.95)] | 59.3 µs | 167.8 µs | -64.66% |
| decompress_alp[f32, (10000, 0.01, 1.0)] | 32.9 µs | 88.6 µs | -62.83% |
| decompress_alp[f64, (10000, 0.0, 1.0)] | 54.6 µs | 161.9 µs | -66.29% |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing bufferhandles (50da3c2) with develop (2b2ad23)
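
The report does not define the Efficiency column. The displayed values appear consistent with base / head - 1 (small differences are plausibly due to the rounded timings shown), but that is an inference from the numbers above, not a documented formula. Under that assumption, a headline of about -66% corresponds to roughly a 3x increase in run time. The function names below are hypothetical helpers for illustration.

```rust
/// Relative change in percent, assuming the column is base / head - 1.
/// This formula is inferred from the table, not taken from CodSpeed docs.
fn efficiency_pct(base_us: f64, head_us: f64) -> f64 {
    (base_us / head_us - 1.0) * 100.0
}

/// How many times longer HEAD takes than BASE.
fn slowdown_factor(base_us: f64, head_us: f64) -> f64 {
    head_us / base_us
}

fn main() {
    // Timings from the decompress_alp[f64, (10000, 0.0, 1.0)] row.
    let (base, head) = (54.6, 161.9);
    println!("efficiency: {:.2}%", efficiency_pct(base, head));
    println!("slowdown:   {:.2}x", slowdown_factor(base, head));
}
```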

Footnotes

  1. 823 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@joseph-isaacs (Contributor) commented

I wonder if we can remove a lot of these buffer calls from tests?
