sycl: refactor quantization to q8_1 #14815

Alcpz · 2025-07-22T12:25:05Z

The way some mul_mat paths currently perform 8-bit quantization is not very flexible. While exploring a different gemv kernel, I ran into the need for a q8_1 tensor in a slightly different format, which the current convert_src1_to_q8_1 bool could not express.
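For context on what "a slightly different format" can mean, here is a host-only C++ sketch of q8_1 quantization in two layouts. Each block of 32 floats is quantized to int8 with a scale d = amax / 127, and s = d * sum(qs) is stored alongside. The first layout interleaves scales and quants per block, in the spirit of ggml's block_q8_1. The second is a hypothetical "reordered" layout that stores all quants of a row first and all (d, s) pairs after them. The names `BlockQ8_1`, `ReorderedQ8_1` and `quantize_row_reordered` are illustrative, not the ggml-sycl code. For simplicity, d and s are kept as float rather than fp16.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Block size of the q8_1 format: 32 values share one scale.
constexpr int QK8_1 = 32;

// Block-interleaved layout (illustrative; ggml stores d and s as fp16).
struct BlockQ8_1 {
    float  d;             // scale: amax / 127
    float  s;             // d * sum(qs), precomputed for dot-product kernels
    int8_t qs[QK8_1];     // quantized values in [-127, 127]
};

// Quantize one block of QK8_1 floats with symmetric round-to-nearest.
BlockQ8_1 quantize_block_q8_1(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < QK8_1; ++i) amax = std::max(amax, std::fabs(x[i]));
    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f;   // all-zero block -> all-zero quants
    BlockQ8_1 b{};
    int sum = 0;
    for (int i = 0; i < QK8_1; ++i) {
        const int q = static_cast<int>(std::round(x[i] * id));
        b.qs[i] = static_cast<int8_t>(q);
        sum += q;
    }
    b.d = d;
    b.s = d * static_cast<float>(sum);
    return b;
}

// Hypothetical reordered layout: all quants of the row, then d0, s0, d1, s1, ...
struct ReorderedQ8_1 {
    std::vector<int8_t> qs;   // row.size() values
    std::vector<float>  ds;   // 2 * (row.size() / QK8_1) values
};

ReorderedQ8_1 quantize_row_reordered(const std::vector<float> & row) {
    assert(row.size() % QK8_1 == 0);               // whole blocks only
    const size_t nblocks = row.size() / QK8_1;
    ReorderedQ8_1 out;
    out.qs.resize(row.size());
    out.ds.resize(2 * nblocks);
    for (size_t b = 0; b < nblocks; ++b) {
        const BlockQ8_1 blk = quantize_block_q8_1(row.data() + b * QK8_1);
        std::copy(blk.qs, blk.qs + QK8_1, out.qs.begin() + b * QK8_1);
        out.ds[2 * b]     = blk.d;
        out.ds[2 * b + 1] = blk.s;
    }
    return out;
}
```

The numbers are identical in both layouts; only their placement in memory differs. That is why a single boolean flag is not enough once a kernel needs the second layout.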

This PR moves the quantization kernels into a separate header and:

- Unifies kernel submission to use sycl::nd_item<1>
- Rewrites quantize_q8_1 to follow the same structure as the reorder q8_1 kernel
- Adds exception handling, which the original code introduced with SYCLomatic ignored.
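To illustrate the indexing a 1D nd-range submission implies, and the difference between catching launch errors and ignoring them, here is a host-only C++ sketch. `NdItem1`, `launch_1d` and `quantize_q8_1_1d` are hypothetical stand-ins, not SYCL or ggml-sycl APIs. As a simplification, each work-item handles one 32-value block. A real kernel would more likely spread a block across a sub-group and reduce amax and the sum with group algorithms. The launcher throws when the global size is not a multiple of the work-group size, mirroring the divisibility requirement SYCL places on nd_range launches. The caller catches that exception and reports it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <vector>

constexpr int QK8_1 = 32;

// Hypothetical host stand-in for sycl::nd_item<1>: only the query used below.
struct NdItem1 {
    size_t group, local_id, local_range;
    size_t get_global_id() const { return group * local_range + local_id; }
};

// Hypothetical host stand-in for parallel_for over a 1D nd_range.
// Rejects a global size that is not a multiple of the local size.
void launch_1d(size_t global, size_t local, const std::function<void(const NdItem1 &)> & kernel) {
    if (local == 0 || global % local != 0) {
        throw std::invalid_argument("global range must be a multiple of local range");
    }
    for (size_t g = 0; g < global / local; ++g)
        for (size_t l = 0; l < local; ++l)
            kernel(NdItem1{g, l, local});
}

// Quantize x into quants (qs) followed by (d, s) pairs (ds), one work-item per block.
// Returns false, after reporting the error, if the launch fails.
bool quantize_q8_1_1d(const std::vector<float> & x, std::vector<int8_t> & qs,
                      std::vector<float> & ds, size_t wg_size) {
    const size_t nblocks = x.size() / QK8_1;
    qs.assign(x.size(), 0);
    ds.assign(2 * nblocks, 0.0f);
    try {
        launch_1d(nblocks, wg_size, [&](const NdItem1 & it) {
            const size_t b = it.get_global_id();   // block owned by this work-item
            const float * xb = x.data() + b * QK8_1;
            float amax = 0.0f;
            for (int i = 0; i < QK8_1; ++i) amax = std::max(amax, std::fabs(xb[i]));
            const float d  = amax / 127.0f;
            const float id = d != 0.0f ? 1.0f / d : 0.0f;
            int sum = 0;
            for (int i = 0; i < QK8_1; ++i) {
                const int q = static_cast<int>(std::round(xb[i] * id));
                qs[b * QK8_1 + i] = static_cast<int8_t>(q);
                sum += q;
            }
            ds[2 * b]     = d;
            ds[2 * b + 1] = d * static_cast<float>(sum);
        });
    } catch (const std::exception & e) {
        // Surface the failure instead of silently continuing with bad output.
        std::fprintf(stderr, "quantize_q8_1_1d failed: %s\n", e.what());
        return false;
    }
    return true;
}
```

Production kernels usually round the global size up to a multiple of the work-group size and skip out-of-range work-items rather than failing. The throw is kept here only to exercise the error path.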

Performance is unaffected.

Pinging @AD2605 as author of the reorder q8_1 kernel.

sycl: quantization to q8_1 refactor

eda44a4

Alcpz requested review from s-Nick and Rbiessy July 22, 2025 12:25

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (GPU programming language, https://en.wikipedia.org/wiki/SYCL) labels Jul 22, 2025
