sycl: refactor quantization to q8_1 #14815

Open: wants to merge 1 commit into master

@Alcpz (Collaborator) commented Jul 22, 2025

The way some mul_mats currently do 8-bit quantization is not very flexible. While exploring other possibilities for a different gemv kernel, I ran into the need for a q8_1 tensor in a slightly different format, which the current convert_src1_to_q8_1 bool does not support.

This PR moves the quantization kernels into a separate header and:

  • Unifies kernel submission to use sycl::nd_item<1>
  • Rewrites quantize_q8_1 to have the same structure as the reorder q8_1 kernel
  • Adds exception handling, which was ignored in the original code introduced with SYCLomatic
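
For readers unfamiliar with the format, here is a minimal host-side sketch of q8_1 block quantization in plain C++. It is not the SYCL kernel from this PR. The names `BlockQ8_1Ref` and `quantize_q8_1_ref` are hypothetical, the scale and sum are stored as `float` rather than half precision, and the stored-sum convention (scale times the sum of the quantized values) is an assumption that may differ from what the backend kernels write.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Block size assumed for this sketch: 32 values per q8_1 block.
constexpr int QK8_1 = 32;

// Hypothetical reference layout: one scale, one scaled sum, 32 int8 quants.
// Real implementations typically pack d and s as half-precision floats.
struct BlockQ8_1Ref {
    float  d;          // per-block scale: max(|x|) / 127
    float  s;          // d * sum(qs), assumed convention
    int8_t qs[QK8_1];  // quantized values in [-127, 127]
};

// Quantizes x in blocks of QK8_1 values; trailing values that do not fill
// a complete block are ignored in this sketch.
inline std::vector<BlockQ8_1Ref> quantize_q8_1_ref(const std::vector<float>& x) {
    std::vector<BlockQ8_1Ref> out(x.size() / QK8_1);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* src = x.data() + b * QK8_1;

        // Largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (int i = 0; i < QK8_1; ++i) {
            amax = std::fmax(amax, std::fabs(src[i]));
        }
        const float d  = amax / 127.0f;
        const float id = (d != 0.0f) ? 1.0f / d : 0.0f;  // avoid division by zero

        // Round each value to the nearest int8 and accumulate the integer sum.
        int sum = 0;
        for (int i = 0; i < QK8_1; ++i) {
            const int q = static_cast<int>(std::lround(src[i] * id));
            out[b].qs[i] = static_cast<int8_t>(q);
            sum += q;
        }
        out[b].d = d;
        out[b].s = d * static_cast<float>(sum);
    }
    return out;
}
```

A GPU kernel would typically assign one block (or a few elements of a block) to each work-item and reduce the maximum and the sum across the sub-group; the serial loops above only illustrate the arithmetic.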

Performance is unaffected.

Pinging @AD2605 as author of the reorder q8_1 kernel.

@Alcpz requested review from s-Nick and Rbiessy July 22, 2025 12:25
@github-actions bot added the ggml and SYCL labels Jul 22, 2025