convert: (demo) repacking compressed_tensor format of kimi-k2 #17083

ngxson · 2025-11-07T15:36:23Z

This PR is a demo. It will definitely break models other than kimi-k2.

IMPORTANT: This requires deleting the "quantization_config" section in config.json; you can also rename it:
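For illustration only, here is a minimal sketch of the rename step using nothing but the Python standard library. The helper name `hide_quantization_config` and the replacement key `quantization_config_disabled` are hypothetical and not taken from this PR; any key name the converter does not look for should work equally well.

```python
import json
from pathlib import Path


def hide_quantization_config(config_path, backup_key="quantization_config_disabled"):
    """Rename the "quantization_config" key in a config.json file.

    backup_key is a hypothetical name chosen for this sketch.
    Returns True if the file was modified, False if the key was absent.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if "quantization_config" not in config:
        return False
    # Keep the original content under a different key instead of deleting it.
    config[backup_key] = config.pop("quantization_config")
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Renaming rather than deleting keeps the original quantization metadata in the file, so the change is easy to revert.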

How it works: we map int4 --> GGML's Q4_0. The original scales are BF16 and are converted to F16, since Q4_0 only supports F16 scales.

TODO: correct the nibble layout; it seems to be in reversed order.
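To make the target layout concrete, the sketch below packs one group of 32 signed int4 values into a single Q4_0 block in pure Python. It is not the code from this PR. It assumes the source weights have already been unpacked to signed integers in [-8, 7], that the quantization group size equals the Q4_0 block size of 32, and that the scale is available as a Python float. The function names are hypothetical. The GGML side is: an F16 scale followed by 16 bytes, where byte `j` holds element `j` in its low nibble and element `j + 16` in its high nibble, each stored with an offset of 8.

```python
import struct

QK4_0 = 32  # number of weights per Q4_0 block


def pack_q4_0_block(values, scale):
    """Pack 32 signed int4 values and one scale into an 18-byte Q4_0 block."""
    if len(values) != QK4_0:
        raise ValueError("a Q4_0 block holds exactly 32 values")
    if any(not -8 <= v <= 7 for v in values):
        raise ValueError("values must be signed int4 in [-8, 7]")
    half = QK4_0 // 2
    # Low nibble: element j, high nibble: element j + 16, both offset by 8.
    qs = bytes(
        (values[j] + 8) | ((values[j + half] + 8) << 4) for j in range(half)
    )
    # "<e" is little-endian IEEE half precision; struct raises OverflowError
    # if the scale is too large to be represented as F16.
    return struct.pack("<e", scale) + qs


def unpack_q4_0_block(block):
    """Dequantize one 18-byte Q4_0 block back to 32 floats."""
    (d,) = struct.unpack("<e", block[:2])
    lo = [(b & 0x0F) - 8 for b in block[2:]]
    hi = [(b >> 4) - 8 for b in block[2:]]
    return [d * q for q in lo + hi]
```

Because BF16 has a wider exponent range than F16, a scale that is valid in BF16 may overflow or lose precision when stored as F16; a real converter would need to decide how to handle that case. A round trip through `pack_q4_0_block` and `unpack_q4_0_block` is a quick way to check whether a given nibble order matches what GGML expects.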

ngxson added 6 commits November 6, 2025 22:30

convert: add dequant function for compressed_tensor (kimi-k2-thinking)

1bd57a3

rm redundant code

ab0b550

fix lazy loading

ed7b7c7

fix device error

489a7b8

DEMO repack

caf0e42

fix number of blocks

505f8be

github-actions bot added the python (python script changes) label Nov 7, 2025
