Releases: EAddario/llama.cpp
b5478
`server`: streaming of tool calls and thoughts when `--jinja` is on (…
b5476
releases : enable openmp in windows cpu backend build (#13756)
b5373
scripts : fix compare-llama-bench.py show parameter (#13514)
b5343
docs : Fix typo in InternVL3 model name (#13440)
b5269
llama : move end-user examples to tools directory (#13249)

* llama : move end-user examples to tools directory

Co-authored-by: Xuan Son Nguyen <[email protected]>
b5215
model : Nomic Embed Text V2 with Mixture-of-Experts (MoE) architectur…
b5200
llama-bench : Add `--override-tensors` arg (#12922)

* Add --override-tensors option to llama-bench
* Correct llama-bench --override-tensors to --override-tensor
* llama-bench: Update --override-tensors parsing to match --tensor-split, appear in test matrix.
* Make new llama-bench util functions static to fix Ubuntu CI
* llama-bench: Correct -ot corner cases (no -ot calls, leading and trailing empty -ot spans, etc.)
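The "empty -ot span" corner cases mentioned above can be illustrated with a small, hypothetical parse helper — this is an illustrative sketch, not llama-bench's actual C++ parsing code, and the function name is invented:

```python
def split_ot_spans(arg: str) -> list[str]:
    # Hypothetical helper: split a comma-separated -ot value into spans,
    # dropping the empty spans produced by leading, trailing, or doubled
    # commas (the corner cases the commit message refers to).
    return [s for s in (p.strip() for p in arg.split(",")) if s]

# Leading and trailing commas yield empty spans that must be ignored:
print(split_ot_spans(",ffn=CPU,,attn=GPU,"))  # ['ffn=CPU', 'attn=GPU']
```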
b5191
llama : fix K-shift with quantized K and BLAS backend (#13113)
b5156
clip : refactor, add `image_manipulation` and `llava_uhd` classes (#1…
b5146
llama : recognize IBM Granite 3.3 FIM tokens (#12988)

Granite's FIM tokens are very similar to Qwen's; they just use an underscore instead of a dash, e.g. <fim_middle> instead of <fim-middle>. Opening tokenizer_config.json in ibm-granite/granite-3.3-8b-base shows:

```
"<fim_prefix>",
"<fim_middle>",
"<fim_suffix>",
"<fim_pad>",
...
"<reponame>",
```
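The dash-vs-underscore difference described above is mechanical, so the Granite spellings can be derived from the dash-style ones with a simple substitution — a minimal illustration, not code from the actual commit:

```python
# Dash-style FIM token spellings (as in the <fim-middle> example above):
dash_style = ["<fim-prefix>", "<fim-middle>", "<fim-suffix>"]

# Granite 3.3 uses the same tokens with underscores instead of dashes:
granite_style = [t.replace("-", "_") for t in dash_style]
print(granite_style)  # ['<fim_prefix>', '<fim_middle>', '<fim_suffix>']
```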