Releases: arthw/llama.cpp
b3555
b3554
ggml-backend : fix async copy from CPU (#8897)
* ggml-backend : fix async copy from CPU
* cuda : more reliable async copy, fix stream used when the devices are the same
b3517
[SYCL] Fixing wrong VDR iq4nl value (#8812)
b3482
Merge pull request #2 from arthw/refactor_dev
Refactor device management and usage API
b3475
llama : add support for llama 3.1 rope scaling factors (#8676)
* Add llama 3.1 rope scaling factors to llama conversion and inference
  This commit generates the rope factors on conversion and adds them to the resulting model as a tensor. At inference time, these factors are passed to the `ggml_rope_ext` rope operation, improving results for context windows above 8192.
* Update convert_hf_to_gguf.py
* address comments
* address comments
* Update src/llama.cpp
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <[email protected]>
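For context, a minimal sketch (not the actual llama.cpp call site) of how per-dimension rope factors can be fed to `ggml_rope_ext` through its optional third tensor argument; the wrapper name and the ext/attn/beta defaults are illustrative assumptions, not code from this release:

```c
// Minimal sketch, assuming the ggml_rope_ext signature from this era of
// the tree. With rope_factors == NULL this reduces to plain rope; when
// set, each rotary frequency is divided by its corresponding factor,
// which is how the llama 3.1 long-context scaling takes effect.
#include "ggml.h"

struct ggml_tensor * rope_with_factors(
        struct ggml_context * ctx,
        struct ggml_tensor  * cur,          // activations to rotate
        struct ggml_tensor  * inp_pos,      // token positions (GGML_TYPE_I32)
        struct ggml_tensor  * rope_factors, // 1D factors tensor, or NULL
        int n_rot, int mode, int n_ctx_orig,
        float freq_base, float freq_scale) {
    // ext_factor/attn_factor/beta_* below are placeholder defaults.
    return ggml_rope_ext(ctx, cur, inp_pos, rope_factors,
                         n_rot, mode, n_ctx_orig,
                         freq_base, freq_scale,
                         /*ext_factor*/  0.0f,
                         /*attn_factor*/ 1.0f,
                         /*beta_fast*/  32.0f,
                         /*beta_slow*/   1.0f);
}
```

On the conversion side, the commit message above indicates the factors are emitted into the GGUF as a regular tensor, so at load time they arrive like any other weight and can be handed to the call shown here.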
b3388
fix unit test (UT) of concat
b3387
move softmax to a separate file
b3313
fix for multiple cards
b3312
Merge pull request #1 from arthw/update_warp
[SYCL] Fix WARP_SIZE=16 bug of Intel GPU (#8266)
cherry-pick b549a1bbefb2f1fbb8b558bac1f2ae7967e60964
b3309
py : switch to snake_case (#8305)
* py : switch to snake_case
* cont
* cont
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
  Needed for scripts/check-requirements.sh
Co-authored-by: Francis Couture-Harpin <[email protected]>