Releases · NeoZhangJianyu/llama.cpp

20 Oct 03:28

cda0e4b

b3943 Latest

llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745)

* refactor llama_batch_get_one

* adapt all examples

* fix simple.cpp

* fix llama_bench

* fix

* fix context shifting

* free batch before return

* use common_batch_add, reuse llama_batch in loop

* null terminated seq_id list

* fix save-load-state example

* fix perplexity

* correct token pos in llama_batch_allocr

Assets 22

cudart-llama-bin-win-cu11.7.1-x64.zip, 293 MB, 2024-10-20T03:28:32Z
cudart-llama-bin-win-cu12.2.0-x64.zip, 413 MB, 2024-10-20T03:28:42Z
llama-b1-bin-win-hip-x64-gfx1030.zip, 236 MB, 2024-10-20T03:28:54Z
llama-b1-bin-win-hip-x64-gfx1100.zip, 238 MB, 2024-10-20T03:29:03Z
llama-b1-bin-win-hip-x64-gfx1101.zip, 237 MB, 2024-10-20T03:29:11Z
llama-b3943-bin-macos-arm64.zip, 52.1 MB, 2024-10-20T03:29:18Z
llama-b3943-bin-macos-x64.zip, 53 MB, 2024-10-20T03:29:21Z
llama-b3943-bin-ubuntu-x64.zip, 58.7 MB, 2024-10-20T03:29:23Z
llama-b3943-bin-win-avx-x64.zip, 7.81 MB, 2024-10-20T03:29:25Z
llama-b3943-bin-win-avx2-x64.zip, 7.81 MB, 2024-10-20T03:29:26Z
Source code (zip), 2024-10-18T21:18:01Z
Source code (tar.gz), 2024-10-18T21:18:01Z

18 Oct 14:25

github-actions

b3942

afd9909

rpc : backend refactoring (#9912)

* rpc : refactor backend

Use structs for RPC request/response messages

* rpc : refactor server

Assets 22

28 Sep 12:28

github-actions

b3831

89f9944

Enable use of the ReBAR feature to upload buffers to the device. (#9251)

Assets 22

27 Sep 02:57

github-actions

b3828

95bc82f

[SYCL] add missed dll file in package (#9577)

* update oneapi to 2024.2

* use 2024.1

---------

Co-authored-by: arthw

Assets 22

21 Sep 08:19

github-actions

update_oneapi-b3789-3ae8374

3ae8374

use 2024.1

Assets 19

20 Sep 04:18

github-actions

update_oneapi-b3788-f557ccf

f557ccf

update oneapi to 2024.2

Assets 19

20 Sep 04:05

github-actions

b3787

6026da5

server : clean-up completed tasks from waiting list (#9531)

ggml-ci

Assets 19

12 Sep 03:42

github-actions

b3735

df4b794

cann: Fix error when running a non-existent op (#9424)

Assets 19

07 Sep 07:35

github-actions

b3678

9b2c24c

server : simplify state machine for slot (#9283)

* server : simplify state machine for slot

* add SLOT_STATE_DONE_PROMPT

* pop_deferred_task

* add missing notify_one

* fix passkey test

* metrics : add n_busy_slots_per_decode

* fix test step

* add test

* maybe fix AddressSanitizer?

* fix deque ?

* missing lock

* pop_deferred_task: also notify

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov

---------

Co-authored-by: Georgi Gerganov

Assets 19

24 Jul 02:48

github-actions

b3449

de28008

examples : Fix `llama-export-lora` example (#8607)

* fix export-lora example

* add more logging

* reject merging subset

* better check

* typo

Assets 20
