Merged
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,22 @@
# Changelog

## v0.8.28

### Changed

- **llama.cpp submodule**: Updated from 845282461 to dec5ca557 (24 commits, tag b9763). No NIF changes were required. `ggml/include/ggml.h`, `ggml/include/ggml-backend.h`, `common/chat.h`, `common/json-schema-to-grammar.h`, `common/sampling.h`, `common/speculative.h`, and `common/common.h` are all unchanged. The only touched header the binding compiles against is `include/llama.h`, and the sole functional change there is one new accessor, `llama_model_n_layer_nextn` (the number of next-token / MTP prediction layers), added alongside the existing `llama_model_n_layer`; the rest of the diff is whitespace realignment. The binding does not call the new function. Notably, this range refactors the grammar generators that the binding's `json_schema_to_grammar_nif` links against (the `common/peg` AC parser and the JSON-schema-to-GBNF spacing rules listed below) even though `common/json-schema-to-grammar.h` itself is unchanged, so both the JSON-schema and raw-GBNF smoke paths were exercised against the freshly built NIF. The full test suite passes (158 tests + 4 skipped), all 7 end-to-end smoke tests pass (covering generation, streaming, chat templates, JSON-schema grammar, raw GBNF, and embeddings, with the embedding paths exercised using a Qwen3-Embedding model), formatting is clean, and Dialyzer reports 0 errors.
- **model/quantization**: use `LLM_KV` for `quantization_version` & `file_type` (#24802).
- **common/grammar**: implement an AC parser for stricter grammar generation (#24869); refactor the until→GBNF grammar generation (#24839); align `json-schema-to-grammar` spacing rules with the parsers (#24835) — these underlie the binding's structured-output / GBNF path.
- **sampling**: remove the unconditional softmax+sort in the top-n-sigma sampler (#22645).
- **jinja** (template engine used by chat templates): implement the `call` statement (#24847).
- **MTP/speculative**: support Step3.5/3.7 flash mtp3 (#24340).
- **mtmd**: fix `mtmd_get_memory_usage` (#24867); add a load-progress callback (#24865).
- **server**: add an `id` to tool-call responses API (#24882); move model downloading to a dedicated process in the router (#24834); refactor/generalize the input-file schema (#24299); fix an `edit_file` crash on append at end of file (`line_start` -1) (#24893); report progress for loading spec models and add a "stages" list (#24870); refactor batch construction (#24843); add a "verbose" field to the schema (#24864); real-time model load-progress tracking via `/models/sse` (#24828).
- **SYCL**: support bf16 on the `bin_bcast` op and unary ops (#24838).
- **hexagon**: use a padded stride for ssm-conv weights (#24470).
- **webui**: prioritize favorite models in model selection (#24766); show model status and load progress via the `/models/sse` feed (#24878).
- **common/cli/build/docs**: stabilize the randomly-failing `test-args-parser` (#24826); add the `libandroid-spawn` dependency for Termux builds (#21812); whitespace clean-up (#24862).

## v0.8.27

### Changed
2 changes: 1 addition & 1 deletion mix.exs
@@ -37,7 +37,7 @@ end
defmodule LlamaCppEx.MixProject do
use Mix.Project

- @version "0.8.27"
+ @version "0.8.28"
@source_url "https://github.com/nyo16/llama_cpp_ex"

def project do
2 changes: 1 addition & 1 deletion vendor/llama.cpp
Submodule llama.cpp updated 64 files
+13 −7 common/arg.cpp
+5 −1 common/arg.h
+5 −4 common/chat-auto-parser-generator.cpp
+90 −47 common/jinja/runtime.cpp
+1 −0 common/jinja/runtime.h
+23 −23 common/json-schema-to-grammar.cpp
+194 −81 common/peg-parser.cpp
+16 −3 common/peg-parser.h
+102 −35 common/speculative.cpp
+1 −1 docs/android.md
+21 −21 examples/json_schema_to_grammar.py
+10 −9 ggml/src/ggml-hexagon/htp/ssm-conv.c
+5 −0 ggml/src/ggml-sycl/binbcast.cpp
+155 −53 ggml/src/ggml-sycl/element_wise.cpp
+9 −8 include/llama.h
+8 −0 src/llama-context.cpp
+1 −0 src/llama-context.h
+2 −0 src/llama-cparams.h
+5 −0 src/llama-ext.h
+9 −2 src/llama-graph.h
+4 −0 src/llama-model.cpp
+2 −2 src/llama-quant.cpp
+0 −2 src/llama-sampler.cpp
+28 −29 src/models/step35.cpp
+148 −1 tests/peg-parser/test-gbnf-generation.cpp
+11 −1 tests/test-arg-parser.cpp
+2 −2 tests/test-chat.cpp
+26 −0 tests/test-jinja.cpp
+155 −155 tests/test-json-schema-to-grammar.cpp
+2 −2 tests/test-sampling.cpp
+59 −24 tools/mtmd/clip.cpp
+2 −0 tools/mtmd/clip.h
+8 −1 tools/mtmd/mtmd.cpp
+8 −0 tools/mtmd/mtmd.h
+3 −3 tools/server/README-dev.md
+42 −7 tools/server/README.md
+37 −25 tools/server/server-common.cpp
+637 −320 tools/server/server-context.cpp
+3 −1 tools/server/server-context.h
+245 −131 tools/server/server-models.cpp
+22 −6 tools/server/server-models.h
+3 −0 tools/server/server-schema.cpp
+6 −3 tools/server/server-task.cpp
+10 −5 tools/server/server-tools.cpp
+12 −1 tools/server/server.cpp
+20 −0 tools/server/tests/unit/test_chat_completion.py
+28 −7 tools/server/tests/unit/test_router.py
+10 −0 tools/ui/src/app.d.ts
+12 −3 tools/ui/src/lib/components/app/chat/ChatMessages/ChatMessage/ChatMessageAssistant/ChatMessageAssistant.svelte
+18 −1 tools/ui/src/lib/components/app/models/ModelsSelectorOption.svelte
+2 −1 tools/ui/src/lib/constants/api-endpoints.ts
+2 −0 tools/ui/src/lib/constants/index.ts
+14 −0 tools/ui/src/lib/constants/model-loading.ts
+16 −0 tools/ui/src/lib/constants/sse.ts
+1 −1 tools/ui/src/lib/enums/index.ts
+14 −0 tools/ui/src/lib/enums/server.enums.ts
+9 −7 tools/ui/src/lib/services/chat.service.ts
+245 −39 tools/ui/src/lib/stores/models.svelte.ts
+47 −1 tools/ui/src/lib/types/api.d.ts
+10 −1 tools/ui/src/lib/types/index.ts
+12 −1 tools/ui/src/lib/types/models.d.ts
+3 −0 tools/ui/src/lib/utils/index.ts
+43 −0 tools/ui/src/lib/utils/progress.ts
+14 −0 tools/ui/src/routes/+layout.svelte