
Releases: ngxson/wllama

2.3.4

25 Jul 10:15
c267097

What's Changed

A small fix for KV cache management, which caused some issues with hybrid and recurrent models:

  • if KV rm fails, we should clear the whole cache by @ngxson in #188 (see the sketch below)
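
For illustration, the fallback pattern looks roughly like this. This is a minimal TypeScript sketch with hypothetical function names (the real fix lives in wllama's C++ glue code, not in the JS API):

// Hypothetical sketch: if removing a token range from the KV cache fails
// (hybrid and recurrent models may not support partial removal), clear the
// whole cache so its state never becomes inconsistent with the sequence.
function removeTokensFromCache(
  seqRemove: (from: number, to: number) => boolean, // returns false on failure
  clearAll: () => void,
  from: number,
  to: number,
): void {
  if (!seqRemove(from, to)) {
    clearAll(); // partial removal unsupported: drop everything and re-process
  }
}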

Full Changelog: 2.3.3...2.3.4

2.3.3

14 Jul 17:31
3262947

What's Changed

With the latest sync from llama.cpp, new models are now supported, including Hugging Face SmolLM3 and LiquidAI LFM2.
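
Loading one of these models works like any other GGUF. Here is a minimal sketch; the model URL is a placeholder, so substitute any real SmolLM3 or LFM2 GGUF build:

import { Wllama } from '@wllama/wllama';
// WasmFromCDN maps the WASM binaries to CDN URLs (see the README for local hosting)
import WasmFromCDN from '@wllama/wllama/esm/wasm-from-cdn.js';

const wllama = new Wllama(WasmFromCDN);
// placeholder URL: point this at a real SmolLM3 or LFM2 GGUF file
await wllama.loadModelFromUrl(
  'https://huggingface.co/your-repo/SmolLM3-GGUF/resolve/main/model-q4_k_m.gguf'
);
const output = await wllama.createCompletion('Hello!', { nPredict: 32 });
console.log(output);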

Full Changelog: 2.3.2...2.3.3

2.3.2

06 Jun 21:18
367be2f

News

Important

🚀 This release marks a special event:

Firefox now officially uses wllama as one of the inference engines in its Link Preview feature!

The Link Preview feature is currently available in the Beta and Nightly builds. You can find the upstream code here.

Read more in this blog post: https://blog.mozilla.org/en/mozilla/ai/ai-tech/ai-link-previews-firefox/


What's Changed

  • v2.3.2 (sync with upstream llama.cpp) by @ngxson in #179

Full Changelog: 2.3.1...2.3.2

2.3.1

18 Apr 08:24
e4bd5e7

What's Changed

  • sync with upstream llama.cpp source code by @ngxson in #171

Full Changelog: 2.3.0...2.3.1

2.3.0

13 Mar 14:35

What's Changed

You can now use the stream: true option to get an AsyncIterator:

import { WllamaChatMessage } from '@wllama/wllama';

// assumes `wllama` is an initialized Wllama instance with a model loaded
const messages: WllamaChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hi!' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'How are you?' },
];
const stream = await wllama.createChatCompletion(messages, {
  nPredict: 10,
  sampling: {
    temp: 0.0,
  },
  stream: true, // ADD THIS
});

for await (const chunk of stream) {
  console.log(chunk.currentText);
}

You can also use an AbortSignal to stop a generation midway, much like in the fetch API. Here is an example:

const abortController = new AbortController();
const stream = await wllama.createChatCompletion(messages, {
  abortSignal: abortController.signal, // ADD THIS
  stream: true,
});

// Call abortController.abort() to stop generation.
// Note: this can also be called during prompt processing.

Gemma 3 support: With the up-to-date llama.cpp source code, you can now use Gemma 3 models!


  • build single-file mjs + minified version by @ngxson in #161
  • bump to latest upstream llama.cpp source code by @ngxson in #162
  • add support for async generator by @ngxson in #163
  • add "stream" option for AsyncIterator by @ngxson in #164
  • add test for abortSignal by @ngxson in #165
  • bump to latest upstream llama.cpp source code by @ngxson in #166

Full Changelog: 2.2.1...2.3.0

2.2.1

01 Mar 20:06

What's Changed

Full Changelog: 2.2.0...2.2.1

2.2.0

08 Feb 23:21
d72123c

v2.2.0 - x2 speed for Qx_K and Qx_0 quantization

A BIG release has dropped! The biggest changes include:

  • x2 speed for Qx_K and Qx_0 quantization 🚀 see this PR: ggml-org/llama.cpp#11453 (it's not merged upstream yet, so I included it in wllama as a patch). IQx quants will still be slow, but work on them is already planned
  • Switched to a binary protocol for the connection between JS <==> WASM. The json.hpp dependency is now gone! Calling wllama.tokenize() on a long text is now faster than ever! 🎉 (quick example below)
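
As a quick illustration of the call that benefits (a sketch, assuming `wllama` is an initialized instance with a model loaded):

// tokenize() sends the text through the new binary protocol and
// resolves to the model's token IDs
const text = 'The quick brown fox jumps over the lazy dog. '.repeat(100);
const tokens = await wllama.tokenize(text);
console.log(tokens.length);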

Debut at FOSDEM 2025

Last week, I gave a 15-minute talk at FOSDEM 2025 which, for the first time, introduced wllama to the real world!

Watch the talk here: https://fosdem.org/2025/schedule/event/fosdem-2025-5154-wllama-bringing-llama-cpp-to-the-web/


What's Changed

  • add benchmark function, used internally by @ngxson in #151
  • switch to binary protocol between JS and WASM world (glue.cpp) by @ngxson in #154
  • Remove json.hpp dependency by @ngxson in #155
  • temporary apply that viral x2 speedup PR by @ngxson in #156
  • Fix a bug with kv_remove, release v2.2.0 by @ngxson in #157

Full Changelog: 2.1.3...2.2.0

2.1.4

30 Jan 16:34
e05af9e

Nothing new, but I messed up the version number, so I had to push a new release to fix it.

2.1.3

22 Jan 14:39
e05af9e

What's Changed

  • Sync with upstream source code, add demo for DeepSeek-R1 by @ngxson in #150

Try it via the demo app: https://huggingface.co/spaces/ngxson/wllama


Full Changelog: 2.1.2...2.1.3

2.1.2

12 Jan 14:11
30adc2a

What's Changed

  • sync with upstream llama.cpp source code by @ngxson in #147

Full Changelog: 2.1.1...2.1.2