
Releases: ngxson/wllama

2.3.4

25 Jul 10:15
c267097

What's Changed

A small fix for KV cache management, which caused some issues with hybrid and recurrent models:

  • if KV rm fails, we should clear the whole cache by @ngxson in #188 (see the sketch below)
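
For illustration, the fallback pattern looks roughly like this. This is a minimal TypeScript sketch with hypothetical function names (the real fix lives in wllama's C++ glue code, not in the JS API):

// Hypothetical sketch: if removing a token range from the KV cache fails
// (hybrid and recurrent models may not support partial removal), clear the
// whole cache so its state never becomes inconsistent with the sequence.
function removeTokensFromCache(
  seqRemove: (from: number, to: number) => boolean, // returns false on failure
  clearAll: () => void,
  from: number,
  to: number,
): void {
  if (!seqRemove(from, to)) {
    clearAll(); // partial removal unsupported: drop everything and re-process
  }
}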

Full Changelog: 2.3.3...2.3.4

2.3.3

14 Jul 17:31
3262947

What's Changed

With the latest sync from llama.cpp, new models are now supported, including Hugging Face SmolLM3 and LiquidAI LFM2.
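
Loading one of these models works like any other GGUF. Here is a minimal sketch; the model URL is a placeholder, so substitute any real SmolLM3 or LFM2 GGUF build:

import { Wllama } from '@wllama/wllama';
// WasmFromCDN maps the WASM binaries to CDN URLs (see the README for local hosting)
import WasmFromCDN from '@wllama/wllama/esm/wasm-from-cdn.js';

const wllama = new Wllama(WasmFromCDN);
// placeholder URL: point this at a real SmolLM3 or LFM2 GGUF file
await wllama.loadModelFromUrl(
  'https://huggingface.co/your-repo/SmolLM3-GGUF/resolve/main/model-q4_k_m.gguf'
);
const output = await wllama.createCompletion('Hello!', { nPredict: 32 });
console.log(output);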

Full Changelog: 2.3.2...2.3.3

2.3.2

06 Jun 21:18
367be2f

News

Important

🚀 This release marks a special event:

Firefox now officially uses wllama as one of the inference engines in its Link Preview feature!

The Link Preview feature is currently available in the Beta and Nightly builds. You can find the upstream code here.

Read more in this blog post: https://blog.mozilla.org/en/mozilla/ai/ai-tech/ai-link-previews-firefox/


What's Changed

  • v2.3.2 (sync with upstream llama.cpp) by @ngxson in #179

Full Changelog: 2.3.1...2.3.2

2.3.1

18 Apr 08:24
e4bd5e7

What's Changed

  • sync with upstream llama.cpp source code by @ngxson in #171

Full Changelog: 2.3.0...2.3.1

2.3.0

13 Mar 14:35

What's Changed

You can now use the stream: true option to get an AsyncIterator:

import { WllamaChatMessage } from '@wllama/wllama';

// assumes `wllama` is an initialized Wllama instance with a model loaded
const messages: WllamaChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'Hi!' },
  { role: 'assistant', content: 'Hello!' },
  { role: 'user', content: 'How are you?' },
];
const stream = await wllama.createChatCompletion(messages, {
  nPredict: 10,
  sampling: {
    temp: 0.0,
  },
  stream: true, // ADD THIS
});

for await (const chunk of stream) {
  console.log(chunk.currentText);
}

You can also use an AbortSignal to stop a generation midway, much like in the fetch API. Here is an example:

const abortController = new AbortController();
const stream = await wllama.createChatCompletion(messages, {
  abortSignal: abortController.signal, // ADD THIS
  stream: true,
});

// Call abortController.abort() to stop generation.
// Note: this can also be called during prompt processing.

Gemma 3 support: With the up-to-date llama.cpp source code, you can now use Gemma 3 models!


  • build single-file mjs + minified version by @ngxson in #161
  • bump to latest upstream llama.cpp source code by @ngxson in #162
  • add support for async generator by @ngxson in #163
  • add "stream" option for AsyncIterator by @ngxson in #164
  • add test for abortSignal by @ngxson in #165
  • bump to latest upstream llama.cpp source code by @ngxson in #166

Full Changelog: 2.2.1...2.3.0

2.2.1

01 Mar 20:06

What's Changed

Full Changelog: 2.2.0...2.2.1

2.2.0

08 Feb 23:21
d72123c

v2.2.0 - x2 speed for Qx_K and Qx_0 quantization

A BIG release has dropped! The biggest changes include:

  • x2 speed for Qx_K and Qx_0 quantization 🚀 see this PR: ggml-org/llama.cpp#11453 (it's not merged upstream yet, so I included it in wllama as a patch). IQx quants will still be slow, but work on them is already planned
  • Switched to a binary protocol for the connection between JS <==> WASM. The json.hpp dependency is now gone! Calling wllama.tokenize() on a long text is now faster than ever! 🎉 (quick example below)
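
As a quick illustration of the call that benefits (a sketch, assuming `wllama` is an initialized instance with a model loaded):

// tokenize() sends the text through the new binary protocol and
// resolves to the model's token IDs
const text = 'The quick brown fox jumps over the lazy dog. '.repeat(100);
const tokens = await wllama.tokenize(text);
console.log(tokens.length);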

Debut at FOSDEM 2025

Last week, I gave a 15-minute talk at FOSDEM 2025 which, for the first time, introduced wllama to the real world!

Watch the talk here: https://fosdem.org/2025/schedule/event/fosdem-2025-5154-wllama-bringing-llama-cpp-to-the-web/


What's Changed

  • add benchmark function, used internally by @ngxson in #151
  • switch to binary protocol between JS and WASM world (glue.cpp) by @ngxson in #154
  • Remove json.hpp dependency by @ngxson in #155
  • temporary apply that viral x2 speedup PR by @ngxson in #156
  • Fix a bug with kv_remove, release v2.2.0 by @ngxson in #157

Full Changelog: 2.1.3...2.2.0

2.1.4

30 Jan 16:34
e05af9e

Nothing new, but I messed up the version number, so I had to push a new release to fix it.

2.1.3

22 Jan 14:39
e05af9e

What's Changed

  • Sync with upstream source code, add demo for DeepSeek-R1 by @ngxson in #150

Try it via the demo app: https://huggingface.co/spaces/ngxson/wllama


Full Changelog: 2.1.2...2.1.3

2.1.2

12 Jan 14:11
30adc2a

What's Changed

  • sync with upstream llama.cpp source code by @ngxson in #147

Full Changelog: 2.1.1...2.1.2