Web-LLM

This is an implementation of https://github.com/karpathy/llama2.c in pure Rust and WebGPU, based on the excellent https://github.com/cryscan/web-rwkv project.

It is currently very slow and inefficient; it is mainly a learning project and a demonstration of capability.

How to use

Export a model using export.py from the https://github.com/karpathy/llama2.c repository. The .pt (checkpoint) files are available at https://huggingface.co/karpathy/tinyllamas.

mkdir -p models/stories15M
python3 export.py --version -1 --dtype fp32 --checkpoint stories15M.pt models/stories15M

Convert the exported Hugging Face pytorch_model.bin to safetensors:

python3 convert_safetensors.py --input models/stories15M/pytorch_model.bin --config models/stories15M/config.json --output models/stories15M/model.safetensors
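
To sanity-check the converted file before running the model, it can help to look at its header. A safetensors file starts with an 8-byte little-endian length, followed by a JSON header describing each tensor (dtype, shape, byte offsets), followed by the raw tensor data. The sketch below parses that header using only the Python standard library. It is a minimal illustration, not part of this repository: the function names and the demo tensor name `demo.weight` are hypothetical, and the demo file is built in memory so the snippet runs without a model on disk.

```python
import json
import struct


def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header at the start of a safetensors byte string."""
    # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
    (header_len,) = struct.unpack("<Q", data[:8])
    header = json.loads(data[8 : 8 + header_len].decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def build_demo_file() -> bytes:
    """Build a tiny in-memory safetensors file holding one 2x2 F32 tensor."""
    payload = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
    header = {
        "demo.weight": {  # hypothetical tensor name, for illustration only
            "dtype": "F32",
            "shape": [2, 2],
            "data_offsets": [0, len(payload)],
        }
    }
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def summarize(data: bytes) -> list:
    """Return (name, dtype, shape) for every tensor, sorted by name."""
    header = read_safetensors_header(data)
    return [(name, info["dtype"], info["shape"]) for name, info in sorted(header.items())]


if __name__ == "__main__":
    for name, dtype, shape in summarize(build_demo_file()):
        print(name, dtype, shape)
```

To inspect the real output, pass the bytes of models/stories15M/model.safetensors to summarize instead of the demo file; the tensor names it lists depend on what convert_safetensors.py writes.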

Run the model:

cargo run --release --example llama models/stories15M/model.safetensors

Credits

Based on https://github.com/cryscan/web-rwkv and uses its design.

seddonm1/web-llm
