This is a one-file Rust implementation of Llama2 that works pretty well. It is a Rust port of Andrej Karpathy's llama2.c.
To build:
> cargo build --release
To run (first follow the llama2.c instructions to get llama2_7b.bin):
> target/release/llama2_rs llama2_7b.bin 0.0 11 "The only thing"
The only thing that is certain in life is change.
achieved tok/s: 1.0298662
It actually seems pretty fast! For comparison, this is the speed and output of the original llama2.c on my computer:
> ./run llama2_7b.bin 0.0 11 "The only thing"
The only thing that is certain in life is change.
achieved tok/s: 0.139889
This is basically a port of the original code, with extra type information to make it easier to extend.
There are two dependencies:
- memmap2 for memory mapping
- rayon for parallel computation
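These two crates do the heavy lifting in the same places llama2.c does it: memmap2 maps the checkpoint file so the weights are paged in lazily rather than copied into RAM up front, and rayon parallelizes the matrix-vector products across output rows (the role OpenMP plays in the C version). The snippet below is a minimal sketch of that pattern, not the exact code in this repo; `map_checkpoint` and `matmul` are hypothetical names.

```rust
use std::fs::File;
use memmap2::Mmap;
use rayon::prelude::*;

// Hypothetical helper: memory-map the checkpoint so the weights are
// paged in on demand instead of being read into memory up front.
fn map_checkpoint(path: &str) -> std::io::Result<Mmap> {
    let file = File::open(path)?;
    // Safety: assumes the file is not modified while it is mapped.
    unsafe { Mmap::map(&file) }
}

// Hypothetical matmul: xout = W * x, with W stored row-major as (d x n).
// rayon parallelizes over output rows.
fn matmul(xout: &mut [f32], x: &[f32], w: &[f32], n: usize) {
    xout.par_iter_mut().enumerate().for_each(|(i, out)| {
        let row = &w[i * n..(i + 1) * n];
        *out = row.iter().zip(x).map(|(&wi, &xi)| wi * xi).sum();
    });
}
```

The weight slices passed to such a matmul can point straight into the mapped region (after reinterpreting the bytes as f32), which is the same zero-copy trick llama2.c uses.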
Todo:
- Generic over floating point size (one possible direction is sketched below)
- Faster matrix multiplications
- More safety, remove some of the C hacks.
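On the first Todo item, one possible direction (purely a sketch; num-traits is not currently a dependency and the function name is hypothetical) is to bound the kernels on a float trait so the same code covers f32 and f64:

```rust
use num_traits::Float;

// Hypothetical sketch: the same matmul, generic over the float type.
fn matmul<T: Float>(xout: &mut [T], x: &[T], w: &[T], n: usize) {
    for (i, out) in xout.iter_mut().enumerate() {
        let row = &w[i * n..(i + 1) * n];
        *out = row
            .iter()
            .zip(x)
            .fold(T::zero(), |acc, (&wi, &xi)| acc + wi * xi);
    }
}
```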
Mostly this was an exercise in learning some Rust; I was curious how to port over things like memory mapping, parallel processing, and some of the mathematical tricks.
This is my first Rust project, so if you are an expert I would love a code review!