llama2.d


This is the D version of llama2.c by Andrej Karpathy. It runs inference for the Llama 2 model architecture published by Meta.

The initial code was generated by the ctod tool and saved as ctod_initial.d.

Some small manual adjustments were made:

  • added cast(float*) to the calloc and mmap calls
  • replaced clock_gettime, which is unavailable on Darwin, with MonoTime from core.time
  • commented out the OpenMP pragmas
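The first two adjustments can be sketched as follows. This is a minimal illustration of the pattern, not the actual code from this repository:

```d
import core.stdc.stdlib : calloc, free;
import core.time : Duration, MonoTime;

void main()
{
    // calloc returns void*, so D needs an explicit cast to float*
    float* weights = cast(float*) calloc(1024, float.sizeof);
    assert(weights !is null);
    scope(exit) free(weights);

    // MonoTime replaces clock_gettime, which is missing on Darwin
    immutable start = MonoTime.currTime;
    foreach (i; 0 .. 1024)
        weights[i] = i * 0.5f;
    immutable Duration elapsed = MonoTime.currTime - start;

    assert(weights[2] == 1.0f);
    assert(elapsed >= Duration.zero);
}
```

The same cast applies to the pointer returned by mmap when memory-mapping the checkpoint file.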

To build the inference binary:

dub build -b=release

To run an example:

./llama2_d stories15M.bin -i "your_prompt"

Supported platforms

Tested on:

  • macOS (M1)
  • Linux
  • Windows

Todo

  • Make the code more idiomatic
  • Improve performance
  • Add Windows support (port win.h/win.c files from original repo)
  • Parallelize the code with std.parallelism and SIMD

Contributing

Any form of contribution is welcome. Feel free to open an issue or create a pull request. If you are contributing optimizations, please provide benchmarks and/or performance comparisons as well as the code to reproduce them.

Credits