llama2.d


This is the D version of llama2.c by Andrej Karpathy. It runs inference for the Llama 2 model architecture published by Meta.

The initial code was generated by the ctod tool and saved as ctod_initial.d.

Some small manual adjustments were made:

  • added cast(float*) to the calloc and mmap calls
  • replaced clock_gettime, which is unavailable on Darwin, with MonoTime from core.time
  • commented out the OpenMP pragmas
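The first two adjustments can be sketched as follows. This is a minimal illustration of the pattern, not the actual code from this repository:

```d
import core.stdc.stdlib : calloc, free;
import core.time : Duration, MonoTime;

void main()
{
    // calloc returns void*, so D needs an explicit cast to float*
    float* weights = cast(float*) calloc(1024, float.sizeof);
    assert(weights !is null);
    scope(exit) free(weights);

    // MonoTime replaces clock_gettime, which is missing on Darwin
    immutable start = MonoTime.currTime;
    foreach (i; 0 .. 1024)
        weights[i] = i * 0.5f;
    immutable Duration elapsed = MonoTime.currTime - start;

    assert(weights[2] == 1.0f);
    assert(elapsed >= Duration.zero);
}
```

The same cast applies to the pointer returned by mmap when memory-mapping the checkpoint file.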

To build the inference binary:

dub build -b=release

To run an example:

./llama2_d stories15M.bin -i "your_prompt"

Supported platforms

Tested on:

  • macOS (M1)
  • Linux
  • Windows

Todo

  • Make the code more idiomatic
  • Improve performance
  • Add Windows support (port win.h/win.c files from original repo)
  • Parallelize the code with std.parallelism and SIMD

Contributing

Any form of contribution is welcome. Feel free to open an issue or create a pull request. If you are contributing optimizations, please provide benchmarks and/or performance comparisons as well as the code to reproduce them.

Credits