Commit
Previously, much of the time was spent writing each token to the screen, which is relatively slow. With output buffering enabled, groups of computed tokens are written to an in-memory buffer, which is relatively fast, and the buffer is flushed to the screen/console periodically. Testing with the smallest model, an interactive tokens/s speed-up of ~14% on standard builds and up to ~84% on OpenMP builds has been achieved.

Usage: run <checkpoint_file> [temperature] [steps] [prompt] [buffer_tokens]

where buffer_tokens is the number of tokens to buffer before flushing. Multiples of 2 seem to be ideal; 64 worked well for my use case on a low-end machine. The speed-up may depend on model size and OS.

Example: ./run model.bin 0 0 "A car" 64