Description
I installed llama-cpp-python with CUDA support using the command below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
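Note that in recent llama.cpp releases the `LLAMA_CUBLAS` option was renamed, so on a newer llama-cpp-python this flag may be silently ignored and you get a CPU-only build. A hedged alternative, assuming a recent release, is to rebuild from scratch with the current flag:

```shell
# Force a clean rebuild so a cached CPU-only wheel is not reused.
# GGML_CUDA replaces the older LLAMA_CUBLAS option in recent versions.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If the build succeeds with CUDA enabled, the model load log should mention the GPU backend.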
The resulting speed:
llama_print_timings: eval time = 81.91 ms / 2 runs ( 40.95 ms per token, 1.02 tokens per second)
I also installed llama-cpp-python (CPU-only) using the command below:
pip install llama-cpp-python
The resulting speed:
llama_print_timings: eval time = 81.91 ms / 2 runs ( 40.95 ms per token, 30.01 tokens per second)
My code is as follows:
result = self.model(
    prompt,              # the input prompt
    # max_tokens=nt,     # limit the number of generated tokens
    # stop=["Q:", "\n"], # stop generating just before a new question
    # echo=True          # echo the prompt back in the output
)
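Building with CUDA alone is not enough: by default no layers are offloaded to the GPU, so inference still runs on the CPU. A minimal sketch of loading the model with GPU offload enabled, assuming a hypothetical local GGUF file path (`n_gpu_layers` is the llama-cpp-python parameter that controls how many layers go to the GPU; -1 offloads all of them):

```python
from llama_cpp import Llama

# Hypothetical model path for illustration; substitute your own GGUF file.
model = Llama(
    model_path="./model.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU (0 = CPU-only, the default)
    verbose=True,     # print load logs so you can confirm layers were offloaded
)

result = model("Q: What is the capital of France? A:")
```

With `verbose=True`, the load log should report how many layers were offloaded; if it shows 0, the wheel was built without GPU support and needs to be reinstalled.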
So how can I use the GPU to speed up inference?