
Using the GPU is slower than the CPU #1123

@AreckOVO

Description


I installed llama-cpp-python with CUDA (cuBLAS) support using the command below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
The resulting speed:
llama_print_timings: eval time = 81.91 ms / 2 runs ( 40.95 ms per token, 1.02 tokens per second)

I also installed llama-cpp-python as a CPU-only build using the command below:
pip install llama-cpp-python
The resulting speed:
llama_print_timings: eval time = 81.91 ms / 2 runs ( 40.95 ms per token, 30.01 tokens per second)
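For reference, the throughput in these log lines follows directly from the per-token eval time, so the two figures can be cross-checked (a quick calculation, assuming the 40.95 ms per-token figure is the accurate one):

```python
# Throughput implied by the reported per-token eval time.
ms_per_token = 40.95
tokens_per_second = 1000 / ms_per_token
print(round(tokens_per_second, 2))  # ~24.42 tokens per second
```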

My code is as follows:
result = self.model(
    prompt,                # Prompt
    # max_tokens=nt,       # Generate up to nt tokens
    # stop=["Q:", "\n"],   # Stop generating just before the model would generate a new question
    # echo=True            # Echo the prompt back in the output
)

So how can I use the GPU to speed this up?
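For what it's worth, a minimal sketch of enabling GPU offload: llama-cpp-python's `Llama` constructor takes an `n_gpu_layers` parameter controlling how many layers are offloaded to the GPU (`-1` offloads all of them, `0` keeps everything on the CPU). The model path and helper function below are hypothetical:

```python
# Hypothetical helper that collects the GPU-related constructor kwargs.
def gpu_kwargs(n_layers: int = -1) -> dict:
    # n_gpu_layers=-1 offloads all layers to the GPU; 0 is CPU-only.
    return {"n_gpu_layers": n_layers}

# Usage sketch (requires a cuBLAS build of llama-cpp-python and a local model):
# from llama_cpp import Llama
# model = Llama(model_path="model.gguf", **gpu_kwargs())  # path is hypothetical
# result = model(prompt)
```

Note that without `n_gpu_layers` set, the model runs on the CPU even when the package was built with cuBLAS, which would match the symptom described above.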
