Description
I installed llama-cpp-python with CUDA support using the command below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
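Note that in recent llama.cpp releases the `LLAMA_CUBLAS` option was renamed, so on a newer llama-cpp-python this flag may be silently ignored and you get a CPU-only build. A hedged alternative, assuming a recent release, is to rebuild from scratch with the current flag:

```shell
# Force a clean rebuild so a cached CPU-only wheel is not reused.
# GGML_CUDA replaces the older LLAMA_CUBLAS option in recent versions.
CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```

If the build succeeds with CUDA enabled, the model load log should mention the GPU backend.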
The resulting speed:
llama_print_timings: eval time = 81.91 ms / 2 runs ( 40.95 ms per token, 1.02 tokens per second)
I also installed llama-cpp-python (CPU-only) using the command below:
pip install llama-cpp-python
The resulting speed:
llama_print_timings: eval time = 81.91 ms / 2 runs ( 40.95 ms per token, 30.01 tokens per second)
My code is as follows:
result = self.model(
    prompt,              # the input prompt
    # max_tokens=nt,     # limit the number of generated tokens
    # stop=["Q:", "\n"], # stop generating just before a new question
    # echo=True          # echo the prompt back in the output
)
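Building with CUDA alone is not enough: by default no layers are offloaded to the GPU, so inference still runs on the CPU. A minimal sketch of loading the model with GPU offload enabled, assuming a hypothetical local GGUF file path (`n_gpu_layers` is the llama-cpp-python parameter that controls how many layers go to the GPU; -1 offloads all of them):

```python
from llama_cpp import Llama

# Hypothetical model path for illustration; substitute your own GGUF file.
model = Llama(
    model_path="./model.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU (0 = CPU-only, the default)
    verbose=True,     # print load logs so you can confirm layers were offloaded
)

result = model("Q: What is the capital of France? A:")
```

With `verbose=True`, the load log should report how many layers were offloaded; if it shows 0, the wheel was built without GPU support and needs to be reinstalled.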
So how can I use the GPU to speed up inference?