CuBLAS gives the same output every time #265
Comments
I have the same issue. llama.cpp works fine, but go-llama.cpp with CuBLAS does not.
Getting the same generation output every time (presumably even with different prompts, though you aren't entirely clear on that) suggests two things: the prompt passed in via go-llama is being dropped and replaced with some placeholder, and the temperature for the run is probably being set to zero. On the positive side, the fact that any response comes out at all suggests that the execution path is hooked up. You just need to find where (and why), in the cuBLAS-specific execution path, there is a static prompt template with temp zero. The template prompt will relate to the repeated output you keep seeing.

Sorry, I'm not involved in maintaining the project. I was just reading through the issue backlog to get a feel for where the project is at and thought you might find the observation helpful.

I did notice that for Metal execution there was an issue (since addressed) which involved pulling over some ggml-metal file. If I were you, I'd first make sure that all the code I have locally absolutely matches the latest go-llama code base. Then I'd have a quick look around for the cuBLAS equivalent in the current go-llama code base and see if there is anything with temp=0 or a template prompt. After that, I'd try to work out where in the go-llama execution path things fork depending on whether cuBLAS is used, and follow the cuBLAS path to the point where things get handed over to the llama.cpp code. The problem will be somewhere in there!

Good luck, and remember, ChatGPT or Claude are your code explorer friends.
Running the following code with CuBLAS returns the same output every time it's run. Running without CuBLAS returns a different generation, as expected.
Output with CuBLAS:
, I'm interested in 10000 W 127th St, Palos Park, IL 60465. Please send me more information about this property.