-
Hey, I am running similar experiments and have the following observations:
-
Hi, I’m fine-tuning an LLM on my data using SFTTrainer, bitsandbytes quantization, and PEFT, with configs along the lines of those listed below. When I convert the model to GGUF for CPU inference, its performance drops significantly. Any idea what the problem could be?
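The actual configs were not preserved in this thread, so here is only a minimal sketch of a typical QLoRA-style setup of this kind (legacy TRL API); every model name, path, and hyperparameter below is an assumption, not the poster's actual settings:

```python
# Hypothetical reconstruction -- all names and hyperparameters are assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# Typical 4-bit NF4 quantization settings for QLoRA-style training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Illustrative LoRA config; rank/alpha/targets vary per setup
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("json", data_files="train.jsonl", split="train"),
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=1024,
)
trainer.train()
trainer.save_model("adapter_out")  # writes only the LoRA adapter weights
```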
I do the conversion to GGUF in the following way. First, I merge the trained adapter with the base model (see the sketch below). Then the merged model is converted to GGUF using llama.cpp’s convert.py script (full command below) with q8_0 quantization; I tested other quantization types without success. I also tried the conversion with Unsloth, likewise without a positive result.
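The merge step itself isn’t shown in the post; a minimal sketch using the standard PEFT `merge_and_unload` API might look like this (model name and paths are placeholders, not taken from the original post):

```python
# Sketch of the merge step; all names/paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Reload the base model in half precision first. Merging a LoRA adapter
# directly into a bitsandbytes 4-bit model folds the deltas into
# already-quantized weights, which is a common source of quality loss.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
merged = PeftModel.from_pretrained(base, "adapter_out").merge_and_unload()
merged.save_pretrained("merged_model")

# Save the tokenizer alongside so the converter finds the vocab in one place
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf").save_pretrained("merged_model")
```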
```
python convert.py <MERGED_MODEL_PATH> \
    --outfile <OUTPUT_MODEL_NAME.gguf> \
    --outtype q8_0 \
    --vocab_dir <ADAPTER_MODEL_PATH>
```
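One way to narrow down where the quality drop happens (a suggestion, not part of the original post): evaluate the merged FP16 model with plain transformers before converting. If the output is already degraded at this stage, the merge or training is at fault; if it only degrades in the GGUF, suspect the conversion or quantization step.

```python
# Suggested sanity check (not from the original post): generate with the
# merged FP16 model in plain transformers before converting to GGUF.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("merged_model")
model = AutoModelForCausalLM.from_pretrained(
    "merged_model", torch_dtype=torch.float16, device_map="auto"
)

prompt = "..."  # substitute one of your evaluation prompts
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```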