unexpected response when using llama2-7b-chat #3
Hello!
I'm trying to use your pre-trained model with this command:

CUDA_VISIBLE_DEVICES=4,5,6,7 python inference.py -i -m llama-2-7b-chat --eval_name concat_recur

However, generation stops unexpectedly when I input the query:

help me list popular songs written by Taylor Swift.

The model stops generating further content and outputs </s> instead. Are there any other settings I missed?
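For reference, here is a minimal sketch of why </s> can appear in decoded output, assuming a plain Hugging Face transformers pipeline rather than this repo's inference.py; the checkpoint name is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer(
    "help me list popular songs written by Taylor Swift.", return_tensors="pt"
)

# Generation halts as soon as the model samples the EOS token, which is
# </s> for Llama-2's tokenizer.
output_ids = model.generate(**inputs, max_new_tokens=256)

# With skip_special_tokens=False the trailing </s> stays visible in the text;
# skip_special_tokens=True strips it from the decoded string.
print(tokenizer.decode(output_ids[0], skip_special_tokens=False))
```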
Comments

For the question regarding LinearMask: comp_mask works with LoRA. I modified the Hugging Face LoRA code in src/peft_custom/lora.py (see Context-Memory/src/peft_custom/lora.py, line 565, at commit 24af6a0). Without LoRA, our model works the same as the original function, while the LoRA activates only for the compression tokens.
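A minimal sketch of that masking idea, assuming a PyTorch setup; the class name MaskedLoRALinear, the tensor shapes, and the hyperparameters are illustrative, not the repo's actual LinearMask implementation:

```python
import torch
import torch.nn as nn

class MaskedLoRALinear(nn.Module):
    """Applies a LoRA update only at positions selected by comp_mask."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # the pretrained weight stays frozen
            p.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)  # LoRA starts as an exact no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor, comp_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, in_features); comp_mask: (batch, seq), 1.0 at
        # compression-token positions and 0.0 everywhere else.
        out = self.base(x)
        lora_out = self.lora_B(self.lora_A(x)) * self.scaling
        # Masking zeroes the LoRA contribution at non-compression positions,
        # so without compression tokens the layer equals the original function.
        return out + comp_mask.unsqueeze(-1) * lora_out

# Toy usage: only the last position of each sequence gets the LoRA update.
layer = MaskedLoRALinear(nn.Linear(64, 64))
x = torch.randn(2, 10, 64)
comp_mask = torch.zeros(2, 10)
comp_mask[:, -1] = 1.0
y = layer(x, comp_mask)  # shape (2, 10, 64)
```

The key design point is that the mask multiplies only the low-rank branch, leaving the frozen base projection untouched at every position.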