Minimal changes compared with llava1.5 #2

Closed
YuchenLiu98 opened this issue Aug 17, 2024 · 8 comments

Comments

@YuchenLiu98

Thanks a lot for your wonderful work. Could you list the minimal changes required on top of the official LLaVA (1.5) code? Any help would be much appreciated. Thanks.

@YuchenLiu98
Author

For the pretraining stage, training runs successfully with normal loss values.
For the finetuning stage, however, the training loss is 0 from the very beginning, and I also see tokenization mismatch warnings:
"WARNING: tokenization mismatch: 119 vs. 115. (ignored)
WARNING: tokenization mismatch: 115 vs. 111. (ignored)
WARNING: tokenization mismatch: 117 vs. 113. (ignored)
WARNING: tokenization mismatch: 131 vs. 127. (ignored)
WARNING: tokenization mismatch: 129 vs. 125. (ignored)"
Do you have any idea how to solve this problem? Thanks a lot for your help.

@YuchenLiu98
Author

I tried changing the "--version" argument from "llama_3_1" to "llama_3". With that change the warnings disappear and the training loss is normal. However, I would like to train with llama_3_1.

@federico1-creator
Collaborator

Hi @YuchenLiu98, thank you for your interest in our LLaVA-MORE project!

My suggestion for identifying the differences between the two codebases is to use the diff -rq command between the two repositories. This will help you see which files have been changed.
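For example, assuming the two repositories are cloned side by side (the directory names below are placeholders, not the exact paths):

diff -rq LLaVA LLaVA-MORE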

However, to make it easier for you to address your issue, the most significant changes are concentrated in three different parts of the codebase.

Please check the values of cur_len and total_len, and make sure the correct tokenizer is instantiated.

Federico

@YuchenLiu98
Author

Unfortunately, I have not been able to solve the problem. A very curious thing is that when I change "version" to "llama3", finetuning runs successfully with normal loss values and without the mismatch warnings. Do you have any idea about this problem? Thanks a lot for your help.

@federico1-creator
Collaborator

federico1-creator commented Aug 19, 2024

The behavior you're observing is due to the fact that the preprocess functions for LLaMA 3 and LLaMA 3.1 are very similar, as is the structure of their respective tokenizers (dimension and special tokens).

Based on the logs you sent in your previous message, you can try to comment out this line.

cur_len = cur_len + len(tokenizer(sep, add_special_tokens=False).input_ids)
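
For context, here is a minimal sketch of why that line matters (the model ids and separator string below are assumptions for illustration, not the exact values used by LLaVA-MORE): if the tokenizer splits the separator into a different number of tokens than the preprocess code expects, cur_len drifts away from total_len and the mismatch warning above is triggered.

from transformers import AutoTokenizer

# Compare how many tokens the conversation separator produces under the two
# tokenizers. If this count differs from what the preprocess function assumes,
# cur_len no longer matches total_len at the end of the sample.
sep = "<|eot_id|>"  # assumed separator; the real one depends on the conversation template
for name in ["meta-llama/Meta-Llama-3-8B-Instruct",
             "meta-llama/Meta-Llama-3.1-8B-Instruct"]:
    tok = AutoTokenizer.from_pretrained(name)
    n_sep = len(tok(sep, add_special_tokens=False).input_ids)
    print(f"{name}: separator -> {n_sep} token(s)")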

If you notice any differences, could you send us the versions of the libraries you're using, specifically tokenizers, transformers, torch, and cuda?

Federico

@YuchenLiu98
Copy link
Author

Many thanks for your help. With the modification you mentioned (deleting Line 592 in train.py), the warning disappears and the finetuning process seems to be going well. Specifically, I use tokenizers==0.19.1, transformers==4.43.1, torch==2.1.2, torchvision==0.16.2. I wonder whether this modification will influence the final performance; I will test the result once the training finishes. Thanks a lot for your help.
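
For anyone reproducing this, a quick, generic way to collect the library and CUDA versions that were asked about (plain Python, nothing project-specific):

import torch
import transformers
import tokenizers

# Print the versions relevant to the tokenization behaviour discussed above.
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)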

@aoji0606

I got the same problem; when I commented out the line, the warning disappeared.
What is the purpose of this line?
cur_len = cur_len + len(tokenizer(sep, add_special_tokens=False).input_ids)
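
For reference, a simplified sketch of the LLaVA-style check that emits the warning (an illustration of the general pattern in train.py, not the exact LLaVA-MORE code): the line advances cur_len past the separator tokens so that the masking of instruction tokens stays aligned with the tokenized conversation; if cur_len ends up different from total_len, the whole target is masked and the sample contributes a loss of 0.

IGNORE_INDEX = -100  # labels with this value are skipped by the loss

# Simplified illustration: cur_len is advanced turn by turn (including the
# separator tokens added by the line above), while total_len is the true
# length of the tokenized conversation.
def check_alignment(target, cur_len, total_len, tokenizer):
    if cur_len < tokenizer.model_max_length and cur_len != total_len:
        # Misalignment: ignore the whole sample so it does not corrupt training.
        target[:] = IGNORE_INDEX
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}. (ignored)")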

@federico1-creator
Collaborator

Hi, as mentioned in issue #7, the tokenization mismatch might be caused by using a different version of the LLaMA 3.1 tokenizer.

I recommend referring to that issue to fix the problem.
@aoji0606 @YuchenLiu98
