[Question] Why does text-only data use the empty image token? #1792

Open
MSungK opened this issue Dec 9, 2024 · 0 comments

MSungK commented Dec 9, 2024

Question

Text-only data appears to be trained with the visual tokens still present but set to empty. Since the visual token sequence is quite long, it seems more efficient not to include these meaningless visual tokens for text-only data. I am even more puzzled because a sampler that draws each batch from a single modality is already implemented, so text-only batches could presumably be formed without them.
Is there a specific reason for this design?
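
To make the efficiency concern concrete, here is a rough sketch of the padded token count for one text-only batch, with and without placeholder visual tokens. Everything in it is an assumption for illustration: `NUM_VISUAL_TOKENS`, `Sample`, `sequence_length`, and `padded_batch_tokens` are hypothetical names and values, not taken from the repository, and the batch is assumed to be padded to its longest sequence.

```python
from dataclasses import dataclass

# Hypothetical number of visual tokens per sample (an assumption for
# illustration, not a value read from the repository).
NUM_VISUAL_TOKENS = 576


@dataclass
class Sample:
    text_len: int    # number of text tokens
    has_image: bool  # False for text-only samples


def sequence_length(sample: Sample, keep_placeholder: bool) -> int:
    """Length of one sequence after visual tokens are (optionally) added."""
    if sample.has_image or keep_placeholder:
        return sample.text_len + NUM_VISUAL_TOKENS
    return sample.text_len


def padded_batch_tokens(batch: list[Sample], keep_placeholder: bool) -> int:
    """Total tokens processed when the batch is padded to its longest sequence."""
    longest = max(sequence_length(s, keep_placeholder) for s in batch)
    return longest * len(batch)


# A batch drawn from a single modality (text only), as a modality-grouped
# sampler would produce.
text_batch = [
    Sample(text_len=128, has_image=False),
    Sample(text_len=256, has_image=False),
    Sample(text_len=64, has_image=False),
    Sample(text_len=200, has_image=False),
]

with_placeholder = padded_batch_tokens(text_batch, keep_placeholder=True)
without_placeholder = padded_batch_tokens(text_batch, keep_placeholder=False)

print(with_placeholder, without_placeholder)
```

Under these assumed numbers, the same text-only batch processes more than three times as many tokens when the placeholder visual tokens are kept, which is the overhead the question is about.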
