[Question] Why does text-only data use the empty image token? #1792

MSungK · 2024-12-09T15:28:41Z

Question

Text-only data is currently handled by running training with the visual tokens set to empty, so the placeholder visual tokens are still part of every sequence. Since the visual tokens take up a significant share of the sequence length, it seems more efficient to drop these meaningless tokens for text-only samples. What puzzles me even more is that a sampler that draws samples from the same modality is already implemented, so text-only samples could presumably be processed without any image input.
Is there a specific reason for this design?
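
To make the efficiency concern concrete, here is a rough back-of-the-envelope sketch, not code from the repository. The function name `placeholder_overhead`, the text lengths, and the figure of 576 visual tokens per sample are all assumptions chosen for illustration, and padding is ignored.

```python
def placeholder_overhead(text_lengths, num_visual_tokens):
    """Fraction of sequence positions in a text-only batch that are
    spent on empty (placeholder) visual tokens. Padding is ignored."""
    total_text = sum(text_lengths)
    # Every sample carries the full set of visual tokens, even with no image.
    total_visual = num_visual_tokens * len(text_lengths)
    return total_visual / (total_text + total_visual)


# Hypothetical text-only batch: token counts per sample (assumed values).
text_lengths = [128, 256, 64, 192]

# Assumed number of visual tokens per sample; the real value depends on
# the vision encoder and image resolution in use.
overhead = placeholder_overhead(text_lengths, num_visual_tokens=576)
print(f"Share of positions used by placeholder visual tokens: {overhead:.1%}")
```

Under these assumed numbers, most of each sequence would consist of placeholder tokens, which is why skipping them for text-only batches looks attractive to me.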
