For text-only data, training currently proceeds with the visual tokens set to empty placeholders. Since the visual-token sequence is quite long, it seems more efficient to skip these meaningless visual tokens for text-only samples altogether. What puzzles me even more is that a sampler that draws each batch from a single modality is already implemented, so text-only batches would not need the placeholders at all.
Is there a specific reason for this?
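
To make the efficiency concern concrete, here is a rough sketch of the padded token count for a text-only batch with and without placeholder visual tokens. This is not the repository's code: the value 576 for `NUM_VISUAL_TOKENS` and the helper names `build_sequence` and `batch_cost` are assumptions chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: number of visual-token slots per sample (not taken from the repo).
NUM_VISUAL_TOKENS = 576


def build_sequence(text_ids: List[int], has_image: bool, skip_empty_visual: bool) -> List[int]:
    """Return the token sequence for one sample; -1 marks a visual-token slot."""
    if has_image or not skip_empty_visual:
        # Current behaviour as I understand it: slots are always prepended,
        # even when the sample has no image.
        return [-1] * NUM_VISUAL_TOKENS + list(text_ids)
    # Proposed behaviour: text-only samples carry no visual slots.
    return list(text_ids)


def batch_cost(batch: List[Tuple[List[int], bool]], skip_empty_visual: bool) -> int:
    """Padded token count for a batch: batch size times the longest sequence."""
    lengths = [len(build_sequence(ids, has_image, skip_empty_visual)) for ids, has_image in batch]
    return len(batch) * max(lengths)


# A batch drawn from a single modality (text only), as the existing sampler would produce.
text_only_batch = [([1] * 32, False), ([1] * 48, False)]
current = batch_cost(text_only_batch, skip_empty_visual=False)
proposed = batch_cost(text_only_batch, skip_empty_visual=True)
print(current, proposed)
```

Under these assumed numbers, the text-only batch is dominated by placeholder slots, which is the overhead I am asking about.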