Thanks for the great work!
I am currently attempting to reproduce the VILA-U model. As I understand it, vision tower training (image and video quantization) is conducted first, followed by LLM training.
From what I can find, the repository includes code for LLM pretraining and supervised fine-tuning (SFT), but there appears to be no code for vision tower training. (If I have missed something, please let me know.)
Could you therefore provide code or a recipe for vision tower training?
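For concreteness, here is roughly the quantization step I imagine the vision tower uses, a minimal sketch assuming an RQ-VAE-style residual quantizer; the function name, shapes, and structure are my own guesses, not code from this repo:

```python
import torch

def residual_quantize(z, codebooks):
    """Quantize encoder features with D residual codebooks (RQ-VAE style).

    z:         (B, N, C) continuous features from the vision encoder
    codebooks: list of (K, C) tensors, one per residual depth
    Returns the quantized features and the codebook indices per depth.
    """
    residual = z
    quantized = torch.zeros_like(z)
    indices = []
    for codebook in codebooks:
        # Nearest-neighbor lookup for the current residual.
        dists = torch.cdist(residual, codebook.unsqueeze(0).expand(z.size(0), -1, -1))
        idx = dists.argmin(dim=-1)        # (B, N)
        selected = codebook[idx]          # (B, N, C)
        quantized = quantized + selected
        residual = residual - selected
        indices.append(idx)
    # Straight-through estimator so gradients flow back to the encoder;
    # I assume this is then trained with the reconstruction and
    # text-alignment losses described in the paper.
    quantized = z + (quantized - z).detach()
    return quantized, indices
```

Is this roughly on the right track, or does the actual training recipe differ?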
Additionally, as mentioned in the paper, COYO-700M, ShareGPT4V, MMC4, an internal dataset, and OpenVid were used for training. How was sampling conducted across these datasets, and can the internal dataset be shared?
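To show what I mean by sampling: right now I interleave the sources with fixed weights, roughly as below; the mixing weights are placeholders, since I do not know the actual ratios used:

```python
import random
from torch.utils.data import IterableDataset

class MixedDataset(IterableDataset):
    """Interleave several datasets according to fixed sampling weights."""

    def __init__(self, datasets, weights, seed=0):
        self.datasets = datasets
        self.weights = weights
        self.rng = random.Random(seed)

    def __iter__(self):
        iterators = [iter(d) for d in self.datasets]
        while True:
            # Pick a source dataset in proportion to its mixing weight.
            i = self.rng.choices(range(len(iterators)), weights=self.weights)[0]
            try:
                yield next(iterators[i])
            except StopIteration:
                return  # stop once any source is exhausted
```

Did you use per-dataset mixing ratios like this, or sample each stage from a single concatenated pool?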
I would also greatly appreciate any other details about the training process (epochs per stage, GPU type, number of GPUs, GPU hours, etc.).