
Reproducing Vila-U Training #6

Open
Pulyong opened this issue Nov 14, 2024 · 0 comments

Pulyong commented Nov 14, 2024

Thanks for the great work!

I am currently attempting to reproduce the Vila-U model. As I understand it, Vision Tower training (image and video quantization) should be conducted first, followed by LLM training.
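
To make my understanding concrete, here is a toy sketch of the stage ordering I have in mind. All module and variable names are placeholders, not the actual Vila-U code; it uses plain VQ instead of the paper's residual quantization, dummy data, and omits the contrastive text-alignment term the paper describes:

```python
# Toy sketch only: placeholder modules, plain VQ, dummy data.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyVisionTower(nn.Module):
    """Stand-in for the quantizing vision tower: encode images to discrete
    tokens, decode them back for a reconstruction loss."""
    def __init__(self, dim=64, codebook_size=256):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.codebook = nn.Embedding(codebook_size, dim)
        self.decoder = nn.ConvTranspose2d(dim, 3, kernel_size=16, stride=16)

    def forward(self, images):
        z = self.encoder(images)                      # (B, D, H, W)
        b, d, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, d)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        q = self.codebook(idx).view(b, h, w, d).permute(0, 3, 1, 2)
        q_st = z + (q - z).detach()                   # straight-through estimator
        recon = self.decoder(q_st)
        return recon, q, z, idx.view(b, h * w)

# --- Stage 1: train the vision tower (reconstruction + VQ losses) ---
tower = ToyVisionTower()
opt = torch.optim.AdamW(tower.parameters(), lr=1e-4)
images = torch.randn(4, 3, 64, 64)                    # dummy image batch
recon, q, z, _ = tower(images)
loss = (F.mse_loss(recon, images)                     # reconstruction
        + F.mse_loss(q, z.detach())                   # codebook update
        + 0.25 * F.mse_loss(z, q.detach()))           # commitment
loss.backward()
opt.step()

# --- Stage 2: freeze the tower, train the LLM on its discrete tokens ---
tower.requires_grad_(False)
llm = nn.Sequential(nn.Embedding(256, 64), nn.Linear(64, 256))  # toy LM head
opt2 = torch.optim.AdamW(llm.parameters(), lr=1e-4)
with torch.no_grad():
    _, _, _, vis_tokens = tower(images)               # (B, T) token ids
logits = llm(vis_tokens[:, :-1])                      # next-token prediction
nll = F.cross_entropy(logits.reshape(-1, 256), vis_tokens[:, 1:].reshape(-1))
nll.backward()
opt2.step()
```

If this ordering or the loss composition is wrong, corrections would be appreciated.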

From what I can find, code is available for LLM pretraining and supervised fine-tuning (SFT), but there seems to be none for Vision Tower training. (If I missed something, please let me know.)

Therefore, could you provide code or a recipe for Vision Tower training?

Additionally, as mentioned in the paper, COYO-700M, ShareGPT4V, MMC4, an internal dataset, and OpenVid were used for training. I am curious how sampling was conducted across these datasets, and whether the internal dataset is shareable.

I would also greatly appreciate any other details about the training process (epochs per stage, GPU type, number of GPUs, GPU hours, etc.).
