Is it possible to replace existing SigLip-400M with our clip encoder? #780

Closed
SrikanthChellappa opened this issue Jan 22, 2025 · 4 comments
@SrikanthChellappa

Team, is it possible to replace the existing SigLIP-400M with our own CLIP encoder in the MiniCPM-o model? If so, could you please point us in the right direction?

@YuzaChongyi
Collaborator

Why would you want to replace the ViT? Our ViT (SigLIP-400M) was trained end-to-end with the LLM, so swapping it out directly will produce incorrect results unless you retrain the model.

@YuzaChongyi YuzaChongyi self-assigned this Jan 22, 2025
@SrikanthChellappa
Author

SrikanthChellappa commented Jan 22, 2025

We have a CLIP model that has been fine-tuned on medical images, and we wanted to check whether SigLIP could be swapped for it.

The same goes for our medical Whisper model and our Qwen LLM: we already have each of these domain-fine-tuned components in place individually.

Please assist.

@YuzaChongyi
Collaborator

In theory, this is feasible. However, you need to convert your ViT to the NaViT-SigLIP-400M format, paying particular attention to the embedding. We recommend thorough training after the replacement to ensure good performance.
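
The thread does not spell out the conversion steps, so the following is only a rough sketch of one part that "paying particular attention to the embedding" might involve. It assumes the source CLIP ViT stores a learned position embedding with a leading class-token slot over a fixed patch grid, while the target layout has no class-token slot and a different grid size. The helper names (`drop_cls_slot`, `to_grid`, `resize_pos_embed`) and the grid sizes are hypothetical and not taken from the MiniCPM-o code base. Plain Python lists are used so the logic is easy to follow; a real conversion would operate on the checkpoint tensors.

```python
def drop_cls_slot(pos_embed):
    """Remove the first row, assumed to be the class-token position."""
    return pos_embed[1:]


def to_grid(flat, h, w):
    """Reshape a flat list of h*w embedding vectors into an [h][w][dim] grid."""
    if len(flat) != h * w:
        raise ValueError(f"expected {h * w} positions, got {len(flat)}")
    return [flat[r * w:(r + 1) * w] for r in range(h)]


def resize_pos_embed(grid, new_h, new_w):
    """Bilinearly resample an [old_h][old_w][dim] grid to [new_h][new_w][dim].

    Uses an align-corners mapping, so the four corner embeddings are preserved.
    """
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])

    def src_coord(i, new_n, old_n):
        # Map an output index to a fractional input coordinate.
        return 0.0 if new_n == 1 else i * (old_n - 1) / (new_n - 1)

    out = []
    for i in range(new_h):
        y = src_coord(i, new_h, old_h)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = src_coord(j, new_w, old_w)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            # Blend the four neighbouring embeddings, one channel at a time.
            vec = [
                (1 - wy) * ((1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d])
                + wy * ((1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d])
                for d in range(dim)
            ]
            row.append(vec)
        out.append(row)
    return out


# Toy example: a class-token slot plus a 2x2 grid of 1-dimensional embeddings.
clip_pos = [[9.0], [0.0], [1.0], [2.0], [3.0]]
patch_grid = to_grid(drop_cls_slot(clip_pos), 2, 2)
resized = resize_pos_embed(patch_grid, 3, 3)
```

Resampling only gives the new encoder a reasonable starting point. Hidden sizes, patch size, and the projection into the LLM would also need to match, and, as noted above, the model would still need retraining after the swap.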

@SrikanthChellappa
Author

Thanks, @YuzaChongyi. Could you please also give some guidance on the implementation?
