This is a tracker issue for work on interleaved in-and-out image-text generation.
There are now at least five open-source models that can do interleaved image-text generation, and many more are expected to be released. It would therefore be practical and useful for us to (1) add native support for such models and (2) standardize the flow of data through processors and pipelines, as done in #31911 and #32472.
The paper and the GitHub repo don't actually demonstrate interleaved image-text generation yet, but the model was trained on such datasets and its architecture is well suited to it.
Initial work for Chameleon and Anole can be found in #32013 for reference.
Notes:
We explicitly exclude models that can do only text generation or only image generation. We also exclude models that can generate both images and text but not in an interleaved manner.
As I've demonstrated in my repo, it is not necessary to explicitly implement the Finite State Machine (FSM) for switching between text-generation and image-generation modes, as is done in Chameleon's repo. Implicitly implementing the FSM with Logits Processors suffices, although more work is needed to find the most efficient implementation.
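To make the "implicit FSM" idea concrete, here is a minimal sketch of a processor that infers the current mode from the tokens generated so far and masks the logits accordingly. It is not the code from #32013 or from Chameleon's repo. The class name, the token IDs, the fixed `image_len`, and the plain-list `scores` are all assumptions for illustration; a real version would work on batched tensors and follow the `__call__(input_ids, scores)` convention of a transformers `LogitsProcessor`.

```python
import math
from typing import List, Sequence, Set

NEG_INF = -math.inf


class InterleavedModeProcessor:
    """Hypothetical sketch: no stored state, the mode is derived from the prefix."""

    def __init__(self, boi_id: int, eoi_id: int, image_ids: Set[int], image_len: int):
        self.boi_id = boi_id        # assumed begin-of-image token
        self.eoi_id = eoi_id        # assumed end-of-image token
        self.image_ids = image_ids  # assumed set of image-codebook token IDs
        self.image_len = image_len  # assumed fixed number of tokens per image

    def _tokens_since_open_boi(self, input_ids: Sequence[int]) -> int:
        """Count tokens after the last unclosed BOI; return -1 when in text mode."""
        for pos in range(len(input_ids) - 1, -1, -1):
            token = input_ids[pos]
            if token == self.eoi_id:
                return -1  # the most recent image was closed
            if token == self.boi_id:
                return len(input_ids) - 1 - pos
        return -1  # no image was ever opened

    def _is_allowed(self, token: int, n_image_tokens: int) -> bool:
        if n_image_tokens < 0:
            # Text mode: anything except image tokens and a stray EOI.
            return token not in self.image_ids and token != self.eoi_id
        if n_image_tokens < self.image_len:
            # Image mode, image incomplete: only image tokens.
            return token in self.image_ids
        # Image mode, image complete: the only legal move is to close it.
        return token == self.eoi_id

    def __call__(self, input_ids: Sequence[int], scores: List[float]) -> List[float]:
        n = self._tokens_since_open_boi(input_ids)
        return [s if self._is_allowed(t, n) else NEG_INF for t, s in enumerate(scores)]
```

Because the mode is recomputed from `input_ids` on every step, there is no separate state object to keep in sync with the generation loop. The backward scan costs time proportional to the prefix length, which is one place where a more efficient implementation could cache the position of the last begin-of-image token.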
TODOs:
Add support for interleaved image-text generation with:
Feature request
Motivation
Your contribution
I've already started work on Chameleon & Anole here: #32013
But I'm currently blocked by (1) not having enough time due to other responsibilities and (2) not having enough compute resources.
Any help would be appreciated!