How does the visual specialist (e.g., stablevideo) receive both textual instruction and task features? #17

waltonfuture · 2024-11-04T09:07:20Z

Thanks for your wonderful work! I have a question: how does the visual specialist (e.g., stablevideo) receive both the textual instruction and the task features?

It seems that the textual instructions are a sequence of words, while the task features are matrices or tensors. How can the two be combined into a single input for the visual specialist?
