How to perform the task of generating video captions #4

Great work! How to perform the task of generating video captions?

Comments
Hello, thanks for the appreciation. Here we follow CapDec to perform text-only training for image captioning.
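For context, here is a rough, self-contained sketch of what CapDec-style text-only training looks like. This is my reading of the CapDec paper, not this repo's actual code; the noise scale, the one-token projection, and the single-caption loop are all illustrative simplifications:

```python
# Sketch of CapDec-style text-only training (illustrative, not this repo's code).
# Core idea: embed a caption with the CLIP text encoder, add Gaussian noise to
# close the modality gap to image embeddings, and train a language-model decoder
# to reconstruct the caption from the noised embedding.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, CLIPModel, CLIPTokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
gpt = GPT2LMHeadModel.from_pretrained("gpt2")
gpt_tok = GPT2Tokenizer.from_pretrained("gpt2")
# Illustrative single-token prefix projection (CapDec uses a larger mapper).
proj = torch.nn.Linear(512, gpt.config.n_embd)

optimizer = torch.optim.AdamW(
    list(gpt.parameters()) + list(proj.parameters()), lr=2e-5
)

caption = "a surgeon sutures an incision"        # any training caption
with torch.no_grad():
    t = clip_tok(caption, return_tensors="pt")
    text_embed = clip.get_text_features(**t)     # (1, 512)
# Noise injection is the key CapDec trick; the scale here is illustrative.
noisy = text_embed + 0.1 * torch.randn_like(text_embed)

ids = gpt_tok(caption, return_tensors="pt").input_ids       # (1, T)
prefix = proj(noisy).unsqueeze(1)                           # (1, 1, 768)
tok_embeds = gpt.transformer.wte(ids)                       # (1, T, 768)
inputs = torch.cat([prefix, tok_embeds], dim=1)
# Predict the caption tokens; -100 masks the loss at the prefix position.
labels = torch.cat([torch.full((1, 1), -100), ids], dim=1)
loss = gpt(inputs_embeds=inputs, labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```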
Thank you for your answer. What I mean is: how can I use this code to infer captions directly? My requirement is to use this model to generate captions for my surgical video dataset (i.e., image-to-text). However, the README only lists the three inference modes 'video', 'text', and 'all', none of which seems to support image-to-text generation.
Oh, OK. The current model is a CLIP-like architecture, which only includes the visual and text encoders. Caption generation requires a text decoder trained as in CapDec. We do not have this in the current repo, but let me check whether we can integrate it. I will get back to you soon.
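For reference, a minimal sketch of how a CapDec-style prefix decoder could be attached to a CLIP-like encoder for caption generation. This is not the repo's code: the `PrefixMapper` shape and prefix length are assumptions, and in practice both the mapper and GPT-2 would be loaded from CapDec-trained checkpoints rather than initialized fresh:

```python
# Sketch (not this repo's code): wire a CLIP image embedding through a mapping
# network into GPT-2 prefix embeddings, then decode a caption autoregressively.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer, CLIPModel, CLIPProcessor
from PIL import Image

class PrefixMapper(nn.Module):
    """Maps a CLIP image embedding to a sequence of GPT-2 prefix embeddings."""
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len, self.gpt_dim = prefix_len, gpt_dim
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, gpt_dim * prefix_len),
            nn.Tanh(),
            nn.Linear(gpt_dim * prefix_len, gpt_dim * prefix_len),
        )

    def forward(self, clip_embed):                   # (B, clip_dim)
        prefix = self.mlp(clip_embed)                # (B, gpt_dim * prefix_len)
        return prefix.view(-1, self.prefix_len, self.gpt_dim)

@torch.no_grad()
def generate_caption(image_path, clip, processor, mapper, gpt, tokenizer,
                     max_new_tokens=30):
    image = Image.open(image_path).convert("RGB")
    pixels = processor(images=image, return_tensors="pt")["pixel_values"]
    clip_embed = clip.get_image_features(pixel_values=pixels)  # (1, 512)
    generated = mapper(clip_embed)                             # (1, L, 768)
    token_ids = []
    # Greedy decoding conditioned on the prefix embeddings.
    for _ in range(max_new_tokens):
        logits = gpt(inputs_embeds=generated).logits[:, -1, :]
        next_id = logits.argmax(dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
        token_ids.append(next_id.item())
        next_embed = gpt.transformer.wte(next_id).unsqueeze(1)
        generated = torch.cat([generated, next_embed], dim=1)
    return tokenizer.decode(token_ids)

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
mapper = PrefixMapper()  # in practice, load CapDec-trained weights here

print(generate_caption("frame.jpg", clip, processor, mapper, gpt, tokenizer))
```

With an untrained mapper this will emit gibberish; the point is only the wiring from image embedding, to prefix, to autoregressive decoding.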
Thank you very much. I really need this feature for my graduation thesis.