How to perform the task of generating video captions #4

Closed
cascat0 opened this issue Oct 31, 2024 · 5 comments

Comments

cascat0 commented Oct 31, 2024

Great work!

How can I use this to generate video captions?

Collaborator

Flaick commented Nov 5, 2024

Hello, thanks for the kind words. Here we follow CapDec to perform text-only training for image captioning.
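CapDec's text-only training works roughly like this: the caption decoder is trained on CLIP text embeddings with Gaussian noise added, so that at inference time it can accept an image embedding from the same shared space instead. Below is a minimal stdlib sketch of only the noise-injection step. The function names, the 4-dimensional toy embedding, and the variance of 0.016 are assumptions for illustration, not code from this repo or from CapDec.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (CLIP-style embeddings are compared after normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inject_noise(text_embedding, variance=0.016, rng=None):
    """Add zero-mean Gaussian noise to a normalized text embedding.

    In text-only training the decoder sees these noisy text embeddings, which is
    meant to make it tolerant of the text/image embedding gap it meets at inference.
    Assumption: `variance` is an illustrative value, not taken from this repo.
    """
    if rng is None:
        rng = random.Random()
    std = math.sqrt(variance)
    return [x + rng.gauss(0.0, std) for x in l2_normalize(text_embedding)]


# Hypothetical 4-d "text embedding"; real CLIP embeddings are much larger.
text_emb = [0.2, -0.5, 0.1, 0.8]
noisy = inject_noise(text_emb, rng=random.Random(0))
print(noisy)
```

At inference the noise is dropped and the decoder receives the (normalized) image or video embedding in its place.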

Author

cascat0 commented Nov 5, 2024

Thank you for your answer. I mean, how can I use this code to infer captions directly? I need to use this model to generate captions for my surgical video dataset (i.e., image-to-text). However, the README only lists three inference modes, 'video', 'text', and 'all', none of which seem to support image-to-text generation.

Collaborator

Flaick commented Nov 5, 2024

Oh, OK. The current model has a CLIP-like architecture, which includes only the visual and text encoders. Caption generation requires a text decoder trained with CapDec. We do not have this in the current repo, but let me check whether we can integrate it. I will get back to you soon.
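To illustrate why an encoder-only, CLIP-like model cannot caption on its own: its two encoders map video and text into a shared space where they can be scored against each other, but nothing in them produces new text. The sketch below ranks a fixed list of candidate captions by cosine similarity to a video embedding. All vectors and caption strings are hypothetical stand-ins for encoder outputs; this is not the repo's inference code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_captions(video_embedding, caption_embeddings):
    """Return candidate captions sorted from most to least similar to the video."""
    scores = {cap: cosine(video_embedding, emb) for cap, emb in caption_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-d embeddings standing in for visual/text encoder outputs.
video_emb = [0.9, 0.1, 0.0]
candidates = {
    "the surgeon dissects the gallbladder": [0.8, 0.2, 0.1],
    "the camera is cleaned": [0.0, 0.1, 0.9],
    "clipping of the cystic duct": [0.5, 0.5, 0.2],
}
print(rank_captions(video_emb, candidates))
```

This can only pick from captions you already wrote; producing new sentences needs a decoder such as the CapDec-trained one mentioned above.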

Author

cascat0 commented Nov 13, 2024

Thank you very much. I really need this feature for my graduation thesis.

Collaborator

Flaick commented Feb 17, 2025

Hello, I hope this is not too late. We have released the codebase for that; please refer to this repo: https://github.com/CAMMA-public/Surg-FTDA

Flaick closed this as completed Feb 17, 2025