
I wanted to use this for my lectures, but I can only transcribe 1 minute out of the full hour #3

Open
ParisMolver opened this issue May 10, 2023 · 4 comments


@ParisMolver

Is this normal? I looked at the code and there is audio splitting (great work, btw) that looks like it can handle really large files.

@Carleslc
Owner

Carleslc commented May 10, 2023

No, that's not normal. Audio splitting is there to allow large files by splitting requests to the API. However, no audio splitting is done for the open-source model (it should not be required, since the whisper code already processes the audio in chunks).
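For context, the API-side splitting described above boils down to cutting the recording into fixed-length spans and sending one request per span. This is a minimal illustrative sketch of that chunking step; the function name and chunk length are assumptions for illustration, not the actual AudioToText implementation.

```python
# Hypothetical sketch of API-side audio splitting: the OpenAI Whisper API
# limits upload size, so a long recording is cut into fixed-length chunks
# and each chunk is transcribed in its own request. chunk_spans is an
# illustrative helper, not code from the AudioToText repository.

def chunk_spans(total_ms: int, chunk_ms: int = 10 * 60 * 1000):
    """Yield (start_ms, end_ms) spans covering the whole recording."""
    for start in range(0, total_ms, chunk_ms):
        yield (start, min(start + chunk_ms, total_ms))

# A one-hour lecture split into 10-minute chunks yields 6 spans,
# with the last span ending exactly at the recording's end.
spans = list(chunk_spans(60 * 60 * 1000))
```

Each span would then be exported as its own audio file (e.g. with pydub slicing) and uploaded in a separate API request.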

Are you using the API or the open-source model (Colab or local CLI)?

Also, is there any error displayed while you're transcribing?

@Carleslc
Owner

I've updated the whisper and openai dependencies to the latest releases; check if it works for you now. I see no changes whatsoever on my end, and it's working fine for my test files.

@ParisMolver
Author

ParisMolver commented May 16, 2023 via email

@Carleslc
Owner

Oh, I tested it with files up to 30 minutes with the open-source model, as they take a while to process. I know that with the API longer files can be processed because of the audio splitting I implemented (I'm glad that worked for you), but that's not currently implemented for the open-source model.

As far as I remember, whisper already processes the audio in chunks in the open-source model, but maybe I should implement audio splitting for the open-source model in the AudioToText code too, if large files are causing problems there. The audio splitting also joins the chunk transcriptions together in the resulting txt/srt/vtt files, as you would expect.
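The joining step mentioned above has one subtlety: each chunk's segment timestamps start at zero, so they must be shifted by the chunk's start offset before the merged subtitle file is written. This is an illustrative sketch of that merge for SRT output; the function names and data shapes are assumptions, not the actual AudioToText code.

```python
# Illustrative sketch (not the actual AudioToText implementation) of
# stitching per-chunk transcriptions back into a single SRT file:
# each chunk's segment timestamps are shifted by the chunk's start
# offset, then the entries are numbered sequentially.

def fmt(ms: int) -> str:
    """Format milliseconds as an SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def merge_srt(chunks):
    """chunks: list of (offset_ms, segments); each segment is (start_ms, end_ms, text)."""
    entries, n = [], 0
    for offset, segments in chunks:
        for start, end, text in segments:
            n += 1
            entries.append(f"{n}\n{fmt(offset + start)} --> {fmt(offset + end)}\n{text}\n")
    return "\n".join(entries)

# Two chunks: the second starts 10 minutes into the recording,
# so its segments are shifted by 600,000 ms in the merged output.
srt = merge_srt([
    (0, [(0, 2500, "Welcome to the lecture.")]),
    (600_000, [(0, 3000, "Continuing after the first chunk.")]),
])
```

The same offset logic applies to vtt output; plain txt output only needs the texts concatenated in order.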

Using GPT to condense the text is out of the scope of AudioToText, but transcribing 1h+ audio files should be fine with the open-source model (although it takes a while with the Tesla T4 GPU Google Colab offers to free users). Usage of the OpenAI whisper API is up to the user; it is not mandatory in the AudioToText Google Colab (just fill or empty the api_key field).
