Enable new models in audio-to-text #163

eliteprox · 2024-08-17T16:11:37Z

This change adds support for two new Whisper models: distil-whisper/distil-large-v3 and openai/whisper-medium.

It also optimizes those models to run at the appropriate precision: BFLOAT, FLOAT16, or FLOAT32.
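As a rough illustration of the precision selection described above, here is a minimal sketch. It is not the code from this PR: the environment variable names `BFLOAT16` and `FLOAT16`, the helper names, and the fallback order are all assumptions made for the example. It returns a dtype name as a string so that it runs without PyTorch installed.

```python
import os


def env_flag(name: str) -> bool:
    """Return True when the environment variable is set to 'true' (case-insensitive)."""
    return os.getenv(name, "").strip().lower() == "true"


def select_precision(cuda_available: bool, bf16_supported: bool) -> str:
    """Pick a dtype name for the model weights.

    Assumption: the flag names BFLOAT16 and FLOAT16 are hypothetical; the PR
    description only says that BFLOAT, FLOAT16, or FLOAT32 is chosen.
    """
    # Without a GPU, reduced precision brings little benefit, so stay in float32.
    if not cuda_available:
        return "float32"
    # Prefer bfloat16 only when it is requested and the hardware supports it.
    if env_flag("BFLOAT16") and bf16_supported:
        return "bfloat16"
    if env_flag("FLOAT16"):
        return "float16"
    # Default when no optimization flag is set.
    return "float32"
```

In a real pipeline the returned name would presumably be mapped to the matching `torch` dtype before loading the model.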

Credit to @ad-astra-video for initially exploring these models and optimizations.

runner/app/pipelines/audio_to_text.py

eliteprox · 2024-10-01T16:47:28Z

@rickstaa I made several changes since you last reviewed this PR, so I held off on merging. Could you or @ad-astra-video re-review the latest changes?

rickstaa

@eliteprox great work! Thanks!

eliteprox requested a review from rickstaa as a code owner August 17, 2024 16:11

rickstaa approved these changes Aug 17, 2024

Review comment on runner/app/pipelines/audio_to_text.py (outdated, resolved)

eliteprox requested a review from rickstaa October 1, 2024 15:53

eliteprox force-pushed the whisper-fp16 branch 4 times, most recently from b553c6b to 5e1c90a on October 31, 2024 17:47

eliteprox mentioned this pull request Oct 31, 2024

Send duration in a2t inference requests livepeer/go-livepeer#3227

Merged

5 tasks

ad-astra-video and others added 12 commits November 5, 2024 19:55

enable new models in audio-to-text

0e5f682

check optimization flag values in audio-to-text

bfbc58b

(a2t) add new models and optimization defaults

ebced56

use batch_size of 4 when return_timestamps=word, clear vram when cuda oom occurs

f79f5d3

optimize chunking and batch size by duration return_timestamps option

ac9cfb4

clean up modelname enum

1cc8a98

support duration from gateway, support up to 1 hour duration with word_timestamps

e223d44

add duration as request parameter from gateway

029b35f

(a2t) adjust duration limit for lower batch_size on return_timestamps=word jobs

e8ab1cd

rename job_info to job_metadata

bc9eb69

rename field job_metadata to metadata

7e8ffb8

refactor: apply some small code improvements

4775d57

This commit applies some small code improvements to the A2T pipeline.
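Commit f79f5d3 in the list above mentions using a batch_size of 4 when return_timestamps=word and clearing VRAM when a CUDA OOM occurs. The sketch below shows one way that behavior could look. It is an assumption-laden illustration, not the PR's code: `pick_batch_size`, `run_with_cleanup`, and the injected callbacks are hypothetical names, and `MemoryError` stands in for the GPU out-of-memory exception so the example runs without PyTorch.

```python
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")

# Value taken from the commit title; everything else here is hypothetical.
WORD_TIMESTAMP_BATCH_SIZE = 4


def pick_batch_size(return_timestamps: Optional[str], default_batch_size: int) -> int:
    """Use the smaller fixed batch size for word-level timestamp jobs."""
    if return_timestamps == "word":
        return WORD_TIMESTAMP_BATCH_SIZE
    return default_batch_size


def run_with_cleanup(
    run: Callable[[], T],
    free_memory: Callable[[], None],
    oom_error: Type[BaseException] = MemoryError,
) -> T:
    """Run inference; on an out-of-memory error, free cached memory and re-raise.

    With PyTorch, oom_error could be torch.cuda.OutOfMemoryError and
    free_memory could be torch.cuda.empty_cache.
    """
    try:
        return run()
    except oom_error:
        free_memory()
        raise
```

Re-raising after the cleanup keeps the failure visible to the caller while leaving the GPU in a usable state for the next request.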

rickstaa force-pushed the whisper-fp16 branch from dae1e50 to 4775d57 on November 5, 2024 18:57

rickstaa added 4 commits November 5, 2024 19:58

fix(worker): resolve worker rebase conflits

fd21775

This commit fixes some rebase conflicts that were introduced in the last rebase.

refactor: improve A2T pipeline logs

559cd45

This commit updates the A2T pipeline log so that it is clear when the default batch_size and chunk_length_s are used.

fixup! refactor: improve A2T pipeline logs

c8fbf37

fix: remove debug patch

b1e7300

This commit removes the debug patch that was accidentally added.

rickstaa approved these changes Nov 5, 2024

update log line

a2253eb

eliteprox merged commit acf9b15 into livepeer:main Nov 5, 2024
3 checks passed
