Add support of audio transformers #1317

g-prz · 2024-12-03T21:21:14Z

This PR proposes integrating audio transformers into the outlines framework (#1270).

g-prz · 2024-12-04T09:17:14Z

Hey @rlouf 🙋‍♂️
I'm not sure, but it seems the runner disk is full and the qwen2-audio files are taking up too much space. I can try to find a GGUF version, or maybe a smaller model? Wdyt?

rlouf · 2024-12-04T10:12:07Z

We generally use HF's internal testing models, which are much smaller; see the other tests. Can you check whether there is one for audio?

g-prz · 2024-12-04T18:53:22Z

Hey @rlouf
Running into something funny: test_generate_text[model_transformers_audio-beam_search] fails. After investigation, instantiating the beam_search sampler with its default beams parameter of 1 results in a greedy sampler in Hugging Face transformers, so _beam_search is never called, and neither is _reorder_cache, which is not implemented for Qwen2...

To fix it, we could split the parametrization of test_generate_text so that beam_search is separated from greedy and multinomial. That way we can pass more than one beam to beam search and avoid falling into the greedy sampler case.
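
A rough sketch of the split described above, in plain Python so it runs without outlines or transformers installed. `SamplerCase`, `decoding_mode`, and the two case lists are hypothetical names for illustration only, and `decoding_mode` is a simplified model of how the decoding path gets picked from the beam count, not the actual transformers code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerCase:
    """Hypothetical test-case record; not part of outlines or transformers."""
    name: str
    num_beams: int
    do_sample: bool


def decoding_mode(case: SamplerCase) -> str:
    """Simplified model of how the decoding path is chosen (assumption)."""
    if case.num_beams > 1:
        return "beam_sample" if case.do_sample else "beam_search"
    return "sample" if case.do_sample else "greedy"


# Greedy and multinomial keep sharing one parametrization group.
SINGLE_SEQUENCE_CASES = [
    SamplerCase("greedy", num_beams=1, do_sample=False),
    SamplerCase("multinomial", num_beams=1, do_sample=True),
]

# Beam search gets its own group, with more than one beam.
BEAM_SEARCH_CASES = [
    SamplerCase("beam_search", num_beams=2, do_sample=False),
]

# The case that caused trouble: beam search with a single beam
# is indistinguishable from greedy decoding in this model.
default_beam = SamplerCase("beam_search", num_beams=1, do_sample=False)
print(decoding_mode(default_beam))
```

With pytest, each list could feed its own `@pytest.mark.parametrize` decorator, so the beam-search test always runs with at least two beams while the other samplers keep their defaults.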

g-prz · 2024-12-11T14:43:18Z

Hey @rlouf
I updated test_generate_text.
I have 95% coverage. The missing parts are the import-error branches (not sure we want to modify the test env just to cover them) and the case of a processor with no tokenizer. To what extent do you want to increase the coverage?
Otherwise it's ready for review on my side.

g-prz force-pushed the integrate-transformers-audio branch 2 times, most recently from cbbd9a7 to 3bedff8 Compare December 3, 2024 21:34

g-prz added 5 commits December 11, 2024 11:35

feat(audio): integrate audio transfromers

f7fc7f2

fix(test): tests for audio transformers

ee5a176

fix(test): use tiny model for audio transformers

69ec787

fix(test): correctly handle beam_search in generate text

5d3142d

feat(audio): add cookbook for audio transformers integration

d7d6b65

g-prz force-pushed the integrate-transformers-audio branch from 2bbead1 to d7d6b65 Compare December 11, 2024 11:35

test(audio): improve coverage of validate prompt and media

b529821
