-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Whisper static generation #1275
base: main
Are you sure you want to change the base?
Conversation
Hi @ssarkar2 @bhargaveede @vivekgoe @regisss , if you have any issues or questions on this PR, please don't hesitate to contact me :) Here is the code to reproduced. This script is adapted from https://github.com/opea-project/GenAIComps/blob/main/comps/asr/whisper/whisper_model.py.
|
Hi @Spycsh , I haven't reviewed this code as yet and have few initial questions. Please confirm code merges cleanly with OH main branch and runs with latest 1.17.1 as well as next 1.18.0 latest build. How did you test these changes? Do you need to add any new tests to the CI? In addition, please run "tests/ci/fast_tests.sh" after installing ".[tests]" and any slow tests, e.g., the ones that run the whisper model in the the CI test suite although e.g., Also do "pip install -U ruff; make style" and check for any issues. |
Hi @emascarenhas , thanks for suggestions. I've added the test, passed style check and passed the test on habana 1.16.1 and OH merged with the latest main branch. I currently do not have environment with newer habana driver but I think it should work. Please tell me if you have further suggestions! |
I got this result Do you need to adjust the performance criteria? tests/test_speech_recognition_example.py:69: AssertionError |
@Spycsh , What I expected was a new or modified script that will use the code you introduced in the models directory. |
Hi @emascarenhas , on my gaudi2 the throughput (eval_samples_per_second) should be My detailed logs of the latest run on PR branch are shown below:
My command to trigger the test:
Regarding to your another question about where is Previously After this PR, the Welcome to further questions! |
It makes sense. I suppose we don't need to have tests/test_speech_recognition_example.py because we already have an examples test in the CI that invokes run_speech_recognition_seq2seq.py. We don't need to run the same test run twice. If it is running the same test with exact same code coverage, then I think we should remove test_speech_recognition_example.py ? |
@emascarenhas Sure :) I've removed the redundant test. |
Looks good to me. @libinta , Please add run-test label. |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
What does this PR do?
Add static KV cache for Whisper family models. Previously it takes huge amount of time to do the Whisper inference (each first inference for each different input audio) on HPU because of dynamic shape of KV cache in SDPA attention module. With this PR the latency should be drastically reduced on HPU.
Before submitting