
Commit 97a4785

Bump version to 1.1.0 and update benchmarks (#1161)
* Update version
* Update CPU benchmarks
* Update GPU benchmarks
* More GPU benchmarks
1 parent 08f6900 commit 97a4785

File tree

2 files changed: +31 −41 lines

README.md

Lines changed: 30 additions & 40 deletions
```diff
@@ -12,63 +12,53 @@ This implementation is up to 4 times faster than [openai/whisper](https://github
 
 For reference, here's the time and memory usage that are required to transcribe [**13 minutes**](https://www.youtube.com/watch?v=0u7tTptBo9I) of audio using different implementations:
 
-* [openai/whisper](https://github.com/openai/whisper)@[6dea21fd](https://github.com/openai/whisper/commit/6dea21fd7f7253bfe450f1e2512a0fe47ee2d258)
-* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[3b010f9](https://github.com/ggerganov/whisper.cpp/commit/3b010f9bed9a6068609e9faf52383aea792b0362)
-* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[cce6b53e](https://github.com/SYSTRAN/faster-whisper/commit/cce6b53e4554f71172dad188c45f10fb100f6e3e)
+* [openai/whisper](https://github.com/openai/whisper)@[v20240930](https://github.com/openai/whisper/tree/v20240930)
+* [whisper.cpp](https://github.com/ggerganov/whisper.cpp)@[v1.7.2](https://github.com/ggerganov/whisper.cpp/tree/v1.7.2)
+* [transformers](https://github.com/huggingface/transformers)@[v4.46.3](https://github.com/huggingface/transformers/tree/v4.46.3)
+* [faster-whisper](https://github.com/SYSTRAN/faster-whisper)@[v1.1.0](https://github.com/SYSTRAN/faster-whisper/tree/v1.1.0)
 
 ### Large-v2 model on GPU
 
-| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
-| --- | --- | --- | --- | --- | --- |
-| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
-| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |
-| faster-whisper | int8 | 5 | 59s | 3091MB | 3117MB |
-
-*Executed with CUDA 11.7.1 on an NVIDIA Tesla V100S.*
+| Implementation | Precision | Beam size | Time | VRAM Usage |
+| --- | --- | --- | --- | --- |
+| openai/whisper | fp16 | 5 | 2m23s | 4708MB |
+| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |
+| transformers (SDPA)[^1] | fp16 | 5 | 1m52s | 4960MB |
+| faster-whisper | fp16 | 5 | 1m03s | 4525MB |
+| faster-whisper (`batch_size=8`) | fp16 | 5 | 17s | 6090MB |
+| faster-whisper | int8 | 5 | 59s | 2926MB |
+| faster-whisper (`batch_size=8`) | int8 | 5 | 16s | 4500MB |
 
-### Small model on CPU
+### distil-whisper-large-v3 model on GPU
 
-| Implementation | Precision | Beam size | Time | Max. memory |
+| Implementation | Precision | Beam size | Time | YT Commons WER |
 | --- | --- | --- | --- | --- |
-| openai/whisper | fp32 | 5 | 10m31s | 3101MB |
-| whisper.cpp | fp32 | 5 | 17m42s | 1581MB |
-| whisper.cpp | fp16 | 5 | 12m39s | 873MB |
-| faster-whisper | fp32 | 5 | 2m44s | 1675MB |
-| faster-whisper | int8 | 5 | 2m04s | 995MB |
-
-*Executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R.*
+| transformers (SDPA) (`batch_size=16`) | fp16 | 5 | 46m12s | 14.801 |
+| faster-whisper (`batch_size=16`) | fp16 | 5 | 25m50s | 13.527 |
 
+*GPU benchmarks are executed with CUDA 12.4 on an NVIDIA RTX 3070 Ti 8GB.*
+[^1]: transformers OOM for any batch size > 1
 
-### Distil-whisper
+### Small model on CPU
 
-| Implementation | Precision | Beam size | Time | Gigaspeech WER |
+| Implementation | Precision | Beam size | Time | RAM Usage |
 | --- | --- | --- | --- | --- |
-| distil-whisper/distil-large-v2 | fp16 | 4 | - | 10.36 |
-| [faster-distil-large-v2](https://huggingface.co/Systran/faster-distil-whisper-large-v2) | fp16 | 5 | - | 10.28 |
-| distil-whisper/distil-medium.en | fp16 | 4 | - | 11.21 |
-| [faster-distil-medium.en](https://huggingface.co/Systran/faster-distil-whisper-medium.en) | fp16 | 5 | - | 11.21 |
-
-*Executed with CUDA 11.4 on an NVIDIA 3090.*
-
-<details>
-<summary>testing details (click to expand)</summary>
+| openai/whisper | fp32 | 5 | 6m58s | 2335MB |
+| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |
+| whisper.cpp (OpenVINO) | fp32 | 5 | 1m45s | 1642MB |
+| faster-whisper | fp32 | 5 | 2m37s | 2257MB |
+| faster-whisper (`batch_size=8`) | fp32 | 5 | 1m06s | 4230MB |
+| faster-whisper | int8 | 5 | 1m42s | 1477MB |
+| faster-whisper (`batch_size=8`) | int8 | 5 | 51s | 3608MB |
 
-For `distil-whisper/distil-large-v2`, the WER is tested with the code sample from [link](https://huggingface.co/distil-whisper/distil-large-v2#evaluation). For `faster-distil-whisper`, the WER is tested with this setting:
-```python
-from faster_whisper import WhisperModel
+*Executed with 8 threads on an Intel Core i7-12700K.*
 
-model_size = "distil-large-v2"
-# model_size = "distil-medium.en"
-# Run on GPU with FP16
-model = WhisperModel(model_size, device="cuda", compute_type="float16")
-segments, info = model.transcribe("audio.mp3", beam_size=5, language="en")
-```
-</details>
 
 ## Requirements
 
 * Python 3.8 or greater
 
+Unlike openai-whisper, FFmpeg does **not** need to be installed on the system. The audio is decoded with the Python library [PyAV](https://github.com/PyAV-Org/PyAV) which bundles the FFmpeg libraries in its package.
 
 ### GPU
```
faster_whisper/version.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,3 +1,3 @@
 """Version information."""
 
-__version__ = "1.1.0rc0"
+__version__ = "1.1.0"
```
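The version bump drops the `rc0` pre-release suffix. Under PEP 440 ordering, `1.1.0rc0` sorts before the final `1.1.0`, so installers and version constraints treat the release candidate as older. A minimal sketch using the third-party `packaging` library (assumed available; it is not part of faster-whisper):

```python
from packaging.version import Version

rc, final = Version("1.1.0rc0"), Version("1.1.0")

# PEP 440: a release candidate precedes the final release of the same version.
assert rc < final

# Both share the same release tuple; only the pre-release marker differs.
assert rc.release == final.release == (1, 1, 0)
print(rc.is_prerelease, final.is_prerelease)  # True False
```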
