Releases: m-bain/whisperX

v3.3.0

02 Jan 13:09

What's Changed

  • Update faster-whisper to 1.0.2 to enable model distil-large-v3 by @moritzbrantner in #814
  • latest faster-whisper support added by @Hasan-Naseer in #875
  • Working version with pyannote:3.3.2 and faster-whisper:1.1.0 by @ibombonato in #936
  • Add utilization of the verbose flag by @H4CK3Rabhi in #759
  • Added local_files_only option on whisperx.load_model for offline mode by @RoqueGio in #867
  • adding cache_dir to wav2vec2 by @bnitsan in #681
  • feat: add basic installation test flow & restrict python versions by @Barabazs in #965
  • chore: add build and release workflow by @Barabazs in #966
  • fix: update README image source and enhance setup.py for long description by @Barabazs in #968
  • docs: update installation instructions in README by @Barabazs in #969
  • fix: add UTF-8 encoding when reading README.md by @xigh in #970
  • chore: loosen ctranslate2 version restriction & bump whisperX version by @Barabazs in #971

Full Changelog: v3.2.0...v3.3.0

v3.2.0

18 Dec 08:03

Device and Language Support

Bug Fixes and Stability Improvements

Documentation Updates

Miscellaneous Changes

Full Changelog: v3.1.1...v3.2.0

v3.1.1

13 May 11:19
d8a2b4f
  • translate functionality added
  • fix word timestamp bug (words no longer have consecutive timestamps)
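The word-timestamp fix can be illustrated with a minimal sketch (a hypothetical helper, not whisperX's actual code): clamp each word's start to the previous word's end so timestamps stay monotonically increasing.

```python
# Hypothetical sketch of enforcing monotonic word timestamps;
# not whisperX's actual implementation.

def enforce_monotonic(words):
    """Clamp each word's start/end so timestamps never run backwards."""
    fixed = []
    prev_end = 0.0
    for w in words:
        start = max(w["start"], prev_end)
        end = max(w["end"], start)
        fixed.append({"word": w["word"], "start": start, "end": end})
        prev_end = end
    return fixed

words = [
    {"word": "hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.4, "end": 0.9},  # overlaps the previous word
]
print(enforce_monotonic(words))
```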

v3.1.0

07 May 19:34
1caddfb
  • 70× real-time transcription, <8 GB GPU memory requirement ⚡️⚡️
  • each transcript segment is a sentence (using nltk.sent_tokenize)
  • diarization now assigned per sentence (and output to SRT)
  • cleaned up alignment logic
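Per-sentence speaker assignment can be sketched as picking, for each sentence, the diarization turn with the largest temporal overlap. This is a simplified illustration of the idea (hypothetical function names, not whisperX's pipeline):

```python
# Hypothetical sketch: assign each sentence the speaker whose diarization
# turn overlaps it the most. Not whisperX's actual implementation.

def assign_speakers(sentences, turns):
    """Label each {start, end, text} sentence with the best-overlapping speaker."""
    out = []
    for s in sentences:
        best, best_overlap = None, 0.0
        for t in turns:
            overlap = min(s["end"], t["end"]) - max(s["start"], t["start"])
            if overlap > best_overlap:
                best, best_overlap = t["speaker"], overlap
        out.append({**s, "speaker": best})
    return out

sentences = [{"start": 0.0, "end": 2.0, "text": "Hi."},
             {"start": 2.0, "end": 5.0, "text": "How are you?"}]
turns = [{"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5},
         {"speaker": "SPEAKER_01", "start": 2.5, "end": 5.0}]
print(assign_speakers(sentences, turns))
```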

v3.0.2

04 May 19:53
Pre-release

torch 2.0, python 3.10

v3.0.1

01 May 10:52
e24ca9e
Pre-release
  • fix pickling error (set num_workers=0 so data loading runs in the main process)
  • add basic diarization
  • pad language detection if less than 30s
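The padding fix amounts to zero-padding short audio up to the 30-second window Whisper's language detection expects. A minimal sketch, assuming 16 kHz mono samples (pure Python for illustration; the real code operates on arrays):

```python
# Hypothetical sketch of padding short audio for language detection;
# not whisperX's actual implementation.

SAMPLE_RATE = 16000   # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 30    # language detection runs on a 30 s window

def pad_audio(samples, sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Zero-pad a list of samples to at least `seconds` of audio."""
    target = sample_rate * seconds
    if len(samples) >= target:
        return samples
    return samples + [0.0] * (target - len(samples))
```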

v3.0.0

28 Apr 15:48
cc7e168
Pre-release

batched inference with faster-whisper backend
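The batching idea behind this release can be sketched generically: split the VAD-cut segments into fixed-size batches and run the model on each batch at once. This is a conceptual illustration, not the faster-whisper backend itself:

```python
# Hypothetical sketch of the batching pattern used for batched inference;
# not the actual faster-whisper backend.

def batches(items, batch_size):
    """Yield successive fixed-size batches of audio segments."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

segments = list(range(10))  # stand-ins for audio segments
for batch in batches(segments, 4):
    # In the real pipeline, the whole batch is transcribed in one forward pass.
    print(batch)
```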

v2.0.1

28 Apr 15:47
allow custom model_dir for torchaudio models

alpha

26 Feb 21:05
847a3cd
Merge pull request #96 from smly/fix-batch-processing

FIX: Assertion error in batch processing