Convert speech to text using HuggingFace, comparing Wav2Vec2 versus OpenAI Whisper

tracyreuter/NLP-speech-to-text

NLP-speech-to-text

Data

Speech samples included a subset of sentences recorded for this study:

Reuter, T., Sullivan, M., & Lew-Williams, C. (2021). Look at that: Spatial deixis reveals experience-related changes in prediction. Language Acquisition. https://doi.org/10.1080/10489223.2021.1932905

Audio recorded for lab-based experiments is very clean, so this should be an easy transcription task.
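
For orientation, here is a minimal sketch of how the transcription task could be set up with the HuggingFace `pipeline` API. The checkpoint names (`facebook/wav2vec2-base-960h`, `openai/whisper-small`) and the helper names (`load_transcriber`, `timed_transcribe`, `compare_models`) are assumptions for illustration; this README does not state which checkpoints or code the comparison actually used.

```python
import time
from typing import Callable, List, Sequence, Tuple


def load_transcriber(model_name: str) -> Callable[[str], str]:
    """Wrap a HuggingFace ASR pipeline as a simple path -> text function."""
    # Imported lazily so the timing helper below works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    return lambda audio_path: asr(audio_path)["text"]


def timed_transcribe(
    transcribe: Callable[[str], str], audio_paths: Sequence[str]
) -> Tuple[List[str], float]:
    """Transcribe every file and return (transcripts, elapsed seconds)."""
    start = time.perf_counter()
    texts = [transcribe(path) for path in audio_paths]
    elapsed = time.perf_counter() - start
    return texts, elapsed


def compare_models(audio_paths: Sequence[str]) -> None:
    """Run both models over the same files and print transcripts with timings."""
    # Assumed checkpoints; swap in whichever models you want to compare.
    for name in ("facebook/wav2vec2-base-960h", "openai/whisper-small"):
        texts, seconds = timed_transcribe(load_transcriber(name), audio_paths)
        print(f"{name}: {seconds:.2f}s")
        for text in texts:
            print("  ", text)
```

Calling `compare_models(["sentence_01.wav"])` (a hypothetical file name) downloads both checkpoints on first use. Model loading happens outside the timed region, so the timing covers transcription only.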

Conclusion

IMO, Whisper beats Wav2Vec2 in at least 3 ways:

  1. More performant.
  • Transcribed 20% faster.
  • Future enhancements could increase speed.

  2. More accurate.
  • Transcribed "apple" versus "apples" correctly.
  • Spelled "doggies" correctly as "doggies", not as "DOGGIYS".

  3. More nuanced.
  • Transcribed 3 sentences with emphatic punctuation (! instead of .).
  • Punctuation indicates emphasis and emotion, useful for downstream sentiment analysis.
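
The three comparisons above could be quantified roughly as follows. This is a sketch, not the code behind the reported numbers: the function names are hypothetical, "20% faster" is assumed to mean a 20% reduction in elapsed time, and accuracy is measured here as word error rate (WER) after lowercasing and stripping punctuation, which is a swapped-in metric rather than one the README names.

```python
import re
from typing import Iterable, List


def normalize(text: str) -> List[str]:
    """Lowercase and drop punctuation so only word choice and spelling count."""
    return re.sub(r"[^\w\s']", "", text.lower()).split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def percent_faster(baseline_seconds: float, candidate_seconds: float) -> float:
    """Reduction in elapsed time relative to the baseline, in percent."""
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


def count_emphatic(transcripts: Iterable[str]) -> int:
    """Count transcripts that end with an exclamation mark."""
    return sum(1 for text in transcripts if text.strip().endswith("!"))


# Illustrative sentence (made up, not taken from the dataset):
print(word_error_rate("Look at the doggies!", "LOOK AT THE DOGGIYS"))  # 0.25
```

Because `normalize` removes casing and punctuation, an all-caps, unpunctuated transcript is penalized only for wrong words such as "DOGGIYS", while `count_emphatic` looks at the raw transcripts so the punctuation difference is still visible.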