In a recent experiment by Alexander & Llanos, human listeners were tasked with transcribing emotional speech in background noise. Unsurprisingly, human transcribers performed better at higher SNRs. Of greater interest, performance was also better for happy and angry prosodies relative to neutral prosody. Given recent work comparing human performance to the capacities of speech-based large language models (Patman & Chodroff, 2024; Kim et al., 2024), I wondered: how might speech-to-text LLMs fare with emotional speech? This mini study extracts transcriptions from five speech-to-text models: Wav2Vec2.0 (base), Wav2Vec2.0 (large), Whisper (base), Whisper (large), and SpeechT5. It then compares their performance to that of human listeners.
Repository License: CC BY-SA 4.0
Please open the knitted .html in a web browser to read about the project.
Each .py file extracts transcriptions for all stimuli from the associated LLM. Transcriptions are stored in outputs/.
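As a rough illustration of what one of these extraction scripts might look like, here is a minimal sketch using the Hugging Face transformers ASR pipeline with Whisper (base). The stimuli/ directory, the output file name, and the pipeline-based approach are assumptions for illustration, not the repository's actual code.

```python
from pathlib import Path
from transformers import pipeline

# Hypothetical extraction sketch: assumes the stimuli are .wav files in a
# local stimuli/ directory. Model name and paths are illustrative only.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-base")

out_dir = Path("outputs")
out_dir.mkdir(exist_ok=True)

rows = []
for wav in sorted(Path("stimuli").glob("*.wav")):
    result = asr(str(wav))  # the pipeline returns a dict with a "text" field
    rows.append(f"{wav.name}\t{result['text']}")

# One tab-separated line per stimulus: filename, transcription
(out_dir / "whisper_base_transcriptions.tsv").write_text("\n".join(rows))
```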
Human performance data are available on OSF.
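Comparing the models with the human listeners requires scoring the model transcriptions against the reference sentences. Below is a minimal scoring sketch using word error rate from the jiwer package; the file layout, file names, and choice of metric are assumptions, not necessarily the procedure used in the analysis.

```python
import jiwer

# Hypothetical scoring sketch: assumes reference sentences and model
# transcriptions are stored as parallel tab-separated files keyed by
# stimulus filename. File names and the WER metric are assumptions.
def load_tsv(path):
    with open(path) as f:
        return dict(line.rstrip("\n").split("\t", 1) for line in f if line.strip())

references = load_tsv("stimuli/reference_sentences.tsv")
hypotheses = load_tsv("outputs/whisper_base_transcriptions.tsv")

# Score only stimuli present in both files, in a stable order.
shared = sorted(references.keys() & hypotheses.keys())
wer = jiwer.wer([references[k] for k in shared], [hypotheses[k] for k in shared])
print(f"Word error rate over {len(shared)} stimuli: {wer:.3f}")
```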