README.txt

Code Design

The code is structured as two pipelines of scripts. The following diagrams capture the dependency structure of the scripts (each script depends on the output of the script before it):

Modules

Install the dependencies listed for each of these modules, following the instructions on each module's page.

evaluate by HuggingFace

Link: https://github.com/huggingface/evaluate
Library Version: 0.4.0
Python Version: 3.8

torcheval by PyTorch

Link: https://github.com/pytorch/torcheval
Library Version: 0.0.7
Python Version: 3.8

We had to modify this code, so we include our modified copy in this repository as a subdirectory.

Spotify Podcast Dataset

Link: https://podcastsdataset.byspotify.com/

This dataset is maintained by Spotify, which also controls access to it.

Additional Dependencies

Pandas (Link: https://pandas.pydata.org/)
tqdm (Link: https://github.com/tqdm/tqdm)
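
Since the Modules section pins specific library versions, a quick check of the local environment can catch mismatches before running the pipelines. The following is a minimal sketch, not part of the repository: the `EXPECTED` table and the `check_versions` helper are hypothetical names introduced here for illustration, and the sketch assumes each package is installed under the same name as its PyPI distribution. Pandas and tqdm have no version listed above, so only their presence is checked.

```python
import sys
from importlib import metadata  # part of the standard library since Python 3.8

# Versions listed in this README; None means no version is stated there.
# (Hypothetical table, written for this sketch only.)
EXPECTED = {
    "evaluate": "0.4.0",
    "torcheval": "0.0.7",
    "pandas": None,
    "tqdm": None,
}


def check_versions(expected):
    """Return a list of human-readable problems; an empty list means no problems."""
    problems = []
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package}: not installed")
            continue
        # Only compare versions when the README actually lists one.
        if wanted is not None and installed != wanted:
            problems.append(f"{package}: found {installed}, README lists {wanted}")
    return problems


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if (major, minor) != (3, 8):
        print(f"Python {major}.{minor} in use; README lists 3.8")
    for line in check_versions(EXPECTED) or ["all listed packages match"]:
        print(line)
```

A different installed version is not necessarily a problem; the check only reports where the environment departs from what is listed here.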


mariateleki/Comparing-ASR-Systems

Folders and files

Latest commit

History

Repository files navigation

README.txt

Code Design

Modules

evaluate by HuggingFace

torcheval by PyTorch

WhisperX

OpenAI API

english-fisher-annotations

Spotify Podcast Dataset

Additional Dependencies

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages