A bachelor's final-year project combining various deep learning models to build a pipeline for diagnosing articulation and speech disorders.

Requirements:
- Python
- Conda
Setup:
- `git clone` this repository, then run `git submodule init` and `git submodule update` to initialize the submodules
- Create a conda environment `voca` and resolve its dependencies from the voca directory
- Create a conda environment `deca` and install its dependencies from the deca directory
- Create a conda environment `autoeditor` and run `pip install auto-editor`
- Create a conda environment `pyqt` and run `pip install pyqt5`
- Activate the `pyqt` environment and launch the speech app (a consolidated command sketch follows this list)
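For reference, the whole setup roughly reduces to the commands below. This is a sketch, not the project's exact procedure: the Python versions, the `requirements.txt` file names, and the `speech_app.py` entry point are assumptions, so check the voca and deca directories (and the app's source) for the exact names and versions they expect.

```bash
# Clone the repository and initialize the VOCA/DECA submodules
git clone <repository-url>
cd <repository-folder>
git submodule init
git submodule update

# One conda environment per component (Python versions below are assumptions)
conda create -n voca python=3.6 -y
conda activate voca
pip install -r voca/requirements.txt    # assumed dependency file name
conda deactivate

conda create -n deca python=3.7 -y
conda activate deca
pip install -r deca/requirements.txt    # assumed dependency file name
conda deactivate

conda create -n autoeditor python=3.9 -y
conda activate autoeditor
pip install auto-editor
conda deactivate

conda create -n pyqt python=3.9 -y
conda activate pyqt
pip install pyqt5

# Launch the speech app from the pyqt environment
python speech_app.py    # hypothetical entry point; use the app's actual main script
```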
Pipeline:
- The user provides input in the form of a video
- The frame rate of the input video is changed to 24 fps
- Silent portions are removed
- The duration is adjusted
- Audio is extracted from the video
- The video is converted to frames
- The frames are converted to 3D meshes
- The 3D meshes are compared with the standard (approximate commands for the preprocessing steps are sketched below)
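The speech app drives these preprocessing steps through its own scripts; as a rough manual equivalent they can be reproduced with ffmpeg and auto-editor, roughly as follows. File names, the WAV sample format, and auto-editor's default output name are assumptions; mesh generation and comparison are handled by the project's VOCA/DECA code and are not shown here.

```bash
# 1. Re-encode the input video at 24 fps
ffmpeg -i input.mp4 -filter:v fps=24 input_24fps.mp4

# 2. Remove silent portions (run from the autoeditor environment);
#    recent auto-editor versions write the result next to the input,
#    e.g. input_24fps_ALTERED.mp4
conda activate autoeditor
auto-editor input_24fps.mp4

# (the duration-adjustment step depends on the chosen standard and is not shown)

# 3. Extract the audio track as a mono WAV file
ffmpeg -i input_24fps_ALTERED.mp4 -vn -acodec pcm_s16le -ac 1 audio.wav

# 4. Dump the video frames as images for 3D reconstruction
mkdir -p frames
ffmpeg -i input_24fps_ALTERED.mp4 frames/frame_%04d.png
```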
By default, this repository contains only one standard stream. If you wish to add more standard words, perform the following steps:
- Make sure selenium is installed
- Place your desired words in the `words.txt` file to scrape them from the online dictionary
- Run `audio_scraper.py`
- Place the scraped mp3 files into the standard audios folder
- Activate the `voca` environment
- Run `preprocess_audios.py`
- Run `generate_vertices.py`
- You should now see your standard's 3D meshes generated by the VOCA model (a command sketch follows this list)
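Assuming the scraper reads `words.txt` from the repository root, saves its mp3 files to the current directory, and the standard audio folder is named `standard audios` (all assumptions about the repository layout), the workflow looks roughly like this:

```bash
# selenium is required by the scraper; install it in whichever environment runs it
pip install selenium

# 1. List the words to scrape, one per line
printf "apple\nbanana\n" >> words.txt

# 2. Scrape pronunciations from the online dictionary
python audio_scraper.py

# 3. Move the downloaded mp3 files into the standard audios folder
#    (source and destination paths are assumptions)
mv *.mp3 "standard audios/"

# 4. Generate the standard 3D meshes with the VOCA model
conda activate voca
python preprocess_audios.py
python generate_vertices.py
```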