
lipnet-pytorch

The code is based on the paper LipNet: End-to-End Sentence-level Lipreading. LipNet uses 3D convolutions and recurrent units to make sentence-level predictions by extracting features from the lip movement in the input frames. This implementation provides a 3DConv-Bi-LSTM variant in place of the paper's 3DConv-GRU model, along with a few other models of varying complexity. CTC loss handles the variable-length target alignments (spoken sentences), and the model weights are initialised with He initialization, as proposed in the paper.

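For orientation, the sketch below shows the general 3DConv + Bi-LSTM + CTC-head shape in PyTorch. Layer sizes, kernel shapes, and the spatial pooling step are illustrative assumptions, not the exact layers defined in this repository.

import torch
import torch.nn as nn

class LipNetSketch(nn.Module):
    """Illustrative 3DConv + Bi-LSTM front-end; layer sizes are assumptions."""
    def __init__(self, vocab_size, hidden_size=256):
        super().__init__()
        # Spatiotemporal feature extractor over (batch, channels, time, H, W)
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
        )
        # Bidirectional LSTM over the time dimension
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_size,
                            bidirectional=True, batch_first=True)
        # Per-frame character logits (+1 for the CTC blank)
        self.fc = nn.Linear(2 * hidden_size, vocab_size + 1)

    def forward(self, x):                      # x: (B, 3, T, H, W)
        feats = self.conv(x)                   # (B, C, T, H', W')
        feats = feats.mean(dim=(3, 4))         # global spatial pooling -> (B, C, T)
        feats = feats.permute(0, 2, 1)         # (B, T, C)
        out, _ = self.lstm(feats)              # (B, T, 2*hidden)
        return self.fc(out)                    # (B, T, vocab+1) frame-level logits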

Training

A virtual environment (venv) is suggested:

python -m venv lipenv
source lipenv/bin/activate

Install gdown to download the dataset from Google Drive:

pip install gdown
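
Once installed, gdown can fetch the archive directly from Python; the file ID and output name below are placeholders, not the actual dataset link:

import gdown

# Placeholder ID -- substitute the actual Google Drive file ID of the dataset
gdown.download("https://drive.google.com/uc?id=<DRIVE_FILE_ID>", "dataset.zip", quiet=False)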

Train your model

python main.py --epoch 300 \
               --lr 0.001  \
               --hidden_size 256  \
               --model lipnet-lstm \
               --batch 16 \
               --workers 4
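
The CTC objective mentioned above is wired roughly as in the following minimal sketch built on torch.nn.CTCLoss; the batch size, frame count, and vocabulary size are illustrative assumptions, not values read from main.py.

import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# logits: (B, T, vocab+1) frame-level outputs, as in the model sketch above
logits = torch.randn(16, 75, 28, requires_grad=True)       # batch=16, 75 frames, 27 chars + blank
log_probs = logits.log_softmax(dim=-1).permute(1, 0, 2)     # CTCLoss expects (T, B, C)

targets = torch.randint(1, 28, (16, 30))                    # padded character indices (0 is the blank)
input_lengths = torch.full((16,), 75, dtype=torch.long)     # frames per clip
target_lengths = torch.randint(10, 31, (16,))               # true sentence lengths

loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()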

Reference

@article{assael2016lipnet,
  title={LipNet: End-to-End Sentence-level Lipreading},
  author={Assael, Yannis M. and Shillingford, Brendan and Whiteson, Shimon and de Freitas, Nando},
  journal={arXiv preprint arXiv:1611.01599},
  year={2016}
}
