This repo contains the training and evaluation code for the Sign2Text setup, which translates sign language videos into spoken language sentences.
This code is based on an earlier version of Luong et al.'s Neural Machine Translation Tutorial.
- Download and extract RWTH-PHOENIX-Weather 2014 T: Parallel Corpus of Sign Language Video, Gloss and Translation, then resize the images to 227x227
- Download and install TensorFlow 1.3.0+
- Download the AlexNet TensorFlow weights and put them under the folder BaseModel
- Python 2.7
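The resizing step in the requirements above could be sketched as follows. This is a minimal example, not part of the repo: it assumes Pillow is installed, that the extracted frames are PNG or JPEG files, and that a plain resize to 227x227 without cropping or padding is acceptable. The function names and the directory paths in the usage comment are hypothetical.

```python
import os

TARGET_SIZE = (227, 227)  # resolution named in the requirements list


def iter_frame_pairs(src_root, dst_root, extensions=(".png", ".jpg", ".jpeg")):
    """Yield (source_path, destination_path) for every image under src_root.

    The directory layout below src_root is mirrored under dst_root.
    """
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in sorted(filenames):
            if name.lower().endswith(extensions):
                src_path = os.path.join(dirpath, name)
                dst_path = os.path.normpath(os.path.join(dst_root, rel_dir, name))
                yield src_path, dst_path


def resize_frame(src_path, dst_path, size=TARGET_SIZE):
    """Resize one frame to `size` and write it to dst_path."""
    # Imported here so the path helper above works without Pillow installed.
    from PIL import Image

    dst_dir = os.path.dirname(dst_path)
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    with Image.open(src_path) as img:
        # Plain resize: the aspect ratio is NOT preserved (an assumption).
        img.resize(size, Image.BILINEAR).save(dst_path)


def resize_dataset(src_root, dst_root):
    """Resize every frame under src_root; return the number of frames written."""
    count = 0
    for src_path, dst_path in iter_frame_pairs(src_root, dst_root):
        resize_frame(src_path, dst_path)
        count += 1
    return count


# Example call with hypothetical paths:
# resize_dataset("phoenix2014T/features/fullFrame", "phoenix2014T/features/fullFrame-227x227")
```

Writing the resized frames to a separate directory keeps the original download intact, so the step can be repeated with a different size or interpolation method if needed.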
python -m nmt --src=sign --tgt=de --train_prefix=../Data/phoenix2014T.train --dev_prefix=../Data/phoenix2014T.dev --test_prefix=../Data/phoenix2014T.test --out_dir=<your_output_dir> --vocab_prefix=../Data/phoenix2014T.vocab --source_reverse=True --num_units=1000 --num_layers=4 --num_train_steps=150000 --residual=True --attention=luong --base_gpu=<gpu_id> --unit_type=gru
python -m nmt --out_dir=<your_model_dir> --inference_input_file=<input_video_paths.sign> --inference_output_file=<predictions.de> --inference_ref_file=<ground_truth.de> --base_gpu=<gpu_id>
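For scripted experiments, the two commands above (training first, then inference) can be assembled programmatically. The sketch below only rebuilds the flags shown in those commands as argument lists. The helper names, the `data_prefix` parameter and the `python` parameter are assumptions made for illustration and are not part of the repo.

```python
import subprocess


def _to_args(flags, python="python"):
    """Turn (name, value) pairs into a `python -m nmt --name=value ...` list."""
    return [python, "-m", "nmt"] + ["--%s=%s" % (name, value) for name, value in flags]


def build_train_command(out_dir, gpu_id, data_prefix="../Data/phoenix2014T", python="python"):
    """Argument list mirroring the training command, flag for flag."""
    flags = [
        ("src", "sign"),
        ("tgt", "de"),
        ("train_prefix", data_prefix + ".train"),
        ("dev_prefix", data_prefix + ".dev"),
        ("test_prefix", data_prefix + ".test"),
        ("out_dir", out_dir),
        ("vocab_prefix", data_prefix + ".vocab"),
        ("source_reverse", "True"),
        ("num_units", 1000),
        ("num_layers", 4),
        ("num_train_steps", 150000),
        ("residual", "True"),
        ("attention", "luong"),
        ("base_gpu", gpu_id),
        ("unit_type", "gru"),
    ]
    return _to_args(flags, python)


def build_inference_command(model_dir, input_file, output_file, ref_file, gpu_id, python="python"):
    """Argument list mirroring the inference command, flag for flag."""
    flags = [
        ("out_dir", model_dir),
        ("inference_input_file", input_file),
        ("inference_output_file", output_file),
        ("inference_ref_file", ref_file),
        ("base_gpu", gpu_id),
    ]
    return _to_args(flags, python)


def run(command):
    """Run a command built above; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(command)
```

Passing an argument list to `subprocess.check_call` avoids shell quoting problems when paths contain spaces. The `python` parameter should point to a Python 2.7 interpreter with TensorFlow installed, per the requirements.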
Please cite the paper below if you use this code in your research:
@inproceedings{camgoz2018neural,
author = {Necati Cihan Camgoz and Simon Hadfield and Oscar Koller and Hermann Ney and Richard Bowden},
title = {Neural Sign Language Translation},
booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2018}
}
This work was funded by the SNSF Sinergia project "Scalable Multimodal Sign Language Technology for Sign Language Learning and Assessment" (SMILE) grant agreement number CRSII2 160811 and the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 762021 (Content4All). This work reflects only the author’s view and the Commission is not responsible for any use that may be made of the information it contains. We would also like to thank NVIDIA Corporation for their GPU grant.