This repo contains the official implementations of the following two papers:
- Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
- TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
- [2023.02] 🎉 TVTS is accepted to CVPR 2023.
- [2023.03] The official code of TVTS has been released.
- [2023.05] 🚀 TVTSv2 is coming out! Please refer to this link for details.
- [2023.08] The official code of TVTSv2 and the pre-trained models have been released. All zero-shot evaluations can be run on a single GPU, and we provide scripts for extracting your own video features. Try it now 😎!
Folder v1 contains the official code of TVTS. See v1-README for details.
Folder v2 contains the official code of TVTSv2, an upgraded version of TVTS that produces powerful video representations for out-of-the-box usage. See v2-README for details.
If you find our work helpful, please cite our papers.
@InProceedings{Zeng_2023_CVPR,
author = {Zeng, Ziyun and Ge, Yuying and Liu, Xihui and Chen, Bin and Luo, Ping and Xia, Shu-Tao and Ge, Yixiao},
title = {Learning Transferable Spatiotemporal Representations From Natural Script Knowledge},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {23079-23089}
}
@misc{zeng2023tvtsv2,
title={TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale},
author={Ziyun Zeng and Yixiao Ge and Zhan Tong and Xihui Liu and Shu-Tao Xia and Ying Shan},
year={2023},
eprint={2305.14173},
archivePrefix={arXiv},
primaryClass={cs.CV}
}