This repository contains the official PyTorch implementation of the Position-aware Location Regression Network (PLRN) for temporal video grounding, presented in the paper "Position-aware Location Regression Network for Temporal Video Grounding" (AVSS 2021).
The overall architecture of the proposed network (PLRN) is shown above. To capture comprehensive contexts with only one semantic phrase, PLRN exploits position-aware features of the query and the video. Specifically, PLRN first encodes both the video and the query using the positional information of words and video segments. A semantic phrase feature is then extracted from the encoded query via attention. The semantic phrase feature and the encoded video are merged into a context-aware feature that reflects both local and global contexts. Finally, PLRN predicts the start, end, center, and width values of the grounding boundary.
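For a concrete picture of this pipeline, the following is a minimal PyTorch sketch of the steps described above (positional encoding, attention-based semantic phrase extraction, fusion, local/global context, and boundary regression). All module names, dimensions, and layer choices here are illustrative assumptions, not the actual PLRN implementation in this repository.

```python
import torch
import torch.nn as nn

class PLRNSketch(nn.Module):
    """Hypothetical sketch of the pipeline described above; not the authors' code."""

    def __init__(self, dim=512, max_len=128):
        super().__init__()
        self.video_pos = nn.Embedding(max_len, dim)   # positional info of video segments
        self.query_pos = nn.Embedding(max_len, dim)   # positional info of words
        self.phrase_attn = nn.Linear(dim, 1)          # attention weights over query words
        self.fuse = nn.Linear(2 * dim, dim)           # merge phrase feature with video
        self.context = nn.Conv1d(dim, dim, kernel_size=3, padding=1)  # local context
        self.head = nn.Linear(dim, 4)                 # (start, end, center, width)

    def forward(self, video_feat, query_feat):
        # video_feat: (B, T, dim) segment features, query_feat: (B, L, dim) word features
        B, T, _ = video_feat.shape
        L = query_feat.size(1)
        video_feat = video_feat + self.video_pos(torch.arange(T, device=video_feat.device))
        query_feat = query_feat + self.query_pos(torch.arange(L, device=query_feat.device))

        # semantic phrase feature: attention-weighted sum over query words
        attn = torch.softmax(self.phrase_attn(query_feat), dim=1)        # (B, L, 1)
        phrase = (attn * query_feat).sum(dim=1, keepdim=True)            # (B, 1, dim)

        # merge the phrase feature with every video segment
        fused = self.fuse(torch.cat([video_feat, phrase.expand(-1, T, -1)], dim=-1))

        # context-aware feature: local (convolution) plus global (mean) context
        local = self.context(fused.transpose(1, 2)).transpose(1, 2)
        ctx = local + fused.mean(dim=1, keepdim=True)

        # regress normalized boundary values from the pooled context feature
        out = torch.sigmoid(self.head(ctx.mean(dim=1)))                  # (B, 4) in [0, 1]
        start, end, center, width = out.unbind(dim=-1)
        return start, end, center, width


if __name__ == "__main__":
    model = PLRNSketch()
    s, e, c, w = model(torch.randn(2, 64, 512), torch.randn(2, 10, 512))
    print(s.shape, e.shape, c.shape, w.shape)  # torch.Size([2]) each
```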
- Ubuntu 16.04
- Anaconda 3
- Python 3.6
- CUDA 10.1
- cuDNN 7.6.5
- PyTorch 1.1.0
All data, including annotations, video features (I3D for Charades-STA, C3D for ActivityNet Captions), and pre-processed annotation information, can be downloaded from here.
conda activate plrn
cd PLRN
bash scripts/train_model.sh PLRN plrn charades 0 4 0
conda activate plrn
cd PLRN
bash scripts/eval_model.sh PLRN plrn charades 0
Local-Global Video-Text Interactions for Temporal Grounding was very helpful for our implementation.
If you find our implementation useful, please cite our paper:
@inproceedings{kim2021position,
title={Position-aware Location Regression Network for Temporal Video Grounding},
author={Kim, Sunoh and Yun, Kimin and Choi, Jin Young},
booktitle={2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)},
year={2021}
}