Introduced in our CVPR 2016 submission "Forecasting Social Navigation in Crowded Complex Scenes", the Stanford Aerial Pedestrian Dataset consists of annotated videos of pedestrians, bikers, skateboarders, cars, buses, and golf carts navigating eight unique scenes on the Stanford University campus. For more details, please visit http://cvgl.stanford.edu/resources.html
The eight scenes are:
bookstore coupa deathCircle gates hyang little nexus quad
Each video for each scene in the videos directory has an associated annotation file (annotations.txt) and an exemplary frame (reference.jpg) in the annotations directory.
Annotation file format: Each line in the annotations.txt file corresponds to one annotation. Each line contains 10+ columns, separated by spaces. The columns are defined as follows:
1 Track ID. All rows with the same ID belong to the same path.
2 xmin. The top left x-coordinate of the bounding box.
3 ymin. The top left y-coordinate of the bounding box.
4 xmax. The bottom right x-coordinate of the bounding box.
5 ymax. The bottom right y-coordinate of the bounding box.
6 frame. The frame that this annotation represents.
7 lost. If 1, the annotation is outside of the view screen.
8 occluded. If 1, the annotation is occluded.
9 generated. If 1, the annotation was automatically interpolated.
10 label. The label for this annotation, enclosed in quotation marks.
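The column layout above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the names Annotation, parse_line, and group_by_track are hypothetical, coordinates are parsed as floats on the assumption that they may not always be integers, and any columns beyond the tenth are ignored. shlex.split is used so that the quotation marks around the label are stripped.

```python
import shlex
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Annotation(NamedTuple):
    """One row of annotations.txt (hypothetical container, not part of the dataset)."""
    track_id: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    frame: int
    lost: bool
    occluded: bool
    generated: bool
    label: str


def parse_line(line: str) -> Annotation:
    # shlex.split honors the quotation marks around the label column.
    fields = shlex.split(line)
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")
    xmin, ymin, xmax, ymax = (float(v) for v in fields[1:5])
    # Columns 7-9 are 0/1 flags: lost, occluded, generated.
    lost, occluded, generated = (v == "1" for v in fields[6:9])
    return Annotation(int(fields[0]), xmin, ymin, xmax, ymax,
                      int(fields[5]), lost, occluded, generated, fields[9])


def group_by_track(lines: Iterable[str]) -> Dict[int, List[Annotation]]:
    """Collect rows sharing a Track ID into one path, ordered by frame."""
    tracks: Dict[int, List[Annotation]] = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            ann = parse_line(line)
            tracks[ann.track_id].append(ann)
    for path in tracks.values():
        path.sort(key=lambda a: a.frame)
    return dict(tracks)


# Made-up sample rows for illustration only; they are not taken from the dataset.
sample = [
    '0 12 22 32 42 1 0 1 1 "Biker"',
    '0 10 20 30 40 0 0 0 0 "Biker"',
    '1 5 5 15 15 0 1 0 0 "Pedestrian"',
]
tracks = group_by_track(sample)
```

To process a real file, the same function can be given an open file handle, for example group_by_track(open(path)), where path points at one annotations.txt file. Rows with lost set to 1 are kept here; whether to drop them depends on the application.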