DriveAGI

This is "The One" project that OpenDriveLab is committed to contributing to the community. It offers our thoughts on, and a general picture of, how to bring foundation models into autonomous driving.

NEWS

[ NEW❗️] 2024/09/08 We released a mini version of OpenDV-YouTube, containing 25 hours of driving videos. Feel free to try the mini subset by following instructions at OpenDV-mini!

2024/05/28 We released our latest research, Vista, a generalizable driving world model. It's capable of predicting high-fidelity and long-horizon futures, executing multi-modal actions, and serving as a generalizable reward function to assess driving behaviors.

2024/03/24 OpenDV-YouTube Update: Full suite of toolkits for OpenDV-YouTube is now available, including data downloading and processing scripts, as well as language annotations. Please refer to OpenDV-YouTube.

2024/03/15 We released the complete video list of OpenDV-YouTube, a large-scale driving video dataset, for the GenAD project. Data downloading and processing scripts, as well as language annotations, will be released next week. Stay tuned.

2024/01/24 We are excited to announce an update to our survey and would like to thank John Lambert and Klemens Esterle from the public community for their advice on improving the manuscript.

At A Glance

Here are some key components for constructing a large foundation model tailored to autonomous systems.

Below we share the latest updates from our team on the DriveData side. We will release the details of DriveEngine and DriveAGI in the future.

Vista

Futures simulated by Vista across a wide range of driving scenarios. Best viewed on the demo page.

Quick facts:

@inproceedings{gao2024vista,
  title={Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability},
  author={Shenyuan Gao and Jiazhi Yang and Li Chen and Kashyap Chitta and Yihang Qiu and Andreas Geiger and Jun Zhang and Hongyang Li},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}

@inproceedings{yang2024genad,
  title={{Generalized Predictive Model for Autonomous Driving}},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

GenAD: OpenDV Dataset

Examples of real-world driving scenarios in the OpenDV dataset, including urban, highway, rural scenes, etc.

🎦 The largest driving video dataset to date, containing more than 1700 hours of real-world driving videos, which makes it 300 times larger than the widely used nuScenes dataset.

  • Complete video list (under YouTube license): OpenDV Videos.
    • The downloaded raw videos (mostly 1080P) take up about 3 TB of storage. However, these hour-long videos cannot be used directly for model training because loading them is extremely memory-intensive.
    • Therefore, we preprocess them into consecutive images, which are more flexible and efficient to load during training. The processed images take up about 24 TB of storage in total.
    • We recommend setting up your experiments on a small subset, say 1/20 of the whole dataset. An official mini subset is also provided; refer to OpenDV-mini for details. After stabilizing the training, you can then apply your method to the whole dataset and hope for the best 🤞.
  • [ New❗️] Mini subset: OpenDV-mini.
    • A mini version of OpenDV-YouTube. The raw videos take up about 44 GB of storage and the processed images take up about 390 GB.
  • Step-by-step instructions for data preparation: OpenDV-YouTube.
  • Language annotations for OpenDV-YouTube: OpenDV-YouTube-Language.
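
The recommendation above to start on roughly 1/20 of the data can be sketched as follows. This is a minimal illustration, not part of the official toolkit: the video IDs are hypothetical placeholders, the function names are our own, and the storage estimate simply scales the 3 TB (raw) and 24 TB (processed) figures linearly, which is only a rough guide since videos differ in length.

```python
import random


def sample_subset(video_ids, fraction=1 / 20, seed=0):
    """Return a reproducible random subset of the given video IDs."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ids = sorted(video_ids)  # fixed order so the seed fully determines the result
    rng = random.Random(seed)
    rng.shuffle(ids)
    count = max(1, round(len(ids) * fraction))
    return sorted(ids[:count])


def estimate_storage_tb(fraction, raw_tb=3.0, processed_tb=24.0):
    """Scale the full-dataset storage figures linearly by the subset fraction."""
    return raw_tb * fraction, processed_tb * fraction


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only; the real video list has its own format.
    video_ids = [f"video_{i:04d}" for i in range(2000)]
    subset = sample_subset(video_ids)
    raw_tb, processed_tb = estimate_storage_tb(1 / 20)
    print(len(subset), "videos selected")
    print(f"~{raw_tb:.2f} TB raw, ~{processed_tb:.2f} TB processed")
```

Fixing the seed keeps the subset identical across runs, so results from early experiments stay comparable when you later scale up to the whole dataset.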

Quick facts:

  • Task: large-scale video prediction for driving scenes.
  • Data source: YouTube, with careful collection and filtering process.
  • Diversity Highlights: 1700 hours of driving videos, covering more than 244 cities in 40 countries.
  • Related work: GenAD, accepted at CVPR 2024 as a Highlight.
  • Note: Annotations for the other public datasets in OpenDV-2K will not be released. We randomly sampled a subset of each for training, so the annotations are incomplete and hard to trace back to their origins (i.e., file names). Nevertheless, it is easy to reproduce the collection and annotation process on your own by following our paper.
@inproceedings{yang2024genad,
  title={Generalized Predictive Model for Autonomous Driving},
  author={Jiazhi Yang and Shenyuan Gao and Yihang Qiu and Li Chen and Tianyu Li and Bo Dai and Kashyap Chitta and Penghao Wu and Jia Zeng and Ping Luo and Jun Zhang and Andreas Geiger and Yu Qiao and Hongyang Li},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

DriveLM

Introducing the first benchmark on language prompts for driving.

Quick facts:

DriveData Survey

Abstract

With the continuous maturation and application of autonomous driving technology, a systematic examination of open-source autonomous driving datasets becomes instrumental in fostering the robust evolution of the industry ecosystem. In this survey, we provide a comprehensive analysis of more than 70 papers on the timeline, impact, challenges, and future trends of autonomous driving datasets.

Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future

@article{li2024_driving_dataset_survey,
  title={Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future},
  author={Hongyang Li and Yang Li and Huijie Wang and Jia Zeng and Huilin Xu and Pinlong Cai and Li Chen and Junchi Yan and Feng Xu and Lu Xiong and Jingdong Wang and Futang Zhu and Chunjing Xu and Tiancai Wang and Fei Xia and Beipeng Mu and Zhihui Peng and Dahua Lin and Yu Qiao},
  journal={SCIENTIA SINICA Informationis},
  year={2024},
  doi={10.1360/SSI-2023-0313}
}

Current autonomous driving datasets can broadly be categorized into two generations since the 2010s. We define the Impact (y-axis) of a dataset based on sensor configuration, input modality, task category, data scale, ecosystem, etc.

Related Work Collection

We present comprehensive paper collections, leaderboards, and challenges.

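Challenges and Leaderboards

| Title | Host | Year | Task | Entry |
| --- | --- | --- | --- | --- |
| Autonomous Driving Challenge | OpenDriveLab | CVPR2023 | Perception / OpenLane Topology<br>Perception / Online HD Map Construction<br>Perception / 3D Occupancy Prediction<br>Prediction & Planning / nuPlan Planning | 111 |
| Waymo Open Dataset Challenges | Waymo | CVPR2023 | Perception / 2D Video Panoptic Segmentation<br>Perception / Pose Estimation<br>Prediction / Motion Prediction<br>Prediction / Sim Agents | 35 |
| Waymo Open Dataset Challenges | Waymo | CVPR2022 | Prediction / Motion Prediction<br>Prediction / Occupancy and Flow Prediction<br>Perception / 3D Semantic Segmentation<br>Perception / 3D Camera-only Detection | 128 |
| Waymo Open Dataset Challenges | Waymo | CVPR2021 | Prediction / Motion Prediction<br>Prediction / Interaction Prediction<br>Perception / Real-time 3D Detection<br>Perception / Real-time 2D Detection | 115 |
| Argoverse Challenges | Argoverse | CVPR2023 | Prediction / Multi-agent Forecasting<br>Perception & Prediction / Unified Sensor-based Detection, Tracking, and Forecasting<br>Perception / LiDAR Scene Flow<br>Prediction / 3D Occupancy Forecasting | 81 |
| Argoverse Challenges | Argoverse | CVPR2022 | Perception / 3D Object Detection<br>Prediction / Motion Forecasting<br>Perception / Stereo Depth Estimation | 81 |
| Argoverse Challenges | Argoverse | CVPR2021 | Perception / Stereo Depth Estimation<br>Prediction / Motion Forecasting<br>Perception / Streaming 2D Detection | 368 |
| CARLA Autonomous Driving Challenge | CARLA Team, Intel | 2023 | Planning / CARLA AD Challenge 2.0 | - |
| CARLA Autonomous Driving Challenge | CARLA Team, Intel | NeurIPS2022 | Planning / CARLA AD Challenge 1.0 | 19 |
| CARLA Autonomous Driving Challenge | CARLA Team, Intel | NeurIPS2021 | Planning / CARLA AD Challenge 1.0 | - |
| Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition | Pazhou Lab | 2023 | Perception / Cross-scene Monocular Depth Estimation<br>Perception / Roadside Millimeter-wave Radar Calibration and Object Tracking | - |
| Guangdong-Hong Kong-Macao Greater Bay Area (Huangpu) International Algorithm Case Competition | Pazhou Lab | 2022 | Perception / Roadside 3D Perception Algorithm<br>Perception / Street-view Image Storefront Signboard Text Recognition | - |
| AI Driving Olympics | ETH Zurich, University of Montreal, Motional | NeurIPS2021 | Perception / nuScenes Panoptic | 11 |
| AI Driving Olympics | ETH Zurich, University of Montreal, Motional | ICRA2021 | Perception / nuScenes Detection<br>Perception / nuScenes Tracking<br>Prediction / nuScenes Prediction<br>Perception / nuScenes LiDAR Segmentation | 456 |
| Jittor Artificial Intelligence Algorithm Challenge | Department of Information Sciences, National Natural Science Foundation of China | 2021 | Perception / Traffic Sign Detection | 37 |
| KITTI Vision Benchmark Suite | University of Tübingen | 2012 | Perception / Stereo, Flow, Scene Flow, Depth, Odometry, Object, Tracking, Road, Semantics | 5,610 |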

Perception Datasets

| Dataset | Year | Diversity: Scenes | Diversity: Hours | Diversity: Region | Sensor: Camera | Sensor: Lidar | Sensor: Other | Annotation | Paper |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| KITTI | 2012 | 50 | 6 | EU | Front-view | | GPS & IMU | 2D BBox & 3D BBox | Link |
| Cityscapes | 2016 | - | - | EU | Front-view | | | 2D Seg | Link |
| Lost and Found | 2016 | 112 | - | - | Front-view | | | 2D Seg | Link |
| Mapillary | 2016 | - | - | Global | Street-view | | | 2D Seg | Link |
| DDD17 | 2017 | 36 | 12 | EU | Front-view | | GPS & CAN-bus & Event Camera | - | Link |
| Apolloscape | 2016 | 103 | 2.5 | AS | Front-view | | GPS & IMU | 3D BBox & 2D Seg | Link |
| BDD-X | 2018 | 6984 | 77 | NA | Front-view | | | Language | Link |
| HDD | 2018 | - | 104 | NA | Front-view | | GPS & IMU & CAN-bus | 2D BBox | Link |
| IDD | 2018 | 182 | - | AS | Front-view | | | 2D Seg | Link |
| SemanticKITTI | 2019 | 50 | 6 | EU | | | | 3D Seg | Link |
| Woodscape | 2019 | - | - | Global | 360° | | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| DrivingStereo | 2019 | 42 | - | AS | Front-view | | | - | Link |
| Brno-Urban | 2019 | 67 | 10 | EU | Front-view | | GPS & IMU & Infrared Camera | - | Link |
| A*3D | 2019 | - | 55 | AS | Front-view | | | 3D BBox | Link |
| Talk2Car | 2019 | 850 | 283.3 | NA | Front-view | | | Language & 3D BBox | Link |
| Talk2Nav | 2019 | 10714 | - | Sim | 360° | | | Language | Link |
| PIE | 2019 | - | 6 | NA | Front-view | | | 2D BBox | Link |
| UrbanLoco | 2019 | 13 | - | AS & NA | 360° | | IMU | - | Link |
| TITAN | 2019 | 700 | - | AS | Front-view | | | 2D BBox | Link |
| H3D | 2019 | 160 | 0.77 | NA | Front-view | | GPS & IMU | - | Link |
| A2D2 | 2020 | - | 5.6 | EU | 360° | | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| CARRADA | 2020 | 30 | 0.3 | NA | Front-view | | Radar | 3D BBox | Link |
| DAWN | 2019 | - | - | Global | Front-view | | | 2D BBox | Link |
| 4Seasons | 2019 | - | - | - | Front-view | | GPS & IMU | - | Link |
| UNDD | 2019 | - | - | - | Front-view | | | 2D Seg | Link |
| SemanticPOSS | 2020 | - | - | AS | | | GPS & IMU | 3D Seg | Link |
| Toronto-3D | 2020 | 4 | - | NA | | | | 3D Seg | Link |
| ROAD | 2021 | 22 | - | EU | Front-view | | | 2D BBox & Topology | Link |
| Reasonable Crowd | 2021 | - | - | Sim | Front-view | | | Language | Link |
| METEOR | 2021 | 1250 | 20.9 | AS | Front-view | | GPS | Language | Link |
| PandaSet | 2021 | 179 | - | NA | 360° | | GPS & IMU | 3D BBox | Link |
| MUAD | 2022 | - | - | Sim | 360° | | | 2D Seg & 2D BBox | Link |
| TAS-NIR | 2022 | - | - | - | Front-view | | Infrared Camera | 2D Seg | Link |
| LiDAR-CS | 2022 | 6 | - | Sim | | | | 3D BBox | Link |
| WildDash | 2022 | - | - | - | Front-view | | | 2D Seg | Link |
| OpenScene | 2023 | 1000 | 5.5 | AS & NA | 360° | | | 3D Occ | Link |
| ZOD | 2023 | 1473 | 8.2 | EU | 360° | | GPS & IMU & CAN-bus | 3D BBox & 2D Seg | Link |
| nuScenes | 2019 | 1000 | 5.5 | AS & NA | 360° | | GPS & CAN-bus & Radar & HDMap | 3D BBox & 3D Seg | Link |
| Argoverse V1 | 2019 | 324k | 320 | NA | 360° | | HDMap | 3D BBox & 3D Seg | Link |
| Waymo | 2019 | 1000 | 6.4 | NA | 360° | | | 2D BBox & 3D BBox | Link |
| KITTI-360 | 2020 | 366 | 2.5 | EU | 360° | | | 3D BBox & 3D Seg | Link |
| ONCE | 2021 | - | 144 | AS | 360° | | | 3D BBox | Link |
| nuPlan | 2021 | - | 120 | AS & NA | 360° | | | 3D BBox | Link |
| Argoverse V2 | 2022 | 1000 | 4 | NA | 360° | | HDMap | 3D BBox | Link |
| DriveLM | 2023 | 1000 | 5.5 | AS & NA | 360° | | | Language | Link |

Mapping Datasets
Dataset Year Diversity Sensor Annotation Paper
Scenes Frames Camera Lidar Type Space Inst. Track
Caltech Lanes 2008 4 1224/1224 PV Link
VPG 2017 - 20K/20K PV - Link
TuSimple 2017 6.4K 6.4K/128K PV Link
CULane 2018 - 133K/133K PV - Link
ApolloScape 2018 235 115K/115K PV Link
LLAMAS 2019 14 79K/100K Front-view Image Laneline PV Link
3D Synthetic 2020 - 10K/10K PV - Link
CurveLanes 2020 - 150K/150K PV - Link
VIL-100 2021 100 10K/10K PV Link
OpenLane-V1 2022 1K 200K/200K 3D Link
ONCE-3DLane 2022 - 211K/211K 3D - Link
OpenLane-V2 2023 2K 72K/72K Multi-view Image Lane Centerline, Lane Segment 3D Link

Prediction and Planning Datasets

| Subtask | Input | Output | Evaluation | Dataset |
| --- | --- | --- | --- | --- |
| Motion Prediction | Surrounding Traffic States | Spatiotemporal Trajectories of Single/Multiple Vehicle(s) | Displacement Error | Argoverse<br>nuScenes<br>Waymo<br>Interaction<br>MONA |
| Trajectory Planning | Motion States for Ego Vehicles, Scenario Cognition and Prediction | Trajectories for Ego Vehicles | Displacement Error, Safety, Compliance, Comfort | nuPlan<br>CARLA<br>MetaDrive<br>Apollo |
| Path Planning | Maps for Road Network | Routes Connecting to Nodes and Links | Efficiency, Energy Conservation | OpenStreetMap<br>Transportation Networks<br>DTAlite<br>PeMS<br>New York City Taxi Data |
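
The "Displacement Error" listed under Evaluation is commonly reported as average displacement error (ADE) and final displacement error (FDE). The sketch below shows one plain way to compute both for a single predicted trajectory. It assumes 2D waypoints sampled at matching timesteps, and the function name is our own; each benchmark above defines its own exact variant (for example, taking the minimum over several predicted modes), so treat this as an illustration rather than any benchmark's official metric.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) for two equal-length lists of (x, y) waypoints."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and equal in length")
    # Euclidean distance between prediction and ground truth at each timestep.
    distances = [
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
    ]
    ade = sum(distances) / len(distances)  # mean error over all timesteps
    fde = distances[-1]                    # error at the final timestep
    return ade, fde


if __name__ == "__main__":
    predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    ade, fde = displacement_errors(predicted, ground_truth)
    print(f"ADE = {ade:.2f}, FDE = {fde:.2f}")  # ADE = 1.00, FDE = 2.00
```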

OpenScene

The largest 3D occupancy forecasting dataset to date for visual pre-training.

Quick facts:

OpenLane-V2 Update

Enriching OpenLane-V2 with standard-definition (SD) maps and map elements.

Quick facts: