[Docs] Add BEV-based detection pipeline in NuScenes Dataset tutorial (#2672)

* update part of the nuScenes dataset doc
* update the nuScenes tutorial
* add an alternative BEV sample code and necessary description for the nuScenes dataset
* use two subsections to introduce monocular and BEV
* update the NuScenes dataset BEV-based tutorial
1uciusy authored Sep 13, 2023
1 parent c04831c commit 74878d1
Showing 2 changed files with 131 additions and 3 deletions.
66 changes: 65 additions & 1 deletion docs/en/advanced_guides/datasets/nuscenes.md
@@ -153,7 +153,9 @@ Intensity is not used by default due to its yielded noise when concatenating the

### Vision-Based Methods

A typical training pipeline of image-based 3D detection on nuScenes is as below.
#### Monocular-based

In the nuScenes dataset, for multi-view images, this paradigm usually detects and outputs 3D detection results separately for each image, and then obtains the final detections through post-processing (e.g., NMS). Essentially, it directly extends monocular 3D detection to the multi-view setting. A typical training pipeline for image-based monocular 3D detection on nuScenes is as follows.

```python
train_pipeline = [
@@ -184,6 +186,68 @@ It follows the general pipeline of 2D detection but differs in some details:
- Some data augmentation techniques need to be adjusted, such as `RandomFlip3D`.
  Currently we do not support more augmentation methods, because how to transfer and apply other techniques is still being explored.
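The cross-view post-processing step mentioned above can be sketched in plain Python. This is a simplified illustration, not mmdetection3d's actual implementation: per-image detections are pooled together and merged by score-sorted NMS, with boxes reduced to axis-aligned BEV rectangles `(x1, y1, x2, y2)` for brevity.

```python
def bev_iou(a, b):
    """IoU of two axis-aligned BEV boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def merge_multiview_detections(dets, iou_thr=0.5):
    """Greedy NMS over detections pooled from all camera views.

    Each det is a (box, score) pair; boxes from different views that
    describe the same object overlap in the shared BEV frame, so the
    lower-scored duplicates are suppressed.
    """
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in dets:
        if all(bev_iou(box, kept_box) < iou_thr for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```
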

#### BEV-based

Bird's-Eye-View (BEV) is another popular 3D detection paradigm. It takes multi-view images directly to perform 3D detection; for nuScenes, these views are `CAM_FRONT`, `CAM_FRONT_LEFT`, `CAM_FRONT_RIGHT`, `CAM_BACK`, `CAM_BACK_LEFT`, and `CAM_BACK_RIGHT`. A basic training pipeline for BEV-based 3D detection on nuScenes is as follows.

```python
class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
train_transforms = [
dict(type='PhotoMetricDistortion3D'),
dict(
type='RandomResize3D',
scale=(1600, 900),
ratio_range=(1., 1.),
keep_ratio=True)
]
train_pipeline = [
    dict(
        type='LoadMultiViewImageFromFiles',
        to_float32=True,
        num_views=6),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    # optional, data augmentation
    dict(type='MultiViewWrapper', transforms=train_transforms),
    # optional, keep only objects within the specified point cloud range
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    # optional, keep only objects of the specified classes
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='Pack3DDetInputs', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```
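For intuition, `ObjectRangeFilter` keeps only the boxes that fall inside `point_cloud_range`, checked on the BEV (x/y) extent. A hypothetical standalone sketch of that check follows; it is not the mmdetection3d code itself.

```python
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]


def filter_by_bev_range(centers, pc_range=point_cloud_range):
    """Keep 3D box centers (x, y, z) whose x/y lie inside the BEV range.

    pc_range is [x_min, y_min, z_min, x_max, y_max, z_max]; like the
    real ObjectRangeFilter, only the BEV (x/y) extent is checked here.
    """
    x_min, y_min, _, x_max, y_max, _ = pc_range
    return [c for c in centers
            if x_min <= c[0] <= x_max and y_min <= c[1] <= y_max]
```
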

To load images from multiple views, a small modification should be made to the dataset settings.

```python
data_prefix = dict(
CAM_FRONT='samples/CAM_FRONT',
CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
CAM_BACK='samples/CAM_BACK',
CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
)
train_dataloader = dict(
    batch_size=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='NuScenesDataset',
        data_root='./data/nuScenes',
        ann_file='nuscenes_infos_train.pkl',
        data_prefix=data_prefix,
        modality=dict(use_camera=True, use_lidar=False),
        pipeline=train_pipeline,
        test_mode=False))
```
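Conceptually, `data_prefix` maps each camera name to its image folder, and `LoadMultiViewImageFromFiles` resolves one image path per view. A hypothetical sketch of that resolution is below; the real loader takes the per-view file names from the `.pkl` info file.

```python
import os

data_prefix = dict(
    CAM_FRONT='samples/CAM_FRONT',
    CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
    CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
    CAM_BACK='samples/CAM_BACK',
    CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
    CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
)


def view_paths(data_root, filenames):
    """Resolve one image path per camera view.

    filenames: mapping from camera name to that view's image file name
    (in nuScenes every camera records its own file for each sample).
    """
    return {cam: os.path.join(data_root, prefix, filenames[cam])
            for cam, prefix in data_prefix.items()}
```
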

## Evaluation

An example of evaluating PointPillars with 8 GPUs using nuScenes metrics is as follows.
68 changes: 66 additions & 2 deletions docs/zh_cn/advanced_guides/datasets/nuscenes.md
@@ -146,7 +146,9 @@ train_pipeline = [

### Vision-Based Methods

A typical training pipeline of image-based 3D detection on nuScenes is as follows.
#### Monocular-based

In the nuScenes dataset, for multi-view images, the monocular detection paradigm usually consists of two steps: detecting and outputting 3D detection results separately for each image, and then obtaining the final detections through post-processing (e.g., NMS). Essentially, this paradigm directly extends monocular 3D detection to the multi-view setting. A typical training pipeline for image-based monocular 3D detection on nuScenes is as follows.

```python
train_pipeline = [
@@ -159,7 +161,7 @@ train_pipeline = [
with_bbox_3d=True,
with_label_3d=True,
with_bbox_depth=True),
dict(type='mmdet.Resize', img_scale=(1600, 900), keep_ratio=True),
dict(type='mmdet.Resize', scale=(1600, 900), keep_ratio=True),
dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
dict(
type='Pack3DDetInputs',
@@ -176,6 +178,68 @@ train_pipeline = [
- It needs to load 3D annotations.
- Some data augmentation techniques need to be adjusted, such as `RandomFlip3D`. Currently we do not support more augmentation methods, because how to transfer and apply other techniques is still being explored.

#### BEV-based

Bird's-Eye-View (BEV) is another popular 3D detection paradigm. It takes multi-view images directly to perform 3D detection; for the nuScenes dataset, these views are the front `CAM_FRONT`, front-left `CAM_FRONT_LEFT`, front-right `CAM_FRONT_RIGHT`, back `CAM_BACK`, back-left `CAM_BACK_LEFT`, and back-right `CAM_BACK_RIGHT` cameras. A basic training pipeline for BEV-based 3D detection on nuScenes is as follows.

```python
class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]
train_transforms = [
dict(type='PhotoMetricDistortion3D'),
dict(
type='RandomResize3D',
scale=(1600, 900),
ratio_range=(1., 1.),
keep_ratio=True)
]
train_pipeline = [
    dict(
        type='LoadMultiViewImageFromFiles',
        to_float32=True,
        num_views=6),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        with_attr_label=False),
    # optional, data augmentation
    dict(type='MultiViewWrapper', transforms=train_transforms),
    # optional, keep only objects within the specified point cloud range
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    # optional, keep only objects of the specified classes
    dict(type='ObjectNameFilter', classes=class_names),
    dict(type='Pack3DDetInputs', keys=['img', 'gt_bboxes_3d', 'gt_labels_3d'])
]
```

To load images from multiple views, the dataset settings should also be adjusted accordingly.

```python
data_prefix = dict(
CAM_FRONT='samples/CAM_FRONT',
CAM_FRONT_LEFT='samples/CAM_FRONT_LEFT',
CAM_FRONT_RIGHT='samples/CAM_FRONT_RIGHT',
CAM_BACK='samples/CAM_BACK',
CAM_BACK_RIGHT='samples/CAM_BACK_RIGHT',
CAM_BACK_LEFT='samples/CAM_BACK_LEFT',
)
train_dataloader = dict(
    batch_size=4,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type='NuScenesDataset',
        data_root='./data/nuScenes',
        ann_file='nuscenes_infos_train.pkl',
        data_prefix=data_prefix,
        modality=dict(use_camera=True, use_lidar=False),
        pipeline=train_pipeline,
        test_mode=False))
```

## Evaluation

An example of evaluating PointPillars with 8 GPUs using nuScenes metrics is as follows.
