Gap w/ OpenPCDet for SECOND and PointPillars #648

Open · Divadi opened this issue Jun 16, 2021 · 28 comments

@Divadi
Contributor

Divadi commented Jun 16, 2021

Comparing Car moderate 3D strict results from this repository with those of OpenPCDet, there seems to be a 2~3 point difference:
SECOND: 76.91425 (here) vs 78.62 (OpenPCDet)
PointPillars: 73.88344 (here) vs 77.28 (OpenPCDet)

I was wondering if anyone has any clues as to why this difference might occur.
Could it be the BEV NMS used in this repository vs the 3D rotated NMS in OpenPCDet?

@Divadi changed the title from "Gap between OpenPCDet for SECOND and PointPillars" to "Gap w/ OpenPCDet for SECOND and PointPillars" on Jun 16, 2021
@yezhen17
Collaborator

Hi @Divadi , sorry for the late reply. I'm not sure about the reason behind the performance gap, but the difference in NMS implementations could plausibly be a cause.

@zhanggefan
Contributor

zhanggefan commented Sep 8, 2021

Hi, @Divadi @THU17cyz,

I have been watching this thread for weeks; is there still no progress? Maybe we can share information here if anyone finds a potential reason or anything suspicious.

I did a simple comparison between the training configs of PointPillars on KITTI used by the two codebases and found these differences:

  1. Batch size. OpenPCDet uses 4 samples * 8 GPUs, while MMDet3D uses 6 samples * 8 GPUs here.
  2. GT object sampling augmentation. OpenPCDet uses the ground plane to adjust the sampled objects' z location. Besides, for the 3-class detection task, the object sampling counts also differ.
  3. Object noise augmentation. I find no object noise augmentation in OpenPCDet. BTW, this augmentation seems troublesome, and the augmented point cloud has some weird artifacts, like pedestrians floating in the air or vehicles driving under the ground plane (see the config sketch at the end of this comment).

I am currently running experiments to align the data pipelines, but it is really laborious. I hope somebody can give me a hand.
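
For reference, the two augmentations in question appear in the MMDet3D KITTI training pipeline roughly like this (a sketch from memory; the exact parameter values may differ from the repo's config):

dict(type='ObjectSample', db_sampler=db_sampler),  # GT object (cut & paste) sampling
dict(
    type='ObjectNoise',          # per-object jitter; no OpenPCDet counterpart
    num_try=100,
    translation_std=[0.25, 0.25, 0.25],
    global_rot_range=[0.0, 0.0],
    rot_range=[-0.15707963267, 0.15707963267]),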

@yezhen17
Collaborator

yezhen17 commented Sep 8, 2021

Hi @zhanggefan. Thank you very much for sharing your discoveries! Recently we have been busy with other lines of work, so we haven't looked into this issue yet. From my point of view, the use of ground planes should boost the performance. Maybe @Tai-Wang can take a look?

@zhanggefan
Contributor

zhanggefan commented Sep 8, 2021

@THU17cyz,

I just finished a round of training that takes the ground plane into account during gt sampling (messy experimental code, not ready for a PR, so please don't ask for it here, hahaha). What surprises me is that the AP score even drops...

But this is not the weirdest.

I switched to the refactored coordinates branch today, and from #913 I know that there is a performance gap of around 1% compared to the master branch.

In one of the experiments (PointPillars on KITTI, 3 classes) I did today, I accidentally swapped all of the w and h values in the anchor settings, yet it improved the final AP score by around 1%. For the cyclist class especially, the AP increased by 4%. I used a training schedule of 12 samples * 4 GPUs with 3090s and got an average AP score of 61.5% (overall moderate strict).

I guess there is still something wrong with the new branch.

Training logs are attached here for your information:

swapped w&h: mAP 61.5
20210908_122709.log

correct one: mAP 60.5
20210908_153654.log

@yezhen17
Collaborator

yezhen17 commented Sep 8, 2021

Oh, you're really fast lol. I'm not familiar with the effect of the planes, but I'm responsible for the refactored coordinate system, and your finding is really interesting. Could you elaborate on how you swapped w and h, say, by providing the relevant fragment of the modified config?

@yezhen17
Collaborator

yezhen17 commented Sep 8, 2021

By the way, did you generate new db_infos for KITTI?

@zhanggefan
Contributor

zhanggefan commented Sep 8, 2021

By the way, did you generate new db_infos for KITTI?

Of course I did, but after all, I had to modify the data conversion script and regenerate the files so that I could put the plane information into the pickle files.
I checked the data pipeline with the visualizer and everything looked alright.

@zhanggefan
Contributor

Oh, you're really fast lol. I'm not familiar with the effect of the planes, but I'm responsible for the refactored coordinate system, and your finding is really interesting. Could you elaborate on how you swapped w and h, say, by providing the relevant fragment of the modified config?

The configs are dumped at the head of the log files.

@yezhen17
Collaborator

yezhen17 commented Sep 8, 2021

Hmmm, sorry, I missed the logs while viewing from my phone. I see. However, I remember that using the original anchor config hurt the performance months ago... Please give me some time to look into this.

@zhanggefan
Contributor

zhanggefan commented Sep 9, 2021

@THU17cyz, @Tai-Wang

Here are some more discoveries concerning the performance gap compared to OpenPCDet

  1. The training schedule is different. OpenPCDet trains PointPillars for only 80e, but MMDet3D trains for 160e. What's more, OpenPCDet uses max(lr)==0.003, but here in MMDet3D max(lr)==0.01.

  2. The PFE parts have a lot of differences. The input channel count of the first linear layer of the PFE is 9 in MMDet3D but 10 in OpenPCDet. Besides, setting the legacy flag of PillarFeatureNet to False boosts the mAP score by 1.4. The legacy flag is really dumb because it produces redundant input features for the network. When the legacy flag is set to True (the default value), the input features always satisfy:
    features[..., [0,1]] == features[..., [-2,-1]]
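
To see what that redundancy looks like, here is a toy sketch of the aliasing (not the actual PillarFeatureNet code; the shapes and offset are made up for illustration). With legacy behavior, f_center is taken as a view of the raw x/y channels, so the in-place subtraction overwrites them, and the concatenated input carries the same two channels twice:

import torch

features = torch.randn(2, 5, 4)        # (pillars, points, [x, y, z, r])
pillar_center_xy = torch.tensor([0.5, 0.5])

f_center = features[:, :, :2]          # a view, not a copy (legacy behavior)
f_center -= pillar_center_xy           # in-place: also mutates features[..., :2]

out = torch.cat([features, f_center], dim=-1)
assert torch.equal(out[..., [0, 1]], out[..., [-2, -1]])  # redundant channels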

@Divadi
Contributor Author

Divadi commented Sep 9, 2021

A couple of other points I've noticed before:
Both:
For optimization, the Adam betas are slightly different: (0.9, 0.99) for OpenPCDet vs (0.95, 0.99) here.
The post-processing is slightly different. IIRC, OpenPCDet uses 3D rotated NMS while mm3d uses BEV rotated NMS. Further, NMS pre->post is 4096->500 for OpenPCDet; it's 100->50 here. (I have actually tested this myself before, but the results were inconclusive. I think Ped improved but Cyc declined, or something along those lines. I felt it was difficult to do a complete comparison because Ped & Cyc vary quite a bit between runs.) For reference, I had ported "iou3d_nms" from OpenPCDet into mm3d and modified the top of "mmdet3d/core/post_processing/box3d_nms.py" to be:

import numba
import numpy as np
import torch

from mmdet3d.ops.iou3d.iou3d_utils import nms_gpu, nms_normal_gpu
from mmdet3d.ops.iou3d_nms import pcdet_nms_gpu


def box3d_multiclass_nms(mlvl_bboxes,
                         mlvl_bboxes_for_nms,
                         mlvl_scores,
                         score_thr,
                         max_num,
                         cfg,
                         mlvl_dir_scores=None,
                         mlvl_attr_scores=None,
                         mlvl_bboxes2d=None):
    """Multi-class nms for 3D boxes.
    Args:
        mlvl_bboxes (torch.Tensor): Multi-level boxes with shape (N, M).
            M is the dimensions of boxes.
        mlvl_bboxes_for_nms (torch.Tensor): Multi-level boxes with shape
            (N, 5) ([x1, y1, x2, y2, ry]). N is the number of boxes.
        mlvl_scores (torch.Tensor): Multi-level boxes with shape
            (N, C + 1). N is the number of boxes. C is the number of classes.
        score_thr (float): Score threshold to filter boxes with low
            confidence.
        max_num (int): Maximum number of boxes will be kept.
        cfg (dict): Configuration dict of NMS.
        mlvl_dir_scores (torch.Tensor, optional): Multi-level scores
            of direction classifier. Defaults to None.
        mlvl_attr_scores (torch.Tensor, optional): Multi-level scores
            of attribute classifier. Defaults to None.
        mlvl_bboxes2d (torch.Tensor, optional): Multi-level 2D bounding
            boxes. Defaults to None.
    Returns:
        tuple[torch.Tensor]: Return results after nms, including 3D \
            bounding boxes, scores, labels, direction scores, attribute \
            scores (optional) and 2D bounding boxes (optional).
    """
    # added new!
    if cfg.get('use_pcdet_nms_gpu', False):
        # This is 3D, rotated, class agnostic.
        # assumes that non-sigmoid, so ignore the final background prob
        max_class_labels = mlvl_scores[:, :-1].argmax(dim=1)
        mlvl_max_class_scores = mlvl_scores[torch.arange(mlvl_scores.shape[0]), max_class_labels]
        nms_selected, _ = pcdet_nms_gpu(mlvl_bboxes, mlvl_max_class_scores, score_thr)
        bboxes = mlvl_bboxes[nms_selected]
        scores = mlvl_max_class_scores[nms_selected]
        labels = max_class_labels[nms_selected]

        if mlvl_dir_scores is not None:
            mlvl_dir_scores = mlvl_dir_scores[nms_selected]
        if mlvl_attr_scores is not None:
            mlvl_attr_scores = mlvl_attr_scores[nms_selected]
        if mlvl_bboxes2d is not None:
            mlvl_bboxes2d = mlvl_bboxes2d[nms_selected]
        
        if bboxes.shape[0] > max_num:
            _, inds = scores.sort(descending=True)
            inds = inds[:max_num]
            bboxes = bboxes[inds, :]
            labels = labels[inds]
            scores = scores[inds]

            if mlvl_dir_scores is not None:
                mlvl_dir_scores = mlvl_dir_scores[inds]
            if mlvl_attr_scores is not None:
                mlvl_attr_scores = mlvl_attr_scores[inds]
            if mlvl_bboxes2d is not None:
                mlvl_bboxes2d = mlvl_bboxes2d[inds]

        results = (bboxes, scores, labels)
        if mlvl_dir_scores is not None:
            results = results + (mlvl_dir_scores, )
        if mlvl_attr_scores is not None:
            results = results + (mlvl_attr_scores, )
        if mlvl_bboxes2d is not None:
            results = results + (mlvl_bboxes2d, )

        return results
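
    # ... (the original implementation for the non-pcdet path continues below)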

Other differences:
PointPillars:
OpenPCDet clips the gradient norm at 10; this repository uses 35.

SECOND:
The IoU matching thresholds for Ped & Cyc are different: it's 0.2 ~ 0.35 here vs 0.45 ~ 0.6 for OpenPCDet (a config sketch follows below).
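
For concreteness, the assigner fragment in question looks roughly like this (a sketch using the thresholds quoted above; names follow MMDet3D's MaxIoUAssigner convention, and the exact repo values may differ):

# MMDet3D side (SECOND, Pedestrian/Cyclist): neg 0.2, pos 0.35
dict(
    type='MaxIoUAssigner',
    iou_calculator=dict(type='BboxOverlapsNearest3D'),
    pos_iou_thr=0.35,
    neg_iou_thr=0.2,
    min_pos_iou=0.2,
    ignore_iof_thr=-1)
# OpenPCDet side: matched/unmatched thresholds around 0.45 ~ 0.6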

I'll keep posting as I remember/find things

@Divadi
Contributor Author

Divadi commented Sep 9, 2021

@zhanggefan Did you try an experiment without object noise?

@zhanggefan
Contributor

@Divadi I tried it today but found no improvement.

But here is the good news

The config for the anchor generator is absolutely wrong!!! The centers of the generated anchors turn out to have the wrong stride! And I believe it is possibly a day-zero bug. @THU17cyz @Tai-Wang

ranges=[
    [0, -39.68, -0.6, 70.4, 39.68, -0.6],
    [0, -39.68, -0.6, 70.4, 39.68, -0.6],
    [0, -39.68, -1.78, 70.4, 39.68, -1.78],
]

@zhanggefan
Contributor

I just started an experiment to examine how much impact this bug has on the result.

The correct config should be

ranges=[
    [0.16, -39.52, -0.6, 68.96, 39.52, -0.6],
    [0.16, -39.52, -0.6, 68.96, 39.52, -0.6],
    [0.16, -39.52, -1.78, 68.96, 39.52, -1.78],
]

It shows that, for the ranged anchors, the anchor centers' bias w.r.t. the feature map positions is up to 1.44 meters.

It must have introduced some sort of feature-prior misalignment, which hurts the feature extractor's performance and influences the anchor-gt assignment as well.
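
A quick numeric check of that 1.44 m figure (a sketch assuming anchor centers are spread with linspace over the configured range, with KITTI PointPillars' 0.16 m voxels and a 216-column feature map):

import numpy as np

# feature-map cell centers: 0.16 m voxels, stride-2 backbone -> 0.32 m cells
cell_centers = 0.16 + 0.32 * np.arange(216)   # 0.16 ... 68.96

buggy = np.linspace(0.0, 70.4, 216)           # centers from the wrong range
fixed = np.linspace(0.16, 68.96, 216)         # centers from the corrected range

print(np.abs(buggy - cell_centers).max())     # ~1.44 m, at the far edge
print(np.abs(fixed - cell_centers).max())     # ~0.0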

@yezhen17
Collaborator

Wow, great observations @Divadi , @zhanggefan ! I'm not very familiar with the anchors, and now that I have enrolled in graduate school I cannot maintain MMDet3D like before :( But I will still keep an eye on this thread and add my opinions!

@zhanggefan
Contributor

zhanggefan commented Sep 11, 2021

Update:

  1. Best score of PointPillars on KITTI so far:

[per-class AP scores garbled by the page's email-protection filter]

  • There is still a gap of around 0.6 compared to the score from OpenPCDet
  • modifications:
    • model
      • correct the anchor config (this is critical!)
      • set PillarFeatureNet's legacy flags to False
      • refactor PillarFeatureNet and add each point's z offset to the pillar center z as an additional feature channel, just like what HardVFE does
    • data pipeline
      • disable ObjectNoise
      • use the ground plane to adjust z of sampled gt objects
      • sample more (10->15) gt objects for all classes
      • lower the minimum points filtering threshold from 10 to 5
    • schedule
      • 12 samples per GPU with 4*3090s
  2. New discovery:

The next experiment I am planning to do is aligning the behavior of the assigners by setting assign_per_class to true in the anchor head's configuration.
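
For reference, the override I have in mind is a one-liner; a minimal sketch, assuming the anchor head exposes the assign_per_class switch:

model = dict(
    bbox_head=dict(
        # assign targets class by class, so each class only competes with
        # anchors of its own size/range (matching OpenPCDet's behavior)
        assign_per_class=True))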

@Divadi
Contributor Author

Divadi commented Sep 11, 2021

With reference to the anchor offset: does this issue then exist across the entire repository?
I notice now that OpenPCDet actually had a similar problem with "align_center":
https://github.com/open-mmlab/OpenPCDet/blob/c9d31d393acaae34d74cb03bd6ccff9976d3d1f3/pcdet/models/dense_heads/target_assigner/anchor_generator.py#L28
However, in their configs, for backward compatibility, they leave it as false:
https://github.com/open-mmlab/OpenPCDet/blob/c9d31d393acaae34d74cb03bd6ccff9976d3d1f3/tools/cfgs/kitti_models/pointpillar.yaml#L87
open-mmlab/OpenPCDet#272
So even without this tweak, there is a performance gap?

@zhanggefan
Contributor

@Divadi
The anchor config error exists only in this config:
https://github.com/open-mmlab/mmdetection3d/blob/2efdb6b6d94fb7ce944e83f743b98b1563e96fee/configs/_base_/models/hv_pointpillars_secfpn_kitti.py

The error only affects the results of PointPillars on KITTI, including both the 3-class and the car-only results.

ranges=[
    [0, -39.68, -0.6, 70.4, 39.68, -0.6],
    [0, -39.68, -0.6, 70.4, 39.68, -0.6],
    [0, -39.68, -1.78, 70.4, 39.68, -1.78],
]

It is not just a center misalignment by half the feature map stride; the value 70.4 is simply wrong. I believe the author just copied the existing configs from SECOND and forgot to adjust some of the values for PointPillars' different voxelization range. The voxelization range for SECOND is [0, -40, -3, 70.4, 40, 1], which differs from the range [0, -39.68, -3, 69.12, 39.68, 1] used for PointPillars.

OpenPCDet's config does introduce some center misalignment as well, but its maximum bias is only 0.16 m, way smaller than the 1.44 m bias we have here in MMDet3D.

I just finished the "assignment behavior aligning" experiment, and the gap is finally gone. The best result of PointPillars on KITTI now is:

[per-class AP scores garbled by the page's email-protection filter; see the attached log]

20210912_053007.log

@Divadi
Contributor Author

Divadi commented Sep 12, 2021

Ah, I understand - it's the 69.12 vs 70.4 for PointPillars

@Tai-Wang
Member

Hi @zhanggefan @Divadi , awesome study! Sorry for the slightly late reply. The overall study has been really thorough, so let me summarize and share some experiences.

  • About the gap between these two implementations:
    Our implementation follows the originally released PointPillars version, so you can find many similar settings (including those mentioned by @zhanggefan ). I think there must be better hyperparameters, as is typically the case for other datasets and models too, so we can improve the current version with your valuable findings.

  • About some tricks that possibly do not work:
    From my experience, adding the ground plane information brings really marginal gains, so it can be omitted. ObjectNoise does not work in most cases (on other datasets), but it can sometimes be useful in the KITTI experiments.

  • About the legacy issue of PillarFeatureNet:
    Turning on the legacy option actually deviates from the original implementation, because we do not want in-place operations while computing the center features. Setting it to False should be correct. It is a little interesting that previous experiments seemed to show that setting legacy to True yields better performance, but now the results show the opposite.

  • Incorrect anchor settings:
    This seems to be a bug. It will be fixed ASAP.

@zhanggefan
Contributor

Hi, @Tai-Wang
Any progress? Looking forward to the release of new baseline scores!

@Tai-Wang
Member

Hi, @zhanggefan
We are asking our colleagues to help follow up and update the models and benchmarks (I have been busy with the code release of our latest monocular 3D detector and my own research recently), but it cannot be very quick because we do not have as many resources as in the previous 3 months to deal with all the issues. The good news is that this is one of the issues given the highest priority. We will post results here ASAP if there are any.

@ZCMax self-assigned this on Sep 14, 2021
@Divadi
Contributor Author

Divadi commented Sep 23, 2021

Another point I just found: for the cut & paste augmentation, the "filter by difficulty" step does nothing for KITTI.

The GT database creation script does store each object's difficulty. However,

def get_ann_info(self, index):

in kitti_dataset does not store "difficulty" in the returned dict, so all objects are given difficulty "0", and hence nothing is filtered for cut & paste. This is unlike OpenPCDet, where objects with difficulty "-1" are not used for cut & paste.
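
To make the no-op concrete, here is a toy sketch of the filtering logic (hypothetical variable names): since get_ann_info never fills in "difficulty", every object carries the default value 0, and the [-1] filter removes nothing:

removed_difficulty = [-1]       # what the db sampler is configured to drop
difficulties = [0, 0, 0]        # defaults, because 'difficulty' is never set
kept = [d for d in difficulties if d not in removed_difficulty]
assert kept == difficulties     # nothing is ever filtered for cut & paste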

@yezhen17
Collaborator

yezhen17 commented Sep 27, 2021

[quoting @zhanggefan's Sep 8 comment about the swapped w & h anchor settings]

Hi @zhanggefan , I went through the refactoring again, and it seems the anchors should logically be [l, w, h] instead of [w, l, h]... Maybe we can ask someone to work on this and figure it out.

@zhanggefan
Contributor

[in reply to @yezhen17's comment above]

Yep, [l, w, h] is absolutely the logically correct order. I think the weird score improvement might be related to the config bug; I am not sure either.

But anyway, I think the problem is solved now that the config bug is fixed. The best score I got so far with PointPillars on KITTI is 64.9 overall.

BTW, taking the behavior of the strided convolutions into account, the best config for the anchor range should be:

ranges=[
    [0.08, -39.60, -0.6, 68.88, 39.44, -0.6],
    [0.08, -39.60, -0.6, 68.88, 39.44, -0.6],
    [0.08, -39.60, -1.78, 68.88, 39.44, -1.78],
]
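
For anyone wondering where 0.08 and 68.88 come from, here is one way to arrive at them; a sketch assuming the nominal center of a stride-2 conv output sits on the first of its two input columns (this convention is the assumption here, not something from either repo):

import numpy as np

voxel_centers = 0.08 + 0.16 * np.arange(432)   # 432 pillar columns, 0.16 m wide
feat_centers = voxel_centers[0::2]             # stride-2 output, 216 columns
print(feat_centers[0], feat_centers[-1])       # 0.08 68.88

anchors = np.linspace(0.08, 68.88, 216)        # the range proposed above
print(np.abs(anchors - feat_centers).max())    # ~0 under this convention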

@ZCMax
Collaborator

ZCMax commented Dec 29, 2021

@zhanggefan Hello, did you make these modifications on the dev branch or on the master branch?

@zhanggefan
Contributor

@ZCMax Hi, I did those on the dev branch.

@Xiangxu-0103
Collaborator

Xiangxu-0103 commented Dec 31, 2021

@ZCMax @zhanggefan Hi, I think the anchor range differs a little from OpenPCDet's.
The following code is the anchor generation logic provided by OpenPCDet:

# align_center = False
# self.anchor_range = [0, -39.68, -3, 69.12, 39.68, 1]
# grid_size = [216, 248]
# anchor_height = -1.78  # car for example

if align_center:
    x_stride = (self.anchor_range[3] - self.anchor_range[0]) / grid_size[0]
    y_stride = (self.anchor_range[4] - self.anchor_range[1]) / grid_size[1]
    x_offset, y_offset = x_stride / 2, y_stride / 2
else:
    # note the -1: the anchors span the range end-to-end
    x_stride = (self.anchor_range[3] - self.anchor_range[0]) / (grid_size[0] - 1)
    y_stride = (self.anchor_range[4] - self.anchor_range[1]) / (grid_size[1] - 1)
    x_offset, y_offset = 0, 0
x_shifts = torch.arange(self.anchor_range[0] + x_offset, self.anchor_range[3] + 1e-5, step=x_stride, dtype=torch.float32).cuda()
y_shifts = torch.arange(self.anchor_range[1] + y_offset, self.anchor_range[4] + 1e-5, step=y_stride, dtype=torch.float32).cuda()
z_shifts = x_shifts.new_tensor(anchor_height)
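
Running the align_center=False branch above against the feature-map cell centers reproduces the ~0.16 m maximum misalignment mentioned earlier (numpy instead of torch, just for the sketch):

import numpy as np

anchor_range = [0, -39.68, -3, 69.12, 39.68, 1]
grid_size = [216, 248]

x_stride = (anchor_range[3] - anchor_range[0]) / (grid_size[0] - 1)
x_shifts = anchor_range[0] + x_stride * np.arange(grid_size[0])  # 0 ... 69.12

cell_centers = 0.16 + 0.32 * np.arange(grid_size[0])             # 0.16 ... 68.96
print(np.abs(x_shifts - cell_centers).max())                     # -> 0.16 m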
