MVXNet fusion does not work (image features cannot be used) #1243

Open
853108389 opened this issue Feb 16, 2022 · 13 comments

@853108389

Help!

When I input different images with the same point cloud data from KITTI into MVXNet, the results do not change.
I have tried to find the reason:

Is there a coding error in the MVXNet config?
In the MVXNet config, I can't see any config for an image head or for fusion.

or

In class MVXTwoStageDetector, the function simple_test can't use image features because the property with_img_bbox is always False?
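For reference, that property is defined in mvx_two_stage.py roughly like this (paraphrased, not the exact source; since the MVXNet config defines neither img_roi_head nor img_bbox_head, it always evaluates to False):

@property
def with_img_bbox(self):
    """bool: Whether the detector has a 2D image box head."""
    return ((hasattr(self, 'img_roi_head') and self.img_roi_head.with_bbox)
            or (hasattr(self, 'img_bbox_head')
                and self.img_bbox_head is not None))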

If so, what is the meaning of the image features?
Looking forward to your response.

This is the MVXNet config:

_base_ = ['../_base_/schedules/cosine.py', '../_base_/default_runtime.py']

# model settings
voxel_size = [0.05, 0.05, 0.1]
point_cloud_range = [0, -40, -3, 70.4, 40, 1]

model = dict(
    type='DynamicMVXFasterRCNN',
    img_backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    img_neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    pts_voxel_layer=dict(
        max_num_points=-1,
        point_cloud_range=point_cloud_range,
        voxel_size=voxel_size,
        max_voxels=(-1, -1),
    ),
    pts_voxel_encoder=dict(
        type='DynamicVFE',
        in_channels=4,
        feat_channels=[64, 64],
        with_distance=False,
        voxel_size=voxel_size,
        with_cluster_center=True,
        with_voxel_center=True,
        point_cloud_range=point_cloud_range,
        fusion_layer=dict(
            type='PointFusion',
            img_channels=256,
            pts_channels=64,
            mid_channels=128,
            out_channels=128,
            img_levels=[0, 1, 2, 3, 4],
            align_corners=False,
            activate_out=True,
            fuse_out=False)),
    pts_middle_encoder=dict(
        type='SparseEncoder',
        in_channels=128,
        sparse_shape=[41, 1600, 1408],
        order=('conv', 'norm', 'act')),
    pts_backbone=dict(
        type='SECOND',
        in_channels=256,
        layer_nums=[5, 5],
        layer_strides=[1, 2],
        out_channels=[128, 256]),
    pts_neck=dict(
        type='SECONDFPN',
        in_channels=[128, 256],
        upsample_strides=[1, 2],
        out_channels=[256, 256]),
    pts_bbox_head=dict(
        type='Anchor3DHead',
        num_classes=3,
        in_channels=512,
        feat_channels=512,
        use_direction_classifier=True,
        anchor_generator=dict(
            type='Anchor3DRangeGenerator',
            ranges=[
                [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                [0, -40.0, -0.6, 70.4, 40.0, -0.6],
                [0, -40.0, -1.78, 70.4, 40.0, -1.78],
            ],
            sizes=[[0.6, 0.8, 1.73], [0.6, 1.76, 1.73], [1.6, 3.9, 1.56]],
            rotations=[0, 1.57],
            reshape_out=False),
        assigner_per_size=True,
        diff_rad_by_sin=True,
        assign_per_class=True,
        bbox_coder=dict(type='DeltaXYZWLHRBBoxCoder'),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=2.0),
        loss_dir=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.2)),
    # model training and testing settings
    train_cfg=dict(
        pts=dict(
            assigner=[
                dict(  # for Pedestrian
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(  # for Cyclist
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.35,
                    neg_iou_thr=0.2,
                    min_pos_iou=0.2,
                    ignore_iof_thr=-1),
                dict(  # for Car
                    type='MaxIoUAssigner',
                    iou_calculator=dict(type='BboxOverlapsNearest3D'),
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.45,
                    min_pos_iou=0.45,
                    ignore_iof_thr=-1),
            ],
            allowed_border=0,
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        pts=dict(
            use_rotate_nms=True,
            nms_across_levels=False,
            nms_thr=0.01,
            score_thr=0.1,
            min_bbox_size=0,
            nms_pre=100,
            max_num=50)))

As you can see, there is no config for an image detection head. Compare to ImVoteNet's config:

     pts=dict(
            vote_module_cfg=dict(
                in_channels=256,
                vote_per_seed=1,
                gt_per_seed=3,
                conv_channels=(256, 256),
                conv_cfg=dict(type='Conv1d'),
                norm_cfg=dict(type='BN1d'),
                norm_feats=True,
                vote_loss=dict(
                    type='ChamferDistance',
                    mode='l1',
                    reduction='none',
                    loss_dst_weight=10.0)),
            vote_aggregation_cfg=dict(
                type='PointSAModule',
                num_point=256,
                radius=0.3,
                num_sample=16,
                mlp_channels=[256, 128, 128, 128],
                use_xyz=True,
                normalize_xyz=True)),
        img=dict(
            vote_module_cfg=dict(
                in_channels=256,
                vote_per_seed=1,
                gt_per_seed=3,
                conv_channels=(256, 256),
                conv_cfg=dict(type='Conv1d'),
                norm_cfg=dict(type='BN1d'),
                norm_feats=True,
                vote_loss=dict(
                    type='ChamferDistance',
                    mode='l1',
                    reduction='none',
                    loss_dst_weight=10.0)),
            vote_aggregation_cfg=dict(
                type='PointSAModule',
                num_point=256,
                radius=0.3,
                num_sample=16,
                mlp_channels=[256, 128, 128, 128],
                use_xyz=True,
                normalize_xyz=True)),
        loss_weights=[0.4, 0.3, 0.3]),
    img_mlp=dict(
        in_channel=18,
        conv_channels=(256, 256),
        conv_cfg=dict(type='Conv1d'),
        norm_cfg=dict(type='BN1d'),
        act_cfg=dict(type='ReLU')),
    fusion_layer=dict(
        type='VoteFusion',
        num_classes=len(class_names),
        max_imvote_per_pixel=3),
@ZCMax (Collaborator) commented Feb 18, 2022

If with_img_bbox is true, it means the model will output detection results from the image branch. However, MVXNet is not a model that directly fuses the detection results of the point cloud branch and the image branch (post-processing); it uses the image features by fusing them with the point features during voxelization. You can see the details in

def extract_pts_feat(self, points, img_feats, img_metas):
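Conceptually, the fusion inside extract_pts_feat / the PointFusion layer works like the following simplified sketch for a single image and a single FPN level (illustrative only, not the exact implementation; the helper name and arguments are assumptions):

import torch
import torch.nn.functional as F

def sample_img_feats_for_points(points, img_feat, lidar2img, img_shape):
    # points: (N, 3) LiDAR xyz; img_feat: (C, H, W); lidar2img: (4, 4).
    # Project the 3D points into the image plane (homogeneous coordinates).
    xyz1 = torch.cat([points, points.new_ones(points.size(0), 1)], dim=1)
    uvz = xyz1 @ lidar2img.t()
    uv = uvz[:, :2] / uvz[:, 2:3].clamp(min=1e-5)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    u = uv[:, 0] / img_shape[1] * 2 - 1
    v = uv[:, 1] / img_shape[0] * 2 - 1
    grid = torch.stack([u, v], dim=1).view(1, 1, -1, 2)
    # Bilinearly sample a per-point image feature vector.
    sampled = F.grid_sample(img_feat.unsqueeze(0), grid, align_corners=False)
    return sampled.view(img_feat.size(0), -1).t()  # (N, C)

The per-point image features are then combined with the per-point LiDAR features inside DynamicVFE before voxelization, which is why different images should produce different voxel features.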

@853108389 (Author)

Thank you for your reply.
However, I still have a problem.

When I input different images with the same point data from KITTI into MVXNet, the results do not change.
I have checked the code: the features from the image have changed, but the result has not.

How can I get the right results?

(the original results)
[screenshot]

(I changed the image to this, but it had no effect on the result)
[screenshot]

@ZCMax (Collaborator) commented Feb 18, 2022

That's strange. I recommend checking the intermediate variables (like the constructed voxel features) to see whether they are the same or not.
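For example, with a forward hook, something like this (a sketch; the pts_voxel_encoder attribute path is an assumption and the hook output may need unpacking):

import torch

feats = []

def dump_hook(module, inputs, output):
    # Some encoders return a tuple (features, coors); keep only the features.
    out = output[0] if isinstance(output, tuple) else output
    feats.append(out.detach().cpu())

# `model` is the built MVXNet detector.
handle = model.pts_voxel_encoder.register_forward_hook(dump_hook)
# ... run inference twice, with two different images ...
handle.remove()
print(torch.allclose(feats[0], feats[1]))  # True would mean fusion has no effect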

@853108389 (Author)

Hmm... maybe it's not a problem on my side.
There is no need to modify the code; you can easily reproduce my bug by replacing the picture in the demo with another picture of the same size.

python demo/multi_modality_demo.py demo/data/kitti/kitti_000008.bin demo/data/kitti/kitti_000008.png demo/data/kitti/kitti_000008_infos.pkl configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth

Just replace kitti_000008.png with kitti_000009.png or another picture.
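For example, assuming kitti_000009.png has been copied into demo/data/kitti/, the swapped command would be:

python demo/multi_modality_demo.py demo/data/kitti/kitti_000008.bin demo/data/kitti/kitti_000009.png demo/data/kitti/kitti_000008_infos.pkl configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20200621_003904-10140f2d.pth

The predicted boxes come out identical to the kitti_000008.png run, which is the bug.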

https://github.com/open-mmlab/mmdetection3d/blob/master/docs/zh_cn/demo.md

I look forward to your help

@ZCMax (Collaborator) commented Feb 18, 2022

We will reproduce the problem ASAP.

@khaledmohamed00

Yes, I encountered the same problem. I think the model does not rely heavily on the image features.

@853108389 (Author) commented Feb 24, 2022 via email

@khaledmohamed00

@853108389 I think it is a problem.

@853108389 (Author)

> We will reproduce the problem ASAP.

Thank you for the fusion model you provided. I hope you can notify me here after you fix this bug.

@Chenfanqing

Hello, I would like to ask how you got the kitti_000008_infos.pkl file when you ran demo.py for MVX-Net.

Although I know that it is generated from the annotation information, the official docs do not seem to give the specific format of this pkl file, or any conversion script, etc.

If possible, could you please send me a copy of the pkl file you generated? I would be very grateful. This is my QQ email: [email protected]. Thank you very much.

@mljack commented May 20, 2022

> Hello, I would like to ask how you got the kitti_000008_infos.pkl file when you ran demo.py for MVX-Net.
>
> Although I know that it is generated from the annotation information, the official docs do not seem to give the specific format of this pkl file, or any conversion script, etc.
>
> If possible, could you please send me a copy of the pkl file you generated? I would be very grateful. This is my QQ email: [email protected]. Thank you very much.

@Chenfanqing, follow this doc to convert KITTI to the pkl files: https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/datasets/kitti_det.md
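If I remember correctly, the conversion command from that doc is (assuming the KITTI data is laid out under ./data/kitti as the doc describes):

python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti

It writes kitti_infos_train.pkl, kitti_infos_val.pkl, and related files into the output directory.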

@mhmodayman

I know it is quite late, but one way to prove this problem is to multiply the input image by zero after it is read by the code and converted to a NumPy array:

import numpy as np

x1 = np.array(input_lidar_points)  # shape (-1, 4)
x2 = np.array(input_pixels_image)  # shape (-1, 3)

# I suggest zeroing out the image, so the model sees all-black input:
x2 = x2 * 0

Then do another experiment with the LiDAR points array zeroed instead:

x1 = x1 * 0

If the first experiment leaves the detections unchanged while the second destroys them, I think this will clear up the issue.
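A more self-contained version of the experiment, built on the demo script's API (a sketch under the assumption that init_model and inference_multi_modality_detector are importable from mmdet3d.apis as in the demo, and that the demo paths above exist):

import cv2
import numpy as np
from mmdet3d.apis import inference_multi_modality_detector, init_model

config = 'configs/mvxnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py'
ckpt = ('checkpoints/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_'
        'kitti-3d-3class_20200621_003904-10140f2d.pth')
model = init_model(config, ckpt, device='cuda:0')

# Write an all-black image with the same size as the original.
img = cv2.imread('demo/data/kitti/kitti_000008.png')
cv2.imwrite('/tmp/black.png', np.zeros_like(img))

# Run inference once with the real image and once with the black one.
result_orig, _ = inference_multi_modality_detector(
    model, 'demo/data/kitti/kitti_000008.bin',
    'demo/data/kitti/kitti_000008.png',
    'demo/data/kitti/kitti_000008_infos.pkl')
result_zero, _ = inference_multi_modality_detector(
    model, 'demo/data/kitti/kitti_000008.bin', '/tmp/black.png',
    'demo/data/kitti/kitti_000008_infos.pkl')

# Identical boxes in result_orig and result_zero would confirm the bug.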

@mhmodayman

@Tai-Wang, @VVsssssk, has this issue been solved?
