Most recent 6D object pose methods use 2D optical flow to refine their results. However, general optical flow methods typically do not consider the target's 3D shape information during matching, making them less effective for 6D object pose estimation. In this work, we propose a shape-constraint recurrent matching framework for 6D object pose estimation.
Figure 1. Different pose refinement paradigms. (a) Most pose refinement methods rely on a recurrent architecture to estimate dense 2D flow between the rendered image I1 and the real input image I2, based on a correlation map constructed dynamically from the flow results of the previous iteration. After the flow network converges and the 2D flow is lifted to a 3D-to-2D correspondence field, they use PnP solvers to compute a refined pose. This strategy, however, has a large matching space for every pixel when constructing correlation maps, and optimizes a surrogate matching loss that does not reflect the final 6D pose estimation task. (b) By contrast, we propose optimizing the pose and flow simultaneously in an end-to-end recurrent framework guided by the target's 3D shape. We impose a shape constraint on the correlation map construction by forcing it to comply with the target's 3D shape, which reduces the matching space significantly. Furthermore, we propose learning the object pose from the current flow prediction, which, in turn, helps the flow prediction and yields an end-to-end system for object pose estimation.

Figure 3. Overview of our shape-constraint recurrent framework. After building a 4D correlation volume between the rendered image and the input target image, we use a GRU to predict an intermediate flow, based on the predicted flow Fk−1 and the hidden state hk−1 of the GRU from the previous iteration. We then use a pose regressor to predict the relative pose ∆Pk from the intermediate flow, which is used to update the previous pose estimate Pk−1. Finally, we compute a pose-induced flow from the displacement of the 2D reprojections between the initial pose and the currently estimated pose Pk. We use this pose-induced flow to index the correlation map in the following iterations, which reduces the matching space significantly. The dashed boxes show the flow and its corresponding warp results. Note how the intermediate flow does not preserve the shape of the target, but the pose-induced flow does.

This code has been tested on an Ubuntu 18.04 server with CUDA 11.3.
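The pose-induced flow described in the overview can be illustrated with a minimal NumPy sketch: project the object's 3D points under the initial and the current pose, and take the 2D displacement. The function names `project` and `pose_induced_flow` are illustrative and not part of this repository.

```python
import numpy as np

def project(points, K, R, t):
    # Project Nx3 object points into the image with intrinsics K and pose (R, t).
    cam = points @ R.T + t            # points in the camera frame
    uv = cam @ K.T                    # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]     # perspective divide

def pose_induced_flow(points, K, R0, t0, R, t):
    # 2D displacement of each model point between the initial pose (R0, t0)
    # and the currently estimated pose (R, t). Unlike a free-form intermediate
    # flow, this displacement field always complies with the target's 3D shape.
    return project(points, K, R, t) - project(points, K, R0, t0)

# Toy example: one object point one meter in front of the camera.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts = np.array([[0., 0., 1.]])
flow = pose_induced_flow(pts, K, np.eye(3), np.zeros(3), np.eye(3), np.array([0.1, 0., 0.]))
print(flow)  # a 0.1 m lateral shift at 1 m depth moves the reprojection by 50 px: [[50. 0.]]
```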
- Install the necessary packages by `pip install -r requirements.txt`.
- Install `pytorch3d` by building the pytorch3d project.
- Download the YCB-V dataset from the BOP website and place it under the `data/ycbv` directory.
- Download the image lists and place them under the `data/ycbv/image_lists` directory.
- Download the PoseCNN initial pose and place it under the `data/initial_poses/ycbv_posecnn` directory.
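With the default paths above, the `data` directory should end up looking like this:

```
data/
├── ycbv/
│   └── image_lists/
└── initial_poses/
    └── ycbv_posecnn/
```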
- Download the RAFT pretrained model from mmflow and convert the checkpoint:

```
python tools/mmflow_ckpt_converter.py --model_url https://download.openmmlab.com/mmflow/raft/raft_8x2_100k_flyingthings3d_400x720.pth
```
- Replace the `_base_` entry in `configs/refine_models/scflow.py` with a different training setting from `configs/refine_datasets`.
- Train with `train.py`:

```
python train.py --config configs/refine_models/scflow.py
```
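The `_base_` switch above uses the standard mmcv-style config inheritance. A sketch of what the edited model config might look like; the dataset config filename below is hypothetical, not a file guaranteed to exist in the repo:

```python
# configs/refine_models/scflow.py (sketch)
# Point _base_ at the desired training setting; the filename here is hypothetical.
_base_ = ['../refine_datasets/ycbv_real.py']
```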
Evaluate the performance:

```
python test.py --config configs/refine_models/scflow.py --checkpoint *** --eval
```
Save the results:

```
python test.py --config configs/refine_models/scflow.py --checkpoint *** --format-only --save-dir ***
```
We provide the pretrained models for the different training settings here.
If you find our project helpful, please cite:

```
@inproceedings{yang2023scflow,
  title={Shape-Constraint Recurrent Flow for 6D Object Pose Estimation},
  author={Hai, Yang and Song, Rui and Li, Jiaojiao and Hu, Yinlin},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
```
We build this project based on mmflow, GDR-Net, and PFA. We thank the authors for their great code repositories.