This is the official repository of our recent work, 6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning. We provide the pose estimation results on the REAL275 test set so that the performance of our method can be evaluated.
More information will be released soon.
- Python 3.6
- PyTorch 1.7.1+cu110
- CUDA 11.2
- OpenCV-python 4.4.0
- Download the Mask R-CNN results and the pose predictions of NOCS, NOF, SPD, and our 6D-ViT from here
- The model pretrained on the NOCS-REAL dataset is available here
unzip -q real_test.zip
ROOT=/path/to/6D-ViT
mkdir -p $ROOT/results
mv real_test/* $ROOT/results
rmdir real_test
cd $ROOT
python evaluate_mean_real.py
The evaluation results will be generated under the folder $ROOT/results/6D-ViT_results/real_test/
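The evaluation reports, among other metrics, pose accuracy at joint rotation and translation thresholds such as 5°2cm. As a rough illustration of what such a threshold means, the sketch below checks whether a single predicted pose falls within a given angle and distance of the ground truth. It is not the code in evaluate_mean_real.py: the function names are hypothetical, translations are assumed to be in metres, the comparison is assumed to be strict, and the special handling that symmetric categories usually receive in this benchmark is left out.

```python
import math


def rotation_error_deg(R_pred, R_gt):
    """Angle in degrees of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def translation_error_cm(t_pred, t_gt):
    """Euclidean distance in centimetres (inputs assumed to be in metres)."""
    return 100.0 * math.sqrt(sum((p - g) ** 2 for p, g in zip(t_pred, t_gt)))


def pose_within(R_pred, t_pred, R_gt, t_gt, max_deg, max_cm):
    """True if both errors are below their thresholds (strict comparison assumed)."""
    return (rotation_error_deg(R_pred, R_gt) < max_deg
            and translation_error_cm(t_pred, t_gt) < max_cm)


def rot_z(deg):
    """Helper for the example: rotation about the z axis by `deg` degrees."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


# Toy example: prediction is rotated by 4 degrees and shifted by 3 cm.
R_gt, t_gt = rot_z(0.0), [0.0, 0.0, 1.0]
R_pred, t_pred = rot_z(4.0), [0.03, 0.0, 1.0]

print(pose_within(R_pred, t_pred, R_gt, t_gt, 5, 2))  # False: 3 cm exceeds the 2 cm limit
print(pose_within(R_pred, t_pred, R_gt, t_gt, 5, 5))  # True: both errors are under the limits
```

A reported value such as 5°2cm is then an aggregate of this kind of check over all test instances; the exact aggregation used by the evaluation script is not shown here.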
| Dataset | Category | 3D50 | 3D75 | 5°2cm | 5°5cm | 10°2cm | 10°5cm | 10°10cm |
|---|---|---|---|---|---|---|---|---|
| REAL275 | Bottle | 0.5766 | 0.5005 | 0.5799 | 0.6318 | 0.7969 | 0.8703 | 0.9752 |
| REAL275 | Bowl | 0.9999 | 0.9992 | 0.7874 | 0.8186 | 0.9548 | 0.9914 | 0.9914 |
| REAL275 | Camera | 0.8709 | 0.1917 | 0.0000 | 0.0000 | 0.0014 | 0.0019 | 0.0019 |
| REAL275 | Can | 0.7146 | 0.6996 | 0.5350 | 0.5624 | 0.8573 | 0.9551 | 0.9555 |
| REAL275 | Laptop | 0.8334 | 0.6170 | 0.3383 | 0.4461 | 0.6163 | 0.9217 | 0.9361 |
| REAL275 | Mug | 0.9878 | 0.8577 | 0.0490 | 0.0524 | 0.3166 | 0.3333 | 0.3333 |
| REAL275 | Average | 0.8306 | 0.6443 | 0.3816 | 0.4186 | 0.5906 | 0.6789 | 0.6989 |
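The Average row appears to be the unweighted mean of the six per-category rows. The short check below recomputes it from the values in the table above; the variable names are only illustrative, and small gaps are expected because the per-category numbers are already rounded to four decimals.

```python
# Column order matches the results table.
METRICS = ["3D50", "3D75", "5°2cm", "5°5cm", "10°2cm", "10°5cm", "10°10cm"]

# Per-category values copied from the table.
per_category = {
    "Bottle": [0.5766, 0.5005, 0.5799, 0.6318, 0.7969, 0.8703, 0.9752],
    "Bowl":   [0.9999, 0.9992, 0.7874, 0.8186, 0.9548, 0.9914, 0.9914],
    "Camera": [0.8709, 0.1917, 0.0000, 0.0000, 0.0014, 0.0019, 0.0019],
    "Can":    [0.7146, 0.6996, 0.5350, 0.5624, 0.8573, 0.9551, 0.9555],
    "Laptop": [0.8334, 0.6170, 0.3383, 0.4461, 0.6163, 0.9217, 0.9361],
    "Mug":    [0.9878, 0.8577, 0.0490, 0.0524, 0.3166, 0.3333, 0.3333],
}
reported_average = [0.8306, 0.6443, 0.3816, 0.4186, 0.5906, 0.6789, 0.6989]

# Unweighted mean over categories, one value per metric column.
n = len(per_category)
computed = [sum(col) / n for col in zip(*per_category.values())]

for name, got, want in zip(METRICS, computed, reported_average):
    print(f"{name:>8}: computed {got:.4f}, reported {want:.4f}")

# Largest absolute difference between recomputed and reported averages.
max_gap = max(abs(g - w) for g, w in zip(computed, reported_average))
print(f"largest gap: {max_gap:.5f}")
```

Because the mean is unweighted, categories with very low scores, such as Camera on the pose thresholds, pull the average down as much as any other category.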
If you find this work helpful, please consider citing:
@article{zou20226d,
title={{6D-ViT}: Category-Level {6D} Object Pose Estimation via Transformer-Based Instance Representation Learning},
author={Zou, Lu and Huang, Zhangjin and Gu, Naijie and Wang, Guoping},
journal={IEEE Transactions on Image Processing},
volume={31},
pages={6907--6921},
year={2022},
publisher={IEEE}
}
Our work is built upon object-deformnet. We thank the authors for releasing their code.