From f4e635a0eb299a6f50a153b00043e008d3e62fab Mon Sep 17 00:00:00 2001
From: Toefinder <43576719+Toefinder@users.noreply.github.com>
Date: Wed, 21 Dec 2022 18:48:19 +0800
Subject: [PATCH 1/6] Fix Cython dependency issue when creating the conda env for glumpy
Previously, running `conda env create` with the environment.yaml file
generated a ModuleNotFoundError for Cython, even though Cython is listed
just before glumpy in the pip dependency list. Bumping the glumpy version
solves this issue.
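A quick way to check the fix (a sketch, assuming a fresh checkout and a
working conda installation):

```sh
conda env create -n cosypose --file environment.yaml
conda activate cosypose
python -c "import glumpy; print(glumpy.__version__)"  # expected: 1.2.0
```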
---
environment.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/environment.yaml b/environment.yaml
index 70605b5..3e5be90 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -45,7 +45,7 @@ dependencies:
- pypng==0.0.20
- PyOpenGL==3.1.0
- Cython==0.29.21
- - glumpy==1.1.0
+ - glumpy==1.2.0
- ipdb==0.12.3
- colorama==0.4.3
- scikit-video==1.1.11
From 1e6076d6c40730e60e23c8f416d21b85c3bdd8f3 Mon Sep 17 00:00:00 2001
From: Toefinder <43576719+Toefinder@users.noreply.github.com>
Date: Wed, 21 Dec 2022 19:52:58 +0800
Subject: [PATCH 2/6] Reformat README based on markdownlint suggestions
Fix broken cross-section links. Reformat based on VS Code markdownlint
suggestions: clearer language annotations for code blocks and better
spacing. Also fix some typos.
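For reference, the same warnings can be reproduced from the command line
(a sketch, assuming the markdownlint-cli npm package is available):

```sh
# Lint the README with the same core rules as the VS Code extension.
npx markdownlint-cli README.md
```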
---
README.md | 248 +++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 171 insertions(+), 77 deletions(-)
diff --git a/README.md b/README.md
index 34d4c8c..790412b 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,7 @@ ECCV: European Conference on Computer Vision, 2020
[Paper]
[Project page]
[Video (1 min)]
-[Video (10 min)]
+[Video (10 min)]
[Slides]
@@ -24,11 +24,11 @@ Winner of the
-
# Citation
+
If you use this code in your research, please cite the paper:
-```
+```bibtex
@inproceedings{labbe2020,
title={CosyPose: Consistent multi-view multi-object 6D pose estimation},
author={Y. {Labbe} and J. {Carpentier} and M. {Aubry} and J. {Sivic}},
@@ -37,8 +37,11 @@ year={2020}}
```
# News
+
- CosyPose is the winning method in the [BOP challenge 2020](https://bop.felk.cvut.cz/challenges/) (5 awards in total, including best overall method and best RGB-only method) ! All the code and models used for the challenge are available in this repository.
-- We participate in the [BOP challenge 2020](https://bop.felk.cvut.cz/challenges/bop-challenge-2020/). Results are available on the public [leaderboard](https://bop.felk.cvut.cz/leaderboards/) for 7 pose estimation benchmarks. We release 2D detection models (MaskRCNN) and 6D pose estimation models (coarse+refiner) used on each dataset.
+- We participate in the [BOP challenge 2020](https://bop.felk.cvut.cz/challenges/bop-challenge-2020/).
+Results are available on the public [leaderboard](https://bop.felk.cvut.cz/leaderboards/) for 7 pose estimation benchmarks.
+We release 2D detection models (MaskRCNN) and 6D pose estimation models (coarse+refiner) used on each dataset.
- The paper is available on arXiv and full code is released.
- Our paper on CosyPose is accepted at ECCV 2020.
@@ -47,41 +50,47 @@ year={2020}}
# Table of content
+
- [Overview](#overview)
- [Installation](#installation)
- [Downloading and preparing data](#downloading-and-preparing-data)
- [Note on GPU parallelization](#note-on-gpu-parallelization)
- [Reproducing single-view results](#reproducing-single-view-results)
- [Training the single-view 6D pose estimation models](#training-the-single-view-6D-pose-estimation-models)
- - [Synthetic data generation](#synthetic-data-generation)
+ - [Synthetic data generation script](#synthetic-data-generation-script)
- [Training script](#training-script)
- [Reproducing multi-view results](#reproducing-multi-view-results)
- [Using CosyPose in a custom scenario](#using-cosypose-in-a-custom-scenario)
- [BOP20 models and results](#bop20-models-and-results)
# Overview
+
This repository contains the code for the full CosyPose approach, including:
-### Single-view single-object 6D pose estimator
+
+## Single-view single-object 6D pose estimator
+
![Single view predictions](images/example_predictions.png)
- Given an RGB image and a 2D bounding box of an object with known 3D model, the 6D pose estimator predicts the full 6D pose of the object with respect to the camera. Our method is inspired from DeepIM with several simplications and technical improvements. It is fully implemented in pytorch and achieve single-view state-of-the-art on YCB-Video and T-LESS. We provide pre-trained models used in our experiments on both datasets. We make the training code that we used to train them available. It can be parallelized on multiple GPUs and multiple nodes.
+ Given an RGB image and a 2D bounding box of an object with known 3D model, the 6D pose estimator predicts the full 6D pose of the object with respect to the camera. Our method is inspired by DeepIM with several simplifications and technical improvements. It is fully implemented in pytorch and achieves single-view state-of-the-art results on YCB-Video and T-LESS. We provide pre-trained models used in our experiments on both datasets. We make the training code that we used to train them available. It can be parallelized on multiple GPUs and multiple nodes.
+
+## Synthetic data generation
-### Synthetic data generation
![Synthetic images](images/synthetic_images.png)
-The single-view 6D pose estimation models are trained on a mix of synthetic and real images. We provide the code for generating the additionnal synthetic images.
+The single-view 6D pose estimation models are trained on a mix of synthetic and real images. We provide the code for generating the additional synthetic images.
+
+## Multi-view multi-object scene reconstruction
-### Multi-view multi-object scene reconstruction
![Multiview](images/multiview.png)
-Single-view object-level reconstruction of a scene often fails because of detection mistakes, pose estimation errors and occlusions; which makes it inpractical for real applications. Our multi-view approach, CosyPose, addresseses these single-view limitations and helps improving 6D pose accuracy by leveraging information from multiple cameras with unknown positions. We provide the full code, including robust object-level multi-view matching and global scene refinement. The method is agnostic to the 6D pose estimator used, and can therefore be combined with many other existing single-view object pose estimation method to solve problems on other datasets, or in real scenarios. We provide a utility for running CosyPose given a set of input 6D object candidates in each image.
+Single-view object-level reconstruction of a scene often fails because of detection mistakes, pose estimation errors and occlusions, which makes it impractical for real applications. Our multi-view approach, CosyPose, addresses these single-view limitations and helps improve 6D pose accuracy by leveraging information from multiple cameras with unknown positions. We provide the full code, including robust object-level multi-view matching and global scene refinement. The method is agnostic to the 6D pose estimator used, and can therefore be combined with many other existing single-view object pose estimation methods to solve problems on other datasets, or in real scenarios. We provide a utility for running CosyPose given a set of input 6D object candidates in each image.
+## BOP challenge 2020: single-view 2D detection + 6D pose estimation models
-### BOP challenge 2020: single-view 2D detection + 6D pose estimation models
![BOP](images/bop_datasets.png)
We used our {coarse+refinement} single-view 6D pose estimation method in the [BOP challenge 2020](https://bop.felk.cvut.cz/challenges/bop-challenge-2020/). In addition, we trained a MaskRCNN detector (torchvision's implementation) on each of the 7 core datasets (LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, YCB-V). We provide 2D detectors and 6D pose estimation models for these datasets. All training (including 2D detector), inference and evaluation code are available in this repository. It can be easily used for another dataset in the BOP format.
-
# Installation
-```
+
+```sh
git clone --recurse-submodules https://github.com/ylabbe/cosypose.git
cd cosypose
conda env create -n cosypose --file environment.yaml
@@ -89,57 +98,69 @@ conda activate cosypose
git lfs pull
python setup.py install
```
+
The installation may take some time as several packages must be downloaded and installed/compiled. If you plan to change the code, run `python setup.py develop`.
Notes:
-- We use the [bop_toolkit](https://github.com/thodan/bop_toolkit) to compute some evaluation metrics on T-LESS. To ensure reproducibility, we use our [own fork](https://github.com/ylabbe/bop_toolkit_cosypose) of the repository. It is downloaded in `deps/`.
+- We use the [bop_toolkit](https://github.com/thodan/bop_toolkit) to compute some evaluation metrics on T-LESS. To ensure reproducibility, we use our [own fork](https://github.com/ylabbe/bop_toolkit_cosypose) of the repository. It is downloaded in `deps/`.
# Downloading and preparing data
+
Click for details...
All data used (datasets, models, results, ...) are stored in a directory `local_data` at the root of the repository. Create it with `mkdir local_data` or use a symlink if you want the data to be stored at a different place. We provide the utility `cosypose/scripts/download.py` for downloading required data and models. All of the files can also be [downloaded manually](https://drive.google.com/drive/folders/1JmOYbu1oqN81Dlj2lh6NCAMrC8pEdAtD?usp=sharing).
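+For example, a minimal setup (the storage path below is only an illustration):
+
+```sh
+# Either create the directory in place...
+mkdir local_data
+# ...or point it at a larger disk via a symlink.
+ln -s /mnt/bigdisk/cosypose_data local_data
+```
+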
## BOP Datasets
+
For both T-LESS and YCB-Video, we use the datasets in the [BOP format](https://bop.felk.cvut.cz/datasets/). If you already have them on your disk, place them in `local_data/bop_datasets`. Alternatively, you can download them using:
-```
+
+```sh
python -m cosypose.scripts.download --bop_dataset=ycbv
python -m cosypose.scripts.download --bop_dataset=tless
```
-Additionnal files that contain informations about the datasets used to fairly compare with prior works on both datasets.
-```
+Additional files that contain information about the datasets, used to fairly compare with prior works on both datasets, can be downloaded using:
+
+```sh
python -m cosypose.scripts.download --bop_extra_files=ycbv
python -m cosypose.scripts.download --bop_extra_files=tless
```
We use [pybullet](https://pybullet.org/wordpress/) for rendering images which requires object models to be provided in the URDF format. We provide converted URDF files, they can be downloaded using:
-```
+
+```sh
python -m cosypose.scripts.download --urdf_models=ycbv
python -m cosypose.scripts.download --urdf_models=tless.cad
```
In the BOP format, the YCB objects `002_master_chef_can` and `040_large_marker` are considered symmetric, but not by previous works such as PoseCNN, PVNet and DeepIM. To ensure a fair comparison (using ADD instead of ADD-S for ADD-(S) for these objects), these objects must *not* be considered symmetric in the evaluation. To keep the uniformity of the models format, we generate a set of YCB objects `models_bop-compat_eval` that can be used to fairly compare our approach against previous works. You can download them directly:
-```
+
+```sh
python -m cosypose.scripts.download --ycbv_compat_models
```
Notes:
+
- The URDF files were obtained using these commands (requires `meshlab` to be installed):
-```
-python -m cosypose.scripts.convert_models_to_urdf --models=ycbv
-python -m cosypose.scripts.convert_models_to_urdf --models=tless.cad
-```
+
+ ```sh
+ python -m cosypose.scripts.convert_models_to_urdf --models=ycbv
+ python -m cosypose.scripts.convert_models_to_urdf --models=tless.cad
+ ```
+
- Compatibility models were obtained using the following script:
-```
-python -m cosypose.scripts.make_ycbv_compat_models
-```
-## Pre-trained models
+ ```sh
+ python -m cosypose.scripts.make_ycbv_compat_models
+ ```
+
+## Pre-trained models for single-view estimator
+
The pre-trained models of the single-view pose estimator can be downloaded using:
-```
+```sh
# YCB-V Single-view refiner
python -m cosypose.scripts.download --model=ycbv-refiner-finetune--251020
@@ -153,8 +174,10 @@ python -m cosypose.scripts.download --model=tless-refiner--585928
```
## 2D detections
+
To ensure a fair comparison with prior works on both datasets, we use the same detections as DeepIM (from PoseCNN) on YCB-Video and the same as Pix2pose (from a RetinaNet model) on T-LESS. Download the saved 2D detections for both datasets using
-```
+
+```sh
python -m cosypose.scripts.download --detections=ycbv_posecnn
# SiSo detections: 1 detection with the highest score per class per image on all images
@@ -169,88 +192,109 @@ python -m cosypose.scripts.download --detections=tless_pix2pose_retinanet_vivo_a
If you are interested in re-training a detector, please see the BOP 2020 section.
-
Notes:
+
- The PoseCNN detections (and coarse pose estimates) on YCB-Video were extracted and converted from [these PoseCNN results](https://github.com/yuxng/YCB_Video_toolbox/blob/master/results_PoseCNN_RSS2018.zip).
- The Pix2pose detections were extracted using [pix2pose's](https://github.com/kirumang/Pix2Pose) code. We used the detection model from their paper, see [here](https://github.com/kirumang/Pix2Pose#download-pre-trained-weights). For the ViVo detections, their code was slightly modified. The code used to extract detections can be found [here](https://github.com/ylabbe/pix2pose_cosypose).
# Note on GPU parallelization
+
Click for details...
Training and evaluation code can be parallelized across multiple gpus and multiple machines using vanilla `torch.distributed`. This is done by simply starting multiple processes with the same arguments and assigning each process to a specific GPU via `CUDA_VISIBLE_DEVICES`. To run the processes on a local machine or on a SLURM cluster, we use our own utility [job-runner](https://github.com/ylabbe/job-runner) but other similar tools such as [dask-jobqueue](https://github.com/dask/dask-jobqueue) or [submitit](https://github.com/facebookincubator/submitit) could be used. We provide instructions for single-node multi-gpu training, and for multi-gpu multi-node training on a SLURM cluster.
## Single gpu on a single node
-```
+
+```sh
# CUDA ID of GPU you want to use
export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.example_multigpu
```
+
where `scripts.example_multigpu` can be replaced by `scripts.run_pose_training` or `scripts.run_cosypose_eval` (see below for usage of training/evaluation scripts).
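+
+For instance, a single-GPU evaluation run would look like this (a sketch; the `--config ycbv` flag is taken from the evaluation section below):
+
+```sh
+export CUDA_VISIBLE_DEVICES=0
+python -m cosypose.scripts.run_cosypose_eval --config ycbv
+```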
## Configuration of `job-runner` for multi-gpu usage
+
Change the path to the code directory, anaconda location and specify a temporary directory for storing job logs by modifying `job-runner-config.yaml`. If you have access to a SLURM cluster, specify the name of the queue, its specifications (number of GPUs/CPUs per node) and the flags you typically use in a slurm script. Once you are done, run:
-```
+
+```sh
runjob-config job-runner-config.yaml
```
## Multi-gpu on a single node
-```
+
+```sh
# CUDA IDS of GPUs you want to use
export CUDA_VISIBLE_DEVICES=0,1
runjob --ngpus=2 --queue=local python -m cosypose.scripts.example_multigpu
```
+
The logs of the first process will be printed. You can check the logs of the other processes in the job directory.
## On a SLURM cluster
-```
+
+```sh
runjob --ngpus=8 --queue=gpu_p1 python -m cosypose.scripts.example_multigpu
```
+
# Reproducing single-view results
+
Click for details...
## YCB-Video
-```
+
+```sh
python -m cosypose.scripts.run_cosypose_eval --config ycbv
```
+
This will run the inference and evaluation on YCB-Video. We use our own implementation of the evaluation. We have checked that it matches the results from the original [matlab implementation](https://github.com/yuxng/YCB_Video_toolbox) for the AUC of ADD-S and AUC of ADD(-S) metrics. For example, you can see that the PoseCNN results are similar to the ones reported in the PoseCNN/DeepIM paper:
-```
+
+```text
PoseCNN/AUC of ADD(-S): 0.613
```
The YCB-Video results and metrics can be downloaded directly:
-```
+
+```sh
python -m cosypose.scripts.download --result_id=ycbv-n_views=1--5154971130
```
## T-LESS
-```
+
+```sh
python -m cosypose.scripts.run_cosypose_eval --config tless-siso
```
+
This will run inference on the entire T-LESS dataset and print some metrics but not e_vsd<0.3 which is not supported in our code.
The results can also be downloaded:
-```
+
+```sh
python -m cosypose.scripts.download --result_id=tless-siso-n_views=1--684390594
```
To measure e_vsd<0.3, we use the BOP Toolkit. You can run it using:
-```
+
+```sh
python -m cosypose.scripts.run_bop_eval --result_id=tless-siso-n_views=1--684390594 --method=pix2pose_detections/refiner/iteration=4
```
-This will create a `local_data/bop_predictions_csv/cosyposeXXXX-eccv2020_tless-test-primesense.csv` file in the BOP format and run evaluation. Intermediate metrics and final scores are saved in `local_data/bop_eval_outputs/cosposyXXXX-eccV2020_tless-test-primesense/`, where `XXXXX` correponds to a random number generated by the script.
+
+This will create a `local_data/bop_predictions_csv/cosyposeXXXX-eccv2020_tless-test-primesense.csv` file in the BOP format and run evaluation. Intermediate metrics and final scores are saved in `local_data/bop_eval_outputs/cosyposeXXXX-eccv2020_tless-test-primesense/`, where `XXXX` corresponds to a random number generated by the script.
The T-LESS SiSo results can also be downloaded directly:
-```
+
+```sh
python -m cosypose.scripts.download --bop_result_id=cosypose847205-eccv2020_tless-test-primesense
```
You can check the results match those from the paper:
-```
+
+```console
cat local_data/bop_eval_outputs/cosypose847205-eccv2020_tless-test-primesense/error\=vsd_ntop\=1_delta\=15.000_tau\=20.000/scores_th\=0.300_min-visib\=0.100.json
{
@@ -264,74 +308,90 @@ cat local_data/bop_eval_outputs/cosypose847205-eccv2020_tless-test-primesense/er
"tp_count": 31922
}
```
+
Following other works, we reported `mean_obj_recall` in the paper.
## Single-view visualization
+
You can visualize the single-view predictions using [this](notebooks/visualize_singleview_predictions.ipynb) notebook as example.
# Training the single-view 6D pose estimation models
+
Click for details...
## Downloading synthetic images
+
The pose estimation models are trained on a mix of real images provided with the T-LESS/YCB-Video datasets and a set of images that we generated. For each dataset, we generated 1 million synthetic images. You can download these **large** datasets:
-```
+
+```sh
# 106 GB
python -m cosypose.scripts.download --synt_dataset=tless-1M
# 113 GB
python -m cosypose.scripts.download --synt_dataset=ycbv-1M
```
+
We provide below the instructions to generate these datasets locally if you are interested in using our synthetic data generation code.
-## Synthetic data generation
+## Synthetic data generation script
### Textures for domain randomization
+
The synthetic training images are generated with some domain randomization. It includes adding textures to the background (and to the objects for T-LESS). We use a set of textures extracted from ShapeNet objects. Download the texture dataset:
-```
+
+```sh
python -m cosypose.scripts.download --texture_dataset
```
### Recording a synthetic dataset
-The synthetic images are generated using multiple proceses managed by [dask](https://docs.dask.org/en/latest/setup/single-distributed.html). The synthetic training images can be generated using the following commands for both datasets:
-```
+
+The synthetic images are generated using multiple processes managed by [dask](https://docs.dask.org/en/latest/setup/single-distributed.html). The synthetic training images can be generated using the following commands for both datasets:
+
+```sh
export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.run_dataset_recording --config tless --local
python -m cosypose.scripts.run_dataset_recording --config ycbv --local
```
+
Make sure that enough space is available on your disk. We generate 1 million images, which is around 120GB for each dataset. Note that we use a high number of synthetic images, but it may be possible to use fewer images. Please see the script `scripts/run_dataset_recording.py` directly for additional parameters. It is also possible to use [dask-jobqueue](https://jobqueue.dask.org/en/latest/) to generate the images on a cluster but we do not provide a simple configuration script for this at the moment. If you are interested in generating data using multiple machines on a cluster, you will have to modify dask-jobqueue's `Cluster` definition [here](cosypose/recording/record_dataset.py).
### Visualizing images of the dataset
-You can visualize the images of the generated dataset using [this](notebooks/inspect_dataset.py) notebook. You can check that the ground truth prvided by a dataset is correct using [this](notebooks/render_dataset.py) notebook.
+You can visualize the images of the generated dataset using [this](notebooks/inspect_dataset.py) notebook. You can check that the ground truth provided by a dataset is correct using [this](notebooks/render_dataset.py) notebook.
## Background images for data augmentation
+
We apply data augmentation to the training images. Data augmentation includes pasting random images of the pascal VOC dataset on the background of the scenes. You can download Pascal VOC using the following commands:
-```
+
+```sh
cd local_data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
```
+
(If the website is down, which happens periodically, you can alternatively download these files from [a mirror](https://pjreddie.com/projects/pascal-voc-dataset-mirror/) at https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar)
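+
+For convenience, the mirror can be used with the same commands (a sketch; only the download URL changes):
+
+```sh
+cd local_data
+wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
+tar -xvf VOCtrainval_11-May-2012.tar
+```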
## Training script
+
Once you have generated the synthetic data images and downloaded Pascal VOC, you can run the training script. On YCB-Video, we train a coarse model on synthetic data only and fine-tune it on the synthetic + real images. On T-LESS, we train a coarse and a refinement model on synthetic + provided real images of isolated objects directly from scratch. In our experiments, all models are trained using the same procedure on 32 GPUs.
-```
+```sh
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config ycbv-refiner-syntonly
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config ycbv-refiner-finetune
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config tless-coarse
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config tless-refiner
```
-You can visualize the logs of the provided models in [this](notebooks/paper_training_logs.ipynb) notebook.
+You can visualize the logs of the provided models in [this](notebooks/paper_training_logs.ipynb) notebook.
![Logs](images/screenshot_logs.png)
You can add the `run_id` of each model that you are training to visualize training metrics.
Notes:
+
- While we used 32 GPUs in our experiments, the training script can be run with any number of GPUs. It will just be slower and the overall batch size will be smaller. We have not studied the impact of batch size on the final performance of the model. On 32 NVIDIA V100, training a model takes approximately 10 hours. Note that the models are trained from scratch on all the objects of each dataset simultaneously.
- If you are interested in training with limited resources, you could consider the following changes to the code: (a) use a smaller backbone, e.g. flownet, resnet18 or resnet34, (b) train for fewer iterations, (c) start from one of our pre-trained models. All the parameters are defined in `cosypose/scripts/run_pose_training.py` (see the sketch after these notes). If you are trying to train with limited resources or on your own dataset and data, please do not hesitate to share your experience, by opening an issue or by sending an email!
- We run evaluation of the models a few times during training. You can disable it by adding the flag `--no-eval` to speed up training. Note that we do not use the evaluation metrics to find the best model since no official validation splits are available for YCB-Video/T-LESS. We always report results for the model obtained at the end of the training.
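+
+A minimal reduced-resource run might look like this (a sketch; only the GPU count and the `--no-eval` flag differ from the commands above, everything else keeps its defaults from `cosypose/scripts/run_pose_training.py`):
+
+```sh
+# Train the YCB-V synthetic-only refiner on 2 local GPUs, skipping intermediate evaluation.
+export CUDA_VISIBLE_DEVICES=0,1
+runjob --ngpus=2 --queue=local python -m cosypose.scripts.run_pose_training --config ycbv-refiner-syntonly --no-eval
+```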
@@ -339,20 +399,24 @@ Notes:
# Reproducing multi-view results
+
Click for details...
The following scripts will run the full CosyPose pipeline (single-view predictions + multi-view scene reconstruction), compute the metrics reported in the paper and save the results to a directory in `local_data/results/`.
-```
+
+```sh
export CUDA_VISIBLE_DEVICES=0
python -m cosypose.scripts.run_cosypose_eval --config tless-vivo --nviews=4
python -m cosypose.scripts.run_cosypose_eval --config tless-vivo --nviews=8
python -m cosypose.scripts.run_cosypose_eval --config ycbv --nviews=5
```
+
Note that the inference and evaluation can be sped up using `runjob` if you have access to multiple GPUs. The mAP@ADD-S<0.1d and AUC of ADD-S metrics are computed using our own code since they are not supported by the BOP toolkit. We refer to the appendix of the main paper for more details on these metrics.
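+
+For example (a sketch, reusing the `runjob` setup described in the GPU parallelization section):
+
+```sh
+# Distribute the T-LESS ViVo 8-view evaluation over 4 local GPUs.
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+runjob --ngpus=4 --queue=local python -m cosypose.scripts.run_cosypose_eval --config tless-vivo --nviews=8
+```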
The results can be also downloaded directly:
-```
+
+```sh
# YCB-Video 5 views
python -m cosypose.scripts.download --result_id=ycbv-n_views=5--8073381555
@@ -364,13 +428,16 @@ python -m cosypose.scripts.download --result_id=tless-vivo-n_views=8--2322743008
```
On T-LESS ViVo, the evsd<0.3 and ADD-S<0.1d metrics are computed using the BOP toolkit, for example for computing the multi-view results for ViVo 8 views:
-```
+
+```sh
python -m cosypose.scripts.run_bop_eval --results tless-vivo-n_views=8--2322743008 --method pix2pose_detections/ba_output+all_cand --vivo
```
+
The `ba_output+all_cand` predictions correspond to the output of CosyPose concatenated to all the single-view candidates as explained in the experiment section of the paper. The single-view candidates have strictly lower score than the multi-view predictions, which means that single-view estimates are used for evaluation only if there are no multi-view predictions for an object, e.g. typically because a camera cannot be placed with respect to the scene because there are too few inlier candidates.
We also provide the BOP evaluation results that we computed and reported in the paper:
-```
+
+```sh
# T-LESS ViVo 1 view
python -m cosypose.scripts.download --bop_results=cosypose68486-eccv2020_tless-test-primesense
@@ -382,29 +449,32 @@ python -m cosypose.scripts.download --bop_result_id=cosypose114533-eccv2020_tles
```
## Multi-view visualization
+
You can use [this](notebooks/visualize_multiview_predictions.ipynb) notebook to visualize the multi-view results on YCB-Video and T-LESS and generate the 3D visualization GIFs.
![plots_cosypose](images/screenshot_plots_cosypose.png)
![GIF](notebooks/gifs/scene_ds=tless.primesense.test.bop19-scene=16-nviews=8-scene_group=105.gif)
-
-# Running CosyPose in a custom scenario
+# Using CosyPose in a custom scenario
+
Click for details...
Stages 2 and 3 of CosyPose are agnostic to the 6D pose estimator used, and can therefore be combined with many other existing single-view object pose estimation methods to solve problems on other datasets, or for real applications. We provide a utility for running CosyPose given a set of input 6D object candidates in each image.
If you are willing to combine CosyPose with your own pose estimator, you will need to provide the following (an example layout is sketched after this list):
+
- The 3D models of the objects considered and their associated symmetries. The models should be provided in a format similar to the BOP format in a `models` directory.
- A set of input 6D object candidates in each image `candidates.csv`. We use the same convention as the BOP format, but all the candidates in this file must be provided for a unique scene (a single 3D reconstruction) in different views.
- The intrinsics parameters of the cameras of each view in a file `scene_camera.json` following the BOP format.
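+
+Concretely, the scenario directory could look like this (an illustrative sketch; only `models/`, `candidates.csv` and `scene_camera.json` are required, and the directory name follows the commands below):
+
+```sh
+# local_data/custom_scenarios/example/
+#   models/             # 3D models and their symmetries, in a BOP-like format
+#   candidates.csv      # input 6D object candidates for a single scene, across views
+#   scene_camera.json   # per-view camera intrinsics, BOP format
+```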
Use these commands to create a custom scenario with T-LESS objects and run CosyPose on it:
-```
+
+```sh
cd local_data
mkdir -p custom_scenarios/example
ln -s $(pwd)/bop_datasets/tless/models custom_scenarios/example
@@ -415,6 +485,7 @@ python -m cosypose.scripts.run_custom_scenario --scenario=example
```
This will generate the following files:
+
- `results/subscene=0/predicted_scene.json` a set of predicted objects and cameras with their associated poses in a common reference frame.
- `results/subscene=0/scene_reprojected.csv` poses of predicted objects expressed in camera frames, in the BOP format.
@@ -423,6 +494,7 @@ This will generate the following files:
You can use this as an example to check the different formats in which the information should be provided.
Notes:
+
- This is experimental. The default parameters for the pipeline should give good results in many scenarios (we use the same on YCB-Video and T-LESS) but we have not yet conducted experiments in many custom scenarios. If you are trying to apply CosyPose to your own 6D pose estimations and encounter any issues or would like to obtain better results, please consider sharing your experience; I would be very happy to help you.
- The script is quite slow to run for a single scene because all models need to be loaded and the first cuda call with pytorch is always slow. If you would like to use this for an application, consider using directly the API of the `MultiviewScenePredictor` in your own code. You can use the script `scripts/run_custom_scenario.py` as an example on how to use it.
@@ -432,10 +504,12 @@ Notes:
# BOP20 models and results
+
Click for details...
We provide the training code that we used to train single-view single-object pose estimation models on the 7 core datasets (LM-O, TLESS, TUD-L, IC-BIN, ITODD, HB, YCB-V) and pre-trained detector and pose estimation models. Note that these models are different from the ones used in the paper. The differences with the models used in the paper are the following:
+
- In the paper, we use already available detectors for T-LESS and YCB-Video. For the BOP20 challenge, we trained our own detectors on each dataset.
- Detection and pose estimation models are trained using PBR synthetic images provided with the BOP challenge instead of using our own synthetic data to make it easier to compare fairly with the other approaches.
- In the BOP20 challenge results, the initialization of the pose provided to the coarse model is slightly different. First, the canonical orientation has been changed to have the z-axis parallel to the camera instead of having the x-axis parallel to the camera, a position with z-axis upward and parallel to the camera makes the overall shape and details of the objects more visible. Second, instead of fixing the z value of the canonical translation to 1 meter, we compute a guess of object depth using the height and width of the 2D bounding box and the 3D model. This makes the method more general as the canonical depth is always within a reasonable range of the correct depth even if the object is very far from the camera.
@@ -443,19 +517,23 @@ We provide the training code that we used to train single-view single-object pos
Even though the challenge is focused on single-view pose estimation, we also reported multi-view results on YCB-Video, T-LESS and HB for 4 and 8 views.
## Downloading BOP datasets
-```
+
+```sh
python -m cosypose.scripts.download --bop_dataset=DATASET --pbr_training_images
python -m cosypose.scripts.download --urdf_models=DATASET
```
+
for DATASET={hb,icbin,itodd,lm,lmo,tless,tudl,ycbv}. If you are not interested in training the models, you can remove the flag --pbr_training_images and you can omit lm.
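+
+For example, for YCB-V (a sketch; substitute any dataset name from the list above for DATASET):
+
+```sh
+python -m cosypose.scripts.download --bop_dataset=ycbv --pbr_training_images
+python -m cosypose.scripts.download --urdf_models=ycbv
+```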
-## Pre-trained models
+## Downloading pre-trained models
+
You can download all the models that we trained for the challenge using our downloading script:
-```
+
+```sh
python -m cosypose.scripts.download --model=model_id
```
-where model_id is given by the table below:
+where model_id is given by the table below:
| Dataset | Model type | Training images | `model_id` |
|---------|------------|-----------------|--------------------------------------|
@@ -496,38 +574,43 @@ where model_id is given by the table below:
| ycbv | coarse | SYNT+REAL | coarse-bop-ycbv-synt+real--822463 |
| ycbv | refiner | SYNT+REAL | refiner-bop-ycbv-synt+real--631598 |
-
The detectors are MaskRCNN models with resnet50 FPN backbone. PBR corresponds to training only on provided synthetic images. SYNT+REAL corresponds to training on all available synthetic and real images when available (only for tless, tudl and ycbv). SYNT+REAL models are pre-trained from PBR.
If you want to use all the models for a complete evaluation:
-```
+
+```sh
python -m cosypose.scripts.download --all_bop20_models
```
## Running inference
+
The following commands will reproduce the results that we reported on the [leaderboard](https://bop.felk.cvut.cz/leaderboards/) for all the datasets:
-```
-# CosyPose-ECCV20-PBR-1VIEW
+
+```sh
+# CosyPose-ECCV20-PBR-1VIEW
python -m cosypose.scripts.run_bop_inference --config bop-pbr
# CosyPose-ECCV20-SYNT+REAL-1VIEW
python -m cosypose.scripts.run_bop_inference --config bop-synt+real
-# CosyPose-ECCV20-SYNT+REAL-1VIEW-ICP
+# CosyPose-ECCV20-SYNT+REAL-1VIEW-ICP
python -m cosypose.scripts.run_bop_inference --config bop-synt+real --icp
-# CosyPose-ECCV20-SYNT+REAL-4VIEWS
+# CosyPose-ECCV20-SYNT+REAL-4VIEWS
python -m cosypose.scripts.run_bop_inference --config bop-synt+real --nviews=4
-# CosyPose-ECCV20-SYNT+REAL-8VIEWS
+# CosyPose-ECCV20-SYNT+REAL-8VIEWS
python -m cosypose.scripts.run_bop_inference --config bop-synt+real --nviews=8
```
+
The inference script is compatible with `runjob`.
Inference results on all datasets can be downloaded directly:
-```
+
+```sh
python -m cosypose.scripts.download --result_id=result_id
```
+
where result_id is given by the table below
| BOP20 method name | `result_id` |
@@ -539,11 +622,13 @@ where result_id is given by the table below
| CosyPose-ECCV20-SYNT+REAL-8VIEWS | bop-synt+real-nviews=8--763684 |
If you want to download everything:
-```
+
+```sh
python -m cosypose.scripts.download --all_bop20_results
```
Notes:
+
- The ICP refiner was adapted from [Pix2Pose code](https://github.com/kirumang/Pix2Pose/blob/843effe0097e9982f4b07dd90b04ede2b9ee9294/tools/5_evaluation_bop_icp3d.py#L57). Be careful if you want to use it: it slightly decreases performance over RGB-only on T-LESS instead of improving the results. Qualitative results show a misalignment of many objects after ICP; there is likely a small bug in my version, but I haven't had time to investigate in detail. Note that our method and paper are focused on the RGB-only setting.
@@ -551,30 +636,39 @@ Notes:
## Running evaluation
+
You can run locally the evaluation on the publicly available test sets:
-```
+
+```sh
python -m cosypose.scripts.run_bop20_eval_multi --result_id=result_id --method=method
```
+
where method is `maskrcnn_detections/refiner/iteration=4` for single-view, `maskrcnn_detections/icp` when ICP is run, and `maskrcnn_detections/multiview` for multi-view (n_views > 1).
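+
+For example, to evaluate the 8-view multi-view results (a sketch; the `result_id` comes from the table above):
+
+```sh
+python -m cosypose.scripts.run_bop20_eval_multi --result_id=bop-synt+real-nviews=8--763684 --method=maskrcnn_detections/multiview
+```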
If you are only interested in generating the bop predictions file suitable for submission to the website, you can run
-```
+
+```sh
python -m cosypose.scripts.run_bop20_eval_multi --result_id=result_id --method=method --convert_only
```
## Training details
### Detection
+
We use torchvision's MaskRCNN implementation for the detection. The models were trained using:
-```
+
+```sh
runjob --ngpus=32 python -m cosypose.scripts.run_detector_training --config bop-DATASET-TRAINING_IMAGES
```
+
where DATASET={lmo,tless,tudl,icbin,itodd,hb,ycbv} and TRAINING_IMAGES={pbr,synt+real} (synt+real only for datasets where real images are available: tless, tudl and ycbv).
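+
+For example (a sketch; any other dataset/training-image combination from the lists above works the same way):
+
+```sh
+# Train the YCB-V detector on synthetic + real images, on 32 GPUs.
+runjob --ngpus=32 python -m cosypose.scripts.run_detector_training --config bop-ycbv-synt+real
+```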
### Pose estimation
-```
+
+```sh
runjob --ngpus=32 python -m cosypose.scripts.run_pose_training --config bop-DATASET-TRAINING_IMAGES-MODEL_TYPE
```
+
where MODEL_TYPE={coarse,refiner}.
From 29fe8e819a4178455b8f16f1a3f326a6bc04261a Mon Sep 17 00:00:00 2001
From: Toefinder <43576719+Toefinder@users.noreply.github.com>
Date: Wed, 21 Dec 2022 19:59:22 +0800
Subject: [PATCH 3/6] Clone from Simple-Robotics instead of the original repo
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 790412b..42ac54b 100644
--- a/README.md
+++ b/README.md
@@ -91,7 +91,7 @@ We used our {coarse+refinement} single-view 6D pose estimation method in the [BO
# Installation
```sh
-git clone --recurse-submodules https://github.com/ylabbe/cosypose.git
+git clone --recurse-submodules https://github.com/Simple-Robotics/cosypose.git
cd cosypose
conda env create -n cosypose --file environment.yaml
conda activate cosypose
From cd3ca822b4468a31b167902d719ee84054395c55 Mon Sep 17 00:00:00 2001
From: Toefinder <43576719+Toefinder@users.noreply.github.com>
Date: Mon, 26 Dec 2022 21:55:06 +0800
Subject: [PATCH 4/6] Update BOP dataset download link
---
cosypose/scripts/download.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/cosypose/scripts/download.py b/cosypose/scripts/download.py
index dfb3b95..c04e239 100644
--- a/cosypose/scripts/download.py
+++ b/cosypose/scripts/download.py
@@ -14,7 +14,7 @@
DOWNLOAD_DIR = LOCAL_DATA_DIR / 'downloads'
DOWNLOAD_DIR.mkdir(exist_ok=True)
-BOP_SRC = 'http://ptak.felk.cvut.cz/6DB/public/bop_datasets/'
+BOP_SRC = 'https://bop.felk.cvut.cz/media/data/bop_datasets/'
BOP_DATASETS = {
'ycbv': {
'splits': ['train_real', 'train_synt', 'test_all']
From 08614c6542e7b2342950e65f8ca23c199b140c72 Mon Sep 17 00:00:00 2001
From: Toefinder <43576719+Toefinder@users.noreply.github.com>
Date: Tue, 27 Dec 2022 17:37:28 +0800
Subject: [PATCH 5/6] Add note for liburdfdom-tools dependency
---
README.md | 1 +
1 file changed, 1 insertion(+)
diff --git a/README.md b/README.md
index 42ac54b..a92470f 100644
--- a/README.md
+++ b/README.md
@@ -104,6 +104,7 @@ The installation may take some time as several packages must be downloaded and i
Notes:
- We use the [bop_toolkit](https://github.com/thodan/bop_toolkit) to compute some evaluation metrics on T-LESS. To ensure reproducibility, we use our [own fork](https://github.com/ylabbe/bop_toolkit_cosypose) of the repository. It is downloaded in `deps/`.
+- The package `pinocchio` also requires `liburdfdom-tools` to be installed on the system. On Ubuntu, you can install it by running `sudo apt install liburdfdom-tools`.
# Downloading and preparing data
From 62f8d280b2182075c01c517c015d83a1470d4d31 Mon Sep 17 00:00:00 2001
From: Toefinder <43576719+Toefinder@users.noreply.github.com>
Date: Wed, 4 Jan 2023 23:18:39 +0800
Subject: [PATCH 6/6] Pin jinja2 dependency version
---
environment.yaml | 1 +
1 file changed, 1 insertion(+)
diff --git a/environment.yaml b/environment.yaml
index 3e5be90..b67b4ed 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -36,6 +36,7 @@ dependencies:
- xarray==0.14.1
- pyarrow==0.15.1
- matplotlib==3.1.2
+ - jinja2==3.0.0
- bokeh==1.4.0
- plyfile==0.7.1
- trimesh==3.5.16