Princeton Vision & Learning Lab (PVL)
We recommend creating a python enviroment with anaconda.
conda create -n OGNIDC python=3.8
conda activate OGNIDC
# For CUDA Version == 11.3
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install mmcv==1.4.4 -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.10/index.html
pip install mmsegmentation==0.22.1
pip install timm tqdm thop tensorboardX tensorboard opencv-python ipdb h5py ipython Pillow==9.5.0 plyfile einops
We used NVIDIA Apex for multi-GPU training. Apex can be installed as follows:
git clone https://github.com/NVIDIA/apex
cd apex
git reset --hard 4ef930c1c884fdca5f472ab2ce7cb9b505d26c1a
conda install cudatoolkit-dev=11.3 -c conda-forge
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
You may face the bug ImportError: cannot import name 'container_abcs' from 'torch._six'
. In this case, change line 14 of apex/apex/_amp_state.py to import collections.abc as container_abcs
and re-install apex.
(Needed only if you use NLSPN) Build and install DCN module.
cd src/model/deformconv
sh make.sh
Create a folder named datasets
and put all datasets under it.
We used preprocessed NYUv2 HDF5 dataset provided by Fangchang Ma.
cd datasets
wget http://datasets.lids.mit.edu/sparse-to-dense/data/nyudepthv2.tar.gz
tar -xvf nyudepthv2.tar.gz
After that, you will get a data structure as follows:
nyudepthv2_h5
├── train
│ ├── basement_0001a
│ ... ├── 00001.h5
│ └── ...
└── val
└── official
├── 00001.h5
└── ...
Download the following files and unzip under the kitti_depth
folder:
data_depth_annotated, data_depth_velodyne, data_depth_selection
Finally, download kitti raw images by:
cd datasets/kitti_depth
wget https://github.com/youmi-zym/CompletionFormer/files/12575038/kitti_archives_to_download.txt
wget -i kitti_archives_to_download.txt -P kitti_raw/
cd kitti_raw
unzip "*.zip"
The overall data directory is structured as follows:
kitti_depth
├──data_depth_annotated
| ├── train
| └── val
├── data_depth_velodyne
| ├── train
| └── val
├── data_depth_selection
| ├── test_depth_completion_anonymous
| |── test_depth_prediction_anonymous
| └── val_selection_cropped
└── kitti_raw
├── 2011_09_26
├── 2011_09_28
├── 2011_09_29
├── 2011_09_30
└── 2011_10_03
First download the zip files (you can use gdown) under datasets
:
cd datasets
https://drive.google.com/open?id=1rzTFD35OCxMIguxLDcBxuIdhh5T2G7h4
Under the datasets
folder, run
sh ../unzip_void.sh
Finally, the file structure should be:
void_release
├── void_150
│ ├── data
│ │ ├── birthplace_of_internet
│ │ └── ...
│ ├── test_absolute_pose.txt
│ └── ...
├── void_500
│ └── ...
└── void_1500
└── ...
We use the dataset pre-processed by the VPP4DC authors. You can download from:
https://drive.google.com/open?id=1y8Rt3Hld8zVTSKxx9d9yYXSzr5niKN7i
Unzip it and get the file structure:
void_release
└── pregenerated
└── val
├── gt
├── hints
├── intrinsics
└── rgb
Download checkpoints from
https://drive.google.com/drive/folders/1LWrb1uFcJ5SGJdS8a9aqeyzkRMYyECRt?usp=sharing
and puts them under the checkpoints
folder.
cd src
# NYU in-domain (500 points)
sh testing_scripts/test_nyu.sh
# NYUv2 sparsity level generalization (5~20,000 points)
sh testing_scripts/test_nyu_sparse_inputs.sh
# KITTI sparsity level generalization (8~64 lines)
sh testing_scripts/test_kitti_sparse_inputs.sh
# Zero-shot test on VOID
sh testing_scripts/test_void.sh
# Zero-shot test on DDAD
sh testing_scripts/test_ddad.sh
# Example command for generating KITTI online server submission file
sh testing_scripts/test_kitti_server_submit.sh
Training on NYU requires 1x24GB GPU (e.g., RTX 3090) and ~3 days. Training on KITTI requires 8x48GB GPUs (e.g., RTX A6000) and ~7 days.
cd src
# NYU
sh training_scripts/train_nyu_generalizable.sh
# KITTI
sh training_scripts/train_kitti_generalizable.sh
# NYU best in-domain performance
sh training_scripts/train_nyu_best_performance.sh
# KITTI online server (L1)
sh training_scripts/train_kitti_best_performance_l1.sh
# KITTI online server (L1+L2)
sh training_scripts/train_kitti_best_performance_l1+l2.sh
This codebase is developed based on CompletionFormer by Youmin Zhang et al. We thank the authors for making their code public.
If you find our work helpful please consider citing our paper:
@inproceedings{ognidc,
title={OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations},
author={Zuo, Yiming and Deng, Jia},
booktitle={European Conference on Computer Vision (ECCV)},
pages={78--95},
year={2024}
}