When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo
Authors: Tianqi Liu, Xinyi Ye, Weiyue Zhao, Zhiyu Pan, Min Shi*, Zhiguo Cao
Institute: Huazhong University of Science and Technology
ICCV 2023
Learning-based multi-view stereo (MVS) methods rely heavily on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., with a Transformer. Albeit useful, these techniques introduce heavy computation overheads for MVS: each pixel densely attends to the whole image. In contrast, we propose to constrain non-local feature augmentation within a pair of lines: each point attends only to its corresponding pair of epipolar lines. Our idea takes inspiration from classic epipolar geometry, which shows that a point with different depth hypotheses is projected onto an epipolar line in the other view. This constraint reduces the 2D search space to the epipolar line in stereo matching. Similarly, it suggests that matching in MVS amounts to distinguishing a series of points lying on the same line. Inspired by this point-to-line search, we devise a line-to-point non-local augmentation strategy. We first devise an optimized searching algorithm to split the 2D feature maps into epipolar line pairs. Then, an Epipolar Transformer (ET) performs non-local feature augmentation among epipolar line pairs. We incorporate the ET into a learning-based MVS baseline, named ET-MVSNet. ET-MVSNet achieves state-of-the-art reconstruction performance on both the DTU and Tanks-and-Temples benchmarks with high efficiency.
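To illustrate the core idea, here is a minimal sketch (not the official ET-MVSNet code) of cross-attention restricted to epipolar lines: each reference pixel attends only to features sampled along its epipolar line in a source view. The sampling step that produces src_line_feat (the paper's optimized searching algorithm) is omitted, and all names and tensor shapes are illustrative assumptions.

# Minimal sketch of epipolar-constrained cross-attention (illustrative only,
# not the official ET-MVSNet implementation).
import torch
import torch.nn.functional as F

def epipolar_attention(ref_feat, src_line_feat):
    # ref_feat:      [B, N, C]    one feature vector per reference pixel
    # src_line_feat: [B, N, L, C] L features sampled along that pixel's
    #                             epipolar line in the source view
    _, _, C = ref_feat.shape
    q = ref_feat.unsqueeze(2)                     # [B, N, 1, C]
    k = v = src_line_feat                         # [B, N, L, C]
    attn = (q @ k.transpose(-1, -2)) / C ** 0.5   # [B, N, 1, L]
    attn = F.softmax(attn, dim=-1)
    out = (attn @ v).squeeze(2)                   # [B, N, C]
    return ref_feat + out                         # residual augmentation

# Toy usage: 100 reference pixels, 32 samples per epipolar line, 64 channels.
ref = torch.randn(1, 100, 64)
line = torch.randn(1, 100, 32, 64)
print(epipolar_attention(ref, line).shape)        # torch.Size([1, 100, 64])

Because each pixel attends to L line samples instead of the whole H×W image, the attention cost drops from O(N·HW) to O(N·L).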
conda create -n etmvsnet python=3.10.8
conda activate etmvsnet
pip install -r requirements.txt
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 -f https://download.pytorch.org/whl/torch_stable.html
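After installation, a quick sanity check (a generic snippet, not part of this repository) can confirm that the expected PyTorch and CUDA builds are in place:

# Generic environment sanity check (not part of this repository).
import torch
import torchvision

print("torch:", torch.__version__)               # expect 1.13.1+cu116
print("torchvision:", torchvision.__version__)   # expect 0.14.1+cu116
print("CUDA available:", torch.cuda.is_available())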
Training data. We use the same DTU training data as mentioned in MVSNet and CasMVSNet. Download DTU training data and Depth raw. Unzip and organize them as:
dtu_training
├── Cameras
├── Depths
├── Depths_raw
└── Rectified
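A small helper (hypothetical, not shipped with this repo) can verify the layout above before training:

# Hypothetical check that dtu_training is organized as described above.
import os

def check_dtu_training(root="dtu_training"):
    expected = ["Cameras", "Depths", "Depths_raw", "Rectified"]
    missing = [d for d in expected if not os.path.isdir(os.path.join(root, d))]
    if missing:
        raise FileNotFoundError(f"Missing folders under {root}: {missing}")
    print("dtu_training layout looks correct.")

check_dtu_training()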
Testing Data. Download DTU testing data. Unzip it as:
dtu_testing
├── scan1
├── scan4
├── ...
Download BlendedMVS and unzip it as:
blendedmvs
├── 5a0271884e62597cdee0d0eb
├── 5a3ca9cb270f0e3f14d0eddb
├── ...
├── training_list.txt
├── ...
Download Tanks and Temples and unzip it as:
tanksandtemples
├── advanced
│ ├── Auditorium
│ ├── ...
└── intermediate
├── Family
├── ...
We use the camera parameters of the short depth range version (included in your download); you should manually replace the cams folder in the intermediate folder with the short depth range version. Alternatively, you can download our processed data here. We recommend the latter.
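If you do the replacement manually, a rough sketch of the copy step could look like the following; the folder name cams_short is purely an assumption and must be adjusted to whatever your download actually uses:

# Hypothetical sketch of swapping in the short depth range cameras.
# "cams_short" is an assumed folder name; adjust it to match your download.
import os
import shutil

root = "tanksandtemples/intermediate"
for scene in sorted(os.listdir(root)):
    short_cams = os.path.join(root, scene, "cams_short")
    cams = os.path.join(root, scene, "cams")
    if os.path.isdir(short_cams):
        shutil.rmtree(cams, ignore_errors=True)
        shutil.copytree(short_cams, cams)
        print(f"replaced cams for {scene}")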
To train the model from scratch on DTU, specify DTU_TRAINING in ./scripts/train_dtu.sh first and then run:
bash scripts/train_dtu.sh exp_name
After training, you will get model checkpoints in ./checkpoints/dtu/exp_name.
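Fine-tuning and testing below need a checkpoint path (BLD_CKPT_FILE / DTU_CKPT_FILE). A quick, hypothetical way to list what training produced; the .ckpt extension and the dictionary layout are assumptions about the training script:

# Hypothetical snippet to list and inspect saved checkpoints.
import glob
import torch

ckpts = sorted(glob.glob("./checkpoints/dtu/exp_name/*.ckpt"))
print("found checkpoints:", ckpts)
if ckpts:
    state = torch.load(ckpts[-1], map_location="cpu")
    if isinstance(state, dict):
        print("top-level keys:", list(state.keys()))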
To fine-tune the model on BlendedMVS, you need to specify BLD_TRAINING and BLD_CKPT_FILE in ./scripts/train_bld.sh first, then run:
bash scripts/train_bld.sh exp_name
For DTU testing, we use the model (pretrained model) trained on the DTU training dataset. Specify DTU_TESTPATH and DTU_CKPT_FILE in ./scripts/test_dtu.sh first, then run the following command to generate point cloud results.
bash scripts/test_dtu.sh exp_name
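To sanity-check a generated point cloud before the MATLAB evaluation, one hedged option is open3d (not in requirements.txt; install it separately if you want this). The path below is a placeholder for wherever test_dtu.sh writes its fused .ply files:

# Hypothetical inspection of a fused point cloud with open3d
# (pip install open3d). Adjust ply_path to the actual test output.
import open3d as o3d

ply_path = "path/to/fused_point_cloud.ply"  # placeholder path
pcd = o3d.io.read_point_cloud(ply_path)
print(pcd)  # prints the number of points
o3d.visualization.draw_geometries([pcd])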
For quantitative evaluation, download SampleSet and Points from DTU's website. Unzip them and place the Points folder in SampleSet/MVS Data/. The structure should look like:
SampleSet
├── MVS Data
    └── Points
Specify datapath, plyPath, resultsPath in evaluations/dtu/BaseEvalMain_web.m and datapath, resultsPath in evaluations/dtu/ComputeStat_web.m, then run the following commands to obtain the quantitative metrics.
cd evaluations/dtu
matlab -nodisplay
BaseEvalMain_web
ComputeStat_web
We recommend using the finetuned model (pretrained model) to test on the Tanks and Temples benchmark. Similarly, specify TNT_TESTPATH and TNT_CKPT_FILE in scripts/test_tnt_inter.sh and scripts/test_tnt_adv.sh. To generate point cloud results, just run:
bash scripts/test_tnt_inter.sh exp_name
bash scripts/test_tnt_adv.sh exp_name
For quantitative evaluation, you can upload your point clouds to Tanks and Temples benchmark.
@InProceedings{Liu_2023_ICCV,
author = {Liu, Tianqi and Ye, Xinyi and Zhao, Weiyue and Pan, Zhiyu and Shi, Min and Cao, Zhiguo},
title = {When Epipolar Constraint Meets Non-Local Operators in Multi-View Stereo},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023},
pages = {18088-18097}
}
Our work is partially based on these open-source works: MVSNet, cascade-stereo, MVSTER.
We appreciate their contributions to the MVS community.