Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

This codebase implements the system described in the paper:

UnOS: Unified Unsupervised Optical-flow and Stereo-depth Estimation by Watching Videos

More information can also be found in an earlier version of the paper Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

Yang Wang, Peng Wang, Zhenheng Yang, Yi Yang, Chenxu Luo and Wei Xu

Please contact Yang Wang ([email protected]) if you have any questions.

Prerequisites

This codebase was developed and tested with Tensorflow 1.2, CUDA 8.0 and Ubuntu 14.04.

Acknowledgement

Some of the codes were borrowed from the excellent works of Tinghui Zhou, Clément Godard, Huangying Zhan, Chelsea Finn, Ruoteng Li and Martin Kersner. The files borrowed from Clément Godard and Huangying Zhan are licensed under their original license respectively.

Preparing training data

You would need to download all of the KITTI raw data and calibration files to train the model. You would also need the training files of KITTI 2012 and KITTI 2015 for validating the models.

Pretrained models

The pretrained models described in the paper can be downloaded here.

Training

As described in the paper, the training are organized into three stages sequentially.

Stage 1: only train optical flow

In Stage 1, we only train the PWC-Flow net to learn the optical flow.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4 --mode=flow --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --trace=/path/to/store-your-model-and-logs

After around 200K iterations, you should be able to reach the performance of Ours(PWC-Only) described in the paper. You can also download our pretrained model model-flow.

Stage 2: only train depth and pose

In Stage 2, we train the PWC-Disp and MotionNet to learn the depth and pose.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4  --mode=depth --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --pretrained_model=/path/to/your/pretrained-flow-model-in-stage1  --trace=/path/to/store-your-model-and-logs

After around 200K iterations, you sould be able to reach the performance of Ours(Ego-motion) described in the paper. You can also download our pretrained model model-depth

Stage 3: train optical flow, depth, pose and motion segmentation together

In Stage 3, we train everything together.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4  --mode=depthflow --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --pretrained_model=/path/to/your/pretrained-depth-model-in-stage2 --trace=/path/to/store-your-model-and-logs

After around 200K iterations, you should be able to reach the performance of Ours(Full) described in the paper. You can also download our pretrained model model-depthflow

Aside: Only train depth using stereo

If you would like to only train depth using the stereo pairs, you can run the following script. This is different from Stage 2 that it only trains PWC-Disp net using stereo pairs.

python main.py --data_dir=/path/to/your/kitti_raw_data --batch_size=4 --mode=stereo --train_test=train  --retrain=True  --train_file=./filenames/kitti_train_files_png_4frames.txt --gt_2012_dir=/path/to/your/kitti_2012_gt --gt_2015_dir=/path/to/your/kitti_2015_gt --trace=/path/to/store-your-model-and-logs

After around 100K iterations, you should be able to reach the performance of Ours(Stereo-only) described in the paper. You can also download our pretrained model model-stereo

Notes

You can specify multiple GPUs training with flag --num_gpus
You can switch to KITTI odometry split by setting --train_file=./filenames/odo_train_files_png_4frames.txt
If you would like to continue to train a model from a previous checkpoint from the same mode, you can set --retrain=False

Evaluation

The evaluation has already been performed while doing the training. The evaluation results will be printed to the screen.

If you would like to only do a evaluation run, you can set --train_test=test.

You can test the pose estimations on sequences 09 and 10 by setting --eval_pose=09,10 which only works for modes depth and depthflow.

Disclaimer

This is the authors' implementation of the system described in the paper and not an official Baidu product.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
eval		eval
filenames		filenames
nets		nets
pose_gt_data		pose_gt_data
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
LICENSE_monodepth		LICENSE_monodepth
README.md		README.md
__init__.py		__init__.py
flowlib.py		flowlib.py
loss_utils.py		loss_utils.py
main.py		main.py
models.py		models.py
monodepth_dataloader.py		monodepth_dataloader.py
monodepth_model.py		monodepth_model.py
optical_flow_warp_fwd.py		optical_flow_warp_fwd.py
optical_flow_warp_old.py		optical_flow_warp_old.py
test.py		test.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

Prerequisites

Acknowledgement

Preparing training data

Pretrained models

Training

Stage 1: only train optical flow

Stage 2: only train depth and pose

Stage 3: train optical flow, depth, pose and motion segmentation together

Aside: Only train depth using stereo

Notes

Evaluation

Disclaimer

About

Releases

Packages

Languages

License

baidu-research/UnDepthflow

Folders and files

Latest commit

History

Repository files navigation

Joint Unsupervised Learning of Optical Flow and Depth by Watching Stereo Videos

Prerequisites

Acknowledgement

Preparing training data

Pretrained models

Training

Stage 1: only train optical flow

Stage 2: only train depth and pose

Stage 3: train optical flow, depth, pose and motion segmentation together

Aside: Only train depth using stereo

Notes

Evaluation

Disclaimer

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages