Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models (NeurIPS2024)

Official PyTorch implementation of the method OLIVINE. More details can be found in the paper:

Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models, NeurIPS2024 [arXiv] by Yifan Zhang and Junhui Hou.

Dependencies

Please install the required packages. Some libraries used in this project, including MinkowskiEngine and PyTorch Lightning, are known to behave differently across versions; please use the exact versions specified in requirements.txt.

Datasets

The provided code is compatible with nuScenes and SemanticKITTI. Put the datasets you intend to use in the "datasets" folder (a symbolic link is accepted).

datasets/
├── nuscenes
│   ├── camseg (semantic labels inferred by Grounded-SAM)
│   ├── lidarseg (decompress nuScenes-lidarseg-all-v1.0.tar)
│   ├── maps
│   ├── samples
│   ├── sweeps
│   ├── v1.0-mini
│   ├── v1.0-test
│   ├── v1.0-trainval
│   └── zip_files
├── semantic_kitti
│   └── dataset
│       ├── poses
│       └── sequences
└── other datasets...
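Before launching anything, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name missing_dirs and the choice of which sub-directories to check are assumptions taken from the tree, and symbolic links are followed.

```python
from pathlib import Path

# Sub-directories copied from the tree above; adjust if your copy differs.
EXPECTED = {
    "nuscenes": ["camseg", "lidarseg", "maps", "samples", "sweeps", "v1.0-trainval"],
    "semantic_kitti": ["dataset/poses", "dataset/sequences"],
}


def missing_dirs(root, dataset):
    """Return the expected sub-directories of `dataset` that are absent under `root`."""
    base = Path(root) / dataset
    # is_dir() follows symbolic links, so a linked dataset is accepted.
    return [sub for sub in EXPECTED[dataset] if not (base / sub).is_dir()]


for name in EXPECTED:
    missing = missing_dirs("datasets", name)
    status = "OK" if not missing else "missing: " + ", ".join(missing)
    print(f"{name}: {status}")
```

Run it from the repository root; a dataset you do not intend to use will simply be reported as missing.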

Reproducing the results

Predict the weak semantic labels (required)

First, we use SEEM to obtain weak semantic labels for the RGB images. If you prefer not to run the following steps yourself, you can download the labels we provide on Baidu Netdisk or Google Drive.

Install the necessary libraries listed in demo_code/requirements.txt.
Link the nuScenes dataset into demo_code/data/sets from the repository root, using an absolute target so that the link resolves. Command: ln -s "$(pwd)/datasets/nuscenes" demo_code/data/sets/
Go to the demo_code directory and run the script: bash semantic_label_generation.sh
Organize the generated files and put them in datasets/nuscenes/camseg.
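A relative target passed to ln -s is resolved relative to the directory that contains the link, so a link created from the repository root with a relative path can end up dangling. The sketch below creates the same link with an absolute target. The helper name link_dataset is hypothetical, and it assumes a POSIX file system where symbolic links are permitted.

```python
from pathlib import Path


def link_dataset(source, link_dir):
    """Create link_dir/<source name> as a symlink to the absolute path of `source`."""
    source = Path(source).resolve()
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    link = link_dir / source.name
    if link.is_symlink():
        link.unlink()  # replace a stale or dangling link
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(source, target_is_directory=True)
    return link


# Example call from the repository root (not executed here):
# link_dataset("datasets/nuscenes", "demo_code/data/sets")
```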

Pre-training a 3D backbone

To launch a pre-training of the Minkowski SR-UNet (minkunet) on nuScenes:

python pretrain.py --cfg config/olivine_minkunet.yaml

You can alternatively replace minkunet with voxelnet to pre-train a PV-RCNN backbone.
The pre-trained weights are written to the output folder and can be reused for a downstream task. If you use multiple GPUs, scale the learning rate and batch size accordingly.
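The README does not say which scaling rule to apply. One common choice is the linear scaling rule, where both the learning rate and the global batch size grow in proportion to the number of GPUs. The sketch below assumes that rule; the function name and the base values in the example are placeholders, not the values from the provided config.

```python
def scale_for_gpus(base_lr, base_batch_size, num_gpus):
    """Linear scaling rule (an assumption): multiply lr and global batch size by num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr * num_gpus, base_batch_size * num_gpus


# Placeholder single-GPU values, scaled to 4 GPUs.
lr, batch_size = scale_for_gpus(base_lr=0.5, base_batch_size=16, num_gpus=4)
print(lr, batch_size)
```

Read the single-GPU values from the config you actually use, and treat the scaled values as a starting point to validate rather than a guarantee of matching results.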

Tip: The weights from the final pre-training epoch are not always the best; consider also saving the weights from earlier epochs, such as the 40th.
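One simple way to follow this tip is to keep a checkpoint at a fixed interval in addition to the final one. The sketch below only computes which epochs to keep; the function name, the interval of 10 and the 50-epoch schedule are illustrative assumptions, and wiring it into the training loop is left to the reader.

```python
def epochs_to_save(num_epochs, every=10):
    """Return the 1-indexed epochs at which a checkpoint is kept."""
    if num_epochs < 1 or every < 1:
        raise ValueError("num_epochs and every must be positive")
    kept = list(range(every, num_epochs + 1, every))
    if not kept or kept[-1] != num_epochs:
        kept.append(num_epochs)  # always keep the final epoch as well
    return kept


# Hypothetical 50-epoch schedule: epoch 40 is kept alongside the final epoch.
print(epochs_to_save(50))
```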

Semantic segmentation

To launch fine-tuning for semantic segmentation, use the following command:

python downstream.py --cfg_file="config/semseg_nuscenes.yaml" --pretraining_path="output/pretrain/[...]/model.pt"

Use the previously obtained weights and any config file. The default config fine-tunes on 1% of the nuScenes training set, with learning rates optimized for the provided pre-training.

To re-evaluate the score of any downstream network, run:

python evaluate.py --resume_path="output/downstream/[...]/model.pt" --dataset="nuscenes"

If you wish to re-evaluate linear probing, the experiments in the paper were run with lr=0.05, lr_head=null and freeze_layers=True.
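As an illustration, the linear probing settings can be expressed as overrides applied to an already loaded config. This sketch assumes the config is a flat Python dict with top-level keys lr, lr_head and freeze_layers; the key names come from the sentence above, but their exact location in the YAML file is not confirmed here. YAML null corresponds to Python None.

```python
LINEAR_PROBING = {"lr": 0.05, "lr_head": None, "freeze_layers": True}


def with_linear_probing(config):
    """Return a copy of `config` with the linear probing settings applied."""
    updated = dict(config)  # leave the caller's dict untouched
    updated.update(LINEAR_PROBING)
    return updated


# Hypothetical fine-tuning config; only the three keys above are overridden.
finetune_cfg = {"lr": 0.02, "lr_head": 2.0, "freeze_layers": False, "dataset": "nuscenes"}
print(with_linear_probing(finetune_cfg))
```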

Object detection

All experiments for object detection have been done using OpenPCDet.

Published results

All results are obtained with weights pre-trained on nuScenes.

Few-shot semantic segmentation

Results on the validation set using Minkowski SR-UNet:

Method	nuScenes lin. probing	nuScenes Finetuning with 1% data	KITTI Finetuning with 1% data
Random init.	8.1	30.3	39.5
PointContrast	21.9	32.5	41.1
DepthContrast	22.1	31.7	41.5
PPKT	36.4	37.8	43.9
SLidR	38.8	38.3	44.6
OLIVINE	50.0	50.5	49.3

Semantic Segmentation on nuScenes

Results on the validation set using Minkowski SR-UNet with a fraction of the training labels:

Method	1%	5%	10%	25%	100%
Random init.	30.3	47.7	56.6	64.8	74.2
SLidR	39.0	52.2	58.8	66.2	74.6
OLIVINE	50.6	60.2	65.0	70.1	76.5
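To read the table above at a glance, the sketch below computes the gain of OLIVINE over random initialization at each label fraction, using only the numbers listed in the table. The variable and function names are illustrative.

```python
FRACTIONS = ["1%", "5%", "10%", "25%", "100%"]
RANDOM_INIT = [30.3, 47.7, 56.6, 64.8, 74.2]
OLIVINE = [50.6, 60.2, 65.0, 70.1, 76.5]


def gains(method, baseline):
    """Per-fraction difference between two rows, rounded to one decimal."""
    return [round(m - b, 1) for m, b in zip(method, baseline)]


for fraction, gain in zip(FRACTIONS, gains(OLIVINE, RANDOM_INIT)):
    print(f"{fraction}: +{gain}")
```

The gain is largest with 1% of the labels and shrinks as more annotated data becomes available.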

Object detection on KITTI

All results are obtained with a pre-training on nuScenes.

Results on the validation set using PV-RCNN:

Method	Car	Pedestrian	Cyclist	mAP@40
Random init.	84.5	57.9	71.3	71.3
STRL*	84.7	57.8	71.9	71.5
PPKT	83.2	55.5	73.8	70.8
SLidR	84.4	57.3	74.2	71.9
OLIVINE	84.8	59.3	74.2	72.8

*STRL was pre-trained on KITTI, while SLidR and PPKT were pre-trained on nuScenes.

Results on the validation set using SECOND:

Method	Car	Pedestrian	Cyclist	mAP@40
Random init.	81.5	50.9	66.5	66.3
DeepCluster*	-	-	-	66.1
SLidR	81.9	51.6	68.5	67.3
OLIVINE	82.0	53.2	69.8	68.3

*As reimplemented in ONCE
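The mAP@40 column in the two detection tables is close to the plain average of the three per-class scores. The sketch below checks this, assuming the column is an unweighted mean over Car, Pedestrian and Cyclist (the README does not define it). Gaps of up to 0.1 are expected because the per-class values are already rounded. DeepCluster is omitted since no per-class scores are listed for it.

```python
# (Car, Pedestrian, Cyclist, reported mAP@40), copied from the tables above.
PV_RCNN = {
    "Random init.": (84.5, 57.9, 71.3, 71.3),
    "STRL": (84.7, 57.8, 71.9, 71.5),
    "PPKT": (83.2, 55.5, 73.8, 70.8),
    "SLidR": (84.4, 57.3, 74.2, 71.9),
    "OLIVINE": (84.8, 59.3, 74.2, 72.8),
}
SECOND = {
    "Random init.": (81.5, 50.9, 66.5, 66.3),
    "SLidR": (81.9, 51.6, 68.5, 67.3),
    "OLIVINE": (82.0, 53.2, 69.8, 68.3),
}


def mean_ap(car, pedestrian, cyclist):
    """Unweighted mean of the three per-class scores (assumed definition)."""
    return (car + pedestrian + cyclist) / 3


for detector, table in (("PV-RCNN", PV_RCNN), ("SECOND", SECOND)):
    for method, (car, ped, cyc, reported) in table.items():
        print(f"{detector} {method}: mean={mean_ap(car, ped, cyc):.2f}, reported={reported}")
```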

Acknowledgment

Our implementation is based on SLidR. Part of the codebase has been adapted from PointContrast. Computation of the Lovász loss used in semantic segmentation follows the code of PolarNet.

License

OLIVINE is released under the Apache 2.0 license.

Citation

If you find OLIVINE useful in your research, please consider citing:

@inproceedings{zhang2024fine,
  title={Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models},
  author={Zhang, Yifan and Hou, Junhui},
  booktitle={Advances in Neural Information Processing Systems},
  year={2024}
}

