# Scan-and-Tell [Report]

This is the repository for the Scan-and-Tell project. The goal of the project is to improve the state-of-the-art 3D dense captioning architecture using sparse convolutions.
## Requirements

- Python 3.7.0
- PyTorch 1.2.0
- CUDA 10.0
## Setup

Create and activate a conda environment with the required packages:

```
conda create -n scan-and-tell python==3.7
source activate scan-and-tell
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
```
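To confirm the environment matches the requirements, a quick sanity check (not part of the official setup):

```python
# sanity check: the versions should match the requirements above
import torch

print(torch.__version__)          # expected: 1.2.0
print(torch.cuda.is_available())  # expected: True on a machine with CUDA 10.0
```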
(1) Clone the Scan-and-Tell repository.

```
git clone --recurse-submodules https://github.com/Barsovich/Scan-and-Tell.git
cd Scan-and-Tell
```
(2) Install the dependent libraries.

```
pip install -r requirements.txt
conda install -c bioconda google-sparsehash
```
(3) For sparse convolutions, we use the implementation of `spconv`. The repository is recursively downloaded at step (1). We use version 1.0 of `spconv`.

Note: We further modify `spconv/spconv/functional.py` to make `grad_output` contiguous. Make sure you use our modified `spconv`.
- To compile `spconv`, first install the dependent libraries:

  ```
  conda install libboost
  conda install -c daleydeng gcc-5 # gcc-5.4 is needed for spconv
  ```
- Add the `$INCLUDE_PATH$` that contains `boost` in `lib/spconv/CMakeLists.txt` (not necessary if it can be found automatically):

  ```
  include_directories($INCLUDE_PATH$)
  ```
- Compile the `spconv` library:

  ```
  cd lib/spconv
  python setup.py bdist_wheel
  ```
- Run `cd dist` and use `pip` to install the generated `.whl` file.
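To verify the build, try importing the package from a directory other than `lib/spconv` (from inside it, Python would pick up the uncompiled source tree instead of the installed wheel); a minimal check:

```python
# sanity check: should succeed once the wheel is installed
import spconv

print(spconv.__file__)  # should point into site-packages, not lib/spconv
```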
(4) Compile the `pointgroup_ops` library.

```
cd lib/pointgroup_ops
python setup.py develop
```
If any header files cannot be found, run the following commands instead:

```
python setup.py build_ext --include-dirs=$INCLUDE_PATH$
python setup.py develop
```

`$INCLUDE_PATH$` is the path to the folder containing the header files that could not be found; in a conda environment this is typically `$CONDA_PREFIX/include` (e.g. for the headers installed by `google-sparsehash`).
(5) Compile the PointNet2 library.

```
cd lib/pointnet2
python setup.py install
```
(6) Before moving on to the next step, don't forget to set the project root path as `CONF.PATH.BASE` in `config/config_votenet.py` (see the sketch below).
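For reference, the change is a single assignment. This sketch assumes the config follows ScanRefer's `EasyDict` layout; the surrounding code in `config/config_votenet.py` may differ:

```python
# config/config_votenet.py (sketch; adapt to the actual file contents)
from easydict import EasyDict

CONF = EasyDict()
CONF.PATH = EasyDict()
CONF.PATH.BASE = "/path/to/Scan-and-Tell"  # set this to your project root
```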
## Data Preparation

To download the ScanRefer dataset, please fill out this form. Once your request is accepted, you will receive an email with the download link.

Note: In addition to the language annotations in the ScanRefer dataset, you also need access to the original ScanNet dataset. Please refer to the ScanNet Instructions for more details.

Download the dataset by simply executing the wget command:

```
wget <download_link>
```
- Download the ScanRefer dataset and unzip it under `data/`, then run `scripts/organize_scanrefer.py`.
- Download the preprocessed GloVe embeddings (~990MB) and put them under `data/`.
- Download the ScanNetV2 dataset and put (or link) `scans/` under (or to) `data/scannet/scans/` (please follow the ScanNet Instructions for downloading the ScanNet dataset). After this step, there should be folders containing the ScanNet scene data under `data/scannet/scans/` with names like `scene0000_00`.
- Pre-process the ScanNet data. A folder named `scannet_data/` will be generated under `data/scannet/` after running the following command. Depending on the backbone you would like to use, set the model to `votenet` or `pointgroup`; you can also do both.

  ```
  cd data/scannet/
  python batch_load_scannet_data.py --model <model>
  ```
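As a quick sanity check, you can load one of the generated arrays. The filename pattern below is an assumption based on ScanRefer-style pre-processing output; adjust it to whatever `batch_load_scannet_data.py` actually writes:

```python
import numpy as np

# filename pattern is an assumption; check data/scannet/scannet_data/ for the real names
verts = np.load("data/scannet/scannet_data/scene0000_00_vert.npy")
print(verts.shape, verts.dtype)  # one row per point; columns are xyz plus per-point features
```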
- Pre-process the multiview features from ENet.

  a. Download the ENet pretrained weights (1.4MB) and put them under `data/`.

  b. Download and decompress the extracted ScanNet frames (~13GB).

  c. Change the data paths in `config.py` marked with TODO accordingly.

  d. Extract the ENet features:

  ```
  python script/compute_multiview_features.py
  ```

  e. Project the ENet features from the ScanNet frames to the point clouds; you need ~36GB to store the generated HDF5 database:

  ```
  python script/project_multiview_features.py --maxpool
  ```

  You can check whether the projections make sense by projecting the semantic labels from an image to the target point cloud:

  ```
  python script/project_multiview_labels.py --scene_id scene0000_00 --maxpool
  ```
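You can also inspect the generated HDF5 database directly. The file name and per-scene layout below are assumptions based on ScanRefer-style multiview projection output, so adjust them to the actual output path of `project_multiview_features.py`:

```python
import h5py

# path and key layout are assumptions; adjust to the script's actual output
with h5py.File("data/enet_feats_maxpool.hdf5", "r") as db:
    scene_id = sorted(db.keys())[0]
    print(scene_id, db[scene_id].shape)  # expect one multiview feature vector per point
```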
## Training

### PointGroup backbone

- (Optional) Configure the desired settings in `config/pointgroup_run1_scannet.yaml`.
- Run the training script:

```
python train_pointgroup.py --config config/pointgroup_run1_scannet.yaml
```
### VoteNet backbone

- (Optional) Configure the desired settings in `config/votenet_args.yaml`.
- Run the training script:

```
python train_votenet.py --config config/votenet_args.yaml
```
## Evaluation

### PointGroup backbone

- (Optional) Configure the desired settings in `config/pointgroup_run1_scannet.yaml`.
- Run the evaluation script:

```
python eval_pointgroup.py --config config/pointgroup_run1_scannet.yaml
```

Note: If you use the same configuration file for the `--config` parameter, training will automatically resume from the last saved checkpoint.
### VoteNet backbone

- (Optional) Configure the desired settings in `config/votenet_eval_args.yaml`.
- Run the evaluation script:

```
python eval_votenet.py --config config/votenet_eval_args.yaml
```
## Branches

The `main` branch should be used for all steps. The `rg` branch is currently experimental and aims to further improve the results by using a graph message-passing network.
## Acknowledgements

This repository uses the PointGroup implementation from https://github.com/Jia-Research-Lab/PointGroup and the VoteNet implementation from https://github.com/daveredrum/ScanRefer. We would like to thank Dave Z. Chen and Jia-Research-Lab for their implementations.