Created by Wanhua Li*, Xiaoke Huang*, Zheng Zhu, Yansong Tang, Xiu Li, Jiwen Lu†, Jie Zhou
This repository contains the PyTorch implementation of "OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression".
[Project Page] [arXiv] [Zhihu]
git clone --recursive https://github.com/xk-huang/OrdinalCLIP.git
Download links: [Google Drive] [Baidu Drive]
Download the data and save it to data/MORPH according to the config files.
Download the checkpoints (PyTorch/CLIP/custom) and save them to .cache/ accordingly.
Weights download links: [Google Drive] [Baidu Drive]
.cache
├── clip
│   ├── RN50.pt
│   └── ViT-B-16.pt
├── resnet
│   └── resnet50_imagenet.pt
└── vgg
    ├── vgg16_imagenet.pt
    └── vgg_imdb_wiki.pt
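As a quick sanity check, here is a small Python sketch (not part of the repo) that verifies the weights listed above are in place:

```python
from pathlib import Path

# Expected backbone weights, mirroring the .cache/ tree above.
expected = [
    ".cache/clip/RN50.pt",
    ".cache/clip/ViT-B-16.pt",
    ".cache/resnet/resnet50_imagenet.pt",
    ".cache/vgg/vgg16_imagenet.pt",
    ".cache/vgg/vgg_imdb_wiki.pt",
]

missing = [p for p in expected if not Path(p).is_file()]
print("Missing weights:" if missing else "All expected weights found.")
for p in missing:
    print(" ", p)
```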
There are two options to set up the environment. We recommend the Docker environment.
[Docker installation guideline]
- Install Docker:
  - Ubuntu >= 18.04: https://docs.docker.com/engine/install/ubuntu/
  - Ubuntu == 16.04: https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04
- Then install nvidia-docker: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#setting-up-nvidia-container-toolkit
docker build -t ordinalclip:latest .
By .yaml file:
conda env create -f environment.yaml
conda activate ordinalclip
pip install -r requirements.txt
# git submodule update --init # git submodule add git@github.com:openai/CLIP.git
pip install -e CLIP/
pip install -e .
Or manually install:
conda create --name ordinalclip python=3.8
conda activate ordinalclip
conda install pytorch==1.10.1 torchvision==0.11.2 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
# git submodule update --init # git submodule add git@github.com:openai/CLIP.git
pip install -e CLIP/
pip install -e .
# pip install setuptools==59.5.0 # https://github.com/pytorch/pytorch/pull/69904
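Either way, a quick import check can confirm the setup (assuming the editable installs above succeeded; clip comes from the CLIP/ submodule and ordinalclip from `pip install -e .`):

```python
# Minimal post-install sanity check; run inside the ordinalclip environment.
import torch
import clip          # installed from the CLIP/ submodule
import ordinalclip   # installed via `pip install -e .`

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("CLIP models:", clip.available_models())
print("ordinalclip loaded from:", ordinalclip.__file__)
```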
For dev:
pip install bandit==1.7.0 black==22.3.0 flake8-docstrings==1.6.0 flake8==3.9.1 flynt==0.64 isort==5.8.0 mypy==0.902 pre-commit==2.13.0 pytest ipython
pre-commit install
[If you run scripts in docker, click me]
Start the Docker container:
docker run -itd --gpus all \
-v $(realpath .cache/):/workspace/OrdinalCLIP/.cache \
-v $(realpath data/):/workspace/OrdinalCLIP/data \
-v $(realpath results/):/workspace/OrdinalCLIP/results \
-v $(realpath configs/):/workspace/OrdinalCLIP/configs \
--name ordinalclip \
--shm-size 8gb \
ordinalclip bash
docker exec -it ordinalclip bash
# In the container, run `python ...`
After running, remove the container and release resources:
exit # or Ctrl^D
docker rm -f ordinalclip
Single-run mode:
python scripts/run.py --config configs/default.yaml --config configs/base_cfgs/*.yml --config ...
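Each `--config` flag adds another YAML layer. Conceptually, the layering behaves like the OmegaConf sketch below (an illustration only, not the actual scripts/run.py code; the base config file name is a placeholder):

```python
from omegaconf import OmegaConf

# Later configs override earlier ones when merged (illustrative file names).
paths = ["configs/default.yaml", "configs/base_cfgs/example.yml"]
cfg = OmegaConf.merge(*(OmegaConf.load(p) for p in paths))
print(OmegaConf.to_yaml(cfg))
```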
Multi-run mode:
python scripts/experiments/meta_config_generator.py -c $meta_config_file
python scripts/experiments/config_sweeper.py --max_num_gpus_used_in_parallel 8 --num_jobs_per_gpu 1 -d $sweep_config_dir
Visualizing and quantifying ordinality:
CUDA_VISIBLE_DEVICES=-1 find results/ -name 'config.yaml' -exec python scripts/vis_ordinality.py -c {} \;
Parsing results:
python scripts/experiments/parse_results.py -d $result_dir -p 'test_stats.json'
python scripts/experiments/parse_results.py -d $result_dir -p 'ordinality.json'
# or
python scripts/experiments/parse_results.py -T <(find -name 'test_stats.json') -p 'test_stats.json'
python scripts/experiments/parse_results.py -T <(find -name 'ordinality.json') -p 'ordinality.json'
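If you want to inspect metrics without the parser, a rough stand-in (not the repo's parse_results.py) is:

```python
import json
from pathlib import Path

# Collect every test_stats.json under results/ and print it next to its run folder.
for stats_file in sorted(Path("results").rglob("test_stats.json")):
    with open(stats_file) as f:
        stats = json.load(f)
    print(stats_file.parent.name, stats)
```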
Each experiment has its own name, composed of several config component names separated by "_".
The experiment folder includes:
- ckpt/: checkpoints
- *_logger/: the logs from pytorch_lightning
- config.yaml: config
- run.log: logs the status of the model during running
- val/test_stats.json: metrics to be parsed
- val/val_stats.json: metrics to be parsed
- val/ordinality.json: metrics to be parsed
Please check out the releases: https://github.com/xk-huang/OrdinalCLIP/releases/tag/train_eval_logs
Structure of the Codebase (click to expand)
- ordinalclip
  - models
    - attributes: image_encoder (torchvision model: fp32, CLIP image encoder: fp16), text_encoder (fp32; layer norms are computed in fp32); all converted to fp32
    - prompt learner
      - plain prompt learner; args: num_ranks, num_tokens_per_rank, num_tokens_for_context, rank_tokens_position, init_rank_path, init_context, rank_specific_context; attributes: context_embeds, rank_embeds, pseudo_sentence_tokens
      - rank prompt learner (inherits from the plain prompt learner); args: num_ranks, num_tokens_per_rank, num_tokens_for_context, rank_tokens_position, init_rank_path, init_context, rank_specific_context, interpolation_type; attributes: weights for interpolation (a hedged sketch of this interpolation follows this list)
  - runner
    - runner: a wrapper built on pl.LightningModule; defines loss computation, metrics computation, create_optimizer, lr_scheduler
    - data: pl.LightningDataModule
    - utils: model I/O, parameter (un)freezing
  - utils
    - logging & registry from MMCV
- scripts
  - run.py: prepares the args with OmegaConf, sets up logging and the wandb logger, the train/val/test dataloaders, and the model (runner), then sets up the trainer
- configs
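As referenced above, here is a hedged sketch of the interpolation idea behind the rank prompt learner: a few base rank embeddings are learned, and the full set of rank embeddings is produced from fixed interpolation weights. The class name, shapes, and the choice of linear interpolation are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class InterpolatedRankEmbeddings(nn.Module):
    """Illustrative only: learn a few base rank embeddings and linearly
    interpolate them into embeddings for every rank."""

    def __init__(self, num_ranks: int, num_base_ranks: int, embed_dim: int):
        super().__init__()
        self.base_embeds = nn.Parameter(torch.randn(num_base_ranks, embed_dim))
        # Fixed interpolation weights: each rank mixes its two nearest base ranks.
        positions = torch.linspace(0, 1, num_ranks) * (num_base_ranks - 1)
        low = positions.floor().long()
        high = (low + 1).clamp(max=num_base_ranks - 1)
        frac = positions - low.float()
        weights = torch.zeros(num_ranks, num_base_ranks)
        weights[torch.arange(num_ranks), low] = 1.0 - frac
        weights[torch.arange(num_ranks), high] += frac
        self.register_buffer("weights", weights)

    def forward(self) -> torch.Tensor:
        # (num_ranks, num_base_ranks) @ (num_base_ranks, embed_dim)
        return self.weights @ self.base_embeds

# e.g. 101 age ranks from 5 learnable base embeddings of width 512
rank_embeds = InterpolatedRankEmbeddings(101, 5, 512)()
print(rank_embeds.shape)  # torch.Size([101, 512])
```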
MIT License
Many thanks to the following repositories:
Check out these amazing works leveraging CLIP for number-related problems!
- CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
- L2RCLIP: Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification
If you find this codebase helpful, please consider citing:
@article{Li2022OrdinalCLIP,
title={OrdinalCLIP: Learning Rank Prompts for Language-Guided Ordinal Regression},
author={Wanhua Li and Xiaoke Huang and Zheng Zhu and Yansong Tang and Xiu Li and Jiwen Lu and Jie Zhou},
journal={ArXiv},
year={2022},
volume={abs/2206.02338}
}