[ECCV 2024] Teach CLIP to Develop a Number Sense for Ordinal Regression


NumCLIP

This repository contains the PyTorch implementation of "Teach CLIP to Develop a Number Sense for Ordinal Regression" (ECCV 2024).

Created by Du Yao, Zhai Qiang, Dai Weihang, Li Xiaomeng*

Overview of NumCLIP

The framework of NumCLIP, aiming to teach CLIP to develop a strong number sense for ordinal regression.


Quick Preview

1. Img2Lang Concept

NumCLIP mimics human numerical cognition: it first maps an image feature to a language concept, and then reasons about the exact number.


This paradigm can be conducted in a coarse-to-fine manner. In this way, we elegantly convert a dense regression task into a simple, coarse classification problem, which not only mitigates the scarcity of number-related captions in CLIP's pre-training data, but also effectively recalls the vision-language concept alignment that CLIP has already learned.
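For intuition, the coarse Img2Lang step can be sketched as bucketing a numeric target into a language concept before any fine-grained reasoning. The bin edges, concept names, and prompt template below are illustrative assumptions, not the paper's exact configuration:

```python
# Hypothetical coarse bins for an age-estimation task: (low, high, concept).
COARSE_BINS = [
    (0, 12, "child"),
    (13, 19, "teenager"),
    (20, 39, "young adult"),
    (40, 59, "middle-aged adult"),
    (60, 120, "elderly person"),
]

def to_language_concept(age: int) -> str:
    """Map a numeric age to a coarse language concept (the classification target)."""
    for low, high, concept in COARSE_BINS:
        if low <= age <= high:
            return concept
    raise ValueError(f"age {age} out of range")

def to_prompt(age: int) -> str:
    """Build a CLIP-style text prompt from the coarse concept."""
    return f"a photo of a {to_language_concept(age)}"
```

With such bins, the text encoder only needs to align images with a handful of concepts it has plausibly seen during pre-training, rather than with rare exact-number captions.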

2. Cross-modal Ranking-based Feature Regularization

The cross-modal negative samples are pushed away in proportion to their ordinal label distance from the ground truth.

    def compute_ce_dis_loss(self, logits, y, d):
        """Cross-entropy with ordinal-distance weights over d classes.

        logits: (B, d) image-text similarity logits
        y:      (B,)  ground-truth class indices
        d:      number of ordinal classes
        """
        device = logits.device
        target = torch.arange(d, dtype=torch.float32, device=device).unsqueeze(1)  # (d, 1)

        ls_weight = []
        for i in range(len(y)):
            # |y_i - j| for every class j, normalized so the weights sum to d - 1
            label_inv_ranks = torch.abs(y[i] - target).transpose(0, 1)  # (1, d)
            label_inv_ranks_norm = label_inv_ranks / torch.sum(label_inv_ranks, dim=1) * (d - 1)
            label_inv_ranks_norm = label_inv_ranks_norm.squeeze(0)
            label_inv_ranks_norm[y[i]] = 1.0  # keep the true class at unit weight
            ls_weight.append(label_inv_ranks_norm.detach())

        weight = torch.stack(ls_weight).to(device)

        # Classes farther from the ground truth receive larger weights, so
        # their logits are pushed further away under cross-entropy.
        logits_weight = logits * weight
        loss = self.ce_loss_func(logits_weight, y)

        return loss

Requirements

We build on the codebase of OrdinalCLIP. Please follow their instructions to prepare the environment and datasets.

Model Training

Before training the model, move regclipssr.py to ./ordinalclip/models/ and runner_ssr.py to ./ordinalclip/runner/. Then run:

sh scripts/run_regclipssr.sh

What's More

Check out these amazing works leveraging CLIP for number problems!

Citation

If you find this codebase helpful, please consider citing:

@article{du2024teach,
  title={Teach CLIP to Develop a Number Sense for Ordinal Regression},
  author={Du, Yao and Zhai, Qiang and Dai, Weihang and Li, Xiaomeng},
  journal={arXiv preprint arXiv:2408.03574},
  year={2024}
}