Classification Matters: Improving Video Action Detection with Class-Specific Attention

This repository is the official implementation of "Classification Matters: Improving Video Action Detection with Class-Specific Attention" (ECCV 2024 Oral)

Classification Matters: Improving Video Action Detection with Class-Specific Attention
Jinsung Lee1, Taeoh Kim2, Inwoong Lee2, Minho Shim2, Dongyoon Wee2, Minsu Cho1, Suha Kwak1
POSTECH1, NAVER Cloud2
accepted to ECCV 2024 as an oral presentation

Demo: detection results alongside class-specific attention maps (e.g., "talk to", "listen to", "answer phone").

Installation

The code works on

  • Ubuntu 20.04
  • CUDA 11.7.0
  • CUDNN 8.0.5
  • NVIDIA A100 / V100

Install the following:

  • Python: 3.8.10
  • GCC 9.4.0
  • PyTorch: 2.0.0

and run the installation commands below:

pip install -r requirements.txt
cd ops
pip install .
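
Optionally, you can sanity-check the PyTorch/CUDA setup before moving on; the command below only uses standard PyTorch calls:

# quick check that PyTorch reports the expected version and sees the GPU
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"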

Data Preparation

Refer here for AVA preparation. We use the updated annotations (v2.2) of AVA. Download the annotation assets and place them outside the project folder (../assets).
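
A minimal sketch of the expected placement, run from the project root (the archive name below is an assumption; use the file you actually downloaded):

# the code expects the annotation assets one level above the project folder
mkdir -p ../assets
# extract/move the downloaded annotation assets there (archive name is hypothetical)
unzip ava_annotation_assets.zip -d ../assets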

Refer here for UCF101-24 preparation.

Refer here for JHMDB51-21 preparation.

Running commands

Following TubeR, our model is trained in two stages:

First, the model is trained from scratch. Second, it is trained again, this time initializing the transformer with the weights obtained in the first stage.

For convenience, we provide the pre-trained first-stage transformer weights used to train the final model.
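
The exact training entry point is not shown here; the commands below are an illustrative sketch only, with the script name and flags assumed to mirror the evaluation interface (check the repository scripts and configs for the actual usage):

## Train (hypothetical script and flag names)

# Stage 1: train from scratch
python3 train.py --config-file=./configuration/AVA22_CSN_152.yaml

# Stage 2: train again, loading the stage-1 transformer weights
python3 train.py --config-file=./configuration/AVA22_CSN_152.yaml --pretrained_path={path to stage-1 transformer weights}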

Evaluation Code

## Evaluate

# AVA 2.2
python3 evaluate.py --pretrained_path={path to the model to evaluate} --config-file=./configuration/AVA22_CSN_152.yaml
python3 evaluate.py --pretrained_path={path to the model to evaluate} --config-file=./configuration/AVA22_ViT-B.yaml
python3 evaluate.py --pretrained_path={path to the model to evaluate} --config-file=./configuration/AVA22_ViT-B_v2.yaml

# UCF
python3 evaluate.py --pretrained_path={path to the model to evaluate} --config-file=./configuration/UCF_ViT-B.yaml

# JHMDB (split 0)
python3 evaluate.py --pretrained_path={path to the model to evaluate} --config-file=./configuration/JHMDB_ViT-B.yaml --split 0

Model Zoo

The backbone .pth files are the same as those from here (CSN-152) and here (ViT-B). For convenience, we also provide this link to the aggregated backbone .pth files.

| Dataset | Backbone | Backbone pretrained on | Transformer weights | f-mAP | v-mAP | Config | Checkpoint |
|---|---|---|---|---|---|---|---|
| AVA 2.2 | CSN-152 | K400 | link | 33.5 | - | config | link |
| AVA 2.2 | ViT-B | K400 | link | 32.9 | - | config | link |
| AVA 2.2 | ViT-B | K400, K710 | link | 38.4 | - | config | link |
| UCF | ViT-B | K400 | link | 85.9 | 61.7 | config | link |
| JHMDB (split 0) | ViT-B | K400 | link | 88.1 | 90.6 | config | link |
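
As an example, after downloading a checkpoint from the table above, pass its local path to the evaluation command (the local path and filename below are assumptions; use the name of the file you downloaded):

# evaluate the AVA 2.2 CSN-152 checkpoint (local path/filename is hypothetical)
python3 evaluate.py --pretrained_path=./checkpoints/ava22_csn152.pth --config-file=./configuration/AVA22_CSN_152.yaml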

Acknowledgments

Our code is based on DETR, DAB-DETR, Deformable-DETR, and TubeR. If you use our model, please consider citing them as well.

License

Class Query
Copyright (c) 2024-present NAVER Cloud Corp.
CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/) 
