This repository implements the BMVC 2021 paper *Zero-Shot Action Recognition from Diverse Object-Scene Compositions*.
This repository includes:
- object and scene predictions for UCF-101, UCF-Sports, and J-HMDB
- a script to retrieve the object and scene predictions for Kinetics
- scripts to obtain word and sentence embeddings for all datasets used, as well as for the object-scene compositions
- a script to obtain action predictions for any action dataset, given its object and scene predictions and its action labels
Requirements:
- python 3.8.8
- pytorch 1.7.1
- numpy 1.19.2
- fasttext 0.9.2
- sentence-transformers 1.2.0
- scikit-learn 0.24.1
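The README does not state an install command; one way to pin the versions above is a `requirements.txt` along these lines (the PyPI package name `torch` for pytorch is an assumption, and the python 3.8.8 interpreter itself must be provided separately, e.g. via conda):

```
torch==1.7.1
numpy==1.19.2
fasttext==0.9.2
sentence-transformers==1.2.0
scikit-learn==0.24.1
```

which can then be installed with `pip install -r requirements.txt`.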
While the action labels and video annotations for Kinetics are already included in this repository, the object and scene predictions need to be retrieved with:
bash kineticsdownload.sh
To compute the word and sentence embeddings for all the video and image datasets, run:
python getfasttextembs.py
python getbertembs.py
This additionally computes the embeddings for all object-scene compositions and the similarities between all action labels and object-scene compositions.
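The matching between action labels and object-scene compositions can be sketched as follows. This is an illustrative toy, not the repository's code: the vocabulary, the averaging of word vectors to embed a composition, and the probability-weighted scoring are all assumptions made for illustration (in the repository the embeddings come from FastText/Sentence-BERT and the object/scene probabilities from pretrained classifiers).

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vocabulary; the real repo uses the dataset's own labels.
objects = ["ball", "racket", "horse"]
scenes = ["court", "field"]
actions = ["tennis", "riding"]

# Toy embeddings; the repo would use FastText / Sentence-BERT vectors.
dim = 8
emb = {w: rng.normal(size=dim) for w in objects + scenes + actions}

# One simple choice for a composition embedding: average the word vectors.
compositions = [(o, s) for o in objects for s in scenes]
comp_emb = {c: (emb[c[0]] + emb[c[1]]) / 2 for c in compositions}

# Similarity of every action label to every object-scene composition.
sims = {a: {c: cosine(emb[a], comp_emb[c]) for c in compositions}
        for a in actions}

# Score an unseen video: weight each composition by the video's object and
# scene probabilities (made-up numbers here), then pick the best action.
obj_prob = {"ball": 0.6, "racket": 0.3, "horse": 0.1}
scene_prob = {"court": 0.7, "field": 0.3}

def action_score(a):
    return sum(obj_prob[o] * scene_prob[s] * sims[a][(o, s)]
               for o, s in compositions)

best = max(actions, key=action_score)
print(best)
```

No training on the target actions is needed: the only supervision is the semantic similarity between label embeddings, which is what makes the recognition zero-shot.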
The main script can be run using the default arguments as follows:
python zero-shot-actions.py
Several flags are available; their descriptions can be shown by running:
python zero-shot-actions.py --help
Lastly, a helper script to compute results for different datasets and different flag values is available:
python make_results.py