This repository contains implementations of several classical network architectures for video classification (action recognition). Because of the scale of Kinetics, most of the architectures in this repo have not been fully tested on Kinetics, but the training loss curves look normal during training. This project can be regarded as a collection of PyTorch implementations of the corresponding papers.
- PyTorch 1.0
- visdom (the training procedure can be tracked in the browser)
- PIL
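To watch the training curves, start the visdom server first (this is standard visdom usage) and open http://localhost:8097 in a browser:

```
python -m visdom.server
```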
To make sure all the submodules are cloned correctly, use:

```
git clone --recursive https://github.com/zeal-github/video_classification_pytorch.git
```
All the data lists are pre-computed in .json files, and the dataset loader first loads the .json file to get the available data samples. So far, UCF101, Kinetics_400, and Kinetics_200 are all available in the dataset loader.
Notice: so far, only RGB frames are considered in this repo.
You can download the preprocessed data directly from https://github.com/feichtenhofer/twostreamfusion:
```
cd ./videos_dataset/UCF101
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.001
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.002
wget http://ftp.tugraz.at/pub/feichtenhofer/tsfusion/data/ucf101_jpegs_256.zip.003
cat ucf101_jpegs_256.zip* > jpegs_256.zip
unzip jpegs_256.zip
```
Kinetics is too large to redistribute, so the data can only be downloaded with the official crawler. You can refer to this repo to download the .avi data using the official crawler and then convert the videos to .jpg frames.
You can use the code in ./pt_dataset to create a .json file which contains the datalist of the training and validation data. Each element in the datalist.json file contains 4 items:
['path': frame directory, 'class_name': class name, 'label': label, 'num_frames': number of frames in this video].
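A single entry might look like this (the values are illustrative, not taken from a real list):

```json
{
    "path": "jpegs_256/v_ApplyEyeMakeup_g01_c01",
    "class_name": "ApplyEyeMakeup",
    "label": 0,
    "num_frames": 165
}
```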
All the networks and the pretrained models are contained in the ./models directory.
Original paper: "Quo Vadis,Action Recognition? A New Model and the Kinetics Dataset" This code is based on : piergiaj/pytorch-i3d
The pretrained model is provided by piergiaj/pytorch-i3d. The pretrained model
is pretrained on ImageNet
and Kinetics
as reported in the paper.
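A minimal sketch of loading that checkpoint, following the usage in piergiaj/pytorch-i3d (the import path and checkpoint location are assumptions about how the files are laid out here):

```python
import torch
from pytorch_i3d import InceptionI3d  # class name as in piergiaj/pytorch-i3d

# Build the RGB model with the 400 Kinetics classes used by the checkpoint.
i3d = InceptionI3d(400, in_channels=3)
i3d.load_state_dict(torch.load('models/rgb_imagenet.pt'))  # assumed checkpoint path

# For fine-tuning on another dataset, swap the classification head.
i3d.replace_logits(101)  # e.g. UCF101
```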
Original paper: "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification This code is based on : qijiezhao/s3d.pytorch
The pretrained model is the model in 4.1 I3D
. Only the 2D filters will transfer parameters from the pretrained I3D
model.
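A minimal sketch of that transfer, assuming matching parameter names between the two state dicts (the helper and the name matching are assumptions, not this repo's actual code): copy weights only into S3D layers whose kernel is purely spatial (temporal size 1), collapsing the I3D kernel's temporal axis.

```python
import torch

def transfer_2d_filters(s3d_state, i3d_state):
    """Copy I3D weights into S3D entries that have a purely spatial kernel."""
    for name, w in s3d_state.items():
        if name not in i3d_state:
            continue
        src = i3d_state[name]
        if w.dim() == 5 and w.shape[2] == 1:
            # (out, in, 1, h, w): a 2D filter; average the source temporal axis.
            s3d_state[name] = src.mean(dim=2, keepdim=True)
        elif w.shape == src.shape:
            # Biases, batch-norm statistics, etc. can be copied directly.
            s3d_state[name] = src.clone()
    return s3d_state
```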
Original paper: "Non-local Neural Networks"
To stay consistent with the original paper, I inflate one residual block out of every two to save computation cost.
3 inflation modes can be chosen (they must be manually specified in config.py; see the sketch after this list):
0 : the baseline model in the paper.
1 : inflate the 1x1 convolution in the residual block to a 3x1x1 convolution.
2 : inflate the 1x1 convolution in the residual block to a 3x3x3 convolution.
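A minimal sketch of mode 1 (the function name is illustrative): the pretrained 2D kernel is repeated along the new temporal axis and divided by its length, so the layer's response to a temporally constant input is unchanged.

```python
import torch.nn as nn

def inflate_1x1_to_3x1x1(conv2d: nn.Conv2d, t: int = 3) -> nn.Conv3d:
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(t, 1, 1), padding=(t // 2, 0, 0),
                       bias=conv2d.bias is not None)
    # (out, in, 1, 1) -> (out, in, t, 1, 1), scaled to preserve activations.
    conv3d.weight.data = conv2d.weight.data.unsqueeze(2).repeat(1, 1, t, 1, 1) / t
    if conv2d.bias is not None:
        conv3d.bias.data = conv2d.bias.data.clone()
    return conv3d
```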
Original paper: "Temporal Segment Networks for Action Recognition in Videos" This code is mostly copy from yjxiong/tsn-pytorch
All the avaliable options in contains in ./opts.py
file. All the options can be specify as argument when start training use python main.py
.
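Assuming the options in ./opts.py are wired up through the standard argparse module (an assumption about this repo's setup), the full option list can be printed with:

```
python main.py --help
```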
By default, all GPUs will be used for training. To restrict training to specific GPUs, set CUDA_VISIBLE_DEVICES, e.g.:

```
CUDA_VISIBLE_DEVICES=0,1 python main.py
```
If you are a researcher in video learning and are interested in sharing some code, pull requests are welcome. If you find some unreasonable arrangement in this code, or there is a new architecture you want, just raise an issue.