In this work we present an original, publicly available dataset for film shot type classification, covering 10 types of camera movements that account for the vast majority of shot types in real movies. We propose two distinct classification methods that give an intuition about the separability of these categories: one static, based on aggregated statistics over the feature sequence, and one sequential, which predicts the target class from the input frame sequence. The former adopts an SVM algorithm with appropriate data normalization and parameter tuning, while for the latter an LSTM architecture was chosen. To obtain features representing the visual characteristics of a movie shot (from the .mp4 files), the multimodal_movie_analysis repo was used.
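As a rough illustration of the static pipeline (not the repository's actual train.py code), the sketch below aggregates each shot's frame-level feature sequence into per-shot statistics and feeds them to a normalized, cross-validated SVM; the feature dimensionality, labels and aggregation statistics here are placeholders:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# placeholder input: one (num_frames, num_features) array per shot plus a label per shot;
# in practice these come from the multimodal_movie_analysis feature extraction
shots = [rng.normal(size=(rng.integers(30, 90), 45)) for _ in range(200)]
labels = rng.integers(0, 2, size=200)          # e.g. 0 = Non_Static, 1 = Static

def aggregate(seq):
    # collapse the frame-level sequence into fixed-size per-shot statistics
    return np.concatenate([seq.mean(axis=0), seq.std(axis=0)])

X = np.stack([aggregate(s) for s in shots])
X_train, X_test, y_train, y_test = train_test_split(X, labels, stratify=labels, random_state=0)

# z-score normalization + RBF-kernel SVM, with C and gamma tuned by cross-validation
clf = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]},
    cv=5,
)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```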
git submodule init
git submodule update
Clones the "multimodal_movie_analysis" repo for the feature extraction process
sudo apt install ffmpeg
pip3 install -r requirements.txt
By combining different shot categories, four classification tasks are defined: one binary and three multi-class.
Tasks
Task | Classes |
---|---|
2_class | Non_Static (818 shots), Static (985 shots) |
3_class | Zoom (152 shots), Static (985 shots), Vertical_and_horizontal_movements (342 shots) |
4_class | Tilt (89 shots), Panoramic (253 shots), Static (985 shots), Zoom (152 shots) |
10_class | Static (985 shots), Panoramic (207 shots), Zoom in (51 shots), Travelling_out (46 shots), Vertical_static (52 shots), Aerial (51 shots), Travelling_in (55 shots), Vertical_moving (37 shots), Handheld (273 shots), Panoramic_lateral (46 shots) |
The experiments were conducted on the dataset described above, after defining four different tasks (a sketch of the class grouping is given after the list):
- The 2_class task includes the Static and Non-static classes. The former consists of shots annotated as static, while the latter contains all classes from the original dataset that involve any type of camera movement, i.e. the sub-classes Panoramic Lateral, Vertical Static, Zoom-in, Handheld, Aerial, Vertical Moving, Panoramic, Travelling-in and Travelling-out.
- The 3_class task includes the Static, Zoom and Vertical & Horizontal Movements classes. The Static class is the one used in the binary task above. The Zoom class consists of the Zoom-in, Travelling-in and Travelling-out sub-classes, which all contain shots in which the perimeter of the image changes at very fast intervals while the centre remains static or changes at a slower rate. The Vertical & Horizontal Movements class consists of the Vertical Static, Vertical Moving, Panoramic and Panoramic Lateral sub-classes of the original dataset, where the camera moves either vertically or horizontally.
- The 4_class task includes the Static and Zoom classes of the 3-class problem, while the Vertical & Horizontal Movements class is split into two sub-classes: Tilt, which includes all vertical movements and consists of the original Vertical Static and Vertical Moving classes, and Panoramic, which contains shots with lateral movements and consists of the original Panoramic and Panoramic Lateral classes.
- The 10_class task includes all classes provided by the original dataset: Static, Panoramic, Zoom-in, Travelling-out, Vertical Static, Aerial, Travelling-in, Vertical Moving, Handheld and Panoramic Lateral.
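The grouping described above can be summarised as a plain mapping from the original 10_class labels to the coarser tasks; the dictionary below only restates the text, and the exact label strings used by the dataset folders may differ:

```python
# Illustrative mapping from the original (10_class) labels to each coarser task
# (label spellings follow the table above; the dataset's folder names may differ)
TASK_GROUPS = {
    "2_class": {
        "Static": ["Static"],
        "Non_Static": ["Panoramic", "Zoom_in", "Travelling_out", "Vertical_static",
                       "Aerial", "Travelling_in", "Vertical_moving", "Handheld",
                       "Panoramic_lateral"],
    },
    "3_class": {
        "Static": ["Static"],
        "Zoom": ["Zoom_in", "Travelling_in", "Travelling_out"],
        "Vertical_and_horizontal_movements": ["Vertical_static", "Vertical_moving",
                                              "Panoramic", "Panoramic_lateral"],
    },
    "4_class": {
        "Static": ["Static"],
        "Zoom": ["Zoom_in", "Travelling_in", "Travelling_out"],
        "Tilt": ["Vertical_static", "Vertical_moving"],
        "Panoramic": ["Panoramic", "Panoramic_lateral"],
    },
}
```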
The LSTM model was trained for the 4 classification tasks that were mentioned above using the sequential_features.
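The exact architecture and hyperparameters live in src/train.py; conceptually, the sequential model is a frame-sequence classifier along the lines of the PyTorch sketch below (hidden size, feature dimensionality and sequence length are assumptions, not the repository's settings):

```python
import torch
import torch.nn as nn

class ShotLSTM(nn.Module):
    """Sketch of an LSTM shot-type classifier: frame-level features in, one class out."""
    def __init__(self, num_features, num_classes, hidden_size=128):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):             # x: (batch, num_frames, num_features)
        _, (h_n, _) = self.lstm(x)    # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1])     # class logits from the last hidden state

# e.g. for the 3_class task: a batch of 8 shots, 50 frames each, 45 features per frame
logits = ShotLSTM(num_features=45, num_classes=3)(torch.randn(8, 50, 45))
```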
e.g., for the 3-class task:
cd src
python3 train.py -v home/3_class/Zoom home/3_class/Static home/3_class/Vertical_and_horizontal_movements
where "home/3_class/<class_name>" is the full path of the class-folder, containing the .mp4 files
To get aggregated results for a specific number of folds, use the "-f" flag. For example, for 10 folds:
python3 train.py -v home/3_class/Zoom home/3_class/Static home/3_class/Vertical_and_horizontal_movements -f 10
The following files will be saved:
- best_checkpoint.pt: the best model
- 3_class_best_model.pkl: the model's parameters & hyperparameters
Four pretrained models are saved in the pretrained_models folder. Each one can be loaded for inference. While in the /src folder:
python3 inference.py -i <input> -m <../pretrained_models/2_class_best_checkpoint.pt>
where <input> is the full path of the .mp4 file or the folder of .mp4 files you want to classify, and <../pretrained_models/2_class_best_checkpoint.pt> is the path of the pretrained model you want to use for the prediction.
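If several folders need to be classified in one go, a small wrapper can simply invoke the same script in a loop; this is only a convenience sketch around the command above, with placeholder folder paths:

```python
import subprocess

MODEL = "../pretrained_models/2_class_best_checkpoint.pt"   # any of the four checkpoints

# run inference.py once per folder of .mp4 files (folder paths are placeholders)
for folder in ["/data/shots/batch_1", "/data/shots/batch_2"]:
    # equivalent to: python3 inference.py -i <folder> -m <checkpoint>
    subprocess.run(["python3", "inference.py", "-i", folder, "-m", MODEL], check=True)
```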