Author: Ignacio Heredia (CSIC)
Project: This work is part of the DEEP Hybrid-DataCloud project that has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 777435.
This is a plug-and-play tool to train and evaluate an image classifier on a custom dataset using deep neural networks.
You can find more information about it in the DEEP Marketplace.
Table of contents
- Installing this module
- Train a new image classifier
- Test an image classifier
- More info
- Acknowledgements
Installing this module
Requirements
This project has been tested in Ubuntu 18.04 with Python 3.6.5. Further package requirements are described in the `requirements.txt` file.
- You must have TensorFlow>=1.14.0 installed (either in GPU or CPU mode). It is not listed in `requirements.txt` because listing it there breaks GPU support.
- Run `python -c 'import cv2'` to check that the `opencv-python` package is correctly installed (dependencies are sometimes missed in `pip` installations).
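
The two checks above can also be scripted. This is a minimal sketch, assuming the packages are importable as `tensorflow` and `cv2`; the helpers `version_at_least` and `check_requirements` are hypothetical and not part of this module.

```python
import importlib
import re


def version_at_least(version, minimum):
    """Compare dotted version strings numerically, e.g. '1.15.2' >= '1.14.0'.

    Only the leading numeric part of each component is used, so suffixes
    such as 'rc0' are ignored (a simplification, not full PEP 440 rules).
    """
    def parse(v):
        parts = []
        for piece in v.split("."):
            match = re.match(r"\d+", piece)
            if match is None:
                break
            parts.append(int(match.group()))
        return parts

    a, b = parse(version), parse(minimum)
    # Pad with zeros so that '1.14' compares equal to '1.14.0'
    length = max(len(a), len(b))
    a += [0] * (length - len(a))
    b += [0] * (length - len(b))
    return a >= b


def check_requirements():
    """Return a list of human-readable problems (empty if everything looks fine)."""
    problems = []
    try:
        tf = importlib.import_module("tensorflow")
        if not version_at_least(tf.__version__, "1.14.0"):
            problems.append("TensorFlow %s found, >=1.14.0 required" % tf.__version__)
    except Exception as err:  # a broken install may raise more than ImportError
        problems.append("TensorFlow cannot be imported: %s" % err)
    try:
        importlib.import_module("cv2")
    except Exception as err:  # e.g. a missing libGL system library
        problems.append("opencv-python cannot be imported: %s" % err)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```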
To start using this framework, clone the repo and download the default weights:

```bash
# The first line installs a system library required by OpenCV
apt-get update && apt-get install -y libgl1
git clone https://github.com/ai4os-hub/ai4os-image-classification-tf
cd ai4os-image-classification-tf
pip install -e .
curl -o ./models/default_imagenet.tar.xz https://api.cloud.ifca.es:8080/swift/v1/imagenet-tf/default_imagenet.tar.xz
cd models && tar -xf default_imagenet.tar.xz && rm default_imagenet.tar.xz
```

Now run DEEPaaS:

```bash
deepaas-run --listen-ip 0.0.0.0
```

and open http://0.0.0.0:5000/ui and look for the methods belonging to the `imgclas` module.
We have also prepared a ready-to-use Docker container to run this module. To run it:

```bash
docker search deephdc
docker run -ti -p 5000:5000 -p 6006:6006 -p 8888:8888 ai4oshub/ai4os-image-classification-tf
```

Now open http://0.0.0.0:5000/ui and look for the methods belonging to the `imgclas` module.
Train a new image classifier
You can train your own image classifier with your custom dataset. For that, you have to prepare the data and then train the classifier.
The first step to train your image classifier is to have the data correctly set up.
Put your images in the `./data/images` folder. If you have your data somewhere else, you can use that location by setting the `image_dir` parameter in the training args.
Please use a standard image format (like `.png` or `.jpg`).
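
Before training, it can help to spot files that are not in a standard format. A minimal sketch, assuming images sit (possibly in subfolders) under `./data/images` and that the file extension is a good enough proxy for the format; `find_nonstandard_images` and the extension set are hypothetical choices, not settings of this module.

```python
import os

# Hypothetical whitelist; adjust it to the formats your data actually uses
STANDARD_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def find_nonstandard_images(image_dir):
    """Return sorted relative paths of files under image_dir with a non-standard extension."""
    flagged = []
    for root, _dirs, files in os.walk(image_dir):
        for name in files:
            ext = os.path.splitext(name)[1].lower()  # '.PNG' counts as '.png'
            if ext not in STANDARD_EXTENSIONS:
                flagged.append(os.path.relpath(os.path.join(root, name), image_dir))
    return sorted(flagged)


if __name__ == "__main__":
    for path in find_nonstandard_images("./data/images"):
        print("non-standard format:", path)
```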
First, you need to add the following files to the `./data/dataset_files` directory:

| Mandatory files | Optional files |
|---|---|
| `classes.txt`, `train.txt` | `val.txt`, `test.txt`, `info.txt` |

The `train.txt`, `val.txt` and `test.txt` files associate an image name (or relative path) with a label number (label numbers have to start at zero).
The `classes.txt` file translates those label numbers to label names.
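
The sketch below reads a split file and checks its labels against `classes.txt`. It assumes each split line holds the image path followed by the label number, separated by whitespace, and that `classes.txt` has one class name per line (check `./data/demo-dataset_files` for the exact layout); `read_classes`, `read_split` and `check_split` are hypothetical helpers, not part of the module.

```python
def read_classes(path):
    """Return the class names in classes.txt; list index == label number (assumed layout)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def read_split(path):
    """Parse train.txt / val.txt / test.txt into (image_path, label) pairs.

    Assumes one '<image path> <label number>' entry per line; blank lines are skipped.
    """
    entries = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            # Split on the last whitespace run so that paths containing spaces survive
            parts = line.rsplit(maxsplit=1)
            if len(parts) != 2 or not parts[1].isdigit():
                raise ValueError("%s:%d: expected '<path> <label>', got %r" % (path, lineno, line))
            entries.append((parts[0], int(parts[1])))
    return entries


def check_split(split_path, classes_path):
    """Return the parsed entries, raising ValueError if a label has no class name.

    Labels must lie in [0, number of classes), so a file numbered from one
    instead of zero typically fails here on its highest label.
    """
    num_classes = len(read_classes(classes_path))
    entries = read_split(split_path)
    for image, label in entries:
        if label >= num_classes:
            raise ValueError(
                "%s: label %d for %r is out of range for %d classes (labels start at zero)"
                % (split_path, label, image, num_classes)
            )
    return entries
```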
Finally, the `info.txt` file allows you to provide information (like the number of images in the database) about each class.
You can find examples of these files in `./data/demo-dataset_files`.
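
If your images are already sorted into one subfolder per class, the mandatory files (plus `val.txt`) can be generated with a short script. This is a minimal sketch under assumptions the module does not state: the folder-per-class layout, the `<relative path> <label>` line format, paths being relative to `image_dir`, the 20% validation fraction and the helper name `write_dataset_files` are all hypothetical; compare the output with `./data/demo-dataset_files` before training.

```python
import os
import random


def write_dataset_files(image_dir, out_dir, val_fraction=0.2, seed=0):
    """Write classes.txt, train.txt and val.txt from a folder-per-class layout.

    Labels follow the alphabetical order of the class folders, starting at zero.
    Returns the list of class names.
    """
    classes = sorted(
        d for d in os.listdir(image_dir) if os.path.isdir(os.path.join(image_dir, d))
    )
    rng = random.Random(seed)  # fixed seed, so the split is reproducible
    train, val = [], []
    for label, name in enumerate(classes):
        class_dir = os.path.join(image_dir, name)
        files = sorted(
            f for f in os.listdir(class_dir) if os.path.isfile(os.path.join(class_dir, f))
        )
        rng.shuffle(files)
        # Splitting each class separately keeps class proportions similar in both files
        n_val = int(len(files) * val_fraction)
        # Paths are written relative to image_dir (assumed), with forward slashes
        lines = ["%s/%s %d" % (name, f, label) for f in files]
        val.extend(lines[:n_val])
        train.extend(lines[n_val:])

    os.makedirs(out_dir, exist_ok=True)
    outputs = (("classes.txt", classes), ("train.txt", train), ("val.txt", val))
    for filename, lines in outputs:
        with open(os.path.join(out_dir, filename), "w") as f:
            f.writelines(line + "\n" for line in lines)
    return classes
```

For example, `write_dataset_files("./data/images", "./data/dataset_files")` would write the three files next to any `info.txt` or `test.txt` you add by hand.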
Tip: Training results usually depend on the training args you use. Although the default ones work reasonably well, you can explore how to modify them with the dataset exploration notebook.
Go to http://0.0.0.0:5000/ui and look for the `TRAIN` POST method. Click on 'Try it out', change whatever training args you want and click 'Execute'. The training will be launched and you will be able to follow its status by executing the `TRAIN` GET method, which also returns a history of all previously executed trainings.
If the module has training monitoring configured (like TensorBoard), you will be able to follow it at http://0.0.0.0:6006.
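
The same `TRAIN` methods can be called without the web page. This sketch assumes the DEEPaaS V2 route layout (`/v2/models/imgclas/train/`) and that training args travel as query-string parameters; both can differ between DEEPaaS versions, so confirm the path and argument location in the Swagger UI at `/ui`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed DEEPaaS V2 route; verify it in the Swagger UI at /ui
BASE_URL = "http://0.0.0.0:5000/v2/models/imgclas"


def train_request(params):
    """Build the POST request that launches a training with the given args.

    Args are sent in the query string (an assumption; some DEEPaaS
    versions expect form fields instead).
    """
    url = BASE_URL + "/train/?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, method="POST", headers={"Accept": "application/json"})


def list_trainings_request():
    """Build the GET request that returns the history of trainings."""
    return urllib.request.Request(BASE_URL + "/train/", headers={"Accept": "application/json"})


def send(request):
    """Send a request and decode the JSON answer (needs a running DEEPaaS server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, `send(train_request({"image_dir": "/path/to/images"}))` would start a training on a custom image folder, and `send(list_trainings_request())` would poll its status.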
Test an image classifier
Go to http://0.0.0.0:5000/ui and look for the `PREDICT` POST method. Click on 'Try it out', change whatever test args you want and click 'Execute'. You can supply either:

- a `data` argument with a path pointing to an image, or
- a `url` argument with a URL pointing to an image. Here is an example of such a URL that you can use for testing purposes.
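
A prediction from a URL can also be requested programmatically. This sketch assumes the DEEPaaS V2 route `/v2/models/imgclas/predict/` and that `url` is passed in the query string; check both in the Swagger UI at `/ui`, since they depend on the DEEPaaS version. Uploading a local image through `data` needs a multipart body and is left out. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed DEEPaaS V2 route; verify the path and argument location in /ui
PREDICT_URL = "http://0.0.0.0:5000/v2/models/imgclas/predict/"


def predict_url_request(image_url):
    """Build the POST request asking the module to classify the image at image_url."""
    query = urllib.parse.urlencode({"url": image_url})
    return urllib.request.Request(
        PREDICT_URL + "?" + query, method="POST", headers={"Accept": "application/json"}
    )


def predict_url(image_url):
    """Send the request and decode the JSON predictions (needs a running server)."""
    with urllib.request.urlopen(predict_url_request(image_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```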
More info
You can find more info on how to interact directly with the module (not through the DEEPaaS API) in the `./notebooks` folder:
- dataset exploration notebook: Visualize relevant statistics that will help you to modify the training args.
- computing predictions notebook: Test the classifier on a number of tasks: predict a single local image (or URL), predict multiple images (or URLs), merge the predictions of a multi-image single observation, etc.
- predictions statistics notebook: Make and store the predictions of the `test.txt` file (if you provided one). Once you have done that, you can visualize the statistics of the predictions, like popular metrics (accuracy, recall, precision, F1-score), the confusion matrix, etc.
- saliency maps notebook: Visualize the saliency maps of the predicted images, which show the pixels that were most relevant for the prediction.
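
To make the metrics listed for the predictions statistics notebook concrete, here is a minimal pure-Python sketch computing accuracy, the confusion matrix and per-class precision, recall and F1 from true and predicted label numbers. It is an illustration, not the notebook's code; `confusion_matrix` and `summarize_predictions` are hypothetical helpers.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[i][j] counts images of true class i predicted as class j."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def summarize_predictions(y_true, y_pred, num_classes):
    """Return (accuracy, confusion matrix, per-class [(precision, recall, f1), ...])."""
    matrix = confusion_matrix(y_true, y_pred, num_classes)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(num_classes))
    per_class = []
    for c in range(num_classes):
        tp = matrix[c][c]
        predicted = sum(matrix[r][c] for r in range(num_classes))  # column sum
        actual = sum(matrix[c])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    accuracy = correct / total if total else 0.0
    return accuracy, matrix, per_class
```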
Finally, you can launch a simple webpage to use the trained classifier to predict images (both local images and URLs) in your favorite browser.
Acknowledgements
If you find this project useful, please consider citing the DEEP Hybrid DataCloud project:
García, Álvaro López, et al. A Cloud-Based Framework for Machine Learning Workloads and Applications. IEEE Access 8 (2020): 18681-18692.