- About me
- What is it?
- Why?
- My schedule
- Methodology
- Implementation
- My result
- After this camp
- Summary
- References
- Acknowledgement
Welcome to my MLC GitHub page. My name is Kiet Nguyen Tuan (Ambrose) . I wrote these lines when I was a senior student at Hoasen University, Vietnam. Because of special love with Machine Learning, so I have come to this ML Camp hosted at Jeju National University to do my project. I also want to meet new friends from other countries. It was a experiement I never forget.
In this project, I would like to develop a collection of model that can use to recognize food and its ingredients. For me, I create these models for Vietnamese food first. Developers in other countries can contribute their national food models to my collection. After that, every people can use these models free, so that apply to their app.
When I found a idea for this camp, I realised that food is an important part of country's culture so that every people should have knowledge about food of place where they stay at. Moreover, disable people, specially blind people also need help to know what food they eat. For foreigner, I will be an useful food's guideline at place their visit and also be an assitant to track what their eat and suggest what they should or should not eat.
Due to shortly time of this camp (July 22 to August 7, 2018) and I do not have enough data, I cannot finish all project at the camp. But I still continue to finish.
Here is my plan
Day | Date | What I did |
---|---|---|
1 | July 22, 2018 | I came to Jeju island lately due to a storm at Shanghai. |
2 | July 23, 2018 | I prepared an introduction presentation and attended MLC Opening Ceremony. |
3 | July 24, 2018 | I visited Jeju Development Center and had a seminar about blockchain with AI. |
4 | July 25, 2018 | I reviewed all my project which I have prepared at home and viewed Google Cloud Platform tutorial. |
5 | July 26, 2018 | I re-designed UI for Android App and applied Firebase. |
6 | July 27, 2018 | I checked my dataset for food detection and downloaded more images from Google Image. |
7 | July 28, 2018 | I retrained food detection model, it could recognize about 10 Vietnamese food (Bun, pho, com,...). |
8 | July 29, 2018 | It was a weekend so I had around trip at Jeju island. |
9 | July 30, 2018 | I prepared dataset for ingredient detection. |
10 | July 31, 2018 | I tried to train object detection model (ingredient detection) on local PC but it was slowly. |
11 | Aug 1, 2018 | I used Google Cloud Platform, I created two Virtual Machine, one at Japan and the other at Taiwan. The VM at Taiwan had large CPU and Memory (vCPU: 8, Memory: 52GB) so it was a little bit faster than my laptop. |
12 | Aug 2, 2018 | I returned to my laptop. Installed tensorflow-gpu and tried to use my GPU card NVIDIA GT 740M. I trained on my laptop and it was the fastest (about 1 second per training step). After 2 hours, my GPU card was broken down, I could not use it after. |
After researched machine learning and implemented libraries, I refer to use TensorFlow with Python programming language and Java/Android for my mobile app. Nowaday, I suppose that TensorFlow is one of the best frameworks for Machine Learning and also Deep Learning. Moreover, TensorFlow had large developer community which ready to help when I go to problems.
Project has 3 parts:
- Food Recognition: It is a model to recognize food in a photo in summary. In my case, it returns result like Bun, Pho, Com, Banh-mi etc. I have downloaded above 600 photo per food on Google Image. After that, I have deleted unneccesary images and just keep correct images. I used these images to train the model.
- Ingredient Detection: According to result which I have after applied Food Recognition model I use ingredient Detection model to detect ingredient in food one by one, so that I can calculate its nutrition, predict its taste. Because of kind variation of food, each food has Ingredient Detection model differently. For example, Pho is a popular food in Vietnam, and its ingredient change its nutrition a lot.
Pho-bo | Pho-ga |
---|---|
The first photo is Pho-bo means Pho with beef. The second is Pho-ga means Pho with chicken. All of them is Pho, but they have different ingredient. Because of nutrition difference of beef and chicken, so Pho-bo and Pho-ga have different nutrition. So Ingredient Detection is an important step to solve this problem which is common in anothor food's culture.
- Mobile App: This is a place where I apply these models all together so that reach project target. I developed an Android app with TensorFlow Lite for mobile. The mobile app capture food photo, then, it uses these models to extract what food and ingredient, finally, it returns all food's information (Name, original, ingredient, taste prediction, nutrition values,...). To help blind people, I use module text-to-speed, so that translate these information into voice, and blind people can hear it.
In this section, I would like to show you what I have done and how to continue development this project step by step.
In this project, you should install some application which is shown in below table.
Tool | Description | Link |
---|---|---|
Python is main language to train models | https://www.python.org/downloads/ | |
TensorFlow framework supports all things in ML/DL. If your PC have GPU card, you should install TensorFlow-GPU version to get high performance | https://www.tensorflow.org/install/ | |
In Windows, some package in unvailable, so you should have Anaconda to install them | https://conda.io/docs/user-guide/install/index.html |
There are three important tools you have to install first, some small tool I will show you after.
There are a lot of ways to collect photos. For me, I refer to collect them on Google Image, because it is the largest search engine, so it contains a lot of photos. The simple way is use a tool that let you download images automatically based on keywords. I have used this tool
. Each food should have different folders. Notice that the folder name is also the label of food, so please check it carefully.
└───vietnamese_food
├───background
├───banh bao
├───banh mi
├───bo bit tet
├───bun bo
├───com dia
├───dau hu
├───mi xao
├───rau xao
├───thit kho tau
└───trung op la
There are 10 common food in Vietnam and backgroud to recognize uneatable things. Here is my sample config
{
"Records": [
{
"keywords": "bun bo",
"limit":600,
"type":"photo",
"format":"jpg",
"output_directory":"vietnamese_food/",
"chromedriver":"chromedriver.exe"
},
{
},...
]
}
After downloading process complete. You need to review all photo, delete error photos or out of topic photo before start training process.
In this step, you will crop ingredients in food based on photos which you have downloaded in previous step. Before that, you should classify food into groups which have common features. Each group will have a different model. For example, Bun (or noodle rice) in Vietnam has a lot of kinds, so I grouped them into rice-noodle group. This group contains Bun-bo (beef rice noodle), Bun-moc (meatball rice noodle), Bun-ga (chicken rice noodle), etc.. So I would like to put all images of rice-noodle group into rice-noodle folder. After that, I have used LabelImg to crop ingredients in these images.
After that, dataset folder should have original images and its .xml files which save all information about ingredient boundary rectangle. These .xml files will be converted to csv files. You should take a subset of dataset for test, its about 10% to 20% of dataset. Move these test data into test folder and train data into train folder.
Please download FoodRecognition branch in this Git Repository. I have prepared neccessary python script to train Food Recognition model. Download and unzip it, you will have folder structure below
│
├───ImageRecognizer
│ classify_image.py
│ label_image.py
│ retrain.py
Run PowerShell and type this command
cd <YOUR_IMAGE_RECOGNIZER_DIRECTORY>
To start training process
python retrain.py --image_dir <DIRECTORY_TO_DATASET> --tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/1
I haved used mobilenet v2 model, so at tfhub_module I used this link to download model. When training process complete, you will have a model in folder /tmp/ at root directory. Your model contains two file "output_graph.pb" and "output_labels.txt". You need copy them to another place. To use it, run label_image.py
python label_image.py --graph=<DIRECTORY_TO_GRAPH_FILE> --labels=<DIRECTORY_TO_LABELS_FILE> --input_layer=Placeholder --output_layer=final_result --input_height=224 --input_width=224 --image=<YOUR_IMAGE_FILE>
Download IngredientDetection branch to your PC and unzip it. You should focus to folder models/object_detection.
Edit file models/object_detection/training/labelmap.pbtxt. This file contains all ingredient label so you should edit it to suitable with your case.
item {
id: 1
name: '<Label 1>'
}
item {
id: 2
name: '<Label 2>'
}
.
.
.
item {
id: n
name: '<Label n>'
}
Run command to create csv file from dataset.
# In object_detection folder
python xml_to_csv.py
Edit file models/generate_tfrecord.py from line 32. This file help you create tfrecord file which is structured file TensorFlow can understand.
# TO-DO replace this with label map
def class_text_to_int(row_label):
if row_label == 'Label 1':
return 1
elif row_label == 'Label 2':
return 2
.
.
.
elif row_label == 'Label n':
return n
else:
return 0
Run these command to create tfrecord files.
python generate_tfrecord.py --csv_input=<TRAIN_FILE_CSV> --image_dir=<TRAIN_FOLDER> --output_path=train.record
python generate_tfrecord.py --csv_input=<TEST_FILE_CSV> --image_dir=<TEST_FOLDER> --output_path=test.record
Edit config file of your model which you want to train. For me, I used Inception V2, so I download model from Here, copy unzip folder to models/object_detection, then I edit file models/object_detection/training/faster_rcnn_inception_v2_pets.config at:
- Line 9 : num_classes: <NUMBER_OF_LABEL>
- Line 106: fine_tune_checkpoint: "object_detection/<MODEL_DIRECTORY>/model.ckpt"
- Line 122: input_path: "<TRAIN_RECORD_FILE>"
- Line 124: label_map_path: "<LABELMAP_FILE>"
- Line 136: input_path: "<TEST_RECORD_FILE>"
- Line 138: label_map_path: "<LABELMAP_FILE>"
Before start training, you should compile protobuf file. Follow these commands to do that.
conda create -n tensorflow1 pip python=3.5
activate tensorflow1
pip install --ignore-installed --upgrade tensorflow-gpu
conda install -c anaconda protobuf
In models/ folder
protoc --python_out=. .\object_detection\protos\anchor_generator.proto .\object_detection\protos\argmax_matcher.proto .\object_detection\protos\bipartite_matcher.proto .\object_detection\protos\box_coder.proto .\object_detection\protos\box_predictor.proto .\object_detection\protos\eval.proto .\object_detection\protos\faster_rcnn.proto .\object_detection\protos\faster_rcnn_box_coder.proto .\object_detection\protos\grid_anchor_generator.proto .\object_detection\protos\hyperparams.proto .\object_detection\protos\image_resizer.proto .\object_detection\protos\input_reader.proto .\object_detection\protos\losses.proto .\object_detection\protos\matcher.proto .\object_detection\protos\mean_stddev_box_coder.proto .\object_detection\protos\model.proto .\object_detection\protos\optimizer.proto .\object_detection\protos\pipeline.proto .\object_detection\protos\post_processing.proto .\object_detection\protos\preprocessor.proto .\object_detection\protos\region_similarity_calculator.proto .\object_detection\protos\square_box_coder.proto .\object_detection\protos\ssd.proto .\object_detection\protos\ssd_anchor_generator.proto .\object_detection\protos\string_int_label_map.proto .\object_detection\protos\train.proto .\object_detection\protos\keypoint_box_coder.proto .\object_detection\protos\multiscale_anchor_generator.proto .\object_detection\protos\graph_rewriter.proto
Then run this command
protoc object_detection/protos/*.proto --python_out=.
We are ready for training. To start, run this command
python train.py --logtostderr --train_dir=object_detection/training --pipeline_config_path=object_detection/training/<YOUR_MODEL_CONFIG_FILE>
Wait for training, In my case, I haved use GPU card, it consumes about 1 second per step. I run above 3000 steps and stop. Here is my result
To finish, we need extract model from checkpoint file by using export_inference_graph.py
python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/<YOUR_MODEL_CONFIG_FILE> --trained_checkpoint_prefix training/model.ckpt-<HIGHEST_NUMBER> --output_directory inference_graph
Your final model will be saved in folder models/inference_graph
To use the model, edit file Python models/object_detection/Object_detection_image.py at lines
- line 34: IMAGE_NAME = '<INPUT_DIRECTORY>'
- line 50: NUM_CLASSES = <NUMBER_OF_INGREDIENTS>
Save and run it
python Object_detection_image.py
Project is developing, I show you current result. It will be updated continuously.
- Food Recognition: I have trained food recogntion model for 10 basic Vietnamese food (Bun, Com, Pho,...). Here is a cross entropy graph.
Training process have done with result:
Test accuracy is 78.8%, it is not the best, because I do not have enough dataset. To improve it, I would like to increase number of photo in dataset about 1000 photos per food.
Below is test result:
Photo | Target | Output | Result |
---|---|---|---|
bun bo | bun bo: 0.99073255 | ||
banh mi | banh mi 0.99794585 | ||
com | com 0.85801125 | ||
banh bao | banh bao 0.99786466 |
- Ingredient Detection: Because I do not have enough photo (just 100 photos) so model still has low accuracy.
Here is graphs
- Mobile App: It is unfinish. Now, it can capture photo and recognize food, then show food information. Here is some screenshots
After this camp, my project is still unfinish. So I have to continue my work. What I will do is shown below
.No | Task |
---|---|
1 | Prepare more data for Food Recognition. Not only 10 Vietnamese food, but also more food. |
2 | Prepare more data for Ingredient Detection. I will classify these food into a lot of subclass. I will collect 500 photos per subclass. |
3 | Try anothor models such as mobilenet v2,... |
4 | Finish food tracking function in Android app. |
5 | Create app for iOS platform. |
6 | Improve accuracy and performance. I would like to serve as a service via Web API, so these model will be stored and ran on server. |
All in all, This project was complete 70%. During the camp, I had more experience in work with Machine Learning and learnt how to do teamwork effeciently. Moreover, I had new friends from other countries, so we can share knowledge and experience together. The camp was really worthy and interesting.
- Image recogntion: https://www.tensorflow.org/hub/tutorials/image_retraining
- Object detection: https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
- NutriNet: A Deep Learning Food and Drink Image Recognition System for Dietary Assessment Simon Mezgec 1,* and Barbara Koroušic´ Seljak 2
- Deep Learning Based Food Recognition (Qian Yu Stanford University [email protected], Dongyuan Mao, Stanford University [email protected], Jingfan Wang Stanford University [email protected])
- FOODIMAGERECOGNITIONUSINGDEEPCONVOLUTIONALNETWORK WITHPRE-TRAININGANDFINE-TUNING (Keiji Yanai, Yoshiyuki Kawano)
This project is not possible without the overwhelming suppport from Jeju National University, Jeju Developement Center and other selfless sponsors. I would like to specifically give a big thanks to Prof. Yungcheol Byun for being the best host ever and my mentor Dr.Lap Nguyen Trung for the help and guidance.