Authors of the project : Kai Yung TAN (Adam) & Jean Christophe Meunier
- Learning how to design and evaluate a custom-made convolutional neural network (CNN) for practical purposes
- Using CNN models to analyse X-ray images
- Designing a CNN capable of recognising pneumonia in patients' X-rays
- Consolidating knowledge of Python, specifically TensorFlow/Keras, NumPy, Pandas, Matplotlib, ...
- Being able to search for and implement new libraries
- Consolidating knowledge of data science and machine/deep learning algorithms for developing an accurate classification model
- Being able to perform appropriate hyperparameter tuning
- A CNN trained on a large X-ray dataset (>5k images) that can recognise new images outside of the training set
- Proper model evaluation (split dataset, confusion matrix, etc)
- Visualisations of model results (properly labeled, titled...)
- A visualisation of the feature maps of the model
- Comparison with other CNN model structures
- Assessing and comparing
- All the work was done during BeCode's AI/data science bootcamp 2020-2021
- Researching and understanding the terms, concepts and requirements of the project
- Discovering new libraries that can serve the project's purposes
- Developing, using and testing machine learning algorithms (i.a. TensorFlow/Keras, ...)
- Consolidating knowledge on model building and model hyperparametrisation (e.g. type of layers, pooling, dropout, batch normalization, type of activation functions,...)
- Data augmentation
- In addition, we searched online for published work and studies on X-ray data manipulation and modelling, for example:
The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.
For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.
https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
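To check the split sizes before training, the folder layout described above (one folder per split, one subfolder per category) can be walked with the standard library. This is a minimal sketch: the function name `count_images` and the local path `chest_xray` are illustrative assumptions, not part of the project code.

```python
from pathlib import Path


def count_images(root, extensions=(".jpeg", ".jpg")):
    """Return {split: {category: n_images}} for a root/split/category layout."""
    counts = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[split_dir.name] = {
            class_dir.name: sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in extensions
            )
            for class_dir in sorted(p for p in split_dir.iterdir() if p.is_dir())
        }
    return counts


if __name__ == "__main__":
    # "chest_xray" is a hypothetical local path to the unpacked dataset.
    root = Path("chest_xray")
    if root.is_dir():
        for split, per_class in count_images(root).items():
            print(split, per_class, "total:", sum(per_class.values()))
```

Printing the per-class counts also makes any imbalance between the two categories visible before a model is trained.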
- Examples of data input
- Image size reduction: the original JPEG images were resized to 128 x 128 to speed up data processing during model training
- Standardisation of the images
- Data augmentation using the OpenCV (cv2) library and the Keras 'ImageDataGenerator' class to improve training quality
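The resizing and standardisation steps can be sketched with NumPy alone. Two assumptions are made here: nearest-neighbour sampling stands in for the resize routine used in the notebook, and "standardisation" is read as scaling 8-bit pixel values to the range [0, 1]. The function names are illustrative.

```python
import numpy as np


def resize_nearest(image, size=(128, 128)):
    """Resize an image by nearest-neighbour sampling (stand-in for a library resize)."""
    height, width = image.shape[:2]
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    return image[rows[:, None], cols]


def scale_pixels(image):
    """Map 8-bit pixel values to floats in [0, 1] (assumed meaning of standardisation)."""
    return image.astype(np.float32) / 255.0


def preprocess(image, size=(128, 128)):
    """Resize a 2-D grayscale image, rescale it, and add a trailing channel axis."""
    return scale_pixels(resize_nearest(image, size))[..., np.newaxis]


# Synthetic stand-in for a grayscale X-ray; real images would be loaded from disk.
rng = np.random.default_rng(0)
fake_xray = rng.integers(0, 256, size=(1000, 1200), dtype=np.uint8)
sample = preprocess(fake_xray)
print(sample.shape, sample.dtype)
```

Shrinking every image to the same small size keeps the input tensor fixed and reduces the work done per training step, at the cost of discarding fine detail.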
In total, 17 models were built, trained and compared using various hyperparameter settings (see notebook section):
- depth of the neural network
- type of layers (dense, convolutional,...)
- filters (number, size, padding, etc.)
- type of activation (i.a. relu, leaky-relu, sigmoid, softmax,...)
- dropout
- pooling
- batch normalization
For each model, the hyperparameters were fine-tuned based on the performance indices on the test data set (624 pictures). When a model reached a satisfying accuracy, it was finally run on the validation set (16 pictures).
The best model was chosen partly on its performance on the train and test data sets, but mostly on its performance on the validation data set.
- 8 convolution layers (filters=32/32/32/64/64/64/128/128, kernel_size=(3, 3), activation='Leaky-relu')
- MaxPool2D((2, 2))
- Dropout(0.25) on all layers except the last one
- Flatten
- 1 dense layer (1024, activation='relu')
- 1 output layer: Dense(2, activation='sigmoid')
- Dropout(0.5)
- loss='binary_crossentropy', optimizer='adam'
- shuffle = True
- data augmentation: rotation_range = 20, zoom_range = 0.2, width_shift_range = 0.2, height_shift_range = 0.2, horizontal_flip = True, vertical_flip = True
- Batch size : 16
- Epochs : 100
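To get a feel for the size of this architecture, the feature-map dimensions and parameter counts can be worked out by hand. The sketch below does that arithmetic under assumptions that are not stated above: 'same' padding, a single-channel (grayscale) 128 x 128 input, and 2x2 pooling placed after layers 3, 6 and 8. The actual pooling positions are in the notebook and may differ.

```python
def conv_stack_summary(input_size, channels, filters, pool_after,
                       kernel=3, dense_units=1024):
    """Trace feature-map size and parameter counts through a conv stack.

    Assumes 'same' padding (a convolution keeps the spatial size) and 2x2
    max pooling (halves it) after the 1-based layer indices in pool_after.
    """
    size, in_ch, conv_params = input_size, channels, 0
    for index, out_ch in enumerate(filters, start=1):
        conv_params += kernel * kernel * in_ch * out_ch + out_ch  # weights + biases
        in_ch = out_ch
        if index in pool_after:
            size //= 2
    flat = size * size * in_ch  # length of the vector produced by Flatten
    dense_params = flat * dense_units + dense_units
    return {"feature_map": size, "flatten": flat,
            "conv_params": conv_params, "dense_params": dense_params}


summary = conv_stack_summary(
    input_size=128,
    channels=1,            # assumption: grayscale input
    filters=[32, 32, 32, 64, 64, 64, 128, 128],
    pool_after={3, 6, 8},  # assumption: one pooling step per filter group
)
print(summary)
```

Under these assumptions the 1024-unit dense layer holds far more weights than the whole convolutional stack, so the number of pooling steps before Flatten largely determines the model size.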
- Loss and accuracy
- Confusion matrix on test set
- Performance indices on test set
- Confusion matrix on validation set
- Performance indices on validation set
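The confusion matrices and performance indices listed above can be computed from true and predicted labels as follows. This is a minimal sketch with NumPy: the label arrays are made-up illustrations (not project results), and coding Pneumonia as class 1 is an assumption.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return matrix


def binary_metrics(matrix):
    """Accuracy, precision, recall and F1 with class 1 as the positive class."""
    tn, fp, fn, tp = matrix.ravel()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / matrix.sum(),
            "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only: 0 = Normal, 1 = Pneumonia (assumed coding).
y_true = [0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred)
metrics = binary_metrics(cm)
print(cm)
print(metrics)
```

With a two-unit output layer, the predicted class for each image would typically be taken as the index of the larger of the two outputs before building the matrix.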
- Further train the model on additional data
- Model optimization: constructing simpler models that reach similar metric performance
- Building a RESTful API to be deployed in a web-based environment (e.g. Heroku, Azure, etc.)
- Completing the API with a web-based interface (e.g. using Streamlit) that allows X-ray images to be uploaded to obtain a pneumonia diagnosis
- Extending model to include other types of pathologies (i.e. multiclass classification including other respiratory diseases)