Implementation for the TReNDS neuroimaging Kaggle Competition.
I decided to concentrate on the images dataset. Unfortunately, the images alone are not well suited to reaching very high scores - not without the tabular data. This is why I decided to move on to another competition, where the visual inputs are used more effectively to understand the problem. For this reason, development on this repository is stopped.
However, I was glad to learn many concepts I was unfamiliar with, and I'm going to explain the main things I gained.
The whole dataset is made up of ~460GB of 3D fMRI ICA scans. It is composed of 5,877 train and 5,877 test volumes of dimension (53, 52, 63, 53): 53 independent components, each with spatial dimension (52, 63, 53).
Each volume is stored as an `h5py` dataset with `float64` datatype, which is a serious limitation for parallel reading from disk, so I decided to convert the whole dataset to `float32` `torch.Tensor`s. In this way I was able to limit the overall size (~500GB) and to use the `torch.load` method, which is way faster than the `h5py` one.
In addition, I normalized and standardized the dataset offline with its norm and variance, so there is no need to calculate them online.
I also tried other approaches - as discussed here - but they were less efficient.
I managed the dataset with the file `manage_dataset.py`.
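A minimal sketch of that conversion step (the `SM_feature` key and the file paths are assumptions about the competition's .mat layout, not taken from the repo):

```python
import h5py
import torch

def convert_volume(h5_path: str, out_path: str, key: str = "SM_feature"):
    """Read one float64 fMRI ICA volume via h5py and save it as a
    float32 torch.Tensor, loadable later with the faster torch.load."""
    with h5py.File(h5_path, "r") as f:
        volume = torch.from_numpy(f[key][()]).float()  # float64 -> float32
    torch.save(volume, out_path)

# Hypothetical usage:
# convert_volume("fMRI_train/10001.mat", "fMRI_train_pt/10001.pt")
```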
To load the entire dataset I used the PyTorch `DataLoader` API, which let me load and transform the dataset in a very fast and customizable way.
The dataset class is customized so that only the strictly necessary data is passed to the model at a time - while keeping the possibility to load very different types of data with a single flag. I tried many different augmentation approaches with the MONAI library, which let me modify the images without effort (see the sketch after this list):
- translations;
- pixel shifting and scaling;
- random rotating;
- Gaussian noise;
- resizing;
- cropping.
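A rough sketch of such a pipeline follows; the transform choices and parameters here are illustrative, and the exact names may vary with the MONAI version:

```python
import torch
from monai.transforms import (
    Compose, RandAffine, RandGaussianNoise, RandRotate,
    RandScaleIntensity, RandShiftIntensity, RandSpatialCrop, Resize,
)

# Illustrative augmentations for one (53, 52, 63, 53) volume,
# channel-first as MONAI expects; all parameters are arbitrary examples.
train_transforms = Compose([
    RandAffine(prob=0.5, translate_range=(4, 4, 4)),   # translations
    RandShiftIntensity(offsets=0.1, prob=0.5),         # intensity shifting
    RandScaleIntensity(factors=0.1, prob=0.5),         # intensity scaling
    RandRotate(range_x=0.2, prob=0.5),                 # random rotation
    RandGaussianNoise(std=0.05, prob=0.3),             # Gaussian noise
    Resize(spatial_size=(64, 64, 64)),                 # resizing
    RandSpatialCrop(roi_size=(48, 48, 48), random_size=False),  # cropping
])

augmented = train_transforms(torch.randn(53, 52, 63, 53))
```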
I found that these transforms keep the CPU heavily loaded, so a secondary GPU dedicated to them would have been better in order to gain speed.
The approaches are easily accessible in the file `dataset.py`.
I built and trained three different network flavours: a plain CNN, a siamese CNN, and their sparse variants. In addition, I tried classification with a VAE as a regularization term, but I didn't invest much time in its training since it was a complex approach to the problem.
I used ResNet architectures of various depths to train my model, taking as feature channels the 53 independent components stacked on the first dimension of the tensor. This approach led me to the most promising results - 0.714 LB score - with a 3D ResNet10.
In this approach, I interpreted the independent components as different images, from which the network should be able to learn the differences - and the correlations - between them. It is relatively difficult to train and very demanding on GPU memory, which was the main bottleneck: it fills up even with small CNNs.
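As a rough sketch of this channel interpretation (using torchvision's `r3d_18` as a stand-in for the actual 3D ResNet10, and assuming the competition's 5 regression targets):

```python
import torch
from torch import nn
from torchvision.models.video import r3d_18

model = r3d_18()
# Replace the stem so the 53 ICA components act as input channels
# instead of the usual 3 RGB channels.
model.stem[0] = nn.Conv3d(53, 64, kernel_size=(3, 7, 7),
                          stride=(1, 2, 2), padding=(1, 3, 3), bias=False)
# Regression head for the competition's 5 targets.
model.fc = nn.Linear(model.fc.in_features, 5)

x = torch.randn(2, 53, 52, 63, 53)  # (batch, components, x, y, z)
print(model(x).shape)               # -> torch.Size([2, 5])
```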
The fMRI ICA images are quite sparse - thresholded at (-3, 3), they are ~95% zeros - so I decided to try the FacebookResearch SparseConvNet library. Unfortunately, I didn't find any advantage over the dense representation. The experiment was still really helpful for understanding sparse representations - and how to deal with feature channels and batch size in this kind of library.
The custom `collate_fn` needed to produce the right input format is attached to the network class inside the `SparseResNet.py` file, together with the network implementations.
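A sketch of what such a `collate_fn` can look like: SparseConvNet's `InputLayer` consumes a `(locations, features)` pair with the batch index as the last coordinate column; this is my reading of the library, not the repo's exact code:

```python
import torch

def sparse_collate_fn(batch):
    """Turn a batch of dense (C, X, Y, Z) volumes into the
    (locations, features) pair expected by scn.InputLayer:
    locations is (N, 4) = [x, y, z, batch_idx], features is (N, C)."""
    locations, features, labels = [], [], []
    for batch_idx, (volume, label) in enumerate(batch):
        # keep voxels where at least one component is non-zero
        mask = volume.abs().sum(dim=0) != 0
        coords = mask.nonzero()                # (N, 3) voxel coordinates
        batch_col = torch.full((coords.size(0), 1), batch_idx,
                               dtype=coords.dtype)
        locations.append(torch.cat([coords, batch_col], dim=1))
        features.append(volume[:, mask].t())   # (N, C) active features
        labels.append(label)
    return (torch.cat(locations), torch.cat(features)), torch.stack(labels)
```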
To find the optimal learning rate - given a set of other hyperparameters - I followed the indications of the paper Cyclical Learning Rates for Training Neural Networks. This gave me an important boost in the development of new architectures - and in rapidly understanding the learning capacity of each one. In particular, I made use of the torch-lr-finder library, an easy-to-use tool that applies the method described in the paper.
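A typical torch-lr-finder run looks like this (the tiny model and loader below are toy placeholders for the real ones):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torch_lr_finder import LRFinder

# Toy stand-ins for the real 3D network and dataset.
model = nn.Sequential(nn.Flatten(), nn.Linear(53 * 8, 5))
train_loader = DataLoader(TensorDataset(torch.randn(64, 53, 8),
                                        torch.randn(64, 5)), batch_size=8)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-7)  # start very low
criterion = nn.MSELoss()

lr_finder = LRFinder(model, optimizer, criterion, device="cpu")
lr_finder.range_test(train_loader, end_lr=10, num_iter=100)  # exponential sweep
lr_finder.plot()   # pick the LR where the loss descends most steeply
lr_finder.reset()  # restore model/optimizer to their initial state
```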
I decided to use half precision for my models in order to allow higher batch sizes. For this I used apex, a tool for easy mixed precision and distributed training in PyTorch. More on the attached link.
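The apex `amp` pattern in question is roughly the following (model, optimizer, criterion, and loader are assumed to already exist on the GPU):

```python
from apex import amp

# "O1" patches selected ops to run in float16 while keeping
# float32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for volumes, targets in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(volumes.cuda()), targets.cuda())
    # scale the loss to avoid float16 gradient underflow
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```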
In order to train and test with those experimental frameworks, I decided to embrace the Docker philosophy and used nvidia-docker with the official PyTorch image.
I tested the above configuration on this machine:
- CPU: AMD Ryzen 5 3600X
- GPU: NVIDIA RTX 2070 Super
- SSD: 1TB Samsung 970 EVO Plus
- RAM: 64GB G.Skill 3200MHz