During my internship, I worked on brain age prediction [1, 2] with the Omega dataset [3, 4]. Unfortunately, we did not obtain good scores on this task with Omega (see my internship report for details here). We wanted to understand why age prediction didn't work: was our preprocessing wrong, or was the dataset simply not suited to this task?
We thus decided to reproduce results from other papers that use Omega, but with our own preprocessing. To do this, we chose [5], in which the authors use MEG resting-state recordings from Omega to detect ADHD based on brain functional connectivity.
In [5], the authors classify control subjects versus subjects with ADHD. However, the version of Omega that I used contained no information about ADHD; its classes were Control, Parkinson's and Chronic Pain. I thus decided to classify Control versus Parkinson's using Omega.
To do that, I first wrote a function (in get_data_omega.py) to collect the Omega data, compute the coherence (connectivity) and get the class of each subject. At first, I used the function spectral_connectivity_epochs from MNE [8] and MNE-Connectivity [9], but the resulting coherence matrices were not as expressive as those in the paper. I thus wrote a function in compute_coh.py to compute the coherence with the same formula as in [5]:
For each band $b$:

$$C_{xy}(b) = \frac{1}{|b|} \sum_{f \in b} \frac{|S_{xy}(f)|^2}{S_{xx}(f)\,S_{yy}(f)}$$

with $S_{xy}$ the cross-spectral density between signals $x$ and $y$, and $S_{xx}$, $S_{yy}$ their power spectral densities.
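As a minimal sketch of this band-averaged magnitude-squared coherence, the cross- and power spectral densities can be estimated with Welch's method; the band limits, function name and synthetic signals below are illustrative, not taken from compute_coh.py:

```python
import numpy as np
from scipy.signal import csd, welch

def band_coherence(x, y, sfreq, band):
    """Magnitude-squared coherence of x and y, averaged over band = (fmin, fmax)."""
    f, sxy = csd(x, y, fs=sfreq, nperseg=256)    # cross-spectral density S_xy
    _, sxx = welch(x, fs=sfreq, nperseg=256)     # power spectral density S_xx
    _, syy = welch(y, fs=sfreq, nperseg=256)     # power spectral density S_yy
    mask = (f >= band[0]) & (f < band[1])
    coh = np.abs(sxy[mask]) ** 2 / (sxx[mask] * syy[mask])
    return coh.mean()

# Two noisy signals sharing a 10 Hz component: coherence should be
# higher in the alpha band (8-12 Hz) than in a band with noise only.
rng = np.random.default_rng(0)
sfreq = 200.0
t = np.arange(2000) / sfreq
common = np.sin(2 * np.pi * 10 * t)
x = common + 0.5 * rng.standard_normal(t.size)
y = common + 0.5 * rng.standard_normal(t.size)
print(band_coherence(x, y, sfreq, (8, 12)))
print(band_coherence(x, y, sfreq, (30, 40)))
```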
With the coherence computed like this, the matrices were still of poor quality. To check whether this was a dataset problem, I wanted to test other datasets. Other studies have addressed the same task with MEG datasets [11, 12], but those datasets were not available. We thus decided to test the model on an EEG dataset that we had [10]; the code to get the data is available in get_data_ds.py. Furthermore, other studies using EEG datasets to classify control subjects versus subjects with Parkinson's are available [13, 14], which makes it possible to compare the results found on this second task.
In select_features.py, I use Neighborhood Component Analysis (NCA) for feature selection, through the ncafs package [6], which implements the NCA feature selection presented in [7] and used in [5]. To select the features, I run a leave-one-out loop and fit the NCA on each training set. I store the 5 features selected at each fold and, at the end, keep the features that appear in at least 50% of the folds. If fewer than 5 features pass this threshold, I take the most frequently selected ones so as to end up with 5 features.
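The fold-voting logic can be sketched as follows. The real code scores features with ncafs's NCA selector; here `score_features` is a stand-in scorer (mutual information) so the sketch stays self-contained, and the data is synthetic:

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import LeaveOneOut

def score_features(X, y):
    # Stand-in for the per-fold NCA feature weights from ncafs.
    return mutual_info_classif(X, y, random_state=0)

def select_by_vote(X, y, n_keep=5, min_frac=0.5):
    """Vote for the top-n_keep features in each leave-one-out fold."""
    votes = Counter()
    n_folds = 0
    for train_idx, _ in LeaveOneOut().split(X):
        weights = score_features(X[train_idx], y[train_idx])
        votes.update(np.argsort(weights)[-n_keep:].tolist())
        n_folds += 1
    # Keep the features selected in at least min_frac of the folds...
    selected = [f for f, c in votes.items() if c >= min_frac * n_folds]
    if len(selected) < n_keep:
        # ...falling back to the most-voted features to reach n_keep.
        selected = [f for f, _ in votes.most_common(n_keep)]
    return sorted(selected)

X, y = make_classification(n_samples=40, n_features=20,
                           n_informative=5, random_state=0)
print(select_by_vote(X, y))
```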
In get_features.py, I wrote a function that returns the names of the selected features, to compare them with those reported in the paper.
Finally, in model_classif.py, I wrote the function to test the three different models: an SVM with an RBF kernel, KNN with k = 3 and a decision tree, all evaluated with leave-one-out cross-validation.
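The evaluation step can be sketched with scikit-learn; the synthetic data stands in for the selected connectivity features, and the hyperparameters not stated in the text are left at their defaults:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 5 selected features per subject.
X, y = make_classification(n_samples=50, n_features=5, random_state=0)

models = {
    "SVM (RBF)": SVC(kernel="rbf"),
    "KNN (k=3)": KNeighborsClassifier(n_neighbors=3),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

# Leave-one-out cross-validation: one subject held out per fold.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=LeaveOneOut())
    print(f"{name}: LOO accuracy = {scores.mean():.2f}")
```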