This repository gathers the material for the intermediate machine learning SIB course "Ensuring More Accurate, Generalisable, and Interpretable Machine Learning Models for Bioinformatics".
The course is targeted at life scientists who are already familiar with the Python programming language and have a good grasp of machine learning concepts such as K-fold cross-validation, grid search for hyper-parameter tuning, or tree-based models.
The course uses Jupyter notebooks to go through examples and exercises. See the instructions on installing the prerequisite libraries to help you set up your environment for the course.
We recommend using a conda environment for tidy management of the libraries needed for the course, but participants may use other methods as long as they can make them work.
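Once your environment is set up, a quick way to confirm that it is usable is to check that the libraries can be imported. The sketch below is a minimal, hypothetical helper, not part of the course material: the package list is an assumption inferred from the chapter topics (XGBoost, hyperopt, SHAP, LIME) plus the usual scientific Python stack, so adjust it to match the official installation instructions. Run it with the Python interpreter of your activated environment.

```python
"""Check that the libraries used in the course notebooks can be imported."""
import importlib.util

# Assumed list of import names (hypothetical); adapt it to the course instructions.
COURSE_PACKAGES = ["numpy", "pandas", "sklearn", "xgboost", "hyperopt", "shap", "lime"]


def check_packages(names):
    """Return a dict mapping each top-level module name to True if it is importable."""
    # find_spec returns None when a top-level module cannot be found,
    # without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_packages(COURSE_PACKAGES)
    for name, ok in status.items():
        print(f"{name:10s} {'OK' if ok else 'MISSING'}")
    missing = [name for name, ok in status.items() if not ok]
    if missing:
        print("Missing packages:", ", ".join(missing))
```

Using `importlib.util.find_spec` only locates the modules instead of importing them, so the check stays fast even for heavy libraries; a package that is found can still fail at import time if its own dependencies are broken.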
The course is organized into several "chapters": the theory is covered in slides, and Jupyter notebooks interleave code demos and exercises.
- Chapter1 : XGBoost models
- Chapter2 : Hyper-parameter tuning with hyperopt
- Chapter3 : Model generalization with nested cross-validation
- Chapter4 : Model interpretation with SHAP values or LIME
The repository contains the following folders:
- data : contains the datasets
- notebooks : contains the code demo and exercise notebooks
- slides : course theory slides (pptx / pdf)
Feel free to re-use and adapt this material for your own purposes.
We only ask that you cite us:
Mueller, M., Tran, T. V. D., & Duchemin, W. (2024, October 15). Ensuring More Accurate, Generalisable, and Interpretable Machine Learning Models for Bioinformatics - October 2024. Zenodo. https://doi.org/10.5281/zenodo.14196882