This repository can be used to run and compare different models for the classification and segmentation of Snow Micro Pen (SMP) profiles.
The SMP is a fast, high-resolution, portable snow hardness measurement device. The automatic classification and segmentation models can be used for the fast analysis of vast numbers of SMP profiles. For more information about the background of snow layer segmentation and grain type classification please refer to the related publicatoins. Throughout the project the SMP dataset collected during the MOSAiC expedition was used. The plots and results of the different models can be reproduced with this repository.
- SMP data: Available on PANGAEA
- Trained models: Available on Zenodo
- About the SMP: SLF Website
- About MOSAiC: MOSAiC Website
- Contact: [email protected]
- Bsc thesis: "Automatic Snow Classification − A Comparison of Machine Learning Algorithms for the Segmentation and Classification of Snow Micro Penetrometer Profiles" by Julia Kaltenborn
- “A Comparison of Machine Learning Algorithms for the Segmentation and Classification of Snow Micro Penetrometer Profiles on Arctic Sea Ice”, J. Kaltenborn, V. Clay, A. R. Macfarlane, J. M. King, M. Schneebeli, Data Science and Machine Learning for Cryosphere and Climate, EGU General Assembly, 2021. (Abstract)
- “ML for Snow Stratigraphy Classification”, J. Kaltenborn, V. Clay, A. R. Macfarlane and M. Schneebeli, Tackling Climate Change with AI workshop, NeurIPS, 2021. (Presentation & Paper)
data/
: Preprocessed SMP profiles as npz filesdata_handling/
: Scripts to preprocess and clean the raw SMP profilesmodels/
: Scripts to create, use and store modelsoutput/
: Output and results are stored here. Subdirevaluation/
contains plots for each model and profile.plots_data/
contains plots giving an overview over the data.plots_results/
contains plotted results.predictions/
is where predictions are stored.scores/
contains all the scores.tuning/
: Scripts to tune modelsvisualization/
: Scripts to create plots
This repository runs on Python 3.6. For a quick setup run pip install -e .
The required packages can also be installed with pip install -r requirements.txt
. If wished, create an environment beforehand (eg: conda create --name=snowdragon python=3.6
).
The repository does not contain the MOSAiC data used here. The data is publicly available here: https://doi.org/10.1594/PANGAEA.935934. Any other SMP dataset can be used as well with this repository to train models for automatic SMP classification and segmentation.
If you would like to use our pre-trained models instead of training the models yourself, you can download them here: https://doi.org/10.5281/zenodo.7063520.
To preprocess all SMP profiles, run:
python -m data_handling.data_loader [path_npz_file] --smp_src [path_raw_smp_data] --exp_loc [path_dir_smp_preprocessed]
[path_npz_file]
: Path and name of the npz file where the complete preprocessed SMP dataset is stored. For example:data/all_smp_profiles.npz
[path_raw_smp_data]
: Path to the directory where the raw SMP data is stored[path_dir_smp_preprocessed]
: Path to the directory where each single preprocessed SMP profile will be stored. For example:data/smp_profiles
To get information about an already preprocessed data set, run:
python -m data_handling.data_loader [path_npz_file] --load_only --test_print
For explanations of further preprocessing options run python -m data_handling.data_loader -h
.
For smooth default usage, set the SMP_LOC
in data_handling/data_parameters.py
to the path where all raw SMP data is stored.
Tuning can be skipped. The default hyperparameters of all models are set to the values which produced the best results for the MOSAiC SMP dataset.
To run tuning, run first model evaluation to create a split up (training, validation, testing) and normalized dataset. The results are saved e.g. in data/preprocessed_data_k5.txt
. To tune all models simply run the prepared bash script:
bash tuning/tune_models.sh [path_results_csv]
[path_results_csv]
could be e.g. tuning/tuning_results/tuning_run01_all.csv
.
To tune a single model run:
python -m tuning.tuning --model_type [wished_model] [path_results_csv]
See help options for more information.
After tuning, run python -m tuning.eval_tuning
to aggregate and sort the tuning results for each model. The results are stored in the folder tuning/tuning_results/tables
.
Model evaluation consists of preprocessing the complete dataset (in contrast to the single smp profiles as in the first step); evaluating each model; and if desired validating each model.
Preprocessing the complete data set (including data splits and preparing it for model usage) only needs to be done one time:
python -m models.run_models --preprocess
Afterwards one can just include a flag for evaluating or validating: (All results are stored for each model in the folder output/evaluation
.)
python -m models.run_models --evaluate --validate
Here is the full command, where the smp file and the preprocessed dataset file can be set:
python -m --smp_npz [path_npz_file] --preprocess_file [path_txt_file] --preprocess --validate --evaluate
[path_npz_file]
: Path to the npz file where the complete SMP dataset was stored. For example:data/all_smp_profiles_updated.npz
[path_txt_file]
: Path to the txt file where the SMP dataset is or will be stored and the different splits of the dataset can be accessed. For example:data/preprocessed_data_k5.txt
preprocess
: Preprocesses the[path_npz_file]
data and stores the split, model-ready data in[path_txt_file]
.evaluate
: Evaluates each model based on the dataset[path_txt_file]
. (Go intorun_models
, functionevaluate_all_models
to choose between different models and which evaluation information you want to have from them). Results are stored inoutput/evaluation/
validate
: Validates each model based on the dataset[path_txt_file]
. Results are stored inouput/tables/
.
The data, preprocessing and results are also visualized. The plots are stored in outcome
and can already be found there. There are three sets of plots that can be created: Visualizations of the original data, the normalized data and the results. Look into the code to see which plots are shown and comment out specific plots in run_visualization.py
if desired.
python -m visualization.run_visualization --original_data --normalized_data --results
.
├── data
│ └── smp_profiles
├── data_handling
├── models
│ └── stored_models
├── output
│ ├── evaluation
│ │ ├── baseline
│ │ ├── blstm
│ │ ├── bmm
│ │ ├── easy_ensemble
│ │ ├── enc_dec
│ │ ├── gmm
│ │ ├── kmeans
│ │ ├── knn
│ │ ├── label_spreading
│ │ ├── lstm
│ │ ├── rf
│ │ ├── rf_bal
│ │ ├── self_trainer
│ │ └── svm
│ ├── plots_data
│ │ ├── normalized
│ │ └── original
│ ├── plots_results
│ ├── predictions
│ └── scores
├── tuning
│ └── tuning_results
│ └── tables
└── visualization