Skip to content

This is a github repository for automatic snow grain classification and snow layer segmentation of Snow Micro Pen profiles

License

Notifications You must be signed in to change notification settings

liellnima/snowdragon

Repository files navigation

snowdragon

This repository can be used to run and compare different models for the classification and segmentation of Snow Micro Pen (SMP) profiles.

The SMP is a fast, high-resolution, portable snow hardness measurement device. The automatic classification and segmentation models can be used for the fast analysis of vast numbers of SMP profiles. For more information about the background of snow layer segmentation and grain type classification please refer to the related publicatoins. Throughout the project the SMP dataset collected during the MOSAiC expedition was used. The plots and results of the different models can be reproduced with this repository.

Related publications

Overview

  • data/: Preprocessed SMP profiles as npz files
  • data_handling/: Scripts to preprocess and clean the raw SMP profiles
  • models/: Scripts to create, use and store models
  • output/: Output and results are stored here. Subdir evaluation/ contains plots for each model and profile. plots_data/ contains plots giving an overview over the data. plots_results/ contains plotted results. predictions/ is where predictions are stored. scores/ contains all the scores.
  • tuning/: Scripts to tune models
  • visualization/: Scripts to create plots

Setup

This repository runs on Python 3.6. For a quick setup run pip install -e . The required packages can also be installed with pip install -r requirements.txt. If wished, create an environment beforehand (eg: conda create --name=snowdragon python=3.6).

The repository does not contain the MOSAiC data used here. The data is publicly available here: https://doi.org/10.1594/PANGAEA.935934. Any other SMP dataset can be used as well with this repository to train models for automatic SMP classification and segmentation.

If you would like to use our pre-trained models instead of training the models yourself, you can download them here: https://doi.org/10.5281/zenodo.7063520.

Usage

Data Preprocessing

To preprocess all SMP profiles, run:

python -m data_handling.data_loader [path_npz_file] --smp_src [path_raw_smp_data] --exp_loc [path_dir_smp_preprocessed]
  • [path_npz_file]: Path and name of the npz file where the complete preprocessed SMP dataset is stored. For example: data/all_smp_profiles.npz
  • [path_raw_smp_data]: Path to the directory where the raw SMP data is stored
  • [path_dir_smp_preprocessed]: Path to the directory where each single preprocessed SMP profile will be stored. For example: data/smp_profiles

To get information about an already preprocessed data set, run:

python -m data_handling.data_loader [path_npz_file] --load_only --test_print

For explanations of further preprocessing options run python -m data_handling.data_loader -h.

For smooth default usage, set the SMP_LOC in data_handling/data_parameters.py to the path where all raw SMP data is stored.

Tuning

Tuning can be skipped. The default hyperparameters of all models are set to the values which produced the best results for the MOSAiC SMP dataset.

To run tuning, run first model evaluation to create a split up (training, validation, testing) and normalized dataset. The results are saved e.g. in data/preprocessed_data_k5.txt. To tune all models simply run the prepared bash script:

bash tuning/tune_models.sh [path_results_csv]

[path_results_csv] could be e.g. tuning/tuning_results/tuning_run01_all.csv.

To tune a single model run:

python -m tuning.tuning --model_type [wished_model] [path_results_csv]

See help options for more information.

After tuning, run python -m tuning.eval_tuning to aggregate and sort the tuning results for each model. The results are stored in the folder tuning/tuning_results/tables.

Model Evaluation

Model evaluation consists of preprocessing the complete dataset (in contrast to the single smp profiles as in the first step); evaluating each model; and if desired validating each model.

Preprocessing the complete data set (including data splits and preparing it for model usage) only needs to be done one time:

python -m models.run_models --preprocess

Afterwards one can just include a flag for evaluating or validating: (All results are stored for each model in the folder output/evaluation.)

python -m models.run_models --evaluate --validate

Here is the full command, where the smp file and the preprocessed dataset file can be set:

python -m --smp_npz [path_npz_file] --preprocess_file [path_txt_file] --preprocess --validate --evaluate
  • [path_npz_file]: Path to the npz file where the complete SMP dataset was stored. For example: data/all_smp_profiles_updated.npz
  • [path_txt_file]: Path to the txt file where the SMP dataset is or will be stored and the different splits of the dataset can be accessed. For example: data/preprocessed_data_k5.txt
  • preprocess: Preprocesses the [path_npz_file] data and stores the split, model-ready data in [path_txt_file].
  • evaluate: Evaluates each model based on the dataset [path_txt_file]. (Go into run_models, function evaluate_all_models to choose between different models and which evaluation information you want to have from them). Results are stored in output/evaluation/
  • validate: Validates each model based on the dataset [path_txt_file]. Results are stored in ouput/tables/.

Visualization

The data, preprocessing and results are also visualized. The plots are stored in outcome and can already be found there. There are three sets of plots that can be created: Visualizations of the original data, the normalized data and the results. Look into the code to see which plots are shown and comment out specific plots in run_visualization.py if desired.

python -m visualization.run_visualization --original_data --normalized_data --results

Structure

.
├── data
│   └── smp_profiles
├── data_handling
├── models
│   └── stored_models
├── output
│   ├── evaluation
│   │   ├── baseline
│   │   ├── blstm
│   │   ├── bmm
│   │   ├── easy_ensemble
│   │   ├── enc_dec
│   │   ├── gmm
│   │   ├── kmeans
│   │   ├── knn
│   │   ├── label_spreading
│   │   ├── lstm
│   │   ├── rf
│   │   ├── rf_bal
│   │   ├── self_trainer
│   │   └── svm
│   ├── plots_data
│   │   ├── normalized
│   │   └── original
│   ├── plots_results
│   ├── predictions
│   └── scores
├── tuning
│   └── tuning_results
│       └── tables
└── visualization

About

This is a github repository for automatic snow grain classification and snow layer segmentation of Snow Micro Pen profiles

Resources

License

Stars

Watchers

Forks

Packages

No packages published