
Update Documentation
FranceBrescia committed Jan 7, 2024
1 parent b0c7f24 commit a9bd6f9
Showing 113 changed files with 91 additions and 56 deletions.
78 changes: 33 additions & 45 deletions README.md
@@ -82,13 +82,15 @@ Project Organization
│ ├── index.html          <- Frontend html
│ ├── logo.png            <- Web Page logo
│ ├── nginx.conf          <- Configuration file for nginx.
│ ├── script.js           <- Frontend script
│ └── README.md
│
├── models                <- Trained and serialized models, model predictions, or model summaries
│ ├── validation_a.pkl
│ ├── validation_b.pkl
│ ├── train_a.pkl
│ ├── train_b.pkl
│ └── README.md
├── notebooks <- Jupyter notebooks. Naming convention is a number (for ordering),
│ the creator's initials, and a short `-` delimited description, e.g.
@@ -98,52 +100,50 @@ Project Organization
│ │
│ ├── deploy_doc
│ │ └── README.md
│ ├── docker_doc
│ │ └── README.md
│ │
│ ├── dvc_mlflow_doc
│ │ └── README.md
│ ├── great_expectations_doc
│ │ ├── expectations
│ │ ├── static
│ │ └── index.html
│ │
│ ├── monitoring_doc
│ │ └── README.md
│ └── images_doc
│   └── README.md
├── reports               <- Generated analysis as HTML, PDF, LaTeX, etc.
│ ├── alibi_detect_logs   <- Logs generated after data drift analysis.
│ │ ├── model_category.txt
│ │ └── model_sexsism.txt
│ │
│ ├── locust              <- Logs generated after locust analysis.
│ │ ├── report_exceptions.csv
│ │ ├── report_stats_history.csv
│ │ ├── report_stats.csv
│ │ └── report_failures.csv
│ │
│ ├── output_codecarbon   <- Logs generated after code carbon analysis.
│ │ ├── output_train_a.csv
│ │ ├── output_train_a.csv.bak
│ │ ├── output_train_b.csv
│ │ └── output_train_b.csv.bak
│ ├── mlruns              <- Logs generated after mlflow runs.
│ └── figures             <- Generated graphics and figures to be used in reporting
├── src                   <- Source code for use in this project.
│ ├── __init__.py         <- Makes src a Python module
│ ├── README.md
│ ├── api                 <- Scripts to create the API using FastAPI
│ │ ├── corpus_endpoint.py
│ │ ├── prometheus_monitoring.py
│ │ ├── README.md
│ │ ├── server_api.py
│ │ └── dashboards
│ │   └── grafana.json
│ │
│ ├── data                <- Scripts to download or generate data
│ │ └── make_dataset.py
│ │
│ ├── features            <- Scripts to turn raw data into features for modeling
│ │ ├── drift_detection.py
│ │ ├── build_features.py
│ │ └── README.md
│ │
│ ├── models              <- Scripts to train models and then use trained models to make
│ │ │                        predictions
│ │ ├── test_a.py
@@ -152,50 +152,38 @@ Project Organization
│ │ ├── train_b.py
│ │ ├── validation_a.py
│ │ ├── validation_b.py
│ │ ├── mlruns
│ │ ├── output_codecarbon
│ │ │ ├── output_train_a.csv
│ │ │ ├── output_train_a.csv.bak
│ │ │ ├── output_train_b.csv
│ │ │ ├── output_train_b.csv.bak
│ │ │ └── README.md
│ │ │
│ │ ├── .codecarbon.config
│ │ └── MLflow
│ │   ├── mlflow_test_a.py
│ │   ├── mlflow_test_b.py
│ │   ├── mlflow_train_a.py
│ │   ├── mlflow_train_b.py
│ │   ├── mlflow_validation_a.py
│ │   └── mlflow_validation_b.py
│ │
│ └── visualization       <- Scripts to create exploratory and results oriented visualizations
│   └── visualize.py
│
├── tests                 <- Scripts to test using Pytest
│ ├── api_testing
│ │ └── test_api.py
│ │
│ ├── dataset_testing
│ │ ├── test_dataset_model_a.py
│ │ └── test_dataset_model_b.py
│ │
│ ├── model_training_testing
│ │ └── test_overfit.py
│ │
│ ├── preprocessing_testing
│ │ └── test_preprocessing.py
│ │
│ ├── behavioral_testing
│ │ ├── test_directional_model_a.py
│ │ ├── test_directional_model_b.py
│ │ ├── test_invariance_model_a.py
│ │ ├── test_invariance_model_b.py
│ │ ├── test_minimum_funcionality_model_a.py
│ │ └── test_minimum_funcionality_model_b.py
│ └── README.md
│
├── .dockerignore         <- Docker ignore file.
├── .dvcignore            <- Data Version Control ignore file.
├── .flake8               <- Flake8 ignore file.
├── .gitignore            <- Specifications of files to be ignored by Git.
├── docker-compose.yaml   <- Docker Compose configuration.
├── Dockerfile            <- Docker file for the backend.
47 changes: 37 additions & 10 deletions references/dvc_mlflow_doc/README.md
@@ -1,20 +1,25 @@
# DVC, MLflow, and DagsHub Integration for Machine Learning Projects

This project employs DVC (Data Version Control), MLflow, and DagsHub to manage and track the machine learning lifecycle. DVC is an open-source version control system tailored for data science and machine learning projects. MLflow is an open-source platform that handles the end-to-end machine learning lifecycle. DagsHub complements these tools by providing a platform for collaboration on data science projects.

## Overview

The integration of DVC, MLflow, and DagsHub provides a comprehensive solution for dataset management, versioning, experiment tracking, and model deployment. This synergy enhances the reproducibility, monitoring, and collaboration of machine learning projects.

## Features

- **Data Versioning with DVC**: Manages and version-controls large datasets and machine learning models, facilitating data sharing and collaboration.
- **Experiment Tracking with MLflow**: Records and compares experiments, parameters, and results, streamlining the model development process.
- **Model Deployment**: Leverages MLflow's model registry for consistent and organized deployment across various environments.
- **Collaboration with DagsHub**: Integrates with DVC and MLflow, offering a collaborative platform for team members to share, discuss, and track progress.
- **Reproducibility**: Ensures experiments are reproducible with version-controlled data and models.

## Installation

Before starting, ensure Python is installed. Then, install DVC, MLflow, and the necessary dependencies:

```bash
pip install dvc mlflow
```
@@ -30,14 +35,13 @@
1. **Initialize DVC**:
   ```bash
   dvc init
   git status
   git commit -m "Initialize DVC"
   ```

2. **Add Data to DVC**:
Track large datasets or models with DVC:
```bash
dvc add data/Raw/dataset.csv
git add data/.gitignore data/Raw/dataset.csv.dvc
git commit -m "Add dataset to DVC"
```

@@ -61,27 +65,50 @@

```python
mlflow.log_artifact("path/to/artifact")
```

## Integrating with DagsHub

DagsHub seamlessly integrates with DVC and MLflow, offering a platform for hosting and visualizing DVC-tracked datasets and MLflow experiments. Create a DagsHub repository to push and share your DVC and MLflow configurations and results: [DagsHub Repository](https://dagshub.com/se4ai2324-uniba/DetectionOfOnlineSexism).

1. **Set Up a DagsHub Repository**:
Create a repository on DagsHub and link it with your project.

2. **Push Changes to DagsHub**:
Commit and push your changes to the DagsHub repository to share your progress (a remote-configuration sketch follows below).
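
A possible way to wire the project to the DagsHub remotes is sketched below. The URL pattern follows DagsHub's usual convention and the credentials are placeholders; verify both against the repository's settings page.

```bash
# Point DVC at the DagsHub storage for this repository (placeholder credentials)
dvc remote add origin https://dagshub.com/se4ai2324-uniba/DetectionOfOnlineSexism.dvc
dvc remote modify origin --local auth basic
dvc remote modify origin --local user <your-dagshub-username>
dvc remote modify origin --local password <your-dagshub-token>

# Point MLflow at the DagsHub tracking server
export MLFLOW_TRACKING_URI=https://dagshub.com/se4ai2324-uniba/DetectionOfOnlineSexism.mlflow
export MLFLOW_TRACKING_USERNAME=<your-dagshub-username>
export MLFLOW_TRACKING_PASSWORD=<your-dagshub-token>
```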

## Combining DVC, MLflow, and DagsHub

Use DVC for data and model management, MLflow for experiment tracking, and DagsHub for collaboration:

```bash
dvc pull data/Raw/dataset.csv.dvc
python mlflow_experiment.py
git add .
git commit -m "Update experiment"
git push origin main
```
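
The `mlflow_experiment.py` script referenced above is not shown in this commit; the following is a minimal, hypothetical sketch of what such a tracking script could look like (model choice, column names, and paths are illustrative assumptions, not the project's actual code):

```python
# mlflow_experiment.py -- illustrative sketch only
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("sexism-detection")

# Data previously synchronized with `dvc pull`
df = pd.read_csv("data/Raw/dataset.csv")
X_train, X_val, y_train, y_val = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

with mlflow.start_run():
    vectorizer = TfidfVectorizer(max_features=5000)
    model = LogisticRegression(max_iter=1000)

    model.fit(vectorizer.fit_transform(X_train), y_train)
    preds = model.predict(vectorizer.transform(X_val))

    # Log what the run needs to be reproducible and comparable
    mlflow.log_param("max_features", 5000)
    mlflow.log_metric("f1_macro", f1_score(y_val, preds, average="macro"))
    mlflow.sklearn.log_model(model, "model")
```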

![image](../images_doc/PipelineA.png)

![image](../images_doc/PipelineB.png)

## Versioning Data and Models

DVC tracks changes in your data and models. Use `dvc push` and `dvc pull` commands to synchronize your large files with remote storage, ensuring consistency across environments.
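
As a quick illustration (paths and the remote are whatever the project has configured), a typical round trip looks like:

```bash
# After modifying the dataset locally
dvc add data/Raw/dataset.csv
git add data/Raw/dataset.csv.dvc
git commit -m "Update dataset"
dvc push        # upload the new data version to remote storage

# On another machine or in CI
git pull        # fetch the updated .dvc pointer file
dvc pull        # fetch the matching data version
```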

![image](../images_doc/RegisteredModels.png)

## Experiment Tracking

MLflow tracks each experiment's parameters, metrics, and output models, making it easy to compare different runs and select the best model for deployment.

![image](../images_doc/Mlflow.png)

## Model Deployment

Utilize MLflow's model registry for deploying models to various production environments, ensuring a smooth transition from experimentation to deployment.
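
As a rough, hypothetical illustration (the registered-model name and run ID are placeholders, not taken from this repository), registering a model and loading it elsewhere could look like:

```python
import mlflow
from mlflow.tracking import MlflowClient

# Register the model logged by a finished run (run ID is a placeholder)
result = mlflow.register_model("runs:/<run_id>/model", "SexismDetector")

# Promote the new version and load it from the registry for serving
client = MlflowClient()
client.transition_model_version_stage(
    name="SexismDetector", version=result.version, stage="Production"
)
model = mlflow.pyfunc.load_model("models:/SexismDetector/Production")
```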

## Collaboration and Sharing

DagsHub provides a platform for sharing experiments, data, and progress with team members, enhancing collaboration and transparency in the project.

## Best Practices

- Regularly commit changes in data and code to ensure reproducibility.
Binary file added references/images_doc/Mlflow.png
Binary file added references/images_doc/RegisteredModels.png
File renamed without changes.
22 changes: 21 additions & 1 deletion tests/README.md
@@ -14,7 +14,27 @@ The tests conducted are categorized as follows:

- **Preprocessing Testing**: These tests are aimed at the preprocessing steps of our data pipeline. We validate the methods used for cleaning, normalizing, and transforming data to ensure they are correctly implemented and contribute positively to the performance of our models.

## Tools

In our project, we place a strong emphasis on the reliability and quality of our software and data. To achieve this, we utilize two key tools: `Pytest` and `Great Expectations`. These tools form the backbone of our testing and validation framework, ensuring that our project meets high standards of functionality, dependability, and efficiency.

### Pytest

`Pytest` is a powerful and flexible testing framework for Python, used for everything from simple unit tests to complex functional tests; a small illustrative example follows the list below. It offers features such as:

* A simple syntax for writing tests.
* The ability to run tests in parallel (via plugins such as `pytest-xdist`), significantly improving test execution time.
* Extensive support for fixtures, allowing for reusable test configurations.
* Easy integration with other tools and services for enhanced testing capabilities.
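
As a small, purely illustrative sketch (not one of the project's actual test modules), a pytest test with a fixture and parametrization looks like this:

```python
import pytest


@pytest.fixture
def sample_texts():
    """Reusable test data provided through a pytest fixture."""
    return ["You are great", "Some offensive remark"]


@pytest.mark.parametrize("threshold", [0.3, 0.5, 0.7])
def test_threshold_is_a_valid_probability(threshold):
    # The same test body runs once for each parameter value
    assert 0.0 <= threshold <= 1.0


def test_fixture_provides_two_examples(sample_texts):
    assert len(sample_texts) == 2
```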

### Great Expectations

`Great Expectations` is an advanced tool that plays a crucial role in validating, documenting, and profiling our data quality; a minimal example follows the list below. It helps us by:

* Validating data against a predefined set of rules and criteria, ensuring that it meets the quality standards required for accurate analysis and modeling.
* Creating clear and understandable documentation of our data.
* Profiling data to provide insights into its characteristics, distribution, and structure.
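
As a minimal sketch, assuming the classic pandas-based API of older Great Expectations releases (newer "GX" versions use a different API) and illustrative column names:

```python
import great_expectations as ge
import pandas as pd

df = pd.read_csv("data/Raw/dataset.csv")   # path is illustrative
gdf = ge.from_pandas(df)

# Simple data-quality checks; column names and label values are assumptions
not_null = gdf.expect_column_values_to_not_be_null("text")
labels_ok = gdf.expect_column_values_to_be_in_set("label", ["sexist", "not sexist"])

assert not_null.success and labels_ok.success
```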


## Behavioral tests
### Directional Test
