Alix, Paramveer, Susannah, Zoe
This project aims to analyze and predict the quality of wine based on various physicochemical properties. Using the UCI Wine Quality dataset, we conduct data preprocessing, exploratory data analysis, and build machine learning models to predict wine quality. The dataset includes multiple features, such as acidity, alcohol content, and sugar levels, which are critical in determining the quality score of wines. The project utilizes cross-validation and hyperparameter tuning to optimize model performance.
Dataset: The dataset was sourced from the UCI Machine Learning Repository.
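As a minimal sketch of how the data could be loaded, assuming the `ucimlrepo` package listed under the dependencies is used and that the Wine Quality dataset has ID 186 in the UCI repository. The helper name `load_wine_quality` is hypothetical and is not taken from the project's code; calling it requires network access.

```python
WINE_QUALITY_UCI_ID = 186  # assumed UCI repository ID for the Wine Quality dataset


def load_wine_quality(dataset_id=WINE_QUALITY_UCI_ID):
    """Fetch the dataset and return (features, targets) as pandas DataFrames.

    Hypothetical helper; requires network access and the ucimlrepo package.
    """
    # Imported lazily so this module can be loaded without ucimlrepo installed.
    from ucimlrepo import fetch_ucirepo

    dataset = fetch_ucirepo(id=dataset_id)
    return dataset.data.features, dataset.data.targets


# Example usage (downloads the data):
#   X, y = load_wine_quality()
#   print(X.shape, y.shape)
```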
Preprocessing: Standardization of numerical features. One-hot encoding for binary categorical features (e.g., color).
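The two preprocessing steps above might be combined roughly as follows. This is a sketch using scikit-learn's `ColumnTransformer`; the column names and the tiny sample values are made up for illustration and are not taken from the notebook.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample; values are illustrative, not taken from the dataset.
wines = pd.DataFrame(
    {
        "alcohol": [9.4, 10.5, 12.8, 11.0],
        "residual_sugar": [1.9, 6.3, 1.2, 8.1],
        "color": ["red", "white", "red", "white"],
    }
)

numeric_features = ["alcohol", "residual_sugar"]  # assumed column names
binary_features = ["color"]

preprocessor = ColumnTransformer(
    transformers=[
        # Rescale each numeric column to zero mean and unit variance.
        ("numeric", StandardScaler(), numeric_features),
        # drop="if_binary" encodes a two-level category as a single 0/1 column.
        ("binary", OneHotEncoder(drop="if_binary"), binary_features),
    ]
)

transformed = preprocessor.fit_transform(wines)
print(transformed.shape)
```

Using `drop="if_binary"` keeps a single indicator column for `color` rather than two redundant ones, which avoids perfectly collinear inputs for a linear model.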
Exploratory Data Analysis: Distribution of wine quality scores. Correlation heatmaps to identify relationships between features. Key insights on influential features.
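The quantities behind these plots can be sketched with pandas alone. The sample below is made up for illustration, and the column names are assumptions; the correlation matrix computed here is the kind of table a heatmap would display.

```python
import pandas as pd

# Small made-up sample standing in for the real data.
wines = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, 10.5, 11.2, 12.0, 12.8],
        "volatile_acidity": [0.70, 0.66, 0.52, 0.40, 0.36, 0.30],
        "residual_sugar": [1.9, 2.6, 2.3, 1.8, 2.0, 2.4],
        "quality": [5, 5, 6, 6, 7, 7],
    }
)

# Distribution of quality scores: how many wines received each score.
quality_counts = wines["quality"].value_counts().sort_index()

# Pairwise Pearson correlations between all columns.
correlations = wines.corr()

# Rank features by the strength of their linear association with quality.
feature_ranking = (
    correlations["quality"].drop("quality").abs().sort_values(ascending=False)
)

print(quality_counts)
print(feature_ranking)
```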
Modeling: Logistic regression was used as the base model. RandomizedSearchCV was applied for hyperparameter optimization. The model was evaluated using metrics such as accuracy, precision, recall, and F1-score.
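A minimal sketch of this modeling workflow is shown below. It uses synthetic data in place of the wine dataset, and the search space, number of iterations, fold count, and weighted averaging of the metrics are all assumptions rather than the settings used in the notebook.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data: 11 numeric features, 3 classes.
X, y = make_classification(
    n_samples=300, n_features=11, n_informative=6, n_classes=3, random_state=522
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=522
)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Sample the inverse regularization strength C on a log scale (assumed range).
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=522,
)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out test split.
y_pred = search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0
)
print(search.best_params_)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Keeping the scaler inside the pipeline means cross-validation never sees statistics computed from the validation fold, which is why the search is run over the pipeline rather than over the bare classifier.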
If you are using Windows or Mac, please ensure that Docker Desktop is running. You can check whether Docker is installed by running the following command in a bash terminal:
docker --version
- Clone this GitHub repository.
- Make sure docker-compose.yml is using the image with the tag you wish to run. No changes are necessary if you do not need a specific image tag.
- Run the following command in a terminal in the root of the local repository to use the Docker image to run the analysis:
  docker compose up
  This command will automatically start a Jupyter Lab session using the image listed in the docker-compose.yml file and mount the current project in the Docker container.
- In the terminal, look for the Jupyter Lab link that starts with http://127.0.0.1:8888/. Copy and paste the URL into your browser to open Jupyter Lab.
- To run the analysis, open the notebook notebooks/wine_quality.ipynb in this new Jupyter Lab session and, in the "Kernel" menu, click "Restart Kernel and Run All".
If the above instructions fail, you can install the packages locally by creating a conda environment from the provided environment.yml file.
- In the root directory of the project, run the following command in a bash terminal to create a conda environment named 522:
  conda env create -f environment.yml
- Activate the conda environment using the following command:
  conda activate 522
- Run Jupyter Lab using the following command:
  jupyter lab
- Once the Jupyter Lab session starts, navigate to the appropriate URL (http://127.0.0.1:8888/), open the notebook notebooks/wine_quality.ipynb, select the 522 kernel, and run all the cells.
Press Ctrl + C in the terminal to end the Jupyter Lab session. After the session ends, run the following command to free up the resources used by Docker:
docker compose rm
- conda (version 24.9.1 or higher)
- conda-lock (version 2.5.7 or higher)
- Python package ucimlrepo (version 0.0.7)
- jupyterlab (version 4.2.0 or higher)
- nb_conda_kernels (version 2.5.1 or higher)
- Python and packages listed in environment.yml