Authors: Hrayr Muradyan, Azin Piran, Sopuruchi Chisom, Shengjia Yu.
In this project, we try to predict US airline customer satisfaction from factors such as gender, age, and travel class. Understanding customer satisfaction matters to airlines because it shows where to improve service and equipment. The right improvements can, in turn, translate into increased revenue.
Additionally, it will be easier to build customer loyalty. This is essential as loyal customers often promote the airline through word-of-mouth or positive reviews, reducing the cost of acquiring new customers (Sadegh Eshaghi, 2024).
Thus, there are many reasons to conduct this analysis.
The main question is: can we accurately predict the customer satisfaction from the information we have?
The dataset we use to answer this question was sourced from Kaggle, where it was posted by @teejmahal20 (TJ Klein). Note that the dataset was originally posted by @johndddddd and was then modified and cleaned by @teejmahal20. The full dataset can be found here. As mentioned in the original source, the dataset contains only US airline data.
The final report can be found here.
This project requires the following packages:
- ipykernel: Used for interactive computing in Jupyter notebooks.
- matplotlib: A library for creating static, animated, and interactive visualizations in Python.
- numpy: A package for numerical computing and handling arrays.
- pandas: A powerful data manipulation and analysis library.
- python: The programming language required to run the project.
- scikit-learn: A library for machine learning algorithms and data mining.
- seaborn: A Python visualization library based on matplotlib that provides a high-level interface for drawing attractive statistical graphics.
- conda-lock: A tool for generating deterministic, reproducible conda environments.
- jupyterlab: An Interactive Development Environment to write, debug, and test code.
For the current versions of the dependencies, see the environment file.
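As a quick sanity check after installation, the dependency list above can be compared against what is actually installed. The snippet below is a minimal sketch, not part of the project: the helper name `installed_versions` is hypothetical, and it assumes the packages are registered under the same distribution names as in the list. It uses only the standard library.

```python
from importlib import metadata

# Distribution names copied from the dependency list above. "python" is the
# interpreter itself rather than an installable distribution, so it is omitted.
PACKAGES = [
    "ipykernel", "matplotlib", "numpy", "pandas",
    "scikit-learn", "seaborn", "conda-lock", "jupyterlab",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it inside the activated environment prints one line per dependency, which can then be compared by eye with the versions in the environment file.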
To ensure a reproducible environment with exact dependency versions, you can use the conda-lock file. Follow these steps to set up the environment using the lock file:
First, make sure that conda-lock is installed on your system. If you don't have it installed, you can install it via Conda:
conda install conda-lock
After conda-lock is installed, use the conda-lock file to install the environment. Run the following command in the directory containing the 'conda-lock.yml' file:
conda-lock install
Once the dependencies are installed, create the environment using:
conda env create --file conda-lock.yml
Then, activate the environment:
conda activate <your-environment-name>
The steps below outline how to set up and run the analysis. Currently, the analysis requires a Docker-based computational environment, which is initialized first.
Note: The instructions in this section assume the commands are executed in a Unix-based shell.
- Install Docker: Install Docker and ensure that the Docker engine is running.
  - To confirm that the Docker engine is running, open a terminal/command line and execute the following command:
    docker run hello-world
  - The generated output should begin with the line Hello from Docker!
- Clone this Repository:
  - Next, clone this repository to your local machine.
    git clone <repo_url>
- Start the Docker container locally:
  - In the terminal/command line, navigate to the root directory of your local copy of this project.
    cd <repo_directory>
  - Launch the Docker container for the computational environment.
    docker-compose up
  - The terminal logs should display output similar to: Jupyter Server 2.14.2 is running at:
  - Locate and click on the http address in the logs to access the Jupyter application from your web browser.
  - If prompted for a token, find it in the logs and enter it.
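The Docker check from the first step can also be scripted. The following is a minimal sketch, assuming the `docker` CLI is on the PATH; the function names `looks_healthy` and `docker_is_running` are hypothetical and not part of this repository. It runs the same `docker run hello-world` command and looks for the greeting line quoted above.

```python
import shutil
import subprocess

# Greeting line that the setup steps say the output should contain.
EXPECTED_GREETING = "Hello from Docker!"


def looks_healthy(output: str) -> bool:
    """Return True if any line of the output starts with the Docker greeting."""
    return any(
        line.strip().startswith(EXPECTED_GREETING)
        for line in output.splitlines()
    )


def docker_is_running(timeout: int = 120) -> bool:
    """Run `docker run hello-world` and report whether it printed the greeting.

    Hypothetical helper: returns False if the CLI is missing, the command
    fails or times out, or the greeting is not found.
    """
    if shutil.which("docker") is None:
        return False  # docker CLI not found on PATH
    try:
        result = subprocess.run(
            ["docker", "run", "hello-world"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0 and looks_healthy(result.stdout)
```

Calling `docker_is_running()` before `docker-compose up` gives an early signal that the engine is not started, rather than a failure partway through the container launch.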
The Jupyter environment allows for interactive execution of the analysis.
- In the Jupyter notebook interface, open the file airline_passenger_satisfaction_predictor.ipynb from the notebook folder.
- Click "Run All" to execute the entire analysis.
The results of the analysis will be displayed within the notebook as it runs each cell.
The code in this repository is licensed under the MIT license. Refer to the LICENSE file for more details.
Klein, TJ. (2020, February). "Airline Passenger Satisfaction". Retrieved November 20, 2024, from Kaggle Dataset.
M. Sadegh Eshaghi, Mona Afshardoost, Gui Lohmann, Brent D Moyle, Drivers and outcomes of airline passenger satisfaction: A Meta-analysis, Journal of the Air Transport Research Society, Volume 3, 2024, 100034, ISSN 2941-198X, https://doi.org/10.1016/j.jatrs.2024.100034.