In the interest of exploring the potential for purchasing a condo in NYC, I dug into the Inside Airbnb listings & reviews datasets for NYC covering the last 4 years (2015-2019) to better understand:
- Overall Market (trends, average prices, segments)
- Potential Neighborhoods (which neighborhoods are best?)
- Configuration (how many bedrooms & bathrooms?)
- Rating Categories (which ones matter, how to do well)
This repo contains all the details of the CRISP-DM analysis performed to generate the key insights.
To run the Jupyter notebooks, you only need the Anaconda distribution of Python (Python 3).
Notebooks
There are 5 notebooks that comprise this analysis:
- Airbnb NYC Data Exploration.ipynb: The bulk of the analysis.
- Calendar - Booking Rate.ipynb: Data mining to generate the average monthly booking rate for each listing in 2018-2019.
- Alternative Models.ipynb: Tests of alternative modelling options (Lasso, Ridge, Decision Trees).
- Listings Time Series.ipynb: Time series plot of # of listings (NYC & by region).
- Reviews Time Series.ipynb: Time series plot of # of reviews (NYC & by region).
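The booking-rate step in Calendar - Booking Rate.ipynb can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the notebook's actual code. It assumes each calendar row is `(listing_id, date, available)` with `available` set to `'t'` or `'f'`, as in Inside Airbnb's calendar.csv, and it treats every unavailable day as booked. That is only a proxy, since hosts can also block dates. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_booking_rates(calendar_rows):
    """Return {(listing_id, 'YYYY-MM'): booking rate} from calendar rows.

    Assumption: each row is (listing_id, 'YYYY-MM-DD', available) with
    available in {'t', 'f'}; an unavailable day ('f') counts as booked.
    """
    booked = defaultdict(int)
    total = defaultdict(int)
    for listing_id, day, available in calendar_rows:
        month = date.fromisoformat(day).strftime("%Y-%m")
        key = (listing_id, month)
        total[key] += 1
        if available == "f":
            booked[key] += 1
    # Share of days in each listing-month that were not available.
    return {key: booked[key] / total[key] for key in total}


def average_booking_rate(monthly_rates):
    """Average the monthly rates of each listing into a single number."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (listing_id, _month), rate in monthly_rates.items():
        sums[listing_id] += rate
        counts[listing_id] += 1
    return {lid: sums[lid] / counts[lid] for lid in sums}


rows = [
    (1, "2018-01-01", "f"),
    (1, "2018-01-02", "t"),
    (1, "2018-02-01", "f"),
    (2, "2018-01-01", "t"),
]
rates = monthly_booking_rates(rows)
print(average_booking_rate(rates))  # {1: 0.75, 2: 0.0}
```

Averaging the monthly rates, rather than pooling all days, gives each month equal weight even when a listing's calendar covers some months only partially.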
Output
In addition to the above notebooks, there are also a few key data files:
- mBookRates.csv: Monthly booking rates by listing, generated by Calendar - Booking Rate.ipynb
- model_X.csv: Feature (predictor) variables for training & testing.
- model_y.csv: Target (response) variable for training & testing.
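The model comparison in Alternative Models.ipynb could be approximated along these lines. This is a hedged sketch using scikit-learn, not the notebook's code. The plain `LinearRegression` baseline, the hyperparameters (`alpha`, `max_depth`), the 5-fold cross-validation and the R² metric are all illustrative assumptions. `compare_models` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, cv=5):
    """Return the mean cross-validated R^2 for each candidate model.

    Hyperparameters below are illustrative assumptions, not tuned values.
    """
    models = {
        "linear": LinearRegression(),  # assumed baseline
        "lasso": Lasso(alpha=0.1),
        "ridge": Ridge(alpha=1.0),
        "tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data; with the repo files you might instead load
# X from model_X.csv and y from model_y.csv (e.g. with pandas.read_csv).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=200)
for name, score in compare_models(X, y).items():
    print(f"{name}: {score:.3f}")
```

Cross-validation scores every model on the same folds, so the regularized linear models and the tree are compared on equal footing rather than on a single lucky train/test split.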
A blog post that details the results & findings of this code can be found here.
Kudos to Inside Airbnb for the incredible data! Other than crediting Inside Airbnb for use of their NYC datasets, feel free to use the code as you like!