Predicting Bike Share Demand: Learning by Eking Out Maximum Performance

The objective of this competition is to accurately predict demand for bikes in a bike sharing business using only historical data and a few other variables like the weather, season and working/holiday. This is a comprehensive description on obtaining a top 20 entry on the leaderboard for the bike share competition on Kaggle. Note that there is no practical difference between the top 100 entries or so. My motivation in trying to place as high as I did was to learn as many techniques as I could.

Detailed Walkthroughs (Jupyter Notebooks)

Detailed Walkthrough 1: Exploratory Data Analysis

Detailed Walkthrough 2: Model Exploration

Detailed Walkthrough 3: Hyperparameter Optimization

Detailed Walkthrough 4: Autocorrelative Models

Detailed Walkthrough 5: Ensemble Models

Road to Top 20

Getting to top 20 on this competition took a lot of work. I had to engineer new features, learn new modeling approaches, explore new hyperparameter optimization methods and apply ensemble learning techniques. Here is a partial summary of the things that I tried:

Feature engineering the date and time variables
Tried a huge array of ML models including random forests, extremely randomized trees, generalized linear models, elastic nets, generalized additive models and autoregressive models
Applied bayesian hyperparameter optimization techniques to eke out every last ounce (micro-ounce?) of performance
Used ensemble techniques to reduce variance and improve predictions

Dataset

The dataset consisted of the following variable:

datetime - hourly date + timestamp
season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
holiday - whether the day is considered a holiday
workingday - whether the day is neither a weekend nor holiday
weather -
- 1: Clear, Few clouds, Partly cloudy, Partly cloudy
- 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
- 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
- 4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog
temp - temperature in Celsius
atemp - "feels like" temperature in Celsius
humidity - relative humidity
windspeed - wind speed
casual - number of non-registered user rentals initiated
registered - number of registered user rentals initiated
count - number of total rentals

Metric

The metric is root mean square logarithmic error. I've reproduced the metric from the competition for reference.

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
.ipynb_checkpoints		.ipynb_checkpoints
data		data
eda		eda
hyperopt		hyperopt
images		images
otherModels		otherModels
reports		reports
savedModels		savedModels
.gitignore		.gitignore
DetailedWalkThrough3.ipynb		DetailedWalkThrough3.ipynb
DetailedWalkthrough.ipynb		DetailedWalkthrough.ipynb
DetailedWalkthrough2.ipynb		DetailedWalkthrough2.ipynb
DetailedWalkthrough4.ipynb		DetailedWalkthrough4.ipynb
DetailedWalkthrough5.ipynb		DetailedWalkthrough5.ipynb
README.md		README.md
bikeShareResults.ods		bikeShareResults.ods
ensemble.R		ensemble.R
kaggleBikeShare.Rproj		kaggleBikeShare.Rproj
modelPerformance		modelPerformance
support.R		support.R
xgbFit.R		xgbFit.R
xgbSeparate.R		xgbSeparate.R
xgbWithPrevPreds.R		xgbWithPrevPreds.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Predicting Bike Share Demand: Learning by Eking Out Maximum Performance

Detailed Walkthroughs (Jupyter Notebooks)

Road to Top 20

Dataset

Metric

About

Releases

Packages

Languages

ganesh-krishnan/bikeshareDemand

Folders and files

Latest commit

History

Repository files navigation

Predicting Bike Share Demand: Learning by Eking Out Maximum Performance

Detailed Walkthroughs (Jupyter Notebooks)

Road to Top 20

Dataset

Metric

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages