This is the final project for UC Berkeley's Stat 154: Modern Statistical Prediction and Machine Learning.
You can access our project write-up detailing our data exploration and modeling process here. The models produced in this project have been submitted to the class Kaggle competition hosted here.
This project predicts the severity of traffic accidents in the United States using real-time traffic, location, and weather data from around 3 million accidents spanning 49 states. Our objective is to build a binary classifier that detects severe accidents.
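As a rough illustration of how a binary severity classifier can be scored, here is a minimal sketch that computes precision, recall, and F1 for the "severe" class. The label encoding (`1` = severe, `0` = not severe), the helper name `binary_metrics`, and the toy labels are assumptions made for this example. The metric actually used in the project and the Kaggle competition is described in the write-up, not here.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, and F1 for the positive (severe) class."""
    counts = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in counts.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in counts.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in counts.items() if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels only (made up for illustration, not project data).
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting per-class metrics alongside accuracy can be useful when one class is much rarer than the other, since a model that never predicts the rare class may still look accurate.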
- Data Visualization
- Feature Engineering
- Machine Learning
- Sampling Methods
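To give a feel for what a sampling method can look like, the sketch below shows random undersampling: rows from larger classes are dropped at random until every class matches the smallest one. This is one common technique chosen for illustration; it is not necessarily the sampling scheme used in this project, and the function name `undersample` and the toy data are hypothetical.

```python
import random


def undersample(rows, labels, seed=0):
    """Randomly drop rows so every class is as small as the smallest class."""
    rng = random.Random(seed)
    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    # Draw n_min rows without replacement from each class.
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))
    rng.shuffle(sampled)
    new_rows = [row for row, _ in sampled]
    new_labels = [label for _, label in sampled]
    return new_rows, new_labels


# Toy data: 8 non-severe rows (label 0) and 2 severe rows (label 1).
rows = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = undersample(rows, labels)
print(balanced_rows, balanced_labels)
```

Resampling of this kind is normally applied to the training split only, so that validation and test data keep their original class proportions.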
- R
  - Packages: `tidyverse`, `dplyr`, `ggplot2`, `caret`, `glmnet`, `lubridate`, `e1071`, `MASS`
- Python
  - Packages: `pandas`, `numpy`, `scikit-learn`, `seaborn`, `matplotlib`, `pytorch`
- Jupyter Notebook
- Clone this repo (for help see this tutorial).
- The datasets used and created during this project can be accessed here. Place the training, validation, and test sets in the general `~/accidents` directory.
- Data processing/transformation scripts are kept here.
- The models are kept here. To run a model, navigate to the `~/accidents/models` directory and run the model of choice.
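
Before running a model, it can help to confirm that the three data splits are where the scripts expect them. The following is a minimal sketch of such a check. The file names `train.csv`, `validation.csv`, and `test.csv` are placeholders assumed for this example; substitute the actual names of the downloaded datasets.

```python
from pathlib import Path


def missing_splits(data_dir, filenames=("train.csv", "validation.csv", "test.csv")):
    """Return the expected split files that are not present in data_dir.

    The default file names are hypothetical placeholders.
    """
    root = Path(data_dir).expanduser()  # expands "~" to the home directory
    return [name for name in filenames if not (root / name).is_file()]


missing = missing_splits("~/accidents")
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("All data splits found.")
```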
- Devan Jaganath
- Ritvik Iyer
- Ryan Chien