NFL Big Data Bowl 2024 Pipeline: Predicting Forced Fumbles

Project Overview

The NFL Big Data Bowl is an annual competition where participants analyze NFL data and create innovative solutions to various football-related problems. In this project, my team and I trained a random forest classification model to predict whether a tackle will result in a forced fumble, based on field position, charge characteristics (speed, acceleration, etc.), player stature (height, weight, etc.), and more. The model achieved an accuracy of 98.61% with a low rate of false positives and false negatives.

The data manipulation and analysis pipeline is built with Dagster. Package management is handled by Rye.

Repository Contents

setup_venv.sh: This Bash script downloads and sets up Rye.

src/assets.py: This Python script contains all the assets used in the Dagster pipeline, including those for data retrieval, data manipulation, and model training.

test/test_assets.py: This Python script contains a few tests our team wrote to verify that the pipeline works as expected.

model_results/random_forest_results.json: This JSON file contains the model accuracy and confusion matrix.
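To make the contents of the results file more concrete, here is a minimal sketch of how an accuracy value and a binary confusion matrix can be computed from labels and serialized to JSON. The function names, the toy labels, and the JSON keys ("accuracy", "confusion_matrix") are assumptions for illustration; the actual schema of random_forest_results.json and the code that produces it may differ.

```python
import json


def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = no fumble, 1 = forced fumble)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of predictions on the diagonal of the confusion matrix."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for illustration only; these are not project data.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)

# Hypothetical layout for a results file; the real keys may differ.
results = {"accuracy": accuracy(cm), "confusion_matrix": cm}
results_json = json.dumps(results, indent=2)
print(results_json)
```

When one class is much rarer than the other, accuracy alone can look high even if positives are missed, which is one reason to store the confusion matrix alongside it.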

Getting Started

Set up the virtual environment

In a directory of your choice, run git clone https://github.com/pkirti33/nfl_bigdata_pipeline.git
Open a VS Code window with this project and open a new terminal.
Run bash setup_venv.sh.
Run source .venv/bin/activate

Set up the Kaggle API

Follow the Installation and Authentication instructions in the Kaggle API documentation.
Within the nfl_bigdata_pipeline directory, create a new file called .env containing:

KAGGLE_USERNAME = "your kaggle username"
KAGGLE_KEY = "your kaggle API key"
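How the pipeline reads this file is not shown in this README; it may rely on a package such as python-dotenv. As a rough, standard-library-only sketch of the idea, the hypothetical helper load_env_file below parses KEY = "value" lines in the format above and exposes them as environment variables. The helper name and the placeholder credentials are assumptions for illustration.

```python
import os
import tempfile


def load_env_file(path):
    """Parse KEY = "value" lines from a .env-style file into a dict."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes.
            values[key.strip()] = value.strip().strip("\"'")
    return values


# Demonstrate on a throwaway file so no real credentials are involved.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write('KAGGLE_USERNAME = "example_user"\n')
        handle.write('KAGGLE_KEY = "example_key"\n')
    credentials = load_env_file(env_path)

# Export the values without overwriting variables that are already set.
for name, value in credentials.items():
    os.environ.setdefault(name, value)

print(sorted(credentials))
```

In a real checkout, the call would be load_env_file(".env") from the nfl_bigdata_pipeline directory.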

Launch the Dagster tool

In the terminal window, run dagster dev

Authors:

Pranav Kirti, Will Pagliaro, Ryan Lo, Joseph Lee @ Washington University in St. Louis

