Author: David Pinto
2020-10-21
This project implements a recommender system for similar movies based on content and collaborative filtering embedding features.
Create a conda environment and install all required packages listed in the env_requirements.txt file.
# Create environment
conda create -n movie-similarity -y python=3.7
# Activate environment
conda activate movie-similarity
# Append conda-forge to the list of channels
conda config --append channels conda-forge
# Install dependencies
conda install -y --file env_requirements.txt
# Add environment to Jupyter
python -m ipykernel install --user --name=movie-similarity
- numpy and pandas for data cleaning, manipulation and transformation.
- scipy for sparse matrices and correlation measures.
- unidecode and nltk for text manipulation.
- scikit-learn for data normalization and text vectorization.
- vaex for manipulation of large DataFrames.
- matplotlib and plotnine for data visualization.
- lightfm for collaborative filtering with matrix factorization.
- faiss for fast Approximate Nearest Neighbors algorithms.
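After installing, it can help to confirm that the environment actually resolves every dependency before opening the notebooks. The snippet below is a minimal sketch and not part of the project: the function name `missing_modules` and the `REQUIRED_MODULES` list are assumptions made for illustration, and the list uses import names (for example, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names for the dependencies listed above (assumed, not taken
# from env_requirements.txt). scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "pandas", "scipy", "unidecode", "nltk", "sklearn",
    "vaex", "matplotlib", "plotnine", "lightfm", "faiss",
]


def missing_modules(module_names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it with the movie-similarity environment activated; any module it reports as missing can then be installed individually.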
Take a look at the data/raw folder to get instructions on how to download the dataset.
The project is organized as a series of Jupyter notebooks. Each notebook is self-contained and well documented:
- 1. Data Preparation.
- 2. Exploratory Analysis.
- 3. User Based Similarity.
- 4. Content Based Embedding.
- 5. Collaborative Filtering Embedding.
- 6. Similarity Match with ANN.
- 7. Performance Evaluation.
- 8. Hybrid Approach.
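To give a rough idea of what matching similar movies on embedding features involves, here is a minimal sketch of an exact (brute-force) cosine-similarity search. It is not the code from the notebooks: the function names and the toy embeddings are hypothetical, and the project uses faiss for Approximate Nearest Neighbors, which replaces this exhaustive scan once the catalog is large.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_u * norm_v)


def top_k_similar(embeddings, query_title, k=2):
    """Return the k titles most similar to query_title, best first."""
    query = embeddings[query_title]
    scored = [
        (title, cosine_similarity(query, vector))
        for title, vector in embeddings.items()
        if title != query_title  # never recommend the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings, for illustration only.
toy_embeddings = {
    "Movie A": [1.0, 0.0, 0.0],
    "Movie B": [0.9, 0.1, 0.0],
    "Movie C": [0.5, 0.5, 0.0],
    "Movie D": [0.0, 0.0, 1.0],
}

neighbors = top_k_similar(toy_embeddings, "Movie A", k=2)
for title, score in neighbors:
    print(f"{title}: {score:.3f}")
```

The same search works whether the vectors come from content features or from collaborative filtering; only the source of the embeddings changes.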
You can play with the movie embedding features using the Embedding Projector here. It can take a few seconds to start, but it will be worth it!
Take a look at the projector folder to see some results.
The project provides a Streamlit application to play with the movie recommender.
To run it locally:
make docker-build
make docker-run
Congratulations! You have it running on 127.0.0.1:8501.
Choose a recommendation algorithm and a movie title to get recommendations of similar movies. I hope you enjoy it!