This repository contains a collection of scripts that help convert some of the data from the MovieLens 100k Dataset into JSON files that are easier to handle.
Each script has a --help (alt. -h) option that explains how to use it. The scripts are meant to be run in the following order:
1. process_ml100k.py - Generates the initial JSON file.
2. correct_title.py - Separates the release year from each movie's title, putting it in a separate movie_year field.
3. get_imdb.py - Enriches each movie entry with some IMDb data; requires an internet connection and may take a while.
4. prune.py - Removes movies for which the IMDb data couldn't be retrieved.
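
As a rough illustration of what the title-correction and pruning steps do to a movie entry, here is a minimal sketch. It is not the code from correct_title.py or prune.py. It assumes that titles end with the release year in parentheses, as in "Toy Story (1995)", and that entries are plain dictionaries. The field names movie_title and imdb are hypothetical; only movie_year is named in the list above.

```python
import re

# Matches a trailing "(YYYY)" such as the one in "Toy Story (1995)".
_YEAR_SUFFIX = re.compile(r"\s*\((\d{4})\)\s*$")


def split_title_year(movie):
    """Return a copy of `movie` with the release year moved out of the title.

    `movie_title` is an assumed field name; `movie_year` is the field
    mentioned in the list above.
    """
    result = dict(movie)
    match = _YEAR_SUFFIX.search(movie["movie_title"])
    if match:
        # Keep everything before the "(YYYY)" suffix as the clean title.
        result["movie_title"] = movie["movie_title"][:match.start()]
        result["movie_year"] = int(match.group(1))
    else:
        # No recognizable year in the title; leave the title untouched.
        result["movie_year"] = None
    return result


def prune_missing_imdb(movies):
    """Keep only entries that carry IMDb data (`imdb` is an assumed field name)."""
    return [m for m in movies if m.get("imdb")]
```

Storing the year as an integer, and using None when no year is found, are choices made for this sketch; the actual scripts may represent these cases differently.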
These scripts are released under the terms of the MIT license.