Visualize your personal Netflix statistics. A small sample project using new features found in Java 8, 9 and 10.
Before you ask: no, I did not accumulate all those hours and episodes on my own. While the viewing history of sub accounts is kept separate, some people tend to use the service through my profile.
If you create your own graphics and stumble upon a great-looking color theme, feel free to send me a message and I'll add a small collection of presets.
- Java 10 Runtime Environment. Used to parse the viewing activity file and download additional information for movies and series.
- R 3.5.1 and RStudio (optional) for visualization.
- A Trakt account to gain access to a movie metadata database.
- Create an account at Trakt.tv
- Create an "App" to retrieve your API key: https://trakt.tv/oauth/applications/new. All you need to do is choose a name and enter a redirect URI, which doesn't need to be valid (e.g. "https://localhost.de")
- Copy the client id and save it somewhere for later usage
- Download the distribution.zip archive and extract the files
Check your Java version by opening the terminal and typing java -version
First we need to retrieve the viewing history file from Netflix. Sadly, Netflix offers only a very limited data set, namely the series/movie title and the date it was watched. To generate more interesting statistics, additional information such as the runtime, genre and actors is needed. The Java program will query the Trakt database and do its best to collect whatever material it can get its hands on.
Go to https://www.netflix.com/viewingactivity, scroll to the bottom and download your viewing activity file.
Locate the NetflixAnalyzer.jar file and place the downloaded csv alongside the jar
Open the terminal and type
cd PathToJarFile
java -jar NetflixAnalyzer.jar traktClientId
Hint: on Windows you only need to type cd and drag and drop the .jar file into the terminal. This will paste the file location automatically for you. Alternatively, you can click the address bar of the Explorer and copy and paste the path.
Press Enter, and after about a minute 3 additional CSV files will appear. Warnings are perfectly fine.
Now R comes into play. Fire up RStudio and open the CreateInfographics.r file (open the R folder and double-click the file).
Important: after the R file has opened, click File -> Reopen with encoding and choose UTF-8.
Scroll down to the settings section (around line 43) and adjust the paths (optionally adjust the color settings). Now you are good to go. Select the entire code block and click Run (Ctrl + A -> Ctrl + Enter).
After a few seconds the infographics should be generated.
If you wish to modify the code, go ahead and clone the repository:
git clone https://github.com/KilianB/NetflixViewingActivityVisualizer.git
Running
mvn package
will run the tests, compile the classes and bundle the binaries in distribution.zip.
The Netflix viewing activity data can be described as minimalistic at best. As soon as you start watching an episode/movie, it appears in the history file. We have no way to distinguish whether someone just peeked at an item or watched it in full, therefore the runtime will be overestimated. On the flip side, if an episode/movie is watched a second time, the first entry is removed from the history file, resulting in an underestimation. All you can do is regularly download your viewing activity file and merge the files to get a better representation of the data.
Selenium + a daily batch job + h2database anyone? Maybe a great next weekend project.
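The merge step mentioned above could look roughly like the following. This is only a sketch and not part of the project: the class name ViewingHistoryMerger is hypothetical, and it assumes every export consists of a header line followed by one quoted title/date pair per line. Identical lines are kept once, while a rewatched item (same title, different date) survives as two separate entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the repository.
class ViewingHistoryMerger {

    /**
     * Merges several exports (each given as its list of CSV lines, header
     * included) into one list. Lines are compared as plain text, so the
     * repeated header and repeated entries collapse into a single line,
     * and the order of first appearance is preserved.
     */
    static List<String> merge(List<List<String>> exports) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> export : exports) {
            for (String line : export) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

The line lists can be read with Files.readAllLines and the result written back with Files.write before handing the merged file to NetflixAnalyzer.jar.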
A small number of items are misclassified (movie as a show, show as a movie), either because Trakt is not aware of those titles or because the parser's regex isn't good enough. Netflix doesn't make it easy either: movies may have multiple colons or quotation marks, series may have titles without a season number, etc. A range of examples can be found in the unit test TestNetflixParser.java
- Attempt to improve the parser (have a look at NetflixParser.java)
So far I have used a rather simple regex; be my guest and improve it:
private final static Pattern splitLine = Pattern.compile("\"(?<Title>.*)\""+INPUT_DELIMITER+"\"(?<Date>.*)\"");
/**
* If we have a show try to separate the series, season and episode title
*/
private final static Pattern showPattern = Pattern.compile(
"(?<series>.*(?:(?:Season|Staffel|Part) [0-9]+)(?=:)): (?<epTitle>.*)",
java.util.regex.Pattern.UNICODE_CHARACTER_CLASS);
/**
* Once we get the season extract the season number
*/
private final static Pattern seasonPattern = Pattern.compile(
"(?<series>.*(?=:)): (?<seasonText>[^0-9]*)(?<season>[0-9]*)",
java.util.regex.Pattern.UNICODE_CHARACTER_CLASS);
One catch: some series don't have season numbers, which this regex assumes they do.
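To see how the two patterns behave, here is a small self-contained sketch. The class TitleParsingSketch, the describe method and the sample titles are made up for illustration; only the two regular expressions are copied from above. It is not how NetflixParser.java is structured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the two expressions come from the parser.
class TitleParsingSketch {

    private static final Pattern SHOW_PATTERN = Pattern.compile(
            "(?<series>.*(?:(?:Season|Staffel|Part) [0-9]+)(?=:)): (?<epTitle>.*)",
            Pattern.UNICODE_CHARACTER_CLASS);

    private static final Pattern SEASON_PATTERN = Pattern.compile(
            "(?<series>.*(?=:)): (?<seasonText>[^0-9]*)(?<season>[0-9]*)",
            Pattern.UNICODE_CHARACTER_CLASS);

    /** Returns "series | season | episode" for a show, "movie | title" otherwise. */
    static String describe(String title) {
        Matcher show = SHOW_PATTERN.matcher(title);
        if (!show.matches()) {
            // No "Season N:" marker found, so the title is treated as a movie.
            return "movie | " + title;
        }
        // The series group still contains the season, e.g. "Name: Season 2".
        Matcher season = SEASON_PATTERN.matcher(show.group("series"));
        if (!season.matches()) {
            return show.group("series") + " | ? | " + show.group("epTitle");
        }
        return season.group("series") + " | " + season.group("season")
                + " | " + show.group("epTitle");
    }

    public static void main(String[] args) {
        System.out.println(describe("Example Show: Season 2: The Pilot"));
        System.out.println(describe("Some Movie: The Sequel"));
        // A series without a season number falls through to the movie branch.
        System.out.println(describe("Example Show: Limited Series: Episode 1"));
    }
}
```

The last call shows the catch: without a "Season", "Staffel" or "Part" marker followed by a number, a series title is indistinguishable from a movie with colons in its name.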
- If Trakt does not return a result when querying for a movie/show try the opposite and see if we receive anything useful.
Take a look at NetflixAnalyzer.java
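One possible shape for this fallback is sketched below. Everything in it is an assumption for illustration: the class FallbackLookup and the two search functions are hypothetical stand-ins for whatever Trakt queries NetflixAnalyzer.java performs, and Optional.or requires Java 9 or later.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch; movieSearch and showSearch stand in for the real Trakt queries.
class FallbackLookup {

    /**
     * Queries the category the parser guessed first and, only if that yields
     * nothing, queries the opposite category.
     */
    static <T> Optional<T> lookup(String title, boolean looksLikeShow,
            Function<String, Optional<T>> movieSearch,
            Function<String, Optional<T>> showSearch) {
        Function<String, Optional<T>> first = looksLikeShow ? showSearch : movieSearch;
        Function<String, Optional<T>> second = looksLikeShow ? movieSearch : showSearch;
        // The supplier passed to or() runs only when the first result is empty.
        return first.apply(title).or(() -> second.apply(title));
    }
}
```

Because the second query is wrapped in a supplier, no extra request is sent when the first guess already succeeded.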
The source was never intended to go public, therefore no time was spent on making the code understandable or optimized. The main objective was to use some new concepts I hadn't had much exposure to.
The interesting stuff happens in the gigantic blob NetflixAnalyzer.java
- Java 8
- Method reference ::
- Predicates
- Java 9:
- streams
- lambda expressions
- Java 10:
- local variable type inference
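For readers who have not met these features yet, the following toy example combines several of them. It is not taken from NetflixAnalyzer.java; the class FeatureTour, its method and the sample titles are invented for illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical toy example, unrelated to the project's own classes.
class FeatureTour {

    /** Keeps titles with at least minLength characters, upper-cased and sorted. */
    static List<String> longTitlesUpperCase(List<String> titles, int minLength) {
        // Lambda expression stored as a Predicate
        Predicate<String> longEnough = title -> title.length() >= minLength;

        // Local variable type inference: the compiler infers List<String>
        var result = titles.stream()          // stream pipeline
                .filter(longEnough)
                .map(String::toUpperCase)     // method reference
                .sorted()
                .collect(Collectors.toList());
        return result;
    }

    public static void main(String[] args) {
        // List.of is one of the collection factory methods added in Java 9
        var titles = List.of("dark", "stranger things", "narcos");
        System.out.println(longTitlesUpperCase(titles, 6));
    }
}
```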
I have no idea how to use R. Nothing in the R file should be considered good coding practice. I am aware of the glitches in gridTextMulticolor. I just wrote it to be good enough for the use cases I encountered.
The project is licensed under GPLv3. The icons used were downloaded from flaticon and freepik and are licensed under CC BY 3.0. Individual authors:
The basic theme and layout is based on a blog post by Al-Ahmadgaid Asaad.