- About the project and my own data
- Data
- Access
- Findings
- File descriptions
- How to interact
- Acknowledgements
- Author
- License
Two of my favorite hobbies are watching movies and going to the opera. Inspired by some friends who kept good old analogue diaries, I started writing down some data every time I watched a movie or went to the opera. At first I used a Word document, but I soon switched to Excel (wise choice). I currently record the following data:
- date
- title (normally the original title as in IMDB)
- creator (the director or, for operas, the composer)
- release year (first release, which is important for matching my entries with the TMDB data)
- place (which cinema, which opera, etc.)
- company (who was with me; this information does not appear in my dashboards, data privacy, you know ;))
- category (Netflix, Blu Ray, cinema, etc.)
- evaluation (my own evaluation between 0 and 100)
- imdb_id (I had a phase where I stored every IMDB ID, but this was too much work)
- comment (sometimes I write down some thoughts about the movie or opera, or note whether it was a special occasion)
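As a rough illustration of this structure, one diary row could be modeled as shown here. This is only a sketch, not the project's code: the class name `DiaryEntry`, the CSV column names and the date format are assumptions, and the sample row is made up. The company column is left out, in line with the privacy note above.

```python
import csv
import datetime as dt
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiaryEntry:
    """One diary row; field names are assumed to mirror the list above."""
    date: dt.date
    title: str
    creator: str
    release_year: int
    place: str
    category: str
    evaluation: int          # own rating between 0 and 100
    imdb_id: Optional[str]   # only filled in for some entries
    comment: str


def parse_row(row: dict) -> DiaryEntry:
    """Convert one CSV row (all values are strings) into a typed entry."""
    evaluation = int(row["evaluation"])
    if not 0 <= evaluation <= 100:
        raise ValueError(f"evaluation out of range: {evaluation}")
    return DiaryEntry(
        date=dt.datetime.strptime(row["date"], "%Y-%m-%d").date(),
        title=row["title"],
        creator=row["creator"],
        release_year=int(row["release_year"]),
        place=row["place"],
        category=row["category"],
        evaluation=evaluation,
        imdb_id=row["imdb_id"] or None,  # empty cell becomes None
        comment=row["comment"],
    )


def load_entries(csv_file) -> List[DiaryEntry]:
    """Read all rows from an open CSV file that has a header line."""
    return [parse_row(row) for row in csv.DictReader(csv_file)]


# Made-up sample data, not taken from the real diary.
SAMPLE_CSV = """date,title,creator,release_year,place,category,evaluation,imdb_id,comment
2020-03-14,Example Movie,Example Director,1999,home,Netflix,80,,
"""

entries = load_entries(io.StringIO(SAMPLE_CSV))
print(entries[0].title, entries[0].evaluation)
```

Validating the evaluation range while parsing catches typos in the spreadsheet early, before they distort any averages.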
Apart from movies and operas I also gather information about books and concerts. I even started to write down new Whiskeys I tried but it seems that I am not a sufficiently heavy drinker for this...
After taking some data science courses on Codecademy and Udacity, I first analyzed my diary in a Jupyter Notebook. Later, after learning about web development, I decided to build a dashboard.
In addition to my own data (see above), I currently use the following external data source:
- The Movie Database (TMDB): a community-built database that has existed since 2008. I access TMDB through its API using a very practical wrapper.
I plan to add more data sources for some recommender engines and will update this list.
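The release year matters for the TMDB lookup because a title alone is often ambiguous (remakes, or different films sharing a name). Below is a minimal sketch of how a diary entry could be matched against search results by title and release year. The function name `pick_match` is hypothetical, the result dictionaries are made up, and their keys (`id`, `title`, `original_title`, `release_date`) are assumed to follow the shape of TMDB movie search results. The actual project may match entries differently.

```python
from typing import Optional


def pick_match(results: list, title: str, release_year: int) -> Optional[dict]:
    """Return the first result whose title and release year both match.

    `results` is assumed to be a list of dicts shaped like TMDB movie
    search results (keys: id, title, original_title, release_date).
    """
    wanted = title.casefold().strip()
    for result in results:
        titles = {
            str(result.get("title", "")).casefold().strip(),
            str(result.get("original_title", "")).casefold().strip(),
        }
        # release_date is assumed to look like "YYYY-MM-DD"; it may be empty.
        year_text = str(result.get("release_date") or "")[:4]
        if wanted in titles and year_text == str(release_year):
            return result
    return None


# Made-up results for illustration only.
fake_results = [
    {"id": 1, "title": "Example Movie", "original_title": "Example Movie",
     "release_date": "1984-05-01"},
    {"id": 2, "title": "Example Movie", "original_title": "Beispielfilm",
     "release_date": "2011-09-15"},
]

match = pick_match(fake_results, "Beispielfilm", 2011)
print(match["id"] if match else "no match")
```

In practice, `results` would come from a search call made through the API wrapper. Comparing against both the localized and the original title fits the diary's habit of storing the original title.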
You can have a look at the dashboard via https://movie-opera-dashboard.onrender.com.
Currently I have four subpages:
- Movie diary: It shows how many movies I watched in which category since the beginning of 2017.
- Me vs. TMDB: It shows the differences between my evaluation and the TMDB community.
- Opera diary: The operas I saw and their composers.
- Top 50: Showing the 50 movies with the highest evaluation.
Movie diary
- Clearly I have watched a lot of movies at home during March, April and December 2020 (Corona-time).
- My movie selection on airplanes is not the best, and that makes sense: I normally prefer easy, entertaining movies during long flights (Marvel, Pixar). Furthermore, on airplanes I watch movies that I would probably not watch at home or pay for in the cinema.
- Considering the three main streaming platforms (Netflix, Mubi and Amazon Prime), I watch most stuff on Netflix, but the average evaluation is highest on Amazon Prime and Mubi.
- Peter Jackson, Christopher Nolan and Quentin Tarantino are some directors that I could watch every week. I think this is clear in the stats.
Me vs. TMDB
- I tend to rate movies slightly higher than the TMDB community.
- With some of my favorite movies (The Host, Cherry Blossoms) I am much more generous than the TMDB community.
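Comparing the two ratings requires a common scale: the diary evaluation runs from 0 to 100, while TMDB's `vote_average` runs from 0 to 10. The following is a minimal sketch of that calculation with invented numbers and hypothetical function names. The dashboard may compute the difference in another way.

```python
from statistics import mean


def rating_gap(own_evaluation: float, tmdb_vote_average: float) -> float:
    """Own rating minus TMDB rating, both expressed on a 0 to 100 scale."""
    return own_evaluation - tmdb_vote_average * 10


def average_gap(pairs) -> float:
    """Mean gap over (own_evaluation, tmdb_vote_average) pairs."""
    return mean(rating_gap(own, tmdb) for own, tmdb in pairs)


# Invented (own_evaluation, tmdb_vote_average) pairs, not real diary data.
sample_pairs = [(80, 7.0), (65, 7.0), (85, 7.5)]

print(average_gap(sample_pairs))
```

A positive result means the diary ratings are, on average, more generous than the community's. A negative result means they are stricter.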
Opera diary
- I see a lot of Wagner and Verdi operas.
- Unfortunately, a lot of data is missing here for 2003 to 2013, which were also very heavy opera years.
The webapp relies on some files that make up the environment for it using Flask and Gunicorn:
- init, myapp and routes: Initialize the Flask app and render the HTML templates with the plots as JSON data.
- wrangel_data_movies/tmdb/opera: Data wrangling files that prepare the data and return Plotly graph objects
- CSV-files: These files are exports from the original diary with a few changes (e.g. without the company information)
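As a rough sketch of the flow described above, a wrangling function can turn diary rows into a figure description that is then serialized to JSON for the template. To stay dependency-free, the sketch uses plain dictionaries in the `data`/`layout` shape of Plotly's figure JSON instead of Plotly graph objects, and it leaves out the Flask route. The function name, the row format and the sample rows are assumptions, not the project's code.

```python
import json
from collections import Counter


def movies_per_category_figure(rows: list) -> dict:
    """Count diary rows per category and describe them as a bar chart."""
    counts = Counter(row["category"] for row in rows)
    categories = sorted(counts)
    return {
        "data": [
            {
                "type": "bar",
                "x": categories,
                "y": [counts[c] for c in categories],
            }
        ],
        "layout": {
            "title": {"text": "Movies per category"},
            "xaxis": {"title": {"text": "Category"}},
            "yaxis": {"title": {"text": "Number of movies"}},
        },
    }


# Made-up rows for illustration only.
sample_rows = [
    {"title": "Example A", "category": "Netflix"},
    {"title": "Example B", "category": "cinema"},
    {"title": "Example C", "category": "Netflix"},
]

figure = movies_per_category_figure(sample_rows)
figure_json = json.dumps(figure)  # string a template could pass to Plotly.js
print(figure_json)
```

Keeping the wrangling step separate from the route means each plot can be checked on its own, without starting the web server.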
Every idea or contribution is welcome. My next project will be to design a recommender.
Thanks to the following packages or service providers:
- The TMDB community, especially for providing an easy-to-use API.
- The already mentioned TMDB API wrapper by Celiao, which I use to access the TMDB database.
- Derek Eder's tutorial about converting CSV to HTML tables helped me a lot when preparing the tables.
- Of course, without GitHub, Render and Stack Overflow, the dashboard would not exist.
Maximilian Müller, Development Manager, Consultant and Account Executive for the smart energy transition. Strong interest in movies, opera and Data Science.
Copyright 2022 Maximilian Müller
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.