Team Monkey Divers

Team Members

Project

Our take on the IMDb challenge: using open movie APIs to answer interesting questions through large-scale data analysis.

Our aim was to use data gathered from around the web to answer that age-old question: in any given film, how many actors share the same birthday?
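To make the question concrete, here is a minimal sketch of the grouping step, assuming the cast of one film is already available as a mapping from actor name to date of birth. The function name `shared_birthdays` and the sample cast are hypothetical and are not taken from the project code.

```python
from collections import defaultdict
from datetime import date


def shared_birthdays(cast):
    """Group cast members by (month, day); keep only days shared by 2+ actors."""
    by_day = defaultdict(list)
    for name, born in cast.items():
        # The year is ignored: a shared birthday only needs month and day to match.
        by_day[(born.month, born.day)].append(name)
    return {day: sorted(names) for day, names in by_day.items() if len(names) > 1}


# Hypothetical cast with made-up names and dates, for illustration only.
cast = {
    "Actor A": date(1970, 3, 14),
    "Actor B": date(1985, 3, 14),
    "Actor C": date(1990, 7, 2),
}

print(shared_birthdays(cast))  # {(3, 14): ['Actor A', 'Actor B']}
```

Keying on `(month, day)` rather than the full date is the design choice that matters here, since two actors born in different years still share a birthday.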

Our plan was to use a web front end with a PHP back end, drawing on databases most likely built on NoSQL solutions for quick setup and rapid query responses.

The IMDb data provided was difficult to obtain because of web connection problems and poor schemas, so we wrote a Python scraper to pull birthdays from the large pages (one for each day) provided by IMDb, imported the results into a Mongo database, and provided a query layer through PHP.
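The parsing half of such a scraper could look roughly like the sketch below. It uses only the standard library and runs on an inline sample instead of fetching anything. The markup (`<li class="person">`), the class `PersonParser`, and the document shape are all assumptions made for illustration; they do not describe IMDb's real page structure or the project's actual scraper.

```python
from html.parser import HTMLParser


class PersonParser(HTMLParser):
    """Collect the text of every <li class="person"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_person = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "li" and ("class", "person") in attrs:
            self._in_person = True
            self.names.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_person = False

    def handle_data(self, data):
        if self._in_person:
            self.names[-1] += data


def birthday_documents(html, month, day):
    """Turn one per-day page into a list of dicts ready for a document store."""
    parser = PersonParser()
    parser.feed(html)
    parser.close()
    return [
        {"name": name.strip(), "month": month, "day": day}
        for name in parser.names
        if name.strip()
    ]


# Inline sample standing in for one downloaded page (made-up content).
sample = (
    '<ul><li class="person">Actor A</li>'
    '<li class="person"> Actor B </li><li>unrelated</li></ul>'
)
docs = birthday_documents(sample, 3, 14)
print(docs)
```

A list of plain dicts like `docs` is the shape that a MongoDB driver call such as pymongo's `insert_many` accepts, which is one plausible way the import step could be wired up.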

We found open data from freebase.org for film and actor tie-ins, and used Ruby to condense the large files and align actors with films. We again ran into problems with key referencing in the data, but eventually got enough information from the provider to produce a proper import. The remaining problem was getting a working import into the existing Mongo database.
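The alignment step is essentially a join on actor keys, with some care taken over references that point nowhere. The sketch below shows that idea in Python rather than the Ruby we used. It assumes two tab-separated inputs with hypothetical column names (`actor_id`, `name`, `film_id`); the real Freebase dumps were laid out differently.

```python
import csv
import io
from collections import defaultdict


def align_cast(actor_rows, performance_rows):
    """Join performances to actors by key; report keys with no matching actor."""
    actors = {row["actor_id"]: row["name"] for row in actor_rows}
    cast_by_film = defaultdict(list)
    dangling = []
    for row in performance_rows:
        name = actors.get(row["actor_id"])
        if name is None:
            # Broken key reference: record it rather than silently dropping it.
            dangling.append(row["actor_id"])
            continue
        cast_by_film[row["film_id"]].append(name)
    return dict(cast_by_film), dangling


# Made-up sample data standing in for the condensed files.
actors_tsv = "actor_id\tname\na1\tActor A\na2\tActor B\n"
performances_tsv = "film_id\tactor_id\nf1\ta1\nf1\ta2\nf2\ta9\n"

actor_rows = list(csv.DictReader(io.StringIO(actors_tsv), delimiter="\t"))
performance_rows = list(csv.DictReader(io.StringIO(performances_tsv), delimiter="\t"))

cast_by_film, dangling = align_cast(actor_rows, performance_rows)
print(cast_by_film)  # {'f1': ['Actor A', 'Actor B']}
print(dangling)      # ['a9']
```

Returning the dangling keys alongside the result makes key-referencing problems visible early, before an import is attempted.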

We planned to run the system on Amazon and put a lot of work into setting up two flexible EC2 instances: one to respond to user requests and another to store the database. Load balancing failures caused problems here too, but these were overcome, leaving us with stable servers for the project.

A PHP back end was designed and written, with full tests for all use cases and objects to carry out the major functions. This was extended with MongoDB functionality, but again it proved hard to test, as there seemed to be discrepancies between what the CLI gave access to and what PHP could search.

In the end we nearly had an interesting data analysis product, but fell just short due to unforeseen database issues and early difficulties getting the right data together.
