This project retrieves and analyzes tweets from the Twitter API. It was developed to meet the requirements of the Big Data Analytics course in the Master's Degree in Artificial Intelligence at UB/UPC/URV.
It consists of several kinds of analysis performed on Tweet objects, covering:
- the languages they're written in;
- the sources (i.e. the devices/apps) they're written through;
- their number of retweets/responses;
- a deeper analysis of their text features.
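
The first three analyses above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code (which relies on pandas). The sample tweets are made up, and the field names `lang`, `source`, and `retweet_count` are assumed to follow the Tweet object layout of the Twitter REST API, where `source` is an HTML anchor string.

```python
import re
from collections import Counter

# Hypothetical sample data: only the fields used below are included.
tweets = [
    {"lang": "en", "retweet_count": 3,
     "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
    {"lang": "es", "retweet_count": 0,
     "source": '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'},
    {"lang": "en", "retweet_count": 5,
     "source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
]


def source_name(source_html):
    """Strip the HTML anchor tags and keep only the client name."""
    return re.sub(r"<[^>]+>", "", source_html).strip()


def summarize(tweets):
    """Count tweets per language and per source, and total the retweets."""
    languages = Counter(t.get("lang", "und") for t in tweets)
    sources = Counter(source_name(t.get("source", "")) for t in tweets)
    total_retweets = sum(t.get("retweet_count", 0) for t in tweets)
    return languages, sources, total_retweets


languages, sources, total_retweets = summarize(tweets)
print(languages.most_common())
print(sources.most_common())
print(total_retweets)
```

Tweets without a `lang` field are counted under `"und"` here, an arbitrary choice for this sketch.
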
All tasks were carried out in Python, using the pandas, numpy, and scikit-learn libraries, with MongoDB storing the retrieved tweets (both those collected in real time and those tweeted previously). Finally, the nltk package was crucial for the text analysis.
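
As a rough idea of what the text analysis step could look like, the sketch below tokenizes a few tweets with nltk's `TweetTokenizer` and counts the most frequent hashtags with `FreqDist`. The sample texts and the helper name `hashtag_frequencies` are hypothetical, and the tokenizer options are just one reasonable configuration; the Wiki describes the analysis actually performed.

```python
from nltk import FreqDist
from nltk.tokenize import TweetTokenizer

# Hypothetical sample texts, not taken from the project's dataset.
texts = [
    "Loving #BigData with @ub_ai!!! soooo cool",
    "More #BigData homework tonight...",
    "#Python makes tweet analysis easy",
]

# Lowercase tokens, drop @handles, and shorten long character repetitions.
tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)


def hashtag_frequencies(texts):
    """Return a FreqDist of the hashtags found in the given tweet texts."""
    tokens = [tok for text in texts for tok in tokenizer.tokenize(text)]
    return FreqDist(tok for tok in tokens if tok.startswith("#"))


freq = hashtag_frequencies(texts)
print(freq.most_common(5))
```

`TweetTokenizer` is rule-based, so this sketch should not need any additional nltk data downloads.
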
For further explanations, please read the Wiki.