Web Scraping Project: Biographies of Country Leaders (source: Wikipedia)
This project was created during the BeCode trainee program. The goal is to query an API for a list of countries and their past leaders, then extract and sanitize each leader's short biography from Wikipedia, and finally save the data to disk.
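A minimal sketch of the extract-and-sanitize step, using only the Python standard library. It assumes (this is not taken from the project) that a leader's short bio is the first non-empty <p> element of the Wikipedia article's HTML, and that "sanitizing" means removing footnote markers such as [1] and editorial notes such as [citation needed]. The names FirstParagraphParser, first_paragraph and sanitize_bio are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class FirstParagraphParser(HTMLParser):
    """Collect the text of the first <p> element that contains non-blank text."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = False            # True while inside a <p> element
        self._chunks: list = []       # text pieces of the current <p>
        self.paragraph: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only start collecting if no paragraph has been found yet.
        if tag == "p" and self.paragraph is None:
            self._in_p = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._chunks).strip()
            if text:  # skip empty placeholder paragraphs
                self.paragraph = text

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <a> is kept as well.
        if self._in_p:
            self._chunks.append(data)


def first_paragraph(html: str) -> Optional[str]:
    """Return the text of the first non-empty <p>, or None if there is none."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.paragraph


def sanitize_bio(text: str) -> str:
    """Strip footnote markers and editorial notes, then tidy whitespace."""
    text = re.sub(r"\[(?:\d+|[a-z])\]", "", text)   # "[1]", "[12]", "[a]"
    text = re.sub(r"\[[^\]]*needed\]", "", text)    # "[citation needed]"
    text = re.sub(r"\s+", " ", text).strip()        # collapse whitespace runs
    return re.sub(r"\s+([,.;:])", r"\1", text)      # no space before punctuation
```

In practice the HTML would come from an HTTP request; keeping parsing and cleaning as pure functions of a string makes them easy to test without network access.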
In this project I explored topics such as scraping, data structures, regular expressions, concurrency and file handling.
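Fetching one Wikipedia page per leader is I/O-bound, so a thread pool is one common way to add concurrency. The sketch below is an illustration under assumptions, not the project's code: fetch_all is a hypothetical helper, and fetch stands for whatever function downloads a single page (for example, one built on urllib.request.urlopen).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> Dict[str, str]:
    """Call `fetch` on every URL from a pool of threads; map each URL to its result.

    Threads help here because the work is I/O-bound: while one request waits
    on the network, the other threads can make progress.
    """
    urls = list(urls)  # materialize once so it can be zipped with the results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        # An exception raised by `fetch` is re-raised here when its result is read.
        return dict(zip(urls, pool.map(fetch, urls)))
```

Passing fetch in as a parameter keeps the helper testable offline: fetch_all(["a", "b"], str.upper) returns {"a": "A", "b": "B"}.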
The aim is to practice coding skills through the following steps:
- create a self-contained development environment
- retrieve information from an API
- use that information to scrape a website that does not provide an API
- save the output for later processing
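For the last step, JSON is a natural fit for nested country-to-leaders data. The sketch below assumes (hypothetically) that the scraped data is a dict mapping each country code to a list of leader dicts; save_leaders and load_leaders are illustrative names, not part of the project.

```python
import json
from pathlib import Path
from typing import Any, Union


def save_leaders(leaders: Any, path: Union[str, Path]) -> None:
    """Write the scraped data to `path` as pretty-printed UTF-8 JSON."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create output folder if missing
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented names readable instead of \u escapes.
        json.dump(leaders, f, ensure_ascii=False, indent=2)


def load_leaders(path: Union[str, Path]) -> Any:
    """Read the saved data back for later processing."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)
```

Writing with an explicit encoding and ensure_ascii=False keeps names with accents legible in the file, and JSON round-trips the dict-of-lists structure without a custom format.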
Rafaella PORTO, Junior Data Scientist at BeCode.