Author- Kanishka Narayan
Description- This Python module can be used to scrape and process data from different sources. It can output data either as a dataframe in country-year format or as Excel files. The module was created primarily for processing data for the International Futures (IFs) project; however, it can be used to process data in general. It can process data from the following sources:
- World Bank World Development Indicators (WDI)
- UNESCO Education indicators (UIS)
- FAO Food Balance Sheets consolidated data (FAO)
- IMF Government Finance Statistics (IMF GFS) (Revenue and Expenditure Data)
- Health detailed data file from IHME
- Health data from the Institute for Health Metrics and Evaluation (IHME)
- Water data from FAO AQUASTAT
- Energy data from EIA
- Fish detailed agricultural data (FAO FBS Fish)
Instructions for users new to Python:
- Download and install the latest version of PyCharm for your computer from https://www.jetbrains.com/pycharm/
- Download and install Python version 3.7 from https://www.python.org/downloads/
Instructions on general use:
- First, download the zip file Pythonfiles.zip from https://drive.google.com/file/d/1aD2Zi_CEsunQbJBkhbDvhwNX82EaGCU2/view?usp=sharing
- Place the zip file, as is, in the 'input/' folder located in the same folder as your DataUpdate.py file.
- Create a new project in PyCharm and copy the two Python files DataUpdate.py and DataforIFsFirstTimeInstallations.py to this project. (You have the option to copy the third file, DataforIfs.py, as well.)
- The third Python file, DataForIFs.py, can be used to run all the commands necessary to process data. However, using this file is optional.
- To set up, first run the code below:
    import DataforIFsFirstTimeInstallations
    DataforIFsFirstTimeInstallations.InstallAll()
This will install all the modules required by the DataUpdate module. Please note that you need to run the installation commands only once.
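Because the installation step only needs to run once, it can help to check whether the required modules are already importable before calling InstallAll() again. The sketch below is a minimal, hypothetical helper and is not part of the DataUpdate module. The module names passed to it (pandas, requests) are assumptions about what the installer provides, so substitute the packages your copy actually installs.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list; the real requirements are whatever InstallAll() installs.
    assumed_requirements = ["pandas", "requests"]
    print("Not yet installed:", missing_modules(assumed_requirements))
```

If the printed list is empty, the modules you named are already available and the installation step can probably be skipped.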
- Now you are set up to run the DataUpdate module.
- As mentioned above, the DataUpdate module can output dataframes directly. For example, to pull data from the FAO Food Balance Sheets and save it to a dataframe "AgData", run the code below:
    import DataUpdate
    AgData = DataUpdate.FAOFBS()
- Similarly, to save an Excel file of the FAOFBS data, run the code below:
    import DataUpdate
    DataUpdate.FAOFBSFile()
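When several sources are processed in one session, one failing pull should not stop the rest. The following is a minimal sketch of one way to arrange that. run_pulls is a hypothetical helper, not part of DataUpdate, and it only assumes that each pull is a callable that takes no arguments, which is how the DataUpdate functions are called in this README.

```python
def run_pulls(pulls):
    """Call each pull function and collect results and errors separately.

    `pulls` maps a label to a zero-argument callable.
    """
    results, errors = {}, {}
    for label, pull in pulls.items():
        try:
            results[label] = pull()
        except Exception as exc:  # keep going so one bad source does not stop the rest
            errors[label] = exc
    return results, errors


# Example usage (requires the DataUpdate module and its base data):
#     import DataUpdate
#     results, errors = run_pulls({"FAOFBS": DataUpdate.FAOFBS})
#     AgData = results.get("FAOFBS")
```

Any label that appears in errors can then be re-run on its own after the underlying base data or connection problem has been fixed.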
- The base data for many of these pulls is located in the PythonFiles folder under the path "C:\Users\Public\Pythonfiles".
- Currently, only the WDI and UIS data pulls scrape data using APIs. For the other sources, update the base data files in the PythonFiles folder.
- Similarly, users can make changes to the individual concordance tables as well.
- Use the CreateSQLTable.py file to generate the MasterData.db database. A total of 978 data series will be generated.
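After the database has been generated, its contents can be checked from Python. The sketch below assumes that MasterData.db is a SQLite file (suggested by the .db extension, but not stated in this README) and that it sits in the working directory. list_tables is a hypothetical helper, not part of CreateSQLTable.py.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at `db_path`."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Example usage (assumes MasterData.db is in the current folder):
#     tables = list_tables("MasterData.db")
#     print(len(tables), "tables found")
```

Note that sqlite3.connect creates an empty file if the path does not exist, so an empty result may simply mean the path is wrong.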