Welcome to craigslist-housing-miner, a web scraping tool that can extract information from any or all Craigslist Housing posts around the world.
Note: This tool should only be used for personal use and data analysis.
This project leverages asynchronous execution of processes to rapidly mine information from Craigslist Housing posts. The data is written to CSV files named in the following format:
CraigslistHousing_{country/state}_{region/subregion}.csv
An example of a CSV file for Dothan, Alabama would look like this:
CraigslistHousing_alabama_dothan.csv
Another example of a CSV file for Tokyo, Japan would look like this:
CraigslistHousing_japan_tokyo.csv
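The naming pattern above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the project's actual code: the function name `csv_filename` and the normalization step (lowercasing, replacing spaces with underscores) are assumptions.

```python
def csv_filename(country_or_state: str, region: str) -> str:
    """Build a file name of the form CraigslistHousing_{country/state}_{region/subregion}.csv."""

    def normalize(keyword: str) -> str:
        # Assumption: keywords are lowercase with underscores instead of spaces,
        # as in 'united_states'.
        return keyword.strip().lower().replace(" ", "_")

    return f"CraigslistHousing_{normalize(country_or_state)}_{normalize(region)}.csv"


if __name__ == "__main__":
    print(csv_filename("alabama", "dothan"))
    print(csv_filename("Japan", "Tokyo"))
```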
- Clone this GitHub repository.
- Install the required dependencies:
pip install -r requirements.txt
- Run main.py:
python main.py
The user is given two prompts:
Input a list of appropriate countries. If no list is provided, a global search will be conducted:
Enter a list of the country keywords you would like to search. You may find the full list of country keywords here. For example:
['united_states', 'japan', 'canada']
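One way such a list could be read safely from a prompt is with `ast.literal_eval`, which parses Python literals without executing code. The sketch below is hypothetical: `parse_country_list` is not a function from this project, and how main.py actually parses its input is not documented here. An empty answer is treated as a request for a global search, matching the prompt text.

```python
import ast
from typing import List


def parse_country_list(raw: str) -> List[str]:
    """Parse text such as "['united_states', 'japan']" into a list of keywords.

    Hypothetical helper: an empty string returns an empty list, which a
    caller could interpret as "search every country".
    """
    raw = raw.strip()
    if not raw:
        return []
    # literal_eval only accepts Python literals, so arbitrary code is rejected.
    value = ast.literal_eval(raw)
    if not isinstance(value, (list, tuple)) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a list of country keyword strings")
    return [v.strip().lower() for v in value]


if __name__ == "__main__":
    print(parse_country_list("['united_states', 'japan', 'canada']"))
```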
Would you like to include geotags of your Craigslist posts [y/n]:
Type y if you would like to receive geographic coordinates for every Craigslist post.
Note: acquiring geotags will take a considerable amount of time. To mitigate this, you can omit geotags by typing n.
The application exits once the process is finished. If it does not, you may have to press CTRL + C repeatedly in your terminal to exit the application properly.
All data is stored in the craigslist-housing-miner/data/{date data acquired} directory.
For example:
craigslist-housing-miner/data/2020-06-14
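As a rough sketch of how a dated output directory like this could be derived, the snippet below joins a project root, `data`, and an ISO-formatted date. The function name `output_dir` and its parameters are illustrative assumptions, not part of this project.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def output_dir(project_root: str = "craigslist-housing-miner",
               acquired: Optional[date] = None) -> Path:
    """Return {project_root}/data/{YYYY-MM-DD} for the acquisition date.

    Hypothetical helper: defaults to today's date when none is given.
    """
    acquired = acquired or date.today()
    # date.isoformat() yields YYYY-MM-DD, matching the example directory name.
    return Path(project_root) / "data" / acquired.isoformat()


if __name__ == "__main__":
    print(output_dir(acquired=date(2020, 6, 14)).as_posix())
```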
craigslist-housing-miner is a useful tool if you are interested in studying housing posts on Craigslist. However, the tool currently has two limitations:
- The project is not yet available as a PyPI package.
- There is no GUI to facilitate easy selection of housing type(s), countries, regions, etc.
These features are scheduled to be implemented in the near future.
Fin