```bash
pip install proxy-crawler/
```
Usage
```python
import requests

from scrapeproxy import proxies

# example of filters:
# proxy = proxies.get_proxies(limit=1, filter_sources=["best", "hmn", "pub"])
proxy = proxies.get_proxies()

url = "https://example.com"  # any target URL; not part of the library
requests.get(url, proxies=proxy[0]["proxy"])
...
```
Example proxy output (the `info` fields differ for each source):
```python
[{'info': {'IP Address': '35.221.107.127',
           'Port': '3128',
           'Code': 'US',
           'Country': 'United States',
           'Anonymity': 'anonymous',
           'Google': 'no',
           'Https': 'yes',
           'Last Checked': '1 minute ago'},
  'proxy': {'http': '35.221.107.127:3128', 'https': '35.221.107.127:3128'},
  'proxy_string': '35.221.107.127:3128',
  'source': 'https://www.us-proxy.org/'}]
```
NOTE: by default the module keeps a session-wide cache (so a new import starts a new cache) and does not return the same proxy more than once. You can clear the cache with the `empty_cache` param; for more details type `proxies.get_proxies?`.
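As a minimal sketch, assuming `empty_cache` is a boolean keyword argument of `get_proxies` (the parameter name comes from the note above; the exact call signature may differ):

```python
from scrapeproxy import proxies

# first call fills the session-wide cache with the proxies it returns
first_batch = proxies.get_proxies()

# assumption: empty_cache=True clears that cache, so proxies returned
# earlier in the session may be returned again
fresh_batch = proxies.get_proxies(empty_cache=True)
```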
NOTE: we try to crawl a large number of proxies, so their quality may not be the best. We suggest setting short timeouts (e.g. 3 seconds) and using them to discard proxies that are too slow or unresponsive.
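A minimal sketch of that filtering, assuming the output format shown above (`TEST_URL` and the 3-second timeout are illustrative choices, not part of the library):

```python
import requests

from scrapeproxy import proxies

TEST_URL = "https://example.com"  # illustrative test target

working = []
for candidate in proxies.get_proxies():
    try:
        # short timeout so slow or unresponsive proxies fail fast
        requests.get(TEST_URL, proxies=candidate["proxy"], timeout=3)
    except requests.RequestException:
        continue  # too slow or unresponsive; ignore this proxy
    working.append(candidate)
```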
Check this file
Add more sources for proxies.
- Add a file for your source in the handlers folder.
- Write a class with a `get_proxy_list` method, taking params `(limit=-1, anonymous=True, https=True, google=False)`, which returns a list of dicts in the following format (a sketch of such a handler appears after this list):
```
{
    "info": dict with info about the proxy,
    "proxy": dict with "http" and "https" keys,
    "proxy_string": string in host:port format,
    "source": source of the proxy,
}
```
- Go to `proxy_sources.py` and add your source, in the same format, to the `proxy_sources` list. [NOTE] order is very important: it denotes the importance and quality of the source.
- Nothing else, thanks ;).
- One more step: please write docstrings and comment your code.
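As a sketch of what such a handler might look like (the class name, source URL, and parsing logic are hypothetical; only the `get_proxy_list` signature and return format come from the steps above):

```python
# handlers/example_source.py -- hypothetical handler; name and logic are illustrative
class ExampleSourceHandler:
    """Crawls proxies from a hypothetical source and returns them in the shared format."""

    SOURCE_URL = "https://example.com/proxy-list"  # placeholder source

    def get_proxy_list(self, limit=-1, anonymous=True, https=True, google=False):
        """Return a list of proxy dicts; limit=-1 means no limit."""
        # A real handler would fetch and parse SOURCE_URL here, applying the
        # anonymous/https/google filters; this sketch returns one hard-coded entry.
        proxy_string = "35.221.107.127:3128"
        results = [{
            "info": {"Anonymity": "anonymous", "Https": "yes"},  # fields vary per source
            "proxy": {"http": proxy_string, "https": proxy_string},
            "proxy_string": proxy_string,
            "source": self.SOURCE_URL,
        }]
        return results if limit == -1 else results[:limit]
```

The handler would then be added to the `proxy_sources` list in `proxy_sources.py`, keeping the quality ordering in mind.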
- We have decided to try a proxy pool instead of a module (to overcome the limits more easily).
- Add more sources and possibly verify them.
- Test and compare the proxy-pool vs. proxy-module approaches.