crawler_old

Undergraduate graduation project, modeled on Shodan and compared against ZoomEye.

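A crawler aimed at Chinese websites, built on a Gevent Pool and distributed through Redis.

Because of time constraints, CMS identification does not use machine learning. Instead, sites are classified by keywords found in robots.txt, the homepage meta tags, and similar sources.

masscan is built in, so IP and port scanning can be distributed across machines. On 100 Mbps bandwidth, the roughly 400 million IP addresses in China can be scanned in a little over 40 minutes.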
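
A quick back-of-the-envelope check of the scan figure above, as a sketch under stated assumptions (not from the project): one SYN probe per IP on a single port, and each probe occupying a 64-byte minimum Ethernet frame, with preamble and inter-frame gap ignored. The function names are illustrative only.

```python
# Sanity check of "~400 million IPs in ~40 minutes on 100 Mbps".
# Assumptions (not from the project): one SYN probe per IP, single port,
# 64-byte minimum Ethernet frame per probe, preamble/inter-frame gap ignored.

FRAME_BYTES = 64          # assumed on-wire size of one SYN probe
TOTAL_IPS = 400_000_000   # ~400 million IPv4 addresses in China (from the text)


def required_rate(total_ips: int, minutes: float) -> float:
    """Probes per second needed to cover total_ips in the given time."""
    return total_ips / (minutes * 60)


def bandwidth_mbps(pps: float, frame_bytes: int = FRAME_BYTES) -> float:
    """Link bandwidth in Mbps consumed by pps probes of frame_bytes each."""
    return pps * frame_bytes * 8 / 1_000_000


def min_scan_minutes(total_ips: int, link_mbps: float,
                     frame_bytes: int = FRAME_BYTES) -> float:
    """Lower bound on scan time when the link is fully saturated."""
    max_pps = link_mbps * 1_000_000 / (frame_bytes * 8)
    return total_ips / max_pps / 60


if __name__ == "__main__":
    pps = required_rate(TOTAL_IPS, 40)
    print(f"rate needed for 40 min: {pps:,.0f} probes/s")
    print(f"bandwidth at that rate: {bandwidth_mbps(pps):.1f} Mbps")
    print(f"floor on 100 Mbps:      {min_scan_minutes(TOTAL_IPS, 100):.1f} min")
```

Under these assumptions a 40-minute scan needs about 166,667 probes/s (about 85 Mbps), and a fully saturated 100 Mbps link gives a floor of about 34 minutes, so "a little over 40 minutes" is plausible for masscan running somewhat below line rate.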

After running for 2 days, it identified more than 200,000 CMS sites, roughly 20% of ZoomEye's count. Breakdown: DedeCMS 30%, WordPress 20%.
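
The keyword-based CMS identification described above (robots.txt and homepage meta keywords rather than machine learning) could look roughly like the sketch below. It is only an illustration: the `FINGERPRINTS` table, its keyword lists, and the `classify` helper are hypothetical and are not the project's actual rules, which presumably live in `fingerprint/`.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Hypothetical fingerprints for illustration only; real rules would be
# far more numerous and more specific than these few keywords.
FINGERPRINTS: Dict[str, Dict[str, List[str]]] = {
    "wordpress": {"robots": ["/wp-admin/"], "generator": ["wordpress"]},
    "dedecms": {"robots": ["/dede/", "/plus/"], "generator": ["dedecms"]},
}


class MetaGeneratorParser(HTMLParser):
    """Collects the content attribute of <meta name="generator"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here by HTMLParser too.
        if tag != "meta":
            return
        attr_map = {k.lower(): (v or "") for k, v in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def classify(robots_txt: str, homepage_html: str) -> Optional[str]:
    """Return a CMS name from meta-generator or robots.txt keywords, else None."""
    parser = MetaGeneratorParser()
    parser.feed(homepage_html)
    generator_text = " ".join(parser.generators).lower()
    robots_lower = robots_txt.lower()

    # First pass: the meta generator tag is treated as the stronger signal.
    for cms, rules in FINGERPRINTS.items():
        if any(k in generator_text for k in rules["generator"]):
            return cms
    # Second pass: fall back to paths disallowed in robots.txt.
    for cms, rules in FINGERPRINTS.items():
        if any(k in robots_lower for k in rules["robots"]):
            return cms
    return None


if __name__ == "__main__":
    print(classify("User-agent: *\nDisallow: /wp-admin/", "<html></html>"))  # wordpress
    print(classify("", '<meta name="generator" content="DedeCMS V5.7" />'))  # dedecms
```

Checking the meta generator before robots.txt is a design choice of this sketch: a generic path such as `/plus/` can appear on non-DedeCMS sites, so it is used only when no generator tag matches.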

diedcmds
fingerprint
ipscan
Error.py
README.md
RedisQueue.py
chpwd.py
count.py
daemon.py
daemon_dnsfilter.py
dbfile_inc.py
del.py
dnsfilter.py
dnsserver.py
extractdoneurls.py
extractseeds.py
getdomainip.py
init.py
master_daemon_filter.py
master_daemon_filter_robots.py
master_getcmsip.py
rcmd.py
redis_inc.py
rexec.py
rscan.py
scanret.py
seeddeal.py
test_bf.py
test_ch.py
test_crawler.py
test_dnsserver.py
test_filter.py
test_gevent.py
test_gevent_event.py
test_geventtimeout.py
test_multiprocess.py
test_mysql.py
test_queue.py
test_redis.py
test_sqlite.py
test_top1m.py
uploadfile.py
worker_crawler.py
worker_crawler_getrobots.py
worker_daemon.py
worker_filter.py
worker_scanport.py
