This project dynamically crawls a website and detects which CSS selectors are used (or unused, though some things have to be improved first; see "Immediate TODOs" if you want to contribute). The results are shown in the GUI for easy reference.
It is split into two parts: a crawler and a website acting as GUI. For a more thorough explanation please read my thesis and/or run the code yourself.
I also created a test suite, which can be run through the GUI.
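To give a rough idea of what "detecting used selectors" means, here is a minimal sketch. It is not the project's actual implementation (see the thesis and the code for that). It assumes a list of selectors has already been extracted from the stylesheets, and that `doc` is a DOM document as available in a browser or PhantomJS page context. The helper name `findUsedSelectors` is hypothetical.

```javascript
'use strict';

// Hypothetical helper: return the selectors that match at least one
// element in the given document. `doc` only needs a querySelector method.
function findUsedSelectors(doc, selectors) {
  return selectors.filter(function (selector) {
    try {
      return doc.querySelector(selector) !== null;
    } catch (e) {
      // querySelector throws on selectors the engine cannot parse;
      // this sketch simply treats those as unused.
      return false;
    }
  });
}

// In a page context this could be called as:
//   findUsedSelectors(document, ['.menu', '#footer a']);
```

A single static check like this misses selectors that only match after user interaction, which is why the crawler fires events while it visits a page.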
I have been unable to finish this as I would like. Some of the things I'd like to have done:
- Fix the State Machine, which crawls a web page by intercepting and firing events.
- Add a JS styler and fix some style issues.
- Find a solution for the Content-Security-Policy
- Probably remove CasperJS in favor of pure PhantomJS
- Multiple crawlers
- Use Google to retrieve a first set of seed URLs to increase coverage
- Add more vendor prefixes
- Use robots.txt to find more seed URLs
- Detect sitemaps to detect more URLs
- Crawl up URL trees (`/activities/1/` to `/activities/`)
- Major code cleanup
- Direct feedback test suite (instead of having to mash F5)
- Offer ability to download the cleaned CSS files
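As an illustration of the "crawl up URL trees" item, the sketch below derives the ancestor paths of a URL path so they could be queued as extra seed URLs. None of this exists in the project yet; the function name `parentPaths` is hypothetical, and the sketch only handles the path part of a URL (no query strings or fragments).

```javascript
'use strict';

// Hypothetical helper: list the ancestor paths of a URL path,
// nearest parent first, ending at the root.
function parentPaths(path) {
  // Drop empty segments caused by leading, trailing, or doubled slashes.
  var segments = path.split('/').filter(function (s) {
    return s.length > 0;
  });
  var parents = [];
  // Remove one trailing segment at a time.
  for (var i = segments.length - 1; i >= 0; i--) {
    var prefix = segments.slice(0, i).join('/');
    parents.push(prefix.length > 0 ? '/' + prefix + '/' : '/');
  }
  return parents;
}

console.log(parentPaths('/activities/1/')); // [ '/activities/', '/' ]
```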
The project was built under considerable time pressure and (in the beginning) with little knowledge of CasperJS and PhantomJS.
The CSS detector is built using CasperJS and PhantomJS. The website uses Node and a handful of libraries.
- Install PhantomJS 2.0 (http://phantomjs.org/)
- Install CasperJS (grab the latest master branch)
- Install io.js 3.x.x (https://iojs.org/)
- Go to `/css-detector/crawler` and `npm install`
- Go to `/css-detector/website` and `npm install`
- You might need to install the sqlite3 module manually from the master branch: `npm install https://github.com/mapbox/node-sqlite3/tarball/master`
  - The sqlite3 version on NPM is not yet compatible with io.js 3.x.x. As soon as it is, set `dependencies.sqlite3` in `/css-detector/website/package.json` to `"sqlite3": "^3.0.x"`
Run using the GUI:
- `node index.js` in `/css-detector/website`
- Open `http://localhost:8000`
Run using the terminal:
`casperjs index.js --url=<url here>`
Note: the website must be running, because the crawler uses it for storage.
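If you want to start the crawler from a Node script instead of typing the command by hand, something along these lines could work. This is a sketch, not part of the project: the helper names `crawlerArgs` and `runCrawler` are hypothetical, and it assumes `casperjs` is on your `PATH` and that `crawlerDir` points at the directory containing the crawler's `index.js`. The website still has to be running before you start it.

```javascript
'use strict';

var spawn = require('child_process').spawn;

// Hypothetical helper: build the argument list for
// `casperjs index.js --url=<url here>`.
function crawlerArgs(url) {
  // Reject anything that is not an absolute http(s) URL.
  if (!/^https?:\/\//.test(url)) {
    throw new Error('Expected an http(s) URL, got: ' + url);
  }
  return ['index.js', '--url=' + url];
}

// Hypothetical helper: start the crawler as a child process and
// forward its output to the current terminal.
function runCrawler(url, crawlerDir) {
  return spawn('casperjs', crawlerArgs(url), {
    cwd: crawlerDir,
    stdio: 'inherit'
  });
}
```

Calling `runCrawler('http://example.com/', '/css-detector/crawler')` would then be equivalent to running the command above from that directory.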