- The main requirement is to scrape the contents of all articles of a certain column (in fact, of its sub-columns) and save them as human-readable documents.
- Possible future requirements include scraping other similar or dissimilar columns, updating the scraped contents (new articles are posted every day), converting the scraped contents to other formats, etc.
- Scrape from the column index the addresses of all sub-column indexes
- Scrape from the sub-column indexes the addresses of all articles (an article list may span multiple pages)
- Scrape article contents
- Save article metadata (URL, title, time, source)
- Save article contents as documents
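The steps above can be sketched as a small link-extraction stage. The markup of the real index pages is not specified here, so the sample HTML and the idea of collecting every `<a href>` are assumptions; real pages may also render their article lists with JavaScript, in which case an HTML parser alone would not suffice.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href attributes of <a> tags; a stand-in for the real
    index-page parsers, whose selectors depend on the actual markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list:
    """Return every hyperlink address found in the given HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


# Hypothetical sub-column index fragment, for illustration only.
sample = (
    '<ul><li><a href="/a1/t1.html">One</a></li>'
    '<li><a href="/a2/t1.html">Two</a></li></ul>'
)
print(extract_links(sample))  # -> ['/a1/t1.html', '/a2/t1.html']
```

The same extractor could serve both index levels (column index and sub-column indexes), with the caller deciding how to interpret the collected addresses.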
- Column index
- Sub-column index
- Article
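One way to model the three page types above, together with the metadata fields from the requirements (URL, title, time, source); the enum values and field types are assumptions, not part of the original spec:

```python
from dataclasses import dataclass
from enum import Enum


class PageKind(Enum):
    """The three page types the scraper must distinguish."""
    COLUMN_INDEX = "column-index"
    SUB_COLUMN_INDEX = "sub-column-index"
    ARTICLE = "article"


@dataclass
class ArticleMeta:
    """Metadata to save per article; fields taken from the requirements.
    All values here are hypothetical placeholders."""
    url: str
    title: str
    time: str
    source: str


meta = ArticleMeta(
    url="https://www.xuexi.cn/abc123/def456.html",  # hypothetical ids
    title="Example title",
    time="2020-01-01",
    source="example source",
)
print(PageKind.ARTICLE.value)
print(meta.title)
```

Keeping the metadata in a plain dataclass makes it easy to serialize later (e.g. to JSON), which lines up with the future requirement of converting scraped contents to other formats.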
- All pages share the same URL form: ${base-url}/${page-id}/${template-id}.html
- ${base-url} = https://www.xuexi.cn/
- The column index
- Misc
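Given the URL form above, a page address can be assembled and decomposed as follows; the page-id and template-id values used in the demo are hypothetical placeholders, not real identifiers:

```python
import re

BASE_URL = "https://www.xuexi.cn"


def page_url(page_id: str, template_id: str) -> str:
    """Build ${base-url}/${page-id}/${template-id}.html."""
    return f"{BASE_URL}/{page_id}/{template_id}.html"


# Inverse of page_url: recover (page-id, template-id) from an address.
URL_RE = re.compile(r"^https://www\.xuexi\.cn/([^/]+)/([^/]+)\.html$")


def parse_page_url(url: str):
    """Return (page-id, template-id), or None if the URL does not match."""
    m = URL_RE.match(url)
    return m.groups() if m else None


u = page_url("abc123", "tpl001")  # hypothetical ids
print(u)                  # https://www.xuexi.cn/abc123/tpl001.html
print(parse_page_url(u))  # ('abc123', 'tpl001')
```

Having both directions (build and parse) is handy when deduplicating scraped addresses or when deciding which page type an address belongs to.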