Using scrape in artoo.spiders
suntong001 edited this page Jan 20, 2015 · 4 revisions
This wiki page explains how to use `scrape` in `artoo.spiders`, i.e., how to pass a configuration object for the `scrape` method that `artoo.spiders` calls automatically on the retrieved data.
A detailed implementation example can be found here; it uses the same `scraper` configuration object to scrape the first page and the following pages.
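A minimal sketch of that reuse, not the linked example itself. It assumes the spider entry point is `artoo.ajaxSpider(urls, params, callback)` with a `scrape` parameter holding `iterator` and `data`, as artoo's documentation describes it. The selectors, field names, URLs, and the helper `buildSpiderParams` are hypothetical.

```javascript
// One scraper configuration shared by the first page and the following pages.
// Selectors and field names are hypothetical; adapt them to the target site.
var pageScraper = {
  iterator: '.post',
  data: {
    title: {sel: 'h2', method: 'text'},
    link: {sel: 'a', attr: 'href'}
  }
};

// Hypothetical helper: wraps a scraper configuration into spider parameters.
// `concat: true` is assumed to merge the per-page results into one array.
function buildSpiderParams(scraper) {
  return {scrape: scraper, concat: true};
}

// Only runs where artoo is loaded (e.g. injected into a page via the bookmarklet).
if (typeof artoo !== 'undefined') {
  // First page: scrape the document that is currently open.
  var firstPage = artoo.scrape(pageScraper.iterator, pageScraper.data);

  // Following pages: let the spider fetch them and apply the same configuration.
  artoo.ajaxSpider(['/page/2', '/page/3'], buildSpiderParams(pageScraper), function (rest) {
    console.log(firstPage.concat(rest));
  });
}
```

Keeping the configuration in a single object means a selector change only has to be made in one place.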
For the typical case of two-step scraping:
- A list of URLs is scraped from an index page.
- That URL list is used by `artoo.spiders` to trigger a series of HTTP requests that collect data from those URLs (the content pages).
  - On successfully retrieving data from each URL, `artoo.spiders` uses the `scrape` method with the configuration object to extract only the needed data from the content pages.
It is then a simple matter to define a `scraper` configuration object for the content pages, different from the one that scrapes the index, and pass it to `artoo.spiders` as shown above.
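The two-step flow could be sketched as follows. This is an illustration under assumptions, not the author's implementation: it again assumes `artoo.ajaxSpider` accepts a `scrape` parameter, and every selector, field name, and the helper `toUrls` are hypothetical.

```javascript
// Step 1 configuration: collect links from the index page (hypothetical selectors).
var indexScraper = {
  iterator: 'ul.index a',
  data: {url: {attr: 'href'}}
};

// Step 2 configuration: extract only the needed fields from each content page.
var contentScraper = {
  iterator: 'article',
  data: {
    title: {sel: 'h1', method: 'text'},
    body: {sel: '.body', method: 'text'}
  }
};

// Hypothetical helper: turn scraped index rows into a plain URL list,
// dropping rows that carry no link.
function toUrls(rows) {
  return rows
    .map(function (row) { return row.url; })
    .filter(function (url) { return !!url; });
}

// Only runs where artoo is loaded on the index page.
if (typeof artoo !== 'undefined') {
  var urls = toUrls(artoo.scrape(indexScraper.iterator, indexScraper.data));

  // The spider requests each URL and applies contentScraper to the response.
  artoo.ajaxSpider(urls, {scrape: contentScraper, concat: true}, function (items) {
    console.log(items);
  });
}
```

Separating the two configurations keeps the index selectors independent of the content-page selectors, so either can change without touching the other.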