I am still not able to fully understand how the sitemap spider is working. The spider keeps crawling down the sitemap.xml until it receives a valid page response. Somewhere between the first request and the final page, Scrapy redirects once from HTTP to HTTPS, but I cannot figure out where that happens. Ideally there should be a point where response.status reports a 301 redirect, but the process_response() of the middleware I wrote never sees it (it seems to be handled somewhere internally, so I cannot log it from my middleware) and only the final 200 responses come through. As a result, I am also unable to log other 40x responses from process_response(). Are these responses being handled internally as well? (That seems to be the only explanation.) How can I track these response statuses and log the URLs that return them? There seems to be an answer, but I am not sure how to rigorously test it.
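
For reference, this is the kind of logging middleware I am experimenting with (the class name, module path, and the 650 priority are my own choices). My understanding is that the built-in RedirectMiddleware sits at priority 600 and returns a new Request from its process_response() when it sees a 3xx, which short-circuits the chain, so a custom middleware only sees the 301 if its priority number is greater than 600 (process_response() is called in decreasing order of priority):

```python
# Minimal sketch of a downloader middleware that logs every non-200 status.
import logging

logger = logging.getLogger(__name__)


class StatusLoggerMiddleware:
    def process_response(self, request, response, spider):
        if response.status != 200:
            # For redirects, the Location header shows where Scrapy is about to go.
            location = response.headers.get("Location", b"").decode() or "n/a"
            logger.info(
                "status=%s url=%s location=%s",
                response.status, request.url, location,
            )
        # Hand the response back unchanged so the rest of the chain still runs.
        return response
```

```python
# settings.py -- placed closer to the downloader than RedirectMiddleware (600),
# so its process_response() runs first and still sees the raw 301.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.StatusLoggerMiddleware": 650,
}
```

Does that reasoning about the ordering sound right?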
So, how do I test this? I cannot generate a 402 response on my own (or maybe I just don't know how) to exercise the custom handlers for these responses.
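
Two ways I am considering (please correct me if there is a more idiomatic approach): hitting https://httpbin.org/status/402, which returns whatever status code you put in the path, or fabricating the response locally in a unit test, roughly like this (StatusLoggerMiddleware is the hypothetical middleware from the sketch above):

```python
# Rough unit-test sketch: build a 402 Response by hand instead of finding
# a real server that returns one, then call the middleware directly.
from scrapy import Spider
from scrapy.http import Request, Response

from myproject.middlewares import StatusLoggerMiddleware  # hypothetical path from above


def test_402_is_logged_and_passed_through():
    middleware = StatusLoggerMiddleware()
    spider = Spider(name="dummy")
    request = Request("http://example.com/paywalled")
    response = Response(url=request.url, status=402, request=request)

    result = middleware.process_response(request, response, spider)

    # The middleware should only log; the response must come back unchanged.
    assert result is response
    assert result.status == 402
```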