I am still not able to fully understand how the sitemap spider is working. The spider keeps crawling down the sitemap.xml until it receives a valid page response. Somewhere between the first request and the final page, Scrapy redirects once from HTTP to HTTPS, but I cannot figure out where that happens. Ideally there should be a point where response.status reports a 301 redirect, but the process_response() of the middleware I wrote never sees it (it seems to be handled somewhere internally, so I cannot log it from my middleware) and only the final 200 responses come through. As a result, I am also unable to log other 40x responses from process_response(). Are these responses being handled internally as well? (That seems to be the only explanation.) How can I track these response statuses and log the URLs that return them? There seems to be an answer, but I am not sure how to rigorously test it.
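
For reference, this is the kind of logging middleware I am experimenting with (the class name, module path, and the 650 priority are my own choices). My understanding is that the built-in RedirectMiddleware sits at priority 600 and returns a new Request from its process_response() when it sees a 3xx, which short-circuits the chain, so a custom middleware only sees the 301 if its priority number is greater than 600 (process_response() is called in decreasing order of priority):

```python
# Minimal sketch of a downloader middleware that logs every non-200 status.
import logging

logger = logging.getLogger(__name__)


class StatusLoggerMiddleware:
    def process_response(self, request, response, spider):
        if response.status != 200:
            # For redirects, the Location header shows where Scrapy is about to go.
            location = response.headers.get("Location", b"").decode() or "n/a"
            logger.info(
                "status=%s url=%s location=%s",
                response.status, request.url, location,
            )
        # Hand the response back unchanged so the rest of the chain still runs.
        return response
```

```python
# settings.py -- placed closer to the downloader than RedirectMiddleware (600),
# so its process_response() runs first and still sees the raw 301.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.StatusLoggerMiddleware": 650,
}
```

Does that reasoning about the ordering sound right?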
So, how do I test this? I cannot generate a 402 response on my own (or maybe I just don't know how) to exercise the custom handlers for these responses.
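
Two ways I am considering (please correct me if there is a more idiomatic approach): hitting https://httpbin.org/status/402, which returns whatever status code you put in the path, or fabricating the response locally in a unit test, roughly like this (StatusLoggerMiddleware is the hypothetical middleware from the sketch above):

```python
# Rough unit-test sketch: build a 402 Response by hand instead of finding
# a real server that returns one, then call the middleware directly.
from scrapy import Spider
from scrapy.http import Request, Response

from myproject.middlewares import StatusLoggerMiddleware  # hypothetical path from above


def test_402_is_logged_and_passed_through():
    middleware = StatusLoggerMiddleware()
    spider = Spider(name="dummy")
    request = Request("http://example.com/paywalled")
    response = Response(url=request.url, status=402, request=request)

    result = middleware.process_response(request, response, spider)

    # The middleware should only log; the response must come back unchanged.
    assert result is response
    assert result.status == 402
```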