Scraping dies with "Input buffer contains unsupported image format" if logo returns 301 #2028
From googling, it looks like that's an error message from the image-processing library. Probably happening here? Line 488 in 9cc613f.
It might just be a bad URL whose response is being erroneously read as image data.
The proximate cause of the error is that the URL https://encyclopediaofmath.org/common/spr_logo.gif returns a 301 redirect to https://encyclopediaofmath.org/wiki/Main_Page. Presumably this "logo" link is part of the initial metadata that mwoffliner gathers about the wiki. So the bug is that mwoffliner is hardcoded to download this as image data and doesn't treat a 301 as an error status; it then crashes early in the scraping process.
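One way to make this failure explicit is to reject a metadata image response before its body ever reaches the image decoder. The sketch below is hypothetical: `FetchedResource` and `checkImageResponse` are illustrative names, not mwoffliner code, and it assumes the caller can see the first response's status (for example by fetching with redirects disabled) and its `Content-Type`. The content-type check also covers the case where an HTTP client silently follows the 301 and ends up with the wiki's HTML main page.

```typescript
// Hypothetical shape of the parts of an HTTP response this check needs.
interface FetchedResource {
  url: string          // URL that was requested
  status: number       // HTTP status code
  contentType?: string // value of the Content-Type header, if any
  location?: string    // value of the Location header, if any (set on redirects)
}

// Returns null if the response looks usable as image data,
// otherwise a human-readable reason for treating it as an error.
function checkImageResponse(res: FetchedResource): string | null {
  // A redirect means we did not get the image itself (e.g. 301 to Main_Page).
  if (res.status >= 300 && res.status < 400) {
    return `${res.url} redirected (HTTP ${res.status}) to ${res.location ?? 'an unknown location'}`
  }
  // Any other non-2xx status is an error as well.
  if (res.status < 200 || res.status >= 300) {
    return `${res.url} returned HTTP ${res.status}`
  }
  // Even a 200 is suspicious if the server says it is not an image,
  // which is what happens when a client follows the redirect to an HTML page.
  const type = (res.contentType ?? '').toLowerCase()
  if (!type.startsWith('image/')) {
    return `${res.url} returned "${res.contentType ?? 'no Content-Type'}" instead of an image`
  }
  return null
}
```

Returning a reason string rather than throwing leaves the caller free to either abort with a clear message or fall back to a default logo.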
If the folks putting in the request would like to fix their problem without waiting for mwoffliner, I would suggest putting an image (even a 1x1 PNG) at that URL.
Thank you! Would specifying a custom favicon with … work around this?
Yes, I believe that would work.
IMHO the solution here is to check the image format early, at the same time as the other ZIM metadata.
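Checking the format early could be as simple as sniffing the file signature of each metadata image as soon as it is downloaded. A minimal sketch, assuming the body is available as a `Uint8Array`; `sniffImageFormat` and `assertMetadataImage` are hypothetical names, and only PNG, JPEG, GIF and WebP signatures are recognized (SVG, being text, would need a separate check).

```typescript
type ImageFormat = 'png' | 'jpeg' | 'gif' | 'webp'

// True if `buf` contains the byte sequence `sig` starting at `offset`.
function hasSignature(buf: Uint8Array, sig: number[], offset = 0): boolean {
  if (buf.length < offset + sig.length) return false
  for (let i = 0; i < sig.length; i++) {
    if (buf[offset + i] !== sig[i]) return false
  }
  return true
}

// Identify a few common raster formats by their leading "magic" bytes.
function sniffImageFormat(buf: Uint8Array): ImageFormat | null {
  if (hasSignature(buf, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png'
  if (hasSignature(buf, [0xff, 0xd8, 0xff])) return 'jpeg'
  if (hasSignature(buf, [0x47, 0x49, 0x46, 0x38])) return 'gif' // "GIF8" (GIF87a / GIF89a)
  // WebP: "RIFF" at offset 0, a 4-byte size, then "WEBP" at offset 8
  if (hasSignature(buf, [0x52, 0x49, 0x46, 0x46]) && hasSignature(buf, [0x57, 0x45, 0x42, 0x50], 8)) return 'webp'
  return null
}

// Hex dump of the first few bytes, so the error shows what was actually received.
function hexPreview(buf: Uint8Array, n = 8): string {
  const parts: string[] = []
  for (let i = 0; i < Math.min(n, buf.length); i++) {
    parts.push(('0' + buf[i].toString(16)).slice(-2))
  }
  return parts.join(' ')
}

// Fail fast, naming the URL, if a metadata image is not in a recognized format.
function assertMetadataImage(url: string, buf: Uint8Array): ImageFormat {
  const format = sniffImageFormat(buf)
  if (format === null) {
    throw new Error(`Metadata image ${url} is not a recognized image format (first bytes: ${hexPreview(buf)})`)
  }
  return format
}
```

Run against the redirected logo URL, this would fail before scraping starts, and an HTML body would show up as `3c 21 ...` ("<!") in the message.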
Zimfarm recipe: https://farm.openzim.org/recipes/encyclopediaofmath.org_en_all
Zim-request details: openzim/zim-requests#964 (comment)
Log:
It is pretty hard to tell which image was fetched and then failed to be read, which makes the fix even more delicate.
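Attaching the URL to any image-processing error would at least make the log point at the offending file. A hypothetical sketch (`annotateImageError` and `withImageUrl` are illustrative names, not mwoffliner functions) that wraps one async processing step per image:

```typescript
// Build an Error whose message names the image URL that failed,
// keeping the original reason (e.g. "Input buffer contains unsupported image format").
function annotateImageError(url: string, err: unknown): Error {
  const reason = err instanceof Error ? err.message : String(err)
  return new Error(`Failed to process image ${url}: ${reason}`)
}

// Run one async decode/resize step for a single image and,
// if it rejects, rethrow with the image URL attached.
async function withImageUrl<T>(url: string, step: () => Promise<T>): Promise<T> {
  try {
    return await step()
  } catch (err) {
    throw annotateImageError(url, err)
  }
}
```

Usage would look like `await withImageUrl(logoUrl, () => resizeLogo(buffer))`, where `logoUrl`, `buffer` and `resizeLogo` are placeholders for whatever call currently throws without context.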