Scraping dies with "Input buffer contains unsupported image format" if logo returns 301 #2028

benoit74 · 2024-05-22T13:20:04Z

Zimfarm recipe: https://farm.openzim.org/recipes/encyclopediaofmath.org_en_all

Zim-request details: openzim/zim-requests#964 (comment)

Log:

[error] [2024-05-22T12:56:44.462Z] Failed to run mwoffliner after [18s]: {
	"stack": "Error: Input buffer contains unsupported image format",
	"message": "Input buffer contains unsupported image format"
}
[error] [2024-05-22T12:56:44.462Z] 

**********

Input buffer contains unsupported image format

**********

It is pretty hard to tell which image was fetched and could not be read, which makes the fix harder to pin down.


audiodude · 2024-05-23T04:50:33Z

From googling, it looks like that's an error message from the sharp library.

Probably happening here?

mwoffliner/src/Downloader.ts, line 488 in 9cc613f:

    .buffer(await sharp(resp.data).toColorspace('srgb').toBuffer(), imageminOptions.get('webp').get(resp.headers['content-type']))

Might just be a bad URL that's getting erroneously read as image data.

audiodude · 2024-05-23T06:00:50Z

The proximate cause of the error is that the URL

https://encyclopediaofmath.org/common/spr_logo.gif

returns a 301 redirect to https://encyclopediaofmath.org/wiki/Main_Page

Presumably, this "logo" link is somewhere in the initial metadata that mwoffliner gathers about the wiki. So the bug is that mwoffliner is hardcoded to download this as image data and doesn't consider 301 to be an error status. It then crashes early on in the scraping process.
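To make that failure mode concrete, here is a minimal sketch of a check that could run before the bytes are handed to sharp. It is not mwoffliner code: `FetchedResource` and `describeImageProblem` are hypothetical names, and the sketch assumes the caller can supply the requested URL, the final URL after redirects, the status code, and the Content-Type header. It covers both cases, whether the HTTP client follows the redirect or hands back the 301 itself.

```typescript
// Hypothetical shape holding only the fields this check needs.
// It is an assumption for illustration, not a real mwoffliner or axios type.
interface FetchedResource {
  requestedUrl: string;
  finalUrl: string; // URL after any redirects were followed
  status: number;
  contentType: string | undefined;
}

// Returns a human-readable problem description, or null if the
// response plausibly carries image data.
function describeImageProblem(res: FetchedResource): string | null {
  // A redirect that was not followed (e.g. 301) or any error status.
  if (res.status < 200 || res.status >= 300) {
    return `${res.requestedUrl} returned HTTP ${res.status}`;
  }

  // Drop parameters such as "; charset=UTF-8" before comparing.
  const type = (res.contentType ?? '').split(';')[0].trim().toLowerCase();
  if (type.indexOf('image/') !== 0) {
    const via =
      res.finalUrl !== res.requestedUrl ? ` (redirected to ${res.finalUrl})` : '';
    return `${res.requestedUrl}${via} has content type "${type || 'unknown'}", expected image/*`;
  }

  return null;
}
```

With a check like this, the error would name the logo URL and the redirect target instead of only reporting "Input buffer contains unsupported image format".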

audiodude · 2024-05-23T06:01:34Z

If the folks putting in the request would like to fix their problem without waiting for mwoffliner, I would suggest putting an image (even a 1x1 PNG) at that URL.

benoit74 · 2024-05-23T12:16:52Z

Thank you!

Would specifying a custom favicon with --customZimFavicon allow us to bypass the code that tries to download the "logo"?

audiodude · 2024-05-23T14:51:18Z

Yes, I believe that would work

kelson42 · 2024-06-29T06:28:11Z

IMHO the solution here is to test the format of the image early, at the same time as the other ZIM metadata.
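One way such an early format test could look is sketched below. It checks the leading "magic" bytes of the downloaded data, independent of sharp. The helper names (`sniffImageFormat`, `checkLogoBytes`) are hypothetical, and the list of accepted formats (PNG, JPEG, GIF, WebP) is an assumption for illustration, not the set mwoffliner or sharp actually accepts.

```typescript
// Hypothetical helpers, not part of mwoffliner: identify an image by
// its leading signature bytes so a bad logo fails early with a clear message.
type ImageFormat = 'png' | 'jpeg' | 'gif' | 'webp';

// Convert an ASCII string into its byte values.
function ascii(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) out.push(s.charCodeAt(i));
  return out;
}

// True if `data` contains the signature `sig` starting at `offset`.
function hasSignature(data: Uint8Array, sig: number[], offset = 0): boolean {
  if (data.length < offset + sig.length) return false;
  for (let i = 0; i < sig.length; i++) {
    if (data[offset + i] !== sig[i]) return false;
  }
  return true;
}

// Returns the detected format, or null if no known signature matches.
function sniffImageFormat(data: Uint8Array): ImageFormat | null {
  if (hasSignature(data, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png';
  if (hasSignature(data, [0xff, 0xd8, 0xff])) return 'jpeg';
  if (hasSignature(data, ascii('GIF87a')) || hasSignature(data, ascii('GIF89a'))) return 'gif';
  // WebP is a RIFF container with "WEBP" at byte offset 8.
  if (hasSignature(data, ascii('RIFF')) && hasSignature(data, ascii('WEBP'), 8)) return 'webp';
  return null;
}

// Throws an error that names the offending URL when the bytes are not an image.
function checkLogoBytes(url: string, data: Uint8Array): ImageFormat {
  const format = sniffImageFormat(data);
  if (format === null) {
    throw new Error(`Logo at ${url} is not a recognized image (PNG, JPEG, GIF or WebP)`);
  }
  return format;
}
```

An HTML page served in place of the logo starts with bytes such as `<!DOCTYPE` or `<html`, which match none of these signatures, so the check would fail before any image processing and report which URL was at fault.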

benoit74 added the bug and question labels May 22, 2024

audiodude changed the title ~~Input buffer contains unsupported image format~~ Scraping dies with "Input buffer contains unsupported image format" if logo returns 301 May 24, 2024
