This issue was moved to a discussion.

Error when using MinerU: pymupdf.mupdf.FzErrorSyntax: code=8: Failed to decode JPX image. #4066

Closed
CocoaML opened this issue Nov 20, 2024 · 3 comments
Labels: not a bug (not a bug / user error / unable to reproduce)

Comments

CocoaML commented Nov 20, 2024

Description of the bug

MinerU Error

The PDF in question:
part_4.pdf

How can I skip the page that triggers this error and continue with the next page? Thanks.

How to reproduce the bug

log:

Traceback (most recent call last):
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/runpy.py", line 198, in _run_module_as_main
return _run_code(code, main_globals, None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/runpy.py", line 88, in _run_code
exec(code, run_globals)
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 71, in <module>
cli.main()
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 501, in main
run()
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 351, in run_file
runpy.run_path(target, run_name="__main__")
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 310, in run_path
return _run_module_code(code, init_globals, run_name, pkg_name=pkg_name, script_name=fname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 127, in _run_module_code
_run_code(code, mod_globals, init_globals, mod_name, mod_spec, pkg_name, script_name)
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.12.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 118, in _run_code
exec(code, run_globals)
File "/usr/mineru_test/mineru_project/mineru_project_python/minerU/magic_pdf_parse_main.py", line 151, in <module>
pdf_parse_main(pdf_path, output_dir="./out")
File "/usr/mineru_test/mineru_project/mineru_project_python/minerU/magic_pdf_parse_main.py", line 108, in pdf_parse_main
pipe.pipe_classify()
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/pipe/UNIPipe.py", line 25, in pipe_classify
self.pdf_type = AbsPipe.classify(self.pdf_bytes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/pipe/AbsPipe.py", line 63, in classify
pdf_meta = pdf_meta_scan(pdf_bytes)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 331, in pdf_meta_scan
image_info_per_page, junk_img_bojids = get_image_info(doc, page_width_pts, page_height_pts)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 119, in get_image_info
page_result = process_image(page, junk_img_bojids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/magic_pdf/filter/pdf_meta_scan.py", line 39, in process_image
recs = page.get_image_rects(img, transform=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/utils.py", line 879, in get_image_rects
pix = pymupdf.Pixmap(page.parent, xref) # make pixmap of the image to compute MD5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/__init__.py", line 10110, in __init__
img = mupdf.pdf_load_image(pdf, ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/mupdf.py", line 50901, in pdf_load_image
return _mupdf.pdf_load_image(doc, obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pymupdf.mupdf.FzErrorSyntax: code=8: Failed to decode JPX image

How can I skip the page that triggers this error and continue with the remaining pages? Looking forward to your reply. Thanks.

PyMuPDF version

1.24.14

Operating system

Linux

Python version

3.11

JorjMcKie added the "not a bug" label (not a bug / user error / unable to reproduce) on Nov 20, 2024
Collaborator

JorjMcKie commented Nov 20, 2024

The image error is only reported by this message, processing continues: your script is not ended by an exception.

Author

CocoaML commented Nov 20, 2024

JorjMcKie wrote: "The image error is only reported by this message, processing continues: your script is not ended by an exception."

Thank you for your feedback.

The process did stop on the faulty PDF page, and an exception was raised. The error log is:

File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/utils.py", line 879, in get_image_rects
pix = pymupdf.Pixmap(page.parent, xref) # make pixmap of the image to compute MD5
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/__init__.py", line 10110, in __init__
img = mupdf.pdf_load_image(pdf, ref)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/mineru_test/python/miniforge3/envs/mineru_project_py311/lib/python3.11/site-packages/pymupdf/mupdf.py", line 50901, in pdf_load_image
return _mupdf.pdf_load_image(doc, obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pymupdf.mupdf.FzErrorSyntax: code=8: Failed to decode JPX image

MinerU

Looking forward to further communication with you.

@JorjMcKie
Collaborator

Moving this to "Discussions", as there is no PyMuPDF/MuPDF problem.
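
For the original question of skipping a page whose image cannot be decoded, one possible approach is to isolate the per-page work in a try/except so that a failure is recorded rather than propagated. The sketch below is only an illustration under stated assumptions: `process_skipping_failures` and `image_rects_for_page` are hypothetical helper names, not part of PyMuPDF or MinerU, and catching the broad `Exception` class is a simplification. In the traceback above the failing call sits inside MinerU's `pdf_meta_scan.py`, so applying this idea there would mean changing or wrapping that code.

```python
def process_skipping_failures(items, process, errors=(Exception,)):
    """Apply `process` to each item and record failures instead of aborting.

    Returns (results, failed): results holds (index, value) pairs for items
    that succeeded, failed holds (index, error description) pairs.
    """
    results = []
    failed = []
    for index, item in enumerate(items):
        try:
            results.append((index, process(item)))
        except errors as exc:
            # Remember which item failed and why, then move on to the next one.
            failed.append((index, repr(exc)))
    return results, failed


def image_rects_for_page(page):
    """Hypothetical per-page step: collect the rectangles of every image
    on a PyMuPDF page, using the same get_image_rects call that appears
    in the traceback."""
    rects = []
    for img in page.get_images(full=True):
        rects.extend(page.get_image_rects(img, transform=True))
    return rects


# Possible usage with PyMuPDF (requires the library and the PDF file):
#   import pymupdf
#   doc = pymupdf.open("part_4.pdf")
#   results, failed = process_skipping_failures(doc, image_rects_for_page)
#   print("pages skipped:", [index for index, _ in failed])
```

Recording the failed page indices, rather than silently dropping them, makes it possible to inspect or re-render those pages later.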

pymupdf locked and limited conversation to collaborators on Nov 20, 2024
JorjMcKie converted this issue into discussion #4067 on Nov 20, 2024
