How to get the page number of each figure? #75
Please check out the example snippet in #63.
|
Thanks for your prompt reply. However, it doesn't work in my case. For example, there is a figure on page 8 of my PDF file. When I run the code below, it crops the figure for me, but with this code I have to specify the page of each figure detected in the file myself.
But when I run the code below, it fails at `figure_box = figures[0].boxes[0]` with `IndexError: list index out of range`.
I'm not sure what's wrong there. |
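A minimal sketch of a guard around the failing line, assuming `figures` is a list of objects that each carry a `boxes` list. The `SimpleNamespace` stand-ins and the `first_box` helper are hypothetical, not the library's own types or API. This does not fix the missing match; it only makes the empty case explicit instead of raising `IndexError`.

```python
from types import SimpleNamespace


def first_box(figures):
    """Return the first box of the first figure, or None if either list is empty."""
    if not figures:
        return None
    # Treat a missing or empty `boxes` attribute the same way.
    boxes = getattr(figures[0], "boxes", None) or []
    return boxes[0] if boxes else None


# Hypothetical stand-ins for the three cases that can occur.
no_figures = []
figure_without_boxes = [SimpleNamespace(boxes=[])]
figure_with_box = [SimpleNamespace(boxes=["box-0"])]

for case in (no_figures, figure_without_boxes, figure_with_box):
    box = first_box(case)
    print("no box found" if box is None else f"found {box}")
```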
Do you mind emailing the PDF file? |
Thanks @LiyingCheng95, this is definitely a bug; I'm looking into patching it! First, it seems like the figure is actually being detected correctly. For example:
So I looked into where the bug is coming from. It seems to come from this cross-layer indexing operation, which is not finding a match:
This is super weird because the boxes definitely overlap.
So I checked and it looks like there's a bug in my
I'll work on fixing this. In the meantime, you should be able to grab all the figures using |
You could use the layout parser directly to parse figures page by page. |
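A rough sketch of the page-by-page idea, assuming you already have one image per page and some detector that returns figure boxes for a single page. The `detect` callable, `FigureHit`, and `fake_detect` below are hypothetical placeholders, not part of this project or of any layout parsing library. The point is only that looping over pages yourself gives you the page number for free.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in page coordinates


@dataclass
class FigureHit:
    page: int  # 1-based page number
    box: Box


def figures_by_page(
    pages: Iterable[object],
    detect: Callable[[object], List[Box]],
) -> List[FigureHit]:
    """Run `detect` on every page and tag each returned box with its page number."""
    hits: List[FigureHit] = []
    for page_number, page_image in enumerate(pages, start=1):
        for box in detect(page_image):
            hits.append(FigureHit(page=page_number, box=box))
    return hits


def fake_detect(page_image: object) -> List[Box]:
    """Hypothetical detector: swap in a real per-page layout model here."""
    if page_image == "page-with-figure":
        return [(10.0, 20.0, 110.0, 220.0)]
    return []


pages = ["text-only", "page-with-figure", "text-only"]
hits = figures_by_page(pages, fake_detect)
for hit in hits:
    print(f"figure on page {hit.page} at {hit.box}")
```

With real page images, each `FigureHit.box` can then be used to crop that page, and `FigureHit.page` answers the page-number question directly.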
Has this problem been solved? I think I'm running into the same problem here. |
I want to crop all the figures/images/tables in one PDF. Can I get the page number of each figure in doc.figures[x]?