[A World of Propensities by Karl Popper (1997).pdf](https://github.com/user-attachments/files/17758342/A.World.of.Propensities.by.Karl.Popper.1997.pdf) #4060

Closed
brucenielson opened this issue Nov 17, 2024 · 5 comments

Comments

@brucenielson

brucenielson commented Nov 17, 2024

I got the 'repeating word' error to replicate in Google Colab. The previous bug report was closed because the problem could not be replicated, but you should now be able to reproduce it easily. See:

https://colab.research.google.com/drive/1d3BCUI5PyV928PcJwmnx_RkvWGHGJGC9?usp=sharing

Just to be sure, here is the document again that I'm using.

Drop that into the Colab and run the simple code and you can see from the results already saved that it creates the repeating "It" word in the section I previously reported. We should now be able to replicate this bug in Colab.

Originally posted by @brucenielson in #4042 (comment)

@brucenielson
Author

brucenielson commented Nov 17, 2024

Ladies and Gentlemen,

I shall begin with some personal memories and a personal
confession of faith, and only then turn to the topic of my
lecture.

It
was 54 years ago, in Prague in August 1934, that I first
...
I came to Prague with the corrected page proofs of my book,
It
Logik der Forschung. was published three months later in
Vienna, and in English 25 years later as The Logic of Scientific
Discovery. ...
Tarski and Godel arrived, independently at almost the same
It
time. was first published by Tarski in 1930, whereupon
It
Godel, of course, accepted Tarski's priority. is a theory of
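
The symptom in the excerpt above (a line containing nothing but "It", with the word missing from the sentence it belongs to) can be checked for mechanically. The following is a minimal sketch, not part of the original report: `find_stray_word_lines` is a hypothetical helper, and it assumes the extracted Markdown is already available as a plain string (for example, the text produced by the code in the Colab notebook).

```python
def find_stray_word_lines(markdown_text: str, word: str = "It") -> list[int]:
    """Return 1-based line numbers of lines that consist solely of `word`.

    Hypothetical helper for spotting the stray-"It" symptom; it only
    inspects text and makes no assumptions about how it was extracted.
    """
    return [
        number
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if line.strip() == word
    ]


# A few lines copied from the excerpt above.
sample = (
    "It\n"
    "was 54 years ago, in Prague in August 1934, that I first\n"
    "I came to Prague with the corrected page proofs of my book,\n"
    "It\n"
    "Logik der Forschung. was published three months later in\n"
)

print(find_stray_word_lines(sample))  # → [1, 4]
```

Running the same check on the full extracted text gives a quick count of affected lines, which may help when comparing results before and after a fix.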

@brucenielson
Author

I created this ticket out of the previous one, so I thought it would land in the right queue, but it looks like it may not have. This is already known to be a problem with PyMuPDF4LLM, not PyMuPDF.

@JorjMcKie
Collaborator

I have also worked on this in the meantime. Really a weird phenomenon.
You are right: the problem occurs in PyMuPDF4LLM, and there is no issue in PyMuPDF. My transfer of the problem to this repo was wrong.

So I am going to close the bug here again and open a new issue on the PyMuPDF4LLM repo.

@brucenielson
Author

Give me the link to that issue and I'll track it there.

@brucenielson
Author

Can we please delete the PDF from the issue once you have downloaded it to work with?
