Cloudflare Issue with CRHOY.com #647

Open
gabrielgq opened this issue Aug 8, 2024 · 2 comments

Comments

@gabrielgq

CRHOY:

This is a Cloudflare issue, so I don't know if this is the right place to post, but if anyone can help I'd be very thankful.

crhoy.com

Some sample URLs that I have tried:

crhoy.com/economia/estas-son-las-razones-por-las-que-sugef-recomienda-destituir-a-presidente-del-popular
crhoy.com/economia/empresarios-piden-avanzar-en-proyectos-para-mejorar-la-competitividad

The exact code I used to test these articles and this website:

import newspaper

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = newspaper.configuration.Configuration()
config.browser_user_agent = user_agent


article = newspaper.article('https://www.crhoy.com/economia/estas-son-las-razones-por-las-que-sugef-recomienda-destituir-a-presidente-del-popular/', config=config)
print(article.text)

The site is protected by Cloudflare.
I also tried more complex methods with readability and selenium, and even used 12ft.io and http://txtify.it, without success.
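
One way to confirm that the empty result comes from a Cloudflare block rather than from the parser is to look at the raw HTTP response. The following is only a rough sketch: the helper names `looks_like_cloudflare_challenge` and `diagnose` are made up for illustration, and the heuristic (a `cf-mitigated: challenge` header, or a 403/503 status with `Server: cloudflare`) is an assumption about how such responses typically look, not a guaranteed signal.

```python
def looks_like_cloudflare_challenge(status_code, headers):
    """Heuristic check (an assumption, not an official API): does this
    response look like a Cloudflare block or challenge page?"""
    # Header names are case-insensitive, so normalise before comparing.
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}
    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    challenged = lowered.get("cf-mitigated") == "challenge"
    return challenged or (served_by_cloudflare and status_code in (403, 503))


def diagnose(url, user_agent):
    """Fetch the URL with plain requests and report whether it looks blocked."""
    import requests  # imported lazily so the heuristic above has no dependencies

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    return looks_like_cloudflare_challenge(response.status_code, response.headers)
```

If `diagnose(...)` returns `True` for one of the sample URLs, the article text is most likely empty because the downloaded page is the challenge page, not the article.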

@femdias

femdias commented Aug 27, 2024

Hey Gabriel! I was having the same problem too, and then I found out that the 0.9.3 update added support for cloudscraper (see changelog). You can read the cloudscraper documentation here; it basically modifies requests to bypass Cloudflare. To use it in newspaper4k, you just have to install cloudscraper (pip install cloudscraper), as the code automatically uses it if it is installed.
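
For anyone who wants to check the setup explicitly, here is a minimal sketch. It first verifies that cloudscraper is importable, then fetches the page with cloudscraper directly and hands the HTML to newspaper for parsing. The helper names `cloudscraper_installed` and `fetch_article_text` are hypothetical, and this manual route is an alternative to the automatic behaviour described above, not a description of what newspaper4k does internally. cloudscraper does not get past every Cloudflare challenge, so this can still fail on some sites.

```python
import importlib.util


def cloudscraper_installed():
    """Return True if the cloudscraper package can be imported."""
    return importlib.util.find_spec("cloudscraper") is not None


def fetch_article_text(url):
    """Download the page with cloudscraper, then let newspaper parse the HTML."""
    # Imported lazily so the availability check above works without them.
    import cloudscraper
    import newspaper

    scraper = cloudscraper.create_scraper()  # behaves like a requests.Session
    response = scraper.get(url, timeout=30)
    response.raise_for_status()  # raises if the site still answers 403/503

    article = newspaper.Article(url)
    article.download(input_html=response.text)  # skip newspaper's own HTTP fetch
    article.parse()
    return article.text
```

Calling `cloudscraper_installed()` before `fetch_article_text(url)` makes it clear whether a failure comes from a missing package or from the site itself.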

Hope it helps!

@gabrielgq
Author

Thanks, I added cloudscraper but sadly it still doesn't work for the site I mentioned. Did the sample URLs work for you?
