The exact code I used to test this article/website:
import newspaper

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = newspaper.configuration.Configuration()
config.browser_user_agent = user_agent
article = newspaper.article('https://www.crhoy.com/economia/estas-son-las-razones-por-las-que-sugef-recomienda-destituir-a-presidente-del-popular/', config=config)
print(article.text)
Site is protected by Cloudflare
I tried more complex methods with readability and Selenium, and even used 12ft.io and http://txtify.it.
Hey Gabriel! I was having the same problem, and then I found out that the 0.9.3 update added support for cloudscraper (see changelog). You can read the documentation of the cloudscraper library here; it basically modifies requests to bypass Cloudflare. To use it in newspaper4k, you just have to install cloudscraper (pip install cloudscraper), as the code automatically uses it if it is installed.
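In case it helps, here is a minimal sketch of that workflow. It assumes newspaper4k 0.9.3 or later picks up cloudscraper on its own once it is installed, as described above. The helper names `cloudscraper_installed` and `fetch_text` are made up for this example and are not part of either library, and there is no guarantee this gets past Cloudflare for every site.

```python
import importlib.util

# URL taken from the original report.
URL = (
    "https://www.crhoy.com/economia/estas-son-las-razones-por-las-que-"
    "sugef-recomienda-destituir-a-presidente-del-popular/"
)


def cloudscraper_installed() -> bool:
    """Return True if the cloudscraper package can be found (hypothetical helper)."""
    return importlib.util.find_spec("cloudscraper") is not None


def fetch_text(url: str) -> str:
    """Download and parse an article, returning its text (hypothetical helper)."""
    # Imported lazily so the check above also works without newspaper4k installed.
    import newspaper

    # newspaper.article() downloads and parses the page in one call.
    article = newspaper.article(url)
    return article.text


# Example usage (needs network access and newspaper4k installed):
#
#   if not cloudscraper_installed():
#       print("cloudscraper not found; run: pip install cloudscraper")
#   print(fetch_text(URL)[:500])
```

If `cloudscraper_installed()` returns False, running `pip install cloudscraper` and retrying is the first thing to try before reaching for heavier tools.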
CRHOY:
This is a Cloudflare issue, so I don't know if this is the right place to post, but if anyone can help I'd be very thankful.
Some sample urls that I have tried