Installation breaking due to lxml>=5.x #630

Open
Abdullah0297445 opened this issue Apr 1, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Abdullah0297445

Describe the bug
After installing newspaper4k via pip, importing it fails with the error:

ImportError: lxml.html.clean module is now a separate project lxml_html_clean.

To Reproduce
Steps to reproduce the behavior, please post any code you used and the website you tried to parse/process:

  1. pip install newspaper4k
  2. Import the package (e.g. from newspaper import Article). The import fails with the following traceback:
    from newspaper import Article as NPArticle
  File "/usr/local/lib/python3.11/site-packages/newspaper/__init__.py", line 17, in <module>
    from .api import (
  File "/usr/local/lib/python3.11/site-packages/newspaper/api.py", line 8, in <module>
    from .article import Article
  File "/usr/local/lib/python3.11/site-packages/newspaper/article.py", line 21, in <module>
    from . import network
  File "/usr/local/lib/python3.11/site-packages/newspaper/network.py", line 15, in <module>
    from newspaper import parsers
  File "/usr/local/lib/python3.11/site-packages/newspaper/parsers.py", line 18, in <module>
    import lxml.html.clean
  File "/usr/local/lib/python3.11/site-packages/lxml/html/clean.py", line 18, in <module>
    raise ImportError(
ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
Install lxml[html_clean] or lxml_html_clean directly.
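To check whether a given environment is affected, a small diagnostic script can report the installed lxml and lxml_html_clean versions and attempt the same import that fails in newspaper/parsers.py. This is a hypothetical sketch: the helper names are illustrative and are not part of newspaper4k or lxml, and it assumes the standalone package is installed under the distribution name lxml_html_clean. It uses only importlib from the standard library.

```python
import importlib
import importlib.metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


def import_error(module_name: str) -> Optional[str]:
    """Try to import a module; return the ImportError message, or None on success."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        return str(exc)
    return None


if __name__ == "__main__":
    # Hypothetical diagnostic: mirrors the failing `import lxml.html.clean`.
    print("lxml:", installed_version("lxml"))
    print("lxml_html_clean:", installed_version("lxml_html_clean"))
    err = import_error("lxml.html.clean")
    print("lxml.html.clean:", "OK" if err is None else f"FAILS ({err})")
```

If the last line reports FAILS with the message above, the environment has an lxml that no longer bundles lxml.html.clean and is missing the separate package.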

Expected behavior
Installing via pip should produce a working package, i.e. importing newspaper should succeed.

System information

  • OS: python3.11-slim in Docker
  • Python version: 3.11
  • newspaper4k: 0.9.1
  • lxml: 5.1.0

Workaround
If you're hitting this issue, add lxml[html_clean]==5.2.0 to your requirements.txt file for now. Installing lxml_html_clean directly, as the error message suggests, should also work.
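For projects that manage many environments, the workaround can be applied mechanically. The sketch below is hypothetical (apply_workaround is an illustrative helper, not a pip or newspaper4k API): it drops any existing requirement line whose project name normalizes to lxml and appends the pin from the workaround. It assumes plain requirement lines; it does not handle line continuations or environment markers specially.

```python
import re

# Pin suggested by the workaround in this issue.
WORKAROUND_PIN = "lxml[html_clean]==5.2.0"

# Leading project name of a requirement line (PEP 508-style names).
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def _normalize(name: str) -> str:
    # PEP 503 normalization: case-insensitive; runs of -, _ and . are equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def apply_workaround(requirements: str, pin: str = WORKAROUND_PIN) -> str:
    """Drop existing lxml requirement lines and append the workaround pin."""
    kept = []
    for line in requirements.splitlines():
        match = _NAME_RE.match(line)
        # Comments, blank lines and options like "-r other.txt" don't match; keep them.
        if match and _normalize(match.group(1)) == "lxml":
            continue  # e.g. "lxml>=5" or "LXML==4.9.4" is replaced by the pin
        kept.append(line)
    kept.append(pin)
    return "\n".join(kept) + "\n"
```

Note that lxml_html_clean normalizes to lxml-html-clean, so an existing line for the standalone package is left in place.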

Quickfix
To fix the issue quickly in this repo, we could pin lxml below 5.x by editing this line in the pyproject.toml file:
https://github.com/AndyTheFactory/newspaper4k/blob/b5b20976bd320f89ffa25b8d4a7a94d190ee549a/pyproject.toml#L34C3-L34C15
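After applying the quick fix and reinstalling, a short check can confirm that the resolved lxml version actually falls below 5. This is a hypothetical sketch (the helper names are illustrative); it only compares the leading major component, which is all a "below 5.x" pin cares about. A full PEP 440 comparison would use packaging.version.Version instead.

```python
import importlib.metadata
import re
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading numeric component of a version string such as '4.9.4'."""
    match = re.match(r"\d+", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group())


def lxml_pin_ok(installed: Optional[str], max_major: int = 5) -> bool:
    """True if lxml is installed and its major version is below max_major."""
    return installed is not None and major_version(installed) < max_major


def installed_lxml() -> Optional[str]:
    """Installed lxml version, or None if lxml is not installed."""
    try:
        return importlib.metadata.version("lxml")
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_lxml()
    status = "holds" if lxml_pin_ok(version) else "is NOT in effect"
    print(f"lxml {version}: pin below 5.x {status}")
```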

Abdullah0297445 added the bug (Something isn't working) label on Apr 1, 2024
@RomanAverin

Same issue

@carter-0

I'm also experiencing this on macOS, Python 3.9. Patching the pyproject.toml gets it working for now.

@Didou09

Didou09 commented Jul 28, 2024

Same issue here too.

Projects
None yet
Development

No branches or pull requests

4 participants