Describe the bug
After installing newspaper4k via pip, importing it fails with the following error:
ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
To Reproduce
Steps to reproduce the behavior, please post any code you used and the website you tried to parse/process:
pip install newspaper4k
See the following traceback:
[stderr] from newspaper import Article as NPArticle
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/__init__.py", line 17, in <module>
[stderr] from .api import (
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/api.py", line 8, in <module>
[stderr] from .article import Article
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/article.py", line 21, in <module>
[stderr] from . import network
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/network.py", line 15, in <module>
[stderr] from newspaper import parsers
[stderr] File "/usr/local/lib/python3.11/site-packages/newspaper/parsers.py", line 18, in <module>
[stderr] import lxml.html.clean
[stderr] File "/usr/local/lib/python3.11/site-packages/lxml/html/clean.py", line 18, in <module>
[stderr] raise ImportError(
[stderr] ImportError: lxml.html.clean module is now a separate project lxml_html_clean.
[stderr] Install lxml[html_clean] or lxml_html_clean directly.
Expected behavior
Installation via pip should've worked.
System information
OS: python3.11-slim in Docker
Python version [3.11]
newspaper4k [0.9.1]
lxml [5.1.0]
Workaround
If you are hitting this issue, for now just add lxml[html_clean]==5.2.0 to your requirements.txt file.
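To confirm that the workaround took effect in a given environment, a quick check along these lines may help. This is only a minimal sketch: the helper name `html_clean_available` is made up for illustration, and it just tries the same import that fails in the traceback above.

```python
import importlib


def html_clean_available() -> bool:
    """Return True if lxml.html.clean can be imported in this environment.

    Hypothetical helper for illustration; it is not part of newspaper4k or lxml.
    """
    try:
        # Same module that newspaper/parsers.py imports in the traceback.
        importlib.import_module("lxml.html.clean")
    except ImportError:
        # Raised when lxml is missing, or when lxml is installed
        # without the separate lxml_html_clean project.
        return False
    return True


if __name__ == "__main__":
    print("lxml.html.clean importable:", html_clean_available())
```

If this prints `False` after reinstalling, the html_clean extra was probably not picked up in that environment (for example, in a cached Docker layer).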
Quickfix
To quickly fix the issue in this repo, for now we can edit this line in the pyproject.toml file and pin lxml to a version below 5.x:
https://github.com/AndyTheFactory/newspaper4k/blob/b5b20976bd320f89ffa25b8d4a7a94d190ee549a/pyproject.toml#L34C3-L34C15