Remove lxml in favor of bs4 #2353

Closed
jsangmeister opened this issue Apr 4, 2024 · 0 comments · Fixed by #2354
@jsangmeister
Contributor

The breaking changes in lxml 5.2 (seen in #2351) made me realize that we currently use two tools (lxml and bs4) for the same purpose: extracting plain text from HTML content. We should consolidate these rather than keep two tools without a good reason, and I don't see one in this case. Since bs4 seems to be the more versatile, forgiving, and easier-to-use parser, I would opt to remove lxml.
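
For illustration, here is a minimal sketch of what bs4-only plain-text extraction could look like. The helper name `html_to_plain_text` is hypothetical and not taken from this codebase, and the choice to drop `script`/`style` elements and collapse whitespace is an assumption about the desired output. It uses the stdlib `html.parser` backend so that lxml is not needed even as a bs4 parser backend.

```python
from bs4 import BeautifulSoup


def html_to_plain_text(html: str) -> str:
    """Return the readable text of an HTML fragment (hypothetical helper)."""
    # "html.parser" is Python's built-in backend, so lxml is not required.
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: script and style contents should not appear in the output.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(soup.get_text(separator=" ").split())
```

Usage would be something like `html_to_plain_text("<p>Hello <b>world</b></p>")`. Malformed input such as an unclosed tag should still yield text, which is the "forgiving" behavior mentioned above.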
