Add Taipei Times
#639
# Conflicts: # src/fundus/publishers/__init__.py
Thanks for adding 👍 Our first publisher from Taiwan 🚀
    return []
else:
    selection = re.sub(r"(?i)(^by\s*|/.*)", "", author_selection[0])
    return [author.strip() for author in selection.split(" and ")]
Using `generic_author_parsing` here would be better. The generic functions normalize the output, so relying on them is best practice.
Edit: I also think `Staff reporter` can be safely removed from the output.
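To illustrate the second point, here is a minimal, self-contained sketch of dropping `Staff reporter` from the parsed authors. It reuses the regex from the diff above; `parse_authors` and `IGNORED_BYLINES` are hypothetical names for illustration only, not part of fundus, and the sketch does not reproduce what `generic_author_parsing` does internally.

```python
import re
from typing import List

# Hypothetical set of byline fragments that are roles, not author names.
IGNORED_BYLINES = {"staff reporter"}


def parse_authors(author_selection: List[str]) -> List[str]:
    """Hypothetical helper: clean a raw byline and drop role-only entries."""
    if not author_selection:
        return []
    # Same regex as in the diff: strip a leading "By" and everything
    # from the first "/" onward.
    selection = re.sub(r"(?i)(^by\s*|/.*)", "", author_selection[0])
    authors = [author.strip() for author in selection.split(" and ")]
    # Filter out empty strings and role-only entries such as "Staff reporter".
    return [a for a in authors if a and a.lower() not in IGNORED_BYLINES]


print(parse_authors(["By Chen Yu-fu and Jake Chung / Staff reporter"]))
print(parse_authors(["Staff reporter"]))
```

In the actual parser, the filtering step could be applied to whatever list the generic function returns, so the normalization still comes from the shared utility.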
_paragraph_selector = XPath("//div[@class='archives']/p")
_summary_selector = XPath("//div[@class='archives']/h2")
_author_selector = XPath("//div[@class='archives']//div[@class='name']/text()")
`By Eddy Chang, Taipei Times/台北時報張聖恩` at the end of this article is extracted as well.