Add `Taipei Times` #639

addie9800 · 2024-10-15T20:37:52Z

No description provided.


MaxDall

Thanks for adding 👍 Our first publisher from Taiwan 🚀

MaxDall · 2024-10-17T18:16:00Z

src/fundus/publishers/tw/TaipeiTimes.py

+                return []
+            else:
+                selection = re.sub(r"(?i)(^by\s*|/.*)", "", author_selection[0])
+            return [author.strip() for author in selection.split(" and ")]


Prefer `generic_author_parsing` here. The generic functions normalize the output, so it is best practice to use them rather than hand-rolled parsing.

Edit: Also, I think `Staff reporter` can safely be removed from the output.
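
A rough sketch of the kind of normalization meant here, for illustration only. This is not the fundus implementation of `generic_author_parsing`; the helper name `normalize_authors`, the regexes, and the set of generic labels are all assumptions:

```python
import re
from typing import List

# Hypothetical stand-in for a shared author-normalization helper.
# NOT the fundus `generic_author_parsing` code, only an illustration.
_BYLINE_PREFIX = re.compile(r"^\s*by\s+", flags=re.IGNORECASE)
_TRAILING_SOURCE = re.compile(r"\s*/.*$")
_SEPARATOR = re.compile(r"\s*(?:,|\band\b)\s*", flags=re.IGNORECASE)
# Assumed list of role labels that are not real author names.
_GENERIC_LABELS = {"staff reporter", "staff writer"}


def normalize_authors(raw: str) -> List[str]:
    """Turn a raw byline string into a clean list of author names."""
    text = _BYLINE_PREFIX.sub("", raw)      # drop a leading "By "
    text = _TRAILING_SOURCE.sub("", text)   # drop everything after "/"
    authors = []
    for part in _SEPARATOR.split(text):     # split on "," and "and"
        name = part.strip()
        if name and name.lower() not in _GENERIC_LABELS:
            authors.append(name)
    return authors
```

Keeping this logic in one shared helper means every publisher parser gets the same splitting and filtering rules, which is the point of using the generic function instead of a per-publisher `re.sub`.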

MaxDall · 2024-10-17T18:17:20Z

src/fundus/publishers/tw/TaipeiTimes.py

+        _paragraph_selector = XPath("//div[@class='archives']/p")
+        _summary_selector = XPath("//div[@class='archives']/h2")
+        _author_selector = XPath("//div[@class='archives']//div[@class='name']/text()")


The byline `By Eddy Chang, Taipei Times／台北時報張聖恩` at the end of this article is extracted as well.
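
One possible way to filter this, sketched under the assumption that the byline ends up as the last entry in the list of extracted paragraphs. The helper `drop_trailing_byline` and its pattern are hypothetical and not part of fundus. The pattern is deliberately narrow, but it would also drop a genuine final paragraph that starts with "By" and mentions the Taipei Times:

```python
import re
from typing import List

# Assumed shape of the closing byline: "By <name>, Taipei Times..."
_BYLINE = re.compile(r"^\s*By\s+\S.*Taipei Times", flags=re.IGNORECASE)


def drop_trailing_byline(paragraphs: List[str]) -> List[str]:
    """Return the paragraphs without a closing byline, if one is present."""
    if paragraphs and _BYLINE.match(paragraphs[-1]):
        return paragraphs[:-1]
    return list(paragraphs)
```

The same check could instead be expressed as a predicate on `_paragraph_selector`, which would keep the filtering next to the selector definition.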

addie9800 added 3 commits October 15, 2024 22:37

add Taipei Times

bce90d3

Merge branch 'master' into add-taipei-times

b60cc6d

# Conflicts: # src/fundus/publishers/__init__.py

fix annotations

be0850e

MaxDall requested changes Oct 17, 2024


filter author byline

aa86d69

addie9800 requested a review from MaxDall October 22, 2024 14:28
