Add `La Vanguardia` #637

addie9800 · 2024-10-15T18:47:21Z

No description provided.


MaxDall

Thanks a lot for adding 👍

MaxDall · 2024-10-17T17:49:47Z

src/fundus/publishers/es/__init__.py

+            NewsMap("https://www.lavanguardia.com/newsml/home.xml"),
+            RSSFeed("https://www.lavanguardia.com/rss/home.xml"),
+            RSSFeed("https://www.lavanguardia.com/rss/internacional.xml"),


There also seem to be sitemaps, e.g. https://www.lavanguardia.com/sitemap-noticias-202102.xml.gz, as well as two other NewsMaps:
https://www.lavanguardia.com/sitemap-google-news.xml
https://www.lavanguardia.com/sitemap-news-agencias.xml
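The sitemap URL above looks like it encodes a year and month (`202102`). If that pattern holds for other months (an assumption based on this single example; whether a sitemap index lists them is not checked here), the archive URLs could be enumerated with a small helper like the one below. `monthly_sitemap_urls` is hypothetical and not part of Fundus; it only illustrates the naming scheme.

```python
from datetime import date
from typing import Iterator

# Assumption: every month follows the single example URL from the review,
# https://www.lavanguardia.com/sitemap-noticias-202102.xml.gz
URL_TEMPLATE = "https://www.lavanguardia.com/sitemap-noticias-{year:04d}{month:02d}.xml.gz"


def monthly_sitemap_urls(start: date, end: date) -> Iterator[str]:
    """Yield one sitemap URL per month between `start` and `end` (inclusive), newest first."""
    year, month = end.year, end.month
    while (year, month) >= (start.year, start.month):
        yield URL_TEMPLATE.format(year=year, month=month)
        # Step back one month, wrapping from January to the previous December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
```

Yielding newest first means recent articles are reached first if a crawl is cut short.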

Ah, perfect, I missed those.

MaxDall · 2024-10-17T17:59:20Z

src/fundus/publishers/es/la_vanguardia.py

+
+        @attribute
+        def authors(self) -> List[str]:
+            return generic_author_parsing(self.precomputed.ld.bf_search("author"))


The author field contains some encoding errors when parsing this article.

It turns out that whenever this ZWSP (zero-width space, U+200B) character appears, it seems to be followed by information unrelated to the author, so the character and everything after it can safely be removed.
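A minimal sketch of that cleanup, assuming the raw author strings are available before they are passed to `generic_author_parsing`: truncate each string at the first ZWSP and drop anything left empty. `clean_author` and `clean_authors` are hypothetical helpers, not necessarily how the committed fix does it.

```python
from typing import List

ZWSP = "\u200b"  # zero-width space (U+200B)


def clean_author(raw: str) -> str:
    """Keep only the text before the first ZWSP; what follows is not author information."""
    return raw.split(ZWSP, 1)[0].strip()


def clean_authors(raw_names: List[str]) -> List[str]:
    """Clean every raw author string and discard names that end up empty."""
    cleaned = (clean_author(name) for name in raw_names)
    return [name for name in cleaned if name]
```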

MaxDall · 2024-10-17T18:00:28Z

src/fundus/publishers/es/la_vanguardia.py

+        _summary_selector = XPath("//h2[@class='epigraph']")
+
+        @attribute
+        def body(self) -> Optional[ArticleBody]:


The selector seems to have trouble parsing this article

It seems to me as if there is nothing we can do about it, since the content is loaded via a script. The HTML that Fundus receives seems to consist mostly of scripts.
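To check this on a saved copy of such a page, a rough diagnostic could measure how much of the document's non-whitespace text sits inside `<script>` elements. This is a hypothetical standalone helper built on Python's `html.parser`, not part of Fundus; a value close to 1.0 would support the observation that the article body is injected client-side.

```python
from html.parser import HTMLParser


class ScriptShareParser(HTMLParser):
    """Counts non-whitespace characters inside <script> elements versus elsewhere."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.script_chars = 0
        self.other_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Count non-whitespace characters so results do not depend on how data is chunked.
        n = sum(1 for c in data if not c.isspace())
        if self._in_script:
            self.script_chars += n
        else:
            self.other_chars += n


def script_share(html: str) -> float:
    """Return the fraction of visible-text characters that live inside <script> tags."""
    parser = ScriptShareParser()
    parser.feed(html)
    parser.close()
    total = parser.script_chars + parser.other_chars
    return parser.script_chars / total if total else 0.0
```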

MaxDall · 2024-10-17T18:02:44Z

src/fundus/publishers/es/la_vanguardia.py

+        def topics(self) -> List[str]:
+            return generic_topic_parsing(self.precomputed.meta.get("Keywords"))


One could argue that the topics at the page's bottom are more descriptive. What do you think?
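For comparison with the on-page topics, the `Keywords` meta value is presumably a delimiter-separated string. The sketch below shows roughly what turning it into a topic list involves, assuming a comma delimiter; `split_keywords` is a hypothetical stand-in, not Fundus' actual `generic_topic_parsing`.

```python
from typing import List, Optional


def split_keywords(keywords: Optional[str], delimiter: str = ",") -> List[str]:
    """Split a keywords meta string into topics, dropping empties and duplicates."""
    if not keywords:
        return []
    topics: List[str] = []
    for raw in keywords.split(delimiter):
        topic = raw.strip()
        # Preserve the original order while skipping blanks and repeats.
        if topic and topic not in topics:
            topics.append(topic)
    return topics
```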

add La Vanguardia

cf3f75e

addie9800 changed the base branch from master to add-abc October 15, 2024 18:47

addie9800 added 2 commits October 16, 2024 22:40

Merge branch 'add-abc' into add-la-vanguardia

8bf55f8

# Conflicts: # src/fundus/publishers/es/__init__.py

fix annotations

a0a0f7f

MaxDall requested changes Oct 17, 2024


addie9800 added 2 commits October 22, 2024 17:30

add sitemaps

0621c82

apply suggestions

1ee5169

addie9800 requested a review from MaxDall October 22, 2024 16:55
