Issue by aleksandar-devedzic
Tue May 9 09:34:18 2023
Originally opened as codelucas/newspaper#969
I have extracted some meta tags. You can try to identify the title, text, description, and date by substituting the tags listed below into these selectors:
meta[property='{}']
meta[name='{}']
meta[itemprop='{}']
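As a rough illustration of the substitution described above, here is a minimal sketch that fills each tag name into the three selector templates. The helper name `build_selectors` is hypothetical and is not part of newspaper; the resulting strings are ordinary CSS attribute selectors that could be passed to whatever HTML library is in use.

```python
# The three selector templates quoted in this issue.
SELECTOR_TEMPLATES = (
    "meta[property='{}']",
    "meta[name='{}']",
    "meta[itemprop='{}']",
)


def build_selectors(tag_names):
    """Hypothetical helper: fill every tag name into every template.

    Selectors are grouped by tag name, so the order of tag_names
    is preserved as the lookup priority.
    """
    return [
        template.format(name)
        for name in tag_names
        for template in SELECTOR_TEMPLATES
    ]


selectors = build_selectors(["og:title", "headline"])
for selector in selectors:
    print(selector)
```

Each tag name yields three selectors because sites are not consistent about whether they use `property`, `name`, or `itemprop` for the same piece of metadata.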
Meta tags for publication and modification date:
published_date
published_time
cXenseParse:publishtime
pubdate
publish_date
PublishDate
dcterms.created
rnews:datePublished
article:published_time
prism.publicationDate
displaydate
OriginalPublicationDate
og:published_time
datePublished
article_date_original
article.published
published_time_telegram
sailthru.date
date
Date
original-publish-date
DC.date.issued
dc.date
DC.Date
parsely-pub-date
publishtime
publication_date
uploadDate
coverageEndTime
publishdate
publish-date
publishedAtDate
dcterms.date
publishedDate
creationDateTime
pub_date
updated_time
og:updated_time
datemodified
last-modified
Last-Modified
DC.date.modified
article:modified_time
modified_time
modifiedDateTime
dc.dcterms.modified
lastmod
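To show how the date tags above might be used, here is a minimal sketch built only on the Python standard library. It is not how newspaper does it; `MetaCollector`, `find_publish_date`, and the short `DATE_TAGS` subset are names made up for this example, and it assumes the date is given in ISO 8601 form, which is common but not guaranteed.

```python
from datetime import datetime
from html.parser import HTMLParser

# A small subset of the tag names listed above; extend as needed.
DATE_TAGS = ["article:published_time", "datePublished", "pubdate", "parsely-pub-date"]


class MetaCollector(HTMLParser):
    """Collect <meta> content values keyed by property/name/itemprop."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        content = attrs.get("content")
        if content is None:
            return
        for key in ("property", "name", "itemprop"):
            if attrs.get(key):
                # Keep the first value seen for each tag name.
                self.values.setdefault(attrs[key], content)


def find_publish_date(html, tag_names=DATE_TAGS):
    """Return the first parseable date among tag_names, or None."""
    collector = MetaCollector()
    collector.feed(html)
    for name in tag_names:
        raw = collector.values.get(name)
        if raw is None:
            continue
        try:
            # Assumption: ISO 8601 input; a trailing "Z" is mapped to UTC.
            return datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            continue  # Not ISO 8601; try the next candidate tag.
    return None


sample = '<head><meta property="article:published_time" content="2023-05-09T09:34:18Z"></head>'
print(find_publish_date(sample))
```

Sites that publish dates in other formats would need a more forgiving parser than `datetime.fromisoformat`.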
Meta tags for title:
dc.title
og:title
headline
articletitle
article-title
parsely-title
title
Meta tags for description:
description
og:description
Meta tags for body:
articleBody
articleText
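One possible way to combine the title, description, and body tags above is a simple first-match fallback. The sketch below assumes the meta values have already been collected into a plain dict mapping tag name to content; `FIELD_TAGS` and `pick_fields` are hypothetical names, and treating the list order as a priority order is my assumption, not something the lists themselves state.

```python
# Tag names copied from the lists above, in the order given there.
FIELD_TAGS = {
    "title": [
        "dc.title", "og:title", "headline", "articletitle",
        "article-title", "parsely-title", "title",
    ],
    "description": ["description", "og:description"],
    "body": ["articleBody", "articleText"],
}


def pick_fields(meta_values):
    """Hypothetical helper: choose one value per field.

    meta_values maps a meta tag name to its content string.
    For each field, the first tag with a non-empty value wins;
    a field with no match is set to None.
    """
    result = {}
    for field, names in FIELD_TAGS.items():
        result[field] = next(
            (meta_values[name] for name in names if meta_values.get(name)),
            None,
        )
    return result


found = pick_fields({"og:title": "Hello", "title": "Other", "description": "Short summary"})
print(found)
```

In practice the body is rarely available as a meta tag, so a `None` result for that field would normally mean falling back to parsing the page content itself.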
FYI
It would be good if you could fix, improve, or adapt the code so that it can extract full information from the websites listed below, since they are among the most popular news websites in the world.
By "full information" I mean the title, publication date, and article body.
CNN - https://edition.cnn.com/
BBC News - https://www.bbc.com/news
Reuters - https://www.reuters.com/
The New York Times - https://www.nytimes.com/
The Guardian - https://www.theguardian.com/international
Al Jazeera - https://www.aljazeera.com/
Associated Press (AP) News - https://apnews.com/
NBC News - https://www.nbcnews.com/
Fox News - https://www.foxnews.com/
USA Today - https://www.usatoday.com/
ABC News - https://abcnews.go.com/
CBS News - https://www.cbsnews.com/
The Washington Post - https://www.washingtonpost.com/
Time - https://time.com/
Forbes - https://www.forbes.com/
Bloomberg - https://www.bloomberg.com/
The Wall Street Journal - https://www.wsj.com/
The Huffington Post - https://www.huffpost.com/
The Independent - https://www.independent.co.uk/
The Sydney Morning Herald - https://www.smh.com.au/
The Economist - https://www.economist.com/
The Times of India - https://timesofindia.indiatimes.com/
The Daily Mail - https://www.dailymail.co.uk/home/index.html
The Telegraph - https://www.telegraph.co.uk/
The Sun - https://www.thesun.co.uk/
The Mirror - https://www.mirror.co.uk/
The Daily Beast - https://www.thedailybeast.com/
The Atlantic - https://www.theatlantic.com/
National Geographic - https://www.nationalgeographic.com/
Science Daily - https://www.sciencedaily.com/
The Verge - https://www.theverge.com/
Wired - https://www.wired.com/
TechCrunch - https://techcrunch.com/
Engadget - https://www.engadget.com/
Mashable - https://mashable.com/
Forbes India - https://www.forbesindia.com/
Hindustan Times - https://www.hindustantimes.com/
CNN Business - https://www.cnn.com/business
Financial Times - https://www.ft.com/
CNBC - https://www.cnbc.com/
Business Insider - https://www.businessinsider.com/
Politico - https://www.politico.eu/
The Hill - https://thehill.com/
The Washington Times - https://www.washingtontimes.com/
The Boston Globe - https://www.bostonglobe.com/
The LA Times - https://www.latimes.com/
The Chicago Tribune - https://www.chicagotribune.com/
The Globe and Mail - https://www.theglobeandmail.com/
The Toronto Star - https://www.thestar.com/