BBC-dataset-samples

A sample dataset of 1001 BBC news articles

This sample contains just over 1,000 records. The dataset was extracted using the Bright Data API.

Some of the data points that are included in the BBC dataset:

id: Unique identifier for the news article
url: The web address where the article is published
author: The name of the journalist or contributor who wrote the article
headline: The main title of the article
topics: Array of topics related to the article
publication_date: The date when the article was published
content: The full text of the article
videos: Any embedded videos related to the article
images: Any images included in the article
related_articles: Links to other articles that are relevant to the topic

The dataset includes further data points beyond these.
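As a minimal sketch of how these fields might be read from the CSV sample, the snippet below uses only the Python standard library. It assumes that the list-like columns (topics, videos, images, related_articles) are stored as JSON-encoded arrays in each cell; that encoding is an assumption, not something this README confirms, so check a few rows of the actual file first. The helper names `parse_list_field` and `read_articles` are hypothetical.

```python
import csv
import io
import json

# Assumption: these columns hold JSON-encoded arrays in the CSV export.
LIST_FIELDS = ("topics", "videos", "images", "related_articles")


def parse_list_field(raw):
    """Decode a cell into a list; empty cells become [], plain text becomes [text]."""
    if raw is None or not raw.strip():
        return []
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        # Not valid JSON: keep the raw text as a single-item list.
        return [raw]
    return value if isinstance(value, list) else [value]


def read_articles(handle):
    """Yield one dict per CSV row, decoding the list-like columns that are present."""
    for row in csv.DictReader(handle):
        for field in LIST_FIELDS:
            if field in row:
                row[field] = parse_list_field(row[field])
        yield row


# Demo on an in-memory sample with made-up values.
sample = io.StringIO(
    'id,headline,topics\n'
    'a1,Example headline,"[""Business"", ""Economy""]"\n'
)
articles = list(read_articles(sample))
print(articles[0]["headline"], articles[0]["topics"])
```

To run this against the sample file in this repository, pass an open file instead of the in-memory sample, for example `open("BBC news.csv", newline="", encoding="utf-8")`.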

This sample is a subset of the "BBC news" dataset, which includes more than 75K records.

Available dataset file formats: JSON, NDJSON, JSON Lines, CSV, or Parquet. Files can optionally be compressed as .gz.
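For the line-delimited formats (NDJSON / JSON Lines), a reader along the following lines may be enough. It is a sketch under the assumption that each line holds one JSON object and that compressed deliveries carry a .gz suffix; the function name `read_ndjson` and the demo file name are hypothetical.

```python
import gzip
import json
import os
import tempfile


def read_ndjson(path):
    """Yield one record per non-empty line; .gz files are decompressed on the fly."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


# Demo: write a tiny compressed file with made-up records, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.ndjson.gz")  # hypothetical file name
    with gzip.open(path, "wt", encoding="utf-8") as out:
        out.write(json.dumps({"id": "a1", "headline": "Example"}) + "\n")
        out.write(json.dumps({"id": "a2", "headline": "Another"}) + "\n")
    records = list(read_ndjson(path))

print([record["id"] for record in records])
```

Because the reader is a generator, it processes one line at a time and does not need to hold the whole file in memory.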

Dataset delivery type options: Email, API download, Webhook, Amazon S3, Google Cloud Storage, Google Cloud Pub/Sub, Microsoft Azure, Snowflake, SFTP.

Update frequency: once, daily, weekly, monthly, quarterly, or on a custom basis.

Data enrichment beyond the extracted data points: available on request.

Get the full BBC dataset.

What are the BBC dataset use cases?

1. Media Trend Analysis

Track media trends and analyze the evolution of news coverage over time using BBC datasets, with a focus on topic frequency and framing.
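One simple way to approach topic frequency over time is to count topics per calendar month. The sketch below assumes that each record has already been loaded as a dict, that `topics` is a list of strings, and that `publication_date` is an ISO 8601 string beginning with `YYYY-MM-DD`; the date format is an assumption about the export, and the function name `topic_counts_by_month` is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def topic_counts_by_month(articles):
    """Return {"YYYY-MM": Counter of topics} for the given article records."""
    counts = defaultdict(Counter)
    for article in articles:
        # Assumption: the first 10 characters are the date as YYYY-MM-DD.
        day = datetime.strptime(article["publication_date"][:10], "%Y-%m-%d")
        counts[day.strftime("%Y-%m")].update(article["topics"])
    return counts


# Demo with made-up records.
sample = [
    {"publication_date": "2024-01-05T10:00:00.000Z", "topics": ["Economy", "UK"]},
    {"publication_date": "2024-01-20T08:30:00.000Z", "topics": ["Economy"]},
    {"publication_date": "2024-02-02T12:15:00.000Z", "topics": ["Climate"]},
]
counts = topic_counts_by_month(sample)
for month in sorted(counts):
    print(month, counts[month].most_common(2))
```

Counting mentions only covers topic frequency; analyzing framing would need the article text in `content` and is outside this sketch.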

2. Information Integrity

Use BBC datasets to develop algorithms that detect fake news and assess the integrity of information.

3. Economic Forecasting

Integrate BBC datasets into sophisticated algorithmic trading models and economic forecasting tools. By feeding real-time news data into trading algorithms, the goal is to enable these systems to respond swiftly and effectively to market movements triggered by breaking news events, economic reports, or political developments.

Free access to web scraping tools and datasets for academic researchers and NGOs

The Bright Initiative offers access to Bright Data's Web Scraper APIs and ready-to-use datasets to leading academic faculties and researchers, NGOs and NPOs promoting various environmental and social causes. You can submit an application here.
