What Is News Data?

News data is data drawn from news sources or data about the news itself. It usually comes from reputable news agencies, though not always; the appropriate sources depend on the industry and the use case.

Where Does News Data Come From?

The main sources of this data are newspapers and news websites, typically delivered via API. Other options include open-source news datasets, trade journals, and research reports.

What Types of Columns/Attributes Should I Expect When Working with This Data?

News datasets typically list the following attributes: title, author, publisher, publication date, timestamp, and, usually, the full text. Many datasets also record the images accompanying each article and the number of comments, likes, and shares per social media site.

Many datasets also apply NLP-based machine learning programs that tag articles as true or fake.
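As a rough illustration, the attributes listed above could be modeled as a single record type. This is a minimal sketch: the class name, field names, and sample values are hypothetical and do not reflect any particular provider's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class NewsArticle:
    """One row of a news dataset (hypothetical field names)."""
    title: str
    author: str
    publisher: str
    published_at: datetime                      # publication date and timestamp
    full_text: Optional[str] = None             # usually, but not always, present
    image_urls: List[str] = field(default_factory=list)
    comment_count: int = 0
    shares_by_site: Dict[str, int] = field(default_factory=dict)
    veracity_label: Optional[str] = None        # e.g. "true" or "fake", if tagged

    def total_shares(self) -> int:
        # Sum the share counts across all social media sites.
        return sum(self.shares_by_site.values())


# Made-up sample record for illustration only.
article = NewsArticle(
    title="Example headline",
    author="A. Reporter",
    publisher="Example Daily",
    published_at=datetime(2021, 3, 1, 9, 30),
    shares_by_site={"facebook": 120, "twitter": 45},
)
```

Optional fields default to empty values, so a record without full text or a veracity tag can still be loaded.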

What Is This Data Used For?

The uses of this data are as varied as the people who collect it. Businesses, politicians, or academics may monitor trends or conduct market or industry research. They may also use the data to supplement their PR crisis management strategies.

Researchers and educators may also use the data to identify fake news via machine learning programs. Similarly, data scientists may use it to create and monitor fake news detection programs.
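Trend monitoring of the kind mentioned above often starts with a simple count of how many articles mention a topic each day. The sketch below assumes articles arrive as (title, publication date) pairs; the function name and the sample headlines are made up for illustration.

```python
from collections import Counter
from datetime import date


def daily_mentions(articles, keyword):
    """Count articles per publication date whose title mentions the keyword."""
    needle = keyword.lower()
    counts = Counter(
        published for title, published in articles if needle in title.lower()
    )
    # Return the counts in chronological order.
    return dict(sorted(counts.items()))


# Made-up sample data for illustration only.
articles = [
    ("Chip shortage hits carmakers", date(2021, 3, 1)),
    ("Local team wins final", date(2021, 3, 1)),
    ("Carmakers cut output as chip shortage deepens", date(2021, 3, 2)),
    ("Chip shortage spreads to consumer electronics", date(2021, 3, 2)),
]
trend = daily_mentions(articles, "chip shortage")
```

Here `trend` maps each date to the number of matching headlines, which can then be plotted or compared across topics.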

How Should I Test the Quality of News Data?

Depending on your purpose in collecting this data, the quality of the news itself might not be an important factor. You may, for example, only want to track the spread of a certain story. Alternatively, you may want to develop a program that can identify or even create fake news. In that case, it is very important that you be able to differentiate between accurate articles and false ones.

Luckily, advances in AI can already help with this. Some NLP programs detect fake news using classification algorithms such as the Support Vector Classifier.
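To make the idea concrete, here is a minimal sketch of a linear support vector classifier trained on bag-of-words features, written in plain Python. It is an illustration under stated assumptions and not a production detector: the function names are hypothetical, the four training headlines are made up, and a real system would normally use an established library and far more data.

```python
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokens; real systems would use richer features.
    return text.lower().split()


def train_linear_svm(texts, labels, epochs=50, lr=0.1, reg=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (fake) or -1 (real).
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            feats = Counter(tokenize(text))
            score = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
            # Regularization step: shrink every weight slightly.
            for w in weights:
                weights[w] *= 1.0 - lr * reg
            # Hinge-loss step: update only if the example is inside the margin.
            if y * score < 1.0:
                for w, c in feats.items():
                    weights[w] = weights.get(w, 0.0) + lr * y * c
                bias += lr * y
    return weights, bias


def predict(model, text):
    weights, bias = model
    score = bias + sum(weights.get(w, 0.0) for w in tokenize(text))
    return "fake" if score > 0 else "real"


# Made-up toy headlines for illustration only.
texts = [
    "shocking miracle cure doctors hate",
    "city council approves annual budget",
    "you won't believe this secret trick",
    "central bank holds interest rates steady",
]
labels = [+1, -1, +1, -1]
model = train_linear_svm(texts, labels)
```

Calling `predict(model, "miracle secret cure")` then scores an unseen headline against the learned word weights. With such a small toy set the classifier only learns surface vocabulary, which is why label quality and dataset size matter so much in practice.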

Interesting Case Studies and Blogs to Look Into

Popular News articles – A Free Public Dataset
Kaggle: News Headlines Dataset For Sarcasm Detection

Tangible Examples of Impact

The [Fake News Detection As Natural Language Inference] project classifies pairs of sentences into three categories. The first sentence is the title of an article already known to be fake news. The second sentence is the title of another article, and the task is to decide whether it agrees with the original fake news, disagrees with it, or is unrelated. The task is treated as natural language inference (NLI). Strong models, such as BERT, were incorporated during the training phase. Their results are then combined and retrained with noisy labels.
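The three-way labeling task can be sketched as follows, with a majority vote standing in for the step that combines the outputs of several models. The vote is an assumption made for illustration; the project's actual combination method is not described here, and the label strings, model names, and titles below are hypothetical.

```python
from collections import Counter

# The three possible relations between a known fake title and a candidate title.
LABELS = ("agreed", "disagreed", "unrelated")


def majority_vote(predictions):
    """Combine per-model labels for one title pair into a single label.

    Ties are broken by the order of LABELS.
    """
    counts = Counter(predictions)
    return max(LABELS, key=lambda label: counts[label])


# Made-up title pair and model outputs for illustration only.
pair = {
    "fake_title": "Miracle fruit cures all known diseases",
    "candidate_title": "Doctors confirm miracle fruit cures everything",
}
model_outputs = {"model_a": "agreed", "model_b": "agreed", "model_c": "unrelated"}
final_label = majority_vote(list(model_outputs.values()))
```

Here two of the three hypothetical models vote "agreed", so that becomes the combined label for the pair.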

Analytics India Magazine: Top ML Projects To Fight Fake News Fatigue During COVID-19

Connected Datasets

High Intensity Drugs Trafficking Area and High Intensity Financial Crimes Area


ZIGRAM’s dataset – ‘High Intensity Drugs Trafficking Area and High Intensity Financial Crimes Area’ provides News Data and that can be used in


Sanctions Connect – Largest Repository of Sanctions Worldwide


ZIGRAM’s dataset – ‘Sanctions Connect – Largest Repository of Sanctions Worldwide’ provides News Data, Economic Data and that can be used in and Supplier Risk


Picasso Podcast Data: Podcast Metadata for Apple iTunes (1.2M+ global podcasts)


Picasso’s dataset – ‘Picasso Podcast Data: Podcast Metadata for Apple iTunes (1.2M+ global podcasts)’ provides News Data and Social Data that can be used in
