What Is Public Health Data?

Public health data is health information collected about an entire population, with the aim of improving that population's overall health outcomes. Unlike some other data categories, this data is meant to be disseminated to the general public.

Where Does Public Health Data Come From?

Governments, private organizations, and academic institutions collect public health data from a variety of sources. These sources most often include anonymized hospital and healthcare data, disease registries, and insurance data. Other sources include surveys, medical studies, and law enforcement records (particularly for addiction and mental illness).

What Types of Columns/Attributes Should I Expect When Working with This Data?

Public health data varies widely between nations and between smaller geographic areas. However, it generally covers the disease or condition, comorbidities, severity, sources of the disease, historical trends, and demographics.

What Is Public Health Data Used For?

As noted above, this data is used to track and improve the health of a population. Its primary users are healthcare workers and public officials, who use it to monitor public health, identify trends and the sources of disease outbreaks, and craft responses to those outbreaks.

These responses include public education tailored to the demographics most affected by an outbreak. For instance, an immigrant community with limited English and a historical distrust of doctors requires different strategies than a campaign aimed at native-born citizens with no particular mistrust of medical professionals.

Insurers also use this data to screen new clients or to identify existing clients with high risk factors for disease. The companies can then charge higher premiums or offer incentive plans that reward clients for lowering their risk factors.

How Should I Test the Quality of This Data?

The wide range of sources, down to individual administrators in individual clinics, increases the likelihood of annotation errors. This is particularly true for the most at-risk populations, because they are the least likely to correct mistakes in their paperwork (when complete paperwork exists at all).

Additionally, analyzing data on patients with comorbidities comes with its own challenges: which disease is primary? Should a government address a secondary disease before addressing a primary disease?

In essence, the data scientist must simply collect as much data as possible, keep it as clean and up-to-date as possible, and accept that this data category has a high likelihood of inaccuracy.
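
The checks described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard method: the CaseRecord fields, the one-year staleness threshold, and the function names quality_report and count_condition are all hypothetical, since real datasets differ by jurisdiction and source. The sketch counts duplicate IDs and missing or stale annotations, and shows how counting a condition only as a primary diagnosis, versus anywhere in a record, changes the total.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class CaseRecord:
    # Hypothetical record layout; real public health datasets vary by source.
    record_id: str
    region: str
    primary_condition: Optional[str]
    comorbidities: List[str] = field(default_factory=list)
    reported_on: Optional[date] = None


def quality_report(records, as_of, max_age_days=365):
    """Count common annotation problems in a batch of records.

    max_age_days is an assumed staleness threshold, not a standard value.
    """
    report = {"duplicate_id": 0, "missing_condition": 0, "missing_date": 0, "stale": 0}
    seen_ids = set()
    for r in records:
        if r.record_id in seen_ids:
            report["duplicate_id"] += 1
        seen_ids.add(r.record_id)
        if not r.primary_condition:
            report["missing_condition"] += 1
        if r.reported_on is None:
            report["missing_date"] += 1
        elif (as_of - r.reported_on).days > max_age_days:
            report["stale"] += 1
    return report


def count_condition(records, condition):
    """Return (records where condition is primary, records mentioning it at all)."""
    primary = sum(1 for r in records if r.primary_condition == condition)
    anywhere = sum(
        1 for r in records
        if r.primary_condition == condition or condition in r.comorbidities
    )
    return primary, anywhere


if __name__ == "__main__":
    records = [
        CaseRecord("a1", "North", "diabetes", ["hypertension"], date(2024, 1, 10)),
        CaseRecord("a2", "North", "hypertension", ["diabetes"], date(2022, 1, 1)),
        CaseRecord("a2", "South", None),  # duplicate ID, no condition, no date
    ]
    print(quality_report(records, as_of=date(2024, 6, 1)))
    # {'duplicate_id': 1, 'missing_condition': 1, 'missing_date': 1, 'stale': 1}
    print(count_condition(records, "diabetes"))
    # (1, 2)
```

In the sample, diabetes is the primary condition in only one record but appears in two, which is the comorbidity ambiguity in miniature: the count depends on which rule the analyst chooses, and that choice should be documented alongside the results.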

Interesting Case Studies and Blogs to Look Into

EHR Intelligence: 6 Use Cases for EHR Data Utilization in Public, Community Health

Tangible Examples of Impact

Public Health said in an advisory Thursday that it had temporarily removed Alberta’s COVID-19 testing data from its national statistics. It said there were problems with the province’s testing figures that overestimated the percentage of tests that came back positive, although it did not elaborate.

The Globe and Mail Canada: ‘Discrepancies’ with Alberta testing data inflated national positivity rate

Relevant datasets

UNICEF Data

by UNICEF

UNICEF Data tracks and presents data on child safety and development worldwide. Their Child Statistics include information on topics such as child health, nutrition, education, and protection.

Google Dataset Search

by Google

Google Dataset Search helps researchers, data analysts, journalists, and the general public find datasets of all kinds, including continuously updated ones. It aims to enable the free and open discovery of the world's data and metadata.

The platform also offers a Dataset Developer Page to help publishers add structured data describing their datasets and troubleshoot problems with that markup.
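
Google's developer documentation describes marking up dataset pages with schema.org Dataset structured data, typically as JSON-LD embedded in the page, and lists name and description as required properties. The sketch below builds such markup in Python. It is illustrative only: the dataset, its example.org URLs, and the helper names dataset_jsonld and as_script_tag are hypothetical, and the properties shown are a small subset of what schema.org defines.

```python
import json


def dataset_jsonld(name, description, url, csv_url, license_url, keywords):
    """Build schema.org Dataset markup as a JSON-LD string (hypothetical helper)."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,                # required by Google's Dataset guidelines
        "description": description,  # required by Google's Dataset guidelines
        "url": url,
        "keywords": keywords,
        "license": license_url,
        "distribution": [
            {
                # One downloadable file for the dataset.
                "@type": "DataDownload",
                "encodingFormat": "CSV",
                "contentUrl": csv_url,
            }
        ],
    }
    return json.dumps(doc, indent=2)


def as_script_tag(jsonld):
    """Wrap JSON-LD in the script element a web page would embed."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


if __name__ == "__main__":
    # All values below are placeholders for a hypothetical surveillance dataset.
    markup = dataset_jsonld(
        name="Weekly influenza-like illness visits by region",
        description="Hypothetical weekly counts of outpatient visits for influenza-like illness, by region.",
        url="https://example.org/datasets/ili-weekly",
        csv_url="https://example.org/datasets/ili-weekly.csv",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        keywords=["public health", "influenza", "surveillance"],
    )
    print(as_script_tag(markup))
```

Generating the markup from code rather than writing it by hand keeps the metadata in step with the dataset it describes; the output can be checked with Google's structured data testing tools before publishing.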

Alation Data Platform

by Alation

The Alation Data Platform uses a client organization's own internal data to help it manage its workflows. Its data catalog, governance, and discovery services draw on IT, machine learning, and industry data. Industries that Alation focuses on include healthcare, finance, insurance, manufacturing, and retail.

Similar Data Providers

  • The Arabesque Group
    Established in 2013, the Arabesque Group is a leading global financial technology company that combines AI with environmental, social and governance (ESG) data to assess the performance and sustainability of corporations worldwide. In addition to its asset management consulting service, the group offers datasets from Arabesque S-Ray GmbH and Arabesque AI Ltd.
  • Black Box Intelligence Consumer Intelligence
    Black Box Intelligence Consumer Intelligence is designed to provide detailed analysis on individual competitor sales and performance data.
  • Home by Vendigi
    Home by Vendigi provides audience data on home buyers, remodelers, and sellers. Its data comes from first-party sources such as leading multiple listing services (MLSs) and major brokers, including RE/MAX, Coldwell Banker, Century 21, and Sotheby's. Users of Vendigi's Home data range from home and garden retailers to insurance institutions to telecom companies.