What Is Public Health Data?

Public health data is healthcare information about an entire population, collected with the aim of improving overall health outcomes. Unlike some other data categories, this data must be shared with the general population.

Where Does Public Health Data Come From?

Governments, private organizations, and academic institutions collect public health data from a variety of sources. These sources most often include anonymized hospital and healthcare data, disease registries, and insurance data. Other sources include surveys, medical studies, and law enforcement records (particularly for addiction and mental illness).

What Types of Columns/Attributes Should I Expect When Working with This Data?

Public health data is varied and unique to nations and smaller geographical areas. However, the data generally covers disease or condition, comorbidities, severity, sources of disease, historical trends, and demographics.
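As a rough illustration of the attributes listed above, a single anonymized record could be modeled as follows. This is only a sketch: the class name, every field name, and the sample values are hypothetical, and real datasets will differ by country and provider.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PublicHealthRecord:
    """One anonymized row. All field names here are hypothetical."""
    condition: str                          # disease or condition
    severity: str                           # e.g. "mild", "moderate", "severe"
    region: str                             # nation or smaller geographical area
    report_year: int                        # supports historical trend analysis
    age_group: str                          # demographic attribute
    sex: Optional[str] = None               # demographic attribute, may be missing
    suspected_source: Optional[str] = None  # source of disease, if known
    comorbidities: List[str] = field(default_factory=list)


# Made-up sample values, for illustration only.
example = PublicHealthRecord(
    condition="type 2 diabetes",
    severity="moderate",
    region="Region A",
    report_year=2020,
    age_group="45-64",
    comorbidities=["hypertension"],
)
```

Optional fields default to `None` or an empty list because, in practice, demographic and source information is often incomplete.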

What Is Public Health Data Used For?

As noted above, this data is used to track and improve the health of a population. Healthcare workers and public officials are its primary users. They track public health, identify trends and sources of disease outbreaks, and craft responses to them.

These responses include public education tailored to the demographics suffering most from the outbreak. For example, an immigrant community with limited English and a historic distrust of doctors requires a different strategy than native-born citizens with no particular mistrust of medical professionals.

Insurers also use this data to screen new clients or identify existing clients who have high risk factors for disease. They can then charge higher premiums or offer incentivized plans for lowering risk factors.

How Should I Test the Quality of This Data?

The wide range of sources, down to individual administrators in individual clinics, increases the likelihood of errors in annotation. This is particularly true for the most at-risk populations, as they are the least likely to correct mistakes in their paperwork (when complete paperwork exists).

Additionally, analyzing data on patients with comorbidities comes with its own challenges: which disease is primary? Should a government address a secondary disease before addressing a primary disease?

In essence, the data scientist must collect as much data as possible, keep it as clean and up to date as possible, and accept that this data category has a high likelihood of inaccuracy.
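A few basic checks can at least make the scale of the problem visible. The sketch below counts records with missing required fields, duplicate rows, and implausible years. The field names (`record_id`, `condition`, `region`, `report_year`), the year bounds, and the sample rows are all assumptions chosen for illustration, not part of any particular dataset.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical required fields; adjust to the dataset at hand.
REQUIRED_FIELDS = ("record_id", "condition", "region", "report_year")


def quality_report(records: List[Dict[str, Any]], max_year: int) -> Dict[str, int]:
    """Count common data problems. This only reports them; it fixes nothing."""
    # Records where any required field is absent, None, or an empty string.
    missing = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in REQUIRED_FIELDS)
    )
    # Rows sharing the same (record_id, condition, report_year) key.
    keys = Counter(
        (r.get("record_id"), r.get("condition"), r.get("report_year"))
        for r in records
    )
    duplicates = sum(n - 1 for n in keys.values() if n > 1)
    # Years outside an assumed plausible range of 1900..max_year.
    bad_year = sum(
        1 for r in records
        if isinstance(r.get("report_year"), int)
        and not (1900 <= r["report_year"] <= max_year)
    )
    return {
        "total": len(records),
        "missing_required": missing,
        "duplicate_rows": duplicates,
        "implausible_year": bad_year,
    }


# Made-up sample rows, for illustration only.
records = [
    {"record_id": "a1", "condition": "asthma", "region": "Region A", "report_year": 2020},
    {"record_id": "a1", "condition": "asthma", "region": "Region A", "report_year": 2020},
    {"record_id": "b2", "condition": "", "region": "Region B", "report_year": 2019},
    {"record_id": "c3", "condition": "influenza", "region": "Region A", "report_year": 2091},
]
report = quality_report(records, max_year=2024)
```

Checks like these flag suspect rows for review. They cannot settle judgment calls such as which of several comorbid conditions should be treated as primary.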

Interesting Case Studies and Blogs to Look Into

EHR Intelligence: 6 Use Cases for EHR Data Utilization in Public, Community Health
Health Data

Tangible Examples of Impact

Public Health said in an advisory Thursday that it had temporarily removed Alberta’s COVID-19 testing data from its national statistics. It said there were problems with the province’s testing figures that overestimated the percentage of tests that came back positive, although it did not elaborate.

The Globe and Mail Canada: ‘Discrepancies’ with Alberta testing data inflated national positivity rate

Relevant datasets

Airports Council International Health Measures Portal


Airports Council International Health Measures Portal is a mobile app and API that informs health measures in airports.

Advera Health Analytics Evidex in Advanced Data Analytics

by advera-health-analytics

Advera Health Analytics Advanced Data Analytics provides better strategic decisions and insights.

Advera Health Analytics Evidex in Drug Safety Data & Service

by advera-health-analytics

Advera Health Analytics Drug Safety Data & Services provides drug safety analytics.
