Public health data is healthcare information about an entire population, collected with the aim of improving overall health outcomes. Unlike some other data categories, this data must be disseminated to the general population.
Governments, private organizations, and academic institutions collect public health data from a variety of sources. These sources most often include anonymized hospital and healthcare data, disease registries, and insurance data. Other sources include surveys, medical studies, and law enforcement (particularly for addiction and mental illnesses).
Public health data is varied and unique to nations and smaller geographical areas. However, the data generally covers disease or condition, comorbidities, severity, sources of disease, historical trends, and demographics.
As noted above, this data is used to track and improve the health of a population. Healthcare workers and public officials are the primary users of this data. They track public health, identify trends and sources of disease outbreaks, and craft responses to them.
These responses include public education tailored to the demographics suffering most from the outbreak. For instance, an immigrant community with limited English proficiency and a historic distrust of doctors requires different strategies than a campaign focused on native-born citizens with no particular mistrust of medical professionals.
Insurers also use this data to screen new clients or identify existing clients who have high risk factors for disease. These companies can then charge higher premiums or offer plans with incentives for lowering risk factors.
The wide range of sources, down to individual administrators in individual clinics, increases the likelihood of errors in annotation. This is particularly true for the most at-risk populations, as they are the least likely to correct mistakes in their paperwork (when complete paperwork exists).
Additionally, analyzing data on patients with comorbidities comes with its own challenges: which disease is primary? Should a government address a secondary disease before addressing a primary disease?
In essence, the data scientist must simply collect as much data as possible, keep it as clean and up-to-date as possible, and accept that this data category has a high likelihood of inaccuracy.
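As a rough illustration of what keeping the data clean and up-to-date might look like in practice, the sketch below flags records with missing, implausible, or stale fields. The record layout, the field names (condition, age, region, last_updated), and the thresholds are all assumptions made for this example; real public health datasets vary widely by source and jurisdiction.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class HealthRecord:
    # Hypothetical fields, chosen only for illustration.
    condition: Optional[str]
    age: Optional[int]
    region: Optional[str]
    last_updated: Optional[date]


def find_issues(record: HealthRecord, today: date, max_age_days: int = 365) -> List[str]:
    """Return a list of data-quality problems found in one record."""
    issues = []
    if not record.condition:
        issues.append("missing condition")
    if record.age is None:
        issues.append("missing age")
    elif not 0 <= record.age <= 120:  # assumed plausibility range
        issues.append("implausible age")
    if not record.region:
        issues.append("missing region")
    if record.last_updated is None:
        issues.append("missing update date")
    elif (today - record.last_updated).days > max_age_days:
        issues.append("stale record")
    return issues
```

Flagging problems instead of silently dropping records keeps incomplete entries visible, which matters when the least complete paperwork belongs to the most at-risk populations.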
Public Health said in an advisory Thursday that it had temporarily removed Alberta’s COVID-19 testing data from its national statistics. It said there were problems with the province’s testing figures that overestimated the percentage of tests that came back positive, although it did not elaborate.