Public health data is health information about an entire population, collected with the aim of improving overall health outcomes. Unlike some other data categories, it must be made available to the general public.
Governments, private organizations, and academic institutions collect public health data from a variety of sources. The most common are anonymized hospital and healthcare records, disease registries, and insurance data. Others include surveys, medical studies, and law enforcement records (particularly for addiction and mental illness).
Public health data varies widely between nations and even between smaller geographical areas. However, it generally covers the disease or condition, comorbidities, severity, sources of disease, historical trends, and demographics.
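To make those categories concrete, here is a minimal Python sketch of what a single anonymized case record might hold, plus a simple count by region and condition. The `CaseRecord` class, its field names, and the sample values are all hypothetical; real registries define their own schemas.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CaseRecord:
    """Hypothetical anonymized case record; field names are illustrative only."""
    condition: str                                           # disease or condition
    comorbidities: list[str] = field(default_factory=list)   # co-occurring conditions
    severity: str = "unknown"                                # e.g. "mild", "severe"
    suspected_source: str | None = None                      # e.g. "school outbreak"
    report_date: date = field(default_factory=date.today)    # supports historical trends
    age_group: str | None = None                             # demographic fields
    region: str | None = None


records = [
    CaseRecord("influenza", comorbidities=["asthma"], severity="severe",
               report_date=date(2021, 1, 4), age_group="65+", region="North"),
    CaseRecord("influenza", severity="mild",
               report_date=date(2021, 1, 5), age_group="18-64", region="North"),
    CaseRecord("measles", severity="mild", suspected_source="school outbreak",
               report_date=date(2021, 1, 5), age_group="0-17", region="South"),
]

# Count cases per (region, condition) pair, a first step toward spotting trends.
counts = Counter((r.region, r.condition) for r in records)
print(counts.most_common())
# [(('North', 'influenza'), 2), (('South', 'measles'), 1)]
```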
As noted above, this data is used to track and improve the health of a population. Healthcare workers and public officials are its primary users. They use it to track public health, identify trends and the sources of disease outbreaks, and craft responses to them.
These responses include public education tailored to the demographics most affected by an outbreak. For example, a campaign aimed at an immigrant community with limited English and a historic distrust of doctors requires different strategies than one aimed at native-born citizens with no particular mistrust of medical professionals.
Insurers also use this data to screen new clients or to identify existing clients with high risk factors for disease. They can then charge higher premiums or offer plans with incentives for lowering those risk factors.
The wide range of sources, down to individual administrators in individual clinics, increases the likelihood of annotation errors. This is particularly true for the most at-risk populations, who are the least likely to correct mistakes in their paperwork (when complete paperwork exists at all).
Additionally, analyzing data on patients with comorbidities comes with its own challenges: which disease is primary? Should a government address a secondary disease before addressing a primary disease?
In essence, the data scientist must collect as much data as possible, keep it as clean and up to date as possible, and accept that this data category has a high likelihood of inaccuracy.
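Keeping the data clean usually starts with a few routine passes. The Python sketch below shows three of them: normalizing inconsistent condition labels entered at different clinics, dropping a case reported twice by different sources, and flagging records too old to reflect current conditions. The `clean` function, the `LABEL_FIXES` mapping, the `patient_id` field, and the 30-day cutoff are hypothetical; deduplicating this way assumes every source uses the same pseudonymous patient identifier, which real systems often lack.

```python
from datetime import date, timedelta

# Hypothetical mapping of inconsistent labels to one canonical label.
LABEL_FIXES = {"flu": "influenza", "influenza a": "influenza", "covid": "covid-19"}


def clean(records, today, max_age_days=30):
    """Normalize labels, drop duplicate reports, and flag stale records."""
    seen = set()
    cleaned = []
    for rec in records:
        label = rec["condition"].strip().lower()
        label = LABEL_FIXES.get(label, label)
        key = (rec["patient_id"], label, rec["report_date"])
        if key in seen:
            continue  # same case reported twice, e.g. by a clinic and a hospital
        seen.add(key)
        stale = today - rec["report_date"] > timedelta(days=max_age_days)
        cleaned.append({**rec, "condition": label, "stale": stale})
    return cleaned


raw = [
    {"patient_id": "a1", "condition": "Flu ", "report_date": date(2021, 3, 1)},
    {"patient_id": "a1", "condition": "influenza", "report_date": date(2021, 3, 1)},
    {"patient_id": "b2", "condition": "COVID", "report_date": date(2021, 1, 2)},
]
result = clean(raw, today=date(2021, 3, 10))
for r in result:
    print(r["patient_id"], r["condition"], r["stale"])
# a1 influenza False
# b2 covid-19 True
```

Stale records are flagged rather than deleted, so an analyst can still use them for historical trends while excluding them from current estimates.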
Public Health said in an advisory Thursday that it had temporarily removed Alberta’s COVID-19 testing data from its national statistics. It said there were problems with the province’s testing figures that overestimated the percentage of tests that came back positive, although it did not elaborate.
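Test positivity is the percentage of tests that come back positive. The advisory does not say what went wrong in Alberta, so the Python sketch below illustrates only one generic way reporting problems can inflate the figure: if some negative results never reach the national count, the denominator shrinks and the rate rises. The `positivity_rate` function and all numbers are made up for illustration.

```python
def positivity_rate(positive: int, total: int) -> float:
    """Share of tests that came back positive, as a percentage."""
    if total <= 0:
        raise ValueError("total number of tests must be positive")
    return 100.0 * positive / total


# Hypothetical figures, for illustration only.
positives = 500
negatives = 9_500
unreported_negatives = 2_000  # assumed: negatives missing from the national count

complete = positivity_rate(positives, positives + negatives)
incomplete = positivity_rate(positives, positives + negatives - unreported_negatives)

print(f"complete reporting:   {complete:.2f}%")    # complete reporting:   5.00%
print(f"incomplete reporting: {incomplete:.2f}%")  # incomplete reporting: 6.25%
```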
UNICEF Data tracks and presents data on child safety and development worldwide. Their Child Statistics include information on topics such as child health, nutrition, education, and protection.
Google Dataset Search helps researchers, data analysts, journalists, and the general public find quality, continuously updated datasets of all kinds. It aims to enable the free and open discovery of data and metadata from across the world.
The platform also offers a Dataset Developer Page to help publishers add structured data describing their datasets and to troubleshoot other problems.
The Alation Data Platform uses a client organization’s own internal data to help it manage its workflows. Its data catalog, governance, and discovery services draw on IT, machine learning, and industry data. Industries Alation focuses on include healthcare, finance, insurance, manufacturing, and retail.