Medical claims data (also called billing claims data) is the medical billing information submitted to insurers or national health services. Because this data contains information about diagnoses and treatments, it must be anonymized to protect patient privacy.
Doctors, nurses, and administrators record this information using the necessary medical codes (for example, ICD-10, CPT, or NDC). They then send the completed information to clearinghouses. These clearinghouses check that the claims are complete and anonymize the data for patient privacy.
Workplaces also record and submit this data in the case of workplace injuries.
Insurance companies, national health services, and workplace, hospital, and clinical administrations all maintain health claims databases.
Administrators record this data using standardized medical codes, the most common being ICD-10 (International Classification of Diseases, 10th Revision). Other code sets are the NDC (National Drug Code), CPT (Current Procedural Terminology), and HCPCS (Healthcare Common Procedure Coding System).
The information that these claims record is typically divided into two parts. The first part contains primary information such as the patient’s primary diagnosis and the procedure(s) employed to treat it. Additional primary information includes the patient’s date of birth, sex, residential code, insurer, and insurance plan.
The second part includes details on the patient’s ailments, including secondary diagnoses and physician notes.
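The two-part layout described above can be sketched as a small data structure. This is only an illustrative sketch: the class and field names (PrimaryClaimInfo, SecondaryClaimInfo, Claim, and so on) are hypothetical and do not correspond to any real claims format, and the sample values are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class PrimaryClaimInfo:
    """First part of a claim: primary diagnosis, procedures, and patient details."""
    primary_diagnosis: str   # a diagnosis code, e.g. an ICD-10 code
    procedures: List[str]    # procedure codes, e.g. CPT codes
    date_of_birth: date
    sex: str
    residential_code: str
    insurer: str
    insurance_plan: str


@dataclass
class SecondaryClaimInfo:
    """Second part of a claim: secondary diagnoses and physician notes."""
    secondary_diagnoses: List[str] = field(default_factory=list)
    physician_notes: str = ""


@dataclass
class Claim:
    primary: PrimaryClaimInfo
    secondary: SecondaryClaimInfo


# Invented sample record, for illustration only.
example = Claim(
    primary=PrimaryClaimInfo(
        primary_diagnosis="E11.9",
        procedures=["99213"],
        date_of_birth=date(1970, 1, 1),
        sex="F",
        residential_code="00000",
        insurer="Example Insurer",
        insurance_plan="Example Plan",
    ),
    secondary=SecondaryClaimInfo(
        secondary_diagnoses=["I10"],
        physician_notes="Follow-up in three months.",
    ),
)
```

Keeping the two parts as separate types mirrors the split in the text and makes it easy to process the primary billing fields without touching free-text notes.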
The primary purpose of this data is to ensure that insurers properly cover the costs of patient care and medical procedures.
Secondary uses include evaluating worker or public health for intervention and screening for fraud and waste. For example, medical claims data might show that doctors in one county bill their patients for certain medical tests at a significantly higher rate than doctors in the neighboring county. This could indicate either a localized health hazard or medical fraud on the part of the doctors. Further investigation should shed light on the situation.
It is very difficult to test the accuracy and validity of data at the initial collection stage. A data scientist may never know whether an unscrupulous doctor claimed to have provided a service that was never delivered. However, fraudulent actors are most likely a small minority, and advances in machine-learning risk analysis are helping insurers and national health services detect potential issues faster and more reliably. Patients also have the right to see their medical records and can report errors to the responsible bodies.
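One simple way to screen for such a difference is to compare the share of claims that include the test in each county. The sketch below uses a standard two-proportion z-test as an illustrative choice; it is not a method named in this article, and the function name, the counts, and the 1.96 threshold are all assumptions made for the example.

```python
import math


def two_proportion_z(tests_a: int, claims_a: int, tests_b: int, claims_b: int) -> float:
    """Return the z statistic comparing the test-billing rates of two counties."""
    rate_a = tests_a / claims_a
    rate_b = tests_b / claims_b
    # Pooled rate under the assumption that both counties bill at the same rate.
    pooled = (tests_a + tests_b) / (claims_a + claims_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / claims_a + 1 / claims_b))
    return (rate_a - rate_b) / std_err


# Invented counts: claims that include the test, out of all claims, per county.
z = two_proportion_z(tests_a=300, claims_a=1000, tests_b=200, claims_b=1000)

# Assumed threshold: |z| > 1.96 corresponds to roughly a 5% two-sided significance level.
needs_review = abs(z) > 1.96
```

A large z value only signals that the rates differ more than chance would explain. It cannot tell a health hazard from fraud, which is why further investigation is still needed.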
Beyond this stage, however, the professionals in the medical claims clearinghouses work every day to make sure the data they receive is complete, consistent, and clean. Data scientists may not have much additional work to do in this area.
Stanford Medical School: HEALTH CARE CLAIMS DATA
Change Healthcare: Claims Management System Integration Case Study
After controlling for secular trends and state fixed effects with multivariate regression models, Villalobos and colleagues found a positive association between Medicare imaging utilization and the lagged number of paid malpractice claims per capita.