Predicting Out-of-Hospital Death

What we are studying

IBM Watson Health’s (formerly Truven Health Analytics) MarketScan Research Databases contain longitudinal, real-world treatment patterns for over 230 million US patients, the largest such database in the world. These data have been used for more than 60 published analyses of opioid-related health outcomes. However, the assessment of fatalities is limited without linkage to death data, since insurance claims only capture in-hospital deaths.

We will leverage our partnership with IBM Watson Health to develop new algorithms that differentiate disenrollment due to a change in insurance provider from disenrollment as a result of death, including out-of-hospital deaths. In preliminary analyses, we found that 88% of deaths in the MarketScan population under the age of 65 (Commercial Claims and Encounters [CCAE]) and 89.6% of deaths among those 65 and older (Medicare Supplemental [MDS]) occurred out-of-hospital.

Research using insurance claims data allows for a comprehensive nationwide assessment of opioid prescribing patterns and downstream effects. However, without linkage to death data, disenrollment from the database is treated as a censoring event, as patients are lost to follow-up. Our preliminary analyses show that 27.6% of individuals who “disenroll” from the Medicare Supplemental plan have a death date within 30 days of disenrollment. Death proximal to disenrollment is rare in those <65 (1.1% of disenrollment), but this is also a population in which overdose is a more frequent cause of death, making it of particular interest.

When evaluating the association between opioid use and mortality, treating disenrollment as a censoring event will result in misclassifying outcome events of interest. Under special circumstances, unbiased relative effect measures can be estimated despite this. Specifically, deaths that are recognized (in-hospital) must be correctly classified (perfect specificity), and the proportion of deaths that is not captured must be the same across exposure groups and independent of other risk factors for the outcome. If this is not the case, the estimated risk ratio will be biased (either toward or away from the null). Currently, there is no way to know the extent to which this assumption holds without analyzing linked claims-death data. With sensitivity for mortality at ~10% using only in-hospital deaths, there is a substantial risk that modest differences in the location of deaths (in- vs. out-of-hospital) due to underlying differences in the exposure groups would result in bias. Moreover, all estimates of absolute effect measures (risk difference, incidence rate difference, cumulative incidence) will be biased, and these effect measures are critical for understanding public health impact and communicating risks to patients and their healthcare providers.
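As a purely hypothetical illustration of this point (the numbers and code below are not from the study), the sketch shows that when only a fraction of deaths is captured but that fraction is the same in both exposure groups (with perfect specificity), the risk ratio is recovered while the risk difference is attenuated; when capture differs between groups, both measures are biased:

```python
# Hypothetical numbers (not study data) illustrating death ascertained with
# imperfect sensitivity but perfect specificity, as described above.

def observed_measures(true_risk_exposed, true_risk_unexposed, sens_exposed, sens_unexposed):
    """Observed risks when only a fraction (sensitivity) of deaths is captured
    and no living patients are misclassified as dead (perfect specificity)."""
    obs_exposed = sens_exposed * true_risk_exposed
    obs_unexposed = sens_unexposed * true_risk_unexposed
    return obs_exposed / obs_unexposed, obs_exposed - obs_unexposed  # (risk ratio, risk difference)

# True risks: 4% vs. 2% -> true RR = 2.0, true RD = 0.02.
# Equal ~10% sensitivity in both groups: RR is recovered, RD is badly attenuated.
print(observed_measures(0.04, 0.02, 0.10, 0.10))   # (2.0, 0.002)

# Modestly different sensitivity (e.g., more in-hospital deaths among the exposed):
# both RR and RD are now biased (here, away from the null).
print(observed_measures(0.04, 0.02, 0.12, 0.08))   # (3.0, 0.0032)
```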

There are also study designs for which mortality is not specifically an outcome of interest (e.g., addiction treatment outcomes, long-term opioid use, tampering and injection consequences such as hepatitis and HIV infection) but in which bias can nonetheless be introduced when death is mistakenly handled as a source of administrative censoring. When individuals are followed for differing amounts of follow-up time (as is often the case in analyses of administrative claims data), the date of disenrollment is typically treated as a censoring event. The methods routinely used for time-to-event data (e.g., Kaplan-Meier, Cox proportional hazards) assume that censored patients remain at risk and continue to experience outcomes at the same rate as those who remain under observation, who effectively stand in for them in the cumulative incidence. This results in estimates of cumulative incidence that are higher than they should be. Differences between the comparison groups in the proportion of disenrollment due to death would lead to bias (either upward or downward). This is a serious concern in older populations (e.g., those >65 years of age), in whom death accounts for a substantial proportion of disenrollment. At present, accurate ascertainment of mortality in commercially insured patients under the age of 65 requires linkage to either state-specific death data or the National Death Index (NDI). Given the complexity of requesting death data from all 50 states for any national study of health outcomes, National Death Index data are more straightforward to obtain, but they are also costly, often prohibitively so.
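A simple simulation can make this concrete. The sketch below is hypothetical code, not part of the study's methods; it compares the cumulative incidence of a non-fatal outcome when death is handled as a competing risk versus when deaths are censored and the marginal outcome risk is effectively extrapolated to them, as Kaplan-Meier-style methods do under independent censoring:

```python
# Hypothetical simulation (not study code): censoring deaths inflates the
# estimated cumulative incidence of a non-fatal outcome relative to the
# cumulative incidence that treats death as a competing risk.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
t_outcome = rng.exponential(scale=10.0, size=n)   # time to outcome of interest
t_death = rng.exponential(scale=5.0, size=n)      # time to death (competing risk)
horizon = 2.0

# Competing-risk truth: the outcome occurs by the horizon only if it precedes death.
cif_competing = np.mean((t_outcome <= horizon) & (t_outcome < t_death))

# Naive analysis: deaths are "censored", so patients who died are assumed to keep
# accruing outcomes like everyone else; with independent times, Kaplan-Meier then
# converges to the marginal risk of the outcome, computed analytically here.
cif_naive = 1 - np.exp(-horizon / 10.0)

print(f"competing-risk cumulative incidence: {cif_competing:.3f}")  # ~0.150
print(f"naive (deaths censored) estimate:    {cif_naive:.3f}")      # ~0.181, inflated
```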

Why it matters

The development and validation of an algorithm to distinguish among types of disenrollment (out-of-hospital death versus other disenrollment) addresses a major limitation in claims-based studies. These methodological advances will 1) enable a broad array of research for which mortality is an outcome of interest, 2) improve the efficiency of studies for which data on cause of death are necessary but too costly to obtain for the full study population, and 3) address a pervasive bias in studies of outcomes for which mortality serves as a competing risk. This effort will produce tools that overcome the practical limitations preventing claims-based PMR studies of overdose from linking to external mortality registries.

How we are studying it

Methods for this project are under construction. Conceptually, we will use named linkage between MarketScan and government mortality data to create an algorithm (released publicly) that differentiates death from other disenrollment using variables found in claims data. The linked MarketScan-SSA data will be used to train and internally validate the machine learning algorithm’s performance. The algorithm will then undergo external validation in a separate linked dataset, and applied analyses will quantify the bias due to misclassified deaths. This work will allow researchers to understand the impact of unobserved out-of-hospital deaths in post-marketing studies of opioid-related morbidity and mortality, as well as provide the necessary tools to enable multiple approaches to mitigating bias.
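Because the methods are still under construction, only a rough outline of the modeling step can be shown. The sketch below is a minimal, hypothetical illustration: the input file, feature names, and model choice are placeholders, not the study's actual variables or algorithm.

```python
# Minimal sketch of training a classifier that distinguishes death from other
# disenrollment. All column names and the input file are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# `linked` would be a cohort of disenrollees with an SSA-linked indicator of
# death proximal to disenrollment (hypothetical structure).
linked = pd.read_csv("linked_disenrollment_cohort.csv")
features = ["age", "days_hospitalized_90d", "hospice_claim_90d",
            "ed_visits_90d", "charlson_index", "opioid_mme_90d"]

X_train, X_test, y_train, y_test = train_test_split(
    linked[features], linked["died_within_30d"],
    test_size=0.3, random_state=42, stratify=linked["died_within_30d"])

model = GradientBoostingClassifier().fit(X_train, y_train)
pred = model.predict_proba(X_test)[:, 1]
print("internal validation AUC:", roc_auc_score(y_test, pred))
```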

How to use the results

With this algorithm, researchers will be able to select cut-off points based on predicted probability to identify which patients are most likely to have died, allowing focused and cost-effective NDI linkage to death certificate data. Algorithm results will also enable quantitative bias analyses based on stratum-specific discrimination of death (measured sensitivity and specificity of the algorithm within key subgroups) to assess the robustness of main findings and adjust both the estimate and its confidence interval appropriately. Identifying instances when disenrollment is likely a proxy for death adds significant value to studies of important health conditions and treatments, including but not limited to opioid studies. The method (but not the algorithm) described herein could serve as an example to be applied to opioid surveillance data.
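As a hedged illustration of how such a bias analysis might look, the sketch below applies a standard outcome-misclassification correction to observed death counts using hypothetical stratum-specific sensitivity and specificity; the actual quantitative bias analysis methods planned for this study may differ.

```python
# Hypothetical quantitative bias analysis: back-correct observed death counts in
# each exposure group using the algorithm's (assumed) sensitivity and specificity,
# then recompute the risk ratio. All numbers are illustrative, not study results.

def corrected_cases(observed_cases, n, sensitivity, specificity):
    """Expected true number of deaths implied by the observed (misclassified) count."""
    return (observed_cases - (1 - specificity) * n) / (sensitivity + specificity - 1)

# Observed deaths flagged by the algorithm in each group (hypothetical).
n_exposed, obs_exposed = 50_000, 600
n_unexposed, obs_unexposed = 50_000, 350

# Hypothetical validation results within this stratum.
se_exp, sp_exp = 0.85, 0.995
se_unexp, sp_unexp = 0.80, 0.995

true_exposed = corrected_cases(obs_exposed, n_exposed, se_exp, sp_exp)
true_unexposed = corrected_cases(obs_unexposed, n_unexposed, se_unexp, sp_unexp)

rr_observed = (obs_exposed / n_exposed) / (obs_unexposed / n_unexposed)
rr_corrected = (true_exposed / n_exposed) / (true_unexposed / n_unexposed)
print(f"observed RR:  {rr_observed:.2f}")   # ~1.71
print(f"corrected RR: {rr_corrected:.2f}")  # ~3.29 under these assumed values
```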

Who is conducting and supporting the study

This study is led by Michele Jonsson Funk at the University of North Carolina Gillings School of Global Public Health, in conjunction with colleagues at IBM Watson Health. Researchers on the team include: Jessica Young, Maryalice Nocera, and Nabarun Dasgupta. Funding for this project comes from the United States Food and Drug Administration. Studies at the Opioid Data Lab are conducted by independent researchers and do not necessarily represent the views of funders or partners. We are grateful to generations of taxpayers in North Carolina for supporting public universities. We are also grateful to US taxpayers for safeguarding public health by supporting FDA and this research project. This study is being registered with the University of North Carolina Institutional Review Board.
