University of California, Berkeley
Dissertation Title: "Prediction with Systematically Missing Data: Methods for Health Plan Payment and Cancer Stage Classification"

Missing data is a common barrier in health services research and has important implications for both health plan payment policy and cancer outcomes research. This dissertation assesses two approaches for leveraging data in health plan payment risk adjustment, evaluates lung cancer stage classification algorithms, and uses the resulting classifications to estimate survival outcomes.
Chapter one evaluates the use of non-representative samples in Medicare Advantage risk adjustment. Setting per-person payments from data samples that differ nontrivially from their target populations may mischaracterize expected costs and create unintended adverse incentives. To examine this, risk adjustment formulas are estimated on a propensity-score matched sample of traditional Medicare beneficiaries who resemble Medicare Advantage enrollees. Matching improves balance on observables, but fitting the risk adjustment formulas on a matched rather than a random sample yields little difference in plan payments, suggesting that estimating risk adjustment formulas on a random sample is not a large contributor to problematic selection incentives.
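A minimal sketch of propensity-score matching and a balance check, assuming a single hypothetical covariate, a logistic propensity model fit by gradient descent, and greedy 1:1 nearest-neighbour matching without replacement within a caliper of 0.05. These choices, along with the function names `fit_propensity`, `match_nearest`, and `std_mean_diff`, are illustrative assumptions; the abstract does not state the covariates or matching specification actually used.

```python
import math
from typing import List, Sequence, Tuple


def fit_propensity(X: Sequence[Sequence[float]], treated: Sequence[int],
                   lr: float = 0.5, iters: int = 2000) -> List[float]:
    """Logistic regression fit by batch gradient descent; returns P(treated | x)."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept

    def score(xi: Sequence[float]) -> float:
        z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, treated):
            err = score(xi) - ti  # derivative of the log-loss with respect to z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return [score(xi) for xi in X]


def match_nearest(scores: Sequence[float], treated: Sequence[int],
                  caliper: float = 0.05) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbour matching on the score, without replacement."""
    available = {j for j, t in enumerate(treated) if t == 0}
    pairs: List[Tuple[int, int]] = []
    # Match the highest-score (hardest to match) treated units first.
    for i in sorted((i for i, t in enumerate(treated) if t == 1),
                    key=lambda i: scores[i], reverse=True):
        if not available:
            break
        j = min(available, key=lambda c: abs(scores[i] - scores[c]))
        if abs(scores[i] - scores[j]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


def std_mean_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference, a common covariate-balance diagnostic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt((va + vb) / 2)


# Hypothetical data: one scaled covariate; "treated" units (MA-like) skew high.
control_x = [i / 20 for i in range(21)]
treated_x = [0.6, 0.7, 0.8, 0.9, 0.95]
X = [[x] for x in treated_x + control_x]
treated = [1] * len(treated_x) + [0] * len(control_x)

scores = fit_propensity(X, treated)
pairs = match_nearest(scores, treated)
before = std_mean_diff(treated_x, control_x)
after = std_mean_diff([X[i][0] for i, _ in pairs], [X[j][0] for _, j in pairs])
print(len(pairs), round(before, 2), round(after, 2))
```

In this toy setup, the matched controls' covariate distribution mirrors that of the treated units, so the standardized mean difference shrinks after matching. A risk adjustment formula would then be estimated on the matched controls instead of a random sample.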
Chapter two proposes breaking the feedback loop between insurer actions and health plan payments by transforming the data used to set payments. Rather than calibrating payments on observed spending, payments can be calibrated on data modified to reflect a researcher’s or policymaker’s beliefs about efficient and fair levels of spending. The proposed data modification approach is demonstrated in two Medicare applications and compared with two other common methods. The comparison illustrates that the “side effects” of each approach vary by context and that data transformation is an effective tool for addressing misallocations in individual health insurance markets.
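A minimal sketch of the general idea of calibrating payments on transformed rather than observed spending. It assumes a payment formula built from mutually exclusive risk cells. With one indicator per cell and no intercept, least squares reduces to cell means, which is what `fit_cell_payments` computes. The cell names, the 20% scale-up, and all function names are hypothetical; the dissertation's actual transformations are context-specific and not detailed here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (risk-adjustment cell, annual spending in dollars); cells are hypothetical.
Record = Tuple[str, float]


def fit_cell_payments(data: List[Record]) -> Dict[str, float]:
    """Least squares on mutually exclusive cell indicators (no intercept).

    With one dummy per cell, each OLS coefficient equals mean spending in
    that cell, so the means are computed directly.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for cell, spend in data:
        totals[cell] += spend
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


def transform_spending(data: List[Record],
                       target: Callable[[str, float], float]) -> List[Record]:
    """Replace each observed spending value with a policymaker-specified target."""
    return [(cell, target(cell, spend)) for cell, spend in data]


# Hypothetical belief: mental health care is under-provided, so efficient
# spending is assumed to be 20% above observed spending in that cell.
def scale_mental_health(cell: str, spend: float) -> float:
    return spend * 1.2 if cell == "mental_health" else spend


observed = [("none", 1000.0), ("none", 3000.0),
            ("mental_health", 4000.0), ("mental_health", 6000.0)]

baseline = fit_cell_payments(observed)
modified = fit_cell_payments(transform_spending(observed, scale_mental_health))
print(baseline)
print(modified)
```

The estimation step itself is unchanged; only its input data differ. The link between insurers' observed spending choices and the payments they later receive is loosened without altering the payment model.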
Chapter three examines the use of health insurance claims data to classify lung cancer stage and compares survival outcomes based on observed and predicted stage. Oncology outcomes research has been limited by the difficulty of identifying cancer stage in claims data. This study first demonstrates the feasibility of using machine learning-based methods to classify early- versus late-stage lung cancer. The work is then extended to predicting a three-category outcome (stages I-II, stage III, and stage IV), which is more clinically relevant because survival differs across these groups. Stratifying survival on stages predicted by the machine learning-based classification algorithms approximates the separation obtained by stratifying on the observed lung cancer stages.
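One way to compare survival stratified by observed versus predicted stage is a Kaplan-Meier estimator computed within each stage group. The sketch below uses Kaplan-Meier and median survival as an illustration. The abstract does not state which survival methods were used, and the patient times, events, and stage labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as a list of (event time, S(t)) steps.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all observations that share the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First event time at which S(t) drops to 0.5 or below, if any."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def median_by_group(times: Sequence[float], events: Sequence[int],
                    groups: Sequence[str]) -> Dict[str, Optional[float]]:
    """Median survival within each stage group (observed or predicted)."""
    strata = defaultdict(lambda: ([], []))
    for t, e, g in zip(times, events, groups):
        strata[g][0].append(t)
        strata[g][1].append(e)
    return {g: median_survival(kaplan_meier(ts, es))
            for g, (ts, es) in strata.items()}


# Hypothetical cohort: survival in months, with one patient misclassified.
times = [30, 42, 55, 14, 20, 26, 4, 8, 11]
events = [0, 1, 0, 1, 1, 0, 1, 1, 1]
observed_stage = ["I-II"] * 3 + ["III"] * 3 + ["IV"] * 3
predicted_stage = ["I-II"] * 4 + ["III"] * 2 + ["IV"] * 3

print(median_by_group(times, events, observed_stage))
print(median_by_group(times, events, predicted_stage))
```

Median survival is only a crude summary of separation. A fuller comparison would overlay the stratified curves or test differences between them.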