Dissertation Title: "Essays on Health Policy Methods"
This dissertation includes 4 chapters on methods for health policy research.
In the first chapter, we discuss how many statistical models rely on assumptions that, when violated, threaten their substantive conclusions. Motivated by a desire to demonstrate robustness, researchers may test the null hypothesis of "no violation" of these assumptions. Failing to find evidence to reject this null, they may conclude that the assumption holds, especially if the point estimate of the violation is small. However, this approach inappropriately reverses the roles of Type I and Type II error. In many cases, it may miss important violations for lack of statistical power or, with adequate statistical power, detect violations that are "statistically significant" but practically trivial. With a focus on the parallel trends assumption in difference-in-differences, we reformulate model assumption tests in a non-inferiority framework and focus on ruling out violations that would meaningfully change main effect estimates. We conclude that ruling out meaningful violations of modeling assumptions requires additional statistical power, often similar in magnitude to that required by adoption of more robust models.
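The contrast between the conventional test and a test that rules out meaningful violations can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's procedure: it uses a normal approximation and two one-sided tests (TOST, the two-sided relative of a non-inferiority test), and the function names, the margin, and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def fail_to_reject_no_violation(delta_hat, se, alpha=0.05):
    """Conventional approach: test H0: delta = 0 (no violation).

    Returns True when the test fails to reject, which is often
    (mis)read as evidence that the assumption holds.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(delta_hat / se) < z_crit


def rule_out_violation(delta_hat, se, margin, alpha=0.05):
    """Reversed approach: test H0: |delta| >= margin via two one-sided tests.

    `margin` is an assumed threshold for the largest violation that would
    not meaningfully change the main effect estimate. Returns True only
    when both one-sided nulls are rejected.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    z_lower = (delta_hat + margin) / se  # against H0: delta <= -margin
    z_upper = (margin - delta_hat) / se  # against H0: delta >= +margin
    return z_lower > z_crit and z_upper > z_crit


# Hypothetical numbers: the same point estimate with a noisy and a precise fit.
noisy = (fail_to_reject_no_violation(0.5, 1.0), rule_out_violation(0.5, 1.0, margin=1.0))
precise = (fail_to_reject_no_violation(0.5, 0.1), rule_out_violation(0.5, 0.1, margin=1.0))
```

With the noisy estimate, the conventional test "passes" while the reversed test cannot rule out a violation as large as the margin; with the precise estimate, the conventional test flags a statistically significant violation that the reversed test shows to lie within the margin.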
In the second chapter, we extend this work to explore how researchers may adapt analyses to meet the parallel trends assumption. We explore 3 modeling decisions: selection of the control group, time horizon, and functional form of the trend difference between treatment and comparison groups. With both simulations and examples from the Affordable Care Act’s dependent coverage mandate, we discuss how ad hoc approaches for making decisions about these parameters may bias estimates of treatment effects and overstate the strength of the evidence in favor of a chosen model. We argue that researchers should pre-specify the processes by which they make these decisions, including how they plan to make an initial selection, evaluate the appropriateness of their choice, consider and evaluate alternatives, and perform sensitivity analyses. We also detail statistical approaches applicable to such procedures, including multiple testing methods, alpha spending functions, and conditional inference. Overall, we highlight the value of being explicit and transparent in assumptions and model selection, while still allowing for researcher flexibility.
In the third chapter, we delve more deeply into one of these researcher decisions: the length of the pre-intervention time period. Difference-in-differences (DID) and synthetic control methods (SCM) assume that if treatment and comparison groups were sufficiently similar prior to an intervention (e.g., "parallel trends"), researchers can use the comparison group to impute the treatment group's counterfactual trajectory. Previous work has characterized conditions under which these methods are unbiased or asymptotically unbiased as the number of time periods goes to infinity, and cautioned against using pre-intervention time series that are too short. However, pre-intervention time series that are too long may also introduce bias when trends change. While all empirical researchers must select a pre-intervention time series, there is a dearth of guidance regarding its ideal length. We argue that rather than focusing on parallel trends over a long time horizon, researchers should optimize prediction of the treatment group by the comparison group. Based on this criterion, we present an estimator that leverages time-series cross validation to select optimal pre-intervention period weights. We show that this estimator is asymptotically unbiased under traditional assumptions. It also minimizes absolute or mean-squared error under more flexible assumptions about the stability of the data-generating process. In practice, our approach improves performance compared to other estimators in standard, empirically calibrated simulation scenarios, even those with a relatively small number of pre-intervention time periods (e.g., a 40% decrease in median root mean-squared error compared to synthetic DID or augmented SCM). When applied to re-analyze the impact of Massachusetts health reform on mortality, our method also yields smaller treatment effects.
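The idea of choosing the pre-intervention period by out-of-sample prediction can be illustrated with a deliberately simplified sketch. It is not the estimator described above: instead of period weights, it picks a single window length by rolling-origin (time-series) cross validation, predicting each held-out treated outcome as the comparison outcome plus the average treated-minus-comparison gap over the preceding window. All names and the toy series are hypothetical.

```python
def rolling_origin_mse(treated, control, window, start):
    """Mean squared one-step-ahead error for a given window length.

    For each held-out period t >= start, the treated outcome is predicted
    from the comparison outcome plus the mean gap over the `window`
    periods immediately before t.
    """
    errors = []
    for t in range(start, len(treated)):
        gap = sum(treated[s] - control[s] for s in range(t - window, t)) / window
        prediction = control[t] + gap
        errors.append((treated[t] - prediction) ** 2)
    return sum(errors) / len(errors)


def select_window(treated, control, candidates):
    """Return the candidate window with the lowest cross-validated error."""
    start = max(candidates)  # score every candidate on the same held-out periods
    scores = {w: rolling_origin_mse(treated, control, w, start) for w in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Toy pre-intervention series: the gap is stable, then starts drifting upward.
control = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
gaps = [2, 2, 2, 2, 2, 2, 3, 4, 5, 6, 7, 8]
treated = [c + g for c, g in zip(control, gaps)]

best_window, cv_scores = select_window(treated, control, candidates=[1, 3, 5])
```

Because the trend difference changes partway through the toy series, longer windows carry stale information and predict the held-out periods worse, which is the kind of bias from overly long pre-intervention series that the paragraph describes.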
Last, the fourth chapter applies rigorous methods to the unexpected events of 2020. The emergence of COVID-19 prompted US schools to close their doors in spring 2020, disrupting the education of over 55 million K-12 students. A variety of approaches were adopted across the educational sector to maintain learning as the pandemic continued unabated in the 2020-2021 school year, including virtual instruction, alternating schedules, and surveillance testing. We developed an agent-based model to assess and compare the impact of these approaches on school-based transmission and outbreak risk. We found that elementary schools mirrored community incidence, reflecting the background circulation of COVID-19, with limited transmission events likely to unfold following a single introduction. By contrast, high schools were prone to large outbreaks without stringent mitigation efforts in place. Across age groups, we found that the risk of large outbreaks can be mitigated by investing in the PPE necessary to reduce classroom transmission potential, in surveillance testing with rapid turnaround, and in efforts across society at large to suppress community transmission and prevent the introduction of infections into school settings.