*Harvard PhD Program in Health Policy Alumnus & Faculty Member
Dissertation Title: "Evaluating Health Interventions Over Time: Empirical Tests of the Validity of the Single Interrupted Time Series Design"
Single interrupted time series (ITS) is a quasi-experimental evaluation design used frequently in the health policy literature. This manuscript investigates the validity of single ITS through two within-study comparisons (WSCs), comparing the results of a randomized controlled trial (RCT) with the results that would have been obtained had a single ITS design been employed.
In Part 1, I discuss the theory underlying both within-study comparisons and single ITS. I propose an assessment framework to determine whether a given design should be deemed "concordant" with an RCT for a given intervention. This framework aims to unify the concordance metrics used in the existing literature and accounts for both practical and statistical significance. After summarizing best practices of single ITS analysis, I propose a two-part falsification test to determine whether the trends in a particular dataset are stable enough for the single ITS design. This test draws from the literature on detecting structural breaks in time series data, as well as work on the optimal binning of data in the regression discontinuity design.
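For readers unfamiliar with the design, a single ITS estimate is commonly obtained from a segmented regression that allows the level and the slope of the outcome to change at the intervention date. The sketch below is a generic illustration on simulated data, not the dissertation's own specification or its falsification test. The function name `fit_single_its`, the model form, and all numbers are assumptions made for illustration, and the sketch omits the adjustment for serially correlated errors that a real analysis would usually need.

```python
import numpy as np


def fit_single_its(y, t0):
    """Fit a basic segmented regression by ordinary least squares.

    Illustrative model (an assumption, not the dissertation's specification):
        y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t + e_t
    where post_t = 1 for periods at or after the intervention period t0.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)
    # Columns: intercept, pre-period trend, level change, slope change.
    X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {
        "intercept": float(beta[0]),
        "pre_slope": float(beta[1]),
        "level_change": float(beta[2]),
        "slope_change": float(beta[3]),
    }


if __name__ == "__main__":
    # Simulated series: stable linear pre-trend, then a level shift of 3.0 at t0.
    rng = np.random.default_rng(0)
    t = np.arange(40)
    t0 = 24
    y = 10 + 0.5 * t + 3.0 * (t >= t0) + rng.normal(0, 0.5, size=t.size)
    print(fit_single_its(y, t0))
```

The `level_change` coefficient is the single ITS estimate of the immediate effect. It is only credible if the pre-period trend would have continued unchanged without the intervention, which is the assumption the trend-stability test described above is meant to probe.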
In Part 2, I conduct two within-study comparisons for single ITS. The first study evaluates a behavior change campaign in Uganda aimed at increasing uptake of rapid diagnostic tests for malaria. The WSC finds that the single ITS estimates are highly concordant with those of the RCT, producing almost identical results in both point estimate and standard error. This result is robust to multiple specifications. The second study evaluates the effect of the expansion of Medicaid on emergency department use in Oregon. In this case, the single ITS estimates are so discordant with the RCT as to produce statistically significant results in the wrong direction. This result is also robust to multiple specification decisions. In comparing these opposing results, I note important differences between the two datasets. The Uganda data passed the falsification test for trend stability proposed in Part 1, while the Oregon data failed. Additionally, the Oregon sample is likely subject to a manifestation of self-selection known as "Ashenfelter's dip," whereas the Uganda sample is not. This shift in outcomes just before the intervention's introduction is especially damaging to single ITS, even in comparison to traditionally "weaker" pre-post designs.
In Part 3, I generate hypotheses as to when single ITS should and should not be used. First, samples defined by self-selection are particularly problematic for single ITS analysis. Second, the advantages of relying on time trends must be weighed against the additional strong assumptions that the single ITS design carries with it. Third, trend stability in the pre-period is a crucial factor in getting reliable estimates from single ITS. Fourth, the robustness of results in both WSCs suggests that whether to evaluate a given program with single ITS is a more important decision than how to implement it.