Naive Regression vs. Factor Models: Adjusting for Multiple Cause Confounding
Authors: Justin Grimmer, Dean Knox, Brandon Stewart; 24(182):1−70, 2023.
Abstract
Factor models are widely used in fields such as genetics, networks, medicine, and politics to account for shared, unobserved confounders ($\boldsymbol{Z}$) in observational settings with multiple treatments ($\boldsymbol{A}$). Wang and Blei (2019; hereafter WB) propose the “deconfounder” method, which extends these procedures by using factor models of $\boldsymbol{A}$ to estimate “substitute confounders” ($\widehat{\boldsymbol{Z}}$). The deconfounder then estimates treatment effects by regressing the outcome ($\boldsymbol{Y}$) on a subset of $\boldsymbol{A}$ while adjusting for $\widehat{\boldsymbol{Z}}$. WB claim that the deconfounder is unbiased under certain assumptions, including the absence of single-cause confounders and the “pinpointing” of $\widehat{\boldsymbol{Z}}$. We clarify that pinpointing requires every confounder to affect infinitely many treatments. We prove that when the conditions for asymptotic unbiasedness of the deconfounder hold, a naive semiparametric regression of $\boldsymbol{Y}$ on $\boldsymbol{A}$ that ignores confounding is also asymptotically unbiased. We provide bias formulas for finite numbers of treatments and show that different deconfounders exhibit different types of bias. Replicating every deconfounder analysis for which data are available, we find that neither the naive regression nor the deconfounder consistently outperforms the other. Notably, the deconfounder produces implausible estimates in WB’s case study of movie earnings: they imply that comic author Stan Lee’s cameo appearances causally contributed \$15.5 billion, a significant portion of Marvel movie revenue. We conclude that neither approach is a viable substitute for meticulous research design in real-world applications.
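The two estimators contrasted in the abstract can be sketched on simulated data. The snippet below is a toy illustration only, not the authors' or WB's implementation. It assumes a linear outcome model, a single shared confounder, PCA as a stand-in for the factor model, and ordinary least squares in place of the semiparametric regression. All function names, parameter values, and the data-generating process are hypothetical.

```python
import numpy as np


def simulate(n=2000, m=50, seed=0):
    """Toy data: one unobserved confounder Z shared by all m treatments (assumed setup)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                           # unobserved confounder Z
    loadings = rng.normal(size=m)                    # hypothetical effect of Z on each treatment
    a = np.outer(z, loadings) + rng.normal(size=(n, m))  # treatments A
    beta = rng.normal(size=m)                        # true treatment effects
    y = a @ beta + 2.0 * z + rng.normal(size=n)      # outcome Y, confounded by Z
    return a, y, beta


def ols(x, y):
    """Least-squares slopes of y on x, with an intercept that is then dropped."""
    x1 = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return coef[1:]


def naive_regression(a, y):
    """Regress Y on all treatments A, ignoring confounding."""
    return ols(a, y)


def deconfounder_sketch(a, y, k=5):
    """Regress Y on the first k treatments plus a substitute confounder Z-hat.

    Z-hat is the first principal-component score of A, used here as a simple
    stand-in for a fitted factor model (an assumption of this sketch).
    """
    a_centered = a - a.mean(axis=0)
    u, s, _ = np.linalg.svd(a_centered, full_matrices=False)
    z_hat = u[:, 0] * s[0]
    coef = ols(np.column_stack([a[:, :k], z_hat]), y)
    return coef[:k]                                  # drop the coefficient on Z-hat


if __name__ == "__main__":
    a, y, beta = simulate()
    k = 5
    naive = naive_regression(a, y)[:k]
    deconf = deconfounder_sketch(a, y, k=k)
    print("naive mean abs. error:       ", np.mean(np.abs(naive - beta[:k])))
    print("deconfounder mean abs. error:", np.mean(np.abs(deconf - beta[:k])))
```

The sketch regresses on only a subset of treatments in the deconfounder step because a PCA score is an exact linear function of $\boldsymbol{A}$, so including all treatments alongside $\widehat{\boldsymbol{Z}}$ would make the design matrix collinear. Varying `m` and `k` gives a rough feel for how the two estimators behave with finitely many treatments, though nothing here reproduces the paper's results.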