The Blessings of Multiple Causes

Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods assume that all confounders are observed, where confounders are variables correlated with both the causal variables (e.g., the treatment) and the effect of those variables (e.g., the efficacy of the treatment). However, many scientific studies involve multiple causes, that is, different variables whose effects are simultaneously of interest. We propose the deconfounder, an algorithm that combines unsupervised machine learning and predictive model checking to perform causal inference in multiple-cause settings. The deconfounder infers a latent variable as a substitute for unobserved confounders and then uses that substitute to perform causal inference. We develop theory for when the deconfounder leads to unbiased causal estimates, and show that it requires weaker assumptions than classical causal inference. We analyze its performance in three types of studies: semi-simulated data around smoking and lung cancer, semi-simulated data around genome-wide association studies, and a real dataset about actors and movie revenue. The deconfounder provides a checkable approach to estimating close-to-truth causal effects.

This is joint work with David Blei.


Yixin Wang is a PhD candidate in the Statistics Department of Columbia University, advised by Professor David Blei. Her research interests lie in Bayesian statistics, machine learning, and causal inference. She obtained her BSc in Mathematics and Computer Science from the Hong Kong University of Science and Technology. Her research has received several awards, including the ASA Biometrics student paper award, the INFORMS data mining best paper award, and the ICSA conference young researcher award.