Model-based Causal Discovery for Zero-Inflated Count Data

Junsouk Choi, Yang Ni; 24(200):1–32, 2023.

Abstract

Zero-inflated count data are common in various scientific fields, including social science, biology, and genomics. However, few causal discovery approaches can effectively handle excessive zeros and the complex characteristics of multivariate count data, such as overdispersion. In this paper, we introduce a new model called the zero-inflated generalized hypergeometric directed acyclic graph (ZiG-DAG) model, which enables the inference of causal structure solely from observational zero-inflated count data. The ZiG-DAGs leverage a wide range of generalized hypergeometric probability distributions, allowing for flexible modeling of different types of zero-inflated count data. Furthermore, the ZiG-DAGs accommodate both linear and nonlinear causal relationships. We establish the identifiability of the causal structure for the proposed ZiG-DAGs using a general proof technique applicable beyond this particular model. We develop score-based algorithms for causal structure learning and demonstrate the superior performance of our method through extensive synthetic experiments and a real dataset with known ground truth. Finally, we showcase the practical utility of ZiG-DAGs by applying them to reverse-engineer a gene regulatory network from a single-cell RNA-sequencing dataset.
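
To make "zero inflation" and "overdispersion" concrete, the sketch below writes out the simplest zero-inflated count distribution, a zero-inflated Poisson. Using the Poisson as the count component is an assumption made here for illustration only; the function names `zip_pmf` and `zip_mean_var` and the parameters `pi` and `lam` are hypothetical and are not taken from the paper or its code. It is not the ZiG-DAG model or its structure learning algorithm.

```python
import math


def zip_pmf(k, pi, lam):
    """P(X = k) for a zero-inflated Poisson (illustrative assumption).

    With probability pi the value is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate lam.
    """
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson


def zip_mean_var(pi, lam):
    """Closed-form mean and variance of the zero-inflated Poisson."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


if __name__ == "__main__":
    pi, lam = 0.3, 2.0  # hypothetical parameter values
    mean, var = zip_mean_var(pi, lam)
    print("P(X = 0) under plain Poisson:", math.exp(-lam))
    print("P(X = 0) under zero-inflated Poisson:", zip_pmf(0, pi, lam))
    print("mean:", mean, "variance:", var)
```

For any `pi` strictly between 0 and 1, the variance exceeds the mean, so mixing in extra zeros produces overdispersion even though the underlying Poisson has equal mean and variance.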
