Regularized Joint Mixture Models
Konstantinos Perrakis, Thomas Lartigue, Frank Dondelinger, Sach Mukherjee; 24(19):1−47, 2023.
Abstract
Regularized regression models have been extensively studied and, when certain conditions are met, they provide fast and statistically interpretable results. However, large datasets in many applications are heterogeneous: there are distributional differences between latent groups within the data, so the assumption that the conditional distribution of the response variable Y given the features X is the same for all samples may not hold. Additionally, in scientific applications, the covariance structure of the features may itself contain important signals, and learning this structure is likewise influenced by the latent group structure. In this paper, we propose a class of mixture models for paired data (X,Y) that couples the distribution of X (via sparse graphical models) with the conditional distribution of Y given X (via sparse regression models). The regression and graphical models are specific to the latent groups, and the model parameters are estimated jointly. This approach allows signals in either or both of the feature distribution and the regression model to inform learning of the latent structure, and provides automatic control of confounding by that structure. Estimation is performed using an expectation-maximization algorithm, and its convergence is established theoretically. We illustrate the key ideas with empirical examples and provide an R package for implementation.
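To make the joint-modeling idea concrete, the following is a minimal sketch (not the paper's implementation) of an EM loop for a mixture in which each latent group k has its own Gaussian model for X and its own linear regression for Y given X. For simplicity this sketch uses ridge-stabilized maximum-likelihood updates and omits the sparsity penalties (graphical lasso on the precision matrices, lasso on the regression coefficients) that the paper's method employs; all function and variable names are illustrative.

```python
import numpy as np

def em_joint_mixture(X, y, K=2, n_iter=50, seed=0):
    """Toy EM for a joint mixture of p(x) and p(y | x) per latent group.

    Each group k has mixing weight pi_k, Gaussian parameters (mu_k, Sigma_k)
    for X, and regression parameters (beta_k, sigma2_k) for y given X.
    The E-step responsibilities use the JOINT density p(x) * p(y | x), so
    structure in either the features or the regression can drive grouping.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept
    resp = rng.dirichlet(np.ones(K), size=n)       # random initial responsibilities
    for _ in range(n_iter):
        # M-step: responsibility-weighted parameter estimates per group.
        pi = resp.mean(axis=0)
        mus, covs, betas, sig2s = [], [], [], []
        for k in range(K):
            w = resp[:, k]
            mu = np.average(X, axis=0, weights=w)
            Xc = X - mu
            cov = (Xc * w[:, None]).T @ Xc / w.sum() + 1e-6 * np.eye(p)
            # Weighted least squares for the group-specific regression.
            A = Xd.T @ (Xd * w[:, None]) + 1e-6 * np.eye(p + 1)
            beta = np.linalg.solve(A, Xd.T @ (w * y))
            res = y - Xd @ beta
            sig2 = (w * res**2).sum() / w.sum() + 1e-9
            mus.append(mu); covs.append(cov); betas.append(beta); sig2s.append(sig2)
        # E-step: log responsibilities from the joint density log p(x) + log p(y|x).
        logr = np.zeros((n, K))
        for k in range(K):
            Xc = X - mus[k]
            _, logdet = np.linalg.slogdet(covs[k])
            maha = np.einsum('ij,jl,il->i', Xc, np.linalg.inv(covs[k]), Xc)
            log_px = -0.5 * (logdet + maha + p * np.log(2 * np.pi))
            res = y - Xd @ betas[k]
            log_py = -0.5 * (np.log(2 * np.pi * sig2s[k]) + res**2 / sig2s[k])
            logr[:, k] = np.log(pi[k]) + log_px + log_py
        logr -= logr.max(axis=1, keepdims=True)    # stabilize before exponentiating
        resp = np.exp(logr)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp, betas
```

On simulated data with two groups that differ in both feature means and regression coefficients, the responsibilities recover the latent grouping, and each returned `beta` reflects its group's regression.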