Generalized Linear Models in Non-interactive Local Differential Privacy with Public Data
Di Wang, Lijie Hu, Huanyu Zhang, Marco Gaboardi, Jinhui Xu; 24(132):1−57, 2023.
Abstract
This paper examines the estimation of smooth Generalized Linear Models (GLMs) in the Non-interactive Local Differential Privacy (NLDP) model. In contrast to the classical setting, our model allows the server to access additional public but unlabeled data. The first part of the paper focuses on GLMs. Specifically, it considers the case where each data record is i.i.d. sampled from a zero-mean multivariate Gaussian distribution. Motivated by Stein’s lemma, the paper presents an $(\epsilon, \delta)$-NLDP algorithm for GLMs. Additionally, it provides the sample complexity of public and private data for the algorithm to achieve an $\ell_2$-norm estimation error of $\alpha$ (with high probability) as ${O}(p \alpha^{-2})$ and $ \tilde{O}(p^3\alpha^{-2}\epsilon^{-2})$ respectively, where $p$ is the dimension of the feature vector. This is a significant improvement over the previously known exponential or quasi-polynomial in $\alpha^{-1}$, or exponential in $p$ sample complexities of GLMs with no public data. The paper also considers a more general setting where each data record is i.i.d. sampled from some sub-Gaussian distribution with bounded $\ell_1$-norm. Based on a variant of Stein’s lemma, the paper proposes an $(\epsilon, \delta)$-NLDP algorithm for GLMs. In this setting, the sample complexity of public and private data to achieve an $\ell_\infty$-norm estimation error of $\alpha$ is ${O}(p^2\alpha^{-2})$ and $ \tilde{O}(p^2\alpha^{-2}\epsilon^{-2})$ respectively, under certain assumptions and if $\alpha$ is not too small, i.e., $\alpha \geq \Omega(\frac{1}{\sqrt{p}})$. The second part of the paper extends the idea to the problem of estimating non-linear regressions and demonstrates similar results as in GLMs for both multivariate Gaussian and sub-Gaussian cases. Finally, the paper presents experimental results on synthetic and real-world datasets to demonstrate the effectiveness of the proposed algorithms. To the best of our knowledge, this is the first paper that shows the existence of efficient and effective algorithms for GLMs and non-linear regressions in the NLDP model with unlabeled public data.
[abs]