Distributed Nonparametric Regression Imputation for Missing Response Problems with Large-scale Data
Ruoyu Wang, Miaomiao Su, Qihua Wang; 24(68):1−52, 2023.
Abstract
Nonparametric regression imputation is widely used in missing data analysis, but it suffers from the curse of dimensionality. The larger sample sizes that come with big data can alleviate this problem. However, the storage and computational demands of large-scale data pose new challenges for classical nonparametric regression imputation methods. To address them, we propose two distributed nonparametric regression imputation methods: one based on kernel smoothing and the other on the sieve method. The kernel-based method has a low communication cost, while the sieve-based method can accommodate more local machines. We illustrate both methods by estimating the response mean. The proposed estimators of the response mean are shown to be asymptotically normal, with asymptotic variances attaining the semiparametric efficiency bound. We evaluate the proposed methods through simulation studies and real data analysis.
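To make the two ingredients more concrete, the following is a minimal pure-Python sketch of regression-imputation estimators of the response mean when the data are split across machines. It is not the authors' procedure. The data-generating model, the Gaussian kernel and bandwidth `h`, the polynomial sieve basis of size `k`, and every function name are hypothetical choices made for illustration. The kernel variant is a plain one-shot divide-and-conquer scheme: each machine imputes with a local Nadaraya-Watson fit and sends back one mean and one count. The sieve variant sends fixed-size least-squares summaries, so the central machine reproduces the pooled sieve fit exactly.

```python
import math
import random


def simulate(n, seed=0):
    # Hypothetical model: X ~ U(0, 1), Y = 1 + X + X^2 + Gaussian noise (sd 0.5),
    # Y observed with probability sigmoid(0.5 + X), i.e. missing at random given X.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1.0 + x + x * x + rng.gauss(0.0, 0.5)
        observed = rng.random() < 1.0 / (1.0 + math.exp(-(0.5 + x)))
        data.append((x, y if observed else None))  # None marks a missing response
    return data


def nw_predict(x0, xs, ys, h):
    # Nadaraya-Watson estimate of m(x0) = E[Y | X = x0] with a Gaussian kernel.
    num = den = 0.0
    for x, y in zip(xs, ys):
        w = math.exp(-0.5 * ((x - x0) / h) ** 2)
        num += w * y
        den += w
    return num / den


def kernel_local_mean(block, h):
    # One machine: impute its missing responses from its own observed units
    # and return (imputed-sample mean, block size), two numbers to communicate.
    xs = [x for x, y in block if y is not None]
    ys = [y for x, y in block if y is not None]
    total = sum(y if y is not None else nw_predict(x, xs, ys, h) for x, y in block)
    return total / len(block), len(block)


def kernel_distributed_mean(blocks, h):
    # Central machine: size-weighted average of the local means.
    results = [kernel_local_mean(b, h) for b in blocks]
    return sum(mu * m for mu, m in results) / sum(m for _, m in results)


def basis(x, k):
    # Polynomial sieve basis 1, x, ..., x^(k-1) (an illustrative choice).
    return [x ** j for j in range(k)]


def solve(A, b):
    # Gaussian elimination with partial pivoting for a small k x k system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol


def sieve_local_stats(block, k):
    # One machine: summaries whose size depends on k, not on the block size.
    G = [[0.0] * k for _ in range(k)]  # B'B over observed units
    r = [0.0] * k                      # B'Y over observed units
    s_mis = [0.0] * k                  # summed basis vectors over missing units
    y_obs = 0.0                        # summed observed responses
    for x, y in block:
        b = basis(x, k)
        if y is None:
            for i in range(k):
                s_mis[i] += b[i]
        else:
            y_obs += y
            for i in range(k):
                r[i] += b[i] * y
                for j in range(k):
                    G[i][j] += b[i] * b[j]
    return G, r, s_mis, y_obs, len(block)


def sieve_distributed_mean(blocks, k):
    # Central machine: add the summaries, solve the pooled least-squares problem
    # once, and average observed responses with fitted values for missing ones.
    stats = [sieve_local_stats(b, k) for b in blocks]
    G = [[sum(st[0][i][j] for st in stats) for j in range(k)] for i in range(k)]
    r = [sum(st[1][i] for st in stats) for i in range(k)]
    s_mis = [sum(st[2][i] for st in stats) for i in range(k)]
    y_obs = sum(st[3] for st in stats)
    n = sum(st[4] for st in stats)
    beta = solve(G, r)
    return (y_obs + sum(s * bj for s, bj in zip(s_mis, beta))) / n
```

Splitting the simulated data into, say, four blocks with `[data[i::4] for i in range(4)]` and calling both functions should give values close to the true mean 11/6 under the assumed model. The sieve version returns the same number for any split, up to floating-point rounding, whereas the kernel version changes with the split because each machine smooths only its own data.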