Communication-efficient Distributed Statistical Inference for Massive Data with Heterogeneous Auxiliary Information
Paper Information
Journal: Journal of Machine Learning Research
Added to Tracker: Mar 03, 2026
Abstract
Heterogeneous auxiliary information commonly arises in big data due to diverse study settings and privacy constraints. Excluding such indirect evidence often results in a substantial loss of statistical inference efficiency. This article proposes a novel framework for integrating a mixture of individual-level data and multiple external heterogeneous summary statistics by multiplying likelihood functions and confidence densities. Theoretically, we show that the proposed method possesses desirable properties and can achieve statistical efficiency comparable to that of the individual participant data (IPD) estimator, which uses all available individual-level data. Furthermore, we develop a communication-efficient distributed inference procedure for massive datasets with heterogeneous auxiliary information. We demonstrate that the proposed iterative algorithm achieves linear convergence under general conditions or generalized linear models. Finally, extensive simulations and real data applications are conducted to illustrate the performance of the proposed methods.
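The idea of multiplying a likelihood function by a confidence density can be illustrated with a toy case. The sketch below is not the paper's estimator: it assumes a single scalar mean, a known standard deviation, and one external summary (an estimate and its standard error) that targets the same parameter, so the heterogeneity the paper handles is ignored. The function name and all inputs are hypothetical. Under these assumptions, maximizing the product of the normal likelihood and the normal confidence density reduces to a precision-weighted average.

```python
import math

def combine_likelihood_and_confidence_density(xs, sigma, ext_estimate, ext_se):
    """Toy sketch (assumed setting, not the paper's method).

    Internal data xs ~ N(theta, sigma^2) with sigma known.
    External summary: ext_estimate with standard error ext_se, treated as a
    normal confidence density for the same theta.
    Maximizing log-likelihood + log confidence density gives a
    precision-weighted average of the two sources.
    """
    n = len(xs)
    xbar = sum(xs) / n
    w_int = n / sigma ** 2        # precision of the internal sample mean
    w_ext = 1.0 / ext_se ** 2     # precision of the external summary
    theta_hat = (w_int * xbar + w_ext * ext_estimate) / (w_int + w_ext)
    se = math.sqrt(1.0 / (w_int + w_ext))
    return theta_hat, se

# Hypothetical inputs chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
theta_hat, se = combine_likelihood_and_confidence_density(
    xs, sigma=1.0, ext_estimate=3.5, ext_se=0.5
)
print(theta_hat, se)
```

In this toy setting the combined standard error is smaller than that of either source alone, which is the sense in which discarding external summaries loses efficiency.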
Author Details
Miaomiao Yu
Zhongfeng Jiang
Jiaxuan Li
Yong Zhou
Citation Information
APA Format
Miaomiao Yu, Zhongfeng Jiang, Jiaxuan Li, & Yong Zhou. Communication-efficient Distributed Statistical Inference for Massive Data with Heterogeneous Auxiliary Information. Journal of Machine Learning Research.
BibTeX Format
@article{paper986,
  title = {Communication-efficient Distributed Statistical Inference for Massive Data with Heterogeneous Auxiliary Information},
  author = {Miaomiao Yu and Zhongfeng Jiang and Jiaxuan Li and Yong Zhou},
  journal = {Journal of Machine Learning Research},
  url = {https://www.jmlr.org/papers/v27/23-0440.html}
}