On Consistent Bayesian Inference from Synthetic Data
Paper Information
Journal: Journal of Machine Learning Research
Added to Tracker: Jul 15, 2025
Abstract
Generating synthetic data, with or without differential privacy, has attracted significant attention as a potential solution to the dilemma between making data easily available and protecting the privacy of data subjects. Several works have shown that consistency of downstream analyses from synthetic data, including accurate uncertainty estimation, requires accounting for the synthetic data generation. There are very few methods of doing so, most of them for frequentist analysis. In this paper, we study how to perform consistent Bayesian inference from synthetic data. We prove that mixing posterior samples obtained separately from multiple large synthetic data sets, each sampled from a posterior predictive, converges to the posterior of the downstream analysis under standard regularity conditions when the analyst's model is compatible with the data provider's model. We also present several examples showing how the theory works in practice, and how Bayesian inference can fail when the compatibility assumption is not met or the synthetic data set is not significantly larger than the original.
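The mixing procedure described in the abstract can be illustrated with a toy example. The sketch below is not the authors' code: it assumes a Gaussian model with known standard deviation and a conjugate Gaussian prior on the mean, shared by the data provider and the analyst (so the models are compatible). The function names `gaussian_mean_posterior` and `mixed_synthetic_posterior`, and all parameter values, are hypothetical choices for illustration.

```python
import numpy as np


def gaussian_mean_posterior(x, sigma, prior_mean, prior_var):
    """Conjugate posterior N(mean, var) for a Gaussian mean with known sigma."""
    n = len(x)
    post_var = 1.0 / (1.0 / prior_var + n / sigma**2)
    post_mean = post_var * (prior_mean / prior_var + np.sum(x) / sigma**2)
    return post_mean, post_var


def mixed_synthetic_posterior(x, sigma, prior_mean, prior_var,
                              n_datasets, n_syn, draws_per_dataset, rng):
    """Pool posterior draws obtained separately from several synthetic data sets."""
    post_mean, post_var = gaussian_mean_posterior(x, sigma, prior_mean, prior_var)
    pooled = []
    for _ in range(n_datasets):
        # Data provider: one posterior predictive draw, i.e. sample a mean
        # from the posterior, then a synthetic data set given that mean.
        mu = rng.normal(post_mean, np.sqrt(post_var))
        x_syn = rng.normal(mu, sigma, size=n_syn)
        # Analyst: ordinary posterior computed from the synthetic data alone
        # (assumption: same model and prior as the data provider).
        m, v = gaussian_mean_posterior(x_syn, sigma, prior_mean, prior_var)
        pooled.append(rng.normal(m, np.sqrt(v), size=draws_per_dataset))
    # Mixing step: concatenate the per-data-set posterior samples.
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_real = rng.normal(1.0, 1.0, size=50)  # stand-in for the original data
    mean, var = gaussian_mean_posterior(x_real, 1.0, 0.0, 100.0)
    print("real-data posterior: mean", mean, "sd", np.sqrt(var))
    for n_syn in (50, 5000):
        s = mixed_synthetic_posterior(x_real, 1.0, 0.0, 100.0,
                                      n_datasets=400, n_syn=n_syn,
                                      draws_per_dataset=50, rng=rng)
        print("n_syn =", n_syn, ": mean", s.mean(), "sd", s.std())
```

In this toy setting, the pooled sample should track the real-data posterior closely when each synthetic data set is much larger than the original, while synthetic data sets of the same size as the original yield a visibly wider mixture. This mirrors the abstract's remark about synthetic data sets that are not significantly larger than the original.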
Author Details
Ossi Räisä (Author)
Joonas Jälkö (Author)
Antti Honkela (Author)
Research Topics & Keywords
Bayesian Statistics (Research Area)
Citation Information
APA Format
Ossi Räisä, Joonas Jälkö & Antti Honkela. On Consistent Bayesian Inference from Synthetic Data. Journal of Machine Learning Research.
BibTeX Format
@article{JMLR:v26:23-1428,
author = {Ossi R{{\"a}}is{{\"a}} and Joonas J{{\"a}}lk{{\"o}} and Antti Honkela},
title = {On Consistent Bayesian Inference from Synthetic Data},
journal = {Journal of Machine Learning Research},
year = {2025},
volume = {26},
number = {74},
pages = {1--65},
url = {http://jmlr.org/papers/v26/23-1428.html}
}