Assumption-Lean Post-Integrated Inference with Surrogate-Control Outcomes
Paper Information
- Journal: Biometrika
- DOI: 10.1093/biomet/asag004
- Published: February 03, 2026
- Added to Tracker: Feb 10, 2026
Abstract
Data integration methods aim to extract low-dimensional embeddings from high-dimensional outcomes to remove unwanted variation, such as batch effects and unmeasured covariates, across heterogeneous datasets. However, multiple hypothesis testing after integration can be biased due to data-dependent processes. We introduce a robust post-integrated inference method that accounts for latent heterogeneity by leveraging control outcomes. Using causal interpretations, we derive nonparametric identifiability of direct effects via negative-control outcomes. By utilizing surrogate-control outcomes as an extension of negative-control outcomes, we develop semiparametric inference on projected direct-effect estimands, accounting for hidden mediators, confounders and moderators. These estimands remain statistically meaningful under model misspecification and in the presence of error-prone embeddings. We provide bias quantifications and finite-sample linear expansions with uniform concentration bounds. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification, facilitating data-adaptive estimation using machine learning algorithms. We evaluate our approach with random forests through simulations and the analysis of single-cell CRISPR-perturbed datasets, which may contain potential unmeasured confounders.
Author Details
- Larry Wasserman (Author)
- Jin-Hong Du (Author)
- Kathryn Roeder (Author)
Citation Information
APA Format
Larry Wasserman, Jin-Hong Du & Kathryn Roeder (2026). Assumption-Lean Post-Integrated Inference with Surrogate-Control Outcomes. Biometrika, 10.1093/biomet/asag004.
BibTeX Format
@article{paper865,
  title = {Assumption-Lean Post-Integrated Inference with Surrogate-Control Outcomes},
  author = {Larry Wasserman and Jin-Hong Du and Kathryn Roeder},
  journal = {Biometrika},
  year = {2026},
  doi = {10.1093/biomet/asag004},
  url = {https://doi.org/10.1093/biomet/asag004}
}