Biometrika Feb 03, 2026

Assumption-Lean Post-Integrated Inference with Surrogate-Control Outcomes

Authors
Larry Wasserman, Jin-Hong Du, Kathryn Roeder
Paper Information
  • Journal:
    Biometrika
  • DOI:
    10.1093/biomet/asag004
  • Published:
    February 03, 2026
  • Added to Tracker:
    Feb 10, 2026
Abstract

Data integration methods aim to extract low-dimensional embeddings from high-dimensional outcomes to remove unwanted variation, such as batch effects and unmeasured covariates, across heterogeneous datasets. However, multiple hypothesis testing after integration can be biased due to data-dependent processes. We introduce a robust post-integrated inference method that accounts for latent heterogeneity by leveraging control outcomes. Using causal interpretations, we derive nonparametric identifiability of direct effects via negative-control outcomes. By utilizing surrogate-control outcomes as an extension of negative-control outcomes, we develop semiparametric inference on projected direct-effect estimands, accounting for hidden mediators, confounders and moderators. These estimands remain statistically meaningful under model misspecification and in the presence of error-prone embeddings. We provide bias quantifications and finite-sample linear expansions with uniform concentration bounds. The proposed doubly robust estimators are consistent and efficient under minimal assumptions and potential misspecification, facilitating data-adaptive estimation using machine learning algorithms. We evaluate our approach with random forests through simulations and the analysis of single-cell CRISPR-perturbed datasets, which may contain potential unmeasured confounders.

Citation Information
APA Format
Wasserman, L., Du, J.-H., & Roeder, K. (2026). Assumption-Lean Post-Integrated Inference with Surrogate-Control Outcomes. Biometrika. https://doi.org/10.1093/biomet/asag004
BibTeX Format
@article{paper865,
  title = {Assumption-Lean Post-Integrated Inference with Surrogate-Control Outcomes},
  author = {Larry Wasserman and Jin-Hong Du and Kathryn Roeder},
  journal = {Biometrika},
  year = {2026},
  doi = {10.1093/biomet/asag004},
  url = {https://doi.org/10.1093/biomet/asag004}
}