JMLR

Biological Sequence Kernels with Guaranteed Flexibility

Authors
Alan N. Amin Debora S. Marks Eli N. Weinstein
Research Topics
Nonparametric Statistics
Paper Information
  • Journal:
    Journal of Machine Learning Research
  • Added to Tracker:
    Dec 30, 2025
Abstract

Applying machine learning to biological sequences (DNA, RNA and protein) has enormous potential to advance human health and environmental sustainability. To support such high-stakes applications, it is important to develop models and evaluations that not only capture underlying biology, but also have theoretical guarantees of reliability and performance. In this article, we analyze kernel methods for biological sequences, including both hand-crafted kernels and deep neural network-based kernels. We show that popular biological kernels can severely fail at learning functions or distinguishing distributions. We then develop modified kernels that (1) are universal, characteristic, and metrize the space of distributions, and (2) preserve the underlying biological inductive biases and domain knowledge embedded in the original kernel. Our results rest on novel proof techniques for kernels that handle the structure of biological sequence space (discrete, variable-length sequences) and biological notions of sequence similarity. We illustrate our theoretical results in simulation and on real biological data sets.
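
The kind of failure the abstract describes, a kernel that cannot distinguish distributions, can be illustrated with a minimal sketch. It assumes a k-mer spectrum kernel as the hand-crafted kernel; the abstract does not name a specific kernel, so this choice is an assumption made only for illustration. The function names `kmer_counts`, `spectrum_kernel`, and `mmd_squared` are hypothetical. Two different sequences with identical 2-mer counts receive the same feature vector, so the squared maximum mean discrepancy (MMD) between point masses on them is zero, which means a kernel of this form is not characteristic.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 2) -> int:
    """Inner product of the k-mer count vectors of x and y.

    Assumption: a plain spectrum kernel, used here only as an example
    of a hand-crafted sequence kernel.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[m] * cy[m] for m in cx)


def mmd_squared(xs, ys, kernel) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


# Two distinct sequences that share the 2-mers AA, AB and BA (once each).
p_sample = ["AABA"]
q_sample = ["ABAA"]

# The kernel sees no difference between the two point-mass distributions.
mmd = mmd_squared(p_sample, q_sample, spectrum_kernel)
print(mmd)
```

The sequences differ, yet the estimated discrepancy is exactly zero for `k = 2`. The modified kernels described in the abstract are designed to rule out this behavior while keeping the original kernel's notion of similarity; this sketch does not implement them.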

Citation Information
APA Format
Alan N. Amin, Debora S. Marks & Eli N. Weinstein. Biological Sequence Kernels with Guaranteed Flexibility. Journal of Machine Learning Research.
BibTeX Format
@article{paper714,
  title = {Biological Sequence Kernels with Guaranteed Flexibility},
  author = {Alan N. Amin and Debora S. Marks and Eli N. Weinstein},
  journal = {Journal of Machine Learning Research},
  url = {https://www.jmlr.org/papers/v26/23-0455.html}
}