Found 195 papers
Sorted by: Newest First
Linear Separation Capacity of Self-Supervised Representation Learning
Shulei Wang
Recent advances in self-supervised learning have highlighted the efficacy of data augmentation in learning data representation from unlabeled data. Tr...
On the Convergence of Projected Policy Gradient for Any Constant Step Sizes
Zhihua Zhang, Jiacai Liu, Wenye Li et al.
Projected policy gradient (PPG) is a basic policy optimization method in reinforcement learning. Given access to exact policy evaluations, previous s...
Learning with Linear Function Approximations in Mean-Field Control
Erhan Bayraktar, Ali Devran Kara
The paper focuses on mean-field type multi-agent control problems with finite state and action spaces where the dynamics and cost structures are symme...
A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization
Junwen Qiu, Xiao Li, Andre Milzarek
Random reshuffling techniques are prevalent in large-scale applications, such as training neural networks. While the convergence and acceleration effe...
Model-free Change-Point Detection Using AUC of a Classifier
Feiyu Jiang, Rohit Kanrar, Zhanrui Cai
In contemporary data analysis, it is increasingly common to work with non-stationary complex data sets. These data sets typically extend beyond the cl...
EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback
Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov et al.
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based...
Multiple Instance Verification
Xin Xu, Eibe Frank, Geoffrey Holmes
We explore multiple instance verification, a problem setting in which a query instance is verified against a bag of target instances with heterogeneou...
Learning from Similar Linear Representations: Adaptivity, Minimaxity, and Robustness
Yang Feng, Yuqi Gu, Ye Tian
Representation multi-task learning (MTL) has achieved tremendous success in practice. However, the theoretical understanding of these methods is still...
Exponential Family Graphical Models: Correlated Replicates and Unmeasured Confounders, with Applications to fMRI Data
Kean Ming Tan, Yang Ning, Yanxin Jin
Graphical models have been used extensively for modeling brain connectivity networks. However, unmeasured confounders and correlations among measureme...
Optimizing Return Distributions with Distributional Dynamic Programming
Bernardo Ávila Pires, Mark Rowland, Diana Borsa et al.
We introduce distributional dynamic programming (DP) methods for optimizing statistical functionals of the return distribution, with standard reinforc...
Imprecise Multi-Armed Bandits: Representing Irreducible Uncertainty as a Zero-Sum Game
Vanessa Kosoy
We introduce a novel multi-armed bandit framework, where each arm is associated with a fixed unknown credal set over the space of outcomes (which can ...
Early Alignment in Two-Layer Networks Training is a Two-Edged Sword
Etienne Boursier, Nicolas Flammarion
Training neural networks with first order optimisation methods is at the core of the empirical success of deep learning. The scale of initialisation i...
Hierarchical Decision Making Based on Structural Information Principles
Xianghua Zeng, Hao Peng, Dingli Su et al.
Hierarchical Reinforcement Learning (HRL) is a promising approach for managing task complexity across multiple levels of abstraction and accelerating ...
Generative Adversarial Networks: Dynamics
Matias G. Delgadino, Bruno B. Suassuna, Rene Cabrera
We study quantitatively the overparametrization limit of the original Wasserstein-GAN algorithm. Effectively, we show that the algorithm is a stochast...
“What is Different Between These Datasets?” A Framework for Explaining Data Distribution Shifts
Varun Babbar*, Zhicheng Guo*, Cynthia Rudin
The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-relate...
Assumption-lean and data-adaptive post-prediction inference
Jiacheng Miao, Xinran Miao, Yixuan Wu et al.
A primary challenge facing modern scientific research is the limited availability of gold-standard data, which can be costly, labor-intensive, or inva...
Bagged Regularized k-Distances for Anomaly Detection
Hanyuan Hang, Hanfang Yang, Yuchao Cai et al.
We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled ...
Four Axiomatic Characterizations of the Integrated Gradients Attribution Method
Daniel Lundstrom, Meisam Razaviyayn
Deep neural networks have produced significant progress among machine learning models in terms of accuracy and functionality, but their inner workings...
Fast Algorithm for Constrained Linear Inverse Problems
Mohammed Rayyan Sheriff, Floor Fenne Redel, Peyman Mohajerin Esfahani
We consider the constrained Linear Inverse Problem (LIP), where a certain atomic norm (like the $\ell_1$ norm) is minimized subject to a quadratic co...
High-Rank Irreducible Cartesian Tensor Decomposition and Bases of Equivariant Spaces
Shihao Shao, Yikang Li, Zhouchen Lin et al.
Irreducible Cartesian tensors (ICTs) play a crucial role in the design of equivariant graph neural networks, as well as in theoretical chemistry and c...
Best Linear Unbiased Estimate from Privatized Contingency Tables
Jordan Awan, Adam Edwards, Paul Bartholomew et al.
In differential privacy (DP) mechanisms, it can be beneficial to release "redundant" outputs, where some quantities can be estimated in multiple ways...
Interpretable Global Minima of Deep ReLU Neural Networks on Sequentially Separable Data
Thomas Chen, Patrícia Muñoz Ewald
We explicitly construct zero loss neural network classifiers. We write the weight matrices and bias vectors in terms of cumulative parameters, which ...
Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods
Bertille Follain, Francis Bach
We propose a new method for feature learning and function estimation in supervised learning via regularised empirical risk minimisation. Our approach ...
Data-Driven Performance Guarantees for Classical and Learned Optimizers
Rajiv Sambharya, Bartolomeo Stellato
We introduce a data-driven approach to analyze the performance of continuous optimization algorithms using generalization guarantees from statistical ...
Contextual Bandits with Stage-wise Constraints
Aldo Pacchiano, Mohammad Ghavamzadeh, Peter Bartlett
We study contextual bandits in the presence of a stage-wise constraint when the constraint must be satisfied both with high probability and in expecta...
Boosting Causal Additive Models
Maximilian Kertel, Nadja Klein
We present a boosting-based method to learn additive Structural Equation Models (SEMs) from observational data, with a focus on the theoretical aspect...
Frequentist Guarantees of Distributed (Non)-Bayesian Inference
Bohan Wu, César A. Uribe
We establish frequentist properties, i.e., posterior consistency, asymptotic normality, and posterior contraction rates, for the distributed (non-)Bay...
Asymptotic Inference for Multi-Stage Stationary Treatment Policy with Variable Selection
Donglin Zeng, Yufeng Liu, Daiqi Gao
Dynamic treatment regimes or policies are a sequence of decision functions over multiple stages that are tailored to individual features. One importan...
EMaP: Explainable AI with Manifold-based Perturbations
Minh Nhat Vu, Huy Quang Mai, My T. Thai
In the last few years, many explanation methods based on the perturbations of input data have been introduced to shed light on the predictions generat...
Autoencoders in Function Space
Justin Bunker, Mark Girolami, Hefin Lambley et al.
Autoencoders have found widespread application in both their original deterministic form and in their variational formulation (VAEs). In scientific ap...
Nonparametric Regression on Random Geometric Graphs Sampled from Submanifolds
Paul Rosa, Judith Rousseau
We consider the nonparametric regression problem when the covariates are located on an unknown compact submanifold of a Euclidean space. Under definin...
System Neural Diversity: Measuring Behavioral Heterogeneity in Multi-Agent Learning
Matteo Bettini, Ajay Shankar, Amanda Prorok
Evolutionary science provides evidence that diversity confers resilience in natural systems. Yet, traditional multi-agent reinforcement learning techn...
Distribution Estimation under the Infinity Norm
Aryeh Kontorovich, Amichai Painsky
We present novel bounds for estimating discrete probability distributions under the $\ell_\infty$ norm. These are nearly optimal in various precise se...
Extending Temperature Scaling with Homogenizing Maps
Christopher Qian, Feng Liang, Jason Adams
As machine learning models continue to grow more complex, poor calibration significantly limits the reliability of their predictions. Temperature scal...
Density Estimation Using the Perceptron
Yury Polyanskiy, Patrik Róbert Gerber, Tianze Jiang et al.
We propose a new density estimation algorithm. Given $n$ i.i.d. observations from a distribution belonging to a class of densities on $\mathbb{R}^d$...
Simplex Constrained Sparse Optimization via Tail Screening
Xueqin Wang, Peng Chen, Jin Zhu et al.
We consider the probabilistic simplex-constrained sparse recovery problem. The commonly used Lasso-type penalty for promoting sparsity is ineffective ...
Score-Based Diffusion Models in Function Space
Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista et al.
Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data wit...
Regularized Rényi Divergence Minimization through Bregman Proximal Gradient Algorithms
Thomas Guilmeau, Emilie Chouzenoux, Víctor Elvira
We study the variational inference problem of minimizing a regularized Rényi divergence over an exponential family. We propose to solve this problem w...
WEFE: A Python Library for Measuring and Mitigating Bias in Word Embeddings
Pablo Badilla, Felipe Bravo-Marquez, María José Zambrano et al.
Word embeddings, which are a mapping of words into continuous vectors, are widely used in modern Natural Language Processing (NLP) systems. However, t...
Frontiers to the learning of nonparametric hidden Markov models
Elisabeth Gassiat, Zacharie Naulet, Kweku Abraham
Hidden Markov models (HMMs) are flexible tools for clustering dependent data coming from unknown populations, allowing nonparametric modelling of the ...
On Non-asymptotic Theory of Recurrent Neural Networks in Temporal Point Processes
Zhiheng Chen, Guanhua Fang, Wen Yu
Temporal point process (TPP) is an important tool for modeling and predicting irregularly timed events across various domains. Recently, the recurrent...
Classification in the high dimensional Anisotropic mixture framework: A new take on Robust Interpolation
Stanislav Minsker, Mohamed Ndaoud, Yiqiu Shen
We study the classification problem under the two-component anisotropic sub-Gaussian mixture model in high dimensions and in the non-asymptotic settin...
Universal Online Convex Optimization Meets Second-order Bounds
Yibo Wang, Lijun Zhang, Guanghui Wang et al.
Recently, several universal methods have been proposed for online convex optimization, and attain minimax rates for multiple types of convex function...
Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens
Amirreza Neshaei Moghaddam, Alex Olshevsky, Bahman Gharesifard
We provide the first known algorithm that provably achieves $\varepsilon$-optimality within $\widetilde{O}(1/\varepsilon)$ function evaluations for th...
Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests
Rahul Mazumder, Brian Liu
We study the often overlooked phenomenon, first noted in Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by a...
skglm: Improving scikit-learn for Regularized Generalized Linear Models
Badr Moufad, Pierre-Antoine Bannier, Quentin Bertrand et al.
We introduce skglm, an open-source Python package for regularized Generalized Linear Models. Thanks to its composable nature, it supports combining da...
Losing Momentum in Continuous-time Stochastic Optimisation
Kexin Jin, Jonas Latz, Chenguang Liu et al.
The training of modern machine learning models often consists in solving high-dimensional non-convex optimisation problems that are subject to large-s...
Latent Process Models for Functional Network Data
Elizaveta Levina, Ji Zhu, Peter W. MacDonald
Network data are often sampled with auxiliary information or collected through the observation of a complex system over time, leading to multiple netw...
Dynamic Bayesian Learning for Spatiotemporal Mechanistic Models
Sudipto Banerjee, Xiang Chen, Ian Frankenburg et al.
We develop an approach for Bayesian learning of spatiotemporal dynamical mechanistic models. Such learning consists of statistical emulation of the me...
On the Ability of Deep Networks to Learn Symmetries from Data: A Neural Kernel Theory
Andrea Perin, Stephane Deny
Symmetries (transformations by group actions) are present in many datasets, and leveraging them holds considerable promise for improving predictions i...
Fine-grained Analysis and Faster Algorithms for Iteratively Solving Linear Systems
Michal Dereziński, Daniel LeJeune, Deanna Needell et al.
Despite being a key bottleneck in many machine learning tasks, the cost of solving large linear systems has proven challenging to quantify due to prob...
Deep Generative Models: Complexity, Dimensionality, and Approximation
Didong Li, Kevin Wang, Hongqian Niu et al.
Generative networks have shown remarkable success in learning complex data distributions, particularly in generating high-dimensional data from lower-...
ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation
Sungduk Yu, Zeyuan Hu, Akshay Subramaniam et al.
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing cri...
Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching
Jannis Chemseddine, Paul Hagemann, Gabriele Steidl et al.
In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its l...
Deep Variational Multivariate Information Bottleneck - A Framework for Variational Losses
Eslam Abdelaleem, Ilya Nemenman, K. Michael Martini
Variational dimensionality reduction methods are widely used for their accuracy, generative capabilities, and robustness. We introduce a unifying fram...
Diffeomorphism-based feature learning using Poincaré inequalities on augmented input space
Romain Verdière, Clémentine Prieur, Olivier Zahm
We propose a gradient-enhanced algorithm for high-dimensional function approximation. The algorithm proceeds in two steps: firstly, we reduce the inp...
Finite Expression Method for Solving High-Dimensional Partial Differential Equations
Senwei Liang, Haizhao Yang
Designing efficient and accurate numerical solvers for high-dimensional partial differential equations (PDEs) remains a challenging and important topi...
Randomly Projected Convex Clustering Model: Motivation, Realization, and Cluster Recovery Guarantees
Defeng Sun, Yancheng Yuan, Ziwen Wang et al.
In this paper, we propose a randomly projected convex clustering model for clustering a collection of $n$ high dimensional data points in $\mathbb{R}^...
Minimax Optimal Deep Neural Network Classifiers Under Smooth Decision Boundary
Zuofeng Shang, Tianyang Hu, Ruiqi Liu et al.
Deep learning has gained huge empirical successes in large-scale classification problems. In contrast, there is a lack of statistical understanding ab...
Optimal and Efficient Algorithms for Decentralized Online Convex Optimization
Lijun Zhang, Yuanyu Wan, Tong Wei et al.
We investigate decentralized online convex optimization (D-OCO), in which a set of local learners are required to minimize a sequence of global loss f...
Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning
Dennis Chemnitz, Maximilian Engel
For overparameterized optimization tasks, such as those found in modern machine learning, global minima are generally not unique. In order to understa...
PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks
Xiyue Zhang, Benjie Wang, Marta Kwiatkowska et al.
Most methods for neural network verification focus on bounding the image, i.e., set of outputs for a given input set. This can be used to, for example...
Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability
Céline Comte, Matthieu Jonckheere, Jaron Sanders et al.
In this paper, we introduce a policy-gradient method for model-based reinforcement learning (RL) that exploits a type of stationary distributions comm...
On the $O(\sqrt{d}/T^{1/4})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm
Zhouchen Lin, Huan Li, Yiming Dong
Although adaptive gradient methods have been extensively used in deep learning, their convergence rates proved in the literature are all slower than t...
Categorical Semantics of Compositional Reinforcement Learning
Georgios Bakirtzis, Michail Savvas, Ufuk Topcu
Compositional knowledge representations in reinforcement learning (RL) facilitate modular, interpretable, and safe task specifications. However, gener...
Transformers from Diffusion: A Unified Framework for Neural Message Passing
David Wipf, Qitian Wu, Junchi Yan
Learning representations for structured data with certain geometries (e.g., observed or unobserved) is a fundamental challenge, wherein message passin...
Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning
Yong Lin, Chen Liu, Chenlu Ye et al.
Modern deep learning heavily relies on large labeled datasets, which often come with high costs in terms of both manual labeling and computational re...
Actor-Critic learning for mean-field control in continuous time
Noufel Frikha, Maximilien Germain, Mathieu Laurière et al.
We study policy gradient for mean-field control in continuous time in a reinforcement learning setting. By considering randomised policies with entro...
Modelling Populations of Interaction Networks via Distance Metrics
George Bolt, Simón Lunagómez, Christopher Nemeth
Network data arises through the observation of relational information between a collection of entities, for example, friendships (relations) amongst a...
BitNet: 1-bit Pre-training for Large Language Models
Lei Wang, Yi Wu, Hongyu Wang et al.
The increasing size of large language models (LLMs) has posed challenges for deployment and raised concerns about environmental impact due to high ene...
Physics-informed Kernel Learning
Gérard Biau, Nathan Doumèche, Francis Bach et al.
Physics-informed machine learning typically integrates physical priors into the learning process by minimizing a loss function that includes both a da...
Last-iterate Convergence of Shuffling Momentum Gradient Method under the Kurdyka-Lojasiewicz Inequality
Yuqing Liang, Dongpo Xu
Shuffling gradient algorithms are extensively used to solve finite-sum optimization problems in machine learning. However, their theoretical propertie...
Posterior and Variational Inference for Deep Neural Networks with Heavy-Tailed Weights
Ismaël Castillo, Paul Egels
We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random. Following a recent idea of...
Maximum Causal Entropy IRL in Mean-Field Games and GNEP Framework for Forward RL
Berkay Anahtarci, Can Deha Kariksiz, Naci Saldi
This paper explores the use of Maximum Causal Entropy Inverse Reinforcement Learning (IRL) within the context of discrete-time stationary Mean-Field G...
Degree of Interference: A General Framework For Causal Inference Under Interference
Yuki Ohnishi, Bikram Karmakar, Arman Sabbaghi
One core assumption typically adopted for valid causal inference is that of no interference between experimental units, i.e., the outcome of an experi...
Quantifying the Effectiveness of Linear Preconditioning in Markov Chain Monte Carlo
Max Hird, Samuel Livingstone
We study linear preconditioning in Markov chain Monte Carlo. We consider the class of well-conditioned distributions, for which several mixing time bo...
Sparse SVM with Hard-Margin Loss: a Newton-Augmented Lagrangian Method in Reduced Dimensions
Penghe Zhang, Naihua Xiu, Hou-Duo Qi
The hard-margin loss function has been at the core of the support vector machine research from the very beginning due to its generalization capability...
On Model Identification and Out-of-Sample Prediction of PCR with Applications to Synthetic Controls
Devavrat Shah, Anish Agarwal, Dennis Shen
We analyze principal component regression (PCR) in a high-dimensional error-in-variables setting with fixed design. Under suitable conditions, we show...
Bayesian Scalar-on-Image Regression with a Spatially Varying Single-layer Neural Network Prior
Keru Wu, Jian Kang, Ben Wu
Deep neural networks (DNN) have been widely used in scalar-on-image regression to predict an outcome variable from imaging predictors. However, train...
DRM Revisited: A Complete Error Analysis
Yuling Jiao, Ruoxuan Li, Peiying Wu et al.
It is widely known that the error analysis for deep learning involves approximation, statistical, and optimization errors. However, it is challenging ...
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Zhuoran Yang, Han Shen, Tianyi Chen
Bilevel optimization has been recently applied to many machine learning tasks. However, their applications have been restricted to the supervised lear...
Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers
Fan Yang, Hongyang R. Zhang, Sen Wu et al.
The problem of learning one task using samples from another task is central to transfer learning. In this paper, we focus on answering the following q...
Score-based Causal Representation Learning: Linear and General Transformations
Burak Varıcı, Emre Acartürk, Karthikeyan Shanmugam et al.
This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transfor...
On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension
Saptarshi Chakraborty, Peter L. Bartlett
Despite the remarkable empirical successes of Generative Adversarial Networks (GANs), the theoretical guarantees for their statistical accuracy remain...
Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms
Keru Wu, Yuansi Chen, Wooseok Ha et al.
Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that ...
Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles
Lesi Chen, Yaohua Ma, Jingzhao Zhang
In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (...
Adaptive Distributed Kernel Ridge Regression: A Feasible Distributed Learning Scheme for Data Silos
Shao-Bo Lin, Xiaotong Liu, Di Wang et al.
Data silos, mainly caused by privacy and interoperability, significantly constrain collaborations among different organizations with similar data for ...
On Global and Local Convergence of Iterative Linear Quadratic Optimization Algorithms for Discrete Time Nonlinear Control
Vincent Roulet, Siddhartha Srinivasa, Maryam Fazel et al.
A classical approach for solving discrete time nonlinear control on a finite horizon consists in repeatedly minimizing linear quadratic approximations...
A Decentralized Proximal Gradient Tracking Algorithm for Composite Optimization on Riemannian Manifolds
Lei Wang, Le Bao, Xin Liu
This paper focuses on minimizing a smooth function combined with a nonsmooth regularization term on a compact Riemannian submanifold embedded in the E...
Learning conditional distributions on continuous spaces
Cyril Benezet, Ziteng Cheng, Sebastian Jaimungal
We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing for different dimensions of the feature an...
A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs
Lukas Zierahn, Dirk van der Hoeven, Tal Lancewicki et al.
We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed f...
Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities
Rocco Caprio, Juan Kuntz, Samuel Power et al.
We derive non-asymptotic error bounds for particle gradient descent (PGD, Kuntz et al. (2023)), a recently introduced algorithm for maximum likelihoo...
Linear Hypothesis Testing in High-Dimensional Expected Shortfall Regression with Heavy-Tailed Errors
Kean Ming Tan, Wen-Xin Zhou, Gaoyu Wu et al.
Expected shortfall (ES) is widely used for characterizing the tail of a distribution across various fields, particularly in financial risk management....
Efficient Numerical Integration in Reproducing Kernel Hilbert Spaces via Leverage Scores Sampling
Antoine Chatalic, Nicolas Schreuder, Ernesto De Vito et al.
In this work we consider the problem of numerical integration, i.e., approximating integrals with respect to a target probability measure using only p...
Distribution Free Tests for Model Selection Based on Maximum Mean Discrepancy with Estimated Parameters
Florian Brück, Jean-David Fermanian, Aleksey Min
There exist several testing procedures based on the maximum mean discrepancy (MMD) to address the challenge of model specification. However, these tes...
Statistical field theory for Markov decision processes under uncertainty
George Stamatescu
A statistical field theory is introduced for finite state and action Markov decision processes with unknown parameters, in a Bayesian setting. The Bel...
Bayesian Data Sketching for Varying Coefficient Regression Models
Rajarshi Guhaniyogi, Laura Baracaldo, Sudipto Banerjee
Varying coefficient models are popular for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received ...
Bagged k-Distance for Mode-Based Clustering Using the Probability of Localized Level Sets
Hanyuan Hang
In this paper, we propose an ensemble learning algorithm named bagged $k$-distance for mode-based clustering (BDMBC) by putting forward a new measure ...
Linear cost and exponentially convergent approximation of Gaussian Matérn processes on intervals
David Bolin, Vaibhav Mehandiratta, Alexandre B. Simas
The computational cost for inference and prediction of statistical models based on Gaussian processes with Matérn covariance functions scales cubicall...
Invariant Subspace Decomposition
Margherita Lazzaretto, Jonas Peters, Niklas Pfister
We consider the task of predicting a response $Y$ from a set of covariates $X$ in settings where the conditional distribution of $Y$ given $X$ changes...
Posterior Concentrations of Fully-Connected Bayesian Neural Networks with General Priors on the Weights
Insung Kong, Yongdai Kim
Bayesian approaches for training deep neural networks (BNNs) have received significant interest and have been effectively utilized in a wide range of ...
Outlier Robust and Sparse Estimation of Linear Regression Coefficients
Takeyuki Sasai, Hironori Fujisawa
We consider outlier-robust and sparse estimation of linear regression coefficients, when the covariates and the noises are contaminated by adversarial...
Affine Rank Minimization via Asymptotic Log-Det Iteratively Reweighted Least Squares
Sebastian Krämer
The affine rank minimization problem is a well-known approach to matrix recovery. While there are various surrogates to this NP-hard problem, we prove...
Causal Effect of Functional Treatment
Ruoxu Tan, Wei Huang, Zheng Zhang et al.
We study the causal effect with a functional treatment variable, where practical applications often arise in neuroscience, biomedical sciences, etc. P...
Uplift Model Evaluation with Ordinal Dominance Graphs
Brecht Verbeken, Marie-Anne Guerry, Wouter Verbeke et al.
Uplift modelling is a subfield of causal learning that focuses on ranking entities by individual treatment effects. Uplift models are typically evalua...
High-Dimensional L2-Boosting: Rate of Convergence
Ye Luo, Martin Spindler, Jannis Kueck
Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of L2-Boosting in a high-dimensio...
Feature Learning in Finite-Width Bayesian Deep Linear Networks with Multiple Outputs and Convolutional Layers
Federico Bassetti, Marco Gherardi, Alessandro Ingrosso et al.
Deep linear networks have been extensively studied, as they provide simplified models of deep learning. However, little is known in the case of finite...
How good is your Laplace approximation of the Bayesian posterior? Finite-sample computable error bounds for a variety of useful divergences
Mikołaj J. Kasprzak, Ryan Giordano, Tamara Broderick
The Laplace approximation is a popular method for constructing a Gaussian approximation to the Bayesian posterior and thereby approximating the poster...
Integral Probability Metrics Meet Neural Networks: The Radon-Kolmogorov-Smirnov Test
Alden Green, Seunghoon Paik, Michael Celentano et al.
Integral probability metrics (IPMs) constitute a general class of nonparametric two-sample tests that are based on maximizing the mean difference betw...
On Inference for the Support Vector Machine
Wen-Xin Zhou, Jakub Rybak, Heather Battey
The linear support vector machine has a parametrised decision boundary. The paper considers inference for the corresponding parameters, which indicate...
Random Pruning Over-parameterized Neural Networks Can Improve Generalization: A Training Dynamics Analysis
Hongru Yang, Yingbin Liang, Xiaojie Guo et al.
It has been observed that applying pruning-at-initialization methods and training the sparse networks can sometimes yield slightly better test perform...
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability
Atticus Geiger, Duligur Ibeling, Amir Zur et al.
Causal abstraction provides a theoretical foundation for mechanistic interpretability, the field concerned with providing intelligible algorithms that...
Implicit vs Unfolded Graph Neural Networks
Yongyi Yang, Tang Liu, Yangkun Wang et al.
It has been observed that message-passing graph neural networks (GNN) sometimes struggle to maintain a healthy balance between the efficient / scalabl...
Towards Optimal Branching of Linear and Semidefinite Relaxations for Neural Network Robustness Certification
Brendon G. Anderson, Ziye Ma, Jingqi Li et al.
In this paper, we study certifying the robustness of ReLU neural networks against adversarial input perturbations. To diminish the relaxation error su...
GraphNeuralNetworks.jl: Deep Learning on Graphs with Julia
Carlo Lucibello, Aurora Rossi
GraphNeuralNetworks.jl is an open-source framework for deep learning on graphs, written in the Julia programming language. It supports multiple GPU ba...
Dynamic angular synchronization under smoothness constraints
Ernesto Araya, Mihai Cucuringu, Hemant Tyagi
Given an undirected measurement graph $\mathcal{H} = ([n], \mathcal{E})$, the classical angular synchronization problem consists of recovering unkno...
Derivative-Informed Neural Operator Acceleration of Geometric MCMC for Infinite-Dimensional Bayesian Inverse Problems
Lianghao Cao, Thomas O'Leary-Roseberry, Omar Ghattas
We propose an operator learning approach to accelerate geometric Markov chain Monte Carlo (MCMC) for solving infinite-dimensional Bayesian inverse pro...
Wasserstein F-tests for Frechet regression on Bures-Wasserstein manifolds
Hongzhe Li, Haoshu Xu
This paper addresses regression analysis for covariance matrix-valued outcomes with Euclidean covariates, motivated by applications in single-cell gen...
Distributed Stochastic Bilevel Optimization: Improved Complexity and Heterogeneity Analysis
Youcheng Niu, Jinming Xu, Ying Sun et al.
This paper considers solving a class of nonconvex-strongly-convex distributed stochastic bilevel optimization (DSBO) problems with personalized inner-...
Learning causal graphs via nonlinear sufficient dimension reduction
Eftychia Solea, Bing Li, Kyongwon Kim
We introduce a new nonparametric methodology for estimating a directed acyclic graph (DAG) from observational data. Our method is nonparametric in nat...
On Consistent Bayesian Inference from Synthetic Data
Ossi Räisä, Joonas Jälkö, Antti Honkela
Generating synthetic data, with or without differential privacy, has attracted significant attention as a potential solution to the dilemma between ma...
Optimization Over a Probability Simplex
James Chok, Geoffrey M. Vasil
We propose a new iteration scheme, the Cauchy-Simplex, to optimize convex problems over the probability simplex $\{w\in\mathbb{R}^n\ |\ \sum_i w_i=1\ ...
Laplace Meets Moreau: Smooth Approximation to Infimal Convolutions Using Laplace's Method
Ryan J. Tibshirani, Samy Wu Fung, Howard Heaton et al.
We study approximations to the Moreau envelope---and infimal convolutions more broadly---based on Laplace's method, a classical tool in analysis which...
Sampling and Estimation on Manifolds using the Langevin Diffusion
Karthik Bharath, Alexander Lewis, Akash Sharma et al.
Error bounds are derived for sampling and estimation using a discretization of an intrinsically defined Langevin diffusion with invariant measure $\te...
Sharp Bounds for Sequential Federated Learning on Heterogeneous Data
Yipeng Li, Xinchen Lyu
There are two paradigms in Federated Learning (FL): parallel FL (PFL), where models are trained in a parallel manner across clients, and sequential FL...
Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization
Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang et al.
Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in ...
Stabilizing Sharpness-Aware Minimization Through A Simple Renormalization Strategy
Chengli Tan, Jiangshe Zhang, Junmin Liu et al.
Recently, sharpness-aware minimization (SAM) has attracted much attention because of its surprising effectiveness in improving generalization performa...
Fine-Grained Change Point Detection for Topic Modeling with Pitman-Yor Process
Feifei Wang, Zimeng Zhao, Ruimin Ye et al.
Identifying change points in dynamic text data is crucial for understanding the evolving nature of topics across various sources, such as news article...
Deletion Robust Non-Monotone Submodular Maximization over Matroids
Paul Dütting, Federico Fusco, Silvio Lattanzi et al.
We study the deletion robust version of submodular maximization under matroid constraints. The goal is to extract a small-size summary of the data set...
Instability, Computational Efficiency and Statistical Accuracy
Raaz Dwivedi, Koulik Khamaru, Martin J. Wainwright et al.
Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an ...
Estimation of Local Geometric Structure on Manifolds from Noisy Data
Yariv Aizenbud, Barak Sober
A common observation in data-driven applications is that high-dimensional data have a low intrinsic dimension, at least locally. In this work, we cons...
Ontolearn---A Framework for Large-scale OWL Class Expression Learning in Python
Caglar Demir, Alkid Baci, N'Dah Jean Kouagou et al.
In this paper, we present Ontolearn---a framework for learning OWL class expressions over large knowledge graphs. Ontolearn contains efficient implem...
Continuously evolving rewards in an open-ended environment
Richard M. Bailey
Unambiguous identification of the rewards driving behaviours of entities operating in complex open-ended real-world environments is difficult, in part...
Recursive Causal Discovery
Ehsan Mokhtarian, Sepehr Elahi, Sina Akbari et al.
Causal discovery from observational data, i.e., learning the causal graph from a finite set of samples from the joint distribution of the variables, i...
Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings
Ilya Shpitser, Henrik von Kleist, Alireza Zamanian et al.
Machine learning methods often assume that input features are available at no cost. However, in domains like healthcare, where acquiring features coul...
On Adaptive Stochastic Optimization for Streaming Data: A Newton's Method with O(dN) Operations
Antoine Godichon-Baggioni, Nicklas Werge
Stochastic optimization methods face new challenges in the realm of streaming data, characterized by a continuous flow of large, high-dimensional data...
Determine the Number of States in Hidden Markov Models via Marginal Likelihood
Yang Chen, Cheng-Der Fuh, Chu-Lan Michael Kao
Hidden Markov models (HMM) have been widely used by scientists to model stochastic systems: the underlying process is a discrete Markov chain, and the...
Variance-Aware Estimation of Kernel Mean Embedding
Geoffrey Wolfer, Pierre Alquier
An important feature of kernel mean embeddings (KME) is that the rate of convergence of the empirical KME to the true distribution KME can be bounded ...
Scaling ResNets in the Large-depth Regime
Pierre Marion, Adeline Fermanian, Gérard Biau et al.
Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these arc...
A Comparative Evaluation of Quantification Methods
Tobias Schumacher, Markus Strohmaier, Florian Lemmerich
Quantification represents the problem of estimating the distribution of class labels on unseen data. It also represents a growing research field in su...
Lightning UQ Box: Uncertainty Quantification for Neural Networks
Nils Lehmann, Nina Maria Gottschling, Jakob Gawlikowski et al.
Although neural networks have shown impressive results in a multitude of application domains, the "black box" nature of deep learning and lack of conf...
Scaling Data-Constrained Language Models
Niklas Muennighoff, Alexander M. Rush, Boaz Barak et al.
The current trend of scaling language models involves increasing both parameter count and training data set size. Extrapolating this trend suggests th...
Curvature-based Clustering on Graphs
Zachary Lubberts, Yu Tian, Melanie Weber
Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms that exploit the geometry...
Composite Goodness-of-fit Tests with Kernels
Oscar Key, Arthur Gretton, François-Xavier Briol et al.
We propose kernel-based hypothesis tests for the challenging composite testing problem, where we are interested in whether the data comes from any dis...
PFLlib: A Beginner-Friendly and Comprehensive Personalized Federated Learning Library and Benchmark
Yang Liu, Jianqing Zhang, Yang Hua et al.
Amid the ongoing advancements in Federated Learning (FL), a machine learning paradigm that allows collaborative learning with data privacy protection,...
The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning
Wooseok Ha, Bin Yu, Nikhil Ghosh et al.
In this work, we investigate the dynamics of stochastic gradient descent (SGD) when training a single-neuron autoencoder with linear or ReLU activatio...
Efficient and Robust Transfer Learning of Optimal Individualized Treatment Regimes with Right-Censored Survival Data
Pan Zhao, Shu Yang, Julie Josse
An individualized treatment regime (ITR) is a decision rule that assigns treatments based on patients' characteristics. The value function of an ITR i...
DAGs as Minimal I-maps for the Induced Models of Causal Bayesian Networks under Conditioning
Xiangdong Xie, Jiahua Guo, Yi Sun
Bayesian networks (BNs) are a powerful tool for knowledge representation and reasoning, especially for complex systems. A critical task in the applic...
Adjusted Expected Improvement for Cumulative Regret Minimization in Noisy Bayesian Optimization
Shouri Hu, Haowei Wang, Zhongxiang Dai et al.
The expected improvement (EI) is one of the most popular acquisition functions for Bayesian optimization (BO) and has demonstrated good empirical perf...
Manifold Fitting under Unbounded Noise
Zhigang Yao, Yuqing Xia
In the field of non-Euclidean statistical analysis, a trend has emerged in recent times, of attempts to recover a low dimensional structure, namely a ...
Learning Global Nash Equilibrium in Team Competitive Games with Generalized Fictitious Cross-Play
Zelai Xu, Chao Yu, Yancheng Liang et al.
Self-play (SP) is a popular multi-agent reinforcement learning framework for competitive games. Despite the empirical success, the theoretical propert...
Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models
Xuefeng Gao, Hoang M. Nguyen, Lingjiong Zhu
Score-based generative models are a recent class of deep generative models with state-of-the-art performance in many applications. In this paper, we e...
Extremal graphical modeling with latent variables via convex optimization
Sebastian Engelke, Armeen Taeb
Extremal graphical models encode the conditional independence structure of multivariate extremes and provide a powerful tool for quantifying the risk ...
On the Approximation of Kernel functions
Paul Dommel, Alois Pichler
Various methods in statistical learning build on kernels considered in reproducing kernel Hilbert spaces. In applications, the kernel is often selecte...
Efficient and Robust Semi-supervised Estimation of Average Treatment Effect with Partially Annotated Treatment and Response
Jue Hou, Tianxi Cai, Rajarshi Mukherjee
A notable challenge of leveraging Electronic Health Records (EHR) for treatment effect assessment is the lack of precise information on important clin...
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Jingyang Li, Kuangyu Ding, Kim-Chuan Toh
Stochastic gradient methods for minimizing nonconvex composite objective functions typically rely on the Lipschitz smoothness of the differentiable pa...
Optimizing Data Collection for Machine Learning
Rafid Mahmood, James Lucas, Jose M. Alvarez et al.
Modern deep learning systems require huge data sets to achieve impressive performance, but there is little guidance on how much or what kind of data t...
Unbalanced Kantorovich-Rubinstein distance, plan, and barycenter on finite spaces: A statistical perspective
Shayan Hundrieser, Florian Heinemann, Marcel Klatt et al.
We analyze statistical properties of plug-in estimators for unbalanced optimal transport quantities between finitely supported measures in different p...
Copula-based Sensitivity Analysis for Multi-Treatment Causal Inference with Unobserved Confounding
Jiajing Zheng, Alexander D'Amour, Alexander Franks
Recent work has focused on the potential and pitfalls of causal identification in observational studies with multiple simultaneous treatments. Buildin...
Rank-one Convexification for Sparse Regression
Alper Atamturk, Andres Gomez
Sparse regression models are increasingly prevalent due to their ease of interpretability and superior out-of-sample performance. However, the exact m...
gsplat: An Open-Source Library for Gaussian Splatting
Vickie Ye, Ruilong Li, Justin Kerr et al.
gsplat is an open-source library designed for training and developing Gaussian Splatting methods. It features a front-end with Python bindings compati...
Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming
Sen Na, Michael Mahoney
We consider online statistical inference of constrained stochastic nonlinear optimization problems. We apply the Stochastic Sequential Quadratic Progr...
Sliced-Wasserstein Distances and Flows on Cartan-Hadamard Manifolds
Clément Bonet, Lucas Drumetz, Nicolas Courty
While many Machine Learning methods have been developed or transposed on Riemannian manifolds to tackle data with known non-Euclidean geometry, Optima...
Accelerating optimization over the space of probability measures
Shi Chen, Qin Li, Oliver Tse et al.
The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance, particularly within machine ...
Bayesian Multi-Group Gaussian Process Models for Heterogeneous Group-Structured Data
Sudipto Banerjee, Didong Li, Andrew Jones et al.
Gaussian processes are pervasive in functional data analysis, machine learning, and spatial statistics for modeling complex dependencies. Scientific d...
Orthogonal Bases for Equivariant Graph Learning with Provable k-WL Expressive Power
Jia He, Maggie Cheng
Graph neural network (GNN) models have been widely used for learning graph-structured data. Due to the permutation-invariant requirement of graph lear...
Optimal Experiment Design for Causal Effect Identification
Sina Akbari, Negar Kiyavash, Jalal Etesami
Pearl’s do calculus is a complete axiomatic approach to learn the identifiable causal effects from observational data. When such an effect is not iden...
Mean Aggregator is More Robust than Robust Aggregators under Label Poisoning Attacks on Distributed Heterogeneous Data
Jie Peng, Weiyu Li, Stefan Vlaski et al.
Robustness to malicious attacks is of paramount importance for distributed learning. Existing works usually consider the classical Byzantine attacks m...
The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
Jiin Woo, Gauri Joshi, Yuejie Chi
In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on lo...
depyf: Open the Opaque Box of PyTorch Compiler for Machine Learning Researchers
Kaichao You, Runsheng Bai, Meng Cao et al.
PyTorch 2.x introduces a compiler designed to accelerate deep learning programs. However, for machine learning researchers, fully leveraging the PyTor...
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang
Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gra...
Improving Graph Neural Networks on Multi-node Tasks with the Labeling Trick
Xiyuan Wang, Pan Li, Muhan Zhang
In this paper, we study using graph neural networks (GNNs) for multi-node representation learning, where a representation for a set of more than one n...
Directed Cyclic Graphs for Simultaneous Discovery of Time-Lagged and Instantaneous Causality from Longitudinal Data Using Instrumental Variables
Wei Jin, Yang Ni, Amanda B. Spence et al.
We consider the problem of causal discovery from longitudinal observational data. We develop a novel framework that simultaneously discovers the time-...
Bayesian Sparse Gaussian Mixture Model for Clustering in High Dimensions
Fangzheng Xie, Yanxun Xu, Dapeng Yao
We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound...
Regularizing Hard Examples Improves Adversarial Robustness
Hyungyu Lee, Saehyung Lee, Ho Bae et al.
Recent studies have validated that pruning hard-to-learn examples from training improves the generalization performance of neural networks (NNs). In t...
Random ReLU Neural Networks as Non-Gaussian Processes
Rahul Parhi, Pakshal Bohra, Ayoub El Biari et al.
We consider a large class of shallow neural networks with randomly initialized parameters and rectified linear unit activation functions. We prove tha...
Riemannian Bilevel Optimization
Jiaxiang Li, Shiqian Ma
In this work, we consider the bilevel optimization problem on Riemannian manifolds. We inspect the calculation of the hypergradient of such problems o...
Supervised Learning with Evolving Tasks and Performance Guarantees
Verónica Álvarez, Santiago Mazuelas, Jose A. Lozano
Multiple supervised learning scenarios are composed by a sequence of classification tasks. For instance, multi-task learning and continual learning ai...
Error estimation and adaptive tuning for unregularized robust M-estimator
Pierre C. Bellec, Takuya Koriyama
We consider unregularized robust M-estimators for linear models under Gaussian design and heavy-tailed noise, in the proportional asymptotics regime w...
From Sparse to Dense Functional Data in High Dimensions: Revisiting Phase Transitions from a Non-Asymptotic Perspective
Xinghao Qiao, Dong Li, Shaojun Guo et al.
Nonparametric estimation of the mean and covariance functions is ubiquitous in functional data analysis and local linear smoothing techniques are most...
Locally Private Causal Inference for Randomized Experiments
Jordan Awan, Yuki Ohnishi
Local differential privacy is a differential privacy paradigm in which individuals first apply a privacy mechanism to their data (often by adding nois...
Estimating Network-Mediated Causal Effects via Principal Components Network Regression
Alex Hayes, Mark M. Fredrickson, Keith Levin
We develop a method to decompose causal effects on a social network into an indirect effect mediated by the network, and a direct effect independent o...
Selective Inference with Distributed Data
Snigdha Panigrahi, Sifan Liu
When data are distributed across multiple sites or machines rather than centralized in one location, researchers face the challenge of extracting mean...
Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization
Michael I. Jordan, Tianyi Lin, Chi Jin
We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems in the f...
An Axiomatic Definition of Hierarchical Clustering
Ery Arias-Castro, Elizabeth Coda
In this paper, we take an axiomatic approach to defining a population hierarchical clustering for piecewise constant densities, and in a similar manne...
Test-Time Training on Video Streams
Renhao Wang, Yu Sun, Arnuv Tandon et al.
Prior work has established Test-Time Training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction...
Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback
Boxin Zhao, Lingxiao Wang, Ziqi Liu et al.
Due to the high cost of communication, federated learning (FL) systems need to sample a subset of clients that are involved in each round of training....
A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation
Hugo Lebeau, Florent Chatelain, Romain Couillet
This work presents a comprehensive understanding of the estimation of a planted low-rank signal from a general spiked tensor model near the computatio...
Memory Gym: Towards Endless Tasks to Benchmark Memory Capabilities of Agents
Marco Pleines, Matthias Pallasch, Frank Zimmer et al.
Memory Gym presents a suite of 2D partially observable environments, namely Mortar Mayhem, Mystery Path, and Searing Spotlights, designed to benchmark...
Enhancing Graph Representation Learning with Localized Topological Features
Zuoyu Yan, Qi Zhao, Ze Ye et al.
Representation learning on graphs is a fundamental problem that can be crucial in various tasks. Graph neural networks, the dominant approach for grap...
Deep Out-of-Distribution Uncertainty Quantification via Weight Entropy Maximization
Antoine de Mathelin, François Deheeger, Mathilde Mougeot et al.
This paper deals with uncertainty quantification and out-of-distribution detection in deep learning using Bayesian and ensemble methods. It proposes a...
DisC2o-HD: Distributed causal inference with covariates shift for analyzing real-world high-dimensional data
Jiayi Tong, Jie Hu, George Hripcsak et al.
High-dimensional healthcare data, such as electronic health records (EHR) data and claims data, present two primary challenges due to the large number...
Bayes Meets Bernstein at the Meta Level: an Analysis of Fast Rates in Meta-Learning with PAC-Bayes
Pierre Alquier, Charles Riou, Badr-Eddine Chérief-Abdellatif
Bernstein's condition is a key assumption that guarantees fast rates in machine learning. For example, under this condition, the Gibbs posterior with ...
Efficiently Escaping Saddle Points in Bilevel Optimization
Shiqian Ma, Minhui Huang, Xuxing Chen et al.
Bilevel optimization is one of the fundamental problems in machine learning and optimization. Recent theoretical developments in bilevel optimization ...