PH.D. DEFENCE - PUBLIC SEMINAR

Safety and Reliability in Bayesian Optimization: Algorithms and Regret Guarantees

Speaker
Mr. Arpan Losalka
Advisor
Dr. Jonathan Scarlett, Associate Professor, School of Computing


Thursday, 21 Nov 2024, 09:30 AM to 11:00 AM

MR20, COM3-02-59

Abstract:

In this work, we study the issues of safety and reliability in the context of Bayesian optimization, a principled and powerful approach for the sequential optimization of unknown and expensive-to-evaluate functions. We first consider the problem of sequentially maximizing an unknown function $f$ over a set of actions of the form $(s, x)$, while ensuring that every sampled point has a function value below a given safety threshold $h$. We model $f$ using kernel-based and Gaussian process methods, but differ from previous works in assuming that $f$ is monotonically increasing with respect to $s$ (the safety variable). This assumption is motivated by various practical applications, such as adaptive clinical trial design and robotics. We propose an algorithm, M-SAFEUCB, and show that it enjoys theoretical guarantees in terms of safety, a suitably defined notion of regret, and approximate identification of the entire safe boundary.
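To illustrate the kind of selection rule this monotonicity assumption enables, here is a minimal, hypothetical sketch (not the actual M-SAFEUCB algorithm): a one-dimensional safety variable $s$, a Gaussian process posterior with an assumed RBF kernel (length scale and confidence parameter `beta` are illustrative choices), and a rule that samples the largest $s$ whose upper confidence bound is still certified below the threshold $h$.

```python
import numpy as np

def rbf_kernel(a, b, length=0.2):
    # Squared-exponential kernel between two 1-D arrays of s-values.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(s_grid, s_obs, y_obs, noise=1e-3):
    # Standard GP regression posterior mean/std on a grid of s-values.
    K = rbf_kernel(s_obs, s_obs) + noise * np.eye(len(s_obs))
    Ks = rbf_kernel(s_grid, s_obs)
    Kinv = np.linalg.inv(K)
    mean = Ks @ Kinv @ y_obs
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)  # prior variance 1
    return mean, np.sqrt(np.maximum(var, 0.0))

def next_safe_point(s_grid, s_obs, y_obs, h, beta=2.0):
    # Since f is monotonically increasing in s, the safe region is an
    # interval starting at s = 0; optimistically expand it by sampling
    # the largest s whose UCB is still certified below the threshold h.
    mean, std = gp_posterior(s_grid, s_obs, y_obs)
    ucb = mean + beta * std
    safe = np.where(ucb <= h)[0]
    return s_grid[safe[-1]] if len(safe) else s_grid[0]
```

For example, with observations of $f(s) = s$ at a few safe points and $h = 0.7$, the rule proposes a point just beyond the largest observed safe $s$, staying conservatively short of the true boundary at $s = 0.7$.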

Next, we generalize the problem setup to allow an unknown safety function $g$ that may differ from the objective function $f$. The goal of attaining low cumulative regret in this setup, although crucial for effective decision making in practice, has remained largely unexplored. We show that if $g$ is monotone with respect to just the single variable $s$, sublinear regret becomes achievable with our proposed algorithm, M-SAFEOPT, for various goals of practical relevance. We also present a research plan to study modifications of M-SAFEOPT that scale to high dimensions under our monotonicity assumptions.
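The decoupling of objective and safety can be sketched as follows. This is a hypothetical fragment, not the M-SAFEOPT algorithm itself: given precomputed upper confidence bounds for $f$ and $g$ over a discretized action set, the safe set is certified via $g$'s UCB, and the objective is optimized optimistically within it.

```python
import numpy as np

def select_action(ucb_f, ucb_g, h):
    # Safe actions: those whose safety-function UCB is certified
    # below the threshold h.  Among them, pick the action with the
    # highest objective-function UCB (optimism in the face of
    # uncertainty, restricted to the certified safe set).
    safe = ucb_g <= h
    if not safe.any():
        raise ValueError("no certifiably safe action")
    idx = np.flatnonzero(safe)
    return idx[np.argmax(ucb_f[safe])]
```

Note that the action maximizing $f$'s UCB overall may be excluded because $g$ cannot yet certify it as safe; the monotonicity of $g$ in $s$ is what lets the certified region grow quickly enough for sublinear regret.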

Lastly, we consider the issue of robustness in the stochastic linear bandit problem, a special case of the Bayesian optimization setup. Specifically, we assume that the observed function values are subject not only to random noise, but also to adversarial corruption with a limited corruption budget $C$. We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not. Both variants attain near-optimal regret in the uncorrupted case $C = 0$, while in general incurring additional additive regret terms with, respectively, linear and quadratic dependence on $C$. Further, in the contextual setting, we show that a simple greedy algorithm is provably robust, with a near-optimal additive regret term. We also present an extensive experimental evaluation with various adversarial attacks to support our theoretical results.
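For readers unfamiliar with the phased elimination scaffold that the robust variants build on, here is a simplified sketch. It is not the Robust Phased Elimination algorithm: it uses a uniform pull allocation instead of an optimal design, an illustrative confidence width, and no corruption-robust weighting.

```python
import numpy as np

def phased_elimination(arms, pull, phases=6):
    # arms: (K, d) array of feature vectors; pull(i) returns a noisy
    # reward for arm i.  Each phase: pull every active arm m times,
    # fit the parameter vector by least squares, and eliminate arms
    # whose estimated reward falls a confidence width below the best.
    active = list(range(len(arms)))
    for phase in range(phases):
        m = 2 ** phase  # pulls per arm double each phase
        X, y = [], []
        for i in active:
            for _ in range(m):
                X.append(arms[i])
                y.append(pull(i))
        theta, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
        est = arms[active] @ theta
        width = 2.0 / np.sqrt(m)  # illustrative confidence width
        active = [a for a, e in zip(active, est) if e >= est.max() - width]
    return active
```

With corrupted observations, the least-squares estimate itself can be skewed by up to the corruption budget, which is what the robust variants must control; the sketch above conveys only the elimination structure they share.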