PH.D DEFENCE - PUBLIC SEMINAR

Automated Machine Learning: New Advances on Bayesian Optimization

Speaker
Mr Dmitrii Kharkovskii
Advisor
Dr Low Kian Hsiang, Associate Professor, School of Computing


01 Dec 2020 Tuesday, 09:00 AM to 10:30 AM

Zoom presentation

Join Zoom Meeting
https://nus-sg.zoom.us/j/89874082051?pwd=MkpDcnFQOHhuZDQxZW81MmJSRU9wZz09

Meeting ID: 898 7408 2051
Password: 944052

Abstract:

Recent advances in Bayesian optimization (BO) have delivered a promising suite of tools for optimizing an unknown expensive to evaluate black-box objective function with a finite budget of evaluations. A significant advantage of BO is its general formulation: BO can be utilized to optimize any black-box objective function. As a result, BO has been applied in a wide range of applications such as automated machine learning, robotics or environmental monitoring, among others. Furthermore, its general formulation makes BO attractive for deployment in new applications. However, potential new applications can have additional requirements not satisfied by the classical BO setting. In this thesis, we aim to address some of these requirements in order to scale up BO technology for the practical use in new real-world applications.

Firstly, this thesis tackles the problem of data privacy, which is not addressed by the standard setting of BO. Specifically, we consider the outsourced setting where the entity holding the dataset and the entity performing BO are represented by different parties, and the dataset cannot be released non-privately. For example, a hospital holds a dataset of sensitive medical records and outsources the BO task on this dataset to an industrial AI company. We present the private-outsourced-Gaussian process-upper confidence bound (PO-GP-UCB) algorithm, which is the first algorithm for privacy-preserving BO in the outsourced setting with a provable performance guarantee. The key idea of our approach is to make the BO performance of our algorithm similar to that of non-private GP-UCB run using the original dataset, which is achieved by using a random projection-based transformation that preserves both privacy and the pairwise distances between inputs. Our main theoretical contribution is to show that a regret bound similar to that of the standard GP-UCB algorithm can be established for our PO-GP-UCB algorithm. We empirically evaluate the performance of our algorithm with synthetic and real-world datasets.

Secondly, we consider applications of BO for hotspot sampling in spatially varying phenomena. For such applications, we exploit the structure of the spatially varying phenomenon in order to increase the BO lookahead and, as a result, improve the performance of the algorithm and make it more suitable for practical use in real-world scenarios. To do this, we present a principled multi-staged Bayesian sequential decision algorithm for nonmyopic adaptive BO that, in particular, exploits macro-actions for scaling up to a further lookahead to match up to a larger available budget. To achieve this, we first generalize GP-UCB to a new acquisition function defined with respect to a nonmyopic adaptive macro-action policy, which, unfortunately, is intractable to be optimized exactly due to an uncountable set of candidate outputs. The key novel contribution of our work here is to show that it is in fact possible to solve for a nonmyopic adaptive epsilon-Bayes-optimal macro-action BO (epsilon-Macro-BO) policy given an arbitrary user-specified loss bound epsilon via stochastic sampling in each planning stage which requires only a polynomial number of samples in the length of macro-actions. To perform nonmyopic adaptive BO in real time, we then propose an asymptotically optimal anytime variant of our epsilon-Macro-BO algorithm with a performance guarantee. Empirical evaluation on synthetic and real-world datasets shows that our proposed approach outperforms existing state-of-the-art algorithms.

Finally, this thesis proposes a black-box attack for adversarial machine learning based on BO. Since the dimension of the inputs in adversarial learning is usually too high for applying BO directly, our proposed attack applies dimensionality reduction and searches for an adversarial perturbation in a low-dimensional latent space. The key idea of our approach is to automate both the selection of the latent space dimension and the search of the adversarial perturbation in the selected latent space by using BO. Additionally, we use Bayesian optimal stopping to boost the query efficiency of our attack. Performance evaluation using image classification datasets shows that our proposed method outperforms the state-of-the-art black-box adversarial attacks.