Data-Efficient Machine Learning with Multiple Output Types and High Input Dimensions
Dr Mohan Kankanhalli, Provost's Chair Professor, School of Computing
Monday, 20 Nov 2017, 01:00 PM to 02:30 PM
Executive Classroom, COM2-04-02, COM2 Level 4
Recent research in machine learning (ML) has focused on learning target variables of interest so as to achieve competitive (or state-of-the-art) predictive performance in less time and without requiring large quantities of data, an approach known as data-efficient ML. This thesis focuses on two important data-efficient ML approaches: active learning (AL) and Bayesian optimization (BO), which, instead of learning passively from a given small set of data, select and gather the most informative observations for learning the target variables of interest more accurately under given budget constraints. To advance the state of the art of data-efficient ML, this thesis proposes novel generalizations of AL and BO algorithms that address the issues arising from multiple output types and high input dimensions, both of which are practical settings in many real-world applications.
In particular, this thesis aims to (a) exploit auxiliary types of outputs that usually coexist and correlate well with the target output type and, more importantly, are less noisy and/or less tedious to sample, so as to improve the learning performance of the target output type in both AL and BO algorithms, and (b) scale up the state-of-the-art BO algorithm to high input dimensions. To achieve this, data with multiple output types or high input dimensions is represented using some form of Gaussian process (GP)-based probabilistic regression model, which allows the predictive uncertainty of the outputs to be formally quantified and consequently exploited for developing efficient AL and BO algorithms.
To achieve the above objectives, an AL algorithm for the multi-output GP (MOGP) model is first developed to minimize the predictive uncertainty (i.e., posterior joint entropy) of the target output type. In contrast to existing works, our AL problems involve selecting not just the most informative sampling inputs to be observed but also the types of outputs at each selected input, so as to improve the learning performance of only the target output type given a sampling budget. Unfortunately, such an entropy criterion scales poorly in the number of candidate sampling inputs and selected observations when optimized. To resolve this issue, we exploit a structure common to sparse MOGP models to derive a novel AL criterion. Furthermore, we exploit a relaxed form of submodularity of our new criterion to devise a polynomial-time approximation algorithm that guarantees a constant-factor approximation of the criterion value achieved by the optimal set of selected observations. Empirical evaluation on real-world datasets shows that our proposed approach outperforms existing algorithms for AL with MOGP and single-output GP models.
Secondly, to boost BO performance by exploiting the cheaper and/or less noisy observations of auxiliary functions with varying fidelities, we propose a novel generalization of predictive entropy search (PES) for multi-fidelity BO, called multi-fidelity PES (MF-PES). In contrast to existing multi-fidelity BO algorithms, MF-PES naturally trades off exploitation and exploration over the target and auxiliary functions with varying fidelities without needing any manually tuned trade-off parameters. To achieve this, we first model the unknown target and auxiliary functions jointly as a convolved MOGP (CMOGP), whose convolutional structure is then exploited to derive an efficient approximation of MF-PES. Empirical evaluation on synthetic and real-world experiments shows that MF-PES outperforms state-of-the-art multi-fidelity BO algorithms.
Lastly, to improve BO performance in real-world applications with high input dimensions (e.g., computer vision, biology), we generalize PES to high-dimensional BO by exploiting an additive structure of the target function. New practical constraints are proposed and approximated efficiently such that the resulting acquisition function of additive PES (add-PES) can be optimized independently for each local, low-dimensional input component. Empirical results show that add-PES considerably outperforms the state-of-the-art high-dimensional BO algorithms while using a single, simple setting across the tested functions with varying input dimensions, making it a superior alternative to existing high-dimensional BO algorithms.