PH.D. DEFENCE - PUBLIC SEMINAR

Automated Kernel Selection for Gaussian Processes on Large Datasets

Speaker
Ms Teng Tong
Advisor
Dr Low Kian Hsiang, Associate Professor, School of Computing


28 Sep 2021 Tuesday, 02:00 PM to 03:30 PM

Zoom presentation

Abstract:

In the machine learning community, choosing an appropriate model to capture the underlying patterns of a dataset has long been a central concern. In particular, kernel selection is an important sub-problem of model selection for kernel-based models such as the Gaussian process (GP). However, complex data patterns and the vast space of candidate kernels make manually selecting an appropriate kernel infeasible. To this end, automated kernel selection (AKS) algorithms have been developed over the past few years. Despite their appealing performance in discovering better kernels than manually designed ones, selecting an appropriate kernel for large (e.g., million-sized) datasets remains challenging. Moreover, kernel uncertainty, that is, the probability of a kernel being the most appropriate one for a given dataset, has rarely been considered, which may result in overconfident predictions. This thesis aims to develop scalable and efficient AKS algorithms that account for the uncertainty in kernel selection for GP models, and to improve predictive performance under a limited time budget.

As an initial step, a scalable variational Bayesian kernel selection (VBKS) algorithm is developed to account for the uncertainty in kernel selection on large datasets. Instead of selecting a single kernel and ignoring the uncertainty, VBKS treats the kernel as a random variable and learns its posterior belief from data via variational inference (VI). Consequently, through Bayesian model averaging (BMA), predictions exploit multiple kernels for better performance rather than relying on a single one. VBKS requires jointly optimizing a large number of hyperparameters across multiple kernels, which is computationally challenging. To address this, the objective function is decomposed into an additive form over individual kernels, and a stochastic gradient method is then applied to scale the algorithm to large datasets.
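
To make the Bayesian model averaging step concrete, here is a minimal sketch of BMA over GP kernels using scikit-learn. The kernel weights q are illustrative stand-ins for the posterior belief that VBKS learns by variational inference; this is not the thesis implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

# Toy data: noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

# Candidate kernels; VBKS learns a posterior belief over these by VI,
# here we simply assign illustrative weights.
kernels = [RBF(), Matern(nu=1.5), RationalQuadratic()]
gps = [GaussianProcessRegressor(kernel=k, alpha=0.01).fit(X, y) for k in kernels]

# Hypothetical posterior belief q(kernel); must sum to 1.
q = np.array([0.5, 0.3, 0.2])

# BMA predictive mean: weighted average of the per-kernel predictive means,
# rather than committing to a single kernel's prediction.
X_test = np.linspace(0, 5, 100).reshape(-1, 1)
means = np.stack([gp.predict(X_test) for gp in gps])  # shape (3, 100)
bma_mean = q @ means
```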

Evaluating a kernel on a large dataset still takes a long time even with the stochastic gradient method. To further improve time efficiency, an early-pruning algorithm for AKS (EP-AKS) is designed. Starting from a large set of GP models specified by different kernels, the algorithm terminates the training of under-performing models at an early stage. To trade off pruning effectiveness against time efficiency, the partially observed learning curves from the iterative training procedure are exploited to predict each kernel's performance upon convergence. As the kernels are inherently correlated, a multi-output Gaussian process (MOGP) model is adopted to jointly model the learning curves of multiple kernels and the similarities among kernels. EP-AKS significantly reduces the incurred time while preserving predictive performance on test data comparable to that obtained without pruning.
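
The overall pruning loop can be sketched as below. The power-law extrapolator is a simple stand-in for the thesis's MOGP learning-curve model, and train_one_epoch, the convergence horizon, and the keep fraction are hypothetical.

```python
import numpy as np

def predict_converged_loss(curve):
    """Stand-in for the MOGP learning-curve model: extrapolate the
    partial curve with a least-squares power-law fit loss(t) ~ a * t^b
    (assumes positive, decreasing losses)."""
    t = np.arange(1, len(curve) + 1)
    b, log_a = np.polyfit(np.log(t), np.log(curve), 1)
    horizon = 10 * len(curve)  # hypothetical convergence horizon
    return np.exp(log_a) * horizon ** b

def early_pruning(models, train_one_epoch, epochs=100, check_every=10, keep=0.5):
    """Train all candidate GP models; periodically prune those whose
    predicted converged loss falls in the worse half.
    `train_one_epoch(model)` is a hypothetical callback that performs
    one training step and returns the model's current loss."""
    curves = {m: [] for m in models}
    active = list(models)
    for epoch in range(1, epochs + 1):
        for m in active:
            curves[m].append(train_one_epoch(m))
        if epoch % check_every == 0 and len(active) > 1:
            preds = {m: predict_converged_loss(np.array(curves[m])) for m in active}
            cutoff = np.quantile(list(preds.values()), keep)
            active = [m for m in active if preds[m] <= cutoff]
    return active, curves
```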

To explore the vast kernel space more efficiently, the F-Vec algorithm is designed to identify potentially good kernels at a low cost. Unlike the existing algorithm, which explores the space based on approximated distances among kernels, F-Vec maps kernels from their original functional forms into feature vectors. For the comparison, we investigate various metrics for approximating kernel distance beyond the one used in the literature. We show empirically that F-Vec leads to a more efficient exploration than the distance-based algorithm under every metric we investigated.
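
One plausible reading of the feature-vector idea (an illustrative interpretation, not necessarily the thesis's exact embedding): evaluate each candidate kernel on a fixed probe set of inputs, flatten the resulting Gram matrix into a vector, and compare kernels with cheap vector metrics instead of costly model training.

```python
import numpy as np
from scipy.spatial.distance import cosine, euclidean
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic

# Fixed probe inputs shared by all kernels (a hypothetical choice).
rng = np.random.default_rng(0)
probes = rng.uniform(0, 5, size=(30, 1))

def kernel_to_vector(kernel, probes):
    """Embed a kernel (a function) as a vector: the entries of its
    Gram matrix on the shared probe set, flattened."""
    return kernel(probes).ravel()

vecs = {name: kernel_to_vector(k, probes)
        for name, k in [("RBF", RBF()),
                        ("Matern", Matern(nu=1.5)),
                        ("RQ", RationalQuadratic())]}

# Compare kernels under several candidate metrics; these vector
# operations are far cheaper than training a GP per kernel.
for m_name, metric in [("euclidean", euclidean), ("cosine", cosine)]:
    d = metric(vecs["RBF"], vecs["Matern"])
    print(f"{m_name} distance, RBF vs Matern: {d:.3f}")
```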