An Alternative Information-Theoretic Criterion for Active Learning
13 Nov 2018 Tuesday, 04:30 PM to 06:00 PM
COM2 Level 4
Executive Classroom, COM2-04-02
Mutual information (MI) is a widely used criterion in the machine learning (ML) community, especially for active learning of ML models. However, when MI cannot be evaluated exactly and must be approximated, the approximation can compromise the resulting active learning performance.
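To make the approximation issue concrete, the following is a minimal sketch (our illustration, not code from the thesis) of a standard Monte Carlo approximation of MI for active learning: the MI between a candidate input's label and the model parameters is estimated from posterior samples of the predictive distribution, so its quality depends on the number of samples.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a categorical distribution along the last axis."""
    return -np.sum(p * np.log(p + eps), axis=-1)

def approx_mi(sample_probs):
    """Monte Carlo estimate of MI between a candidate's label and the
    model parameters.

    sample_probs: array of shape (S, C) holding predictive class
    probabilities at the candidate input under S posterior samples
    (hypothetical values for illustration).

    MI is approximated as the entropy of the averaged prediction minus
    the average entropy of the individual predictions.
    """
    mean_p = sample_probs.mean(axis=0)
    return entropy(mean_p) - entropy(sample_probs).mean()

# Posterior samples that disagree about the label yield high estimated MI;
# samples that agree yield (near-)zero MI.
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
agree = np.array([[0.5, 0.5], [0.5, 0.5]])
```

With few posterior samples the estimate is noisy, which is one way such an approximation can mislead the active learner.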
This thesis presents an alternative information-theoretic criterion called MI+ that, interestingly, can be related to MI in its formulation but, unlike MI, can be evaluated exactly and efficiently for active learning of several ML models. We provide rigorous insights and perspectives that relate and differentiate MI and MI+, and discuss the practical implications of their differences to motivate the use of MI+ over MI for active learning. The advantage of MI+ over MI is investigated in three machine learning problems: Bayesian neural networks for regression, inverse reinforcement learning (IRL), and structure discovery for Gaussian processes (GPs).
For Bayesian neural networks in regression, our empirical results show that MI+ matches the performance of a simple uncertainty sampling criterion, which in turn outperforms the approximated MI.

In inverse reinforcement learning, where there is only a single prior work on active learning (using Bayesian IRL with the mean entropy (ME) criterion), our contributions include: (1) devising a Bayesian approach for nonlinear IRL with Gaussian processes (BGPIRL) that is naturally amenable to active learning and flexible enough to model nonlinear reward functions; (2) defining a general active learning problem for IRL that caters to varying realistic experts' preferences, such as batch queries and demonstrated trajectories; (3) identifying the disadvantages of the current ME criterion in both computation and interpretation; and (4) introducing both the MI and MI+ criteria to active learning with BGPIRL, where MI+ can leverage conditional independence to achieve exact and efficient evaluation.

In structure discovery for Gaussian processes, MI+'s exact and efficient evaluation facilitates gradient-based optimization for batch-mode active learning that outperforms the approximate MI as the batch size increases.
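For reference, the uncertainty sampling baseline mentioned above can be sketched as follows (our illustration under assumed names, not the thesis's code): for regression, it simply queries the pool point whose predictive variance, estimated from posterior samples, is largest.

```python
import numpy as np

def select_by_uncertainty(posterior_preds):
    """Uncertainty sampling for regression.

    posterior_preds: array of shape (S, N) -- predictions from S
    posterior samples (e.g. of a Bayesian neural network) at N pool
    inputs (hypothetical values for illustration).

    Returns the index of the pool point with the largest predictive
    variance across posterior samples.
    """
    return int(np.argmax(posterior_preds.var(axis=0)))

# Hypothetical posterior draws at three pool points: the middle point
# has the most disagreement across samples, so it is selected.
preds = np.array([[1.0, 0.0, 2.0],
                  [1.1, 4.0, 2.2],
                  [0.9, 2.0, 1.8]])
```

Despite its simplicity, this criterion requires no approximation of its own, which is one reason it can outperform an approximated MI.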
We hope that these diverse applications of MI+, together with our theoretical results, serve as a guideline for the use of MI+ in other ML problems and, possibly, inspire the design of new active learning criteria.