Learning to Take Decisions under Incomplete Information: From Bandits to Markov Decision Processes

Mr Debabrota Basu
Dr Stephane Bressan, Associate Professor, School of Computing

Wednesday, 13 Jun 2018, 10:00 AM to 11:30 AM

Executive Classroom, COM2-04-02


Intelligence is the ability to accumulate information, process that information into general constructions, and learn these constructions to adapt to the environment. The components of intelligence, namely information accumulation, processing, and learning, lead to efficient and effective decision making. When embedding these components in a reinforcement learning algorithm, the reward function and the underlying dynamics of the decision-making process are often not known a priori. Thus, the problems of learning by exploration, optimising decisions by exploitation, and balancing exploration and exploitation take centre stage in the design of a reinforcement learning algorithm.
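The exploration-exploitation trade-off described above can be illustrated with a minimal sketch of an epsilon-greedy policy on a Bernoulli multi-armed bandit. This is an illustrative textbook strategy, not one of the algorithms developed in the thesis; the arm means and parameters below are assumptions chosen for the example.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=10_000, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy policy.

    With probability epsilon the agent explores (pulls a random arm);
    otherwise it exploits the arm with the highest empirical mean so far.
    Returns the average reward over the horizon.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    totals = [0.0] * k    # cumulative reward per arm
    reward_sum = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: random arm
        else:
            # exploit: arm with the best empirical mean (unpulled arms first)
            est = [totals[i] / counts[i] if counts[i] else float("inf")
                   for i in range(k)]
            arm = max(range(k), key=est.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return reward_sum / horizon

# With enough pulls, the average reward approaches the best arm's mean,
# discounted by the fixed cost of continued exploration.
avg = epsilon_greedy([0.2, 0.5, 0.9])
```

The fixed exploration rate here caps the achievable average reward below the best arm's mean; more refined strategies, such as those with decaying exploration or confidence bounds, shrink this gap as the horizon grows.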

We address these issues mathematically, algorithmically, and experimentally through problem models, solution methodologies, and real-life applications respectively. We investigate multi-armed bandits and Markov decision processes as the problem models. We use online functional approximation and optimisation, and information geometry, as the solution methodologies. We apply the developed methodologies to real-life problems such as automated database tuning, energy- and performance-efficient live virtual machine migration in clouds, speed optimisation of ships, and online scheduling of jobs arriving in queues to servers.

We are now extending the information-geometric framework, which addresses information accumulation, processing, and learning, to Markov decision processes.