PH.D. DEFENCE - PUBLIC SEMINAR

Bandit-style algorithms for wireless network selection

Speaker
Ms Anuja Meetoo Appavoo
Advisors
Dr Seth Lewis Gilbert, Dean's Chair Associate Professor, School of Computing
Dr Tan Kian Lee, Tan Sri Runme Shaw Senior Professor, School of Computing


29 Jul 2020 Wednesday, 02:00 PM to 03:30 PM

Zoom presentation

Join Zoom Meeting
https://nus-sg.zoom.us/j/99888177292?pwd=emxUYU92U3diTWVJekdPS2ozdDgvQT09

Meeting ID: 998 8817 7292
Password: 657984

Abstract:

Multi-armed bandit algorithms are well studied for decision-making in settings with limited information. EXP3 is a leading bandit algorithm for the adversarial bandit setting, with impressive theoretical properties: it is fully decentralized, regret-minimizing, and known to converge to a Nash equilibrium when applied in a multi-player setting. These properties suggest that it should be an excellent solution for distributed resource selection problems. However, its real-world performance lags far behind: it incurs a high switching cost, is slow to "stabilize", fails to deal with transient behaviors, and does not provide good guarantees for periodic behaviors in the environment.
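For reference, the core of EXP3 can be sketched in a few lines: it keeps a weight per arm, samples an arm from a mixture of the normalized weights and uniform exploration, and boosts the chosen arm's weight using an importance-weighted estimate of its observed reward. The sketch below is illustrative only (the function names and rescaling step are our own, and it is not the thesis implementation); rewards are assumed to lie in [0, 1].

```python
import math
import random

def exp3(num_arms, reward_fn, rounds, gamma=0.1):
    """Minimal EXP3 sketch: exponential weights with uniform exploration
    and importance-weighted reward estimates (rewards assumed in [0, 1])."""
    weights = [1.0] * num_arms
    for _ in range(rounds):
        total = sum(weights)
        # Mix the normalized weights with uniform exploration at rate gamma.
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = random.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(arm)          # only the chosen arm's reward is seen
        estimate = reward / probs[arm]   # unbiased estimate; unseen arms get 0
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the weights stay numerically stable over long horizons.
        top = max(weights)
        weights = [w / top for w in weights]
    total = sum(weights)
    return [w / total for w in weights]  # final arm-selection distribution
```

With a stationary reward function the distribution concentrates on the best arm, while the uniform-exploration term keeps every arm's probability at least gamma / num_arms; the practical shortcomings above arise when the environment changes faster than the weights can adapt.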

In this thesis, we develop bandit-style algorithms with significantly better practical performance that do not compromise the good theoretical properties of existing algorithms. We demonstrate that multi-armed bandit algorithms can play an important role in solving distributed resource selection problems once practical concerns are addressed. As a motivating example, we focus on the wireless network selection problem. Mobile devices often have several wireless networks at their disposal. While choosing the right one is vital for good performance, it is non-trivial, and the dynamic nature of the environment makes it even more challenging.

First, we propose Smart EXP3, a fully decentralized bandit-style algorithm with good theoretical and practical performance. We prove that it is regret-minimizing, retains the convergence property of EXP3, and admits an upper bound on the number of network switches. We evaluate it for network selection using simulation with synthetic data, trace-driven simulation, controlled experiments, and in-the-wild experiments. We show that it significantly outperforms EXP3 in practice and allows devices to effectively learn and adapt in dynamic environments. In real-world experiments, it can achieve an 18% faster download than a naive greedy solution.

Second, we develop Co-Bandit, an algorithm that allows devices to occasionally share their observations and forward feedback received from neighbors; hence, feedback may arrive with a delay. Devices perform network selection based on their own observations and the feedback from neighbors. Cooperation helps them speed up each other's rate of learning. We prove that Co-Bandit is regret-minimizing and converges to a Nash equilibrium. Through simulation, we show that a very small amount of information, even when delayed, is adequate to nudge other devices to select the right network, yielding significantly faster stabilization at an optimal state.

Third, we consider algorithms for periodic bandit problems. These problems can be reduced to the problem of bandits with expert advice and solved using EXP4, but EXP4 has a high running time and memory cost. Periodic EXP4 is its computationally efficient variant. With a collaborator, we applied it to network selection, assuming repetitive network behaviors, and observed that it incurs a high switching cost and is slow to learn the pattern. We prove that it converges to a Nash equilibrium. We then develop Smart Periodic EXP4 and demonstrate, via simulation for network selection, that it learns the pattern faster, with less switching.

Overall, we contribute a set of features that can significantly improve the practical performance of bandit algorithms, without affecting their theoretical properties.