PhD DEFENCE - PUBLIC SEMINAR

Towards Human-Centric AI: Inverse Reinforcement Learning Meets Algorithmic Fairness

Speaker
Mr. Sreejith Balakrishnan
Advisors
Dr Low Kian Hsiang, Associate Professor, School of Computing
Dr Harold Soh Soon Hong, Assistant Professor, School of Computing


04 Jul 2023 Tuesday, 04:00 PM to 05:30 PM

MR3, COM2-02-26

Abstract:

AI decision-making systems have been widely deployed in many real-world domains where their outcomes substantially impact society. This has led to a growing emphasis on human-centric AI, the perspective that AI should be designed with the goals and ethical principles of the end user in mind. This thesis focuses on two critical aspects of human-centric AI: value alignment and fairness. Value alignment ensures that the goals and values of AI systems align with those of the human stakeholders. Fairness in algorithmic decisions helps prevent bias and discrimination from propagating through society.

Inverse reinforcement learning (IRL) offers a potential solution to the challenge of value alignment in reinforcement learning. Despite recent advancements, IRL suffers from an unidentifiability problem: multiple reward functions can explain the observed expert behaviour, and the actual reward function cannot be identified without additional domain knowledge or supplementary information. To address this challenge, we adopt the perspective that an IRL algorithm should return a characterization of the reward function space instead of a single solution. The first work presented in this thesis proposes an IRL framework called Bayesian optimization-IRL (BO-IRL), which identifies multiple solutions consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by combining Bayesian optimization with our newly proposed kernel that (a) projects the parameters of policy-invariant reward functions to a single point in a latent space and (b) ensures that nearby points in the latent space correspond to reward functions yielding similar likelihoods. This projection allows standard stationary kernels to be used in the latent space to capture the correlations present across the reward function space. Empirical results on synthetic and real-world environments (model-free and model-based) show that BO-IRL discovers multiple reward functions while minimizing the number of expensive exact policy optimizations.
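
To make the kernel idea concrete, the following is a minimal sketch of a BO-IRL-style loop: reward parameters are first mapped by a projection into a latent space in which policy-invariant rewards coincide, and a standard stationary kernel is applied in that latent space to drive Bayesian optimization of the demonstration likelihood. The toy one-dimensional reward space, the rho_projection and demo_log_likelihood functions, and the UCB acquisition are illustrative assumptions, not the thesis implementation.

# Minimal sketch of the BO-IRL idea (illustrative only; rho_projection,
# demo_log_likelihood, and the toy 1-D reward space are assumptions,
# not the thesis implementation).
import numpy as np

rng = np.random.default_rng(0)

def rho_projection(theta):
    """Project reward parameters to a latent point such that policy-invariant
    rewards coincide (here: theta and theta + 1 map to the same point)."""
    return np.array([np.sin(2 * np.pi * theta), np.cos(2 * np.pi * theta)])

def demo_log_likelihood(theta):
    """Stand-in for the expensive demonstration likelihood, which in BO-IRL
    requires (approximate) policy optimization for each reward."""
    z = rho_projection(theta)
    return -np.sum((z - rho_projection(0.3)) ** 2)  # peaks at theta = 0.3 (mod 1)

def rho_rbf_kernel(thetas_a, thetas_b, lengthscale=0.5):
    """Stationary RBF kernel applied in the latent space of the projection."""
    A = np.stack([rho_projection(t) for t in thetas_a])
    B = np.stack([rho_projection(t) for t in thetas_b])
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """Plain GP regression with the latent-space kernel."""
    K = rho_rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rho_rbf_kernel(X, Xs)
    Kss = rho_rbf_kernel(Xs, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = np.diag(Kss - Ks.T @ Kinv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

candidates = np.linspace(0.0, 2.0, 200)          # reward-parameter grid
X = list(rng.uniform(0, 2, size=3))              # initial random evaluations
y = [demo_log_likelihood(t) for t in X]

for _ in range(15):                              # Bayesian optimization loop
    mu, sd = gp_posterior(np.array(X), np.array(y), candidates)
    theta_next = candidates[np.argmax(mu + 2.0 * sd)]   # UCB acquisition
    X.append(theta_next)
    y.append(demo_log_likelihood(theta_next))

best = X[int(np.argmax(y))]
print(f"best theta found: {best:.3f} (0.3 and 1.3 are equally valid solutions)")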

In our second work, we explore algorithmic fairness, which focuses on incorporating fairness into AI-generated decisions. Our work is motivated by the realization that there is no universal definition of fairness. Furthermore, stakeholders and policy-makers often disagree on the suitability of a fairness principle for a given scenario and are unable to anticipate the outcomes of decisions made under a given principle. To this end, we propose SCALES, a general framework that translates various fairness principles into optimal decisions by representing them as elements of SCALES-CMDP, a new variant of the Constrained Markov Decision Process (CMDP). With the help of causal language, our framework can place constraints on both the procedure of decision-making (procedural fairness) and the outcomes resulting from decisions (outcome fairness). Specifically, we show that fairness principles can be translated into a combination of a utility component, a non-causal component, and a causal component, which can, in turn, be mapped to a SCALES-CMDP. SCALES also serves as an impact analysis tool that provides insights into decisions made under various fairness principles. We illustrate SCALES using a set of case studies involving a simulated healthcare scenario and the real-world COMPAS dataset. Our experiments in single-step and sequential decision-making scenarios demonstrate that our framework produces fair policies that abide by the constraints imposed by various fairness principles.
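
As a concrete illustration of mapping a fairness principle to a constrained decision problem, the short sketch below encodes a one-step decision with a statistical-parity-style (non-causal) constraint as a linear program. The per-group utilities, population fractions, and parity tolerance are hypothetical values chosen for illustration; the actual SCALES-CMDP formulation is more general and handles causal and sequential constraints as well.

# Toy one-step constrained decision problem in the spirit of SCALES
# (illustrative assumption, not the thesis formulation).
import numpy as np
from scipy.optimize import linprog

# Two groups; decision variable x_g = probability of a positive decision for group g.
utility = np.array([0.8, -0.2])     # assumed net utility of a positive decision per group
group_frac = np.array([0.6, 0.4])   # assumed population fractions
eps = 0.05                          # allowed gap in acceptance rates

# Maximize sum_g group_frac[g] * utility[g] * x_g; linprog minimizes, so negate.
c = -(group_frac * utility)

# Non-causal (statistical-parity-style) constraint: |x_0 - x_1| <= eps,
# written as two linear inequalities.
A_ub = np.array([[1.0, -1.0],
                 [-1.0, 1.0]])
b_ub = np.array([eps, eps])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, 1.0), (0.0, 1.0)])
print("constrained acceptance probabilities per group:", res.x)
# Without the parity constraint, the optimum would exclude group 1 entirely;
# with it, the solver returns approximately [1.00, 0.95].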

Our final work unifies the concepts of value alignment and fairness by extending the IRL problem to scenarios where the expert agent follows one of K given fairness principles that is hidden from the learner. The unidentifiability problem is exacerbated in this setting due to the presence of multiple valid combinations of reward functions and fairness principles that can make the expert's trajectories optimal. To tackle this challenge, we propose FAIR-BOIRL, a new BO-based IRL algorithm inspired by BO-IRL that sequentially searches for valid solutions across both the reward function space and the fairness principles. To efficiently allocate the evaluation budget, FAIR-BOIRL uses IM-GPTS, a novel acquisition function that prioritizes fairness principles that are more likely to explain the observed behaviour while disregarding those that are less promising. IM-GPTS uses an implicit multi-armed bandit strategy to identify the most promising reward function and the corresponding fairness principle at each iteration of FAIR-BOIRL. Built on the popular Gaussian Process Thompson Sampling (GP-TS), IM-GPTS naturally trades off exploration and exploitation across both the reward function space and the fairness principles. Our experimental results indicate a significant improvement in sample efficiency under FAIR-BOIRL, with the acquisition function allocating most of the evaluation budget to fairness principles that are more likely to generate the given expert trajectories.
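
The sketch below illustrates the implicit multi-armed bandit flavour of IM-GPTS using plain Gaussian-process Thompson sampling: one GP is maintained per candidate fairness principle, a posterior sample is drawn from each, and the expensive likelihood is evaluated only for the best sampled (principle, reward) pair. The toy likelihood functions and one-dimensional reward space are assumptions for illustration only, not the thesis implementation.

# Toy GP Thompson sampling across fairness principles, in the spirit of IM-GPTS
# (likelihoods and reward space are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(1)
K = 2                                   # number of candidate fairness principles
grid = np.linspace(-1, 1, 100)          # candidate reward parameters

def demo_likelihood(k, theta):
    """Stand-in for the expensive demonstration likelihood under principle k;
    principle 0 is the 'true' one and attains a higher optimum."""
    peak = 0.4 if k == 0 else -0.6
    scale = 1.0 if k == 0 else 0.4
    return scale * np.exp(-((theta - peak) ** 2) / 0.05)

def rbf(A, B, ls=0.2):
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / ls**2)

def posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and covariance over the candidate grid."""
    K_ = rbf(X, X) + noise * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    Kinv = np.linalg.inv(K_)
    mu = Ks.T @ Kinv @ y
    cov = Kss - Ks.T @ Kinv @ Ks
    return mu, cov + 1e-8 * np.eye(len(Xs))

# One GP per fairness principle, seeded with a single random evaluation each.
data = {k: ([rng.uniform(-1, 1)], []) for k in range(K)}
for k in range(K):
    data[k][1].append(demo_likelihood(k, data[k][0][0]))

counts = np.zeros(K, dtype=int)
for _ in range(30):
    # Thompson sampling: draw one posterior sample per principle and pick the
    # (principle, reward) pair whose sampled likelihood is highest.
    best_val, best_k, best_theta = -np.inf, None, None
    for k in range(K):
        X, y = np.array(data[k][0]), np.array(data[k][1])
        mu, cov = posterior(X, y, grid)
        sample = rng.multivariate_normal(mu, cov)
        i = int(np.argmax(sample))
        if sample[i] > best_val:
            best_val, best_k, best_theta = sample[i], k, grid[i]
    # Evaluate the expensive likelihood only for the selected pair.
    data[best_k][0].append(best_theta)
    data[best_k][1].append(demo_likelihood(best_k, best_theta))
    counts[best_k] += 1

# The evaluation budget typically skews toward principle 0, the more promising one.
print("evaluations per fairness principle:", counts)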