Explaining and improving Deep Neural Networks via concept-based explanations
Dr Lee Mong Li, Professor, School of Computing
Abstract:
Despite their outstanding performance, Deep Neural Networks (DNNs) remain black boxes to end-users. This opaque nature impedes user trust in DNNs and limits their adoption in the real world, especially in safety-critical domains such as healthcare. Therefore, elucidating the decisions made by DNNs in a way that is readily understandable to all users, including non-experts, is crucial to securing public trust. Concept-based explanations are intuitive because they explain model decisions in terms of representations easily understood by humans, such as super-pixels or word phrases. In addition, the explanations must be descriptive and faithfully explain why a model makes its decisions. In this thesis, we explore two approaches to generating such explanations to rationalize the decisions of DNNs in computer vision, particularly Convolutional Neural Networks (CNNs), and investigate using the insights gained from these explanations to improve model accuracy.
The first approach is to generate post-hoc linguistic explanations to rationalize a trained CNN's decisions, since linguistic explanations are both intuitive and descriptive. However, generating linguistic explanations that describe the features that truly contributed to the model's decision is challenging. We propose a novel framework called FLEX (Faithful Linguistic EXplanations) to address this challenge. We derive a novel way to associate the features responsible for the decision with words, and propose a new decision-relevance metric that measures the faithfulness of a linguistic explanation to the model's reasoning. Experimental results on two benchmark datasets demonstrate that the proposed framework generates explanations that are more faithful than those of state-of-the-art linguistic explanation methods.
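To illustrate the general kind of decision-relevance check involved, the sketch below shows a minimal deletion-based faithfulness test (an assumed, generic technique, not the actual FLEX metric): mask the image regions associated with an explanation's words and measure how much the model's confidence in its original prediction drops. All function and variable names here are hypothetical.

    # Minimal sketch of a deletion-based faithfulness check (illustrative only,
    # not the FLEX decision-relevance metric).
    import numpy as np

    def faithfulness_score(model_prob, image, word_regions):
        """model_prob: callable mapping an image of shape (H, W, 3) to the
        probability of the originally predicted class.
        word_regions: list of boolean (H, W) masks, one per explanation word."""
        baseline = model_prob(image)
        masked = image.copy()
        for region in word_regions:
            # Replace the regions tied to the explanation's words with the mean colour.
            masked[region] = masked.mean(axis=(0, 1))
        # A larger drop suggests the words point at regions the model relied on.
        return baseline - model_prob(masked)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        img = rng.random((8, 8, 3))
        dummy_model = lambda x: float(x[:4, :4].mean())  # toy stand-in "model"
        mask = np.zeros((8, 8), dtype=bool)
        mask[:4, :4] = True
        print(faithfulness_score(dummy_model, img, [mask]))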
In the second approach, we explore developing a CNN with intrinsic interpretability. We introduce the Comprehensible CNN (CCNN), which learns features consistent with human perception by learning the correspondence between visual features and word phrases. CCNN decisions are computed as a linear combination of these features, allowing CCNN to explain its decisions faithfully in word phrases. The proposed model employs an objective function that optimizes both the prediction accuracy and the semantics of the learned features. Experimental results demonstrate that CCNN learns concepts consistent with human perception, together with their contributions to the model's decisions, without compromising accuracy.
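To make the idea of a linear decision layer over concept features concrete, the following PyTorch sketch shows a generic concept-bottleneck-style classifier with a joint objective. The architecture, the concept-alignment term, and the hyper-parameters are illustrative assumptions, not the actual CCNN design.

    # Generic concept-bottleneck-style classifier (illustrative sketch only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConceptClassifier(nn.Module):
        def __init__(self, n_concepts=20, n_classes=10):
            super().__init__()
            self.backbone = nn.Sequential(               # stand-in CNN feature extractor
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(16, n_concepts), nn.Sigmoid(), # concept activations in [0, 1]
            )
            self.decision = nn.Linear(n_concepts, n_classes)  # linear, hence decomposable

        def forward(self, x):
            concepts = self.backbone(x)
            return self.decision(concepts), concepts

    def joint_loss(logits, concepts, labels, concept_labels, alpha=0.5):
        # Classification term plus a term aligning learned concepts with annotations.
        return F.cross_entropy(logits, labels) + \
               alpha * F.binary_cross_entropy(concepts, concept_labels)

    # Per-example explanation: the contribution of concept j to class c is
    # concepts[j] * decision.weight[c, j]; these contributions (plus the bias)
    # sum exactly to the class logit, so the explanation is faithful by construction.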
Faithfully explaining DNNs also allows us to use such explanations to uncover shortcomings of the dataset used to train the model, particularly its under-represented regions. In the third work, we propose a framework that utilizes concept-based explanations to automatically augment the dataset with new images covering these under-represented regions, thereby improving model performance. The framework can use explanations generated by interpretable classifiers as well as by post-hoc explanation methods for black-box classifiers. Experimental results demonstrate that the proposed approach improves classifier accuracy compared to state-of-the-art augmentation strategies.
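The sketch below shows one plausible way such a framework could flag under-represented regions from concept-based explanations: concepts that rarely appear among a class's training images mark the regions to target when adding new images. The coverage threshold and the fetch_images_with_concept helper are hypothetical stand-ins, not the proposed framework's actual procedure.

    # Hypothetical sketch: find rarely covered concepts and augment with images
    # that exhibit them (illustrative only).
    import numpy as np

    def underrepresented_concepts(concept_matrix, threshold=0.1):
        """concept_matrix: (n_images, n_concepts) binary matrix indicating which
        concepts an explanation method attributes to each training image of a class."""
        coverage = concept_matrix.mean(axis=0)      # fraction of images showing each concept
        return np.where(coverage < threshold)[0]    # indices of rarely covered concepts

    def augment_class(concept_matrix, fetch_images_with_concept, per_concept=50):
        new_images = []
        for c in underrepresented_concepts(concept_matrix):
            # fetch_images_with_concept is a stand-in for whatever image source is
            # available (generation, retrieval, re-sampling, ...).
            new_images.extend(fetch_images_with_concept(c, per_concept))
        return new_images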
This thesis explores understanding DNNs via faithful, human-understandable explanations and using the insights gained from such explanations to improve classification accuracy. Improved understanding and performance will bolster public confidence in Artificial Intelligence (AI), leading to the increased adoption of AI systems in the real world.