Context-Dependent Deep Learning for Affective Computing
 
Abstract:
Affective Computing is an interdisciplinary field concerned with enabling machines to detect emotions, respond intelligently to the emotions they recognize, and express emotions. While all these capabilities are equally important, this thesis explores the first step: detecting emotions. Emotion recognition has been performed using different modalities such as images (faces, body gestures, etc.), text (chats, tweets, etc.), speech (conversations, etc.), and physiological signals (GSR, PPG, etc.), and has been applied in wide-ranging domains such as healthcare, customer service, and digital assistants. While current models trained on large amounts of data can deliver significant improvements in performance, the over-reliance of these deep-learning models on the input data can have negative consequences. These include, but are not limited to, confusion between closely related emotion classes, difficulty generalizing to other domains or to groups under-represented in the data, and suppression of intra-class variations. We hypothesize that incorporating additional context can give these models further insight into the task or input, and thereby help improve their robustness under these challenging conditions.
The first problem in focus is fine-grained emotion recognition. The majority of current emotion recognition systems recognize only a small set of six to eight emotions. In reality, however, humans recognize and express a broad spectrum of emotions. Fine-grained classification involves finding subtle differences between semantically similar classes, which makes it a challenging problem for deep-learning models. In this thesis, we aim to develop effective techniques that help pre-trained language models perform this task better. To this end, we first show how external knowledge can be used as context. We introduce Knowledge-Embedded Attention (KEA), which uses knowledge from emotion lexicons to augment the contextual representations from pre-trained language models via attention. KEA is found to be effective in recognizing fine-grained emotions across several datasets; however, its effectiveness depends on the type of knowledge being used. Therefore, in our next work on fine-grained classification, we develop a more general approach in which we guide the model's learning strategy toward distinguishing these confusable labels. We develop Label-aware Contrastive Loss (LCL), which adaptively embeds inter-class relationships into a contrastive objective function so that closely confusable classes, such as angry and furious, are weighted more heavily than far-apart classes, such as angry and sad. Both proposed approaches for fine-grained classification are evaluated against representative baselines on multiple datasets.
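The idea of weighting confusable classes more heavily inside a contrastive objective can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the thesis's implementation: the function name, the `class_weights` input (imagined here as per-sample class probabilities from some auxiliary weighting model), the temperature, and the toy embeddings are all hypothetical.

```python
import numpy as np

def label_aware_contrastive_loss(emb, labels, class_weights, temperature=0.1):
    """Weighted supervised contrastive loss (illustrative sketch only).

    emb:           (n, d) sample embeddings
    labels:        (n,) integer class labels
    class_weights: (n, c) positive weights; row i says how much sample i
                   should attend to each class (an assumed input)
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    n = emb.shape[0]
    sim = emb @ emb.T / temperature
    not_self = ~np.eye(n, dtype=bool)
    # w[i, k] = weight that sample i assigns to the class of sample k
    w = class_weights[:, labels]
    weighted = w * np.exp(sim) * not_self
    denom = weighted.sum(axis=1)
    positives = (labels[:, None] == labels[None, :]) & not_self
    losses = []
    for i in range(n):
        pos = np.flatnonzero(positives[i])
        if pos.size == 0:
            continue  # no positive pair for this anchor
        log_ratio = np.log(weighted[i, pos]) - np.log(denom[i])
        losses.append(-log_ratio.mean())
    return float(np.mean(losses)) if losses else 0.0

# Toy batch: classes 0 = angry, 1 = furious, 2 = sad (hypothetical embeddings).
emb = np.array([[1.0, 0.0], [0.9, 0.1],     # angry
                [0.8, 0.3], [0.85, 0.2],    # furious (close to angry)
                [-1.0, 0.0], [-0.9, 0.1]])  # sad (far from angry)
labels = np.array([0, 0, 1, 1, 2, 2])

uniform = np.full((6, 3), 1.0 / 3.0)
confusable = uniform.copy()
confusable[:2] = [0.3, 0.6, 0.1]  # angry anchors up-weight the furious class

loss_uniform = label_aware_contrastive_loss(emb, labels, uniform)
loss_confusable = label_aware_contrastive_loss(emb, labels, confusable)
```

With uniform weights the weights cancel and the expression reduces to an ordinary supervised contrastive loss. Up-weighting the nearby "furious" class for the "angry" anchors enlarges the penalty from those negatives, which is the behavior the paragraph above describes.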
In the next part of the thesis, we explore fairness in emotion recognition models. Machine learning models automatically learn discriminative features from the data and are therefore susceptible to learning strongly correlated biases, such as relying on protected attributes like gender and race. We propose to mitigate bias by explicitly guiding the model's focus toward task-relevant features using domain knowledge, and we hypothesize that this can indirectly reduce the model's dependence on spurious correlations learned from the data. We explore bias mitigation in facial expression recognition systems using facial Action Units (AUs) as the task-relevant features. We introduce Feature-based Positive Matching Contrastive Loss, which learns the distances between the positives of a sample based on the similarity between their corresponding AU embeddings. We compare our approach with representative baselines and show that incorporating task-relevant features via our method can improve model fairness at minimal cost to classification performance.
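One way to picture positives being matched according to AU similarity is the following NumPy sketch. It is only a guess at the general shape of such a loss, not the thesis's formulation: the function name, the softmax weighting of positives by AU cosine similarity, the temperature, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def au_weighted_positive_loss(emb, labels, au_emb, temperature=0.1):
    """Contrastive loss whose positives are weighted by AU similarity.

    emb:    (n, d) image embeddings
    labels: (n,) integer expression labels
    au_emb: (n, k) Action Unit embeddings (assumed to be given)
    """
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    au = au_emb / np.linalg.norm(au_emb, axis=1, keepdims=True)
    n = emb.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    logits = emb @ emb.T / temperature
    # Log-probability of each other sample under a softmax over the batch.
    exp_logits = np.exp(logits) * not_self
    log_prob = logits - np.log(exp_logits.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    au_sim = au @ au.T
    losses = []
    for i in range(n):
        pos = np.flatnonzero(positives[i])
        if pos.size == 0:
            continue  # no positive pair for this anchor
        # Positives with more similar AU embeddings get a larger weight.
        a = np.exp(au_sim[i, pos] - au_sim[i, pos].max())
        weights = a / a.sum()
        losses.append(-(weights * log_prob[i, pos]).sum())
    return float(np.mean(losses)) if losses else 0.0

# Toy batch with two expression classes (hypothetical values).
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.6, 0.5],
                [-1.0, 0.2], [-0.8, 0.1], [-0.9, -0.3]])
labels = np.array([0, 0, 0, 1, 1, 1])
au_emb = np.array([[1.0, 0.0], [0.9, 0.2], [0.1, 1.0],
                   [0.0, 1.0], [0.2, 0.9], [1.0, 0.1]])

loss_au = au_weighted_positive_loss(emb, labels, au_emb)
loss_flat = au_weighted_positive_loss(emb, labels, np.ones((6, 2)))
```

When every AU embedding is identical, as in `loss_flat`, the weights become uniform and the sketch falls back to averaging over all positives, so the AU term only changes the objective when the task-relevant features actually differ between positives.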
In summary, we find that augmenting standard pre-trained models with relevant context helps improve their downstream performance on two important tasks in Affective Computing: fine-grained emotion recognition and fair facial expression recognition. Finally, we discuss the limitations and future directions of this research.

