Applications of Domain Divergences for Domain Adaptation In NLP

Mr. Abhinav Ramesh Kashyap
Dr Kan Min Yen, Associate Professor, School of Computing

23 Mar 2023 Thursday, 10:00 AM to 11:30 AM

MR20, COM3-02-59


Machine learning models must work under different conditions if they are to be deployed. The absence of robustness to varying conditions has adverse effects in the real world: autonomous cars crash more often, models make adverse legal decisions against minorities, and language technologies are effective for only the major languages of the world. The lack of robustness arises because machine learning models trained under certain input distributions are not guaranteed to work under a different distribution. An important step towards making them work under different distributions is to measure how different the two distributions are. We can use the mathematical tool of domain divergence to quantify this difference. Understanding and applying divergence measures in novel ways is an important avenue for making machine learning models useful.

In this thesis, we explore the different applications of divergence measures, with a special interest in adapting NLP models to new inputs that arise naturally. We first identify the different divergence measures that are used within Natural Language Processing (NLP) and provide a taxonomy. Further, we identify three applications of divergences and make contributions along each: 1) Making Decisions in the Wild - helping practitioners predict the performance drop of a model under a new distribution; 2) Learning Representations - aligning source and target domain representations for novel applications; 3) Inspecting Model Internals - understanding the inherent robustness of models under new distributions.

For the first application, we performed a large-scale correlational study of how well different divergence measures track the drop in model performance. Along the way, we compare whether divergence measures based on traditional word-level distributions are more reliable than those based on contextual word representations from pretrained language models. Based on our study, we recommend the divergence measures that best predict performance drop.
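
As a toy illustration of what a divergence over word-level distributions looks like, the sketch below computes the Jensen-Shannon divergence between the unigram distributions of two small token lists. It is only an example: the abstract does not say which measures the thesis studies, and the function names and sample sentences here are hypothetical.

```python
import math
from collections import Counter


def word_distribution(tokens, vocab):
    """Relative frequency of each vocabulary word in `tokens`."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total for w in vocab]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical toy "domains"; real studies use large corpora.
source = "the patient was given aspirin".split()
target = "the stock was given a rating".split()
vocab = sorted(set(source) | set(target))

p = word_distribution(source, vocab)
q = word_distribution(target, vocab)
print(js_divergence(p, q))
```

A value near 0 means the two word distributions are nearly identical, and a value of 1 means they share no words at all.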

In the second application, we employ machine learning models that reduce the divergence between two domains to enable the generation of sentences across domains. Further, we enhance the model to produce sentences that satisfy certain linguistic constraints, with downstream applications to domain adaptation in NLP.

In the third application, we apply divergence measures to inspect the internals of pretrained language models (PLMs). PLMs are often considered more robust because of their large number of parameters and the large-scale data on which they are trained. We use divergence measures to understand the robustness of PLM representations across different domains.

As the final part of the thesis, we build on the second application and propose a method that applies divergence measures in a parameter-efficient manner for domain adaptation in NLP. Our method follows a two-step process: it first extracts domain-invariant representations by reducing the divergence between the two domains, and then reduces a task-specific loss on labelled data in the source domain.

In summary, this thesis contributes to the application of domain divergences in novel ways to improve domain adaptation in NLP. We conclude with the limitations of our work and future avenues for research.

I am a fifth-year PhD student at the National University of Singapore. I recently joined ASUS-AICS in Singapore as an AI Scientist. My research is in natural language processing.

I am interested in making NLP models robust across different domains, a problem also called domain adaptation. I am particularly interested in using unsupervised data to enable domain adaptation, especially in scenarios where little data is available (low resource). As part of my bigger vision to enable practical methods for NLP, I have recently been interested in methods that achieve domain adaptation inexpensively, using modular, parameter-efficient methods such as adapters. I am also interested in making machine learning and NLP work in critical domains such as clinician notes and electronic health records.