Towards Robust Deep Learning With Real-World Applications
Abstract:
Deep learning models have achieved tremendous success in recent years, but they are known to lack robustness in test environments that differ from their training data distribution. This is a significant bottleneck in the deployment of such models to real-world applications such as healthcare and genomics.
In Part I of this thesis, we consider the problem of accurately detecting mutations in cancer genomes using deep learning, a task traditionally performed with hand-engineered statistical models. The lack of large, labeled datasets has been a major obstacle to applying deep learning successfully in cancer genomics. Here, we create a large-scale, high-quality pseudo-labeled training dataset with balanced representation across multiple cancer types. We further design new input representations of sequencing reads and customized model architectures for cancer mutation detection using matched tumor and normal samples. Our method, VarNet ("Variant Network"), demonstrates robust performance, often exceeding current state-of-the-art methods, across a range of independent real-tumor benchmarks. We further explore methods to maintain the robustness of VarNet on clinical FFPE (formalin-fixed, paraffin-embedded) tumor samples.
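As a purely illustrative sketch of what an input representation for matched tumor and normal samples could look like, the snippet below one-hot encodes read pileups around a candidate site and stacks the two samples channel-wise. The window size, depth cap, and channel layout are assumptions for illustration, not VarNet's actual encoding.

```python
# Illustrative sketch only: one plausible way to encode matched tumor/normal read
# pileups around a candidate mutation site as a multi-channel tensor for a CNN.
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode_pileup(reads, window=64, max_depth=100):
    """One-hot encode a pileup of aligned read sequences into (max_depth, window, 4)."""
    tensor = np.zeros((max_depth, window, 4), dtype=np.float32)
    for i, read in enumerate(reads[:max_depth]):
        for j, base in enumerate(read[:window]):
            if base in BASES:
                tensor[i, j, BASES[base]] = 1.0
    return tensor

def encode_site(tumor_reads, normal_reads):
    """Stack tumor and normal pileups along the channel axis -> (max_depth, window, 8)."""
    return np.concatenate([encode_pileup(tumor_reads), encode_pileup(normal_reads)], axis=-1)
```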
In Part II of this thesis, we consider a Fourier perspective on the robustness of deep learning models. Recent work has shown that deep neural networks (DNNs) latch on to the frequency statistics of training data and can suffer significant performance drops when these statistics change at test time. Understanding and modifying the Fourier-sensitivity of deep learning models would help improve their generalization ability. Hence, we propose the first principled measure of the Fourier-sensitivity of any differentiable model, computed using the unitary Fourier transform of its input-gradient. We observe that modern computer vision models are consistently sensitive to particular frequencies that depend on the dataset, training method, and architecture. Based on this measure, we further propose a novel Fourier-regularization framework that can arbitrarily modify the frequency bias of models. Next, we consider the challenge of overcoming real-world distribution shifts from a Fourier perspective. Motivated by observations of the sensitivity of DNNs to the frequency statistics of data, we propose Fourier Moment Matching (FMM), a model-agnostic input transformation that matches the Fourier-amplitude statistics of source data to those of target data using unlabeled samples. We demonstrate through extensive empirical evaluations that FMM is effective both on its own and when combined with a variety of existing unsupervised domain adaptation methods across multiple real-world applications.
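The sketch below illustrates, in PyTorch, one way these two Fourier-based ideas could be realized: a sensitivity measure obtained from the unitary (norm="ortho") FFT of the input-gradient, and an FMM-style transform that rescales the per-frequency amplitudes of source inputs toward target statistics while preserving phase. The function names, the choice of cross-entropy loss, and matching only the first moment are assumptions, not the thesis's exact formulations.

```python
# Illustrative sketch only (not the thesis implementation).
import torch
import torch.nn.functional as F

def fourier_sensitivity(model, x, y):
    """Magnitude of the unitary 2D FFT of the loss input-gradient, per frequency."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)               # input-gradient, shape (B, C, H, W)
    spec = torch.fft.fft2(grad, norm="ortho")          # "ortho" = unitary transform
    return spec.abs().mean(dim=(0, 1))                 # average over batch and channels

def fourier_moment_match(x_src, amp_src_mean, amp_tgt_mean, eps=1e-8):
    """Rescale each source image's per-frequency amplitude toward the target
    amplitude statistics; the phase is left unchanged."""
    spec = torch.fft.fft2(x_src, norm="ortho")
    amp, phase = spec.abs(), spec.angle()
    amp = amp * (amp_tgt_mean / (amp_src_mean + eps))  # first-moment matching per frequency
    spec = torch.polar(amp, phase)
    return torch.fft.ifft2(spec, norm="ortho").real
```

In such a scheme, amp_src_mean and amp_tgt_mean would be estimated as the mean Fourier amplitudes over source samples and unlabeled target samples, respectively.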
In Part III of this thesis, we consider representation learning for robustness. A fundamental challenge in machine learning is learning representations that enable generalization to distributions different from the training distribution. Empirical Risk Minimization (ERM), the predominant learning principle, is known to underperform on minority sub-populations and to generalize poorly to unseen test domains. In this work, we propose a novel learning principle called Uniform Risk Minimization (URM), based on a balanced risk measure we call uniform risk. Rather than considering the risk over the training distribution alone, as in ERM, uniform risk considers the expected risk over all test distributions with the same support as the training distribution. Strikingly, we show that uniform training data distributions and feature spaces are an optimal choice for lowering uniform risk and supporting generalization. Hence, we propose a representation learning method that learns uniformly distributed feature spaces to improve the robustness of deep neural networks. We empirically demonstrate the efficacy of our method on benchmarks of sub-population shift and domain generalization. We show that URM is competitive with the best existing methods that assume knowledge of the specific generalization task, and that it can also be combined with them for improved robustness.
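To make the contrast with ERM concrete, one way to formalize this notion is sketched below; the notation (P for the training distribution, Q for a test distribution, and the loss and prior over Q) is our assumption, not the thesis's exact definition.

```latex
% ERM minimizes risk under the training distribution P; uniform risk instead
% averages risk over all test distributions Q sharing P's support, under a
% symmetric (e.g. uniform) prior over such Q.
R_{\mathrm{ERM}}(f)  = \mathbb{E}_{(x,y)\sim P}\,\ell\big(f(x),y\big),
\qquad
R_{\mathrm{unif}}(f) = \mathbb{E}_{Q:\,\mathrm{supp}(Q)=\mathrm{supp}(P)}
                       \Big[\,\mathbb{E}_{(x,y)\sim Q}\,\ell\big(f(x),y\big)\Big].
```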
In conclusion, this thesis proposes a variety of methods for robust deep learning across diverse domains such as genomics, computer vision, time series, and audio.