Transferability of Deep Learning Models
Abstract:
Transferability in deep learning aims to enhance model performance across diverse datasets, domains, and tasks. We first examine Out-of-Distribution (OOD) generalization, which requires a model to generalize to data drawn from distributions unseen during training. We investigate the cross-domain generalization error bound and establish a link to robustness and sharpness. Unlike conventional theoretical guarantees, our approach connects robust OOD algorithms to model sharpness, which measures the smoothness of a solution in geometric and optimization terms. We formally demonstrate that reducing the sharpness of the learned model enhances OOD generalization.
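As an illustrative formalization (the notation here is assumed, not taken from the thesis), sharpness is commonly measured as the worst-case increase of the training loss within a small neighborhood of a solution, and sharpness-reducing training minimizes this perturbed loss:

```latex
% Illustrative sharpness-aware objective (assumed notation):
% L(\theta) is the training loss, \rho > 0 the neighborhood radius.
\min_{\theta} \; \max_{\|\epsilon\|_2 \le \rho} L(\theta + \epsilon)
```

Under this view, a small worst-case loss over the neighborhood corresponds to a flat solution, which is the property linked above to OOD generalization.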
Task adaptation, a cornerstone of transferability, examines a model's ability to bridge the gap between a single-task system and a more versatile intelligence capable of handling diverse challenges. Meta-learning is a promising approach to developing such adaptability by acquiring knowledge from a wide range of related tasks. In contrast to traditional methods such as Empirical Risk Minimization (ERM), meta-learning seeks a solution across multiple tasks that enables faster adaptation to new, unseen tasks.
In this thesis, we focus on the difference between meta-learning and ERM when learning multiple tasks, revealing the principle underlying the fast adaptability of meta-learning. We provide a provable analysis of this principle when optimizing a meta-learning objective function.
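A minimal sketch of the contrast, in common (assumed) notation: ERM minimizes the average task loss directly, whereas gradient-based meta-learning (e.g., in the MAML style) minimizes the loss obtained after a task-specific adaptation step, which is what favors fast adaptation:

```latex
% ERM over tasks i = 1, \dots, T with task losses L_i:
\theta^{\mathrm{ERM}} = \arg\min_{\theta} \; \frac{1}{T} \sum_{i=1}^{T} L_i(\theta)
% Gradient-based meta-learning with inner-loop step size \alpha:
\theta^{\mathrm{meta}} = \arg\min_{\theta} \; \frac{1}{T} \sum_{i=1}^{T} L_i\!\big(\theta - \alpha \nabla L_i(\theta)\big)
```

The meta-learning objective evaluates each task loss at the post-adaptation parameters, so the learned initialization is selected for how well it adapts rather than for its average performance alone.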
Then we consider task adaptation in an OOD setting, which is critical in real-world applications such as autonomous driving and healthcare. Unlike in-distribution task adaptation, where training and test tasks share similar distributions, adapting to OOD tasks requires models to generalize effectively despite substantial distribution shifts. We devise a Mixture-of-Experts (MoE) energy-based meta-learning framework in which each expert specializes in a cluster of similar OOD tasks, so that each target task is routed to its nearest expert, significantly bolstering adaptation.
In conclusion, our research studies OOD generalization, task adaptation with meta-learning, and task adaptation in an OOD setting as three pivotal problems in understanding the transferability of deep learning models.