PH.D. DEFENCE - PUBLIC SEMINAR

Dynamic Neural Architectures for Improved Inference

Speaker
Mr Cai Shaofeng
Advisor
Dr Ooi Beng Chin, Lee Kong Chian Centennial Professor, School of Computing


Monday, 08 Nov 2021, 10:00 AM to 11:30 AM

Zoom presentation

Abstract:

Deep neural networks (DNNs) have achieved super-human performance on many tasks over unstructured data, e.g., images, text, and audio. However, beyond effectiveness, deploying DNNs in real-world applications also demands efficiency and interpretability. The conventional approach to improving inference efficiency without degrading prediction performance is either to design more efficient architectures directly or to reduce the model size via model compression. However, these methods typically optimize efficiency for the dataset as a whole, and the resultant architectures remain static during inference, thereby missing opportunities to further improve efficiency in an input-dependent manner. Moreover, the capacity of the resultant models, e.g., after certain components are pruned, is permanently reduced, which generally leads to worse prediction performance.

Dynamic neural architecture is an emerging research area that has attracted increasing attention in recent years. Dynamic architectures customize their structure or adapt their model weights conditioned on the input at runtime to improve model inference. The architecture customization is typically achieved by selectively bypassing certain model components layer-wise on the fly for each input, which can yield much higher inference efficiency with comparable prediction performance.

In this thesis, we will first formulate dynamic neural architecture in a unified framework. Specifically, dynamic neural architecture can be formulated as a sequential decision process, where a separate hypernetwork dynamically determines gating variables that regulate the participation of the corresponding architecture components layer-wise. To improve inference efficiency, the gating variables are typically sparsified to save computation by deactivating architecture components at different structural levels, namely layers, branches, channels, neurons, and weights.
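To make the gating formulation concrete, the following is a minimal illustrative sketch (in NumPy, with hypothetical shapes and names; it is not the thesis implementation): a small hypernetwork maps the layer input to per-branch gating variables, which are sparsified so that deactivated branches are skipped and their computation saved.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class GatedLayer:
    """One layer with parallel branches whose participation is regulated
    by input-dependent gating variables from a hypernetwork (sketch only)."""
    def __init__(self, dim, num_branches, keep=2):
        self.branches = [rng.standard_normal((dim, dim)) * 0.1
                         for _ in range(num_branches)]
        # Hypothetical hypernetwork: a single linear map from input to gate logits.
        self.hyper = rng.standard_normal((dim, num_branches)) * 0.1
        self.keep = keep  # sparsity budget: branches kept active per input

    def forward(self, x):
        scores = x @ self.hyper                  # gating variables, one per branch
        kept = np.argsort(scores)[-self.keep:]   # sparsify: keep only the top-k gates
        out = np.zeros_like(x)
        for i in kept:                           # deactivated branches are never evaluated
            out += scores[i] * relu(x @ self.branches[i])
        return out, kept

layer = GatedLayer(dim=8, num_branches=4, keep=2)
x = rng.standard_normal(8)
y, active = layer.forward(x)
print(len(active))  # only 2 of the 4 branches are executed for this input
```

The same gating pattern applies at other structural levels (channels, neurons, or weights) by replacing the per-branch gates with gates over the corresponding components.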

We will identify the challenges and limitations of existing architectures and approaches in terms of model inference. Then we will propose general and novel dynamic architecture techniques, in particular dynamic routing, model slicing, and adaptive relation modeling, for improved inference. The first technique, dynamic routing, is a dynamic path technique that improves inference efficiency by selectively routing inputs to only the necessary transformation branches in a cell-based backbone network. The second technique, model slicing, is a controllable dynamic width technique that enables neural networks to support budgeted inference, namely producing predictions within a given computational budget dynamically at runtime. The third technique, adaptive relation modeling, is a dynamic weight technique that improves inference effectiveness and interpretability for structured data, which is less well supported by conventional DNNs. With these techniques, DNNs can support more efficient, effective, and interpretable model inference.
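The dynamic-width idea behind budgeted inference can be sketched as follows (an illustrative NumPy toy, with assumed names and shapes, not the thesis's model slicing implementation): a layer exposes only the leading fraction of its units, so a single set of shared weights can serve any width budget chosen at runtime.

```python
import numpy as np

rng = np.random.default_rng(1)

class SlicedLinear:
    """A linear layer that can run on just the leading r-fraction of its units,
    so one weight matrix supports every computational budget (sketch only)."""
    def __init__(self, d_in, d_out):
        self.W = rng.standard_normal((d_in, d_out)) * 0.1

    def forward(self, x, r=1.0):
        # Slice ratio r in (0, 1] selects the leading units of input and output.
        k_in = max(1, int(round(r * self.W.shape[0])))
        k_out = max(1, int(round(r * self.W.shape[1])))
        return x[:k_in] @ self.W[:k_in, :k_out]

layer = SlicedLinear(16, 16)
x = rng.standard_normal(16)
full = layer.forward(x, r=1.0)   # full-width inference
half = layer.forward(x, r=0.5)   # budgeted inference using half the units
print(full.shape, half.shape)
```

At half width, the matrix multiply touches a quarter of the weights, which is how a smaller budget translates directly into less computation at runtime.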