PH.D DEFENCE - PUBLIC SEMINAR

Understanding and Improving Neural Architecture Search

Speaker
Mr Shu Yao
Advisor
Dr Low Kian Hsiang, Associate Professor, School of Computing


25 Aug 2022 Thursday, 02:00 PM to 03:30 PM

Zoom presentation

Abstract:

Over the past decade, various deep neural network (DNN) architectures have been devised and have achieved superhuman performance on a wide range of tasks. Designing these neural networks, however, requires substantial effort from domain experts through trial and error. This human effort becomes increasingly unaffordable as the demand for customizing DNNs for different tasks grows. To this end, neural architecture search (NAS) has been widely applied in recent years to automate the design of neural networks. A number of NAS algorithms have been proposed to improve the search efficiency or the search effectiveness of NAS, i.e., to reduce the search cost or to improve the generalization performance of the selected architectures, respectively. Despite these advances, certain essential aspects of NAS have not been well investigated in the literature, even though they may help us understand and further improve existing NAS algorithms.

Firstly, only a few efforts have been devoted to understanding the neural architectures selected by popular NAS algorithms. In the first work of this thesis, we take the first step toward understanding popular NAS algorithms by answering the following questions: What types of architectures are selected by popular NAS algorithms, and why are they selected? In particular, we reveal that existing NAS algorithms (e.g., DARTS, ENAS) tend to favor architectures with wide and shallow cell structures. These favored architectures consistently achieve fast convergence and are consequently selected by NAS algorithms. Our empirical and theoretical studies further suggest that their fast convergence derives from their smooth loss landscape and accurate gradient information. Nonetheless, these architectures do not necessarily generalize better than other candidate architectures in the same search space, and therefore further improvement is possible by revising existing NAS algorithms in the future.
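The convergence-speed observation above can be illustrated with a toy experiment. The sketch below (not code from the thesis; the architectures, data, and hyperparameters are illustrative assumptions) trains a wide-and-shallow network and a narrow-and-deep one on the same synthetic task and compares how quickly their training losses fall.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 16)
y = torch.sin(X.sum(dim=1, keepdim=True))  # synthetic regression target

def mlp(widths):
    """Build a small MLP with the given hidden-layer widths."""
    layers, d = [], 16
    for w in widths:
        layers += [nn.Linear(d, w), nn.ReLU()]
        d = w
    layers.append(nn.Linear(d, 1))
    return nn.Sequential(*layers)

candidates = {
    "wide_shallow": mlp([256]),            # one wide hidden layer
    "narrow_deep": mlp([32, 32, 32, 32]),  # several narrow hidden layers
}

for name, net in candidates.items():
    opt = torch.optim.SGD(net.parameters(), lr=0.05)
    for step in range(200):
        loss = nn.functional.mse_loss(net(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"{name}: training loss after 200 steps = {loss.item():.4f}")

Under this kind of setup, the wide-and-shallow candidate typically reaches a lower training loss within the fixed budget, which mirrors why such architectures look attractive during the search even when they are not the best-generalizing ones.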

Secondly, standard NAS algorithms typically aim to select only a single neural architecture from the search space and thus overlook the potential of other candidate architectures to help improve the performance of the final selected architecture. To this end, in the second work of this thesis, we present two novel sampling algorithms under our Neural Ensemble Search via Bayesian Sampling (NESBS) framework that can effectively and efficiently select a well-performing ensemble of neural architectures from the NAS search space. Compared with state-of-the-art NAS algorithms and other well-known ensemble search baselines, our NESBS algorithms achieve improved performance in both classification and adversarial defense tasks on various benchmark datasets, while incurring a search cost comparable to that of these NAS algorithms.
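To make the ensemble-selection idea concrete, the sketch below scores sampled subsets of candidate architectures by their averaged predictions on a held-out set. It is a simplified stand-in only: the candidate count, the random predictions, and the random-subset search are illustrative assumptions, whereas NESBS itself selects ensembles via Bayesian sampling rather than this naive loop.

import numpy as np

rng = np.random.default_rng(0)
num_candidates, num_val, num_classes = 8, 1000, 10

# Placeholder softmax predictions of each candidate architecture on a validation set;
# in practice these would come from trained candidate networks.
probs = rng.dirichlet(np.ones(num_classes), size=(num_candidates, num_val))
labels = rng.integers(0, num_classes, size=num_val)

def ensemble_accuracy(member_ids):
    """Validation accuracy of the ensemble that averages its members' predictions."""
    avg = probs[list(member_ids)].mean(axis=0)
    return (avg.argmax(axis=1) == labels).mean()

best_ids, best_acc = None, -1.0
for _ in range(200):  # sample candidate ensembles and keep the best-scoring one
    k = rng.integers(2, 5)
    ids = tuple(sorted(rng.choice(num_candidates, size=k, replace=False)))
    acc = ensemble_accuracy(ids)
    if acc > best_acc:
        best_ids, best_acc = ids, acc

print("selected ensemble:", best_ids, "| validation accuracy:", round(best_acc, 3))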

Thirdly, the search efficiency of popular NAS algorithms is severely limited by the need for model training during the search process. To overcome this limitation, we propose a novel NAS algorithm called NAS at Initialization (NASI) that exploits the capability of the Neural Tangent Kernel (NTK) to characterize the converged performance of candidate architectures at initialization, thereby allowing model training to be avoided entirely and boosting the search efficiency. Besides this improved search efficiency, NASI also achieves competitive search effectiveness on various datasets such as CIFAR-10/100 and ImageNet. Furthermore, NASI is guaranteed to be label- and data-agnostic under mild conditions, which implies the provable transferability of the architectures it selects across different datasets.
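The core idea of scoring architectures at initialization can be sketched as follows: approximate an NTK-based quantity (here, a trace proxy built from squared per-sample gradient norms of the untrained network) and rank candidates by it, with no training at all. This is a hedged illustration of the general technique rather than the exact NASI metric; the two candidate architectures and the flattened CIFAR-sized inputs are assumptions for the example.

import torch
import torch.nn as nn

def ntk_trace_proxy(net, inputs):
    """Sum of ||grad_theta f(x_i)||^2 over a small batch, evaluated at initialization."""
    score = 0.0
    for x in inputs:
        net.zero_grad()
        out = net(x.unsqueeze(0)).sum()  # scalarize the output before backprop
        out.backward()
        score += sum(p.grad.pow(2).sum().item()
                     for p in net.parameters() if p.grad is not None)
    return score

torch.manual_seed(0)
batch = torch.randn(8, 3 * 32 * 32)  # a few CIFAR-10-sized inputs, flattened

# Two hypothetical candidate architectures, scored without any training.
candidates = {
    "cand_a": nn.Sequential(nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10)),
    "cand_b": nn.Sequential(nn.Linear(3 * 32 * 32, 512), nn.ReLU(), nn.Linear(512, 10)),
}
for name, net in candidates.items():
    print(name, "NTK-trace proxy:", round(ntk_trace_proxy(net, batch), 2))

Because the score depends only on randomly initialized parameters and a handful of unlabeled inputs, this style of metric is what makes label- and data-agnostic, training-free search possible.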

Finally, though recent NAS algorithms using training-free metrics are able to select well-performing architectures in practice, the reason why such training-free NAS performs well is still not fully understood. To this end, in the last work of this thesis, we provide unified theoretical analyses of gradient-based training-free NAS to understand why it performs well in practice. Based on these theoretical understandings, we then develop a novel NAS framework called Hybrid Neural Architecture Search (HNAS) that enjoys the advantages of both training-free NAS (i.e., superior search efficiency) and training-based NAS (i.e., remarkable search effectiveness), allowing gradient-based training-free NAS to be further improved.
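One simple way to combine the two regimes, sketched below under stated assumptions (a toy candidate pool, a gradient-norm proxy, and a short training run; this is not the actual HNAS algorithm), is to shortlist candidates with a cheap training-free score and then rank the shortlist with a brief burst of training.

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))

def make_candidate(width, depth):
    layers, d = [], 32
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, 10))
    return nn.Sequential(*layers)

def proxy_score(net):
    """Cheap training-free proxy: gradient norm at initialization on one batch."""
    net.zero_grad()
    nn.functional.cross_entropy(net(X), y).backward()
    return sum(p.grad.norm().item() for p in net.parameters())

def short_train_acc(net, steps=100):
    """More expensive check: training accuracy after a brief training run."""
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(net(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (net(X).argmax(dim=1) == y).float().mean().item()

pool = {f"w{w}_d{d}": make_candidate(w, d) for w in (64, 128) for d in (1, 2, 3)}
shortlist = sorted(pool, key=lambda k: proxy_score(pool[k]), reverse=True)[:2]
best = max(shortlist, key=lambda k: short_train_acc(pool[k]))
print("shortlisted:", shortlist, "| selected:", best)

The design intent mirrors the hybrid idea: the training-free stage keeps the search cheap by pruning most candidates, while the short training stage restores the ranking quality that purely training-free scoring can lack.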