Robustness and Uncertainty Estimation for Deep Neural Networks
Dr Lee Mong Li, Professor, School of Computing
Join Zoom Meeting
https://nus-sg.zoom.us/j/88316110840?pwd=dTdpUUNOMGp5eThwOTE4Qm54T1g1UT09
Meeting ID: 883 1611 0840
Password: 624623
Abstract:
Deep neural network (DNN) models have achieved remarkable success on a wide range of machine learning tasks. However, their vulnerability to adversarial attacks and their tendency to produce over-confident predictions for arbitrary inputs mean that these models cannot yet be trusted in sensitive real-world applications. In this thesis, we focus on improving the adversarial robustness and the predictive uncertainty estimation of existing DNN-based classification models.
An attacker produces adversarial images by deliberately injecting minor perturbations that mislead a DNN model without perceptibly changing the original inputs. Among existing defenses, adversarial-training-based frameworks can provide robustness against multiple perturbation types, but only by incorporating expensive adversarial examples of every perturbation type at each training iteration. This increases training time and raises the question of how many perturbation types one should include during training. Alternatively, manifold-based defense models eliminate the need to generate expensive adversarial examples by projecting adversarial inputs onto the clean data manifold, thereby achieving robustness against multiple perturbation types. However, the success of these models relies on whether the generative network can capture the complete clean data manifold, which remains an open problem for complex input domains.
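For concreteness, the minimal PyTorch sketch below (not part of the thesis) shows how one standard attack, projected gradient descent under the l_infinity norm, injects such a bounded perturbation; `model`, `x`, and `y` are assumed to be a trained classifier, a batch of images in [0, 1], and the corresponding labels.

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        # Iteratively ascend the classification loss while keeping the
        # perturbation inside an eps-ball around the clean image x (l_inf norm)
        # and the result inside the valid pixel range [0, 1].
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
        return x_adv.detach()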
We first propose a new generative framework, called RBF-Net. We divide the task of capturing the distribution of training images into iterative sub-tasks of capturing the distribution of image patches at each layer of the network. The layers of an RBF-Net model consist of radial basis function (RBF) filters. We train the RBF-Net using a layer-wise non-parametric density estimation algorithm to effectively capture the densities of the input patches at each layer of the network. We also design an algorithm, called RBF-Gen, that generates new images from an RBF-Net model by iteratively reconstructing the feature maps of the hidden layers. Unlike existing generative models, RBF-Gen can also be applied to reconstruct a given image from the feature maps of any hidden layer.
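As a rough illustration of what a layer of RBF filters might look like, the sketch below scores every local image patch against a set of centres with a Gaussian kernel. The class name, hyper-parameters, and the use of learnable centres are placeholders; the thesis fits the centres with its layer-wise non-parametric density estimation algorithm, which is not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RBFLayer(nn.Module):
        # Illustrative sketch: each filter k holds a centre mu_k, and its
        # response to a patch p is exp(-beta * ||p - mu_k||^2), so the layer
        # measures how well every local patch matches the patch densities
        # represented by the centres.
        def __init__(self, in_channels, num_filters, kernel_size=3, beta=1.0):
            super().__init__()
            assert kernel_size % 2 == 1, "odd kernel keeps the spatial size unchanged"
            patch_dim = in_channels * kernel_size * kernel_size
            self.centres = nn.Parameter(torch.randn(num_filters, patch_dim))
            self.kernel_size = kernel_size
            self.beta = beta

        def forward(self, x):
            b, _, h, w = x.shape
            # overlapping patches: (B, patch_dim, H*W)
            patches = F.unfold(x, self.kernel_size, padding=self.kernel_size // 2)
            # squared distance between every patch and every centre: (B, H*W, num_filters)
            d2 = ((patches.transpose(1, 2).unsqueeze(2) - self.centres) ** 2).sum(dim=-1)
            resp = torch.exp(-self.beta * d2)
            # back to a feature map of RBF responses: (B, num_filters, H, W)
            return resp.transpose(1, 2).reshape(b, -1, h, w)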
Next, we use RBF-Net to develop a robust image classification model, called RBF-CNN, to defend against adversarial attacks. We show that images reconstructed using the proposed technique mitigate any minor perturbation in terms of the $\ell_{p\geq 1}$ norms. Further, incorporating the reconstruction process into training also improves the adversarial robustness of our RBF-CNN models. Our experimental results demonstrate that RBF-CNN models provide robustness against any minor perturbation in the $\ell_1$, $\ell_2$, and $\ell_{\infty}$ norms, without the need for expensive adversarial training. RBF-CNN models also exhibit interpretable saliency maps that reflect their robustness against attacks. To the best of our knowledge, RBF-CNN is the first generative-model-based defense to provide robustness against any minor perturbation in terms of the $\ell_{p\geq 1}$ norms while offering a desirable trade-off between robustness and accuracy at run-time.
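A minimal sketch of the resulting purify-then-classify pipeline, with hypothetical `reconstruct` and `classifier` callables standing in for the RBF-Gen reconstruction step and the CNN classifier:

    import torch

    def robust_predict(classifier, reconstruct, x):
        # Project a (possibly perturbed) input back towards the clean data
        # manifold before classifying it; `reconstruct` is a stand-in for the
        # RBF-Gen step and `classifier` for any trained CNN.
        with torch.no_grad():
            x_rec = reconstruct(x)   # small l_p perturbations are absorbed here
            return classifier(x_rec).argmax(dim=1)

In such a pipeline, how aggressively the reconstruction is applied at inference time is what would govern the run-time trade-off between robustness and accuracy noted above.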
Recent studies also find that DNN-based models cannot determine the source of their predictive uncertainties: they produce over-confident predictions even when the test inputs are drawn from an unknown, out-of-distribution (OOD) source far away from the known, in-domain training distribution. Among existing predictive uncertainty estimation techniques, only the Dirichlet Prior Network (DPN) can distinctly model the different uncertainty types, namely uncertainty in the model parameters, data uncertainty, and uncertainty due to the distributional mismatch between training and test examples. It distinguishes OOD examples from in-domain examples by producing comparatively flat Dirichlet distributions for them. However, in this thesis, we show that in the presence of high data uncertainty among multiple classes, even a DPN model tends to produce flatter Dirichlet distributions for in-domain examples. This behavior makes it unable to differentiate misclassified examples from OOD examples, compromising the correct identification of the source of uncertainty. We propose to increase the gap between the representations of in-domain and OOD examples so that they can be easily differentiated. Specifically, we design a new loss function with an explicit precision regularizer to achieve the desirable Dirichlet distributions for the different uncertainty types. Our experimental results demonstrate that the proposed technique consistently improves OOD detection performance.
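One plausible form of such a precision regularizer, sketched below purely for illustration (the actual loss in the thesis may differ): exponentiated logits are read as Dirichlet concentration parameters, and the total precision is pushed towards a large target for in-domain batches and a small target for OOD batches, widening the representation gap between the two. The function name, targets, and weighting are placeholders.

    import torch
    import torch.nn.functional as F

    def dpn_precision_loss(logits_in, labels_in, logits_ood,
                           lam=1e-2, target_prec_in=100.0, target_prec_ood=3.0):
        # Cross-entropy on in-domain data plus a hypothetical precision
        # regulariser: alpha = exp(logits) are Dirichlet concentrations, and
        # the precision alpha_0 = sum_k alpha_k is driven high for in-domain
        # inputs and low for OOD inputs.
        alpha_in = logits_in.exp()
        alpha_ood = logits_ood.exp()
        ce = F.cross_entropy(logits_in, labels_in)
        prec_gap = (alpha_in.sum(dim=1) - target_prec_in).abs().mean() \
                 + (alpha_ood.sum(dim=1) - target_prec_ood).abs().mean()
        return ce + lam * prec_gap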