A Simple Approach to Enhance Single-Model Deep Uncertainty via Distance-Awareness
Authors: Jeremiah Zhe Liu, Shreyas Padhy, Jie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zachary Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan; Published in 2023; Volume 24, Issue 42, Pages 1-63.
Abstract
Accurately quantifying uncertainty is a significant challenge in deep learning because neural networks often make overly confident errors and assign high confidence to predictions on out-of-distribution (OOD) inputs. The most popular methods for estimating predictive uncertainty in deep learning involve combining predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles. However, their practicality in real-time, industrial-scale applications is limited due to high memory and computational costs. Additionally, ensembles and BNNs do not necessarily address all the issues with the underlying member networks. In this study, we explore principled approaches to improve the uncertainty properties of a single network based on a deterministic representation. By formulating uncertainty quantification as a minimax learning problem, we first identify distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data, as a necessary condition for deep neural networks (DNNs) to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose the Spectral-normalized Neural Gaussian Process (SNGP), a simple method that enhances the distance awareness of modern DNNs through two changes: (1) applying spectral normalization to the hidden weights to enforce bi-Lipschitz smoothness in the representations, and (2) replacing the last output layer with a Gaussian process layer. On a suite of vision and language understanding benchmarks and with modern architectures such as Wide-ResNet and BERT, SNGP consistently outperforms other single-model approaches in prediction, calibration, and out-of-domain detection. Furthermore, SNGP provides complementary benefits to popular techniques such as deep ensembles and data augmentation, making it a simple and scalable building block for probabilistic deep learning.
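The two modifications can be illustrated in a few lines. The snippet below is a minimal sketch, assuming a PyTorch setup with a small MLP backbone; the class names (`RandomFeatureGPHead`, `SNGPClassifier`), layer sizes, and the random-Fourier-feature approximation of the output-layer Gaussian process are illustrative assumptions, not the authors' reference implementation.

```python
import math
import torch
import torch.nn as nn


class RandomFeatureGPHead(nn.Module):
    """Last-layer Gaussian process approximated with random Fourier features (RFF)."""

    def __init__(self, in_dim, num_classes, num_features=1024):
        super().__init__()
        # Fixed random projection and phase defining an RFF approximation of an RBF kernel.
        self.register_buffer("proj", torch.randn(in_dim, num_features))
        self.register_buffer("bias", 2 * math.pi * torch.rand(num_features))
        # Trainable output weights on top of the random features.
        self.out = nn.Linear(num_features, num_classes)
        self.scale = math.sqrt(2.0 / num_features)

    def forward(self, h):
        phi = self.scale * torch.cos(h @ self.proj + self.bias)  # random features Phi(h)
        return self.out(phi)  # logits from the GP posterior-mean approximation


class SNGPClassifier(nn.Module):
    """Spectral-normalized hidden layers (bi-Lipschitz) + GP output layer."""

    def __init__(self, in_dim=32, hidden=128, num_classes=10):
        super().__init__()
        self.body = nn.Sequential(
            nn.utils.spectral_norm(nn.Linear(in_dim, hidden)), nn.ReLU(),
            nn.utils.spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
        )
        self.head = RandomFeatureGPHead(hidden, num_classes)

    def forward(self, x):
        return self.head(self.body(x))


model = SNGPClassifier()
logits = model(torch.randn(8, 32))  # shape (8, 10); trained with standard cross-entropy
```

In the full method, spectral normalization is applied to the residual blocks of the backbone (e.g., Wide-ResNet or BERT) rather than to a plain MLP, and the GP layer additionally maintains a Laplace approximation to the posterior covariance for predictive uncertainty; this sketch only shows the structural changes.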