An Analysis of Robustness of Non-Lipschitz Networks
Maria-Florina Balcan, Avrim Blum, Dravyansh Sharma, Hongyang Zhang; 24(98):1−43, 2023.
Abstract
Despite significant progress, deep networks remain highly vulnerable to adversarial attacks. One of the main challenges is that small changes in the input can lead to significant changes in the final-layer feature space of the network. In this paper, we introduce an attack model to better understand this inherent challenge. Our model allows the adversary to move data by an arbitrary distance in the feature space, but only in random low-dimensional subspaces. We prove that such adversaries can be extremely powerful: they can defeat any algorithm that must classify every input it receives. However, by allowing the algorithm to abstain from classifying unusual inputs, we show that these adversaries can be overcome when the classes are well-separated in the feature space. We further provide strong theoretical guarantees for setting algorithm parameters to optimize accuracy-abstention trade-offs using data-driven methods. Our results yield new robustness guarantees for nearest-neighbor-style algorithms and also apply to contrastive learning, where we empirically demonstrate that such algorithms can achieve high robust accuracy with low abstention rates. Our model is also motivated by strategic classification, where entities being classified aim to manipulate their observable features to achieve a desired classification, and we provide new insights into that setting as well.
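To make the abstention idea concrete, below is a minimal sketch of a nearest-neighbor-style classifier that operates in a fixed feature space and abstains on unusual inputs. The distance threshold `tau` and the function name `predict_or_abstain` are illustrative assumptions for this sketch, not the paper's exact algorithm or parameter-tuning procedure; the point is only that a query whose features lie far from every training point is refused rather than classified.

```python
import numpy as np

def predict_or_abstain(train_feats, train_labels, query_feat, tau):
    """Return the nearest neighbor's label, or None (abstain) if the nearest
    training feature is farther than `tau` from the query in feature space."""
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argmin(dists)
    if dists[nearest] > tau:
        return None  # abstain on unusual (far-from-data) inputs
    return train_labels[nearest]

# Toy usage: two well-separated classes in a 2-D feature space.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(predict_or_abstain(feats, labels, np.array([0.05, -0.02]), tau=1.0))  # -> 0
print(predict_or_abstain(feats, labels, np.array([2.5, 2.5]), tau=1.0))     # -> None (abstain)
```

In this toy setup, larger `tau` lowers the abstention rate but admits more adversarially displaced points, which is the accuracy-abstention trade-off the paper's data-driven parameter-setting guarantees address.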