Extending Adversarial Attacks to Generate Adversarial Class Probability Distributions

Authors: Jon Vadillo, Roberto Santana, Jose A. Lozano; Published in 2023, Volume 24, Pages 1-42.

Abstract

Deep learning models have shown remarkable performance and generalization capabilities across a wide range of artificial intelligence tasks. However, these models are susceptible to adversarial attacks, in which imperceptible perturbations added to the input cause it to be misclassified. In this paper, we propose a novel probabilistic framework that extends and generalizes adversarial attacks so that, when the attack is applied to multiple inputs, the adversary can manipulate the probability distribution of the predicted classes. This new attack paradigm gives the adversary greater control over the target model and exposes vulnerabilities that cannot be exploited with conventional attacks. We present four strategies to generate such attacks efficiently and show how they can be used to extend existing adversarial attack algorithms. We validate our approach on spoken command classification and tweet emotion classification, two representative machine learning problems in the audio and text domains, respectively. Our results show that the attack can closely approximate any desired class probability distribution while maintaining a high success rate in fooling the model and evading label-shift detection methods.
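
To make the attack paradigm above concrete, the sketch below illustrates one simple way it could be instantiated; this is an illustrative assumption, not the paper's actual algorithms. Given a desired class distribution, assign a target class to each input in a batch so that the empirical distribution of targets approximates the desired one, then run any off-the-shelf targeted adversarial attack on each input. The function names (`assign_target_classes`, `attack_batch`) and the `targeted_attack` callable are hypothetical placeholders.

```python
import numpy as np

def assign_target_classes(n_inputs, target_distribution, rng=None):
    """Assign one target class per input so that the empirical distribution
    of the assigned targets approximates the desired class distribution."""
    rng = rng if rng is not None else np.random.default_rng()
    target_distribution = np.asarray(target_distribution, dtype=float)
    classes = np.arange(len(target_distribution))
    # Proportional allocation of the batch across classes; leftover slots
    # are drawn at random according to the desired distribution.
    counts = np.floor(target_distribution * n_inputs).astype(int)
    leftover = n_inputs - counts.sum()
    if leftover > 0:
        counts += rng.multinomial(leftover, target_distribution)
    targets = np.repeat(classes, counts)
    rng.shuffle(targets)  # avoid any ordering bias within the batch
    return targets

def attack_batch(inputs, target_distribution, targeted_attack, rng=None):
    """Perturb every input towards its assigned target class with an
    off-the-shelf targeted attack, so that the classes predicted for the
    perturbed batch approximately follow `target_distribution`."""
    targets = assign_target_classes(len(inputs), target_distribution, rng)
    # `targeted_attack(x, y)` is assumed to return an adversarial version
    # of x that the model classifies as class y (e.g., a targeted PGD step).
    return [targeted_attack(x, int(y)) for x, y in zip(inputs, targets)]

# Example: push a 4-class model towards a uniform prediction distribution.
# adv_batch = attack_batch(inputs, [0.25, 0.25, 0.25, 0.25], targeted_attack)
```

The paper itself presents four strategies for choosing the targets and for extending existing attack algorithms; the proportional sampling above is only meant to convey the overall idea.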

