[Submitted on 31 Aug 2023]

Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff

Satoshi Suzuki and 6 other authors


Abstract: This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). While adversarial training (AT) improves robustness, it reduces standard accuracy, creating a tradeoff. To mitigate this tradeoff, we propose a new AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples, initializing its parameters with a DNN pretrained on clean examples. RGKD and NR, respectively, provide a regularization term and an algorithm that preserve the latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the pretrained and AFT DNNs, while NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods.

Submission history

From: Satoshi Suzuki

[v1] Thu, 31 Aug 2023 04:46:12 UTC (2,755 KB)