Adversarial Robustness in Vision Transformers: Attention as a Defense Mechanism
Vision Transformers (ViTs) have demonstrated remarkable performance across computer vision tasks, yet their robustness to adversarial perturbations remains insufficiently understood. In this work, we present a systematic analysis of how the self-attention mechanism in ViTs provides a natural defense against adversarial attacks. We introduce the Attention Robustness Score (ARS), a novel metric that quantifies the stability of attention maps under adversarial perturbations. Through extensive experiments on ImageNet and CIFAR-100, we demonstrate that ViTs achieve 12-18% higher robust accuracy than their convolutional counterparts under PGD and AutoAttack, and we trace this advantage to the global receptive field and low-rank structure of attention matrices. We further propose Adversarial Attention Regularization (AAR), a training-time technique that amplifies this intrinsic robustness, achieving state-of-the-art adversarial accuracy of 68.4% on ImageNet under an $\ell_\infty$ threat model ($\epsilon = 4/255$) without sacrificing clean accuracy.
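
For readers unfamiliar with the threat model named in the abstract, the following is a minimal sketch of a PGD attack under an $\ell_\infty$ budget of $\epsilon = 4/255$. It substitutes a toy linear logistic classifier with an analytic input gradient for a ViT so that it runs with the standard library alone. The function names, step size, iteration count, and example weights are illustrative assumptions and are not the experimental configuration of this work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(w, b, x, y):
    """Binary cross-entropy of a linear classifier and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    tiny = 1e-12  # guards against log(0)
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    # dL/dz = p - y and dz/dx = w, so dL/dx = (p - y) * w
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def pgd_linf(w, b, x, y, epsilon=4 / 255, step_size=1 / 255, steps=10):
    """Projected gradient ascent on the loss inside the l_inf ball of radius epsilon.

    step_size and steps are assumed values chosen for illustration.
    """
    x_adv = list(x)
    for _ in range(steps):
        _, grad = loss_and_grad(w, b, x_adv, y)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            xi = x_adv[i] + step_size * sign
            # Project back onto the l_inf ball around the clean input.
            xi = min(max(xi, x[i] - epsilon), x[i] + epsilon)
            # Keep the result a valid pixel value in [0, 1].
            x_adv[i] = min(max(xi, 0.0), 1.0)
    return x_adv


# Hypothetical toy model and input (pixel values scaled to [0, 1]).
w = [2.0, -1.5, 0.5, -3.0]
b = 0.1
x = [0.2, 0.7, 0.5, 0.9]
y = 1

x_adv = pgd_linf(w, b, x, y)
clean_loss, _ = loss_and_grad(w, b, x, y)
adv_loss, _ = loss_and_grad(w, b, x_adv, y)
print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

Robust accuracy is then the fraction of test inputs still classified correctly after such a perturbation. For a deep model, the analytic gradient would be replaced by one obtained through automatic differentiation.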


