The AttackBench framework aims to fairly compare gradient-based attacks based on their security evaluation curves. To this end, we derive a process involving five distinct stages, as depicted below.
AttackBench limits the number of forward and backward queries to the model, such that all attacks are compared within a given maximum query budget.
This step establishes a common ground, so that every attack runs under the same assumptions, advantages, and limitations.
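As a rough illustration of how such a budget can be tracked, the sketch below wraps a PyTorch classifier and counts forward and backward passes; the class, attribute, and parameter names are hypothetical and do not reflect AttackBench's actual interface.

```python
import torch
import torch.nn as nn


class QueryCountingModel(nn.Module):
    """Counts the forward and backward queries an attack spends on a model.

    Hypothetical sketch: a benchmark could stop an attack, or truncate its
    results, once forward + backward queries exceed the shared budget.
    """

    def __init__(self, model: nn.Module, max_queries: int = 1_000):
        super().__init__()
        self.model = model
        self.max_queries = max_queries
        self.forward_queries = 0
        self.backward_queries = 0

    def budget_exhausted(self) -> bool:
        return self.forward_queries + self.backward_queries >= self.max_queries

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        self.forward_queries += 1
        if x.requires_grad:
            # each backward pass through this input counts as one backward query
            x.register_hook(self._count_backward)
        return self.model(x)

    def _count_backward(self, grad: torch.Tensor) -> torch.Tensor:
        self.backward_queries += 1
        return grad
```

An attack loop could check `budget_exhausted()` between iterations, or the benchmark could simply compare all attacks at the same recorded query counts.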
We then run the attacks against the selected models individually and collect the performance metrics of interest in our analysis: perturbation size, execution time, and query usage.
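For concreteness, a single attack run can be summarized by exactly these three quantities; the sketch below is one hypothetical way to time a run and record them (the names and call convention are ours, not the benchmark's).

```python
import time
from dataclasses import dataclass

import torch


@dataclass
class RunMetrics:
    """Quantities collected for one attack run (illustrative)."""
    perturbation_size: float
    execution_time: float
    queries: int


def run_and_measure(attack, model, x, y, norm_p=float("inf")) -> RunMetrics:
    # `attack` is assumed to return adversarial examples for the batch (x, y);
    # `forward_queries`/`backward_queries` come from a counting wrapper such
    # as the one sketched above.
    start = time.perf_counter()
    x_adv = attack(model, x, y)
    elapsed = time.perf_counter() - start
    delta = (x_adv - x).flatten(1).norm(p=norm_p, dim=1)  # per-sample norms
    return RunMetrics(
        perturbation_size=delta.median().item(),
        execution_time=elapsed,
        queries=model.forward_queries + model.backward_queries,
    )
```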
Finally, we compare the attacks through our local optimality metric, which quantifies how close an attack is to the optimal solution, and we aggregate these results into a global optimality score used to rank the attacks.
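To convey the intuition behind these metrics (this is an illustrative sketch, not the exact formula from the paper), one can turn each attack's per-sample perturbation norms into a security evaluation curve, build an "optimal" curve from the per-sample best norms found by any attack, and score the attack by how small the gap between the two areas is.

```python
import numpy as np


def security_curve(norms: np.ndarray, epsilons: np.ndarray) -> np.ndarray:
    """Robust accuracy as a function of the perturbation budget: a sample
    still counts as robust at budget eps if the smallest adversarial
    perturbation found for it exceeds eps (np.inf marks a failed attack)."""
    return np.array([(norms > eps).mean() for eps in epsilons])


def _auc(y: np.ndarray, x: np.ndarray) -> float:
    """Trapezoidal area under the curve y(x)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def optimality_score(attack_norms: np.ndarray,
                     best_norms: np.ndarray,
                     epsilons: np.ndarray) -> float:
    """Illustrative optimality in [0, 1]: 1 when the attack matches the
    per-sample best norms found by any attack, 0 when it never succeeds.
    This sketches the intuition only, not AttackBench's exact definition."""
    auc_attack = _auc(security_curve(attack_norms, epsilons), epsilons)
    auc_best = _auc(security_curve(best_norms, epsilons), epsilons)
    auc_worst = _auc(np.ones_like(epsilons, dtype=float), epsilons)
    return 1.0 - (auc_attack - auc_best) / max(auc_worst - auc_best, 1e-12)
```

Averaging such per-model scores over models and datasets would then give a single global figure per attack.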
In numbers, AttackBench spans 2 datasets, 9 models, 6 libraries, 20 distinct attacks, 102 implementations, and 815 comparisons.
We perform an extensive experimental analysis that compares 20 attacks (listed below), retrieving their original implementations and collecting the other implementations available in popular adversarial attack libraries.
We empirically test a total of 102 implementations, re-evaluating them in terms of runtime, success rate, and perturbation distance, as well as with our newly introduced optimality metrics.
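Comparing many implementations of the same attack requires adapting them to a common call signature; the sketch below shows one hypothetical way to register wrapped implementations under (library, attack) keys. All names are ours and the body is a placeholder, not the benchmark's actual wrappers.

```python
from typing import Callable, Dict, Tuple

import torch

# Common signature every wrapped implementation exposes: it receives a model
# and a labelled batch, and returns adversarial examples of the same shape.
Attack = Callable[[torch.nn.Module, torch.Tensor, torch.Tensor], torch.Tensor]

# Hypothetical registry keyed by (library, attack name).
REGISTRY: Dict[Tuple[str, str], Attack] = {}


def register(library: str, attack_name: str) -> Callable[[Attack], Attack]:
    """Decorator that adds a wrapped implementation to the registry."""
    def decorator(fn: Attack) -> Attack:
        REGISTRY[(library, attack_name)] = fn
        return fn
    return decorator


@register("original-repo", "pgd-linf")
def pgd_from_original_repo(model, x, y):
    # Placeholder: call the authors' original implementation here and
    # translate its arguments and outputs to the common signature.
    raise NotImplementedError
```

A benchmark loop would then iterate over `REGISTRY`, measuring each entry with the same models, budgets, and metrics described above.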
While implementing AttackBench, we collected additional insights, including sub-optimal implementations, attacks returning incorrect results, and errors in the source code that prevent attacks from completing their runs.
These insights could prompt a thorough re-evaluation of the state of the art, as incorrect evaluations might have inflated the results reported in published work.
@article{CinaRony2024AttackBench,
  author  = {Antonio Emanuele Cinà and Jérôme Rony and Maura Pintor and Luca Demetrio and Ambra Demontis and Battista Biggio and Ismail Ben Ayed and Fabio Roli},
  title   = {AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples},
  journal = {arXiv},
  year    = {2024},
  eprint  = {2404.19460},
}