Overview: This paper proposes a way to improve robustness to adversarial examples by matching the logit predictions of clean samples and their adversarial counterparts.
White-box and black-box attacks: In a white-box attack the model architecture and weights are visible to the attacker; a black-box attack is the opposite.
PGD attack: This is the baseline attack in this paper; PGD searches for adversarial samples within a norm ball around the input. PGD is also an optimization method: it takes gradient steps on the loss and then projects the result back onto the allowed region.
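A minimal sketch of that loop, assuming an L-infinity ball and using a toy linear softmax classifier in place of a real network (all names and hyperparameter values here are illustrative, not the paper's settings; random initialization inside the ball is also common but omitted):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pgd_attack(x, y, W, b, eps=0.3, alpha=0.05, steps=10):
    """L-inf PGD against a linear softmax classifier (toy stand-in for a net).
    Each step ascends the cross-entropy loss, then projects back into the
    eps-ball around the original input x."""
    x_adv = x.copy()
    onehot = np.zeros(W.shape[0])
    onehot[y] = 1.0
    for _ in range(steps):
        p = softmax(W @ x_adv + b)
        grad = W.T @ (p - onehot)              # dL/dx for cross-entropy
        x_adv = x_adv + alpha * np.sign(grad)  # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the ball
    return x_adv
```

The projection step (`np.clip`) is what distinguishes PGD from plain gradient ascent: the perturbation can never leave the allowed norm ball.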
Clean logit pairing and clean logit squeezing: The authors also found that penalizing the logit distance between random pairs of clean samples (pairing), or simply penalizing the norm of the logits (squeezing), helps defense. Empirically the benefit comes mainly from the latter, but there is no clear explanation for why.
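The two regularizers can be sketched as follows (the loss weights `lam` are illustrative placeholders, not the paper's tuned values; either penalty is added on top of the usual classification loss):

```python
import numpy as np

def clean_logit_pairing(logits_a, logits_b, lam=0.5):
    """Penalize the squared distance between the logits of two randomly
    paired clean examples (arrays of shape [batch, classes])."""
    return lam * np.mean(np.sum((logits_a - logits_b) ** 2, axis=-1))

def logit_squeezing(logits, lam=0.1):
    """Penalize the squared L2 norm of the logits themselves, pushing
    the model away from overconfident (large-magnitude) logits."""
    return lam * np.mean(np.sum(logits ** 2, axis=-1))
```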
The rationale of logit pairing: In plain adversarial training there is no signal telling the model that an adversarial example is similar specifically to the individual cat image that started the process; logit pairing supplies that signal by penalizing the distance between the two logit vectors.
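Concretely, the penalty ties each clean image's logits to the logits of the adversarial example generated from that same image, pairing them per example rather than at random. A sketch, where `lam` is an illustrative weight (the paper tunes it):

```python
import numpy as np

def alp_penalty(logits_clean, logits_adv, lam=0.5):
    """Adversarial logit pairing: squared L2 distance between each clean
    example's logits and the logits of its own adversarial counterpart,
    averaged over the batch. Added to the usual classification loss."""
    return lam * np.mean(np.sum((logits_clean - logits_adv) ** 2, axis=-1))
```

Because the pairing is per image, minimizing this term forces the model to treat the adversarial version of a specific cat image like that cat image, which is exactly the missing signal described above.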
Computation advantage: Typical adversarial training multiplies the training time by a factor of k (the number of attack steps), while the extra cost of the logit-matching term scales only with the number of classes, which is negligible by comparison.