Learnable Boundary Guided Adversarial Training

Jiequan Cui, Shu Liu, Liwei Wang, Jiaya Jia

Previous adversarial training raises model robustness under the compromise of accuracy on natural data. In this paper, our target is to reduce natural accuracy degradation. We use the model logits from one clean model $\mathcal{M}^{natural}$ to guide learning of the robust model $\mathcal{M}^{robust}$, taking into consideration that logits from the well trained clean model $\mathcal{M}^{natural}$ embed the most discriminative features of natural data, {\it e.g.}, generalizable classifier boundary. Our solution is to constrain logits from the robust model $\mathcal{M}^{robust}$ that takes adversarial examples as input and make it similar to those from a clean model $\mathcal{M}^{natural}$ fed with corresponding natural data. It lets $\mathcal{M}^{robust}$ inherit the classifier boundary of $\mathcal{M}^{natural}$. Thus, we name our method Boundary Guided Adversarial Training (BGAT). Moreover, we generalize BGAT to Learnable Boundary Guided Adversarial Training (LBGAT) by training $\mathcal{M}^{natural}$ and $\mathcal{M}^{robust}$ simultaneously and collaboratively to learn one most robustness-friendly classifier boundary for the strongest robustness. Extensive experiments are conducted on CIFAR-10, CIFAR-100, and challenging Tiny ImageNet datasets. Along with other state-of-the-art adversarial training approaches, {\it e.g.}, Adversarial Logit Pairing (ALP) and TRADES, the performance is further enhanced.

Knowledge Graph



Sign up or login to leave a comment