ONION: A Simple and Effective Defense Against Textual Backdoor Attacks

Fanchao Qi, Yangyi Chen, Mukai Li, Zhiyuan Liu, Maosong Sun

Backdoor attacks are an emergent training-time threat to deep neural networks (DNNs). They can manipulate the output of DNNs and are highly insidious. In the field of natural language processing, several attack methods have been proposed, and they achieve very high attack success rates on multiple popular models. Nevertheless, defenses against textual backdoor attacks have received little study. In this paper, we propose a simple and effective textual backdoor defense named ONION, which is based on outlier word detection and might be the first method that can handle all the attack situations. Experiments demonstrate the effectiveness of our model in blocking two recent backdoor attack methods.
