In this work, we initiate a formal study of probably approximately correct (PAC) learning under evasion attacks, where the adversary's goal is to \emph{misclassify} the adversarially perturbed sample point $\widetilde{x}$, i.e., $h(\widetilde{x})\neq c(\widetilde{x})$, where $c$ is the ground truth concept and $h$ is the learned hypothesis. Previous work on PAC learning of adversarial examples have all modeled adversarial examples as corrupted inputs in which the goal of the adversary is to achieve $h(\widetilde{x}) \neq c(x)$, where $x$ is the original untampered instance. These two definitions of adversarial risk coincide for many natural distributions, such as images, but are incomparable in general. We first prove that for many theoretically natural input spaces of high dimension $n$ (e.g., isotropic Gaussian in dimension $n$ under $\ell_2$ perturbations), if the adversary is allowed to apply up to a sublinear $o(||x||)$ amount of perturbations on the test instances, PAC learning requires sample complexity that is exponential in $n$. This is in contrast with results proved using the corrupted-input framework, in which the sample complexity of robust learning is only polynomially more. We then formalize hybrid attacks in which the evasion attack is preceded by a poisoning attack. This is perhaps reminiscent of "trapdoor attacks" in which a poisoning phase is involved as well, but the evasion phase here uses the error-region definition of risk that aims at misclassifying the perturbed instances. In this case, we show PAC learning is sometimes impossible all together, even when it is possible without the attack (e.g., due to the bounded VC dimension).

Thanks. We have received your report. If we find this content to be in
violation of our guidelines,
we will remove it.

Ok