Gradients Cannot Be Tamed: Behind the Impossible Paradox of Blocking Targeted Adversarial Attacks

Bibliographic Details
Title: Gradients Cannot Be Tamed: Behind the Impossible Paradox of Blocking Targeted Adversarial Attacks
Authors: Yuval Elovici, Ziv Katzir
Source: IEEE Transactions on Neural Networks and Learning Systems, 32(1)
Publication Year: 2020
Subject Terms: Artificial neural network, Computer Networks and Communications, Computer science, Reproducibility of Results, Computer security, Classification, Computer Science Applications, Pattern Recognition (Automated), Adversarial system, Artificial Intelligence, Leverage (statistics), Neural Networks (Computer), Software, Algorithms
Description: Despite their accuracy, neural network-based classifiers are still prone to manipulation through adversarial perturbations. Perturbed inputs are designed to be misclassified by the neural network while remaining perceptually identical to valid inputs. The vast majority of such attack methods rely on white-box conditions, where the attacker has full knowledge of the attacked network’s parameters. This allows the attacker to calculate the network’s loss gradient with respect to some valid input and use this gradient to create an adversarial example. The task of blocking white-box attacks has proved difficult to address. While many defense methods have been suggested, they have had limited success. In this article, we examine this difficulty and try to understand it. We systematically explore the capabilities and limitations of defensive distillation, one of the most promising defense mechanisms against adversarial perturbations suggested so far, in order to understand this defense challenge. We show that, contrary to commonly held belief, the ability to bypass defensive distillation does not depend on an attack’s level of sophistication. In fact, simple approaches, such as the targeted gradient sign method, are capable of effectively bypassing defensive distillation. We prove that defensive distillation is highly effective against nontargeted attacks but is unsuitable for targeted attacks. This discovery led to our realization that targeted attacks leverage the same input gradient that allows a network to be trained. This implies that blocking them comes at the cost of losing the network’s ability to learn, presenting an impossible tradeoff to the research community.
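The targeted gradient sign method mentioned in the abstract can be illustrated with a minimal sketch. The example below assumes a plain linear softmax classifier and crafts a one-step targeted perturbation by descending the cross-entropy loss of the attacker's chosen class; the function names and the toy weights are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_gradient_sign(x, W, b, target, eps):
    """One-step targeted gradient sign attack on a linear softmax model.

    Moves x *down* the cross-entropy loss gradient of the target class,
    so the classifier is pushed toward predicting `target`.
    """
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[target] = 1.0
    # dL/dx for cross-entropy toward `target` is W^T (p - onehot)
    grad = W.T @ (p - onehot)
    return x - eps * np.sign(grad)

# Toy example: the clean input is confidently class 0 ...
W = np.array([[2.0, 0.0],
              [0.0, 2.0]])
b = np.zeros(2)
x = np.array([1.0, 0.0])
print(np.argmax(W @ x + b))   # class 0 before the attack

# ... and a single signed-gradient step flips it to the target class 1.
x_adv = targeted_gradient_sign(x, W, b, target=1, eps=0.6)
print(np.argmax(W @ x_adv + b))
```

The key point matching the abstract is that the attack uses exactly the input gradient that backpropagation relies on during training, which is why suppressing it (as defensive distillation effectively does for nontargeted attacks) cannot block the targeted variant without also destroying the training signal.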
ISSN: 2162-2388
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d1bc80df1227b0fffb779088b93d9eeb
https://pubmed.ncbi.nlm.nih.gov/32167916
Rights: CLOSED
Accession Number: edsair.doi.dedup.....d1bc80df1227b0fffb779088b93d9eeb
Database: OpenAIRE