A survey on vulnerability of deep neural networks to adversarial examples and defense approaches to deal with them

Khalooei, Mohammad; Homayounpour, Mohammad Mehdi; Amirmazlaghani, Maryam

doi:10.61186/jsdp.20.2.113

Signal and Data Processing Journal. A scientific journal officially licensed by the Commission for Scientific Publications of the Ministry of Science, Research and Technology (MSRT). Publisher: Research Center for Development of Technologies


Volume 20, Issue 2 (9-2023) JSDP 2023, 20(2): 113-144



Khalooei M, Homayounpour M M, Amirmazlaghani M. A survey on vulnerability of deep neural networks to adversarial examples and defense approaches to deal with them. JSDP 2023; 20 (2) : 8
URL: http://jsdp.rcisp.ac.ir/article-1-1205-en.html


Mohammad Khalooei, Mohammad Mehdi Homayounpour*, Maryam Amirmazlaghani

Amirkabir University of Technology

Abstract:

Neural networks are now the most commonly used method across machine learning and artificial intelligence tasks. Despite their wide range of uses, neural networks and deep neural networks (DNNs) have vulnerabilities. A small distortion or adversarial perturbation of the input data, whether additive or non-additive, can change the output of a trained model. Although such perturbations are imperceptible to humans, DNNs are vulnerable to them.
Creating and applying a malicious perturbation, called an “attack”, compromises a DNN and prevents it from performing its assigned task. In this paper, attack approaches are categorized by the signal used in the attack procedure. Some approaches use the gradient signal to find the vulnerabilities of a DNN and craft a powerful attack. Others create a perturbation blindly, changing a portion of the input to produce a potentially malicious perturbation. Adversarial attacks cover both black-box and white-box settings. The white-box setting relies on the training loss function and the architecture of the model, whereas the black-box setting relies on approximating the target model and coping with restrictions on input-output queries to it.
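As a rough illustration of the gradient signal in the white-box and black-box settings, the following minimal sketch attacks a toy logistic-regression classifier rather than a DNN. The weights, the input, the budget eps, and all function names are hypothetical choices made for this example and are not taken from the paper or its released framework. The white-box attacker computes the input gradient analytically; the black-box attacker only queries a loss value and estimates the same gradient by finite differences.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy loss for a single example."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def white_box_grad(w, b, x, y):
    """Analytic gradient of the loss w.r.t. the input (needs the weights)."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def black_box_grad(query, x, h=1e-4):
    """Central finite-difference estimate using only queries, two per feature."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((query(up) - query(down)) / (2 * h))
    return grad

def sign_step(x, grad, eps):
    """Move every feature by eps in the direction that increases the loss."""
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

# Hypothetical model and input, chosen only for illustration.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, -0.2, 0.3], 1
eps = 0.4  # large for a toy problem; real image attacks use far smaller budgets

x_wb = sign_step(x, white_box_grad(w, b, x, y), eps)
# The lambda stands in for a remote model that only returns a loss score.
x_bb = sign_step(x, black_box_grad(lambda v: loss(w, b, v, y), x), eps)

print(predict(w, b, x) > 0.5, predict(w, b, x_wb) > 0.5, x_wb == x_bb)
```

In this toy case both attackers take the same step, but the black-box one pays two queries per input feature, which hints at why query restrictions matter in that setting.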
Making a deep neural network resilient against attacks is called “defense”. Defense approaches are divided into three categories. The first modifies the input; the second changes the model itself and its loss function; the third uses auxiliary networks to purify and refine the input before passing it to the main network. Furthermore, an analytical approach is presented for entangled and disentangled representations of the inputs of the trained model. The gradient is a very powerful signal commonly used in both learning and attack approaches. In addition, adversarial training is a well-known defense that works by changing the loss function.
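The idea of adversarial training as a change to the loss can be sketched as follows, again on a toy logistic-regression model rather than a DNN. The dataset, learning rate, epoch count, and function names are assumptions made for this example. Each update first crafts a perturbed copy of every training point with a one-step sign attack, then takes a gradient step on the loss evaluated at those perturbed points.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One-step sign attack on the cross-entropy loss (the inner maximization)."""
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def adversarial_train(data, eps, lr=0.1, epochs=200):
    """Full-batch gradient descent on the loss at perturbed inputs."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            x_adv = fgsm(w, b, x, y, eps)   # craft a perturbed training point
            err = predict(w, b, x_adv) - y  # loss gradient w.r.t. the logit
            grad_w = [g + err * xi for g, xi in zip(grad_w, x_adv)]
            grad_b += err
        n = len(data)
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def robust_accuracy(w, b, data, eps):
    """Accuracy measured on attacked copies of the data."""
    hits = 0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        hits += int((predict(w, b, x_adv) > 0.5) == bool(y))
    return hits / len(data)

# Hypothetical, linearly separable toy data: (features, label).
data = [([1.0, 1.0], 1), ([1.0, 2.0], 1), ([2.0, 1.0], 1),
        ([-1.0, -1.0], 0), ([-1.0, -2.0], 0), ([-2.0, -1.0], 0)]
eps = 0.3
w, b = adversarial_train(data, eps)
print(robust_accuracy(w, b, data, eps))
```

Every epoch costs one extra attack per training point, which is consistent with the observation that adversarial training is time-consuming; a multi-step attack in the inner loop multiplies that cost by the number of steps.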
This study presents a critical literature review of the most recent research on the vulnerability of DNNs. The literature and our experiments indicate that the projected gradient descent (PGD) and AutoAttack methods are successful approaches for l2- and l∞-bounded attacks, respectively. Our experiments also indicate that AutoAttack is much more time-consuming than the other methods. On the defense side, we conducted experiments comparing different attacks within adversarial training. The results indicate that PGD is much more effective in adversarial training than the fast gradient sign method (FGSM) and its variants such as MIFGSM, and that it covers a wider range of generalization of the trained model on predefined datasets. Integrating AutoAttack with adversarial training also works well, but it is not efficient with a low number of epochs. In addition, adversarial training has been shown to be time-consuming. We have released our code for researchers and others interested in extending or evaluating predefined models for standard and adversarial machine learning projects. A more detailed description of the framework can be found at https://github.com/khalooei/Robustness-framework.
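For readers unfamiliar with PGD, the sketch below shows its general shape for an l∞ budget: repeated small sign-gradient steps, each followed by a projection back into the ball of radius eps around the original input. It uses a toy logistic-regression model with hypothetical weights, step size, and step count; it is not the configuration used in the paper's experiments, and AutoAttack is not reproduced here.

```python
import math

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, eps, alpha, steps):
    """Iterated sign-gradient ascent with projection onto the l-infinity ball."""
    x_adv = list(x)
    for _ in range(steps):
        p = predict(w, b, x_adv)
        grad = [(p - y) * wi for wi in w]  # input gradient of cross-entropy
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: clip each feature to [x0 - eps, x0 + eps].
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv

# Hypothetical model, input, and attack settings for illustration only.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, -0.2, 0.3], 1
eps, alpha, steps = 0.4, 0.1, 10

x_adv = pgd_linf(w, b, x, y, eps, alpha, steps)
print(predict(w, b, x) > 0.5, predict(w, b, x_adv) > 0.5)
```

For a linear model the iterations end at the same point a single sign step of size eps would reach; the benefit of multiple steps appears on non-linear models, where the gradient direction changes along the path. Published PGD variants typically also start from a random point inside the ball, which is omitted here.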

Article number: 8

Keywords: vulnerability of neural network, robustness, attack, defense, neural network


Type of Study: Applicable | Subject: Paper
Received: 2021/01/19 | Accepted: 2023/07/05 | Published: 2023/10/22 | ePublished: 2023/10/22



Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.