Adversarial Attacks on Natural Language Processing (NLP) Systems: An Emerging Threat in Cybersecurity
Source
Journal of Information Systems Security, Volume 21, Number 3 (2025), pages 227–256.
ISSN 1551-0123 (Print), ISSN 1551-0808 (Online)

Authors
Alanoud Aljuaid — Umm Al-Qura University, Mecca, Saudi Arabia
Xiang Liu — Marymount University, Virginia, United States

Publisher
Information Institute Publishing, Washington DC, USA
Abstract
Natural language processing (NLP) is a subfield of artificial intelligence and linguistics that enables computers to understand human language, both spoken and written. Because of their effectiveness, NLP-based systems are widely used to mine cyber-threat intelligence from unstructured sources, which also makes them attractive targets for attack. The aims of this study are to establish the vulnerabilities of NLP-based systems to adversarial attacks, to evaluate the impact of such attacks on the effectiveness of these systems, and to recommend strategies for fortifying them against adversarial attacks. The study adopted a multi-case design built around four proof-of-concept attacks: a sentence-level attack, a word-level attack, a character-level attack, and a multilevel attack. The cases were analyzed in terms of attack vectors, root causes, and impact, and countermeasures against the attacks were also derived from the cases. The results show that the adversarial attacks in all four cases were successful, as evidenced by high accuracy, impact, and robustness scores, exposing the vulnerability of NLP-based systems to adversarial attacks. The countermeasure that successfully addressed these vulnerabilities was to inject adversarial examples into the training data and retrain the victim model, making it less vulnerable to the attacks. Interventional strategies that can reduce the extent and impact of attacks on NLP-based systems include collaboration among stakeholders, regulatory support through policies, raising awareness, developing robust detection models and algorithms, and continuous monitoring and updating of systems to bolster defenses against adversarial attacks.
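The character-level attack and the adversarial-training countermeasure summarized above can be illustrated with a minimal sketch. The keyword-matching spam filter, the swap-two-characters perturbation, and all names below are illustrative assumptions for exposition, not the study's actual victim models or attack implementations:

```python
def char_attack(text):
    """Character-level perturbation: swap the second and third characters
    of every word longer than three letters -- a minimal stand-in for
    attacks such as DeepWordBug (Gao et al., 2018)."""
    def swap(w):
        return w[:1] + w[2] + w[1] + w[3:] if len(w) > 3 else w
    return " ".join(swap(w) for w in text.split())

def make_classifier(spam_vocab):
    """Toy victim model: flags a message containing any known spam word."""
    return lambda text: any(w in spam_vocab for w in text.lower().split())

spam_vocab = {"winner", "prize", "free"}
clf = make_classifier(spam_vocab)

msg = "congratulations winner claim your free prize now"
adv = char_attack(msg)

print(clf(msg))  # True  -- the original message is caught
print(clf(adv))  # False -- the perturbed message evades the filter

# Countermeasure: adversarial training -- inject perturbed examples into
# the training data (here, the keyword vocabulary) and retrain the model.
hardened = make_classifier(spam_vocab | {char_attack(w) for w in spam_vocab})
print(hardened(adv))  # True -- the retrained model resists the attack
```

The last step mirrors, at toy scale, the augment-and-retrain countermeasure the study reports: the victim model is rebuilt on training data that includes the adversarial examples themselves.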
Keywords
Natural Language Processing, NLP-based Systems, Adversarial Attacks, Robustness, Accuracy, Vulnerability, Cybersecurity.
References
Abdullah, Z. b., Dahlan, N. b. M., Dahlan, A. b., Irfan bin, A. F., and Arifin, A. S. (2023). Cybersecurity Awareness on Personal Data Protection Using Game-Based Learning. Information Management and Business Review, 15(3(I)), 497-503.
Akhtar, N., and Mian, A. (2018). Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey. IEEE Access, 6, 14410-14430.
Almeida, T. A., Hidalgo, J. M. G., and Yamakami, A. (2011). Contributions to the study of SMS spam filtering: new collection and results. Proceedings of the 11th ACM symposium on Document engineering, 259–262.
Alshattnawi, S., Shatnawi, A., AlSobeh, A. M. R., and Magableh, A. A. (2024). Beyond Word-Based Model Embeddings: Contextualized Representations for Enhanced Social Media Spam Detection. Applied Sciences, 14(6), 2254.
Anjali, M. K., and Babu Anto, P. (2014). Ambiguities in Natural Language Processing. International Journal of Innovative Research in Computer and Communication Engineering, 2, 392-394.
Ansari, M. F., Sharma, P. K., and Dash, B. (2022). Prevention of phishing attacks using AI-based Cybersecurity Awareness Training. International Journal of Smart Sensor and Adhoc Network, 3(6).
Atallah, M. J., McDonough, C. J., Raskin, V., and Nirenburg, S. (2001). Natural language processing for information assurance and security: an overview and implementations. Proceedings of the 2000 workshop on New security paradigms, 51–65.
Berry, H. S. (2023). Survey of the Challenges and Solutions in Cybersecurity Awareness Among College Students. 2023 11th International Symposium on Digital Forensics and Security (ISDFS), 1-6.
Boucher, N., Shumailov, I., Anderson, R., and Papernot, N. (2022). Bad Characters: Imperceptible NLP Attacks. 2022 IEEE Symposium on Security and Privacy (SP), 1987-2004.
Chan, A., Tay, Y., Ong, Y., and Zhang, A. (2020a). Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder. Findings of the Association for Computational Linguistics: EMNLP.
Chan, A., Tay, Y., Ong, Y., and Zhang, A. (2020b). Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder. Findings of the Association for Computational Linguistics: EMNLP 2020, 4175-4189.
Chen, T., Liu, J., Xiang, Y., Niu, W., Tong, E., and Han, Z. (2019). Adversarial attack and defense in reinforcement learning-from AI security view. Cybersecurity, 2(1), 11.
Choi, C., and Choi, J. (2019). Ontology-Based Security Context Reasoning for Power IoT-Cloud Security Service. IEEE Access, 7, 110510-110517.
Chopra, A., Prashar, A., and Sain, C. (2013). Natural language processing. International Journal of Technology Enhancements and Emerging Engineering Research, 1, 131-134.
Cîrnu, C. E., Rotună, C. I., Vevera, A. V., and Boncea, R. (2018). Measures to Mitigate Cybersecurity Risks and Vulnerabilities in Service-Oriented Architecture. Studies in Informatics and Control, 27(3), 359-368.
Craigen, D., Diakun-Thibault, N., and Purse, R. (2014). Defining cybersecurity. Technology Innovation Management Review, 4(10), 13-21.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics.
Dong, H., Dong, J., Yuan, S., and Guan, Z. (2023). Adversarial Attack and Defense on Natural Language Processing in Deep Learning: A Survey and Perspective. 409-424.
Dong, Z., and Dong, Q. (2006). HowNet and the Computation of Meaning.
Ebneyamini, S., and Sadeghi Moghadam, M. R. (2018). Toward Developing a Framework for Conducting Case Study Research. International Journal of Qualitative Methods, 17(1), 1609406918817954.
Ebrahimi, J., Rao, A., Lowd, D., and Dou, D. (2018). HotFlip: White-Box Adversarial Examples for Text Classification. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C.,…Song, D. (2018). Robust Physical-World Attacks on Deep Learning Visual Classification. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1625-1634.
Flyvbjerg, B. (2006). Five Misunderstandings About Case-Study Research. Qualitative Inquiry, 12, 219-245.
Gandhi, A., Ahir, P., Adhvaryu, K., Shah, P., Lohiya, R., Cambria, E.,…Hussain, A. Hate speech detection: A comprehensive review of recent works. Expert Systems, n/a(n/a), e13562.
Gao, J., Lanchantin, J., Soffa, M. L., and Qi, Y. (2018). Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. 2018 IEEE Security and Privacy Workshops (SPW), 50-56.
Georgescu, T.-M. (2020). Natural language processing model for automatic analysis of cybersecurity-related documents. Symmetry, 12(3), 354.
Ghaleb, S. A. A., Mohamad, M., Fadzli, S. A., and Ghanem, W. A. H. M. (2021). Training Neural Networks by Enhance Grasshopper Optimization Algorithm for Spam Detection System. IEEE Access, 9, 116768-116813.
Gongane, V. U., Munot, M. V., and Anuse, A. D. (2022). Detection and moderation of detrimental content on social media platforms: current status and future directions. Social Network Analysis and Mining, 12(1), 129.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., and Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems (NIPS), 27, 2672–2680.
Goodfellow, I. J., Shlens, J., and Szegedy, C. (2014). Explaining and Harnessing Adversarial Examples. International Conference on Learning Representations.
Goyal, S., Doddapaneni, S., Khapra, M. M., and Ravindran, B. (2023). A Survey of Adversarial Defenses and Robustness in NLP. ACM Comput. Surv., 55(14s), Article 332.
Halder, S., Tiwari, R., and Sprague, A. (2011). Information extraction from spam emails using stylistic and semantic features to identify spammers. 2011 IEEE International Conference on Information Reuse and Integration, 104-107.
Han, W., Zhang, L., Jiang, Y., and Tu, K. (2020). Adversarial attack and defense of structured prediction models. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Hirschberg, J., and Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245), 261-266.
Hosier, J., Gurbani, V. K., and Milstead, N. (2019). Disambiguation and Error Resolution in Call Transcripts. 2019 IEEE International Conference on Big Data (Big Data), 4602-4607.
Hutson, M. (2018). Hackers easily fool artificial intelligences. Science, 361(6399), 215-215.
Ito, R., and Mimura, M. (2019). Detecting Unknown Malware from ASCII Strings with Natural Language Processing Techniques. 2019 14th Asia Joint Conference on Information Security (AsiaJCIS), 1-8.
Jin, D., Jin, Z., Zhou, J. T., and Szolovits, P. (2020). Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 8018-8025.
Jo, H., Kim, J., Porras, P., Yegneswaran, V., and Shin, S. (2021). GapFinder: Finding Inconsistency of Security Information From Unstructured Text. IEEE Transactions on Information Forensics and Security, 16, 86-99.
Karbab, E. B., and Debbabi, M. (2019). MalDy: Portable, data-driven malware detection using natural language processing and machine learning techniques on behavioral analysis reports. Digital Investigation, 28, S77-S87.
Kennedy, J., and Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN'95 - International Conference on Neural Networks, 4, 1942-1948.
Kim, Y., Kang, H., Suryanto, N., Larasati, H. T., Mukaroh, A., and Kim, H. (2021). Extended Spatially Localized Perturbation GAN (eSLP-GAN) for Robust Adversarial Camouflage Patches. Sensors, 21(16), 5323.
Kuchipudi, B., Nannapaneni, R. T., and Liao, Q. (2020). Adversarial machine learning for spam filters. Proceedings of the 15th International Conference on Availability, Reliability and Security, Article 38.
Kurakin, A., Goodfellow, I., and Bengio, S. (2016). Adversarial Machine Learning at Scale. ArXiv, abs/1611.01236.
Liddy, E. D. (2001). Natural language processing. In Encyclopedia of Library and Information Science (2nd ed.). Marcel Dekker, Inc.
Liu, B., Xiao, B., Jiang, X., Cen, S., He, X., and Dou, W. (2023). Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A Case Study on ChatGPT. Security and Communication Networks, 2023, 8691095.
Mamta, Gupta, B. B., Li, K. C., Leung, V. C. M., Psannis, K. E., and Yamaguchi, S. (2021). Blockchain-Assisted Secure Fine-Grained Searchable Encryption for a Cloud-Based Healthcare Cyber-Physical System. IEEE/CAA Journal of Automatica Sinica, 8(12), 1877-1890.
Martin, J., and Elster, C. (2020). Inspecting adversarial examples using the fisher information. Neurocomputing, 382, 80-86.
Mimura, M., and Ito, R. (2022). Applying NLP techniques to malware detection in a practical environment. International Journal of Information Security, 21(2), 279-291.
Mishra, A., Gupta, B. B., Peraković, D., Peñalvo, F. J. G., and Hsu, C. H. (2021). Classification Based Machine Learning for Detection of DDoS attack in Cloud Computing. 2021 IEEE International Conference on Consumer Electronics (ICCE), 1-4.
Mo, K., Tang, W., Li, J., and Yuan, X. (2023). Attacking Deep Reinforcement Learning With Decoupled Adversarial Policy. IEEE Transactions on Dependable and Secure Computing, 20(1), 758-768.
Morris, J. X., Lifland, E., Yoo, J. Y., and Qi, Y. (2020). TextAttack: A framework for adversarial attacks in natural language processing. Proceedings of the 2020 EMNLP (System Demonstrations). arXiv:2005.05909.
Nadkarni, P. M., Ohno-Machado, L., and Chapman, W. W. (2011). Natural language processing: an introduction. Journal of the American Medical Informatics Association, 18(5), 544-551.
Naseer, M., Khan, S., Hayat, M., Khan, F. S., and Porikli, F. (2020). A Self-supervised Approach for Adversarial Robustness. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 259-268.
Neekhara, P., Hussain, S., Pandey, P., Dubnov, S., McAuley, J., and Koushanfar, F. (2019). Universal Adversarial Perturbations for Speech Recognition Systems. Proceedings of Interspeech 2019. doi: 10.21437/Interspeech.2019-135.
Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. (2017). Practical Black-Box Attacks against Machine Learning. Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, 506–519.
Parildi, E. S., Hatzinakos, D., and Lawryshyn, Y. (2021). Deep learning-aided runtime opcode-based Windows malware detection. Neural Computing and Applications, 33(18), 11963-11983.
Peddoju, S. K., Upadhyay, H., Soni, J., and Prabakar, N. (2020). Natural Language Processing based Anomalous System Call Sequences Detection with Virtual Memory Introspection. International Journal of Advanced Computer Science and Applications, 11(5).
Poornachandran, P., Raj, D., Pal, S., and Ashok, A. (2016). Effectiveness of Email Address Obfuscation on Internet. 181-191.
Qi, F., Chen, Y., Zhang, X., Li, M., Liu, Z., and Sun, M. (2021). Mind the Style of Text! Adversarial and Backdoor Attacks Based on Text Style Transfer. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 4569-4580.
Qiu, S., Liu, Q., Zhou, S., and Huang, W. (2022). Adversarial attack and defense technologies in natural language processing: A survey. Neurocomputing, 492, 278-307.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8).
Rahman, M. R., Mahdavi-Hezaveh, R., and Williams, L. (2020). A Literature Review on Mining Cyberthreat Intelligence from Unstructured Texts. 2020 International Conference on Data Mining Workshops (ICDMW), 516-525.
Rayan, A., and Taloba, A. I. (2021). Detection of Email Spam using Natural Language Processing Based Random Forest Approach. PREPRINT (Version 1) available at Research Square
Ren, K., Zheng, T., Qin, Z., and Liu, X. (2020). Adversarial Attacks and Defenses in Deep Learning. Engineering, 6(3), 346-360.
Ren, S., Deng, Y., He, K., and Che, W. (2019). Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
Ridder, H.-G. (2017). The theory contribution of case study research designs. Business Research, 10(2), 281-305.
Sadaghiani-Tabrizi, A. (2023). Revisiting Cybersecurity Awareness in the Midst of Disruptions. International Journal for Business Education, 163(1).
Salman, M., Ikram, M., and Kaafar, M. A. (2024). Investigating Evasive Techniques in SMS Spam Filtering: A Comparative Analysis of Machine Learning Models. IEEE Access, 12, 24306-24324.
Sato, M., Suzuki, J., Shindo, H., and Matsumoto, Y. (2018). Interpretable Adversarial Perturbation in Input Embedding Space for Text. ArXiv, abs/1805.02917.
Schwinn, L., Dobre, D., Günnemann, S., and Gidel, G. (2023). Adversarial Attacks and Defenses in Large Language Models: Old and New Threats. arXiv:2310.19737
Shankar, S. (2018). Advanced detection of spam and email filtering using natural language processing algorithms. International Journal of Advance Research, Ideas and Innovations in Technology, 4, 714-717.
Sharif, M., Bhagavatula, S., Bauer, L., and Reiter, M. K. (2016). Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 1528–1540.
Singh, A., and Gupta, B. B. (2022). Distributed denial-of-service (DDoS) attacks and defense mechanisms in various web-enabled computing platforms: issues, challenges, and future research directions. International Journal on Semantic Web and Information Systems (IJSWIS), 18(1), 1-43.
Steinhardt, J., Koh, P., and Liang, P. (2017). Certified Defenses for Data Poisoning Attacks. arXiv:1706.03691v2.
Stone, A. (2007). Natural-Language Processing for Intrusion Detection. Computer, 40(12), 103–105.
Sun, L., Hashimoto, K., Yin, W., Asai, A., Li, J., Yu, P. S., and Xiong, C. (2020). Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT. ArXiv, abs/2003.04985.
Sworna, Z. T., Mousavi, Z., and Babar, M. A. (2023). NLP methods in host-based intrusion detection systems: A systematic review and future directions. Journal of Network and Computer Applications, 220, 103761.
Tewari, A., and Gupta, B. B. (2020). Secure Timestamp-Based Mutual Authentication Protocol for IoT Devices Using RFID Tags. International Journal on Semantic Web and Information Systems (IJSWIS), 16(3), 20-34.
Ukwen, D. O., and Karabatak, M. (2021). Review of NLP-based Systems in Digital Forensics and Cybersecurity. 2021 9th International Symposium on Digital Forensics and Security (ISDFS).
Vassilev, A., Oprea, A., Fordyce, A., and Anderson, H. (2023). Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations. NIST Trustworthy and Responsible AI.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is All you Need. Advances in Neural Information Processing Systems 30 (NIPS 2017).
Vijayan, J. (2024). Google's Gemini AI Vulnerable to Content Manipulation. DarkReading.
Wang, H., Li, J., Wu, H., Hovy, E., and Sun, Y. (2023). Pre-Trained Language Models and Their Applications. Engineering, 25, 51-65.
Wang, S., Yan, Q., Chen, Z., Yang, B., Zhao, C., and Conti, M. (2017). TextDroid: Semantics-based detection of mobile malware using network flows. 2017 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 18-23.
Wei, J., and Zou, K. (2019). EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 6382–6388.
Yang, P., Li, D., and Li, P. (2022). Continual Learning for Natural Language Generations with Transformer Calibration. Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), Abu Dhabi, United Arab Emirates.
Yin, R. K. (2017). Case Study Research and Applications: Design and Methods (6th ed.). SAGE Publications. https://books.google.com/books?id=uX1ZDwAAQBAJ
Zang, Y., Qi, F., Hu, J., Liu, Z., Zhang, R., Liu, Q., and Sun, M. (2020). Word-level Textual Adversarial Attacking as Combinatorial Optimization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
Zhang, H., Zhou, H., Miao, N., and Li, L. (2019). Generating Fluent Adversarial Examples for Natural Languages. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
Zhang, N., Xue, J., Ma, Y., Zhang, R., Liang, T., and Tan, Y.-a. (2021). Hybrid sequence-based Android malware detection using natural language processing. International Journal of Intelligent Systems, 36(10), 5770-5784.
Zhang, W. E., Sheng, Q. Z., Alhazmi, A., and Li, C. (2020). Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey. ACM Trans. Intell. Syst. Technol., 11(3), Article 24.
Zhang, X., Xie, X., Ma, L., Du, X., Hu, Q., Liu, Y.,…Sun, M. (2020). Towards characterizing adversarial defects of deep learning software from the lens of uncertainty. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 739–751.
Zhang, Y., Saberi, M., and Chang, E. (2017). Semantic-based lightweight ontology learning framework: a case study of intrusion detection ontology. Proceedings of the International Conference on Web Intelligence, 1171–1177.
Zhang, Y., Shao, K., Yang, J., and Liu, H. (2021). Adversarial Attacks and Defenses on Deep Learning Models in Natural Language Processing. 2021 IEEE 5th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), 5, 1281-1285.
Zhou, X., and Zafarani, R. (2020). A Survey of Fake News: Fundamental Theories, Detection Methods, and Opportunities. ACM Comput. Surv., 53(5), Article 109.
