A Study Towards Building Content Aware Models in NLP using Genetic Algorithms
Keywords: Content awareness, Large language models, data poisoning, genetic algorithms
INTRODUCTION: With the advancement of large language models (LLMs), concerns about the usage of these models have been increasing, because they can generate human-like text and perform a number of tasks such as code generation, question answering, essay writing, and even generating text for research papers.
OBJECTIVES: The generated text draws on the original data used to train the models, which might be protected or may be personal or private. A detailed description of such concerns and various potential solutions is given in ‘Generative language models and automated influence operations: Emerging threats and potential mitigations’.
METHODS: Addressing these concerns is paramount for the usability of LLMs. Researchers have explored several directions, and one interesting line of work is building content-aware models. The idea is that the model is aware of the type of content it is learning from and of the type of content that should be used to generate a response to a specific query.
RESULTS: In our work, we explored this direction by applying poisoning techniques to contaminate data and then applying genetic algorithms to extract, from the poisoned content, the non-poisoned content that can generate a good response when paraphrased.
CONCLUSION: While we demonstrated the idea using poisoning techniques and tried to make the model aware of copyrighted content, the same approach can be extended to detect other types of content or to any other use case where content awareness is required.
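The pipeline summarized above (poison the data, then use a genetic algorithm to select the non-poisoned content) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the zero-width-character poison marker, the fitness weights, and the helper names `poison`, `is_poisoned`, `fitness`, and `evolve` are all assumptions made for illustration, and the paraphrasing step is omitted.

```python
import random

# Assumption: poisoned text is marked by invisible zero-width characters.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d"}


def poison(sentence: str) -> str:
    """Toy poisoning: insert a zero-width space after the first character."""
    return sentence[:1] + "\u200b" + sentence[1:]


def is_poisoned(sentence: str) -> bool:
    """Return True if the sentence contains any assumed poison marker."""
    return any(ch in ZERO_WIDTH for ch in sentence)


def fitness(mask, sentences):
    """Reward each selected clean sentence (+1), penalize each selected poisoned one (-2)."""
    score = 0
    for keep, sentence in zip(mask, sentences):
        if keep:
            score += -2 if is_poisoned(sentence) else 1
    return score


def evolve(sentences, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Evolve a bit mask over sentences; return the sentences kept by the best mask."""
    rng = random.Random(seed)
    n = len(sentences)  # needs at least 2 sentences for one-point crossover
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: keep the better half unchanged.
        population.sort(key=lambda m: fitness(m, sentences), reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = elite + children
    best = max(population, key=lambda m: fitness(m, sentences))
    return [s for keep, s in zip(best, sentences) if keep]


corpus = [
    "Genetic algorithms search by selection and mutation.",
    poison("This sentence stands in for protected content."),
    "Transformers rely on attention.",
    poison("Another contaminated sentence."),
    "Summaries can be built from selected sentences.",
]
recovered = evolve(corpus)
print(recovered)
```

In this toy setting the poison marker could be filtered directly. The genetic search is shown only to make the selection step concrete; a realistic fitness function would presumably score the quality of the paraphrased response rather than look for a known marker.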
Josh A Goldstein, Girish Sastry, Micah Musser, Renee DiResta, Matthew Gentzel, and Katerina Sedova. Generative language models and automated influence operations: Emerging threats and potential mitigations. arXiv preprint arXiv:2301.04246, 2023.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
Nicholas Boucher, Ilia Shumailov, Ross Anderson, and Nicolas Papernot. Bad characters: Imperceptible nlp attacks. In 2022 IEEE Symposium on Security and Privacy (SP), pages 1987–2004. IEEE, 2022.
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461, 2019.
Wirsansky, E., 2020. Hands-on genetic algorithms with Python: applying genetic algorithms to solve real-world deep learning and artificial intelligence problems. Packt Publishing Ltd.
Chen, W., Ramos, K., Mullaguri, K.N. and Wu, A.S., 2021. Genetic algorithms for extractive summarization. arXiv preprint arXiv:2105.02365.
Manzoni, L., Jakobovic, D., Mariot, L., Picek, S. and Castelli, M., 2020, June. Towards an evolutionary-based approach for natural language processing. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference (pp. 985-993).
Wallace, E., Zhao, T.Z., Feng, S. and Singh, S., 2020. Concealed data poisoning attacks on nlp models. arXiv preprint arXiv:2010.12563.
Xiang, T., Xie, C., Guo, S., Li, J. and Zhang, T., 2021. Protecting Your NLG Models with Semantic and Robust Watermarks. arXiv preprint arXiv:2112.05428.
Pajola, L. and Conti, M., 2021, September. Fall of Giants: How popular text-based MLaaS fall against a simple evasion attack. In 2021 IEEE European Symposium on Security and Privacy (EuroS&P) (pp. 198-211). IEEE.
Michel, P., Li, X., Neubig, G. and Pino, J.M., 2019. On evaluation of adversarial perturbations for sequence-to-sequence models. arXiv preprint arXiv:1903.06620.
Russo, A., 2023, June. Analysis and Detectability of Offline Data Poisoning Attacks on Linear Dynamical Systems. In Learning for Dynamics and Control Conference (pp. 1086-1098). PMLR.
Evans, O., Cotton-Barratt, O., Finnveden, L., Bales, A., Balwit, A., Wills, P., Righetti, L. and Saunders, W., 2021. Truthful AI: Developing and governing AI that does not lie. arXiv preprint arXiv:2110.06674.
Bang, Y., Cahyawijaya, S., Lee, N., Dai, W., Su, D., Wilie, B., Lovenia, H., Ji, Z., Yu, T., Chung, W. and Do, Q.V., 2023. A multitask, multilingual, multimodal evaluation of chatgpt on reasoning, hallucination, and interactivity. arXiv preprint arXiv:2302.04023.
Megahed, F.M., Chen, Y.J., Ferris, J.A., Knoth, S. and Jones-Farmer, L.A., 2023. How generative ai models such as chatgpt can be (mis)used in spc practice, education, and research? an exploratory study. Quality Engineering, pp.1-29.
Borji, A., 2023. A categorical archive of chatgpt failures. arXiv preprint arXiv:2302.03494.
Sheera Frenkel, “Iranian Disinformation Effort Went Small to Stay Under Big Tech’s Radar,” New York Times, June 30, 2021, https://www.nytimes.com/2021/06/30/technology/disinformation-message-apps.html.
Xiang, Tao, Chunlong Xie, Shangwei Guo, Jiwei Li, and Tianwei Zhang. “Protecting Your NLG Models with Semantic and Robust Watermarks.” arxiv:2112.05428 [cs.MM], December 10, 2021. https://doi.org/10.48550/arxiv.2112.05428.
Sablayrolles, Alexandre, Matthijs Douze, Cordelia Schmid, and Hervé Jégou. “Radioactive data: tracing through training.” 37th International Conference on Machine Learning, ICML 2020 PartF168147-11 (February 3, 2020): 8296–8305. https://doi.org/10.48550/arxiv.2002.00937.
Ziegler, Z.M., Deng, Y. and Rush, A.M., 2019. Neural linguistic steganography. arXiv preprint arXiv:1909.01496.
Copyright (c) 2023 Umesh Tank, Saranya Arirangan, Anwesh Reddy Paduri, Narayana Darapaneni
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.