Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks

Authors

A. Choi, A. Giang, S. Jumani, D. Luong, and F. Di Troia

DOI:

https://doi.org/10.4108/eetiot.6566

Keywords:

Malware, Synthetic Malware, GAN, VAE

Abstract

The effectiveness of detecting malicious files relies heavily on the quality of the training dataset, particularly its size and authenticity. However, the lack of high-quality training data remains one of the biggest obstacles to the widespread adoption of trained machine learning and deep learning models for malware detection. In response, researchers have begun to use generative techniques to create synthetic malware samples. This work uses deep variational autoencoders (VAEs) and generative adversarial networks (GANs) to produce malware samples in the form of opcode sequences. To validate the generated samples, machine learning and deep learning classifiers are then trained to distinguish the synthetic opcode sequences from authentic ones. The primary objective of this study was to compare synthetic malware generated with VAE and GAN techniques. The results showed that neither approach could create synthetic malware that deceived machine learning classifiers. However, the WGAN-GP algorithm was more promising: classifiers needed a larger number of its synthetic samples in the training set before they could detect them effectively, which suggests that it is the better approach for synthetic malware generation.
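The abstract singles out WGAN-GP (a Wasserstein GAN trained with a gradient penalty) as the more promising generator. The framework-free sketch below shows the standard WGAN-GP critic loss: the critic's score gap between synthetic and real samples, plus a penalty that pushes the critic's gradient norm toward 1 at random points between paired real and synthetic samples. It assumes opcode sequences have already been mapped to fixed-length real-valued vectors (for example, through an embedding layer). The helper names, the toy linear critic, the example vectors, and the default penalty weight `lam = 10.0` are illustrative assumptions, not details taken from the paper. A real implementation would compute gradients with a deep learning framework's autodiff rather than finite differences.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
Critic = Callable[[Vector], float]


def numerical_grad(f: Critic, x: Vector, h: float = 1e-5) -> List[float]:
    """Central-difference estimate of the gradient of scalar function f at x.

    Stands in for autodiff so the sketch has no framework dependency.
    """
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


def gradient_penalty(critic: Critic, real_batch: List[Vector], fake_batch: List[Vector],
                     lam: float = 10.0, rng: Optional[random.Random] = None) -> float:
    """lam * batch mean of (||grad D(x_hat)||_2 - 1)^2, with x_hat between real and fake."""
    if len(real_batch) != len(fake_batch):
        raise ValueError("real and fake batches must have the same size")
    if rng is None:
        rng = random.Random(0)
    penalties = []
    for real, fake in zip(real_batch, fake_batch):
        # Random point on the straight line between a real and a synthetic sample.
        eps = rng.random()
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
        grad = numerical_grad(critic, x_hat)
        grad_norm = math.sqrt(sum(g * g for g in grad))
        penalties.append((grad_norm - 1.0) ** 2)
    return lam * sum(penalties) / len(penalties)


def critic_loss(critic: Critic, real_batch: List[Vector], fake_batch: List[Vector],
                lam: float = 10.0, rng: Optional[random.Random] = None) -> float:
    """WGAN-GP critic loss: mean D(fake) - mean D(real) + gradient penalty."""
    mean_fake = sum(critic(x) for x in fake_batch) / len(fake_batch)
    mean_real = sum(critic(x) for x in real_batch) / len(real_batch)
    return mean_fake - mean_real + gradient_penalty(critic, real_batch, fake_batch, lam, rng)


def linear_critic(weights: Vector) -> Critic:
    """Hypothetical toy critic D(x) = w . x, whose gradient is w everywhere."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x))


if __name__ == "__main__":
    real = [[1.0, 2.0], [0.5, -1.0]]
    fake = [[0.0, 0.0], [2.0, 1.0]]
    # ||w|| = 5, so the penalty is about 10 * (5 - 1)^2 = 160.
    print(gradient_penalty(linear_critic([3.0, 4.0]), real, fake))
```

Driving the penalty toward zero keeps the critic's gradient norm close to 1 along lines between real and synthetic samples. This is how WGAN-GP softly enforces the 1-Lipschitz constraint of the Wasserstein formulation instead of clipping weights, as the original WGAN does.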

Published

9 July 2024

How to Cite

A. Choi, A. Giang, S. Jumani, D. Luong, and F. Di Troia, “Synthetic Malware Using Deep Variational Autoencoders and Generative Adversarial Networks”, EAI Endorsed Trans IoT, vol. 10, Jul. 2024.