Machine Learning in Cybersecurity: Advanced Detection and Classification Techniques for Network Traffic Environments

Authors

  • Samer El Hajj Hassan, IU International University of Applied Sciences
  • Nghia Duong-Trung, German Research Centre for Artificial Intelligence

DOI:

https://doi.org/10.4108/eetinis.v11i3.5237

Keywords:

Machine Learning, Cybersecurity, Network Analysis, Anomaly Detection, Data Security, Traffic Classification, Network Optimization, Traffic Volume

Abstract

In the digital age, the integrity of business operations and the smoothness of their execution depend heavily on cybersecurity and network efficiency. The need for robust solutions to prevent cyber threats and enhance network functionality has never been more critical. This research aims to use machine learning (ML) techniques for the meticulous analysis of network traffic, with the dual goals of detecting anomalies and categorizing network activities to bolster security and performance. Employing a detailed methodology, this study begins with data preparation and progresses to the deployment of advanced ML models, including logistic regression, decision trees, and ensemble learning techniques. This approach ensures the accuracy of the analysis and facilitates a nuanced understanding of network dynamics. Our findings indicate a notable improvement in identifying network inefficiencies and in classifying network traffic more accurately. The application of ML models significantly reduces network delays and bottlenecks by providing a strong defence strategy against cyber threats and network shortcomings, thereby improving user satisfaction and boosting the organization's reputation as a secure and effective service layer. In conclusion, the research highlights the pivotal role of machine learning in network traffic analysis, offering innovative insights and fresh perspectives on anomaly detection and the identification of malicious activities. It lays a foundation for future explorations and acts as an evaluation benchmark in the fields of cybersecurity and network management.

Published

01-07-2024

How to Cite

El Hajj Hassan, S., & Duong-Trung, N. (2024). Machine Learning in Cybersecurity: Advanced Detection and Classification Techniques for Network Traffic Environments. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 11(3). https://doi.org/10.4108/eetinis.v11i3.5237