A Machine Learning Approach to Identifying Phishing Websites: A Comparative Study of Classification Models and Ensemble Learning Techniques
DOI:
https://doi.org/10.4108/eetsis.vi.3300
Keywords:
Web Phishing, Classification techniques, Ensemble learning, Machine Learning
Abstract
Phishing attacks are one of the more prevalent types of cybercrime in the world today. Attackers send users emails and messages to steal information, and they also use websites for the same purpose. Phishing primarily targets corporate websites, such as those of e-commerce, finance, and governmental organizations. To obtain sensitive user information, attackers impersonate legitimate websites, a practice known as web phishing. This research explores the use of machine learning algorithms to identify and stop web phishing attacks, and proposes detecting phishing URLs by analysing various aspects of the URLs. The study compares classification models (Logistic Regression, Random Forest, Decision Tree, KNN, Naive Bayes, and SVM) with ensemble learning techniques (Gradient Boosting, XGBoost, Histogram Gradient Boosting, Light Gradient Boosting, and AdaBoost) for detecting phishing websites.
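As a rough illustration of what "analysing various aspects of the URLs" can mean in practice, the sketch below derives a few lexical features from a URL string using only the Python standard library. The function name `url_features` and the particular features chosen are assumptions made for this example; the abstract does not list the feature set used in the study.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Return a few illustrative lexical features of a URL.

    Hypothetical feature set for demonstration only; it is not
    taken from the study summarised in the abstract.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address in place of a domain name is a commonly cited warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in (
        "https://www.example.com/login",
        "http://192.0.2.10/secure-login@example.com",
    ):
        print(example, url_features(example))
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to any of the classifiers named in the abstract; which features are actually informative would have to be established on a labelled dataset.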
License
Copyright (c) 2023 Padma Jyothi Uppalapati, Bhogesh Karthik Gontla, Priyanka Gundu, S Mahaboob Hussain, Kandula Narasimharao
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.