Artificial Intelligence-Driven Early Prediction of Student Dropout and Academic Outcomes in Higher Education: A Comparative Study of Advanced Machine Learning Approaches
DOI:
https://doi.org/10.4108/eetinis.131.11758
Keywords:
Dropout prediction, machine learning, student performance prediction
Abstract
Student dropout in higher education remains a critical challenge with significant academic, social, and economic implications. Early identification of students at risk of dropping out enables institutions to design timely and targeted interventions that support academic success and improve retention rates. This study proposes a machine learning (ML)-driven framework for the early prediction of student dropout and academic outcomes in higher education using a comprehensive, real-world dataset collected from a higher education institution. The prediction task is formulated as a multiclass classification problem with three outcomes: dropout, enrolled, and graduate. To evaluate the effectiveness of different modeling approaches, we conduct a comparative analysis of widely used ML algorithms, including Logistic Regression, Naïve Bayes, k-Nearest Neighbors, Support Vector Machine, Decision Trees, Random Forest (RF), AdaBoost, XGBoost, LightGBM, and CatBoost. Results indicate that ensemble models achieve the best performance. RF attains the highest test accuracy (0.7797) and ROC-AUC (OvR) (0.8919), while LightGBM yields the best Macro-F1 (0.7082). Feature importance analysis shows that early academic progress indicators (approved units and semester grades) are the strongest predictors, followed by selected administrative/contextual factors such as tuition-fee status and course. Overall, this study provides empirical evidence supporting the use of ML techniques as effective decision-support tools for higher education institutions. The proposed framework offers actionable insights for administrators and policymakers seeking to develop data-driven strategies aimed at reducing dropout rates, improving academic success, and promoting equitable access to educational opportunities.
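The abstract reports both accuracy and Macro-F1, and different models lead on each. A minimal Python sketch of how the two metrics can diverge on a three-class task is given here for illustration. The function names and the toy labels are assumptions made up for this example; they are not the study's code or data.

```python
# Illustrative sketch only: toy labels, not the study's dataset or pipeline.
CLASSES = ("dropout", "enrolled", "graduate")  # the three outcomes named in the abstract


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced sample: the single "enrolled" student is misclassified.
y_true = ["graduate"] * 6 + ["dropout"] * 3 + ["enrolled"] * 1
y_pred = ["graduate"] * 6 + ["dropout"] * 3 + ["graduate"] * 1

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.900
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")  # macro-F1 = 0.641
```

In this toy case accuracy stays high because the majority classes are predicted well, while Macro-F1 drops because the minority class contributes an F1 of zero. This is one plausible reason a model can rank first on accuracy and another on Macro-F1.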
License
Copyright (c) 2026 Nghia Trong Vo, Quang Nhat Le, Hang Le

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.