Predicting Student Dropout based on Machine Learning and Deep Learning: A Systematic Review

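Authors

Andrade-Girón D, Sandivar-Rosas J, Marín-Rodriguez W, Susanibar-Ramirez E, Toro-Dextre E, Ausejo-Sanchez J, Villarreal-Torres H, Angeles-Morales J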

DOI:

https://doi.org/10.4108/eetsis.3586

Keywords:

prediction, student attrition, machine learning, deep learning

Abstract

Student dropout is one of the most complex challenges facing education systems worldwide. To evaluate how successfully Machine Learning and Deep Learning algorithms predict student dropout, a systematic review was conducted. Several electronic bibliographic databases, including Scopus, IEEE, and Web of Science, were searched for studies published up to June 2023, and the search returned 246 articles. Exclusion criteria were established to discard review articles, editorials, letters, and comments. The final review included 23 studies, which were evaluated on performance metrics such as accuracy, precision, sensitivity (recall), specificity, and area under the curve (AUC). Aspects of study design were also considered, including study modality, the training and testing strategy, cross-validation, and the confusion matrix. The results revealed that Random Forest was the most frequently used Machine Learning algorithm, appearing in 21.7% of the studies (5 of 23), and that it achieved an accuracy of 99% in predicting student dropout, the highest accuracy reported for any algorithm across the reviewed studies.
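
All of the threshold-based metrics compared in the review can be derived from a binary confusion matrix, while AUC is computed from the model's ranked scores. The following is a minimal, dependency-free Python sketch, assuming dropout is coded as 1 (the positive class) and a 0.5 decision threshold; the function names (confusion_counts, classification_metrics, roc_auc) and the sample data are hypothetical illustrations, not taken from any of the reviewed studies. In practice, scikit-learn's `confusion_matrix` and `roc_auc_score` serve the same purpose.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Tally the 2x2 confusion matrix, assuming 1 = dropout (positive), 0 = retained."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        else:  # t == 1 and p == 0: a missed dropout
            counts["fn"] += 1
    return counts


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty (e.g. no predicted positives).
    return num / den if den else 0.0


def classification_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Threshold-dependent metrics computed from the confusion matrix counts."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return {
        "accuracy": _ratio(c["tp"] + c["tn"], total),
        "precision": _ratio(c["tp"], c["tp"] + c["fp"]),
        "sensitivity_recall": _ratio(c["tp"], c["tp"] + c["fn"]),
        "specificity": _ratio(c["tn"], c["tn"] + c["fp"]),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random dropout is scored above a random non-dropout.

    Mann-Whitney formulation: ties count as half a win. O(P*N), adequate for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and model scores, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
    scores = [0.9, 0.2, 0.65, 0.4, 0.3, 0.7, 0.8, 0.1, 0.55, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    counts = confusion_counts(y_true, y_pred)
    print(counts)  # {'tp': 3, 'fp': 2, 'tn': 4, 'fn': 1}
    print(classification_metrics(counts))
    print(round(roc_auc(y_true, scores), 3))  # 0.875
```

With these made-up numbers, accuracy is 0.70 while precision (0.60), recall (0.75), and specificity (about 0.67) all differ. This illustrates why reporting several metrics, rather than accuracy alone, matters when dropouts are often a minority class.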

Published

18-07-2023

How to Cite

Andrade-Girón D, Sandivar-Rosas J, Marín-Rodriguez W, Susanibar-Ramirez E, Toro-Dextre E, Ausejo-Sanchez J, Villarreal-Torres H, Angeles-Morales J. Predicting Student Dropout based on Machine Learning and Deep Learning: A Systematic Review. EAI Endorsed Scal Inf Syst [Internet]. 2023 Jul. 18 [cited 2024 May 5];10(5). Available from: https://publications.eai.eu/index.php/sis/article/view/3586

Most read articles by the same author(s)