Classification model for student dropouts using machine learning: A case study




autoML, machine learning, student dropout, higher education, data mining


Information and communication technologies play a highly relevant role across fields of knowledge and help address problems in many disciplines. In particular, data mining has increased the capacity to identify patterns and anomalies in an organization's data. In this context, the study aimed to develop a classification model for student dropout by applying machine learning with the autoML method of the framework. Socioeconomic and academic dimensions were taken into account so that directors can make sound decisions to counteract student dropout from the study programs. The research was technological in type, purposeful in level, and aimed at incremental innovation, with a synchronic temporal scope; data collection was prospective. A 20-item questionnaire was administered to 237 students enrolled in the master's programs in education at the Graduate School. The research produced a supervised machine learning model, a Gradient Boosting Machine (GBM), that classifies student dropout and identifies the main factors associated with it. The model obtained a Gini coefficient of 92.20%, an AUC of 96.10%, and a log loss of 0.2424, indicating efficient performance.
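The abstract reports three evaluation metrics together. For a binary classifier the Gini coefficient is commonly defined as 2·AUC − 1, which is consistent with the reported figures (2 × 0.9610 − 1 = 0.9220). The short Python sketch below is not the authors' code. It is a minimal, dependency-free illustration, assuming binary labels (1 = dropout, 0 = retained) and predicted dropout probabilities, of how AUC, Gini, and log loss relate. The function names and the toy data are hypothetical.

```python
import math


def auc_score(y_true, y_score):
    """AUC as the probability that a randomly chosen positive (dropout)
    receives a higher score than a randomly chosen negative; ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient of a binary classifier, defined here as 2*AUC - 1."""
    return 2.0 * auc - 1.0


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy (natural log); probabilities are clipped
    to [eps, 1 - eps] so that log(0) is never evaluated."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = dropout, 0 = retained.
    labels = [1, 0, 1, 0, 1, 0]
    probs = [0.9, 0.2, 0.7, 0.4, 0.35, 0.1]
    auc = auc_score(labels, probs)  # 8 of 9 positive/negative pairs ranked correctly
    print(f"AUC={auc:.4f} Gini={gini_from_auc(auc):.4f} "
          f"LogLoss={log_loss(labels, probs):.4f}")
    # Reported AUC of 0.9610 gives a Gini of 0.9220.
    print(round(gini_from_auc(0.961), 4))
```

The pairwise AUC loop is quadratic and is meant only for illustration. Library implementations typically compute the same quantity from sorted scores. Unlike AUC and Gini, log loss is not bounded by 1 and lower values are better, which is why it is reported here as a plain number rather than a percentage.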


How to Cite

Villarreal-Torres H, Ángeles-Morales J, Marín-Rodriguez W, Andrade-Girón D, Cano-Mejía J, Mejía-Murillo C, Flores-Reyes G, Palomino-Márquez M. Classification model for student dropouts using machine learning: A case study. EAI Endorsed Scal Inf Syst [Internet]. 2023 Jun. 15 [cited 2024 Jul. 19];10(5). Available from:
