Classification model for student dropouts using machine learning: A case study
DOI: https://doi.org/10.4108/eetsis.vi.3455
Keywords: AutoML, machine learning, student dropout, higher education, H2O.ai, data mining
Abstract
Information and communication technologies play a highly relevant role across fields of knowledge, addressing problems in many disciplines, and data mining has increased the capacity to identify patterns and anomalies in an organization's data. In this context, the study aimed to develop a classification model for student dropout by applying machine learning with the AutoML method of the H2O.ai framework. Students' socioeconomic and academic characteristics were taken into account so that program directors can make sound decisions to counteract dropout from the study programs. The methodology was technological in type, at the purposive level, of incremental innovation, and synchronic in temporal scope; data collection was prospective. A 20-item questionnaire was administered to 237 students enrolled in the master's programs in education at the Graduate School. The research produced a supervised machine learning model, a Gradient Boosting Machine (GBM), that classifies student dropout and identifies the main factors associated with it. The model achieved a Gini coefficient of 92.20%, an AUC of 96.10%, and a log loss of 0.2424, indicating efficient performance.
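The three evaluation figures above are related by standard definitions: the AUC is the probability that a randomly chosen dropout is scored higher than a randomly chosen non-dropout, the Gini coefficient is commonly reported as 2·AUC − 1 (and the reported values are consistent with this: 2 × 0.961 − 1 = 0.922), and log loss is the mean negative log-likelihood of the predicted probabilities (lower is better; it is not a percentage). The following is a minimal stdlib-only sketch of those definitions, assuming binary labels with 1 = dropout. It is not H2O's implementation, and the labels and probabilities in the usage example are hypothetical toy values, not data from the study.

```python
import math
from typing import Sequence


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs where the positive gets the higher score.
    Ties count as half a win. O(n_pos * n_neg), fine for small samples."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc: float) -> float:
    """Gini coefficient as commonly reported for binary classifiers."""
    return 2.0 * auc - 1.0


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = dropout, 0 = retained.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    probs = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]

    auc = auc_score(labels, probs)
    print(auc)                                # 0.9375
    print(gini_from_auc(auc))                 # 0.875
    print(round(log_loss(labels, probs), 4))  # 0.3725
    print(round(gini_from_auc(0.961), 3))     # 0.922, matching the abstract
```

The pairwise formulation was chosen because it makes the probabilistic meaning of AUC explicit; for large datasets a rank-based computation is faster and gives the same value.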
License
Copyright (c) 2023 Henry Villarreal-Torres, Julio Ángeles-Morales, William Marín-Rodriguez, Daniel Andrade-Girón, Jenny Cano-Mejía, Carmen Mejía-Murillo, Gumercindo Flores-Reyes, Manuel Palomino-Márquez
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, provided the original work is properly cited.