Development of a Classification Model for Predicting Student Payment Behavior Using Artificial Intelligence and Data Science Techniques




Keywords: Automated Machine Learning, Higher Education, Data Mining, Delinquency


Artificial intelligence has become a valuable tool for decision-making, and universities must adapt and optimize their processes to improve the quality of their services. In this context, income from collections is vital for institutional sustainability. Several kinds of problems can contribute to student delinquency: economic, financial, academic, family, and personal. For this reason, the study aimed to develop a classification model to predict the payment behavior of enrolled students. Methodologically, it is a proactive, technological study of incremental innovation with a synchronous temporal scope. The study population consisted of 8,495 undergraduate students enrolled in the 2022-II academic semester, with records covering academic performance, financial situation, and personal factors. The database was obtained from the institution's computer system using data science algorithms. The classification model was built using the platform, discretization algorithms, data balancing, and the R language. The data were split 70% for training and 30% for testing, yielding a GBM Grid model with an AUC of 0.905, an AUCPR of 0.926, and a logLoss of 0.311; that is, the model classifies student debtors well enough to support early intervention services that help them complete their studies.
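
The evaluation setup named in the abstract (a 70%/30% split scored with AUC and logLoss) can be illustrated with a minimal sketch. This is not the study's pipeline: it uses plain Python rather than R, the function names are hypothetical, and the labels and scores are invented toy values, not the institution's data. AUCPR is omitted for brevity.

```python
import math
import random


def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle a copy of rows and split it into train and test partitions."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_frac))
    return shuffled[:cut], shuffled[cut:]


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = delinquent, 0 = up to date.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

train_rows, test_rows = train_test_split(list(range(10)))
print(len(train_rows), len(test_rows))
print(roc_auc(y_true, y_prob), log_loss(y_true, y_prob))
```

An AUC close to 1 means delinquent students are almost always scored above non-delinquent ones, while a lower logLoss means the predicted probabilities are better calibrated; the two metrics are therefore read together.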






How to Cite

Villarreal-Torres H, Ángeles-Morales J, Marín-Rodriguez W, Andrade-Girón D, Carreño-Cisneros E, Cano-Mejía J, Mejía-Murillo C, Boscán-Carroz MC, Flores-Reyes G, Cruz-Cruz O. Development of a Classification Model for Predicting Student Payment Behavior Using Artificial Intelligence and Data Science Techniques. EAI Endorsed Scal Inf Syst [Internet]. 2023 Jun. 26 [cited 2024 Jul. 19];10(5). Available from:
