Development of a Classification Model for Predicting Student Payment Behavior Using Artificial Intelligence and Data Science Techniques

DOI:

https://doi.org/10.4108/eetsis.3489

Keywords:

Automated Machine Learning, Higher Education, Data Mining, Delinquency

Abstract

Artificial intelligence has become a valuable decision-making tool, and universities must adapt and optimize their processes to improve the quality of their services. In this context, revenue from collections is vital for institutional sustainability. Several factors can contribute to student delinquency, including economic, financial, academic, family, and personal ones. For this reason, the study aimed to develop a classification model to predict the payment behavior of enrolled students. The methodology is a proactive, technological study of incremental innovation with a synchronous temporal scope. The study population consisted of 8,495 undergraduate students enrolled in the 2022-II academic semester, with information on academic performance, financial situation, and personal factors. The result is a classification model built with the H2O.ai platform, discretization algorithms, data balancing, and the R language. The dataset was extracted from the institution's computer system using data science algorithms. The data were split into 70% for training and 30% for testing, and the best model obtained was a GBM grid model with an AUC of 0.905, an AUCPR of 0.926, and a logLoss of 0.311. These results indicate that the model classifies student debtors effectively, so that they can be offered early intervention services that help them complete their studies.
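The evaluation protocol summarized in the abstract (a 70/30 train/test split, class balancing, and AUC, AUCPR and logLoss as performance metrics) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' R/H2O pipeline: all function names are hypothetical, plain random oversampling stands in for the balancing method (the abstract does not specify it), and AUCPR is estimated as average precision, which may differ slightly from the interpolation H2O uses internally.

```python
import math
import random


def train_test_split(rows, train_frac=0.7, seed=42):
    """Shuffle rows and split them into training and test partitions (70/30 by default)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def random_oversample(rows, labels, seed=42):
    """Random oversampling (assumed technique): resample minority-class rows
    with replacement until every class has as many rows as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def auc(y_true, scores):
    """ROC AUC: probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Area under the precision-recall curve, estimated as average precision."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(y_true)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank  # precision at each recall step
    return ap / total_pos


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    # Toy labels (1 = debtor) and predicted probabilities, invented for illustration only.
    y = [1, 0, 1, 1, 0, 0]
    p = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]
    print(round(auc(y, p), 3))                # 0.889
    print(round(average_precision(y, p), 3))  # 0.917
    print(round(log_loss(y, p), 3))           # 0.4
```

In a design like this, balancing would be applied only to the training partition, so that the test metrics still reflect the real proportion of debtors and non-debtors.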

Published

26-06-2023

How to Cite

Villarreal-Torres H, Ángeles-Morales J, Marín-Rodriguez W, Andrade-Girón D, Carreño-Cisneros E, Cano-Mejía J, Mejía-Murillo C, Boscán-Carroz MC, Flores-Reyes G, Cruz-Cruz O. Development of a Classification Model for Predicting Student Payment Behavior Using Artificial Intelligence and Data Science Techniques. EAI Endorsed Scal Inf Syst [Internet]. 2023 Jun. 26 [cited 2024 May 5];10(5). Available from: https://publications.eai.eu/index.php/sis/article/view/3489