Development of a Classification Model for Predicting Student Payment Behavior Using Artificial Intelligence and Data Science Techniques
DOI:
https://doi.org/10.4108/eetsis.3489
Keywords:
Automated Machine Learning, Higher Education, Data Mining, Delinquency
Abstract
Artificial intelligence has become a valuable decision-making tool, and universities must adapt and optimize their processes to improve the quality of their services. In this context, income from collections is vital for institutional sustainability. Student delinquency can stem from economic, financial, academic, family, and personal problems. The study therefore aimed to develop a classification model to predict the payment behavior of enrolled students. The methodology is a proactive, technological study of incremental innovation with a synchronous temporal scope. The study population consisted of 8,495 undergraduate students enrolled in the 2022-II academic semester, with records covering academic performance, financial situation, and personal factors. The database was obtained from the institution's computer system using data science algorithms, and the classification model was built with the H2O.ai platform, discretization algorithms, data balancing, and the R language. The data were split 70% for training and 30% for testing. The resulting GBM Grid model achieved an AUC of 0.905, an AUCPR of 0.926, and a logLoss of 0.311; that is, the model classifies student debtors effectively, supporting early intervention services that help them complete their studies.
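To make the reported quantities concrete, the following is a minimal sketch of a 70/30 split and of the three metrics named in the abstract (AUC, AUCPR, logLoss). It is not the authors' pipeline, which used H2O.ai and R; it is plain Python, the function names are hypothetical, the toy labels and scores are invented for illustration, and AUCPR is approximated here as average precision, which may differ from the way H2O computes it.

```python
import math
import random


def split_70_30(rows, seed=0):
    """Shuffle rows and split them into 70% training and 30% testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.7 * len(rows))
    return rows[:cut], rows[cut:]


def auc(labels, scores):
    """Area under the ROC curve: share of (positive, negative) pairs
    in which the positive case gets the higher score; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def aucpr(labels, scores):
    """Area under the precision-recall curve, approximated as average
    precision (assumption: score ties are not treated specially)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this positive's rank
    return total / hits


def log_loss(labels, scores, eps=1e-15):
    """Mean negative log-likelihood of the true labels under the scores."""
    total = 0.0
    for y, p in zip(labels, scores):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Toy data, invented for illustration: 1 = delinquent, 0 = not delinquent.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

# Split sizes for a population the size of the one in the study.
train, test = split_70_30(range(8495))

print(len(train), len(test))
print(auc(toy_labels, toy_scores))
print(aucpr(toy_labels, toy_scores))
print(log_loss(toy_labels, toy_scores))
```

Higher AUC and AUCPR and lower logLoss indicate better ranking and better-calibrated probabilities, respectively, which is the sense in which the reported values (0.905, 0.926, 0.311) are read.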
License
Copyright (c) 2023 Henry Villarreal-Torres, Julio Ángeles-Morales, William Marín-Rodriguez, Daniel Andrade-Girón, Edgardo Carreño-Cisneros, Jenny Cano-Mejía, Carmen Mejía-Murillo, Mariby C. Boscán-Carroz, Gumercindo Flores-Reyes, Oscar Cruz-Cruz
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.