Assessment of CatBoost for Diabetes Prevention in Comparison to XGBoost: AI model capable of predicting the onset of diabetes

Authors

  • Jagadeesh Selvaraj, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
  • G. Gifta Jerith, Malla Reddy University
  • Karthikeyan R, Vardhaman College of Engineering
  • Senthil K, Saveetha University

DOI:

https://doi.org/10.4108/eetiot.5880

Abstract

Diabetes is a metabolic disease characterized by persistently elevated blood sugar levels. Symptoms of high blood sugar include increased hunger, thirst, and urine production. Untreated diabetes can lead to a variety of complications. Acute complications include hyperosmolar hyperglycemic state, diabetic ketoacidosis, and possibly death. The most serious long-term complications are cardiovascular disease, stroke, chronic kidney disease, foot ulcers, and vision loss. The International Diabetes Federation estimates that 463 million adults were living with diabetes in 2019, a number projected to rise to 578 million by 2030 and 700 million by 2045. One current medical application of machine learning is the rapid and accurate diagnosis of disease, so an AI model that predicts the onset of diabetes could help reduce mortality. This study compares two gradient-boosting algorithms, CatBoost and XGBoost, that predict diabetes from a number of health markers in the dataset, in order to determine which is better suited to this purpose. We build and train the proposed system in Python on a real-world dataset obtained from Kaggle and evaluate the algorithms using accuracy, precision, recall, and F1 score, among other performance metrics. XGBoost achieved an accuracy of 93.33%, a precision of 90.38%, a recall of 90.63%, and an F1 score of 91.75%. CatBoost achieved an accuracy of 96.09%, a precision of 93.38%, a recall of 91.38%, and an F1 score of 92.13%. These results indicate that CatBoost is the more effective of the two ensemble methods on this dataset.
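The abstract reports accuracy, precision, recall, and F1 score without defining them. The sketch below shows how these four metrics follow from the counts of a binary confusion matrix, assuming "diabetic" is the positive class. The class `ConfusionCounts`, the helpers `safe_div` and `binary_metrics`, and the example counts are hypothetical and chosen for illustration only; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical confusion-matrix counts (positive class = diabetic)."""
    tp: int  # diabetic patients correctly flagged
    fp: int  # non-diabetic patients wrongly flagged
    fn: int  # diabetic patients missed
    tn: int  # non-diabetic patients correctly cleared


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = safe_div(c.tp + c.tn, total)   # share of all predictions that are correct
    precision = safe_div(c.tp, c.tp + c.fp)   # share of flagged patients who are diabetic
    recall = safe_div(c.tp, c.tp + c.fn)      # share of diabetic patients who are flagged
    # F1 is the harmonic mean of precision and recall.
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the study's results.
    counts = ConfusionCounts(tp=40, fp=10, fn=5, tn=45)
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {value:.2%}")
```

For these hypothetical counts the script prints accuracy 85.00%, precision 80.00%, recall 88.89%, and F1 84.21%. Libraries such as scikit-learn can also average these scores over both classes (for example, `average="macro"` or `average="weighted"` in `sklearn.metrics.precision_recall_fscore_support`). A macro-averaged F1 is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the averaged precision and recall.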

Published

10 February 2025

How to Cite

[1]
J. Selvaraj, G. G. Jerith, K. R, and S. K, “Assessment of CatBoost for Diabetes Prevention in Comparison to XGBoost: AI model capable of predicting the onset of diabetes”, EAI Endorsed Trans IoT, vol. 11, Feb. 2025.