Analysis of Data Mining Techniques and Algorithms on Diabetes Dataset

Authors

DOI:

https://doi.org/10.4108/eetpht.11.5317

Keywords:

Data Mining Techniques, Association Rule Mining Algorithms, Diabetes, FP-Growth, SVM, Logistic Regression, Random Forest

Abstract

The main goal of this work is to predict diabetes using several Machine Learning (ML) techniques and to compare their outputs in order to identify the classifier with the highest accuracy. This study uses the Pima Indian Diabetes Dataset and applies the ML classification methods Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR) to diabetes prediction. The performance of each algorithm is analysed to determine the one with the best accuracy. The dataset includes attributes such as number of pregnancies, glucose level, blood pressure, and other important health information. The focus of this study is to combine the FP-Growth algorithm with ML algorithms in order to predict diabetes. FP-Growth is used to extract frequent items as a data pre-processing step before prediction. The LR algorithm stands out with high accuracy, showing promise in predicting type 2 diabetes when using the risk factors identified by the FP-Growth algorithm. The results help guide future research and make it easier to choose the best algorithms, especially fast ones, for medical decision support systems.
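
As an illustration of the pre-processing idea described in the abstract, the following is a minimal sketch of FP-Growth in plain Python, applied to a handful of invented, discretized patient records. It is not the authors' implementation: the toy records, the item names (such as "glucose=high"), the support threshold, and the helper names build_tree and fp_growth are all assumptions made for the example. It mines frequent itemsets and then keeps the attributes that co-occur with the "diabetic" label as candidate risk factors.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> list of tree nodes) and the
    support count of every frequent item.
    """
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_count, suffix=frozenset()):
    """Yield (itemset, support_count) for every frequent itemset."""
    header, freq = build_tree(transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each occurrence of item.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, itemset)


# Hypothetical discretized records (invented for illustration, not real data).
records = [
    {"glucose=high", "bmi=high", "diabetic"},
    {"glucose=high", "bmi=high", "diabetic"},
    {"glucose=high", "bmi=normal", "diabetic"},
    {"glucose=normal", "bmi=high", "healthy"},
    {"glucose=normal", "bmi=normal", "healthy"},
    {"glucose=normal", "bmi=normal", "healthy"},
]

# Assumed threshold: an itemset must appear in at least 3 of the 6 records.
itemsets = dict(fp_growth([(r, 1) for r in records], min_count=3))

# Attributes that appear in a frequent itemset together with the diabetic label.
risk_factors = set()
for itemset in itemsets:
    if "diabetic" in itemset:
        risk_factors |= itemset - {"diabetic"}

print(sorted(risk_factors))
```

In a pipeline of the kind the abstract outlines, the selected attributes would then be passed as input features to a classifier such as Logistic Regression; that step is omitted here because it depends on the real dataset and on the chosen ML library.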

Published

20-10-2025

How to Cite

1.
FAKIR Y, KHALIL S, FAKIR W. Analysis of Data Mining Techniques and Algorithms on Diabetes Dataset. EAI Endorsed Trans Perv Health Tech [Internet]. 2025 Oct. 20 [cited 2025 Oct. 20];11. Available from: https://publications.eai.eu/index.php/phat/article/view/5317