Enhancing Diabetes Prediction with Data Preprocessing and various Machine Learning Algorithms


  • Gudluri Saranya, Vellore Institute of Technology University
  • Sagar Dhanraj Pande, Vellore Institute of Technology University




Accuracy, Diabetes, Machine Learning, Naive Bayes, Random Forest



Diabetes mellitus, usually called diabetes, is a serious public health issue that is spreading like an epidemic around the world. It is a condition that results in elevated glucose levels in the blood. India is often referred to as the 'Diabetes Capital of the World' because it accounts for 17% of the global diabetes population. An estimated 77 million Indians over the age of 18 have diabetes (i.e., one in eleven), and an estimated 25 million more are pre-diabetic. One way to control the growth of diabetes is to detect it at an early stage, which can lead to improved treatment. In this project, we therefore use several machine learning algorithms (SVM, Decision Tree Classifier, Random Forest, KNN, Linear Regression, Logistic Regression, and Naive Bayes) to predict diabetes. The Pima Indians Diabetes Database is used in this project. According to the experimental findings, Random Forest achieved an accuracy of 91.10%, the highest among the algorithms used.
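The comparison described in the abstract, where several classifiers are scored by accuracy on held-out data and the best one is reported, can be illustrated with a minimal sketch. This is not the authors' code: it uses a synthetic two-feature dataset as a stand-in for the Pima Indians Diabetes Database, implements only two of the listed algorithms (KNN and Gaussian Naive Bayes) from scratch, and assumes a 70/30 train/test split, k = 5, and a fixed random seed. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def knn_predict(X_train, y_train, X_test, k=5):
    """Label each test row by majority vote among its k nearest training rows."""
    preds = []
    for x in X_test:
        nearest = sorted(zip(X_train, y_train),
                         key=lambda pair: math.dist(pair[0], x))[:k]
        preds.append(Counter(label for _, label in nearest).most_common(1)[0][0])
    return preds

def gaussian_nb_predict(X_train, y_train, X_test):
    """Gaussian Naive Bayes: class prior plus per-feature normal log-likelihoods."""
    stats = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        stats[label] = (math.log(len(rows) / len(y_train)), means, variances)

    preds = []
    for x in X_test:
        def log_posterior(label):
            prior, means, variances = stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        preds.append(max(stats, key=log_posterior))
    return preds

# Synthetic stand-in data (assumption): two well-separated Gaussian classes.
rng = random.Random(0)
X = [[rng.gauss(5.0 * label, 1.0), rng.gauss(5.0 * label, 1.0)]
     for label in (0, 1) for _ in range(100)]
y = [label for label in (0, 1) for _ in range(100)]

# Shuffled 70/30 hold-out split (assumption; the abstract does not state one).
indices = list(range(len(X)))
rng.shuffle(indices)
split = int(0.7 * len(indices))
X_train = [X[i] for i in indices[:split]]
y_train = [y[i] for i in indices[:split]]
X_test = [X[i] for i in indices[split:]]
y_test = [y[i] for i in indices[split:]]

scores = {
    "KNN": accuracy(y_test, knn_predict(X_train, y_train, X_test)),
    "Naive Bayes": accuracy(y_test, gaussian_nb_predict(X_train, y_train, X_test)),
}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

On real data the same loop would cover all of the listed algorithms, typically through a library rather than hand-written models, and accuracy alone can mislead when the classes are imbalanced.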








How to Cite

G. Saranya and S. D. Pande, “Enhancing Diabetes Prediction with Data Preprocessing and various Machine Learning Algorithms”, EAI Endorsed Trans IoT, vol. 10, Mar. 2024.
