A Comparative Study on Machine Learning Classifiers for Cervical Cancer Prediction: A Predictive Analytic Approach
DOI:
https://doi.org/10.4108/eetiot.6223
Keywords:
Cervical cancer, Machine learning algorithms, Early diagnosis, Hyperparameter tuning, Support Vector Machine (SVM)
Abstract
INTRODUCTION: Cervical cancer is a significant global health concern, particularly in underdeveloped nations where preventive healthcare measures are limited. Early identification of the risks associated with cervical cancer is essential for both prevention and treatment.
OBJECTIVES: In recent years, machine-learning algorithms have gained popularity as potential techniques for determining a person's risk of developing cancer based on demographic and medical information. This study uses a dataset that contains patient demographics, clinical history, and results from diagnostic tests to examine how machine learning-based algorithms can be used to predict the risks of cervical cancer.
METHODS: Ten machine learning classifiers are used to build predictive models: Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), Logistic Regression (LR), Gradient Boosting (GB), Nearest Centroid (NC), Multilayer Perceptron (MP), and AdaBoost (AB).
RESULTS: The predictive capability of these models is assessed using accuracy, sensitivity, specificity, F1-score, precision, and area under the receiver operating characteristic curve (AUC-ROC). Before tuning, the decision tree has the highest accuracy, precision, and F1-score (98.91%, 97.81%, and 0.9889, respectively). Model performance was then optimized through hyperparameter tuning. After tuning, the Support Vector Machine (SVM) showed superior performance, with an accuracy of 99.64%, a precision of 99.26%, and an F1-score of 0.9963, indicating its potential for cervical cancer risk prediction. We also created a web application that uses a machine-learning model to estimate the risk of cervical cancer.
CONCLUSION: The findings of this study highlight the significance of SVM and demonstrate the potential of machine learning techniques to improve prediction accuracy and patient outcomes in cervical cancer screening.
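As an illustration of how the threshold-based metrics named in the RESULTS paragraph relate to one another, the following minimal sketch computes accuracy, sensitivity, specificity, precision, and F1-score from binary confusion-matrix counts. It is not the authors' code: the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the example counts are invented for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # true positives: at-risk patients predicted at risk
    fp: int  # false positives: not-at-risk patients predicted at risk
    tn: int  # true negatives: not-at-risk patients predicted not at risk
    fn: int  # false negatives: at-risk patients predicted not at risk


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(c):
    """Compute the standard threshold-based metrics as fractions in [0, 1]."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Invented counts for illustration only; these are NOT results from the study.
example = ConfusionCounts(tp=45, fp=5, tn=40, fn=10)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The function returns fractions; the abstract reports accuracy and precision as percentages and the F1-score as a fraction, so the first two would be multiplied by 100 for a like-for-like comparison. AUC-ROC is omitted here because it depends on ranked prediction scores rather than on a single confusion matrix.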
License
Copyright (c) 2024 EAI Endorsed Transactions on Internet of Things
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.