Predicting Breast Cancer with Ensemble Methods on Cloud
Keywords: Bagging, Boosting, Stacking, Random Forest, Ensemble methods
Many dangerous diseases, breast cancer among them, cause high mortality in women. When the disease is detected early, diagnosed correctly, and treated in time, the likelihood of illness and death is reduced. Previous work on disease prediction has mainly focused on building individual models, which do not yet achieve high accuracy or good generalization performance. In this paper, we combine such individual models into a single combined model that generalizes better than its components. Three ensemble techniques are used in the experiments and applied to the breast cancer prediction problem: Bagging, Boosting, and Stacking (the Stacking ensemble includes three models: Gradient Boost, Random Forest, and Logistic Regression). Experimental results on the Breast Cancer Wisconsin dataset show that the combined model built with these ensemble methods has higher predictive performance than the commonly used individual prediction models.
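The three ensemble techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn is available, uses scikit-learn's bundled Wisconsin (Diagnostic) data as a stand-in for the paper's dataset, and assumes Logistic Regression serves as the Stacking meta-learner over Gradient Boost and Random Forest base learners. The function name `evaluate_ensembles`, the train/test split, and all hyperparameters are illustrative choices, not values from the paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def evaluate_ensembles(test_size=0.25, seed=0):
    """Fit one bagging, one boosting, and one stacking model.

    Returns a dict mapping each technique name to its held-out accuracy.
    """
    # Assumption: scikit-learn's bundled Wisconsin (Diagnostic) data stands in
    # for the dataset used in the paper.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    models = {
        # Bagging: decision trees (the default base learner) fit on bootstrap
        # resamples, with their predictions aggregated.
        "bagging": BaggingClassifier(n_estimators=50, random_state=seed),
        # Boosting: trees added sequentially, each fit to the errors of the
        # ensemble built so far.
        "boosting": GradientBoostingClassifier(random_state=seed),
        # Stacking: cross-validated predictions of the base learners become
        # the input features of a meta-learner.
        "stacking": StackingClassifier(
            estimators=[
                ("gb", GradientBoostingClassifier(random_state=seed)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
            ],
            # Assumption: logistic regression acts as the meta-learner.
            final_estimator=LogisticRegression(max_iter=1000),
            cv=5,
        ),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for name, accuracy in evaluate_ensembles().items():
        print(f"{name}: {accuracy:.3f}")
```

Accuracy on a single split is only a rough indicator. A comparison against individual models, as described in the abstract, would fit those models on the same split and ideally use repeated cross-validation.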
Copyright (c) 2023 EAI Endorsed Transactions on Context-aware Systems and Applications
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.