Predicting Breast Cancer with Ensemble Methods on Cloud




Bagging, Boosting, Stacking, Random Forest, Ensemble methods


Many dangerous diseases with high mortality rates affect women, breast cancer among them. When the disease is detected early, diagnosed correctly, and treated promptly, the risk of severe illness and death is reduced. Previous disease prediction work has mainly focused on building individual models, which have not yet achieved high accuracy or strong generalization performance. In this paper, we combine individual models into a single ensemble model that generalizes better than its components. Three ensemble techniques are deployed and applied to the breast cancer prediction problem: Bagging, Boosting, and Stacking (the Stacking model comprises three models: Gradient Boost, Random Forest, and Logistic Regression). Experimental results on the Breast Cancer Wisconsin dataset show that the combined model achieves higher predictive performance than commonly used individual prediction models.
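As a rough illustration of the three techniques named in the abstract, the sketch below fits a Bagging, a Boosting, and a Stacking classifier with scikit-learn. It is not the authors' implementation. Several choices are assumptions made only for this example: the use of scikit-learn's bundled Wisconsin diagnostic data via `load_breast_cancer`, decision trees as the bagged base learner, Gradient Boosting and Random Forest as the stacked base learners with Logistic Regression as the meta-learner, the 75/25 split, and all hyperparameters.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Wisconsin diagnostic data stands in
# for the dataset used in the paper.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Bagging: many trees, each trained on a bootstrap sample, then voted.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=50, random_state=0
    ),
    # Boosting: trees added sequentially, each correcting earlier errors.
    "boosting": GradientBoostingClassifier(random_state=0),
    # Stacking: out-of-fold predictions of the base models feed a meta-learner.
    # Assumption: Logistic Regression plays the meta-learner role here.
    "stacking": StackingClassifier(
        estimators=[
            ("gb", GradientBoostingClassifier(random_state=0)),
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    ),
}

# Fit each ensemble and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

The accuracies printed by this sketch depend on the split and hyperparameters chosen here and should not be read as the results reported in the paper.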






How to Cite

Pham A, Tran T, Tran P, Huynh H. Predicting Breast Cancer with Ensemble Methods on Cloud. EAI Endorsed Trans Context Aware Syst App [Internet]. 2023 Mar. 29 [cited 2023 Jun. 2];9(1):e1. Available from: