Ensemble Models for the Prediction of Sickle Cell Disease from Erythrocyte Smears
Keywords: Sickle Cell Disease, Erythrocytes, Machine Learning Algorithms, Ensemble Models, Health Information System
INTRODUCTION: Human blood is a tissue containing Red Blood Cells (RBCs), which are circular in shape and act as oxygen carriers; their shape is frequently deformed by blood diseases inherited from parents. These hereditary blood diseases, which involve abnormal haemoglobin (Hb) or anemia, are major public health issues. Sickle Cell Disease (SCD) is one of the most common non-communicable genetic disorders: the patient inherits mutant Hb genes, which alter the hematological condition of the RBCs.
OBJECTIVES: Manual evaluation, prediction, and diagnosis of SCD are time-consuming and, if not done properly, can lead to wrong predictions and diagnoses. Machine Learning (ML), a branch of Artificial Intelligence (AI) that focuses on building systems whose performance improves with the data they consume, is well suited to this task. Despite previous research efforts using single ML algorithms, existing systems still suffer from high rates of false predictions.
METHODS: This paper therefore performs a comparative analysis of individual ML algorithms and their ensemble models for predicting SCD (elongated cell shapes) in erythrocytes. Three ML algorithms were selected, ensemble models were built from them, and the performance of each model was evaluated using accuracy, sensitivity, the Area Under the Receiver Operating Characteristic Curve (ROC-AUC), and the F1 score. The results were compared with the existing literature to identify the model(s) with the best predictive performance.
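The four evaluation metrics named above can all be derived from a binary confusion matrix. The sketch below is a minimal, dependency-free illustration, assuming binary labels (1 = sickle/elongated cell, 0 = normal) and one probability-like score per cell; the function names and the toy labels are hypothetical and are not taken from the paper, which does not publish its code. ROC-AUC is computed through its rank interpretation: the probability that a randomly chosen positive cell is scored higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, with 1 = sickle (elongated) cell."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred) -> float:
    # Accuracy = (TP + TN) / all samples
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true, y_pred) -> float:
    # Sensitivity (recall, true positive rate) = TP / (TP + FN)
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(y_true, y_pred) -> float:
    # F1 = 2 * precision * recall / (precision + recall) = 2TP / (2TP + FP + FN)
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores) -> float:
    # Fraction of (positive, negative) pairs where the positive is ranked higher; ties count 0.5.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy example: 8 cells and made-up model scores.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.55, 0.6, 0.1, 0.7]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # classify at a 0.5 threshold
    print("accuracy   ", round(accuracy(y_true, y_pred), 3))     # 0.625
    print("sensitivity", round(sensitivity(y_true, y_pred), 3))  # 0.75
    print("F1         ", round(f1_score(y_true, y_pred), 3))     # 0.667
    print("ROC-AUC    ", round(roc_auc(y_true, scores), 3))      # 0.875
```

Note that accuracy, sensitivity, and F1 depend on the chosen threshold, whereas ROC-AUC uses the raw scores and summarizes ranking quality across all thresholds.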
RESULTS: The analysis was carried out in the Python programming language. The individual algorithms achieved accuracies of 87% (MLR), 90% (Extreme Gradient Boosting, XGBoost), and 93% (Random Forest, RF), while the hybrid RF-MLR and RF-XGBoost models achieved 92% and 99%, respectively. At 99% accuracy, the RF-XGBoost hybrid outperformed all three individual algorithms and the RF-MLR hybrid.
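The abstract does not say how the hybrid models were constructed. One common way to combine classifiers is soft voting, which averages the members' predicted class probabilities; the sketch below assumes that scheme and is illustrative only. The data are synthetic (scikit-learn's `make_classification` stands in for features extracted from erythrocyte smears), `LogisticRegression` is an assumed reading of "MLR", and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example needs no extra package. The helper names (`make_rf`, `make_boosting`, `make_mlr`, `run_comparison`) are hypothetical, and the numbers it prints will not reproduce the accuracies reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 42


def make_rf():
    return RandomForestClassifier(n_estimators=100, random_state=SEED)


def make_boosting():
    # Stand-in for XGBoost (assumption); xgboost.XGBClassifier could be used here instead.
    return GradientBoostingClassifier(random_state=SEED)


def make_mlr():
    # Assumption: "MLR" is read as a (multinomial) logistic regression.
    return LogisticRegression(max_iter=1000)


def build_models():
    """Three individual learners plus two soft-voting hybrids (assumed combination scheme)."""
    return {
        "MLR": make_mlr(),
        "Boosting": make_boosting(),
        "RF": make_rf(),
        "RF-MLR": VotingClassifier([("rf", make_rf()), ("mlr", make_mlr())], voting="soft"),
        "RF-Boosting": VotingClassifier([("rf", make_rf()), ("gb", make_boosting())], voting="soft"),
    }


def run_comparison():
    # Synthetic binary data standing in for per-cell features (1 = sickle, 0 = normal).
    X, y = make_classification(n_samples=600, n_features=12, n_informative=6, random_state=SEED)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=SEED
    )
    results = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "sensitivity": recall_score(y_test, pred),  # recall of the positive class
            "f1": f1_score(y_test, pred),
            "roc_auc": roc_auc_score(y_test, proba),
        }
    return results


if __name__ == "__main__":
    for name, metrics in run_comparison().items():
        line = ", ".join(f"{k}={v:.3f}" for k, v in metrics.items())
        print(f"{name:<12} {line}")
```

Run as a script, this prints one line of metrics per model. Soft voting was chosen here because every member exposes `predict_proba`, which also gives the ROC-AUC a continuous score to rank; hard (majority) voting or stacking are equally plausible readings of "hybridized" and may behave differently.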
CONCLUSION: The study therefore concluded that hybridizing ML algorithms on medical datasets increased prediction performance, as it addressed the challenges of high variance, low accuracy, feature noise, and bias in medical datasets. The paper recommends that ensemble classifiers be considered for improving sickle cell disease prediction.
Copyright (c) 2023 Oluwafisayo Babatope Ayoade, Tinuke Omolewa Oladele, Agbotiname Lucky Imoize, Jerome Adetoye Adeloye, Joseph Bambidele Awotunde, Segun Omotayo Olorunyomi, Oulsola Theophilius Faboya, Ayorinde Oladele Idowu
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, provided the original work is properly cited.