Life Expectancy Prediction using Recursive Partitioning and Bagging Algorithms

Authors

Alvi MB, Alvi M, Yasir Hussain, Rehman W, Kavita Tabassum, Shahnawaz Farhan, Fatima Noor

DOI:

https://doi.org/10.4108/eetsis.8959

Keywords:

Decision Tree, Life Expectancy, Random Forest algorithm, Public Health Policy, Social Sustainability

Abstract

Life expectancy is a crucial indicator of a population’s health and well-being. Recent research has highlighted the importance of socioeconomic and health factors in determining lifespan, including Gross Domestic Product (GDP), healthcare expenditure, mortality rates, and education level. This study applies recursive partitioning (decision trees) and bagging (random forest) techniques to the Life Expectancy dataset from the World Health Organization (WHO) to evaluate the effectiveness of predictive models. The dataset was prepared by encoding categorical features, scaling and normalizing the features, and handling outliers; missing values were filled by mean imputation. Optimized models based on the recursive partitioning and bagging algorithms achieved performance efficiencies of 92% and 97%, respectively. The bagging-based model produced a mean squared error of 1.17, a mean absolute error of 2.0, and an R² score of 97%. Other key findings include the importance of features such as HIV/AIDS prevalence, adult mortality, and health resource income in predicting life expectancy. This research shows how feature engineering and data preprocessing affect data quality and the precision of predictive models, offering insights for public health policymaking and directions for future research.
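The pipeline summarized in the abstract (mean imputation, a recursively partitioned regression tree, a bagged ensemble of such trees, and evaluation by MSE, MAE, and R²) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic rather than the WHO dataset, all function names (`mean_impute`, `build_tree`, `build_bagged`, and so on) are hypothetical, and the ensemble is plain bagging, whereas a random forest additionally samples a random subset of features at each split.

```python
import random
from statistics import fmean

def mean_impute(column):
    """Replace missing entries (None) with the mean of the observed ones."""
    fill = fmean([v for v in column if v is not None])
    return [fill if v is None else v for v in column]

def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, depth=0, max_depth=4, min_leaf=3):
    """Recursive partitioning: pick the split with the lowest total SSE."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return fmean(y)  # leaf node: predict the mean target
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, thr, left, right)
    if best is None:
        return fmean(y)  # no admissible split: stay a leaf
    _, j, thr, left, right = best
    children = [
        build_tree([X[i] for i in part], [y[i] for i in part],
                   depth + 1, max_depth, min_leaf)
        for part in (left, right)
    ]
    return (j, thr, children[0], children[1])

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's mean."""
    while isinstance(node, tuple):
        j, thr, left, right = node
        node = left if row[j] <= thr else right
    return node

def build_bagged(X, y, n_trees=15, seed=0):
    """Bagging: fit one tree per bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx]))
    return trees

def predict_bagged(trees, row):
    """Average the predictions of all trees in the ensemble."""
    return fmean([predict_tree(t, row) for t in trees])

def mse(y_true, y_pred):
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])

def mae(y_true, y_pred):
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])

def r2(y_true, y_pred):
    y_bar = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (an assumption, NOT the WHO dataset).
rng = random.Random(42)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(120)]
y = [50 + 2 * a - 1.5 * b + rng.gauss(0, 1) for a, b in X]

# Simulate a few missing values in the first feature, then mean-impute them.
missing_rows = {5, 25, 45, 65}
col0 = mean_impute([None if i in missing_rows else row[0]
                    for i, row in enumerate(X)])
X = [[c, row[1]] for c, row in zip(col0, X)]

X_train, y_train = X[:90], y[:90]
X_test, y_test = X[90:], y[90:]

tree = build_tree(X_train, y_train)
trees = build_bagged(X_train, y_train)
tree_pred = [predict_tree(tree, row) for row in X_test]
bag_pred = [predict_bagged(trees, row) for row in X_test]

for name, pred in (("single tree", tree_pred), ("bagged trees", bag_pred)):
    print(name, "MSE:", round(mse(y_test, pred), 3),
          "MAE:", round(mae(y_test, pred), 3),
          "R2:", round(r2(y_test, pred), 3))
```

Averaging over bootstrap-trained trees mainly reduces the variance of a single deep tree, which is the usual motivation for bagging in a regression setting like this one. The exhaustive split search keeps the sketch short but is slow on large tables; a library implementation would be the practical choice.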

Published

13-10-2025

How to Cite

Alvi MB, Alvi M, Yasir Hussain, Rehman W, Kavita Tabassum, Shahnawaz Farhan, Fatima Noor. Life Expectancy Prediction using Recursive Partitioning and Bagging Algorithms. EAI Endorsed Scal Inf Syst [Internet]. 2025 Oct. 13 [cited 2025 Oct. 13];12(5). Available from: https://publications.eai.eu/index.php/sis/article/view/8959