A Comprehensive Feature Engineering Approach for Breast Cancer Dataset


  • Shambhvi Sharma DPS Mathura Road
  • Monica Sahni DPS Mathura Road




Breast Cancer, Univariate Analysis, Bivariate Analysis, Heat Map, Correlation


Breast cancer remains a major healthcare challenge and the leading cause of cancer-related death among women worldwide. This study investigates how statistical analysis and feature engineering can be applied to breast cancer data. Through extensive statistical analysis of a comprehensive dataset, it aims to uncover insights that add to the medical community's existing knowledge. Rigorous feature selection and extraction are used to improve understanding of the disease, and the study shows how univariate and bivariate analysis can be incorporated to improve diagnostic accuracy. Combining these disciplines shows considerable promise for breast cancer detection and prediction and supports collaborative efforts to address this widespread malignancy.
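The univariate and bivariate steps named in the abstract can be illustrated with a minimal sketch. This is not the study's code: the feature names and values below are hypothetical toy data, and the helper functions are illustrative names chosen here. It summarises each feature on its own, then computes the pairwise Pearson correlations that a heat map would display.

```python
import math
import statistics

# Hypothetical toy measurements (assumed for illustration, not the study's data).
features = {
    "mean_radius": [12.0, 14.0, 16.0, 18.0, 20.0],
    "mean_area": [450.0, 600.0, 800.0, 1000.0, 1250.0],
    "smoothness": [0.11, 0.09, 0.10, 0.08, 0.10],
}


def univariate_summary(values):
    """Univariate analysis: describe one feature by its centre and spread."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(table):
    """Bivariate analysis: pairwise correlations, the values behind a heat map."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}


summaries = {name: univariate_summary(vals) for name, vals in features.items()}
corr = correlation_matrix(features)

# Print the correlation matrix row by row.
for a, row in corr.items():
    print(a, " ".join(f"{row[b]:+.2f}" for b in row))
```

In a feature engineering workflow, strongly correlated pairs in such a matrix are candidates for dropping one of the two features, since they carry largely redundant information.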








How to Cite

Sharma S, Sahni M. A Comprehensive Feature Engineering Approach for Breast Cancer Dataset. EAI Endorsed Trans Perv Health Tech [Internet]. 2024 Mar. 7 [cited 2024 Apr. 25];10. Available from: https://publications.eai.eu/index.php/phat/article/view/5327