Modified Filter Based Feature Selection Technique for Dermatology Dataset Using Beetle Swarm Optimization
DOI:
https://doi.org/10.4108/eetsis.vi.1998
Keywords:
Skin cancer, Feature selection algorithm, LSI, CFS, Beetle swarm optimization, Classification performance
Abstract
INTRODUCTION: Skin cancer is an emerging disease worldwide that causes high mortality. Computer-aided systems are designed to detect it at an early stage. The most crucial step in such systems is feature selection, because it strongly affects classification performance. Various feature selection algorithms have been designed to find the relevant features from a set of attributes, yet selecting appropriate features from disease-prediction datasets remains challenging.
OBJECTIVES: To design a hybrid feature selection algorithm that selects a relevant feature subspace from dermatology datasets.
METHODS: The hybrid feature selection algorithm integrates Latent Semantic Indexing (LSI) with Correlation-based Feature Selection (CFS). Beetle swarm optimization is used to obtain an optimal feature subset.
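The abstract does not spell out how CFS scores a candidate subset. As a rough illustration, the sketch below computes the commonly used CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. The use of absolute Pearson correlation and the names `pearson` and `cfs_merit` are assumptions made for this example, not details taken from the paper. The LSI step and the beetle swarm search are not shown.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(columns, target):
    """CFS merit of a feature subset (hypothetical helper for illustration).

    columns: list of feature columns, each a list of numbers
    target:  list of numeric class labels
    """
    k = len(columns)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(c, target)) for c in columns) / k
    # Mean absolute feature-feature correlation over all pairs.
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Toy data: f1 matches the labels exactly, f2 is uncorrelated with them.
target = [0, 1, 0, 1]
f1 = [0, 1, 0, 1]
f2 = [0, 0, 1, 1]
merit_single = cfs_merit([f1], target)
merit_pair = cfs_merit([f1, f2], target)
```

On this toy data, adding the uninformative feature `f2` lowers the merit, which is the behaviour a subset search such as beetle swarm optimization could exploit as a fitness signal.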
RESULTS: Statistical metrics such as accuracy, specificity, recall, F1 score and the Matthews correlation coefficient (MCC) are calculated.
CONCLUSION: The accuracy and sensitivity obtained are 95% and 92%, respectively.
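For reference, the metrics named above can be derived from confusion-matrix counts. The sketch below assumes a binary (one class versus the rest) setting. The function name `binary_metrics` and the example counts are hypothetical and are not the paper's results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts for illustration only.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

For a multi-class problem, these counts would typically be computed per class and then averaged. How the paper aggregates them is not stated in the abstract.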
Dataset Link: https://archive.ics.uci.edu/ml/datasets/dermatology
License
Copyright (c) 2022 EAI Endorsed Transactions on Scalable Information Systems
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.