Detection of Lung and Colon Cancer using Average and Weighted Average Ensemble Models



Introduction
Cancer is a group of diseases in which random changes in cell structure lead to the development of abnormal cells. Normally, cells that age or accumulate damage die and are replaced by new ones. In cancer, the new cells divide uncontrollably as they form and can spread within the organs. The accuracy of cancer predictions largely relies on clinical data, including the cancer type, molecular characteristics, tumor grade, and other relevant information. In recent times, an increasing variety of data sources has become accessible, enabling more comprehensive assessment of the disease's status and progression. Diagnostic methods such as MRI scans, biopsies, CT scans, PET scans, and ultrasonography are widely used for early cancer identification, observation, and follow-up after treatment [1][2][3]. The researchers noted that a diagnosis of synchronous colon cancer in lung cancer patients is quite rare, occurring in only 0.55% of the cases studied. However, they emphasized the importance of considering the possibility of synchronous colon cancer in lung cancer patients, particularly those with advanced disease. Detecting and diagnosing cancer requires considerable skill and time, especially when making the final decision [4]. In such a scenario, machine learning and deep learning approaches play a major role in identifying cancer cells more efficiently and help the specialist make decisions faster.
Furthermore, machine learning is used to examine a patient's symptoms and history in order to prescribe drugs. It can also be used to analyze the side effects of drugs and to recommend the best treatment path customized for each patient [5]. By using machine learning and deep learning approaches in cancer detection, pathologists can reduce the need for expensive screening tests, as the algorithms can analyze images and make accurate diagnoses. Cancer detection can thus become affordable for individuals with limited financial resources.
Deep learning, a sub-field of machine learning, is more powerful than conventional machine learning at analyzing images and extracting features from raw images and videos. This has led to strong growth of AI in the medical domain. With the aid of deep learning algorithms and radiological imaging such as X-rays, CT scans, and MRI images, the medical profession has been able to combat many pandemics; the COVID-19 pandemic is the best illustration.
In this study, we use pre-trained models, an approach known as transfer learning (TL), to classify cancer tissues. Transfer learning is the process of applying knowledge gained in one domain or task to another, similar domain or task. It is mainly used when data is too scarce to build a model or when the computation cost and training time need to be reduced. TL has gained considerable importance and influence within the field of machine learning due to its applicability across a wide range of applications [7].
In this study, three different pre-trained models are evaluated, and their predictions are combined using ensemble techniques. With transfer learning, it is not necessary to train the model from scratch. Instead, the weights learned during training on the source task are reused to classify the target task. The pre-trained models are downloaded from various libraries along with their weights. The lower layers are frozen, so no training happens in them. To capture the higher-level features of the dataset, a few top layers are added. Training time and resources are saved by training only the newly added top layers. This approach is particularly valuable when the target task has limited data available or when it is related to but distinct from the source task.

Literature Review
Takoglu et al. [8] proposed a machine-learning ensemble model to identify cancer in the colon and lung. To extract the advanced features of the dataset, a hybrid ensemble model was used along with efficient filter techniques. The accuracy of the model was 99.05%.
Mehmood et al. [9] used a pre-trained neural network model (AlexNet) with fine-tuning to classify the LC25000 dataset. The initial classification results showed promising accuracy for the majority of image classes; nevertheless, one class reached an overall accuracy of only 89%, showing that there was still room for improvement. To improve the accuracy, image-enhancement techniques were applied to the underperforming classes instead of the full dataset. This was accomplished with a simple and effective contrast enhancement approach.
Chehade et al. [10] used different machine-learning approaches to identify the types of cancer in colon and lung tissues in the LC25000 dataset of histopathological images. The authors used six models to classify the images: RF, XGBoost, MLP, LDA, LightGBM, and SVM. The XGBoost model performed best among them, with an accuracy of 99% and an F1 score of 98.8%.
Mohamed et al. [11] used different pre-trained models such as SqueezeNet, ResNet-50, AlexNet, and GoogleNet to extract deep features from the colon and lung cancer dataset. The extracted features were too numerous for model training, so the authors employed a metaheuristic approach to reduce their number. The grasshopper optimization technique was used to choose the most important attributes. Finally, the dataset was classified using a machine-learning approach such as a decision tree or a support vector machine. The lung and colon cancer datasets were classified with 99.12% accuracy using the support vector approach.
Raghu et al. [17] analysed the properties of transfer learning for medical imaging. A performance evaluation was conducted on two extensive medical imaging tasks. The results indicated that transfer learning does not significantly improve performance, and that simple, lightweight models can achieve results comparable to complex ImageNet architectures. Further exploration of the learned representations and features indicates that some differences in transfer learning performance can be attributed to the over-parametrization of standard models rather than to sophisticated reuse of features. The study thus sheds light on the critical factors that influence the efficacy of transfer learning in medical imaging, which can contribute to the refinement of future approaches in the field.
Deep convolutional neural networks (DCNNs) have shown tremendous promise for computer-aided diagnostic (CAD) systems. A key benefit of CNNs is their capacity to extract features directly from images, which eliminates the need for conventional feature extraction techniques. This property enables CNNs to automatically learn key patterns and representations from input data, making them particularly useful for medical image analysis tasks and improving the performance and accuracy of CAD systems.
Albashish et al. [18] proposed two ensemble learning techniques, product rule and majority voting. These strategies are intended to categorize colon cancer histopathology images into distinct classes. The ensembles are produced by modifying pre-trained CNN models such as DenseNet121, MobileNetV2, InceptionV3, and VGG16, which serve as the foundation for the ensemble learning approaches.
Lava Th. Omar et al. [20] developed a collaborative model combining three pre-trained models, namely Inception V3, VGG16, and MobileNet V1, and obtained an accuracy of 99.44%.

Dataset
We imported the dataset from Kaggle, which hosts the image database of colon and lung cancer (https://www.kaggle.com/datasets/andrewmvd/lung-and-colon-cancer-histopathological-images). As shown in Figure 1, the images are divided into five categories: 'colon_aca', 'colon_n', 'lung_aca', 'lung_n', and 'lung_scc'. The samples are distributed evenly across the classes, and the dataset is split into training and testing sets in the proportions of 80% and 20%, respectively.
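As an illustration of the 80%/20% split described above, the following is a minimal sketch rather than the procedure actually used in this study. It assumes a stratified split (each class contributes the same train/test ratio), which the text does not state explicitly; the file names, the `stratified_split` helper, and the seed are hypothetical.

```python
import random
from collections import defaultdict

CLASSES = ["colon_aca", "colon_n", "lung_aca", "lung_n", "lung_scc"]


def stratified_split(samples, train_fraction=0.8, seed=42):
    """Split (path, label) pairs so every class keeps the same train/test ratio."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        cut = int(round(len(paths) * train_fraction))
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Hypothetical file names; 100 images per class keeps the demonstration small.
samples = [(f"{c}/img_{i:04d}.jpeg", c) for c in CLASSES for i in range(100)]
train_set, test_set = stratified_split(samples)
```

Because the classes are evenly distributed, a stratified split keeps the test set balanced as well, so overall accuracy is not dominated by any single class.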

Methods and Techniques
Developing a machine learning or deep learning model in the medical field is difficult because training data is scarce. This scarcity can hinder the creation of highly accurate and reliable models. As a result, researchers often struggle to achieve generalization and encounter overfitting when dealing with small datasets. Transfer learning is a beneficial approach in scenarios where the number of samples available for building machine learning or deep learning models is limited. It can markedly enhance the performance of models on smaller medical datasets by utilizing pre-trained models or features learned from larger datasets.
The ensemble model in this study is built using three base learners: Inception V3, ResNet 50, and DenseNet 201. Each of these pre-trained models is adapted by adding new top layers that extract features specific to the dataset. Only the newly added layers are trained, to recognize the more sophisticated, dataset-specific features; the lower layers are frozen and require no training, as they capture only basic features. The performance metrics for the three pre-trained models are displayed in Tables 2 and 3.
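The frozen-backbone setup described above could be sketched with the Keras applications API as follows. This is an assumption-laden illustration, not the configuration used in the study: the input size, the head layers (a 256-unit dense layer and dropout of 0.3), the optimizer, and the helper name `build_base_learner` are all hypothetical choices.

```python
NUM_CLASSES = 5
INPUT_SHAPE = (224, 224, 3)  # assumed resize; not stated in the paper
BASE_LEARNERS = ("InceptionV3", "ResNet50", "DenseNet201")


def build_base_learner(name, weights="imagenet"):
    """Return a frozen pre-trained backbone with a small trainable head."""
    # Imported lazily so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    backbone_cls = getattr(tf.keras.applications, name)
    backbone = backbone_cls(
        include_top=False,        # drop the original ImageNet classifier
        weights=weights,
        input_shape=INPUT_SHAPE,
        pooling="avg",            # global average pooling gives a feature vector
    )
    backbone.trainable = False    # lower layers are frozen

    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.Dense(256, activation="relu"),  # hypothetical head size
        tf.keras.layers.Dropout(0.3),                   # hypothetical rate
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_base_learner(name)` for each entry of `BASE_LEARNERS` would give three models in which only the head weights are updated during training. Each backbone expects its own input preprocessing (for example `tf.keras.applications.resnet50.preprocess_input`), which is omitted here for brevity.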

Proposed Ensemble Technique
The predictions of several deep learning models can be combined in ensemble models (EM) to increase performance. When trained on smaller datasets, individual deep-learning models may exhibit considerable bias and variance. Such models may show high training accuracy but lower test accuracy. To counter this tendency, the predictions from several models are integrated to obtain the final prediction. The models used to build the ensemble are called base learners. In this paper, an ensemble model was constructed from the pre-trained models ResNet50, InceptionV3, and DenseNet201. Two ensemble techniques are used in this study: the Average Ensemble (AE) and the Weighted-Average Ensemble (WAE). In the AE, the predictions of the separate models are combined by simply taking the average over all base models. In the WAE, the best-performing base learner is given a greater weight than the other base learners. Table 4 shows the performance metrics for the AE and WAE models. The WAE model attained an accuracy of 99.80% by combining the knowledge of the individual models, as shown in Table 4. This highlights the effectiveness of transfer learning in improving model performance, particularly when working with imbalanced datasets or small sample sizes per class.
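The two combination rules can be sketched as follows. For M base learners with class-probability vectors p_1, ..., p_M, the AE output is (p_1 + ... + p_M) / M, and the WAE output is (w_1 p_1 + ... + w_M p_M) / (w_1 + ... + w_M). The probabilities and weights below are hypothetical values chosen for illustration; the weights actually used in this study are not stated here.

```python
CLASSES = ["colon_aca", "colon_n", "lung_aca", "lung_n", "lung_scc"]


def average_ensemble(prob_sets):
    """Unweighted mean of the per-model class-probability vectors."""
    n_models = len(prob_sets)
    return [sum(column) / n_models for column in zip(*prob_sets)]


def weighted_average_ensemble(prob_sets, weights):
    """Weighted mean; weights are normalised so the output still sums to 1."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, column)) / total
        for column in zip(*prob_sets)
    ]


def predict_class(probs, class_names=CLASSES):
    """Name of the class with the highest combined probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best]


# Hypothetical softmax outputs of three base learners for one image.
model_probs = [
    [0.05, 0.05, 0.50, 0.05, 0.35],  # base learner 1
    [0.05, 0.05, 0.30, 0.05, 0.55],  # base learner 2
    [0.05, 0.05, 0.35, 0.05, 0.50],  # base learner 3
]
hypothetical_weights = [0.70, 0.15, 0.15]  # base learner 1 assumed strongest

ae_probs = average_ensemble(model_probs)
wae_probs = weighted_average_ensemble(model_probs, hypothetical_weights)
ae_label = predict_class(ae_probs)
wae_label = predict_class(wae_probs)
```

In this made-up case the AE follows the two agreeing models, while the WAE follows the more heavily weighted model, which shows how the weighting can change the final decision when the base learners disagree.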

Results
The experiment was run in Google Colab Pro using Python 3 and a Google Compute Engine backend (A100 GPU with 40 GB of GPU RAM). The WAE model summary is shown in Figure 3. After five training epochs,

Conclusion
In this research, we proposed an ensemble approach for identifying and classifying lung and colon cancer images. The ensemble model is constructed from DenseNet201, InceptionV3, and ResNet50. AE and WAE are used to combine the predictions of the base learners. AE takes the plain average of the base learners' predictions to determine the final prediction, whereas in WAE each base learner is assigned a weight based on its performance, so that base learners with higher accuracy are given more weight than the others. The experimental findings reveal that the WAE outperforms the AE. The AE achieved 98.66% accuracy, while the WAE achieved 99.80%, which is higher than the accuracies reported for the existing models reviewed above.

EAI Endorsed Transactions on Pervasive Health and Technology | Volume 10 | 2024 | H. Gunasekaran et al.
Mengash et al. [12] used a CLAHE-based preprocessing method to improve the contrast of the images. The authors used a pre-trained model, MobileNet, to extract features and a Deep Belief Network (DBN) for classification, which resulted in an accuracy of 99.21%.
Abdullah et al. [13] used a combination of machine learning and deep learning techniques to classify the lung and colon cancer dataset. The authors used KNN, CNN, and SVM in the WEKA tool for classification. SVM achieved 95.56%, CNN 92.11%, and KNN 88.40%.
Hemalatha et al. [14] used an ensemble technique to detect gastrointestinal disorders. The authors used pre-trained models to create an ensemble model to classify the KVASIR dataset and concluded that the weighted average ensemble model outperforms the average ensemble model.
Ashwin Shanbhag et al. [15] used CT images to classify carcinoma as benign or malignant. The authors used five machine learning models: LR, MLP, SVM, KNN, and decision tree. Pre-processing techniques such as segmentation were applied before feature extraction, and classification was performed last. The authors also created an ensemble model from the five machine-learning models and achieved an accuracy of 85%.
Hee E. Kim et al. [16] conducted a wide-ranging survey of transfer learning models. They analysed many papers from PubMed and Web of Science on transfer learning techniques and concluded that pre-trained models such as ResNet and Inception can be used as feature extractors to save computation time and cost. They also compiled a list of publicly available medical datasets along with their URLs, facilitating accessibility and reproducibility of the research.

Figure 2. Architecture of Proposed Ensemble Model

Table 2. Accuracy of Base Learners and Ensemble Models

Table 3. Execution Measurement of Different Pre-trained Methods

Table 4. Accuracy of Ensemble Models

Table 5 displays the performance metrics for the AE and WAE models. The outcome of the study shows that the ensemble models (EM) exceed the accuracy of the individual pre-trained models. The accuracy of the AE is 98.66%, whereas that of the WAE is 99.80%.
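Performance metrics of this kind are commonly derived from a confusion matrix. The sketch below assumes the usual definitions of accuracy, precision, recall, and F1 score; the confusion matrix is made up for illustration and does not reproduce the results of this study, and the helper name `classification_metrics` is hypothetical.

```python
CLASSES = ["colon_aca", "colon_n", "lung_aca", "lung_n", "lung_scc"]


def classification_metrics(confusion, class_names):
    """confusion[i][j] counts samples of true class i predicted as class j."""
    n = len(class_names)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    report = {}
    for i, name in enumerate(class_names):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                        # missed samples of class i
        fp = sum(confusion[r][i] for r in range(n)) - tp   # others predicted as class i
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


# Made-up counts for 100 test images per class; not the paper's results.
example_confusion = [
    [98, 2, 0, 0, 0],
    [1, 99, 0, 0, 0],
    [0, 0, 95, 0, 5],
    [0, 0, 0, 100, 0],
    [0, 0, 4, 0, 96],
]
accuracy, report = classification_metrics(example_confusion, CLASSES)
```

Reporting per-class precision and recall alongside accuracy makes it visible when errors concentrate in a particular pair of classes, as between 'lung_aca' and 'lung_scc' in this made-up matrix.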

Table 5 .
Performance Metrics of Ensemble Model