DeepCerviCancer - Deep Learning-Based Cervical Image Classification using Colposcopy and Cytology Images



Introduction
Cervical cancer is a malignancy that originates in the cervix, the lower portion of the uterus that connects to the vagina. Human papillomavirus is the primary causative agent [1,2,3]. The outer surface of the cervix is lined with squamous (thin, flat) cells, which can transform into squamous cell carcinoma, one of the two main types of cervical cancer. Adenocarcinoma, the other type, develops in the glandular (column-shaped) cells lining the cervical canal [1].
Among female malignancies, cervical cancer has a high incidence rate, and its prevalence among women is a stark illustration of the effects of global health disparity. Cervical cancer affects over 5.6 billion women worldwide every year and it has a 90% fatality rate. The research projects an annual increase of over 300,000 deaths and over 500,000 new cases, and indicates that 85% of those deaths occur in developing countries [4]. More than 700 women die every day from cervical cancer, and the annual death toll is expected to rise to 400,000 by the year 2030. More than 266,000 women lost their lives to cervical cancer in 2012. More than 311,000 women aged 20-39 lost their lives to cervical cancer last year. Each year, over 25,000 women in Europe die of cervical cancer, and over 60,000 new cases are identified [5]. Every year, 120,000 Indian women are diagnosed with cervical cancer, which accounts for 15.2% of all cancer-related deaths in India [6,7].
In most instances, cancer of the cervix is detected too late because of a lack of awareness. Cervical abnormalities, which may eventually lead to cervical cancer, progress very gradually, and the disease is difficult to detect early because there are few warning signs. Patients in developing countries are less likely to get regular tests because they are less aware of the need for them. In contrast, high-income countries with well-established cervical screening programs have the lowest death rates [4,8,9]. With sufficient quality and quantity of screenings, the incidence of cervical cancer might be lowered by as much as 90% [10].
Cervical cancer screening and diagnosis are hindered by inadequate healthcare infrastructure in low- and middle-income nations. Cervical cancer is curable if recognized early; however, detection rates are low because of barriers including high testing costs and inadequate accessibility. The Pap smear test is presently the gold standard for detecting precancerous cells. Although the test is important, it takes a long time to obtain results, because pathologists must analyze hundreds of cells on a single slide and label them all correctly [11]. The Pap smear test has been in use since the 1940s, yet it still has serious drawbacks. Problems at any step of the examination process, from cytological sample collection through lesion analysis, might introduce errors, increasing the likelihood of both false positives (in which a lesion is incorrectly identified) and false negatives (in which an existing lesion is not discovered). The rate of mistakes made during manual microscopic inspection of smears might reach 0.62 [12][13][14][15].
Screening techniques make cervical cancer cells detectable in their early stages. The main tests for early detection are as follows. Pap smear: cells from the cervix are examined under a microscope. The doctor inserts a brush into the cervix to collect cells, which are then examined in the laboratory for abnormalities. This test is very accurate and reliably differentiates between normal and abnormal cervical cells. Human papillomavirus (HPV) test: cells are sampled from the cervix and tested for HPV infection. Women over the age of 30 are encouraged to take this test, whereas younger women should get a Pap smear instead. Diagnosis is the process of identifying the ailment that is causing the symptoms a patient is experiencing.
A colposcope (a magnifying instrument) helps a physician examine the cervix for cancerous cells. During a punch biopsy, a diagnostic clinical technique, a tiny sample of tissue is removed and examined under a microscope. Using this technique, malignant cells in the region under study may be inspected and analyzed. In endocervical curettage (ECC), a curette, shaped like a spoon, is used to scrape the mucosa of the cervix. The following tests may be performed if the findings of the punch biopsy or ECC require further investigation. Electrical wire loop: a small tissue sample is removed using a thin, flexible, low-voltage electrified wire. This is often carried out in the office under local anesthetic. Cone biopsy: a trained healthcare professional removes cervical tissue from the inner layers of the cervix for analysis in the laboratory [16].
Rapid advancements in artificial intelligence and machine learning, as well as digital platforms, have led to unprecedented growth in the healthcare sector during the last decade [17]. Deep learning (DL) is one of the most promising disciplines because of the many possibilities it presents in the medical field. DL has the potential to improve healthcare by allowing doctors to make more precise and timely diagnoses, which may then inform more targeted treatment plans for each patient. DL has also demonstrated significant progress in medical image analysis for the detection and diagnosis of many forms of cancer, allowing for speedier and more effective treatment of patients. By reducing dependence on physicians while still delivering timely and accurate findings, this technology improves patient care.
Multiple advances, including AI, have been put into practice out of a desire to improve the efficiency and effectiveness of clinical care. The need to optimize and streamline clinical procedures has grown in importance in light of rising demand for healthcare services and the massive amounts of data created every day from several concurrent sources. The strength of artificial intelligence lies in its ability to spot intricate patterns in images, which presents a unique chance to turn the previously qualitative and subjective process of image interpretation into a measurable and easily repeatable one. Artificial intelligence has the potential to improve how doctors make diagnoses by gleaning hidden details from images. Medical imaging, genetics, pathology, electronic health records (EHRs), and social media might all benefit from the incorporation of artificial intelligence into more efficient diagnostic systems [18].
This paper proposes an ensemble approach to improve the classification accuracy of abnormal and normal classes for colposcopy and Pap screening images. The approach involves training two separate models, M1 and M2, on different datasets of colposcopy and Pap screening images. These models are then used together: M1 takes a colposcopy image of a patient, and M2 takes a Pap smear image of the same patient. The ensemble uses the probabilities of the abnormal and normal classes produced by the two models for both images as features for machine learning classifiers. Using images of exactly the same patient for both colposcopy and Pap screening sets this research apart.
The following aspects distinguish this research from previous work, making it novel and valuable in terms of its contributions.
❖ The integration of colposcopic and cytology images into an AI system for cervical cancer detection marks a milestone in the field. This approach offers a more comprehensive and rigorous method for identifying potential cancer cases in patients. By leveraging multiple imaging modalities, the system can produce more accurate and reliable results.
❖ The models were trained using several datasets, which resulted in a more comprehensive knowledge base and better performance when detecting cancer in unseen images.
❖ Real-world data from a hospital in Assam, India, were collected and used to account for biological or physiological adaptations; we named this dataset Malhari. This approach ensured that the system was trained on relevant data that would lead to a more accurate and effective cervical cancer detection model.
❖ The ensemble model was evaluated using colposcopy screening and cytology screening images sourced from the same patients, representing a real-world use case. This approach deviates from the norm of using random images of the same category, as it offers a more reliable and relevant evaluation of the AI system's efficacy in detecting cervical cancer.
❖ According to the study's findings, the ensemble approach employed in this research achieved a perfect score of 100% for accuracy, precision, recall, and F1 score.
❖ Given the significant threat that cervical cancer poses to developing countries, where the availability of doctors is often limited, the development of AI solutions has become crucial in addressing this issue. To this end, the dataset used in this study, collected from real patients, has been made available for further research.
The remainder of this article is organized as follows. Recent studies and advancements in this field are summarized in the "Related Work" section. The datasets that were used are described in the "Datasets" section. The "Proposed Method" section explains how the ensemble approach was implemented. The results and their analysis are presented after that.

Related Work
Table 1 summarizes the papers studied and analyzed.

Datasets
This section offers a detailed summary of all the datasets used in this investigation. The datasets were carefully chosen for their relevance to the study questions and aims, and they came from reliable sources to assure their quality and trustworthiness.

The IARC Colposcopy Image Dataset is a collection of 1313 abnormal and normal high-quality cervical cancer screening images drawn from two datasets, the IARC Visual Inspection with Acetic Acid (VIA) Image Bank and the IARC Colposcopy Image Bank. In the remainder of this manuscript, this dataset is referred to as D1.

The IARC Visual Inspection with Acetic Acid (VIA) Image Bank
This dataset is a valuable resource for cervical cancer screening and research [32]. The database includes over 408 cervical images with various attributes, including the visibility and location of the transformation zone, the presence and features of the acetowhite region, and the results of VIA screening. These attributes provide important information for healthcare professionals to identify cervical abnormalities and determine the appropriate management. The acetowhite area, in particular, is a significant indicator of cervical abnormalities and can vary in color, margin, surface, location, and size. The database also records the nature of the tests conducted, including whether images were taken before or after the application of acetic acid and Lugol's iodine, as well as eligibility for ablative treatment when relevant. Moreover, the images are annotated with histological results, ranging from normal to different levels of cervical cancer. Figure 1 shows the folder of the IARC image bank VIA datasets.

The IARC Colposcopy Image Bank
This dataset is a valuable resource for researchers and medical professionals studying cervical cancer [33]. It contains colposcopy screening images along with metadata files. The metadata for the colposcopy images includes information about the patient, imaging equipment, and image characteristics. The images are annotated by expert colposcopists, providing a standardized and reliable dataset for research and clinical applications. The availability of this dataset can contribute to the development of improved screening methods, diagnostic tools, and treatment options for cervical cancer, ultimately leading to better outcomes for patients. Researchers can use the metadata to analyze the images and identify patterns or features that may be useful for diagnosis, for developing smart deep learning screening solutions, and for treatment planning. Figure 2 shows the folder of the IARC image bank colpo datasets.

The cases metadata file contains the following attributes:
1. HPV (human papillomavirus) status: HPV is a sexually transmitted virus that is known to cause nearly all cases of cervical cancer. The virus can infect the cells of the cervix, causing abnormal changes that may lead to cancer over time. Therefore, HPV testing is a crucial part of cervical cancer screening and detection.
2. Provisional diagnosis: Based on the results of these tests, a healthcare provider may make a provisional diagnosis of cervical cancer, which may include information such as the type of cancer, the stage of cancer, and the degree of malignancy.

Image Separation Logic

IARC Image Bank VIA
The label is based on the observations noted in the VIA column of the cases metadata file: if the diagnosis in the VIA column is negative, the image is considered normal; otherwise it is considered abnormal.

IARC Image Bank colposcopy dataset
The label is based on the observations noted in the Provisional Diagnosis column of the cases metadata file. Certain diagnoses have been identified as normal; images with one of these diagnoses are categorized as normal, and all other images are categorized as abnormal.
Figure 4 displays a subset of the dataset, in which the first two images represent negative cases and the last two images represent positive cases of colposcopy screening.
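The separation logic above can be sketched as two small labeling functions. This is a minimal illustration, not the authors' code: the function names are hypothetical, the exact spelling of the "negative" value in the VIA column is assumed, and the set of provisional diagnoses treated as normal is left as a caller-supplied parameter because the list is not reproduced here.

```python
from typing import Iterable


def label_via_case(via_result: str) -> str:
    """Label a VIA image from the VIA column of the cases metadata file.

    Assumption: a negative screening result is recorded as the text "negative"
    (case and surrounding whitespace are ignored).
    """
    return "normal" if via_result.strip().lower() == "negative" else "abnormal"


def label_colposcopy_case(provisional_diagnosis: str,
                          normal_diagnoses: Iterable[str]) -> str:
    """Label a colposcopy image from the Provisional Diagnosis column.

    normal_diagnoses is supplied by the caller; it stands in for the list of
    diagnoses that the study treats as normal.
    """
    normal = {d.strip().lower() for d in normal_diagnoses}
    return "normal" if provisional_diagnosis.strip().lower() in normal else "abnormal"
```

For example, `label_via_case("Negative")` yields `"normal"`, and any other VIA value yields `"abnormal"`.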

Liquid-based cytology Pap smear dataset (Dataset-II)
The dataset consists of the LBC (liquid-based cytology) image collection of Gauhati Medical College and Hospital [34]. A cone-shaped brush is used to extract the target sample from the transformation zone, and the sample is then stored in a container with additive fluid to remove debris. The samples are then layered, sedimented, and centrifuged at 2500 rpm for 5 minutes, then stained with hematoxylin and eosin before being prepared on slides. Each slide was photographed at the smear level using a Leica microscope so that cellular features could be identified. The 10 best images from each slide, together with the patient's medical history, were recorded in an Excel file. Based on the patient's history and subsequent examination by a pathologist, the images were categorized as NILM, LSIL, HSIL, and SCC. The images are 2048 pixels wide by 1536 pixels high. Table 3 shows the image distribution of the dataset. Figures 5 and 6 show sample images from the LBC dataset in the normal and abnormal classes. In the remainder of this manuscript, this dataset is referred to as D2.

Malhari dataset (Dataset-III)
We collected LBC and colposcopy images of the same patients from Asam Hospital in India and named the collection the Malhari dataset. Patients permitted the hospital to release their data for research and development purposes under a strict confidentiality agreement. Important data from the dataset is used in this study, and all necessary precautions are taken to protect the privacy of the patients' information.
The dataset includes information from 32 patients, with each patient having four colposcopy images as well as, in most cases, 10 image patches obtained from a single Pap test image. To our knowledge, no prior study has used both colposcopy and Pap smear images from the same set of patients, making this dataset a unique resource for research and analysis in this field. Table 4 shows an overview of the Malhari dataset. In the remainder of this manuscript, this dataset is referred to as D3. Figure 8 displays a subset of the dataset, in which the first two images represent negative cases and the last two images represent positive cases of a Pap smear.

Proposed Method
This section explains how the final ensemble model was implemented to predict whether a particular patient has cervical cancer, using both colposcopy and Pap screening images of the same patient. This paper refers to the DeepColpo model as M1; it was developed to classify colposcopy images into normal or abnormal categories with high accuracy. Similarly, the DeepCyto+ model is referred to as M2; it was created to accurately classify cytology images into normal or abnormal categories.
The section is divided into sub-sections: Subsection 4.1 explains the architecture of model M1, Subsection 4.2 describes the architecture of model M2, Subsection 4.3 describes the training process of the models, and Subsection 4.4 explains the ensemble approach using machine learning. The section starts with the architecture of the individual models and works its way up to the final ensemble model.

Architecture of M1
The M1 architecture has been designed to accurately classify abnormal and normal images obtained through colposcopy screening tests. The model was trained on diverse datasets in order to acquire a robust knowledge base and improve its sensitivity in detecting abnormalities. This approach helps M1 differentiate effectively between normal and abnormal colposcopy images, thereby aiding in the early detection and diagnosis of potential health issues. Figure 9 shows the architecture of model M1.

The model (M1) comprises an input layer with a shape of (None, 224, 224, 3) to accept input images with a size of 224x224 and 3 color channels (RGB). A 2D convolutional layer (Conv2D) is then applied, using 32 filters and a kernel size of 3x3, followed by batch normalization and an activation function to introduce non-linearity. The output's spatial dimensions are then cut in half by a downsampling layer with a pool size of 2x2. Overfitting may be avoided by including a dropout layer with a rate of 0.

The performance of the M1 architecture has been found to be superior to that of other well-known architectures, despite its comparatively simple design and lower number of layers. This is notable given the complexity of the dataset involved in classifying abnormal and normal colposcopy images. Achieving better results with a simpler architecture supports the efficacy of the model's design and training approach.
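To make the first block of M1 concrete, the following sketch works out the feature-map size and the number of convolution weights for a 224x224x3 input, 32 filters of size 3x3, and 2x2 pooling. The padding mode and stride are not stated in the text, so "same" padding with stride 1 is an assumption; the helper names are illustrative only.

```python
def conv2d_output_size(size: int, kernel: int, padding: str = "same", stride: int = 1) -> int:
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding mode: {padding}")


def conv2d_param_count(in_channels: int, filters: int, kernel: int) -> int:
    """Trainable parameters: one kernel per (input channel, filter) pair plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def pool_output_size(size: int, pool: int = 2) -> int:
    """Spatial output size of non-overlapping pooling along one dimension."""
    return size // pool


input_size, channels, filters, kernel = 224, 3, 32, 3

conv_size = conv2d_output_size(input_size, kernel, padding="same")  # assumed padding
pooled_size = pool_output_size(conv_size, pool=2)
conv_params = conv2d_param_count(channels, filters, kernel)

print(f"after Conv2D: {conv_size}x{conv_size}x{filters}")
print(f"after 2x2 pooling: {pooled_size}x{pooled_size}x{filters}")
print(f"Conv2D parameters: {conv_params}")
```

Under the "same" padding assumption the pooling layer halves 224 to 112, which matches the statement that the spatial dimensions are cut in half; with "valid" padding the sizes would be 222 and 111 instead.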

Architecture of M2
The M2 model has been specifically developed and trained to classify abnormal and normal Pap smear screening images, with a focus on achieving robust classification performance. To achieve this, the model was trained using several datasets, including D2 and D3. Handling biological or physiological adaptations is crucial to ensuring accurate classification of images. For this reason, data from Asam Hospital was collected under a confidentiality agreement with patients. As this dataset is derived from the Indian region, it represents a more realistic real-world scenario, accounting for the impact of regional environments and dietary habits on internal body features, which are commonly referred to as "biological or physiological adaptations." Figure 10 shows the architecture of model M2.

M2 is a deep neural network with many layers: five convolutional layers with ReLU as the activation function, batch normalization layers, and a max pooling layer. Feature extraction begins by running the input image through a sequence of convolutional layers. Batch normalization layers standardize the activations of the preceding layers, which increases training stability and speeds up convergence. Max pooling layers compress the spatial dimensions of the feature maps while retaining their essential features.
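The two operations named above can be illustrated on plain Python lists. This is a simplified sketch rather than the layers used in M2: the normalization omits the learned scale and shift parameters that a full batch normalization layer carries, and the 2x2 pooling window is an assumption, since the pool size of M2 is not stated.

```python
import math


def batch_normalize(values, eps=1e-5):
    """Standardize activations to roughly zero mean and unit variance.

    Simplification: the learned scale (gamma) and shift (beta) are omitted.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
pooled = max_pool_2x2([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
])
```

The 4x4 example is reduced to a 2x2 map that keeps only the strongest response in each window, which is how pooling shrinks the spatial dimensions while preserving salient features.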
For the final classification, the output of the convolutional layers is flattened and passed through several fully connected (dense) layers. The last dense layer has 2 neurons with a softmax activation function and produces a probability score for each class label of the input image.
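As a brief illustration of the output layer, the softmax function turns the two raw scores of the final dense layer into probabilities that sum to one. The logits below and the class order (abnormal first, normal second) are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities; subtracting the max avoids overflow."""
    highest = max(logits)
    exps = [math.exp(z - highest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one image, ordered as [abnormal, normal].
probabilities = softmax([2.0, 0.5])
predicted_class = ["abnormal", "normal"][probabilities.index(max(probabilities))]
```

These per-class probabilities are the quantities that the ensemble stage later reuses as features.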
Because of its large number of trainable parameters (59 million), M2 is able to learn complicated patterns and achieve high accuracy on difficult image classification tasks, such as differentiating between abnormal and normal Pap smear images.
The M2 model achieved 100% accuracy in testing and very good accuracy in training, despite its simple architecture. This level of accuracy indicates that the model generalizes well and can handle real-world scenarios, which makes it suitable for use in real-world applications. The training process of the M2 model is explained in the next section.

Process of Model Training
This section explains how models M1 and M2 were trained for use in an ensemble approach, in which the same patient's colposcopy and Pap images are passed through both models and a prediction is generated. The details are as follows.
For training and testing, the data was divided into three sets. The training set was used to train the models, the validation set was used to tune the models and prevent overfitting, and the testing set was used to evaluate the performance of the trained models.
To avoid bias in the selection of the training, validation, and testing sets, the data was split in such a way that each set contained a representative sample of both abnormal and normal images.
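One way to obtain sets that each contain a representative share of both classes is a stratified split, sketched below. This is an illustration under stated assumptions rather than the authors' procedure: the function name and the fixed seed are hypothetical, and the default 7:2:1 ratio is the one the paper reports for the D2 split.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.7, 0.2, 0.1), seed=0):
    """Return (train, val, test) index lists, splitting each class separately."""
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * ratios[0])
        n_val = round(len(indices) * ratios[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy example: 50 normal and 50 abnormal images.
labels = ["normal"] * 50 + ["abnormal"] * 50
train_idx, val_idx, test_idx = stratified_split(labels)
```

Because each class is split on its own, the class proportions of the full dataset carry over to all three subsets.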
Step 1: Training of Model M1 on Dataset D1
In the initial part of this research, the authors trained Model M1 on Dataset D1 using several activation functions. The data was divided into three separate sets with specified proportions, as shown in the table below, to facilitate the assessment and comparison of the models' efficacy during both the training and testing phases. The findings of these tests are reported in the results section for future reference and comparison. The results section also discusses the training and performance of well-known CNN architectures tested on the same dataset (D1) with the same proportions, using both transfer learning and training from scratch.
Table 5 shows the distribution of images across the sets during the training and testing of model M1 on dataset D1.
Step 2: Training of Model M2 on Dataset D2
The dataset was split into three subsets with a 7:2:1 ratio, as shown in the following table. The results are outlined in the next section.
Table 6 shows the distribution of images across the sets during the training and testing of model M2 on dataset D2.
Step 3: Fine-tuning of Models M1 and M2 Using Dataset D3
After training on the D1 and D2 datasets, Models M1 and M2 were fine-tuned on the D3 dataset, which includes both colposcopy and Pap smear images of the same patients. This dataset was deemed critical because it reflects real-world scenarios and can provide valuable insights into the performance of the models. Colposcopy images were used to fine-tune Model M1, whereas Pap smear images were used to fine-tune Model M2. The dataset comprises images with varying grades of cervical intraepithelial neoplasia (CIN), where CIN1 is considered normal and CIN2 and CIN3 are categorized as abnormal. For Pap smear tests, images classified as NILM are regarded as normal, while any other classification is regarded as abnormal. Fine-tuning on a diverse dataset with different types of images allowed the models to learn and generalize better on unseen data, potentially improving their diagnostic accuracy in identifying cervical lesions. Model M1 with the ELU activation function was chosen for fine-tuning because its performance on Dataset D1 was superior to that obtained with the other activation functions. Table 7 shows the distribution of images in dataset D3. Table 7 shows the distribution of images across the sets during the fine-tuning and testing of model M1 on dataset D3 (colposcopy only). Table 8 shows the distribution of images across the sets during the fine-tuning and testing of model M2 on dataset D3 (Pap smear only).

Ensemble Approach Using Machine Learning
To ensure a more realistic and clinically relevant evaluation of the ensemble approach, the testing process uses screening images from the same patient for both colposcopy and Pap smear screenings. Pairing the colposcopy and Pap smear images of a particular patient simulates a real-world scenario and gives a better assessment of the performance of the ensemble model. The aim of the ensemble approach is to leverage the strengths of two models, M1 and M2, which are trained and tested on different datasets to acquire robust knowledge for the classification of abnormal and normal colposcopy and Pap smear screening images, respectively. By combining the outputs of both models, the ensemble approach seeks to improve the accuracy and reliability of the classification results.

Figure 11. Architecture of Proposed Model
To combine the outputs of models M1 and M2, the probabilities of the output classes are taken as features and passed as input to various machine learning algorithms, such as SVM, random forest, decision tree, KNN, LDA, and naive Bayes. A portion of the data is used for training, and the rest is used for testing. The performance of the ensemble approach is evaluated in the results section, which provides insights into the accuracy and robustness of the model.
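The combination step can be sketched as follows: the class probabilities from M1 and M2 are concatenated into one feature vector per patient pair, and a classifier is fitted on those vectors. The sketch uses a small hand-written k-nearest-neighbours vote to stay self-contained; the probability values, the [abnormal, normal] ordering, and k = 3 are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def ensemble_features(p_colposcopy, p_pap):
    """Concatenate the [abnormal, normal] probabilities of M1 and M2."""
    return list(p_colposcopy) + list(p_pap)


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    neighbours = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical model outputs for four training pairs.
train_features = [
    ensemble_features([0.90, 0.10], [0.80, 0.20]),
    ensemble_features([0.70, 0.30], [0.95, 0.05]),
    ensemble_features([0.20, 0.80], [0.10, 0.90]),
    ensemble_features([0.35, 0.65], [0.05, 0.95]),
]
train_labels = ["abnormal", "abnormal", "normal", "normal"]

# A new patient pair whose images both lean towards the abnormal class.
query = ensemble_features([0.60, 0.40], [0.90, 0.10])
prediction = knn_predict(train_features, train_labels, query, k=3)
```

Because the meta-classifier sees both models' confidence at once, a strong signal from one modality can compensate for a weak signal from the other.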
Figure 11 shows the architecture of the proposed method.
Dataset D3 consists of records for 32 patients. However, only data from 23 patients was used in this study: only those patients whose cytology and colposcopy images belonged to the same category were included in the analysis. For testing, pairs of colposcopy and Pap screening images from the same patient were chosen from the portion of dataset D3 placed in the testing set, which the models had never seen before. To create a greater number of pairs, various permutations and combinations were constructed. A total of 136 pairs were generated, of which 41 were used for testing and 95 for training. Table 9 shows the distribution of pairs used for the ML algorithms.

The accuracy of the ResNet50 model is slightly better than that of the other two models, which indicates that it is better at correctly identifying the positive and negative classes in the dataset. However, its performance is much lower than that of model M1 with the ELU activation function.
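The same-patient pairing described earlier in this section can be sketched as below. The record layout and function name are hypothetical, and the sketch forms every colposcopy and Pap combination per patient; the paper does not specify how its 136 pairs were selected from the possible combinations, so the full Cartesian product is an assumption.

```python
from itertools import product


def build_pairs(patients):
    """Pair each colposcopy image with each Pap image of the same patient.

    Patients whose colposcopy and cytology categories disagree are skipped,
    mirroring the inclusion rule described in the text.
    """
    pairs = []
    for record in patients.values():
        if record["colposcopy_label"] != record["pap_label"]:
            continue
        for colpo_image, pap_image in product(record["colposcopy"], record["pap"]):
            pairs.append((colpo_image, pap_image, record["colposcopy_label"]))
    return pairs


# Hypothetical records: file names and labels are placeholders.
patients = {
    "patient_a": {
        "colposcopy": ["a_colpo_1.jpg", "a_colpo_2.jpg"],
        "pap": ["a_pap_1.jpg", "a_pap_2.jpg", "a_pap_3.jpg"],
        "colposcopy_label": "abnormal",
        "pap_label": "abnormal",
    },
    "patient_b": {
        "colposcopy": ["b_colpo_1.jpg"],
        "pap": ["b_pap_1.jpg"],
        "colposcopy_label": "normal",
        "pap_label": "abnormal",  # categories disagree, so this patient is excluded
    },
}
pairs = build_pairs(patients)
```

Pairs never mix images from different patients, which keeps the evaluation aligned with the clinical setting in which both screenings belong to one person.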
Table 12 presents the performance metrics of well-known models trained on dataset D1 using transfer learning and tested on portions of dataset D3. These architectures had a single neuron with a sigmoid function in the output layer. The results suggest that combining two different IARC datasets for colposcopy screening led to an interclass dissimilarity problem. This could be due to the use of different imaging methods for capturing images of the cervix, so that the merged dataset combines various types of cervical images. As a consequence, even after different neural network architectures were trained on this merged dataset, the training and validation accuracy remained at 50% while the loss decreased. This indicates that the models were unable to differentiate between the classes, which points to an interclass dissimilarity problem. Furthermore, the performance of the models differed with their complexity, and the model with fewer layers (M1) performed better than the others on the same dataset and with the same settings.

After training on the Pap smear image dataset, model M2 performed outstandingly, as shown by high scores on all relevant performance metrics. The authors trained the model for 15 epochs using Stochastic Gradient Descent (SGD) with a momentum of 0.9. The training accuracy achieved was 95.39%, while the validation accuracy was 98.39%. These results demonstrate that the model is capable of effectively identifying and learning the unique features of the Pap smear image dataset. Table 13 shows the results of the fine-tuned model M2 on dataset D3.
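For readers unfamiliar with the optimizer mentioned above, the sketch below shows one common formulation of the SGD update with momentum 0.9, applied to a toy one-dimensional loss. The learning rate is not reported in the text, so the value used here is an assumption, and deep learning frameworks differ slightly in how they write the update.

```python
def sgd_momentum_step(theta, velocity, gradient, learning_rate, momentum=0.9):
    """One update: the velocity accumulates past gradients, then moves the parameter."""
    velocity = momentum * velocity - learning_rate * gradient
    return theta + velocity, velocity


# Toy loss f(theta) = theta**2, whose gradient is 2 * theta.
theta, velocity = 5.0, 0.0
for _ in range(200):
    gradient = 2.0 * theta
    theta, velocity = sgd_momentum_step(theta, velocity, gradient, learning_rate=0.05)
```

The momentum term lets earlier gradients keep contributing to each step, which typically smooths the trajectory and speeds up convergence compared with plain SGD.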
The model was also tested on a separate, unseen dataset of Pap smear images and achieved 100% accuracy, precision, recall, and F1 score. This indicates that the model generalizes well to unseen data and could be a promising approach for assisting medical professionals in the accurate identification of cervical lesions.

The results obtained after fine-tuning Model M1 and M2 on Dataset D3 (Training Step 3)
Table 14 presents the fine-tuning results of Model M1 on Dataset D3 using colposcopy images, along with various performance metrics. Model M1 with the ELU activation function performed well on the D1 dataset and was therefore selected for fine-tuning on the Asam Hospital images. Through fine-tuning, the model adapted to the new dataset. However, the Asam Hospital dataset had fewer images than the D1 dataset, which limited the model's learning capacity. As a result, the accuracy was not very impressive, as the model could only update its weights based on the limited information available. The model attained an accuracy, precision, recall, and F1 score of 63.64%, 66.88%, 63.64%, and 63.03%, respectively.
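The four metrics reported above can be computed from true and predicted labels as sketched below. The accuracy and recall values in the text are identical, which is what support-weighted averaging produces, since weighted recall always equals accuracy; the paper does not state its averaging scheme, so weighted averaging is an assumption here, and the labels in the example are invented.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    total = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total

    precision = recall = f1 = 0.0
    for cls, count in support.items():
        true_positive = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        cls_precision = true_positive / predicted if predicted else 0.0
        cls_recall = true_positive / count
        denominator = cls_precision + cls_recall
        cls_f1 = 2 * cls_precision * cls_recall / denominator if denominator else 0.0
        weight = count / total  # share of samples that belong to this class
        precision += weight * cls_precision
        recall += weight * cls_recall
        f1 += weight * cls_f1
    return accuracy, precision, recall, f1


# Invented example: 6 abnormal and 4 normal images, 7 of 10 classified correctly.
y_true = ["abnormal"] * 6 + ["normal"] * 4
y_pred = ["abnormal"] * 5 + ["normal"] + ["normal"] * 2 + ["abnormal"] * 2
accuracy, precision, recall, f1 = weighted_metrics(y_true, y_pred)
```

In this example accuracy and weighted recall coincide at 0.7, while precision and F1 differ from them, the same pattern as in the reported figures.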
Table 15 presents the fine-tuning results of Model M2 on Dataset D3 using Pap smear images, along with various performance metrics. The results show that the model performed well during fine-tuning. This is because the D2 dataset and the Pap smear portion of the D3 dataset contain similar images that produce the same kind of features. Moreover, the fact that model M2 had already achieved 100% accuracy on the testing set suggests that it generalized well. Because the fine-tuning data was of the same type, it yielded the same features that the model had learned during its training on the D2 dataset. Therefore, the model adapted to the new dataset in just 15 epochs and achieved 100% testing accuracy, precision, recall, and F1 score. In contrast, the D1 dataset and the colposcopy portion of the D3 dataset exhibited some feature differences due to variations in instruments, biological factors, and adaptations.

Result of Ensemble Approach
To combine the findings of models M1 and M2, the probabilities of the output classes are extracted and used as features for machine learning methods, including LDA, SVM, random forest, decision tree, KNN, and naive Bayes. The results of these experiments are given in Table 16. LDA and naive Bayes performed poorly compared to KNN, which obtained perfect accuracy on both the training and testing sets. KNN was the only algorithm to achieve a perfect score in accuracy, precision, recall, and F1 score. Overall, the ensemble method performed quite well at distinguishing normal from abnormal cases using Pap smear and colposcopy images.

Conclusion
The DeepColpo (M1) model was trained from scratch on a combination of the IARC colposcopy VIA dataset and the IARC Colposcopy Image Bank, which were merged into a single dataset, D1. The ELU activation function was used during training, and the test results showed an accuracy, precision, recall, and F1 score of 72.18%, 73.07%, 72.18%, and 72.29%, respectively. The interclass dissimilarity problem was found to be the root cause of the model's poor performance, which was attributed to the inclusion of images taken using different approaches and methodologies. The DeepCyto+ model was trained from scratch on a Liquid-based Cytology Pap smear dataset (D2). Its test results showed 100% accuracy, precision, recall, and F1 score, indicating that the model could generalize well. This suggests that DeepCyto+ was able to learn and accurately predict data beyond the training set, demonstrating its potential utility in real-world scenarios.
To make both models robust to real-world problems and to biological or physical adaptation, real data were collected for fine-tuning their weights. After fine-tuning, DeepColpo (M1) achieved a test accuracy, precision, recall, and F1 score of 63.64%, 66.68%, 63.64%, and 63.04%, respectively. DeepCyto+ retained its perfect score on all metrics because the Liquid-based Cytology Pap smear dataset D2 and the LBC/Pap smear portion of D3 were collected from the same geographical location, most probably using the same approach and methodology. In contrast, dataset D1 and the colposcopy portion of D3 were collected from different geographical regions, and different methodologies were used to collect the samples.
To overcome the limitations of DeepColpo's versatility and DeepCyto+'s lack of robustness, an ensemble approach was adopted that used machine learning algorithms such as SVM, Decision Tree, Random Forest, KNN, LDA, and Naive Bayes to combine the strengths of both models. After evaluation, KNN was identified as the algorithm that provided the best results.
The ensemble model draws on DeepColpo's multi-dataset, multi-method training, which provides a diverse knowledge base, and on DeepCyto+'s high accuracy on a specific image acquisition method, which allows it to contribute highly reliable predictions. As a result, the ensemble model exhibited improved robustness and accuracy, making it more suitable for handling diverse image datasets.
To assess the ensemble model's real-world applicability, it was tested on colposcopy and cytology images from the same patient. With KNN as the combining algorithm, the model achieved perfect scores across all evaluation metrics, demonstrating strong performance in analyzing and diagnosing these images. This highlights the ensemble model's potential as a valuable tool for the clinical diagnosis and management of patients with cervical cancer.

3. Histopathology: Histopathology refers to the examination of tissue samples under a microscope to detect abnormalities, such as cancer or precancerous changes.
4. Type: the image category: a) After Lugol's iodine b) After normal saline c) With green filter d) After acetic acid.
EAI Endorsed Transactions on Pervasive Health and Technology

Figure 7. Sample screening images from the colposcopy portion of dataset D3. Figure 7 displays a subset of the dataset: the first two images represent negative cases, and the last two images represent positive cases of colposcopy screening.

Figure 8. Sample screening images from the cytology portion of dataset D3.

2: Training of Model M2 on Dataset D2
Dataset D2 was used for the training and evaluation phases of the M2 architecture. There were a total of 963 images in the dataset, including 245 abnormal and 429 normal examples for training, 70 abnormal and 123 normal examples for validation, and 35 abnormal and 61 normal examples for testing.
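The stated counts can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Counts as stated in the text for dataset D2: (abnormal, normal) per subset.
splits = {
    "train": (245, 429),
    "validation": (70, 123),
    "test": (35, 61),
}

total = sum(abnormal + normal for abnormal, normal in splits.values())
shares = {}
for name, (abnormal, normal) in splits.items():
    size = abnormal + normal
    shares[name] = size / total
    print(f"{name:<10} size={size:>3}  share of D2={size / total:.3f}  "
          f"abnormal fraction={abnormal / size:.3f}")
print("total images:", total)
```

The subset sizes sum to the stated 963 images and correspond to roughly a 70/20/10 split, with the abnormal class making up about 36% of each subset.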

2 .
The outcome of the training process of Model M2 on Dataset D2 (Training Step 2)

Methodology Description Advantage Disadvantage Result
1. When trained, the CYENET model was 0.971 accurate, whereas the VGG_19 (TL) model was only 0.870 accurate. 2. In contrast to CYENET's steady validation methodology and smooth loss curve,
1. Both the dataset's size and the layer's depth have an effect on the model's efficiency. Due
1. The CYENET model outperforms the VGG19 (TL) model by 0.19 points in terms of classification accuracy, reaching 0.923 2. CYENET Accuracy

Table 2. Categories and Counts of the IARC Image Bank VIA and IARC Image Bank Colpo Datasets, including the number of normal and abnormal images in each dataset [32][33].

Table 4. Categories and Counts of the (D3) Dataset, including the number of normal and abnormal images in the dataset.
25, which eliminates 25% of the input units at random during training. Subsequently, a flatten layer, fully connected layers, and a classification layer with a softmax function are appended to the model (M1). The model consists of 12 layers. The total number of parameters for this model is 100,936,450, of which 100,935,874 are trainable and the remaining 576 are non-trainable. The performance of model M1 was evaluated using different activation functions, as explained in the later part of the paper.

Table 5. The distribution of images in Dataset D1 used in developing and evaluating Model M1.

Table 6. The distribution of images in Dataset D2 used in developing and evaluating Model M2.

Table 7. The distribution of images in Dataset D3 used in fine-tuning and evaluating Model M1 on colposcopy images.

Table 8. The distribution of images in Dataset D3 used in fine-tuning and evaluating Model M2 on Pap smear images.

Table 9. Distribution of pairs of colposcopy and Pap images.

Table 10. Performance metrics of model M1 on dataset D1.

Table 12. Performance Metrics of Well-Known Models Trained with Transfer Learning and Tested on a Portion of Dataset D1. The results in the table indicate that the models do not perform well. The training accuracy and validation accuracy are both at 50%, which means that the models are essentially guessing randomly. The accuracy, precision, recall, and F1 score are all low, indicating that the models are not able to classify the images correctly.

Table 13. Fine-tuning results of Model M2 on Dataset D3.

Table 14. Fine-tuning results of Model M1 on Dataset D3 using colposcopy images with various performance metrics.

Table 15. Fine-tuning results for Model M2 on Dataset D3.

Table 16. Performance of the ensemble approach.