Colorectal cancer prediction via histopathology segmentation using DC-GAN and VAE-GAN

Colorectal cancer ranks as the third most common form of cancer in the United States. The Centers for Disease Control and Prevention report that males and individuals assigned male at birth (AMAB) have a slightly higher incidence of colon cancer than females and those assigned female at birth (AFAB). Black people are more likely than other ethnic or racial groups to develop colon cancer. Early detection of suspicious tissue can extend a person's life by 3-4 years. In this project, we use the EBHI-seg dataset. This study explores Generative Adversarial Networks (GANs) as a data augmentation technique for colorectal cancer histopathology image segmentation. Specifically, we compare the effectiveness of two GAN models, the deep convolutional GAN (DC-GAN) and the variational autoencoder GAN (VAE-GAN), in generating realistic synthetic images for training a neural network model for cancer prediction. Our findings suggest that DC-GAN outperforms VAE-GAN in generating high-quality synthetic images and improving the neural network model. These results highlight the potential of GAN-based data augmentation to enhance the performance of machine learning models in medical image analysis tasks.


Introduction
Colorectal cancer, often known as CRC, can affect both the colon and the rectum. CRC is the second most prevalent cancer in women and the third most prevalent in men, and it is one of the most common kinds of gastrointestinal cancer. The incidence of CRC is expected to increase globally by 80% by 2035, despite existing variations in geographic distribution, age, and gender. The main reason for the rising incidence of CRC is changes in lifestyle, notably dietary patterns. The majority (70-80%) of CRCs are sporadic, although about one-third are hereditary. The term CRC refers to a broad range of carcinoma subtypes with various morphological traits and molecular alterations. Recently, cancer detection and treatment have benefited from the application of machine learning approaches [21] [22] to medical imaging, such as Generative Adversarial Networks (GANs). Data augmentation can boost the effectiveness of machine learning algorithms for analyzing medical images. In this article, we investigate how to segment colorectal cancer histology images using GANs for data augmentation. We evaluate how well two GAN models, DC-GAN and VAE-GAN, create convincing synthetic images for training a neural network model that predicts the occurrence of cancer. Our research aims to increase the accuracy of cancer prediction models and thereby improve the detection and treatment of colorectal cancer.
Accurate segmentation of histopathology images is crucial for the early detection and effective treatment of cancer. However, the limited availability of annotated data can hinder the performance of segmentation models. Data augmentation using GANs has shown promise in generating synthetic images that can improve these models.

Literature Review
[1] used a CNN architecture to test classification results and then examined the impact of data augmentation using synthetic liver lesions instead of traditional data augmentation methods. They implemented two techniques for creating synthetic lesions and found that the Deep Convolutional GAN (DCGAN) technique performed better in testing. [2] demonstrate that the SegAN framework outperforms the latest U-net segmentation method in terms of effectiveness and stability for the segmentation task. They evaluated SegAN on data from the MICCAI BRATS brain tumour segmentation challenge, and numerous experimental findings show the effectiveness of the proposed SegAN with multi-scale loss. [4][5] assess CycleGAN's potential for augmenting data in CT scan segmentation tasks. Using a large image collection, a CycleGAN was trained to convert contrast CT images into non-contrast images, and the generated non-contrast images were then added to the training data. The researchers trained U-Nets on the original dataset and on a combined dataset containing both the original data and the generated non-contrast images, and then evaluated segmentation performance on two separate datasets: the initial contrast CT dataset used to create the segmentations (the in-distribution dataset) and a second dataset consisting solely of non-contrast CTs (the out-of-distribution dataset). [9] uses Deep Convolutional GANs (DCGANs) to address the issue of a small labelled dataset. The visualisation programme ImageJ was used to examine how closely the real and synthetic images resembled one another, and a visual Turing test was carried out with the aid of medical professionals to validate the proposed model.

Data Pre-Processing & Augmentation
At this stage, all images in the dataset have been resized to a fixed size of 96x96. The purpose of data augmentation is to provide more representative and diverse data to improve the performance of machine learning models. By training on variations of the original data, the model can become more robust and generalise better to new, unseen data.
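The resizing step can be illustrated with a minimal NumPy sketch. Only the 96x96 target size comes from the text; the nearest-neighbour sampling, the scaling of pixel values to [-1, 1], the function names, and the 224x224 stand-in input are assumptions made for illustration and are not taken from the study's code.

```python
import numpy as np

TARGET_SIZE = 96  # fixed side length stated in the text


def resize_nearest(image, size=TARGET_SIZE):
    """Resize an (H, W, C) image to (size, size, C) by nearest-neighbour sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    return image[rows][:, cols]


def to_unit_range(image):
    """Map uint8 pixels in [0, 255] to float32 values in [-1, 1] (assumed convention)."""
    return image.astype(np.float32) / 127.5 - 1.0


def preprocess(image):
    """Resize first, then rescale the pixel values."""
    return to_unit_range(resize_nearest(image))


# Random stand-in for one RGB histopathology patch (hypothetical input size).
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
x = preprocess(raw)
print(x.shape, x.dtype)
```

Scaling to [-1, 1] is a common choice when a generator ends in a tanh activation; if a different output activation is used, the range would need to match it.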

DCGAN
DCGANs, or "Deep Convolutional Generative Adversarial Networks," are used to produce images. They modify the conventional Generative Adversarial Network (GAN) architecture by using convolutional neural networks (CNNs) as both the generator and the discriminator [12].
The generator network takes a random noise vector as input and generates images that resemble the real images in the training dataset. The discriminator network takes an image as input and attempts to distinguish between real and fake images. The two networks are trained concurrently and competitively, with the aim of improving both the discriminator's accuracy in identifying fake images and the generator's ability to create realistic ones. DCGANs have a number of architectural characteristics that aid in the creation of high-quality images. For instance, the generator network upsamples the input noise vector using transposed convolutional layers, while the discriminator network downsamples the image. Batch normalisation and the LeakyReLU activation function also help to stabilise training and raise the quality of the generated images [13].
DCGANs have been used to produce a wide range of image types, including faces, natural scenes, and even artwork.

VAE-GAN
A "Variational Autoencoder Generative Adversarial Network," or simply "VAE-GAN," is a deep learning model that combines the benefits of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) to produce images.
A VAE is a generative model that produces new samples by learning a low-dimensional representation of the input data. A GAN, on the other hand, is an adversarial model that employs a generator network to produce fake data and a discriminator network to separate real data from fake data. VAE-GAN combines the benefits of both models by using the VAE to learn a latent representation of the data, which the GAN then uses to produce realistic images [8].
In VAE-GAN, the generator network takes a random noise vector as input, while the discriminator attempts to distinguish between real and fake images. The VAE encoder network maps the input image to a latent representation, and the VAE decoder network takes this latent representation as input to reconstruct the original image. The VAE encoder and decoder networks are trained using a reconstruction loss and a KL-divergence loss, while the generator network and the VAE decoder network are trained together using the adversarial loss from the discriminator network [8].
VAE-GAN has been reported to produce high-quality images with crisper features and fewer artifacts. It has been used for image-generation tasks such as creating realistic faces and natural scenes.

Implementations
Both the VAE-GAN and DCGAN models are implemented in Python using the Keras and TensorFlow libraries.

DCGAN
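In DCGAN, the generator network uses deconvolutional layers to create new images that resemble the training dataset from a random noise vector. The discriminator receives either real images from the training dataset or images produced by the generator and tries to determine whether they are real or fake. First, the dataset is loaded. Figure 1 illustrates the entire flow.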

Figure 1. DCGAN Architecture
Generator: The generator begins with a dense layer that receives a seed (random noise) as input, followed by a batch normalisation layer and a ReLU activation layer. The noise vector is then repeatedly upsampled until it reaches the size of the images. Three deconvolutional (transposed convolution) layers are used for this, all with a 5x5 kernel and with the bias option set to false. The first deconvolutional layer, with a 1x1 stride, follows the initial ReLU activation and is itself followed by a batch normalisation layer and a ReLU activation layer. The second deconvolutional layer uses a 2x2 stride and is likewise followed by a batch normalisation layer and a ReLU activation layer [18].
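To make the upsampling arithmetic concrete, the short sketch below traces the spatial size through three transposed convolutions. It assumes "same" padding, for which the output side equals the input side times the stride. The strides of 1 and 2 for the first two layers come from the text; the stride of 2 for the third layer and the 24x24 starting feature map are assumptions chosen so that the trace ends at 96x96, and the function names are hypothetical.

```python
def transposed_conv_output(size, stride):
    """Spatial size after a transposed convolution with 'same' padding."""
    return size * stride


def trace_generator(start_size, strides):
    """Return the spatial size before and after each transposed convolution."""
    sizes = [start_size]
    for stride in strides:
        sizes.append(transposed_conv_output(sizes[-1], stride))
    return sizes


def required_start_size(target_size, strides):
    """Side length the dense layer output must be reshaped to so the stack ends at target_size."""
    factor = 1
    for stride in strides:
        factor *= stride
    if target_size % factor != 0:
        raise ValueError("target size is not divisible by the total upsampling factor")
    return target_size // factor


# Strides 1 and 2 are stated in the text; the final stride of 2 is an assumption.
GENERATOR_STRIDES = [1, 2, 2]

start = required_start_size(96, GENERATOR_STRIDES)
print(start)                                      # 24
print(trace_generator(start, GENERATOR_STRIDES))  # [24, 24, 48, 96]
```

Under these assumptions the dense layer would need to output 24 * 24 * C values (for some channel count C) before being reshaped into a feature map.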
Discriminator: The discriminator is a CNN-based image classifier. It employs three convolutional layers, with filter counts of 64 and 128 reported, each using a 5x5 kernel and a 2x2 stride. Each convolutional layer is followed by a LeakyReLU activation layer and a dropout layer with a dropout rate of 0.2 [18].
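The two operations that follow each convolution, together with the downsampling effect of a stride of 2, can be sketched in NumPy as follows. The dropout rate of 0.2 and the stride come from the text; the LeakyReLU negative slope of 0.2, the "same" padding, and all function names are assumptions made for illustration.

```python
import numpy as np


def leaky_relu(x, negative_slope=0.2):
    """LeakyReLU: pass positive values, scale negative values (slope is an assumed value)."""
    return np.where(x >= 0, x, negative_slope * x)


def dropout(x, rate=0.2, rng=None, training=True):
    """Inverted dropout: zero a fraction `rate` of the units and rescale the survivors."""
    if not training or rate == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)


def conv_same_output(size, stride=2):
    """Spatial size after a strided convolution with 'same' padding (ceiling division)."""
    return -(-size // stride)


features = np.array([[-2.0, -0.5, 0.0, 1.5]])
activated = leaky_relu(features)
regularised = dropout(activated, rate=0.2, rng=np.random.default_rng(0))

# Spatial size of a 96x96 input after three stride-2 convolutions.
sizes = [96]
for _ in range(3):
    sizes.append(conv_same_output(sizes[-1]))
print(sizes)  # [96, 48, 24, 12]
```

Dropout is only active during training; at inference time the activations pass through unchanged, which the `training=False` branch reflects.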

Loss function:
Because the discriminator and generator are two separate networks, each is trained with its own optimizer.

Discriminator loss:
The discriminator loss measures how well the discriminator differentiates real images from fake ones. It compares the discriminator's predictions on real images with an array of ones. The discriminator's predictions on fake images are handled in the same way, except that they are compared with an array of zeros [19].
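A minimal NumPy sketch of this comparison is given below, using binary cross-entropy as the measure. It assumes the discriminator outputs probabilities in (0, 1); an implementation that outputs raw logits would apply a sigmoid first. The function names and example values are hypothetical.

```python
import numpy as np


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between target labels and predicted probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    losses = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return float(np.mean(losses))


def discriminator_loss(real_probs, fake_probs):
    """Real predictions are compared with ones, fake predictions with zeros."""
    real_loss = binary_cross_entropy(np.ones_like(real_probs), real_probs)
    fake_loss = binary_cross_entropy(np.zeros_like(fake_probs), fake_probs)
    return real_loss + fake_loss


# Hypothetical discriminator outputs for two real and two generated images.
real_probs = np.array([0.9, 0.8])
fake_probs = np.array([0.1, 0.3])
print(discriminator_loss(real_probs, fake_probs))
```

A discriminator that cannot tell the two classes apart (all outputs 0.5) yields a loss of 2 ln 2, roughly 1.386, while a near-perfect one approaches zero.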

Generator loss:
The generator loss measures how effectively the generator deceives the discriminator. To compute it, the discriminator's classifications of the generated images are compared with an array of ones. A training loop is then set up to train the networks and produce images. In each step, a random seed is given to the generator as input to create an image, and the generator and discriminator are trained simultaneously.
The discriminator classifies the real and fake images, after which the model calculates the losses and updates the gradients. After each iteration of the training loop, the generated images are saved [19].
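The generator loss described above can be sketched in NumPy as follows. As an assumption, the discriminator is taken to output probabilities rather than logits, and binary cross-entropy is used as the comparison; the function names and example values are hypothetical.

```python
import numpy as np


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between target labels and predicted probabilities."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    losses = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return float(np.mean(losses))


def generator_loss(fake_probs):
    """Compare the discriminator's outputs on generated images with an array of ones."""
    return binary_cross_entropy(np.ones_like(fake_probs), fake_probs)


# Hypothetical discriminator outputs on generated images.
fooled = np.array([0.95, 0.90])      # discriminator believes the fakes are real
not_fooled = np.array([0.05, 0.10])  # discriminator rejects the fakes

print(generator_loss(fooled))      # small loss: the generator is succeeding
print(generator_loss(not_fooled))  # large loss: the generator must improve
```

Because the target is an array of ones, the loss falls as the discriminator assigns higher "real" probabilities to generated images, which is exactly the behaviour the generator is trained to produce.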

VAE-GAN
VAE-GAN integrates these two concepts by first creating samples in the latent space of the VAE using the generator portion of the GAN and then mapping those samples to the data space using the decoder portion of the VAE. During training, both the reconstruction error between the original and reconstructed data and the adversarial loss between the real and fake samples are minimised [6]. The flow is depicted in Figure 2. First, the dataset is loaded.

Figure 2. VAE-GAN Architecture
Network Architecture: The encoder network begins with an input layer that accepts the images. Two convolutional layers follow, the first with 64 filters and the second with twice as many, 128 filters. Both use a 3x3 kernel and a 2x2 stride. They are followed by a flatten layer and a fully connected dense layer whose output size is twice the dimension of the latent space. The decoder roughly mirrors the encoder's structure. It consists of an input layer that accepts the latent vector, followed by a fully connected dense layer with ReLU activation whose output size is chosen to match the image dimensions. Three deconvolutional layers are then applied, each with the same kernel size as the convolutional layers in the encoder network; the first two use the same activation function. The filter count decreases from 128 filters at stride 2, to 64 filters at stride 2, to a final layer at stride 1 with 1 filter for grayscale images or 3 filters for color images [14] [20].
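The encoder's shape arithmetic, and the reason its dense layer is twice the latent dimension, can be sketched as below. The 3x3 kernel, the stride of 2, and the 96x96 input come from the text. The padding mode is not stated, so both common options are shown, and the latent dimension of 2, the batch size, and the function names are hypothetical.

```python
import numpy as np


def conv_output_size(size, kernel=3, stride=2, padding="valid"):
    """Spatial size after one convolution for 'valid' or 'same' padding."""
    if padding == "same":
        return -(-size // stride)  # ceiling division
    return (size - kernel) // stride + 1


def split_encoder_output(dense_output):
    """Split the dense output into the mean and log-variance halves."""
    mean, log_var = np.split(dense_output, 2, axis=1)
    return mean, log_var


# Two stride-2 convolutions applied to a 96x96 input.
valid_sizes = [96]
same_sizes = [96]
for _ in range(2):
    valid_sizes.append(conv_output_size(valid_sizes[-1], padding="valid"))
    same_sizes.append(conv_output_size(same_sizes[-1], padding="same"))
print(valid_sizes)  # [96, 47, 23]
print(same_sizes)   # [96, 48, 24]

LATENT_DIM = 2  # hypothetical latent dimension
dense_output = np.zeros((4, 2 * LATENT_DIM))  # stand-in for a batch of 4 encoder outputs
mean, log_var = split_encoder_output(dense_output)
print(mean.shape, log_var.shape)  # (4, 2) (4, 2)
```

Doubling the dense layer lets a single output vector carry both parameters of the approximate posterior, one mean and one log-variance per latent dimension.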

Loss function:
The loss function of the VAE is made up of two terms, the reconstruction loss and the KL divergence loss. The reconstruction loss measures the model's ability to reconstruct the input data and is calculated here using mean squared error (MSE). The KL divergence loss encourages the latent variables to follow a normal distribution; it is defined as the divergence between the latent variable distribution and a standard normal distribution. The total loss function for the VAE is the sum of the reconstruction loss and the KL divergence loss [7]. The objective of training is to minimise this total loss using backpropagation and optimization methods such as stochastic gradient descent.
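The two loss terms can be written out in a few lines of NumPy. This is a sketch under stated assumptions: the encoder is taken to produce a diagonal Gaussian described by a mean and a log-variance, for which the KL divergence to a standard normal has the closed form -0.5 * sum(1 + log_var - mean^2 - exp(log_var)). The averaging over the batch, the unweighted sum of the two terms, and the function names are illustrative choices.

```python
import numpy as np


def reconstruction_loss(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((x - x_hat) ** 2))


def kl_divergence(mean, log_var):
    """KL divergence from N(mean, exp(log_var)) to N(0, I), averaged over the batch."""
    per_sample = -0.5 * np.sum(1.0 + log_var - mean ** 2 - np.exp(log_var), axis=1)
    return float(np.mean(per_sample))


def vae_loss(x, x_hat, mean, log_var):
    """Total VAE loss: reconstruction term plus KL term."""
    return reconstruction_loss(x, x_hat) + kl_divergence(mean, log_var)


# Hypothetical batch of two flattened inputs and a 2-dimensional latent space.
x = np.array([[0.0, 1.0], [1.0, 0.0]])
x_hat = np.array([[0.1, 0.9], [0.8, 0.2]])
mean = np.array([[1.0, 1.0], [1.0, 1.0]])
log_var = np.zeros((2, 2))

print(reconstruction_loss(x, x_hat))
print(kl_divergence(mean, log_var))
print(vae_loss(x, x_hat, mean, log_var))
```

When the posterior already equals the standard normal (zero mean, zero log-variance) the KL term vanishes, so only the reconstruction error remains.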

Training and Generating:
During training, the dataset is iterated over and each batch is passed to the encoder network. In each iteration, the encoder output provides the mean and log-variance parameters of the approximate posterior distribution. The reparameterization trick is used to draw samples from this approximate posterior. As the final training step, the samples are passed to the decoder to obtain the logits of the generative distribution. To generate images, a set of latent vectors is sampled from the Gaussian prior distribution, and the generator converts each latent sample into logits of the observation. These predictions are the generated images, which are then saved [3].
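The sampling step can be sketched in NumPy as below. The reparameterization trick draws noise from a standard normal and then shifts and scales it, z = mean + exp(0.5 * log_var) * eps, so that the randomness is separated from the encoder outputs. The latent dimension, batch sizes, seed, and function names are hypothetical.

```python
import numpy as np


def reparameterize(mean, log_var, rng):
    """Sample z ~ N(mean, exp(log_var)) as a deterministic function of standard normal noise."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps


def sample_prior(num_samples, latent_dim, rng):
    """Draw latent vectors from the standard Gaussian prior for image generation."""
    return rng.standard_normal((num_samples, latent_dim))


rng = np.random.default_rng(42)

# Hypothetical posterior: mean 3 and variance 0.25 (standard deviation 0.5) in every dimension.
mean = np.full((10000, 2), 3.0)
log_var = np.full((10000, 2), np.log(0.25))
z = reparameterize(mean, log_var, rng)
print(z.mean(), z.std())  # close to 3.0 and 0.5

# Latent vectors that would be passed to the decoder to generate new images.
z_prior = sample_prior(16, 2, rng)
print(z_prior.shape)  # (16, 2)
```

In a framework with automatic differentiation, this formulation lets gradients flow through `mean` and `log_var` because the only random quantity, `eps`, does not depend on the network parameters.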

Results and Discussions
The Python programming language is used in this work to train and evaluate the model in a Google Colab notebook. The Keras and torch libraries are used for VAE-GAN and DC-GAN. Figures 3 and 4 show the sample images used in DCGAN and the fake images generated by DCGAN. Figures 5 and 6 show the sample images used in VAE-GAN and the fake images it produces via reconstruction and the average VAE-GAN process. The loss functions of the generator and discriminator are used to assess each network's performance during training. Typically, the discriminator loss measures how well the discriminator can tell generated data from real data, whereas the generator loss measures how well the generated data can deceive the discriminator. Because they are randomly initialized and have no knowledge of the data, the generator and discriminator both have high loss values at the start of training. The generator's loss decreases during training as it learns to produce more realistic data that can deceive the discriminator. Likewise, the discriminator's loss decreases as it improves at differentiating between fake and real data. Figure 7 plots the generator and discriminator loss during training. The loss measure is used to assess the effectiveness of the prediction model. Figure 8 compares the losses of DCGAN and VAE-GAN.

Conclusion and Future Work
In summary, our study demonstrated the effectiveness of using GANs for data augmentation in medical image analysis, specifically for improving the neural network model for colorectal cancer prediction using histopathology images. The study compared the performance of two GAN models, DC-GAN and VAE-GAN, and found that DC-GAN outperformed VAE-GAN in generating realistic synthetic images and improving the neural network model. To delve deeper into data augmentation for medical image analysis tasks, future studies could investigate alternative GAN architectures and hyperparameters. Additionally, the accuracy of the model could be increased by researching transfer learning and multi-task learning techniques, which could also enable the prediction of other aspects of cancer diagnosis and prognosis. Furthermore, the development of standardized evaluation metrics and benchmarks for medical image analysis tasks could facilitate the comparison and validation of different models and techniques. Finally, the application of these techniques to larger and more diverse datasets could enhance their effectiveness and impact on the early detection and treatment of colorectal cancer.

[10] recommend a technique based on a Generative Adversarial Network (GAN) for segmenting tumours in breast ultrasound images. The discriminator is a CNN classifier, while the segmentation and generation modules are implemented using Residual-Dilated-Attention-Gate-NET (RDAU-NET). The total accuracy, PR-AUC, ROC-AUC, and F1 score achieved were 0.98, 0.95, 0.89, and 0.88, respectively, in comparison with the majority of conventional deep network models. The outcomes also indicate how the WGAN-RDA-UNET model can address the shortcomings of RDA U-Net, convolutional neural networks, and other models. [11] proposes a semi-supervised multichannel-based generative adversarial network (MGAN) to grade diabetic retinopathy (DR). By reducing the reliance on labelled data, the proposed semi-supervised MGAN can recognise subtle damage features in high-resolution fundus images. Experimental findings on the open Messidor data set demonstrate the effectiveness of the proposed model in grading DR. [15] proposed a GAN based on NU-Net. The primary architecture for this approach is cGAN, with NU-Net as the generator network and an FCN as the discriminator. On the MM-WHS 2017 challenge dataset, their experiment on automatic whole-heart segmentation outperformed most deep learning techniques with a Dice score of 0.899. [17] describe the SinGAN-Seg pipeline, which produces synthetic medical images with corresponding masks from a single training image. The researchers demonstrated that models trained on synthetic data can outperform those trained on real data, provided that both datasets comprise a substantial amount of training data. This was shown by training UNet++ on a combination of real and synthetic data generated through the SinGAN-Seg pipeline. They also demonstrate, however, that when training datasets lack a significant amount of data, synthetic data produced by the SinGAN-Seg pipeline enhances segmentation model performance.

EAI Endorsed Transactions on Pervasive Health and Technology | Volume 10 | 2024
The discriminator network is a CNN-based classifier that takes real images from the training dataset or generated images from the generator network and tries to determine whether they are real or fake.

Figure 3. Sample images - DCGAN

Figure 8 shows the discriminator (D) and generator (G) losses of DCGAN and VAE-GAN when generating high-quality synthetic images and improving the neural network model. On this basis, DCGAN outperforms VAE-GAN, showing minimal discriminator loss.

Figure 8. DCGAN vs VAE-GAN - Loss Comparison

The study aims to compare the effectiveness of two GAN models, DC-GAN and VAE-GAN, in generating synthetic images for training a neural network model for cancer prediction. The findings of this study can help to improve medical image analysis models and ultimately contribute to the early detection and treatment of colorectal cancer. Colorectal cancer is one of the world's major causes of cancer death, and early detection is critical for effective treatment and improved outcomes. Despite advances in diagnostic technologies, manual examination of medical imaging data by medical professionals can still be time-consuming and prone to human error, resulting in missed diagnoses or false positives. The task is to create a machine learning model that accurately diagnoses and classifies colorectal cancer using a set of medical imaging data, such as CT scans, colonoscopy images, or MRI scans. To make accurate predictions, the model should take into account both the visual appearance and the structural information of the images. The model should also be capable of identifying and highlighting regions of the images that are indicative of CRC, as well as providing a quantitative measure of the degree of malignancy for each region.
Scope
• Create a GAN-based deep learning model to diagnose colorectal cancer from medical image datasets such as EBHI-seg.
• To improve accuracy and robustness, train the GAN model on a large, annotated dataset of medical images.