White Blood Cells Classification using CNN

Leukaemia is a cancer that arises when a patient's bone marrow and lymph nodes produce an excessive number of white blood cells. Because white blood cells are the body's primary source of immunity, it is imperative to determine as early as possible which type of leukocyte is involved; postponing diagnosis can lead to a more serious condition. To diagnose leukaemia, haematologists typically examine cell samples under a light microscope and classify them by the features of the cell cytoplasm or nucleus.


Introduction
Leukaemia is a blood cancer involving white blood cells (WBCs). Because different WBC types give rise to different forms of leukaemia, a proper diagnosis must be made for each individual [1]. In most circumstances, choosing the wrong course of action can have catastrophic consequences for the patient, which is why WBC identification is so important. Leukaemia is a bone marrow disease caused by the aberrant production of leukocytes [2]. As these cells build up, the bone marrow is unable to produce other normal cells, which can cause the patient to develop further conditions such as anaemia, recurring infections, and internal bleeding.
Over time, the abnormal WBCs divide and spread throughout the body, possibly resulting in the development of a tumour [3,4] or even death by impairing the function of other organs. In acute leukaemia, the aberrant white blood cells replicate rapidly and cannot perform their normal functions, which exacerbates the illness. Chronic leukaemia comes in various forms: in some cases more cells are produced than necessary and in others fewer, which delays the onset of early symptoms, so the disease may go undetected until it becomes fatal [5]. Lymphocytic leukaemia affects lymphoid cells, and myelogenous leukaemia affects myeloid cells. Combining the acute/chronic and lymphocytic/myelogenous distinctions gives the four primary forms of leukaemia; other forms include myelodysplastic syndromes, myeloproliferative diseases, and hairy cell leukaemia [6].
In this paper we focus primarily on the four types, using an image classifier to identify which WBC caused the leukaemia, as depicted in Figure 1. Acute myeloid leukaemia (AML) is a prevalent form of acute leukaemia that primarily affects adults in the USA, with an average age of about 67. In 2010 [7], 12,330 people were diagnosed with AML in the USA [8], and 8,950 deaths were recorded. The likelihood of surviving AML decreases with age because the standard treatments are less well tolerated [9]. Treatment depends on the type of leukaemia, and its initial steps begin when the blood count is found to be abnormal. Haematologists use a light microscope to view these anomalies [10,11]. Using a CNN model, we obtained good results in identifying the type of leukaemia.
White blood cells can be categorised using deep learning models, including convolutional neural networks (CNNs), based on their morphological and biochemical properties [12]. After being trained on large datasets of labelled images, these models can accurately detect several types of white blood cells and their properties, such as size, shape, and colour intensity [13].
Furthermore, even with limited training data, transfer learning, which makes use of pretrained models for feature extraction, has been shown to increase the classification accuracy of white blood cells [14]. This may mitigate the issues caused by imbalanced datasets and the underrepresentation of particular types of white blood cells.
Moreover, to improve classification accuracy, key characteristics of the images can be identified using feature extraction methods such as deep learning [15,16]. This information can be used to identify and classify infections and diseases affecting white blood cells.
Deep learning methods for white blood cell analysis can help diagnose and classify a variety of diseases and infections that impact white blood cells, such as lymphoma, autoimmune disorders, and leukaemia [17].
They may also provide insight into the operation of the immune system. The results of these models can be used to develop novel remedies and therapies for a variety of immune system-related illnesses.

Literature Review
Based on current research, the majority of approaches use transfer learning, in which a pretrained deep learning model well suited to recognising WBCs is applied in order to diagnose the disease early. Among the available publications that cover WBC classification specifically is the work of Ahsan et al. Their primary goal was to achieve high accuracy; however, three flaws in their methodology were observed. First, their approach is limited to classification models alone in order to produce superior outcomes. Second, they rely on the VGG-16 deep learning model for transfer learning without identifying which pretrained models give the best results. Third, their models are not sufficiently interpretable, so it may be difficult for medical professionals to trust them during mass screening.
Previous works that employ techniques such as microscopy maintain high precision, but they suffer from problems such as subpar performance and protracted cell culture, which make the method less effective. Cytometry is another standard procedure for the quantitative analysis of blood samples; however, because blood samples can degrade unpredictably, a retrospective study of WBCs becomes difficult. Classical machine learning models offer simple, dependable, and robust operation, but they cannot be applied to very large datasets, so the high accuracy they report may not be sustained.
Furthermore, because most deep learning models offer built-in pipeline automation, translation invariance, weight sharing, and end-to-end training, they perform well as image classifiers. The precision of a deep learning model nevertheless remains uncertain, since standard datasets need to be curated before the models can be used. The difficulty of using DL to categorise different types of leukocytes has recently drawn a lot of attention to WBC classification. The three approaches that we primarily focused on in this paper were therefore CNN, ANN, and ensemble. All of these techniques (NNN and RNN) obtained generally good accuracy.
Hybrid models are also used to categorise leukocytes. The CNN technique achieved an accuracy of 91 percent, the best among these approaches, while the ANN from Table-0 achieved an accuracy of 63 percent. An ensemble of CNN and ANN is one approach to achieving higher performance. These accuracy results were achieved for the 12,000 image entries in the BCCD dataset. We primarily used an outlier-based approach to crop the dataset images to the necessary portion, as illustrated in Figure 2.

Dataset Description
About 1200 images of white blood cells across 4 classes were gathered from the Kaggle website. These images, which depict a wide variety of white blood cell types, were taken from several sources. The dataset provides a comprehensive representation of the morphological and phenotypic variation in white blood cells for researchers and practitioners in medical imaging and blood cell analysis. Because each image has been individually labelled and categorised, the data is of high quality. The availability of this dataset on Kaggle makes a broad and diverse set of white blood cell images easy to obtain, which makes it a good resource for algorithm development and testing.
As seen in Figure 4, every image in the collection contains a blue-highlighted WBC along with a small amount of blue-coloured noise. To reduce this noise, we used an outlier-based technique. An outlier is a data point that deviates noticeably from the other values in the same dataset. Outliers can result from measurement errors, data entry errors, or genuine observations that are merely uncommon or unexpected. If handled improperly, they can significantly affect the analysis of a dataset and produce false findings.
In our case, the blue-coloured noise that lies far from the cell has to be removed. We first collect the x and y coordinates of all pixels in the image that are blue. The outlier technique then treats these coordinates as data points, separates the closely packed pixels from those lying far away from them, and eliminates the distant ones. After removing the noise, we cropped each image so that only the blue-highlighted region is passed on, which may have improved accuracy. The crop is defined by the intersection points of the lines drawn at the maximum and minimum x and y values of the pixels in the highlighted blue region. We lengthened these lines so that the final image is easier to read. The final image is shown in Figure 5.
Model assessment: Model assessment evaluates the trained model's performance on a separate dataset, and the model's parameters are adjusted accordingly to improve accuracy and productivity.
Testing the model: Testing the model on a separate test set determines its generalisability and suitability for new data.
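The noise removal and cropping procedure described in this section can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the blue-pixel rule (`is_blue`), the median-absolute-deviation outlier test (`mad_inliers`), the threshold `k`, and the padding `pad` are all assumptions, since the text does not specify them.

```python
from statistics import median

def is_blue(pixel, margin=40):
    """Assumed rule: a pixel is 'blue' when B exceeds R and G by a margin."""
    r, g, b = pixel
    return b - max(r, g) >= margin

def mad_inliers(values, k=3.0):
    """Flag each value as inlier (True) using median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # avoid a zero scale
    return [abs(v - med) <= k * mad for v in values]

def crop_to_blue_region(image, pad=2):
    """Crop `image` (rows of (R, G, B) tuples) to the dense blue region."""
    coords = [(x, y) for y, row in enumerate(image)
              for x, px in enumerate(row) if is_blue(px)]
    if not coords:
        return image  # nothing highlighted; leave the image unchanged
    keep_x = mad_inliers([x for x, _ in coords])
    keep_y = mad_inliers([y for _, y in coords])
    # Keep only blue pixels that are inliers along both axes.
    dense = [c for c, kx, ky in zip(coords, keep_x, keep_y) if kx and ky]
    xs = [x for x, _ in dense]
    ys = [y for _, y in dense]
    height, width = len(image), len(image[0])
    # Bounding box from the min/max coordinates, widened by `pad` pixels.
    left, right = max(min(xs) - pad, 0), min(max(xs) + pad, width - 1)
    top, bottom = max(min(ys) - pad, 0), min(max(ys) + pad, height - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

Treating the x and y coordinates independently keeps the sketch short; a density-based method over (x, y) pairs would be a reasonable alternative when the noise lies diagonally from the cell.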

Model Building
Visualisation and interpretation: To gain insight into the model's prediction process, analyse the model's output and visualize the feature maps.

Proposed Classification methods and techniques:
KNN
K-nearest neighbours (KNN) is a popular non-parametric machine learning method for classification. During training, the KNN algorithm stores every feature vector and its class label in a database. To classify a new, unlabelled white blood cell image, features are extracted to represent the image in the feature space, and the distance between its feature vector and each feature vector in the database is measured. The algorithm then selects the K nearest neighbours and assigns the new image the class label on which most of those neighbours agree.
The KNN model is fed a 32x32x3 RGB image, and the output is the class whose training samples are closest to the test features. In this instance KNN uses raw pixel values as features, so each image has 32 x 32 x 3 = 3,072 features in total.
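The KNN procedure on raw pixel features can be illustrated with a short sketch. The function names `flatten` and `knn_predict`, the Euclidean distance, and the default `k=3` are assumptions made for illustration; the text does not state which distance metric or value of K was used.

```python
from collections import Counter
from math import dist

def flatten(image):
    """Turn rows of (R, G, B) tuples into one flat feature vector."""
    return [channel for row in image for pixel in row for channel in pixel]

def knn_predict(train_vectors, train_labels, query, k=3):
    """Majority vote among the k training vectors nearest to `query`."""
    # Rank training samples by Euclidean distance to the query vector.
    order = sorted(range(len(train_vectors)),
                   key=lambda i: dist(train_vectors[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Flattening a 32x32 RGB image with `flatten` yields a vector of 3,072 values, matching the feature count given above.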

SVM
The Support Vector Machine (SVM), a supervised machine learning model, is widely used to solve classification problems. In this study, SVM is used to classify white blood cells based on their digital images.
To remove any noise or artefacts, the leukocyte images are first preprocessed. Features are then extracted from the images using techniques including morphological operations, texture analysis, and wavelet transform. These features are then used to train the SVM.
In this study we supply a 32x32x3 RGB image to the SVM pipeline, which extracts features such as texture features, edge histograms, and histograms. Each image is represented by these features in a high-dimensional feature space. The SVM model outputs a distance vector, which indicates the class to which the sample lies nearest.
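As one example of the histogram features mentioned above, a per-channel intensity histogram can be computed as sketched below. The choice of a colour histogram, the function name `colour_histogram`, and the bin count of 8 are assumptions for illustration; the text does not specify which histograms were used or how they were binned.

```python
def colour_histogram(image, bins=8):
    """Per-channel intensity histogram, normalised to sum to 1 per channel.

    `image` is a list of rows of (R, G, B) tuples with values in 0..255.
    Returns a flat list of 3 * bins features (R bins, then G, then B).
    """
    counts = [[0] * bins for _ in range(3)]
    n_pixels = 0
    for row in image:
        for pixel in row:
            n_pixels += 1
            for channel, value in enumerate(pixel):
                # Map an intensity in 0..255 to one of `bins` equal ranges.
                counts[channel][min(value * bins // 256, bins - 1)] += 1
    return [c / n_pixels for channel in counts for c in channel]
```

The resulting fixed-length vector could then be passed to any SVM implementation in place of, or alongside, the raw pixel values.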

ANN
Artificial neural networks (ANNs), a type of machine learning technique, are modelled after the structure and function of biological neurons. ANNs consist of many interconnected layers of nodes, where each node applies a mathematical operation to its input and passes the result to the next layer.
For leukocyte classification, ANNs can be used to classify white blood cells according to their digital images. The white blood cell images are first preprocessed to remove any noise or artefacts. Features are then extracted from the images using techniques including morphological operations, texture analysis, and wavelet transform. These features are subsequently used to train the ANN.
A 32x32x3 RGB image is supplied as the input to the ANN model, and the output is the class to which the test features are closest. Given the 3,072-element feature vector, the ANN model produces a probability for each of the four pre-defined classes.
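The way a network turns a feature vector into one probability per class can be sketched with a single fully connected layer followed by a softmax. This is an illustrative assumption about the output stage, not the architecture used in the study; the helper names and the class names in `CLASSES` (taken from the BCCD dataset's four categories) are likewise assumptions.

```python
from math import exp

# Assumed class names, following the four BCCD categories.
CLASSES = ["eosinophil", "lymphocyte", "monocyte", "neutrophil"]

def dense(inputs, weights, biases):
    """One fully connected layer: weights[j] is the weight row for output j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(inputs, weights, biases):
    """Return (predicted class name, class probabilities)."""
    probs = softmax(dense(inputs, weights, biases))
    return CLASSES[probs.index(max(probs))], probs
```

In a real model, `inputs` would be the 3,072 pixel values (or the output of a hidden layer) and the weights would be learned during training.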

CNN
The CNN operating principle is based on extracting information from the input image using convolutional layers, which allows the network to learn a hierarchy of increasingly complex features. A CNN typically contains several kinds of layers, including convolutional, pooling, and fully connected layers. Convolutional layers are crucial for extracting features from an input image: each convolutional layer applies a number of filters, also referred to as kernels or weights, to its input. As a filter slides across the full image, a dot product is computed between the filter weights and the image patch beneath it. This produces a feature map, which encodes the presence of particular attributes in the image. The filters are typically small, and their weights are learned during training.
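The sliding dot product that produces a feature map can be sketched in a few lines. This is a minimal single-channel illustration with no padding and a stride of 1 (both assumptions); as in most CNN frameworks, the kernel is applied without flipping.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over a single-channel `image` (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it.
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map
```

For example, a kernel such as `[[-1, 1], [-1, 1]]` responds only where intensity increases from left to right, so its feature map marks vertical edges.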
In this work, we employ a CNN to extract features from images of white blood cells. The first convolution layer has an input dimension of 32x32x3 (RGB image) and is composed of 32 filters, or kernels, with a ReLU activation function. Next, a 2x2 max pooling layer is applied. The second convolution layer has 64 filters (or kernels) of size 3x3 with a ReLU activation function and is followed by another 2x2 max pooling layer. After the tensor is flattened into an n x 1 vector, a dense layer of size 64 is applied. Finally, a dense layer of size four is applied in order to identify the four distinct types of leukocytes.
Accuracy: Accuracy is the degree of agreement between the predicted and the real values; it shows the extent to which the results match the real value. We determine accuracy as the number of correctly classified samples divided by the total number of samples.
Recall: Recall is the number of positive samples correctly identified as positive divided by the total number of positive samples. It quantifies the model's capacity to identify positive samples: the more positive samples are detected, the higher the recall.
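The CNN layer stack described in this section can be traced to see how the tensor shape and parameter count evolve. The sketch below assumes a 3x3 kernel for the first convolution layer (the text gives the size only for the second), unpadded convolutions, and a stride of 1; the function names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def trace_cnn(side=32, channels=3, kernel=3):
    """Return (flattened length n, total trainable parameters)."""
    params = 0
    for filters in (32, 64):
        # Convolution weights plus one bias per filter.
        params += kernel * kernel * channels * filters + filters
        side = conv_out(side, kernel)  # convolution shrinks the map
        side = side // 2               # 2x2 max pooling halves it
        channels = filters
    flat = side * side * channels      # length n after flattening
    params += flat * 64 + 64           # dense layer of size 64
    params += 64 * 4 + 4               # output layer for 4 classes
    return flat, params
```

Under these assumptions the spatial size goes 32 to 30 to 15 to 13 to 6, so the flattened vector has n = 6 x 6 x 64 = 2,304 elements.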

Results and Discussions
Overall, CNN was found to be an effective method for classifying white blood cells, outperforming the other classification algorithms in accuracy and performance. With additional layers and nodes, CNN models can be more complex than SVM models and require greater processing power for both training and inference. They can, however, also offer more flexibility and power when modelling intricate relationships between the input features and the output class labels.
The results of a machine learning image classification task can be evaluated using a variety of measures, such as accuracy, precision, recall, and the confusion matrix. To classify leukocytes, we employed the SVM, CNN, KNN, and ANN models. We obtained 67% accuracy with ANN, 93% with CNN, 82% with KNN, and 87% with SVM. Precision and recall were higher for the CNN, KNN, and SVM models than for ANN. The high accuracy of CNN and SVM shows that these models were successful in classifying blood cells into their respective classes. Misclassifications between similar classes, such as eosinophils and neutrophils, highlight the need for further data to differentiate between them. According to the study's findings, the most accurate and effective way to count and classify WBCs may be a combined method using CNN and SVM. The findings suggest that the model might be used efficiently in a real-world setting to determine which white blood cell is causing the problem. CNN has the highest accuracy (93%) of all the techniques used.
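The evaluation measures named above can all be derived from a confusion matrix. The sketch below assumes the convention that rows are true classes and columns are predicted classes; the function name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(confusion):
    """Compute overall accuracy and per-class (precision, recall).

    confusion[i][j] = number of class-i samples predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Correct predictions lie on the diagonal.
    accuracy = sum(confusion[i][i] for i in range(n)) / total
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        actual = sum(confusion[c])                          # TP + FN
        predicted = sum(confusion[r][c] for r in range(n))  # TP + FP
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        metrics.append((precision, recall))
    return accuracy, metrics
```

Reporting precision and recall per class, rather than accuracy alone, makes confusions between similar classes such as eosinophils and neutrophils visible.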

Conclusion
The classification of leukocytes using machine learning and deep learning methods is an active field of study with the potential to enhance patient outcomes and diagnostic precision. In this study, we presented a method for classifying leukocytes that combines feature extraction methods with classification models.
We investigated the efficacy of several deep learning and machine learning algorithms, namely SVM, ANN, KNN, and CNN, for classifying leukocytes. Our findings demonstrated the accuracy and performance of these algorithms, with CNN being the most successful strategy overall. Our study also illustrated how crucial feature extraction is for classifying white blood cells. The accuracy of the classification models may be increased by using feature extraction techniques, including wavelet transform, texture analysis, and morphological operations, to identify key properties of the white blood cells.
Overall, our research adds to the body of knowledge on the classification of white blood cells and shows the use of deep learning and machine learning techniques in this area. Our suggested strategy might be used in clinical settings to improve patient care and increase the classification accuracy of leukocytes. Additional investigation is required to examine the robustness and generalisability of our approach and to confirm its efficacy in clinical settings.

Further Scope:
The classification of leukocytes using machine learning methods is an interesting field of study with the potential to enhance patient outcomes and diagnostic accuracy. There is a lot of room for future study and effort in this area, because machine learning and deep learning algorithms have considerable scope to improve analysis in haematology.

Fig 1: The four types of white blood cells, with their names, from the BCCD dataset and the WBC dataset.

Fig 2: Left: original image of an eosinophil cell from the BCCD dataset. Right: the same eosinophil cell cropped using the outlier concept.

Fig 4: Blue-coloured noise in the images of the dataset

Fig 5: Image in the dataset before and after data processing

Selecting a model architecture: For the classification of white blood cells, select a suitable model architecture, such as Support Vector Machines (SVMs), Convolutional Neural Networks (CNNs), or Artificial Neural Networks (ANNs). The model should be able to extract the relevant information from the images and classify them efficiently.
Preprocessing the data: Preprocessing includes enhancing the features, balancing the colour and contrast, and de-noising the images. This step may involve morphological operations, segmentation, filtering, and other image processing techniques.
Data augmentation: Data augmentation is the process of producing new images from existing ones using transformations such as rotation, scaling, and flipping. The goal is to increase the size of the dataset and improve the model's performance.
Training the model: Once the dataset has been divided into training, validation, and testing sets, an appropriate optimisation technique, such as Stochastic Gradient Descent (SGD), is used to train the model on the training set. The model is optimised to lower the loss function while avoiding overfitting on the training set.
Model assessment: Model assessment evaluates the performance of the trained model on the validation set using several evaluation metrics, such as accuracy, precision, recall, and F1-score. Performance can be further improved by tuning the model's hyperparameters, which include the learning rate, batch size, and number of iterations (epochs).
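The data augmentation and dataset split steps described above can be sketched as follows. The split fractions, the fixed seed, and the function names are assumptions for illustration; the text does not give the proportions that were used.

```python
import random

def flip_horizontal(image):
    """Mirror each row of the image left to right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def split_dataset(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle, then split into training, validation, and test lists."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Applying the augmentations only to the training list after splitting is a common precaution, since transformed copies of one image appearing in both training and test sets would inflate the measured accuracy.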

Fig 10: ANN confusion matrix

Fig 11:

For the classification of leukocytes, future research can study additional feature extraction methods, such as deep feature learning and transfer learning, as well as other deep learning architectures, including recurrent neural networks and attention-based models, and experiment with multiple network designs and hyperparameters to increase the performance of the model. Incorporating extra data, such as images taken under various lighting conditions, can improve the model's robustness. User research could also be conducted to evaluate the model's usability in a real-world context and to collect input for future enhancements.

Table 2: Accuracy Table