PCA-DNN: A Novel Deep Neural Network Oriented System for Breast Cancer Classification

INTRODUCTION: The number of women diagnosed with breast cancer has risen rapidly worldwide in recent years, and this trend is expected to continue. After lung cancer, it is the second most common type of cancer worldwide, and many women are diagnosed with it during their lifetime. Accurate breast cancer classification has become a challenging task in the healthcare sector. Breast cancer is a malignant tumor that forms in breast tissue because of abnormal cell proliferation inside the breast. Early detection of breast cancer can reduce the death rate. OBJECTIVES: This article proposes a principal component analysis deep neural network (PCA-DNN) for breast cancer classification. METHODS: PCA-DNN is developed by feeding features extracted through principal component analysis (PCA) into a deep neural network (DNN). In addition to PCA-DNN, a conventional DNN and machine learning classifiers, including support vector machine (SVM), naive Bayes (NB), random forest (RF), and adaptive boosting (AdaBoost), are used to perform classification. Experiments are conducted on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset available from the University of California, Irvine (UCI) repository. RESULTS: PCA-DNN achieved an accuracy of 98.83% and a loss of 10.36%. The area under the receiver operating characteristic curve (AUROC) was 99.3%. CONCLUSION: PCA-DNN outperformed the conventional DNN and traditional machine learning classifiers. Compared with the conventional DNN, it offered a relative accuracy improvement of 3.68% and a relative loss reduction of 29.37%.


Introduction
Thousands of women die each year from breast cancer (BC), one of the most common causes of mortality in women. BC is the second most frequent type of cancer worldwide, and it accounts for more than 23% of all female cancers worldwide [3]. According to a US report, 3.8 million women are living with breast cancer [4].
Breast cancer takes various forms depending on where the abnormal cells arise and how they spread. Ductal carcinoma in situ (DCIS) is a non-invasive cancer in which abnormal cells are confined to the lining of a milk duct and have not spread into the surrounding breast tissue [5]. Invasive ductal carcinoma (IDC) [6] occurs when abnormal cells spread from the duct into the surrounding breast tissue, and it can also occur in men [7,8]. Mixed tumor breast cancer (MTBC) arises from abnormal ductal and lobular cells. Lobular breast cancer (LBC) arises within the lobules. Mucinous breast cancer (MBC) is a form of invasive ductal cancer in which abnormal cells spread into the tissue around the duct [9]. The last type is inflammatory breast cancer (IBC), which causes reddening and swelling of the breast. It is a rapidly growing cancer in which cancer cells block the lymph vessels in the skin of the breast [10].
The exact cause of breast cancer is unknown, and the best treatment depends on when the cancer is diagnosed. A patient's chance of survival improves when the disease is detected early. As a result, tumor diagnosis has become a critical and urgent issue in the medical field [11][12][13]. Breast cancer is caused by uncontrolled cell proliferation. During its life cycle, a normal cell grows in size, divides into new cells, and dies at the appropriate time. Cancerous cells, on the other hand, behave differently from normal cells. Normal cells can become cancerous because of mutations in their DNA. Some genes regulate normal cell functions, such as cell growth, cell division, and repair or death at the appropriate time. A proto-oncogene is a type of gene that regulates cell proliferation. It becomes a "bad" gene, called an oncogene, when it mutates or when too many copies of it exist. Tumor suppressor genes, in turn, slow down cell division. When such genes do not operate properly, uncontrolled cell growth occurs, which can lead to cancer [14]. Certain DNA mutations also increase breast cancer risk, although the reason behind these mutations is unknown. Abnormal cells can clump together to form a tumor. Benign tumors are not cancerous, whereas malignant tumors are. Malignant tumors can migrate to other regions of the body, causing the cancer to spread [15]. Benign tumors also result from abnormal cell growth inside the breast; however, they do not spread beyond the breast and do not pose a threat to life. The types of tumors are shown in Figure 1.
The relevance and urgency of this study lie in the profound impact of breast cancer on public health globally. Developing accurate classification methods is vital to improving early detection and effective treatment. This research aims to address issues related to late-stage diagnosis, which frequently leads to difficult and less effective treatment. Early detection of BC is key to improved survival rates and better treatment outcomes. Numerous techniques have been developed to diagnose BC and reduce the number of fatalities from the disease, and many computer-aided approaches have been employed to improve diagnostic accuracy. However, accurately classifying tumors as benign or malignant remains challenging [16].

Figure 1. Malignant and Benign Tumors [15]
The research objectives of this paper are as follows:

Literature Survey
Abdel-Zaher and Eldeib [17] used a deep belief network (DBN) to classify breast cancer. The system was developed using a backpropagation neural network with the Levenberg-Marquardt learning function. The unsupervised DBN phase was followed by a supervised backpropagation phase, with the DBN phase used to initialize the weights. Asri et al. [18] classified breast cancer with four algorithms, SVM, NB, decision tree (DT), and k-nearest neighbors (KNN), on the Wisconsin datasets. The experiments were performed with the Weka machine learning tool, and SVM performed best, with 97.13% accuracy. Peng et al. [19] proposed a method based on artificial immunity and achieved 98% accuracy on the WDBC dataset. Computer immunology is inspired by the biological immune system. Obtaining labeled data is one of the main challenges for diagnosis systems based on supervised learning, and the proposed system reduced the need for labeled data. Nilashi et al. [20] developed a fuzzy logic-based system. Multicollinearity was addressed using PCA, and fuzzy rules were generated using the classification and regression tree (CART) algorithm. The system achieved 94.1% accuracy on the Mammographic Mass dataset and 93.2% on the WDBC dataset. Huang et al. [21] developed SVM ensembles for BC classification. The best features were selected using a genetic algorithm (GA), and the ensembles were built with bagging and boosting, using SVM classifiers with different kernels as members. The bagging-based ensemble performed best on a small dataset, whereas the boosting-based ensemble performed best on a large dataset. Dora et al. [22] proposed the Gauss-Newton representation-based algorithm (GNRBA), which uses a sparse representation of the training samples and finds their optimal weights with a Gauss-Newton-based method. It was implemented in MATLAB and achieved 98.48% accuracy. Aličković and Subasi [23] proposed a two-stage system.
In the first stage, unnecessary features were removed using GA, which selected 14 features. In the second stage, various classifiers were applied and the best one was selected; rotation forest was identified as the best classifier. Zhang et al. [27] proposed a hybrid approach combining the k-means and C5.0 algorithms. The data were clustered with k-means, and informative samples near the cluster boundaries were selected. This produced a balanced dataset, which was classified with the boosted C5.0 algorithm, achieving 98.2% accuracy. Dhahri et al. [28] proposed an automated system based on genetic programming, which identified the best features and the optimal parameter values for the classifiers. The performance of SVM, KNN, DT, NB, RF, AdaBoost, logistic regression (LR), gradient boosting (GB), and linear discriminant analysis (LDA) was evaluated.
AdaBoost achieved the maximum accuracy of 98.24%. Zhang and Chen [29] developed a hybrid model combining k-means, random over-sampling examples (ROSE), and SVM. The dataset was balanced using ROSE, and k-means was used to select samples near the cluster boundaries. Combining ROSE and k-means with SVM improved the performance of SVM. Salod [30] used the Breast Cancer Coimbra Dataset (BCCD), which contains 116 instances with ten features based on the routine tests of breast cancer patients. The performance of SVM, LR, DT, KNN, AdaBoost, RF, and GB was evaluated on both the full and the selected feature sets.
Correlation-based feature selection (CFS) was used to select the features. Kadam et al. [31] proposed a method based on softmax regression and sparse auto-encoders for classifying breast cancer. An auto-encoder is an artificial neural network consisting of an encoder and a decoder; in a sparse auto-encoder, sparsity constraints are applied to the hidden nodes. Khan et al. [32] classified BC images using deep learning and transfer learning. Features were extracted using VGGNet, GoogLeNet, and ResNet, and a total of 8000 images were used for training and testing. A maximum accuracy of 97.25% was obtained. Al Ghunaim et al. [33] compared machine learning algorithms on two types of big data. The algorithms were applied to individual and combined datasets, and SVM, DT, and RF were used to develop models using three datasets. The results showed that SVM on the Spark platform provided the best performance. Memon et al. [34] used SVM with recursive feature elimination to detect breast cancer and evaluated its performance with various kernels. SVM achieved 98% accuracy with the RBF kernel, 84% with the sigmoid kernel, and 97% with the polynomial kernel.
Abdar et al. [35] performed various experiments using SVM and ANN. The performance of SVM was evaluated with various hyperparameter values, and these hyperparameters were found to improve the performance of SVM. The CWV-BANNSVM model was proposed by combining boosted ANNs (BANN) with SVM using the confidence-weighted voting (CWV) method. The hyperparameters selected during the first experiment were used to develop CWV-BANNSVM, and the model was evaluated on a dataset with 669 records. Zheng et al. [36] divided images into MRI, ultrasound, and digital images and performed classification using CNN, autoencoders, and long short-term memory (LSTM). Their proposed deep learning-assisted efficient AdaBoost algorithm (DLA-EABS) provided 97.2% accuracy. Abdar et al. [37] proposed a nested ensemble mechanism based on voting and stacking. The nested ensemble consisted of classifiers and meta-classifiers, and each meta-classifier contained two or more classification algorithms.

EAI Endorsed Transactions on Pervasive Health and Technology 2023 | Volume 9
The proposed classifier outperformed other classifiers, achieving 98.07% accuracy. Supriya and Deepa [38] proposed an optimized artificial neural network (OANN) model. First, the data were preprocessed by replacing missing attributes (RMA) and normalization. Important features were selected using the modified dragonfly (MDF) algorithm. Classification was performed using OANN, which was optimized with the grey wolf optimization (GWO) algorithm, and the OANN model achieved an accuracy of more than 96%. Kumar et al. [39] predicted BC using twelve classifiers: decision table, AdaBoost, J48, JRip, lazy KStar, lazy IBk, LR, RF, NB, multilayer perceptron, multiclass classifier, and random tree. The experiments were performed on the Wisconsin dataset. All classifiers performed well, and most provided an accuracy above 94%; NB performed worst and lazy IBk best. Naji et al. [40] diagnosed breast cancer using SVM, KNN, NB, DT, and LR classifiers. The three best-performing classifiers, SVM, KNN, and LR, were used to build an ensemble model with a majority voting mechanism, which achieved an accuracy of 98.1%. Al-Azzam and Shatnawi [41] applied various semi-supervised and supervised learning methods to the WDBC dataset. LR, NB, SVM, DT, RF, gradient boosting, and extreme gradient boosting (XGBoost) classifiers were evaluated and compared using k-fold validation. The highest accuracy of 98% was obtained using the KNN classifier. The literature review shows that multiple systems have been proposed for detecting breast cancer. These systems have been developed using various techniques, including machine learning, deep learning, and fuzzy logic. Some systems use images for diagnosis, while others use clinical data from medical test results. Most systems combine machine learning algorithms with feature selection and feature extraction techniques. However, in existing systems, feature extraction methods have not been used with DNN. This study incorporates the PCA feature extraction approach into a DNN to address this research gap.

Dataset
This research used the WDBC dataset available from the UCI repository. The dataset has 569 instances, of which 357 are benign and 212 are malignant [42]. Its 32 attributes comprise an ID number, one class attribute, and 30 real-valued features. These features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe the properties of the cell nuclei. The class attribute has two possible values: malignant and benign.

Methods
The underlying techniques for PCA-DNN are described in this section.

Principal Component Analysis
The PCA-DNN approach combines the PCA feature extraction method with a traditional DNN. PCA transforms a dataset with n features into a new dataset with fewer features. In other words, it reduces the number of features by introducing a smaller set of new variables, the principal components, that capture most of the information (variance) in the original features. To project the data onto fewer dimensions, PCA identifies the eigenvectors of the covariance matrix with the highest eigenvalues.

Deep Neural Network
The basic architecture of a DNN is inspired by the structure of the human brain. A DNN consists of multiple interconnected computational units. Each unit, a perceptron, receives inputs and produces an output. The fundamental concept behind a neural network is that each input i is multiplied by a weight w, the weighted inputs are summed, and a bias b is added, as shown in Equation 1:
$$z = \sum_{k=1}^{n} w_k i_k + b \quad (1)$$
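As an illustration of Equation 1, the following minimal Python sketch computes the output of a single unit. The input values, weights, and bias are hypothetical, and ReLU and sigmoid are shown only as example activation functions applied to the weighted sum.

```python
import math

def neuron_output(inputs, weights, bias, activation):
    """Equation 1 followed by an activation: f(sum_k w_k * i_k + b)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * i for w, i in zip(weights, inputs)) + bias  # weighted sum plus bias
    return activation(z)

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, chosen only for illustration
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

print(neuron_output(x, w, b, relu))     # weighted sum is about -0.4, so ReLU gives 0.0
print(neuron_output(x, w, b, sigmoid))  # about 0.401
```

The same weighted sum feeds both activations; only the nonlinearity applied afterwards differs.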
Weights are typically initialized to values between -1 and 1. If all the weights are very small, training takes longer to reach a point where anything significant occurs. Conversely, large initial weights increase the risk of becoming stuck in a local optimum too early. The activation function applies a nonlinear transformation that determines whether, and how strongly, each node in the DNN is activated. The rectified linear unit (ReLU), sigmoid, and softmax functions are frequently used activation functions. A significant feature of DNNs is that the programmer does not need to specify all of the computational parameters; instead, a DNN is trained by exposing it to many examples and adjusting its internal parameters.
The performance of PCA-DNN was compared with that of a conventional DNN and four machine learning classifiers: NB, RF, SVM, and AdaBoost. NB is based on Bayes' theorem [43]. According to Bayes' theorem, P(H|I), the probability that hypothesis H is true for a sample I, can be computed as in Equation 2:

$$P(H \mid I) = \frac{P(I \mid H)\,P(H)}{P(I)} \quad (2)$$
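A minimal sketch of a naive Bayes decision built on Equation 2 is shown below. The class priors are taken from the WDBC class counts (357 benign, 212 malignant), while the per-feature likelihoods are hypothetical numbers; the "naive" assumption is that these per-feature likelihoods multiply to give P(I|H).

```python
import math

def naive_bayes_posteriors(priors, feature_likelihoods):
    """Posterior P(C|I) for each class via Bayes' theorem (Equation 2).

    Naive assumption: P(I|C) is the product of the per-feature likelihoods P(x_k|C).
    """
    joint = {c: priors[c] * math.prod(feature_likelihoods[c]) for c in priors}
    evidence = sum(joint.values())  # P(I), the predictor prior probability
    return {c: joint[c] / evidence for c in joint}

def naive_bayes_classify(priors, feature_likelihoods):
    """Assign the class with the largest posterior probability."""
    post = naive_bayes_posteriors(priors, feature_likelihoods)
    return max(post, key=post.get), post

priors = {"benign": 357 / 569, "malignant": 212 / 569}   # WDBC class proportions
# Hypothetical likelihoods of two observed feature values under each class
feature_likelihoods = {"benign": [0.5, 0.1], "malignant": [0.4, 0.5]}

label, post = naive_bayes_classify(priors, feature_likelihoods)
print(label, post)
```

With all 30 WDBC features the product of many small likelihoods can underflow, so practical implementations usually sum log-probabilities instead.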
In the hyperplane equation $H: w^{T}x + b = 0$, H denotes the hyperplane, x is a vector representing a point in the vector space, w is the weight vector normal to the hyperplane, and b is the bias, which displaces the hyperplane from the origin. SVM can classify both linearly and nonlinearly separable data. With nonlinear data, no straight line can separate the two classes, so the low-dimensional data are transformed into a higher-dimensional space in which they become linearly separable. SVM performs this transformation using a kernel function. The extreme points chosen by SVM to construct the hyperplane are known as support vectors. Although SVM has good overall performance, nontrivial parameters such as the kernel and the regularization parameter affect the performance of an SVM model [44]. RF makes predictions by combining the results of multiple decision trees. The Gini index is used to decide how branching is performed at the different nodes of each decision tree:
$$\mathrm{Gini} = 1 - \sum_{j=1}^{c} p_j^{2} \quad (5)$$
where c is the number of classes and $p_j$ is the relative frequency of class j in the dataset.
The final output is produced by combining the predictions of all trees using majority voting [45].
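The Gini index in Equation 5 and the majority vote that combines the trees can be illustrated with a short Python sketch. The labels ("B" for benign, "M" for malignant) and the candidate split are hypothetical; a full random forest would additionally bootstrap the samples and randomize the features considered at each split.

```python
from collections import Counter

def gini_index(labels):
    """Gini = 1 - sum_j p_j^2, where p_j is the fraction of samples of class j (Equation 5)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_split_gini(left, right):
    """Impurity of a candidate split: child Gini indices weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini_index(left) + len(right) / n * gini_index(right)

def majority_vote(tree_predictions):
    """Combine the predictions of the individual trees by majority voting."""
    return Counter(tree_predictions).most_common(1)[0][0]

node = ["B", "B", "M", "M"]
print(gini_index(node))                              # 0.5 (maximally mixed for two classes)
print(weighted_split_gini(["B", "B"], ["M", "M"]))   # 0.0 (a pure split)
print(majority_vote(["M", "B", "M"]))                # M
```

A tree-growing routine would evaluate `weighted_split_gini` for every candidate threshold and keep the split with the lowest value.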
AdaBoost uses boosting, which combines weak classifiers to improve their performance. In this approach, a classifier is first trained on the original dataset and then retrained over many iterations; in each iteration, misclassified samples receive higher weights, so that the new classifier focuses on correcting the mistakes made in the previous iteration [46].
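The boosting idea can be made concrete with a compact NumPy sketch of discrete AdaBoost that uses one-level decision trees (stumps) as weak classifiers. This is the textbook reweighting scheme, not necessarily the configuration used in the experiments; the toy data are hypothetical and the labels are encoded as -1 and +1.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Decision stump: +1 if polarity * (x_feature - threshold) >= 0, else -1."""
    return np.where(polarity * (X[:, feature] - threshold) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Return the stump (feature, threshold, polarity) with the lowest weighted error."""
    best_err, best = np.inf, None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (feature, threshold, polarity)
    return best_err, best

def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost with decision stumps; y must contain -1/+1 labels."""
    w = np.full(len(y), 1.0 / len(y))      # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, stump = fit_stump(X, y, w)
        if err >= 0.5:                      # weak learner no better than chance: stop
            break
        err = max(err, 1e-10)               # avoid division by zero for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, *stump)
        w = w * np.exp(-alpha * y * pred)   # misclassified samples gain weight
        w = w / w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(X, ensemble):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(X, *stump) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)

# Hypothetical 1-D data that no single stump can separate: + + - - + +
X = np.arange(1.0, 7.0).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])
ensemble = adaboost_fit(X, y, n_rounds=3)
print(adaboost_predict(X, ensemble))
```

Each stump alone misclassifies two of the six points, but after three reweighted rounds their weighted vote reproduces the full + + - - + + pattern.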

System Model
In PCA-DNN, principal components are extracted from the original dataset using the following steps:
- Consider the (d+1)-dimensional dataset and ignore the labels, obtaining a d-dimensional dataset.
- Calculate the covariance matrix of the complete dataset, where the covariance between two features A and B over n samples is:
$$\mathrm{Cov}(A, B) = \frac{1}{n-1}\sum_{j=1}^{n}\left(A_j - \bar{A}\right)\left(B_j - \bar{B}\right) \quad (6)$$
- Compute the eigenvalues $\lambda$ and the eigenvectors of the covariance matrix C by solving:
$$\det(C - \lambda I) = 0 \quad (7)$$
where I is the identity matrix.
- Sort the eigenvectors by their eigenvalues and select the K eigenvectors with the highest eigenvalues. These selected eigenvectors form the d×K matrix M1.
- Calculate the transpose, M2, of matrix M1:
$$M_2 = M_1^{T} \quad (8)$$
In the output layer, the sigmoid activation function is used.
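The PCA steps listed above can be sketched in NumPy as follows. The paper does not say whether the features were standardized before PCA; because the WDBC features differ widely in scale, dividing each centered column by its standard deviation is a common extra step, but this sketch only centers the data, as the listed steps describe. The random matrix is a hypothetical stand-in for the 30 WDBC features.

```python
import numpy as np

def pca_transform(X, k):
    """Follow the steps above: X is n x d (labels already removed); returns n x k scores."""
    Xc = X - X.mean(axis=0)                    # center each feature
    C = (Xc.T @ Xc) / (X.shape[0] - 1)         # d x d covariance matrix, Equation (6)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigen-decomposition of the symmetric C, Equation (7)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues from largest to smallest
    M1 = eigvecs[:, order[:k]]                 # d x K matrix of the top-K eigenvectors
    M2 = M1.T                                  # K x d transpose, Equation (8)
    Y = (M2 @ Xc.T).T                          # project every sample: n x K
    return Y, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(569, 30))                 # hypothetical stand-in for the 30 WDBC features
Y, eigvals = pca_transform(X, k=8)
print(Y.shape)                                 # (569, 8)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because a covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.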

Experimental Setup
The experiments were performed on a system with a seventh-generation Intel Core i3 processor and 8 GB of RAM. The programming language was Python 3.1, and the coding environment was Jupyter Notebook.

Experimental Parameters
The following metrics were used to assess performance:
Accuracy: It indicates the percentage of correct predictions.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (14)$$
False positives (FP) are the number of people without cancer who are mistakenly classified as cancer patients, and false negatives (FN) are the number of cancer patients who are mistakenly classified as people without cancer [47].

Results
In this study, three experiments were performed. First, machine learning algorithms were used to classify breast cancer. Second, a DNN was used for classification. Third, classification was performed using the proposed PCA-DNN. The proposed PCA-DNN was also trained on 70% of the data and tested on the remaining 30%. Training was run for 100 epochs with a batch size of 30. The changes in the accuracy and loss of the proposed PCA-DNN with an increasing number of epochs are shown in Figures 6 and 7.
The performance of the PCA-DNN on the training and testing data is given in Table 3.
PCA-DNN has the following limitations:
1. Loss of Interpretability: Combining PCA and DNN may reduce interpretability, making it difficult to understand why the model made certain predictions.
2. Data Preprocessing Overhead: PCA requires rigorous preprocessing, and the DNN requires significant tuning, which increases computational overhead and complexity.
PCA-DNN also outperformed the systems proposed in existing studies, as compared in Table 4 and Figure 12. These results suggest that doctors could use PCA-DNN to detect breast cancer efficiently.

Conclusion & Future Work
1. A literature review of the various classification schemes for breast cancer is conducted to identify the research gap.
2. PCA-DNN is proposed in this research as a strategy for diagnosing breast cancer by merging DNN and PCA.
3. The performance of PCA-DNN is compared with that of a conventional DNN and machine learning classifiers in terms of accuracy, specificity, sensitivity, precision, and F-measure.
4. The performance of PCA-DNN is compared with that of existing systems in recent literature that use fuzzy logic and other supervised and semi-supervised techniques.
The remaining sections of the paper are organized as follows: Section 2 describes related work by other researchers. Section 3 presents the materials and methods. Section 4 presents the results and discussion. Section 5 gives the conclusion and future scope.
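where P(H|I) is the posterior probability of hypothesis H given input I, P(I|H) is the likelihood, P(H) is the prior probability of the hypothesis, and P(I) is the predictor prior probability. An input I is classified to a class $C_n$ if and only if the probability of class $C_n$ conditioned on I is greater than that of every other class:
$$P(C_n \mid I) > P(C_m \mid I) \quad \text{for all } m \neq n \quad (3)$$
where $P(C_n \mid I)$ is the probability that input I belongs to class $C_n$, and $P(C_m \mid I)$ is the probability that input I belongs to class $C_m$. SVM classifies samples by constructing a hyperplane with all samples from one class on one side and all samples from the other class on the other side:
$$H: w^{T}x + b = 0 \quad (4)$$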
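The hyperplane rule in Equation 4 and the kernel idea for nonlinear data can be sketched as follows. The weight vector, bias, support vectors, and dual coefficients are hypothetical hand-picked values rather than the output of SVM training, and the RBF kernel is used only as one common example of a kernel function.

```python
import math

def linear_decision(w, x, b):
    """Value of w^T x + b for Equation (4); its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def linear_classify(w, x, b):
    return 1 if linear_decision(w, x, b) >= 0 else -1

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2): an implicit map to a higher-dimensional space."""
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))

def kernel_decision(support_vectors, coefs, b, x, kernel=rbf_kernel):
    """Kernel SVM decision value: sum_i coef_i * K(s_i, x) + b, with coef_i = alpha_i * y_i."""
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, coefs)) + b

# Linear case with a hypothetical hyperplane x1 - x2 = 0
print(linear_classify([1.0, -1.0], [2.0, 1.0], 0.0))   # 1
print(linear_classify([1.0, -1.0], [0.0, 3.0], 0.0))   # -1

# XOR-like points cannot be split by a straight line; hypothetical RBF coefficients can
svs = [[0, 0], [1, 1], [0, 1], [1, 0]]
coefs = [1.0, 1.0, -1.0, -1.0]
print(kernel_decision(svs, coefs, 0.0, [0, 0]) > 0)    # True
print(kernel_decision(svs, coefs, 0.0, [0, 1]) > 0)    # False
```

In a trained SVM, the coefficients and bias come from solving the margin-maximization problem, and only samples with nonzero coefficients (the support vectors) contribute to the sum.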

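- Transform the samples into the new K-dimensional subspace using M2, where X holds the centered d-dimensional samples as columns:
$$Y = M_2 X \quad (9)$$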
By following the above steps, eight components were extracted from the dataset and used as the DNN input. PCA-DNN has three layers: one input layer, one hidden layer, and one output layer. The input layer has eight nodes, the hidden layer fifty, and the output layer one. The PCA-DNN was trained for 100 epochs with a batch size of 30. The optimal number of layers and nodes was determined through experimentation. The ReLU activation function is used to introduce nonlinearity:
$$f(a) = \max(0, a) \quad (10)$$
where f(a) is the ReLU activation function applied to input a.
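A forward pass through the 8-50-1 architecture described above might look like the following NumPy sketch. The weight-initialization scheme (uniform in [-1, 1] scaled by 0.1) and the encoding of malignant as class 1 are assumptions made for illustration, and the input batch is hypothetical.

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)            # Equation (10)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))      # squashes the output into (0, 1)

def init_params(rng, n_in=8, n_hidden=50, n_out=1, scale=0.1):
    """Small random weights drawn from [-1, 1] and scaled (a hypothetical init choice)."""
    return {
        "W1": rng.uniform(-1.0, 1.0, size=(n_in, n_hidden)) * scale,
        "b1": np.zeros(n_hidden),
        "W2": rng.uniform(-1.0, 1.0, size=(n_hidden, n_out)) * scale,
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """8 principal components -> 50 ReLU units -> 1 sigmoid unit (probability of class 1)."""
    hidden = relu(X @ params["W1"] + params["b1"])
    return sigmoid(hidden @ params["W2"] + params["b2"])

rng = np.random.default_rng(0)
params = init_params(rng)
X_pc = rng.normal(size=(5, 8))           # hypothetical batch: 5 samples x 8 principal components
probs = forward(params, X_pc)
n_params = sum(p.size for p in params.values())
print(probs.shape, n_params)             # (5, 1) 501
```

The parameter count follows directly from the layer sizes: 8 x 50 + 50 weights and biases in the hidden layer plus 50 + 1 in the output layer.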

The sigmoid activation function is defined as:
$$f(a) = \frac{1}{1 + e^{-a}} \quad (11)$$
where a is a real number or a matrix of real numbers, and f(a) is the sigmoid activation function applied to input a. Because the problem is a binary classification task, the binary cross-entropy function is used to calculate the loss, which is computed as:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[C_i \log p(C_i) + (1 - C_i)\log\left(1 - p(C_i)\right)\right] \quad (13)$$
where N is the number of samples, $C_i$ is the class label of input i, $p(C_i)$ is the predicted probability that input i belongs to class 1, and $\log p(C_i)$ is the log probability of the class label of input i.
Architecture and Working
The flowchart of PCA-DNN is given in Figure 2, and the methodology of PCA-DNN is shown in Figure 3. In addition to PCA-DNN, a conventional DNN with the same number of layers as PCA-DNN is also studied. The conventional DNN was also trained for 100 epochs with a batch size of 30. Traditional machine learning classifiers, namely NB, SVM, RF, and AdaBoost, were also used for classification. The performance of PCA-DNN was compared with that of the conventional DNN and the traditional machine learning classifiers. Breast cancer was classified as malignant or benign. The pseudocode of PCA-DNN is given in Algorithm 1.
Algorithm 1. Pseudocode of PCA-DNN
Input: the dataset with the original set of features
F = original set of features
// Apply PCA on the features F to obtain the principal components
pc = PCA(F)
no_of_epochs = 100
Construct a neural network with one input layer, one hidden layer, and one output layer.
Divide the dataset into training and testing data.
i = 1
// Train the neural network for 100 epochs
while (i <= no_of_epochs) {
    Train the neural network with the training data using pc.
    i = i + 1
}
Test the neural network with the testing data.
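Algorithm 1 can be turned into a runnable end-to-end sketch using NumPy alone. Several details are assumptions not stated in the paper: the data are a synthetic stand-in for WDBC, training uses plain mini-batch gradient descent with a learning rate of 0.05, and the weights are initialized with a simple scaled uniform draw. One deliberate deviation from the ordering in Algorithm 1 is that PCA is fitted on the training split only and then applied to the test split, which keeps test information out of the principal components.

```python
import numpy as np

def pca_fit(X, k):
    """Return the feature means and the d x k matrix of top-k covariance eigenvectors."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
    return mean, eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def bce(p, y, eps=1e-12):
    """Binary cross-entropy, Equation (13), averaged over the samples."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def predict_proba(params, X):
    W1, b1, W2, b2 = params
    hidden = np.maximum(0.0, X @ W1 + b1)                        # ReLU hidden layer
    return (1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))).ravel()  # sigmoid output

def train(X, y, n_hidden=50, epochs=100, batch_size=30, lr=0.05, seed=0):
    """Mini-batch gradient descent on the mean BCE (optimizer and lr are assumptions)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.uniform(-1, 1, (d, n_hidden)) / np.sqrt(d)
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-1, 1, (n_hidden, 1)) / np.sqrt(n_hidden)
    b2 = np.zeros(1)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            xb, yb = X[batch], y[batch].reshape(-1, 1)
            z1 = xb @ W1 + b1
            hidden = np.maximum(0.0, z1)
            p = 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))
            dz2 = (p - yb) / len(batch)        # d(mean BCE)/d(output logit)
            dz1 = (dz2 @ W2.T) * (z1 > 0)      # backpropagate through ReLU
            W2 -= lr * hidden.T @ dz2
            b2 -= lr * dz2.sum(axis=0)
            W1 -= lr * xb.T @ dz1
            b1 -= lr * dz1.sum(axis=0)
    return W1, b1, W2, b2

# Hypothetical stand-in for WDBC: 357 benign (0) and 212 malignant (1) samples, 30 features
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (357, 30)), rng.normal(1.0, 1.0, (212, 30))])
y = np.concatenate([np.zeros(357), np.ones(212)])

perm = rng.permutation(len(y))
n_train = int(0.7 * len(y))                    # 70% training, 30% testing
tr, te = perm[:n_train], perm[n_train:]

mean, M1 = pca_fit(X[tr], k=8)                 # fitted on the training split only
pc_tr, pc_te = (X[tr] - mean) @ M1, (X[te] - mean) @ M1

params = train(pc_tr, y[tr])
p_te = predict_proba(params, pc_te)
test_acc = float(np.mean((p_te >= 0.5) == y[te]))
print(f"test accuracy = {test_acc:.4f}, test loss = {bce(p_te, y[te]):.4f}")
```

Replacing the synthetic arrays with the 30 WDBC features and their diagnosis labels (malignant encoded as 1) would be the natural next step; in practice, a framework such as Keras or PyTorch would usually replace the hand-written gradient updates.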

Figure 3. Methodology of the proposed PCA-DNN method for Breast Cancer Classification
Sensitivity: It indicates the percentage of actual positive cases that are correctly predicted.
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \quad (15)$$
Specificity: It indicates the percentage of actual negative cases that are correctly predicted.
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \quad (16)$$
Precision: It indicates the percentage of positive predictions that are correct.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (17)$$
F-Measure: It calculates the harmonic mean of sensitivity and precision.
$$F\text{-Measure} = \frac{2 \times \mathrm{Sensitivity} \times \mathrm{Precision}}{\mathrm{Sensitivity} + \mathrm{Precision}} \quad (18)$$
True positives (TP) denote the number of times the system correctly identifies cancer, and true negatives (TN) denote the cases where a person without cancer is correctly classified.
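Equations 14 to 18 can be computed directly from the four confusion-matrix counts, as in the short Python sketch below. The counts are hypothetical values for a 171-sample test set (30% of 569) and are not the paper's results; cancer is treated as the positive class.

```python
def classification_metrics(tp, tn, fp, fn):
    """Equations (14)-(18) from confusion-matrix counts, with cancer as the positive class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)     # share of actual cancer cases that are detected
    specificity = tn / (tn + fp)     # share of non-cancer cases correctly identified
    precision = tp / (tp + fp)       # share of positive predictions that are correct
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }

# Hypothetical counts for a 171-sample test set (illustrative, not the paper's results)
metrics = classification_metrics(tp=60, tn=105, fp=2, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F-measure simplifies to 2TP / (2TP + FP + FN), here 120 / 126. The sketch does not guard against zero denominators, which can occur when a class is never predicted.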

Performance Comparison
The proposed PCA-DNN performed better than the machine learning algorithms and the conventional DNN. The performance comparison among the proposed PCA-DNN, the machine learning classifiers, and the DNN is shown in Figure 8. The confusion matrices of the machine learning algorithms under comparison are shown in Figure 9, and those of the DNN and PCA-DNN are shown in Figure 10.

Figure 11. ROC curves of the machine learning and deep learning methods under study

Figure 12. Comparison of proposed PCA-DNN with existing systems
In this research, PCA-DNN is proposed for the classification of breast cancer. PCA-DNN integrates the PCA concept with a traditional DNN, which allows explicit feature extraction to be used with a DNN. In addition, breast cancer was also classified using a conventional DNN and machine learning classifiers. NB obtained an accuracy of 93.49%, SVM 88.93%, RF 95.78%, and AdaBoost 96.30%. The DNN obtained an accuracy of 95.32%, and the proposed PCA-DNN obtained the highest accuracy, 98.83%. The proposed solution achieved reliable results on both training and testing data, outperforming the conventional DNN and standard machine learning classifiers. Compared with the conventional DNN, it achieved a 3.68% relative improvement in accuracy and a 29.37% relative reduction in loss. The PCA-DNN model can therefore serve as a reliable tool for breast cancer diagnosis. This has important practical implications: higher diagnostic accuracy enables prompt intervention. The robust performance of the model is attributed to the explicit feature extraction made possible by combining PCA with DNN. PCA-DNN might be further improved by applying regularization to the DNN. The model's generalizability can be assessed by validating it across multiple datasets. Incorporating imaging data, such as mammograms, could further enrich the feature extraction process and possibly improve the model's performance in early cancer detection. Future research might focus on turning the proposed methodology into a tool for clinicians seeking a second opinion on a breast cancer diagnosis. Additional optimization techniques could also be used to improve the system's performance.
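The relative improvements over the conventional DNN can be reproduced from the reported figures. This worked calculation assumes that the 10.36% loss quoted in the abstract is the PCA-DNN test loss and that both comparisons use the DNN test-set values (95.32% accuracy and 14.67% loss):

```latex
\begin{aligned}
\text{Relative accuracy gain} &= \frac{98.83 - 95.32}{95.32} \times 100\% = \frac{3.51}{95.32} \times 100\% \approx 3.68\% \\
\text{Relative loss reduction} &= \frac{14.67 - 10.36}{14.67} \times 100\% = \frac{4.31}{14.67} \times 100\% \approx 29.38\%
\end{aligned}
```

In absolute terms, accuracy rises by 3.51 percentage points; the computed 29.38% differs from the reported 29.37% only in the last rounded digit.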

Table 1. Performance achieved by the machine learning algorithms under study in classifying breast cancer.
Standard machine learning algorithms, including NB, RF, SVM, and AdaBoost, were used for classification. Ten-fold cross-validation was used to validate and evaluate the performance of the classifiers. The performance achieved by the algorithms under comparison is shown in Table 1. AdaBoost obtained the best accuracy, 96.30%, and NB the worst, 93.49%. After classification with the machine learning classifiers, a conventional DNN was used to perform the classification. The hold-out validation procedure was used to assess the performance of DNN and PCA-DNN, with the dataset split into 70% training data and 30% testing data. The model was trained for 100 epochs with a batch size of 30, and the accuracy and loss of the DNN on the training and testing data were measured. The changes in the accuracy and loss of the DNN with an increasing number of epochs are shown in Figures 4 and 5. The performance of the DNN on the training and testing data is given in Table 2. The DNN achieved 94.97% accuracy on the training data and 95.32% on the testing data, with a loss of 17.63% on the training data and 14.67% on the testing data.
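The two validation procedures described above, ten-fold cross-validation for the classical classifiers and a 70/30 hold-out split for the DNN models, can be sketched in plain Python. The random seed is a hypothetical choice, and these splits are not stratified; scikit-learn's `StratifiedKFold` is one option when the 357/212 class ratio should be preserved in every fold.

```python
import random

def holdout_split(n, train_frac=0.7, seed=0):
    """Shuffle indices 0..n-1 and cut them into training and testing parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * n)
    return idx[:cut], idx[cut:]

def k_fold_splits(n, k=10, seed=0):
    """Yield (train, test) index lists; each index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # fold sizes differ by at most one
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

train_idx, test_idx = holdout_split(569)
print(len(train_idx), len(test_idx))                   # 398 171
print([len(test) for _, test in k_fold_splits(569)])   # nine folds of 57 and one of 56
```

Cross-validation reports the mean score over the ten test folds, so every sample is used for testing exactly once, whereas the hold-out split tests on a single fixed 30% subset.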

Table 2. Performance of the DNN on the Training and Testing Data

Table 3. Performance of the proposed PCA-DNN on the Training and Testing Data

Table 4. Comparison of the proposed PCA-DNN with existing related works