ODET: Optimized Deep ELM-based Transfer Learning for Breast Cancer Explainable Detection
Ziquan Zhu, Shui-Hua Wang

INTRODUCTION: Breast cancer is one of the most common malignant tumors in women, and its incidence rate is increasing year by year. Women in every country in the world may develop breast cancer at any age after puberty. The cause of breast cancer is not fully understood. At present, the main methods of breast cancer detection are inefficient. Researchers are trying to use computer technology to detect breast cancer, but there are still some limitations. METHODS: We propose a network (ODET) to detect breast cancer based on ultrasound images. We use ResNet50 as the backbone model and modify it by deep extreme learning machine (ELM)-based transfer learning. After these modifications, the network is named DET. However, DET still has some shortcomings because its parameters are randomly assigned and do not change during the experiment. We therefore select the bat algorithm (BA) to optimize DET. The optimized DET is named ODET. RESULTS: The proposed ODET achieves an F1-score (F1), precision (PRE), specificity (SPE), sensitivity (SEN), and accuracy (ACC) of 93.16%±1.12%, 93.28%±1.36%, 98.63%±0.31%, 93.96%±1.85%, and 97.84%±0.37%, respectively. CONCLUSION: These results indicate that the proposed ODET is an effective method for breast cancer detection.


Introduction
Breast cancer is one of the most common malignant tumors in women, and its incidence rate is increasing year by year. In Western Europe and North America, breast cancer ranks first in incidence among cancers in women. Women in every country in the world may develop breast cancer at any age after puberty. The cause of breast cancer is not fully understood. According to recent research, the following factors may be related to the incidence of breast cancer: 1. a family history of breast cancer; 2. early menarche or late menopause; 3. remaining unmarried at an older age or having no children after marriage; 4. no lactation after childbirth; 5. severe atypical hyperplasia of the breast; 6. abnormal ovarian function; 7. excessive exposure to ionizing radiation.
The main methods of breast cancer detection are physical examination, blood tests, and imaging tests. These methods are inefficient. During detection, doctors' judgments are easily affected by other factors, such as lack of sleep or insufficient professional knowledge.
With the rapid progress of artificial intelligence technology, researchers are trying to use computer technology to detect breast cancer. Gao, et al. [1] proposed a new model (SD-CNN) for the diagnosis of breast cancer, in which features were extracted through a convolutional neural network (CNN). The model obtained 90% accuracy and an AUC of 0.92. Wang, et al. [2] explored a CNN-based method. In their experiment, a CNN served as the feature extractor, and ELM was used as the classifier. Their CNN-GTD model achieved 86.50% accuracy, 85.10% sensitivity, 88.02% specificity, and an AUC of 0.923. Rouhi, et al. [3] presented two models for the classification of breast tumors. The first model used a pre-trained ANN for the segmentation, and in the second the segmentation was done by a CNN. They obtained 96.87% sensitivity, 95.94% specificity, and 96.47% accuracy. Zuluaga-Gomez, et al. [4] explored a method for breast cancer diagnosis based on thermal images. The hyper-parameters of a CNN model were fine-tuned by a tree estimator. This method obtained 0.92 accuracy and a 0.92 F1-score. Mahmood, et al. [5] presented a model to detect breast cancer. The model was based on Faster R-CNN and deep CNNs and obtained 0.841 recall, 0.858 F1-measure, and 0.876 precision on the ICPR 2012 dataset, and 0.583 recall, 0.691 F1-measure, and 0.848 precision on the ICPR 2014 dataset. Gupta and Chawla [6] combined CNN with two different machine learning methods, SVM and LR, to detect breast cancer. They concluded that CNN+LR performed better than CNN+SVM.
As the above studies show, many researchers have used CNN to detect breast cancer. Nevertheless, these methods have some disadvantages. With a CNN, a small data set may lead to overfitting problems, yet it is difficult to obtain a large data set for medical image detection. This paper proposes the optimized deep ELM-based transfer learning network (ODET) to detect breast cancer based on ultrasound images. The main innovations are summarized as follows:
• We propose a BA-guided optimized DET (ODET) model.
• We validate that ResNet50 is the best backbone model.
• This explainable ODET network is superior to six state-of-the-art models.
The rest of this paper is organized as follows. Section 2 describes the materials. The methodology is presented in Section 3. Experiment results are shown in Section 4. The conclusion is given in Section 5.

Materials
We use the breast ultrasound images data set for this paper. This data set is public and available on the Kaggle website (Breast Ultrasound Images Dataset | Kaggle). It was completed in 2018 and mainly contains breast ultrasound images of women aged 25 to 75. The data set consists of 780 images with a size of 500×500 pixels, saved in PNG format. The images are divided into normal, benign, and malignant categories. However, the data set is divided into two categories in this study: benign and malignant images are treated as unhealthy, and normal images as healthy. Some images of this public data set are given in Figure 1. See Table 1 and Table 2.

Methodology
Extracting effective features is of great significance for image classification [7]. Many researchers use computer technology to extract effective features from images [8]. With the continuous improvement of computer technology in image classification, many excellent computer models have been put forward [9], such as AlexNet [10], GoogleNet [11], VGG [12], DenseNet [13], MobileNet [14], and so on.
In this paper, the pre-trained model (ResNet50) [15] is selected as the backbone of our network. We make some modifications to the backbone model by deep ELM-based transfer learning. Firstly, we modify the pre-trained ResNet50 because there are only two output nodes in our experiment. Then, the last five layers of the modified ResNet50 are replaced by ELM [16], which is a single hidden layer feedforward network (SLFN). After these modifications, the network is named DET. However, DET still has some shortcomings because the parameters in DET are randomly assigned and do not change in the experiment. In this case, we select the bat algorithm (BA) [17] to optimize DET. The optimized DET is named ODET. The pseudocode of ODET is demonstrated in Table 3. The architecture of our proposed ODET is presented in Figure 2.
Figure 2. The architecture of our proposed ODET
Table 3. The pseudocode of ODET
Step 1: The pre-trained ResNet50 is selected as the backbone of our model.
Step 2: The data set is divided into five sets, and i is set to 1.
Step 2.1: The i-th set is the test set, and the remaining sets form the training set.
Step 3: Some modifications are made to the backbone model.
Step 3.3: The last five layers of the modified ResNet50 are substituted by ELM.
Step 3.4: The network is named DET after these modifications.
Step 4: The BA-guided Optimization process in the ODET.
Step 4.1: The BA is selected to optimize the DET.
Step 4.2: The optimized DET is named ODET.
Step 4.3: The ODET is trained on the training set.
Step 4.3.1: The training set is input.
Step 4.3.2: The target is the labels of the training set.
Step 4.4: The trained ODET is tested on the test set.
Step 4.5: The test classification performance of the ODET is reported.
Step 6: Average the test classification performance.
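The evaluation loop in Table 3 (five sets, each used once as the test set, results averaged) can be sketched as follows. This is only an illustration under assumptions: the shuffling, the function names, and the `train_and_test` callback are hypothetical and stand in for training and testing ODET, which is not reproduced here.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, train_and_test, k=5, seed=0):
    """Run k-fold cross-validation and return (mean, std) of the test scores.

    train_and_test(train_idx, test_idx) is a hypothetical callback that
    trains a model on train_idx and returns one test score for test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # The i-th fold is the test set; the remaining folds form the training set.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(train_and_test(train_idx, test_idx))
    return mean(scores), stdev(scores)


def dummy_train_and_test(train_idx, test_idx):
    """Placeholder for 'train ODET, then test it': returns the test-fold fraction."""
    return len(test_idx) / (len(train_idx) + len(test_idx))


avg, std = cross_validate(780, dummy_train_and_test)
print(f"{avg:.3f} +/- {std:.3f}")
```

Replacing `dummy_train_and_test` with a function that returns, for example, the test accuracy of a trained model would yield the mean and standard deviation over the five folds.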

Comparison of Backbone Models
The more layers in the CNN model, the more features can be extracted [18]. However, if the depth of the CNN model is continuously increased, vanishing or exploding gradients could occur [19]. Batch normalization (BN) is one solution to these problems [20]. Another problem is degradation: as the number of layers in the CNN model increases, the accuracy declines. He, et al. [15] proposed residual learning to deal with this problem. The framework of residual learning is given in Figure 3.

Figure 3. The framework of residual learning
The input is represented as $x$, the original target is denoted as $H(x)$, and the residual $F(x)$ is obtained through the residual function:
$$F(x) = H(x) - x. \tag{1}$$
The original target:
$$H(x) = F(x) + x.$$
There are some shortcut connections in the residual learning network [21]. These connections skip some layers and pass the original data directly to subsequent layers [22]. When the extreme case is encountered that the new layer does not learn any data, the new layer in the residual learning network can directly copy the original data. Therefore, the residual learning network can cope with the degradation problem [23]. In the experiment, we will compare ResNet50 with AlexNet, DenseNet, MobileNet, ResNet18, and VGG.
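The shortcut idea can be illustrated with a minimal NumPy sketch. It assumes two fully connected weight layers without biases and omits the activation that ResNet applies after the addition, so it is a simplified illustration rather than the actual ResNet50 block; the names `residual_block`, `w1`, and `w2` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, w1, w2):
    """Two weight layers compute the residual F(x); the shortcut adds x back."""
    f_x = relu(x @ w1) @ w2  # residual function F(x)
    return f_x + x           # original target H(x) = F(x) + x


x = np.array([[1.0, -2.0, 3.0]])
zero = np.zeros((3, 3))

# If the weight layers learn nothing (all-zero weights), F(x) = 0 and the
# block reduces to the identity mapping: the input is passed on unchanged.
print(np.array_equal(residual_block(x, zero, zero), x))
```

The all-zero case corresponds to the extreme case described above, in which the new layer simply copies the original data.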

Proposed Deep ELM-based Transfer Learning
We make some modifications to the backbone model by deep ELM-based transfer learning [24], as shown on the left side of Figure 4. Firstly, we modify the pre-trained ResNet50 because there are only two categories, and hence only two output nodes, in our experiment. Therefore, the FC1000, softmax, and classification layers of the pre-trained backbone model are substituted by six layers: FC128, ReLU, BN, FC2, softmax, and classification layer. We then replace the last five layers of the modified ResNet50 with ELM. After these modifications, the network is called DET. The structure of ELM is presented on the right side of Figure 4. There are only three layers in the ELM: the input, hidden, and output layers. The orange box is the input, the blue circles represent the hidden nodes in the hidden layer, and the yellow box represents the output.
There are only three training steps in ELM. Given a data set with $N$ samples $(x_i, t_i)$:
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{in})^{T} \in \mathbb{R}^{n}, \quad i = 1, \ldots, N, \tag{2}$$
$$t_i = (t_{i1}, t_{i2}, \ldots, t_{im})^{T} \in \mathbb{R}^{m}, \quad i = 1, \ldots, N, \tag{3}$$
where the input dimension and the output dimension are represented by $n$ and $m$, respectively. The output matrix of the hidden layer is computed as:
$$H = \left[ g(w_j \cdot x_i + b_j) \right]_{N \times L}, \quad i = 1, \ldots, N, \quad j = 1, \ldots, L, \tag{4}$$
where $L$ is the number of hidden nodes in the hidden layer, $g$ represents the sigmoid function, $w_j$ is the weight vector that connects the input nodes with the $j$-th hidden node, and $b_j$ is the bias of the $j$-th hidden node.
In matrix form, the ELM output is fitted to the labels through:
$$H \beta = T. \tag{6}$$
Then the final output weight is calculated as:
$$\beta = H^{\dagger} T,$$
where the final output weight is given as $\beta$, $H^{\dagger}$ is the pseudo-inverse matrix of the output matrix of the hidden layer ($H$), and the label of the data set is represented as $T$. The training of ELM is completed through these three calculation steps. In recent years, the ELM has attracted more and more researchers' attention and has been widely used in various fields, such as chemistry [25], geography [26], and so on.
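The three training steps of ELM can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the uniform initialisation range, the function names, and the toy data are hypothetical and are not taken from the experiments; in DET the inputs would be the features produced by the modified ResNet50.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(X, T, n_hidden, seed=0):
    """Train a single-hidden-layer ELM in three steps."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly assign input weights and biases (they are never updated).
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: compute the hidden-layer output matrix H (N x n_hidden).
    H = sigmoid(X @ W + b)
    # Step 3: compute the output weights with the pseudo-inverse of H.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta


def predict_elm(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta


# Toy data (XOR with one-hot labels), purely illustrative.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])

W, b, beta = train_elm(X, T, n_hidden=10)
pred = predict_elm(X, W, b, beta).argmax(axis=1)
print(pred)
```

Because no iterative weight update is involved, training reduces to one matrix product and one pseudo-inverse, which is why ELM is fast on small data sets.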

Proposed BA-guided Optimization
Although the proposed DET can reduce the computation and avoid overfitting problems, some parameters in the DET are set randomly and do not change, which also leads to some problems [27]. Poor weights and biases lead to a decline in model performance [28]. For better classification performance, BA is selected to optimize DET. The optimized DET is named ODET. BA was proposed based on the behaviour of bats. Bats sense distance through echolocation and can distinguish food/prey from background barriers. BA searches for the best location through specific strategies based on bat behaviour. The parameters are continuously updated based on the optimal solution in each iteration. The equations of BA are as follows: .
Where is the output of ELM.
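The $v$-th bat has the velocity $u_v^{t}$ and position $p_v^{t}$ when the time is $t$. The updated velocity and position are calculated as:
$$f_v = f_{\min} + (f_{\max} - f_{\min})\,\lambda,$$
$$u_v^{t+1} = u_v^{t} + (p_v^{t} - p^{*})\, f_v,$$
$$p_v^{t+1} = p_v^{t} + u_v^{t+1},$$
where the frequency of the $v$-th bat is $f_v$, $f_{\max}$ and $f_{\min}$ are the maximum frequency and minimum frequency, respectively, $\lambda \in [0, 1]$ is a random variable, and $p^{*}$ is the current best location.
When a position is selected from the current optimal positions, a new alternative position is generated by the random walk:
$$p_{\text{new}} = p_{\text{old}} + \varepsilon \bar{A}^{t},$$
where $\varepsilon \in [-1, 1]$ is a random variable and $\bar{A}^{t}$ is the average loudness of the bat population when the time is $t$.
When the $v$-th bat locks onto the prey, the loudness $A_v$ and emission rate $r_v$ are updated:
$$A_v^{t+1} = \alpha A_v^{t}, \qquad r_v^{t+1} = r_v^{0}\left[1 - \exp(-\gamma t)\right],$$
where $\alpha$ and $\gamma$ are constants.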
To better understand the optimization process, the pseudocode of BA-guided Optimization is given in Table 4.
Table 4. The pseudocode of the BA-guided Optimization
Step 1: Initialize the position $p_v$, the velocity $u_v$, the loudness $A_v$, and the emission rate $r_v$.
Step 2: While ($t$ < the maximum number of iterations).
Step 2.2: Update the position and velocity.
Step 2.3.1: Choose a position from the current optimal positions.
Step 2.3.2: Generate a new alternative position by the random walk.
Step 2.4: Generate a new position by flying randomly.
Step 2.5.1: Store the new position.
Step 2.5.2: Update the loudness and emission rate.
Step 2.6: Find the best $p^{*}$.
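For illustration, the loop in Table 4 can be sketched with the standard bat algorithm. The branch conditions (comparing a random number with the emission rate and the loudness), the default parameter values, and the sphere test function are assumptions taken from the commonly used formulation of BA, not from the experiments in this paper. In ODET, the fitness would instead be computed from the DET with the candidate parameters, which is not reproduced here.

```python
import numpy as np


def bat_algorithm(fitness, dim, n_bats=20, n_iter=100, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, seed=0):
    """Minimize `fitness` with a basic bat algorithm (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1.0, 1.0, size=(n_bats, dim))  # positions
    vel = np.zeros((n_bats, dim))                     # velocities
    loud = np.ones(n_bats)                            # loudness of each bat
    rate0 = rng.uniform(0.0, 1.0, size=n_bats)        # initial emission rates
    rate = np.zeros(n_bats)                           # current emission rates
    fit = np.array([fitness(p) for p in pos])
    best = pos[fit.argmin()].copy()
    best_fit = fit.min()
    for t in range(1, n_iter + 1):
        for v in range(n_bats):
            # Update the frequency, velocity, and position.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[v] = vel[v] + (pos[v] - best) * freq
            cand = pos[v] + vel[v]
            # Local search: random walk around the current best position.
            if rng.random() > rate[v]:
                cand = best + rng.uniform(-1.0, 1.0, size=dim) * loud.mean()
            cand_fit = fitness(cand)
            # Accept the candidate, then lower the loudness and raise the emission rate.
            if cand_fit <= fit[v] and rng.random() < loud[v]:
                pos[v], fit[v] = cand, cand_fit
                loud[v] *= alpha
                rate[v] = rate0[v] * (1.0 - np.exp(-gamma * t))
            # Keep track of the best position found so far.
            if cand_fit < best_fit:
                best, best_fit = cand.copy(), cand_fit
    return best, best_fit


def sphere(p):
    """Toy objective with its minimum at the origin."""
    return float(np.sum(p ** 2))


best, best_fit = bat_algorithm(sphere, dim=3)
print(best_fit)
```

Because the best position is only replaced by a strictly better candidate, the returned fitness is never worse than the best fitness of the initial population.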

Evaluation
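We define unhealthy breasts as the positive class and healthy breasts as the negative class. Five widely used measurements are calculated to evaluate the proposed ODET, which are F1-score (F1), precision (PRE), specificity (SPE), sensitivity (SEN), and accuracy (ACC). Their equations are given below:
$$\begin{aligned}
\text{SEN} &= \frac{TP}{TP + FN}, \\
\text{SPE} &= \frac{TN}{TN + FP}, \\
\text{PRE} &= \frac{TP}{TP + FP}, \\
\text{ACC} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\text{F1} &= \frac{2 \times \text{PRE} \times \text{SEN}}{\text{PRE} + \text{SEN}},
\end{aligned} \tag{13}$$
where FP, FN, TP, and TN are false positive, false negative, true positive, and true negative, respectively.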
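As a small worked example, the five measurements can be computed directly from the confusion counts. The function name and the counts below are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute SEN, SPE, PRE, ACC, and F1 from confusion-matrix counts."""
    sen = tp / (tp + fn)                   # sensitivity (recall)
    spe = tn / (tn + fp)                   # specificity
    pre = tp / (tp + fp)                   # precision
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    f1 = 2 * pre * sen / (pre + sen)       # harmonic mean of PRE and SEN
    return {"SEN": sen, "SPE": spe, "PRE": pre, "ACC": acc, "F1": f1}


# Hypothetical counts: unhealthy is the positive class, healthy the negative class.
metrics = classification_metrics(tp=90, tn=80, fp=10, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```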

The classification performance of ODET
We use five-fold cross-validation to evaluate the proposed ODET. The classification performance is given in Table 5. The comparison of DET and ODET is given in Figure 5. From the figure, we can see that the classification performance is significantly improved after the BA-guided Optimization. The accuracy of ODET is higher than 97%, and the other measurements are higher than 93%. Therefore, these results indicate that the proposed ODET is an effective model for detecting breast cancer.

Comparison of different backbone models
Six CNN models are tested as backbones in this paper: ResNet50, AlexNet, DenseNet, MobileNet, ResNet18, and VGG. The classification performances of these backbone models are presented in Table 6. The comparison figure of different backbone models is shown in Figure 6. ResNet50 achieves the best results among the backbones on all measurements except SPE and PRE. Nevertheless, on these two measurements it is only 1% lower than the highest value.
ResNet50 achieves such good results because residual learning can solve the problem of degradation. Compared with ResNet18, ResNet50 has more layers, which may extract more features, and therefore achieves better results. AlexNet and VGG have too many parameters and require more epochs to converge. However, in this paper, the maximum number of epochs (maxepoch) is set to 1 based on the dataset size. MobileNet has fewer layers and cannot extract features as well as ResNet50. DenseNet has no residual learning, so it may encounter degradation problems.
The classification performances of the CNN models and of the CNN models modified by deep ELM-based transfer learning are given in Table 7 and Table 8, respectively. The comparison figure is given in Figure 7. Based on the comparison results, we can see that the CNN models modified by deep ELM-based transfer learning generally obtain better classification performance than the original CNN models.
The classification performances with and without the BA-guided Optimization are given in Table 6 and Table 8, respectively. For a more intuitive view, the comparison figure is presented in Figure 8. After the BA-guided Optimization, the classification performance of the six different models is greatly improved. We can conclude that the BA-guided Optimization has a positive influence on breast cancer detection.
In this paper, we use Gradient-weighted class activation mapping (Grad-CAM) [29] to explain the proposed ODET. Grad-CAM has been applied in many fields [30][31][32]. The raw images and Grad-CAM images are shown in Figure 9. In the Grad-CAM image, the red regions receive the greatest attention, whereas the blue regions receive the lowest attention. It can be seen that the breast cancers lie within the red regions for the ODET. In conclusion, the ODET can capture breast cancers based on ultrasound images.
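The core Grad-CAM computation can be sketched as follows. The sketch assumes that the feature maps of a chosen convolutional layer and the gradients of the class score with respect to them have already been obtained from the deep learning framework; the function name and the toy arrays are hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap for one convolutional layer.

    activations: array of shape (K, H, W), the feature maps of the layer.
    gradients:   array of shape (K, H, W), the gradients of the class score
                 with respect to those feature maps.
    """
    weights = gradients.mean(axis=(1, 2))             # global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels -> (H, W)
    cam = np.maximum(cam, 0.0)                        # keep only positive evidence (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()                         # scale to [0, 1]
    return cam


# Toy example with two 2x2 feature maps.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 2.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[-1.0, -1.0], [-1.0, -1.0]]])
heatmap = grad_cam(acts, grads)
print(heatmap)
```

Values close to 1 in the heatmap would be rendered as the red regions and values close to 0 as the blue regions once the map is resized to the image and passed through a colour map.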
The comparison with other state-of-the-art methods is given in Figure 10. In breast cancer detection, the method proposed in this paper performs better than the other state-of-the-art methods, which indicates that it is an effective method for breast cancer detection. The CNN model can capture high-level features, while the ELM classifier helps prevent overfitting problems on small datasets.

Conclusion
We propose a network (ODET) to detect breast cancer based on ultrasound images. In this paper, we use ResNet50 as the backbone model. We make some modifications to the backbone model by deep ELM-based transfer learning. After these modifications, the network is named DET. However, DET still has some shortcomings because the parameters in DET are randomly assigned and do not change in the experiment. In this case, we select BA to optimize DET. The optimized DET is named ODET. The proposed ODET achieves an F1-score (F1), precision (PRE), specificity (SPE), sensitivity (SEN), and accuracy (ACC) of 93.16%±1.12%, 93.28%±1.36%, 98.63%±0.31%, 93.96%±1.85%, and 97.84%±0.37%, respectively. The proposed ODET outperforms other state-of-the-art methods. These results indicate that our method is an effective method for breast cancer detection.
The proposed model yields strong classification performance, but this paper still has some limitations. It uses only training and test sets and no validation set. The public data set is classified into two categories in this paper, although it has three categories: normal, benign, and malignant. Different types of SLFN should also be tested to compare the classification performance.
In the future, we will conduct experiments on more public breast cancer data sets to test the effectiveness of the proposed method. We will also try to apply other recently proven methods [33,34], such as UNet, transformer, and ViT, to segment and classify breast cancer images.
EAI Endorsed Transactions on Scalable Information Systems 01 2023 - 01 2023 | Volume 10 | Issue 2 | e4

Data Availability Statement.
Publicly available datasets were analysed in this study. This data can be found here: Breast Ultrasound Images Dataset | Kaggle.

Conflicts of Interest.
The authors declare no conflict of interest.