WE-BA: Covid-19 detection by Wavelet Entropy and Bat Algorithm

INTRODUCTION: Covid-19 is a kind of fast-spreading pneumonia and has dramatically impacted human life and the economy. OBJECTIVES: As early diagnosis is the most effective method to treat patients and block virus transmission, an accurate, automatic, and effective diagnosis method is needed. METHODS: Our research proposes a machine learning model (WE-BA) that uses wavelet entropy for feature extraction to reduce the excessive features, a one-hidden-layer feedforward neural network (FNN) for classification, 10-fold cross-validation (CV) to reuse the data of the relatively small dataset, and the bat algorithm (BA) as the training algorithm. RESULTS: The experiment achieved an average sensitivity of 75.27% ± 3.25%, an average specificity of 75.88% ± 1.89%, an average precision of 75.75% ± 1.06%, an average accuracy of 75.57% ± 1.21%, an average F1 score of 75.47% ± 1.64%, an average Matthews correlation coefficient of 51.20% ± 2.42%, and an average Fowlkes–Mallows index of 75.49% ± 1.64%. CONCLUSION: The experiments showed that the proposed WE-BA method yielded superior performance to the state-of-the-art methods. The results also demonstrated the potential of the proposed method for the CT image classification task of Covid-19 on a small dataset.


Introduction
COVID-19 is a kind of pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The World Health Organisation (WHO) defined the COVID-19 outbreak as a public health emergency of international concern (PHEIC) in January 2020. COVID-19 has dramatically impacted human life and caused significant damage to the economy. Over 100 million people are facing a return to extreme poverty because of COVID-19 [1,2].
COVID-19 can cause dry cough, fever, loss of taste and smell, and other symptoms [3], and may develop into severe disease and lead to rapid death. Some infected people are asymptomatic but can still be a source of infection. Early diagnosis is the most effective method to treat patients and block virus transmission [4], so an accurate and effective diagnosis method is necessary. The two most widely used detection methods are Polymerase Chain Reaction (PCR) and CT screening. However, the PCR test is time-consuming and has a high ratio of false negatives. CT scanning has higher sensitivity and can also assess the level of the disease [5]. However, manually diagnosing CT scans needs a lot of human resources, and subjective factors may affect the results. Therefore, it is essential to develop an accurate computer-aided diagnosis system.
Machine learning has been a popular field in recent years. It has made significant progress and has been widely used in many areas, such as computer vision [6][7][8][9] and real-time systems [10][11][12][13]. Many researchers have applied machine learning methods to the diagnosis of CT images, and some of these methods perform well and surpass human radiologists on many indicators. Wang, W. et al. (2022) [17] proposed a method combining wavelet entropy and the self-adaptive Jaya algorithm, obtaining 85.47% sensitivity, 87.23% precision, and 86.23% accuracy. However, the performance of these models is not yet good enough to assist clinical diagnosis. Because of the high false positive rate, applying these models in diagnosis would have an unnecessary negative impact on patients' quality of life. This paper's contributions are: 1. We propose a combined wavelet entropy and Bat Algorithm method (WE-BA) for COVID-19 diagnosis; 2. We explore the possibility of training a model to classify COVID-19 on small datasets; 3. We further demonstrate the potential of combining wavelet entropy and optimization algorithms in COVID-19 diagnosis.
In this paper, the second part introduces the dataset. The third part introduces the methods used in this paper. The fourth part presents the experiment and discusses the results. The fifth part is the conclusion.

Dataset
The database comprises CT images taken by the Fourth People's Hospital of Huai'an, China. The database contains 132 subjects, aged from 24 to 91, of whom 77 are men and 55 are women. The database includes two parts: the observation group and the control group. Each group contains 148 selected CT slices. The observation group included 66 COVID-19-infected subjects, including 39 men and 27 women, with an average age of 48. In the control group, we randomly selected 66 subjects from 159 healthy people, including 38 men and 28 women, with an average age of 38.44. Figure 1 shows a set of sample images from the database.

Wavelet Entropy
The Fourier transform is a standard signal processing method that converts a signal from the time domain to the frequency domain. It has been widely used in many fields and is defined in Formula (1).
$$F(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt \tag{1}$$
where $t$ refers to the time, $\omega$ refers to the frequency, $F(\omega)$ refers to the function in the frequency domain, and $f(t)$ refers to the function in the time domain.
Although the Fourier transform can obtain the spatial frequency information of the image, it also has disadvantages. Real-world signals are often non-stationary, and the Fourier transform does not perform well on non-stationary signals. It can only determine which frequencies are contained in the signal, not when a specific frequency component occurs. Because this time information is lost, two non-stationary signals that differ in the time domain can appear the same in the frequency domain. Therefore, the Fourier transform alone is not well suited to medical image signal processing.
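The loss of time information can be illustrated with a small numerical sketch. The two signals below are assumptions made purely for illustration (they are not taken from the dataset): one places a low-frequency burst before a high-frequency burst, and the other is the same signal reversed in time. A naive discrete Fourier transform gives both the same magnitude spectrum.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| of the discrete Fourier transform of a real signal."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        magnitudes.append(abs(acc))
    return magnitudes


N = 64
# Signal A: low frequency in the first half, high frequency in the second half.
low = [math.sin(2 * math.pi * 2 * t / N) for t in range(N // 2)]
high = [math.sin(2 * math.pi * 10 * t / N) for t in range(N // 2, N)]
signal_a = low + high
# Signal B: the same content in reverse temporal order (high first, then low).
signal_b = signal_a[::-1]

mag_a = dft_magnitudes(signal_a)
mag_b = dft_magnitudes(signal_b)
# The largest difference is at the level of floating-point rounding error.
max_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))
print(f"largest magnitude difference: {max_diff:.2e}")
```

The two signals differ sample by sample, yet their magnitude spectra coincide; the ordering in time is carried only by the phase, which a magnitude-based frequency description discards.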
The short-time Fourier transform (STFT) mitigates the problem of poor time resolution. It decomposes the whole time-domain signal into many short-time segments. This process of signal decomposition is called windowing. The signal is then assumed to be stationary within each short interval, and the Fourier transform is applied to each segment separately. In this way, we can get the frequency and time information simultaneously. On this basis, we can further perform time-frequency analysis and obtain a spectrogram of the signal. However, STFT also has its shortcomings. The window's width is fixed and does not change during the transformation. According to the uncertainty principle, shown in Formula (2), there is no way to determine the frequency at a certain instant, but we can know the frequency over a certain period.
$$\Delta t \cdot \Delta \omega \geq \frac{1}{2} \tag{2}$$
where $\Delta t$ refers to the time interval, and $\Delta \omega$ refers to the frequency spread over that time interval. Therefore, a window that is too wide leads to low time resolution, and a window that is too narrow leads to low frequency resolution, so a fixed window cannot give good resolution for high-frequency and low-frequency signals simultaneously. To achieve good resolution for both, the window's width needs to be variable.
The wavelet transform replaces the infinitely long trigonometric basis of the Fourier transform with a finite, decaying wavelet, so it can locate a frequency component in time as well as identify it. The wavelet transform is defined in Formula (3).
$$W(a, \tau) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi\!\left(\frac{t - \tau}{a}\right) dt \tag{3}$$
where $a$ refers to the scale, $\tau$ refers to the translation, $t$ refers to the time, $\psi$ refers to the mother wavelet function, and $W$ refers to the wavelet transform of the signal $f$. The wavelet transform realizes scaling and translation (changing frequency and time) by changing $a$ and $\tau$ [18]. A large value of $a$ applies to low frequencies, and a small value of $a$ applies to high frequencies. The wavelet transform therefore works at multiple resolutions, one per frequency range, and has good resolution at both high and low frequencies, which addresses the unsatisfactory handling of non-stationary signals by the Fourier transform.
Although the wavelet transform contains multiple resolutions and retains rich information, it also contains a lot of redundant information, which costs considerable time and space [19]. We can use entropy to reduce the dimension of the features. Entropy represents the uncertainty of image texture features: the more homogeneous the image texture, the lower the entropy [20]. Wavelet entropy, which combines wavelet decomposition and entropy, is a tool for analyzing the instantaneous characteristics of non-stationary signals. The original Shannon entropy can be used as a standard to compare and quantify the energy distribution in the wavelet transform. The definition of Shannon entropy is shown in Formula (4).
$$S(\vec{g}) = -\sum_{i} p(g_i) \log_2 p(g_i) \tag{4}$$
where $\vec{g}$ refers to the grey levels, $p(g_i)$ refers to the probability of grey level $g_i$, and $S$ refers to the entropy, which quantifies the energy distribution in the wavelet sub-bands according to the probabilities with which the grey levels occur.
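As a minimal sketch of the Shannon entropy in Formula (4), the function below estimates the probabilities from the empirical frequencies of the values it receives. The base-2 logarithm (entropy in bits) and the two toy grey-level lists are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


flat = [7] * 16            # a single grey level: no uncertainty
mixed = [0, 1, 2, 3] * 4   # four equally likely grey levels: log2(4) bits

print(shannon_entropy(flat))
print(shannon_entropy(mixed))
```

A region with a single grey level carries no uncertainty, while four equally likely grey levels give two bits, which matches the reading of entropy as a measure of texture uncertainty.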

Feedforward Neural Network
The feedforward neural network (FNN) is a classical neural network model that is mature in both application and development. Structurally, an FNN consists of an input layer, an output layer, and one or several hidden layers between them.
In this paper, a one-hidden-layer FNN is adopted, and its structural diagram is shown in Figure 2. The FNN has strong expressive ability: according to the universal approximation theorem [21], an FNN with only one hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden neurons. The features extracted from the image are sent to the input layer, so the number of neurons in the input layer is equal to the number of features.
The number of neurons in the hidden layer can be determined by grid search. The number of neurons in the output layer equals the number of classes. The classification result is obtained by applying the argmax function to the output layer. We use an FNN because its shallow architecture helps prevent overfitting when training with a small dataset. FNNs are conventionally trained by backpropagation. After the weights are initialized, the cost function is calculated, and its gradient is backpropagated to update the weights. By repeating this process, the cost function gradually becomes smaller until we obtain the final model. This article uses the mean squared error (MSE) as the cost function.
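The forward pass, the argmax decision, and the MSE cost described above can be sketched as follows. The sigmoid hidden activation, the linear output layer, the ten hidden neurons, and the random weights are all assumptions chosen for illustration; the paper selects the hidden-layer size by grid search and does not state the activation function.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer FNN: sigmoid hidden layer followed by a linear output layer."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


def mse(outputs, target):
    """Mean squared error between the network outputs and a one-hot target."""
    return sum((o - t) ** 2 for o, t in zip(outputs, target)) / len(outputs)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


rng = random.Random(0)
n_in, n_hidden, n_out = 13, 10, 2   # 13 entropy features, 2 classes; 10 is hypothetical
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b_out = [rng.uniform(-1, 1) for _ in range(n_out)]

features = [rng.random() for _ in range(n_in)]   # stand-in for one feature vector
outputs = forward(features, w_hidden, b_hidden, w_out, b_out)
predicted_class = argmax(outputs)
loss = mse(outputs, [1.0, 0.0])                  # one-hot target for class 0
print(predicted_class, loss)
```

Any training algorithm then only has to search for the weights and biases that make `loss` small over the training set.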

Bat Algorithm
As the weights and biases of the model may not be optimal, we use the Bat Algorithm (BA) to optimize them. BA is a metaheuristic algorithm based on the echolocation behavior of bats, proposed by Yang, X.-S. (2010) [22]. BA has been reported to perform better than genetic algorithms [23] and particle swarm optimization (PSO) [24]. For simplicity, it sets several idealized rules:
1. All bats sense distance by echolocation, and they can distinguish food from barriers.
2. Bats fly randomly at location $\vec{x_i}$ with a velocity of $\vec{v_i}$, and search for prey using a fixed frequency $f_{min}$, wavelength $\lambda$, and loudness $A_0$. They can adapt the wavelength of their emitted pulses and adjust the rate of pulse emission $r$ while approaching the target.
3. We assume that the loudness varies from the maximum value $A_0$ to the minimum $A_{min}$.
4. The range of frequency is from $f_{min}$ to $f_{max}$.
In simulations, each bat's frequency is randomly assigned by
$$f_i = f_{min} + (f_{max} - f_{min})\,\beta$$
where $\beta \in [0,1]$ is a random variable and $f_i$ refers to the pulse frequency of a specific bat. We update the velocities and solutions at time step $t$ as
$$\vec{v_i^{t}} = \vec{v_i^{t-1}} + \left(\vec{x_i^{t-1}} - \vec{x_*}\right) f_i$$
$$\vec{x_i^{t}} = \vec{x_i^{t-1}} + \vec{v_i^{t}}$$
where $\vec{v_i^{t}}$ refers to the velocity of a specific bat at time step $t$, $\vec{x_i^{t}}$ refers to the solution of a specific bat at time step $t$, and $\vec{x_*}$ is the current global best solution. For the local search, once a solution is chosen from the current best solutions, a new solution for each bat is generated locally using a random walk
$$\vec{x_{new}} = \vec{x_{old}} + \varepsilon A^{t}$$
where $\vec{x_{old}}$ is selected among the best solutions, and $\varepsilon \in [-1,1]$ is a random variable. If the new solution is accepted, $A_i$ is reduced and $r_i$ is increased according to
$$A_i^{t+1} = \alpha A_i^{t}, \qquad r_i^{t+1} = r_i^{0}\left[1 - \exp(-\gamma t)\right]$$
where $\alpha$ and $\gamma$ are constants with $0 < \alpha < 1$ and $\gamma > 0$, $r_i^{0}$ is the initial emission rate, and $A^{t} = \langle A_i^{t} \rangle$ is the mean loudness of all the bats at this time step. When $t \to \infty$, we have $A_i^{t} \to 0$ and $r_i^{t} \to r_i^{0}$. The flow chart in Figure 3 illustrates the basic steps of BA.
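The update rules above can be put together in a short sketch. This is an illustrative implementation under stated assumptions, not the authors' code: the sphere objective, the search bounds, the population size, and the constants `f_min`, `f_max`, `alpha`, and `gamma` are all hypothetical choices. In WE-BA the objective would presumably be the FNN's MSE as a function of its weights and biases.

```python
import math
import random


def bat_algorithm(objective, dim, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, lower=-5.0, upper=5.0, seed=0):
    """Minimise `objective` over [lower, upper]^dim with a basic bat algorithm."""
    rng = random.Random(seed)
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    loudness = [1.0] * n_bats                    # A_i, starts at A_0 = 1
    r0 = [rng.random() for _ in range(n_bats)]   # initial emission rates r_i^0
    rate = [0.0] * n_bats                        # current emission rates r_i
    fitness = [objective(p) for p in x]
    best_idx = min(range(n_bats), key=fitness.__getitem__)
    best, best_fit = x[best_idx][:], fitness[best_idx]

    for t in range(1, n_iter + 1):
        mean_loudness = sum(loudness) / n_bats   # A^t = <A_i^t>
        for i in range(n_bats):
            # Frequency, velocity, and position updates.
            freq = f_min + (f_max - f_min) * rng.random()
            v[i] = [vi + (xi - bi) * freq for vi, xi, bi in zip(v[i], x[i], best)]
            candidate = [xi + vi for xi, vi in zip(x[i], v[i])]
            # Local random walk around the current best solution.
            if rng.random() > rate[i]:
                candidate = [bi + rng.uniform(-1, 1) * mean_loudness for bi in best]
            candidate = [min(max(c, lower), upper) for c in candidate]
            cand_fit = objective(candidate)
            # Accept improvements with a probability given by the loudness.
            if cand_fit <= fitness[i] and rng.random() < loudness[i]:
                x[i], fitness[i] = candidate, cand_fit
                loudness[i] *= alpha
                rate[i] = r0[i] * (1 - math.exp(-gamma * t))
            if cand_fit < best_fit:
                best, best_fit = candidate[:], cand_fit
    return best, best_fit


def sphere(point):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(c * c for c in point)


best, best_fit = bat_algorithm(sphere, dim=3)
print(best, best_fit)
```

Because the loudness shrinks and the emission rate grows only when a bat accepts a new solution, the search gradually shifts from wide exploration towards small steps around the best solution found so far.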

K-fold Cross-Validation
The dataset in machine learning is usually divided into a training set used for training the model and a test set used for testing the model's performance. Machine learning is a data-driven science. The size of the dataset can greatly affect the performance of the model. Dividing the dataset into a training set and a test set can greatly reduce the data available for training the model.
Cross-validation (CV) reuses the data and increases the data used for training, which can further improve the model's performance. CV also helps to guard against overfitting. Because CV uses each sample as both training data and test data, overfitting and underfitting are easier to detect, so the results obtained are more convincing than those of hold-out validation alone. K-fold CV is a widely used CV method. It divides the dataset into K pre-specified groups, takes one group at a time as the test set, without repetition, and uses the remaining data as the training set to train the model. The training is repeated K times so that each group serves as the test set exactly once. After calculating the performance of each model, we can get the final performance. In this paper, we use the most commonly used 10-fold CV.
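A minimal sketch of the K-fold splitting procedure is given below. The shuffling seed and the round-robin assignment of samples to folds are assumptions; the sample count of 296 corresponds to the two groups of 148 slices described in the dataset section.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # K nearly equal groups
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(296, k=10))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold and in the training set of the other nine runs, so every slice contributes to both training and evaluation.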

WE Results
Figure 4 shows the wavelet decomposition of a sample image at four levels. In the wavelet transform, the low-frequency sub-band is decomposed further at each level, while the high-frequency detail sub-bands are retained, so the transform can characterize signals whose main components are low-frequency. Figure 4(a) is the first-level wavelet decomposition, which contains both the high-frequency sub-bands (lower right corner, lower left corner, and upper right corner) and the low-frequency sub-band (upper left corner) of the input image. Figure 4(b) is the second-level wavelet decomposition, which retains the high-frequency sub-bands and applies the same transformation to the low-frequency sub-band of the first-level result. Figure 4(c) and (d) are the third- and fourth-level wavelet decompositions obtained by the same recursive operation. The sample image is rendered in pseudo-color for a clear view; the underlying data are greyscale.
After the 4-level wavelet decomposition, we get thirteen sub-bands: three detail sub-bands at each of the four levels plus the final low-frequency sub-band (3 × 4 + 1 = 13). We calculate the entropy of each sub-band according to Formula (4) to form the feature vector. Thus, we extract the features and reduce their number to only thirteen. Reducing the number of features helps to mitigate the overfitting caused by a small training set.
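The thirteen-feature extraction can be sketched as follows. The paper does not state which wavelet family or histogram binning is used, so the Haar wavelet, the 256-bin quantisation of the coefficients, and the random 64 × 64 test image are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def haar_step(img):
    """One 2-D Haar level: returns (approximation, and three detail sub-bands)."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2)   # low-frequency approximation
            lh_row.append((a - b + d - e) / 2)   # detail across columns
            hl_row.append((a + b - d - e) / 2)   # detail across rows
            hh_row.append((a - b - d + e) / 2)   # diagonal detail
        ll.append(ll_row)
        lh.append(lh_row)
        hl.append(hl_row)
        hh.append(hh_row)
    return ll, lh, hl, hh


def subband_entropy(band, bins=256):
    """Shannon entropy (bits) of the coefficients, quantised into `bins` bins."""
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def wavelet_entropy_features(img, levels=4):
    """Entropy of every sub-band of a `levels`-level decomposition."""
    features = []
    approx = img
    for _ in range(levels):
        approx, lh, hl, hh = haar_step(approx)
        features.extend(subband_entropy(band) for band in (lh, hl, hh))
    features.append(subband_entropy(approx))     # final low-frequency sub-band
    return features


random.seed(0)
image = [[random.randint(0, 255) for _ in range(64)] for _ in range(64)]
features = wavelet_entropy_features(image)
print(len(features))  # 13
```

The image side length must be divisible by 2 to the power of the number of levels for this simple sketch; a production implementation would normally rely on a wavelet library and handle boundary extension.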

Statistical Results
The experiment used the wavelet entropy to extract the feature, the one-hidden-layer feedforward neural network as the classifier, the 10-fold cross-validation to validate the model's performance and the bat algorithm as the training algorithm.The experiment eventually achieved great performance (shown in Table 1) with an average sensitivity of 75.27% ± 3.25%, an average specificity of 75.88% ± 1.89%, an average precision of 75.75% ± 1.06%, an average accuracy of 75.57% ± 1.21%, an average F1 score of 75.47% ± 1.64%, an average Matthews correlation coefficient of 51.20% ± 2.42%, and an average Fowlkes-Mallows index of 75.49% ± 1.64%.
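For reference, the seven reported indicators can all be computed from the four entries of a binary confusion matrix, as in the sketch below. The counts used in the example are hypothetical and chosen only to match the class sizes of the dataset (148 positive and 148 negative slices); they are not the paper's results.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute the seven indicators from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(precision * sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "fmi": fmi,
    }


# Hypothetical confusion matrix, for illustration only.
metrics = classification_metrics(tp=111, fn=37, tn=112, fp=36)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Note that the Matthews correlation coefficient ranges from -1 to 1 rather than from 0 to 1, which is why it sits well below the other indicators for the same classifier.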

Comparison to State-of-the-art Approaches
Compared to other state-of-the-art models classifying COVID-19 by CT images, WE-BA greatly improves in many aspects.The numerical comparison between WE-BA and other approaches is shown in Table 2.
Figure 6 shows the overview of performance comparison to other algorithms.Except for specificity and precision, our model achieved the best performance in all the other metrics.The results also proved the potential of the proposed method for the CT image classification task of Covid-19 on a small dataset.

Conclusions
To develop an accurate, efficient, and automatic COVID-19 classifier of CT chest slices, we used wavelet entropy for feature extraction to reduce the excessive features. We introduced 10-fold cross-validation to reuse the data of the relatively small dataset. We then proposed using a one-hidden-layer FNN as the classifier and BA as the training algorithm. The experiments showed that the proposed WE-BA method yielded superior performance to the state-of-the-art methods.
The results also demonstrated the potential of the proposed method for the CT image classification task of Covid-19 on a small dataset. At the same time, the specificity and precision of our model are lower than those of the SOTA algorithms. We believe that applying further optimization and improvement methods to the wavelet entropy features can achieve better performance in the future.
Ahuja, S. et al. (2021) [14] proposed a method based on deep learning. It used the pre-trained architecture ResNet18, with 30% of the data as a test set and the rest as a validation set, achieving 92.21% sensitivity and 98.50% specificity. Yildiz, A. et al. (2009) [15] proposed a new feature extraction method that combined the discrete wavelet transform (DWT) with Shannon entropy. The obtained signal is decomposed into its sub-bands by DWT, and the Shannon entropy of each sub-band is then calculated. Utilizing wavelet entropy can reduce the dimension of the features, which can help the model achieve better generalization performance. Yao, X. et al. (2021) [16] proposed a model for COVID-19 CT image classification using a method based on wavelet entropy (WE) and biogeography-based optimization (BBO), named WE-BBO. They found that the combination of an optimisation algorithm and wavelet entropy can improve performance. Based on the WE-BBO, Wang, W. et al. (2022)

Figure 1. Samples in the database

Figure 3. Flow chart of the bat algorithm, where ( ) is the objective function, and rand( ) is a function generating a random variable ∈ [0,1].

Figure 4. Wavelet decomposition of a sample image

Figure 5 shows the ROC curve (Receiver Operating Characteristic curve) and the AUC (Area Under Curve) value of the WE-BA model. Each point on the ROC curve corresponds to the classifier operating at a different threshold. When the threshold reaches its maximum, TPR (True Positive Rate, also known as sensitivity) = FPR (False Positive Rate, which equals 1 - specificity) = 0, which corresponds to the origin of the coordinate system. When the threshold is at its minimum, TPR = FPR = 1, corresponding to (1,1) in the coordinate system. Both the TPR and the FPR decrease as the threshold increases. The AUC value is the area under the ROC curve. The AUC value of WE-BA is 0.8361.
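The threshold sweep behind a ROC curve and the trapezoidal estimate of the AUC can be sketched as below. The scores and labels are made-up toy values for illustration and are unrelated to the WE-BA outputs.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold from high to low and collect (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]   # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy classifier scores (higher means "more likely positive") and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
area = auc(points)
print(points[0], points[-1], area)
```

The curve starts at the origin for the highest threshold and ends at (1,1) for the lowest, and an AUC of 0.5 would correspond to a classifier that ranks positives and negatives no better than chance.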

Table 2. Performance comparison to other algorithms.