Covid-19 Diagnosis by Wavelet Entropy and Extreme Learning Machine

In recent years, COVID-19 has spread rapidly among humans. Chest CT is an effective means of diagnosing COVID-19. However, the interpretation of CT images still depends on the doctor's visual judgment and medical experience, which takes time and may lead to misjudgment. In this paper, a new algorithm for the automatic diagnosis of COVID-19 from chest CT images is proposed. The algorithm uses wavelet entropy (WE) to extract image features, an extreme learning machine (ELM) for training, and k-fold cross-validation (CV) for validation. Evaluated on 296 chest CT images, the proposed method is superior to state-of-the-art approaches in terms of sensitivity, specificity, precision, accuracy, F1, Matthews correlation coefficient (MCC) and Fowlkes-Mallows index (FMI).


Introduction
On February 11, 2020, the Director-General of the World Health Organization (WHO), Tedros Adhanom Ghebreyesus, announced in Geneva, Switzerland, that the pneumonia caused by the novel coronavirus had been named "COVID-19" [1]. On March 11, the WHO characterized the COVID-19 outbreak as a global pandemic [2]. As of 20:11 CEST on May 23, there have been 522,783,196 confirmed cases of COVID-19 in the world and 6,276,210 deaths.
Chest computed tomography (CT) is one of the commonly used monitoring methods. Chest CT in patients with COVID-19 usually shows bilateral lower lung ground-glass opacity (GGO) [3]. From the CT scans of 93 COVID-19 patients, Wang et al. found that the detection rate of ground-glass opacity in normal patients (79.5%) was higher than that in critically ill patients (55%), whereas the detection rate of consolidation in critically ill patients (90%) was higher than that in normal patients (52.1%) [4]. From the CT scans of 121 patients with COVID-19, Bernheim et al. found that the detection rates of ground-glass opacity and consolidation increased with the number of days since onset [5]. When treatment was effective, lung CT showed improvement; conversely, when the condition worsened, CT showed deterioration [6][7][8].
Compared with viral testing, CT can help doctors better judge the severity of the disease, its duration, and the effect of treatment [9][10][11].
Radial basis function neural network (RBFNN) is a feedforward neural network with a single hidden layer; it consists of an input layer, a hidden layer (radial basis layer), and a linear output layer. RBFNN is a universal approximator with the best-approximation property and optimality. RBFNN can choose different hidden layer activation functions according to different needs. Lu et al. [12] combined wavelet entropy and RBFNN to classify brain MR images as pathological or healthy. However, it is difficult for technicians to understand the weights/biases of RBFNN. Moreover, this method can only perform binary classification of images, not multi-class classification.
Xue Han, Zuojin Hu and William Wang
K-ELM is a kernel-based variant of ELM. It can use different kernel functions for optimization, regression and classification. Lu et al. [13] used wavelet entropy as the features and K-ELM as the classifier to classify images as pathological or healthy. However, the entropy values and the weights/biases of K-ELM are hard to interpret.
Extreme learning machine (ELM) is a learning method for single-hidden-layer neural networks. ELM can solve regression and classification problems. It is sometimes combined with the bat algorithm (BA) to address the redundant nodes caused by the randomly assigned parameters of the hidden layer and to optimize the weights/biases. Lu et al. [14] calculated the wavelet entropies of the subbands as features and trained a bat algorithm optimized extreme learning machine (BA-ELM) to identify images of pathological brains. However, doctors and technicians have difficulty understanding the entropy values and the weights of the classifier.
For chest CT images of COVID-19 patients, this study proposes a method that selects image features based on wavelet entropy, uses the extreme learning machine algorithm [15,16] to build the model, and then evaluates the model's performance indicators through k-fold cross-validation. It aims to obtain more accurate, redundancy-free information from patients' chest CT images [17], so as to help clinicians judge patients' conditions more accurately. Compared with other algorithms, this paper adopts an improved version of a traditional machine learning algorithm, which overcomes the disadvantages of traditional machine learning, is simpler to implement, and gives more accurate results.
The rest of this paper is organized as follows: Section 2 describes how the dataset was selected. Section 3 describes how wavelet entropy analyzes image signals and how the extreme learning machine algorithm and k-fold cross-validation are applied to train and validate models on small datasets. Section 4 presents the experimental design, reports the experimental results and discusses them. Section 5 concludes the paper.

Dataset
Using a Philips Ingenuity 64 row spiral CT machine, we acquired 148 chest CT images from 66 randomly selected healthy subjects (31 males and 35 females) and 148 chest CT images from 66 randomly selected patients with COVID-19 (41 males and 25 females). The subjects lay in a supine position with both upper limbs raised and held their breath after deep inhalation, and the CT scan was performed from the thoracic inlet to the costophrenic angle. The slice thickness and slice spacing were 3 mm, and the CT image resolution was 1024×1024 [18].

Wavelet Entropy
Compared with the Fourier transform [19], the wavelet transform (WT) [20,21] emphasizes localized analysis in time and frequency [22] and refines signals at multiple scales through scaling and translation. As a result, it achieves fine time resolution at high frequencies and fine frequency resolution at low frequencies. Wavelet analysis is an important tool for dealing with non-stationary signals.
The "information entropy" proposed by Shannon is the average amount of information after excluding information redundancy [23]. The average uncertainty of the information, $H(x)$, is the statistical average of the uncertainty $H(x_i)$ of the individual information symbols. The information entropy is defined as shown in Equation (1).
$$ H(x) = -\sum_{i=1}^{n} p(x_i) \log p(x_i) \tag{1} $$
There are $n$ information symbols $x_i$ ($i = 1, \dots, n$) in total, and the probability of each information symbol $x_i$ is $p(x_i)$. The greater the entropy, the greater the uncertainty of the information, and the greater the amount of information [24].
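As a minimal sketch of the entropy definition above, the following Python function estimates the probabilities from symbol counts and sums the weighted log terms. The function name `shannon_entropy`, the base-2 logarithm, and the toy symbol lists are assumptions made for illustration; the text does not specify the logarithm base.

```python
import math
from collections import Counter


def shannon_entropy(symbols, base=2.0):
    """Return -sum(p(x_i) * log(p(x_i))) over the distinct symbols.

    The base-2 default is an assumption; the text does not fix the base.
    """
    counts = Counter(symbols)
    total = len(symbols)
    entropy = 0.0
    for count in counts.values():
        p = count / total  # empirical probability p(x_i)
        entropy -= p * math.log(p, base)
    return entropy


# Toy inputs (hypothetical, not from the paper's dataset).
uniform = shannon_entropy([0, 1, 2, 3])   # four equally likely symbols
skewed = shannon_entropy([0, 0, 0, 1])    # one dominant symbol
constant = shannon_entropy([5, 5, 5, 5])  # a single symbol, no uncertainty
```

In this sketch the uniform list gives the largest entropy and the constant list gives zero, which matches the statement that greater entropy means greater uncertainty.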
Applying the theory of information entropy to wavelet analysis yields the wavelet entropy (WE). Signals are refined in time and frequency by WT [25,26], and WE is used as the quantitative evaluation standard for local analysis at each resolution. WE, denoted $L$, is defined as shown in Equation (2).
$$ L = -\sum_{j=1}^{m} q_j \log q_j \tag{2} $$
The signal after wavelet transformation [27,28] is distributed over $m$ resolutions, and $q_j$ is the relative energy of the $j$th resolution. Parts with small entropy values, that is, redundant parts carrying little information, are removed, and parts with large entropy values, that is, parts with large uncertainty and more information, are kept. In this way, the key feature parts of the signal are obtained as the input of the dataset. Deep learning methods [29] are not used because the dataset is small.
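The wavelet entropy computation can be sketched in a few lines, under several assumptions that are not stated in the text: a one-dimensional signal instead of a CT image, the Haar wavelet, a natural logarithm, and the hypothetical names `haar_step` and `wavelet_entropy`. Each detail subband plus the final approximation is treated as one resolution, and $q_j$ is its share of the total energy.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_entropy(signal, levels):
    """Entropy of the relative subband energies q_j of a Haar decomposition.

    The signal length is assumed to be divisible by 2 ** levels.
    """
    subbands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        subbands.append(detail)
    subbands.append(approx)  # the coarsest approximation is the last resolution
    energies = [sum(c * c for c in band) for band in subbands]
    total = sum(energies)
    if total == 0.0:
        return 0.0  # an all-zero signal carries no energy to distribute
    q = [e / total for e in energies]  # relative energy of each resolution
    return -sum(qj * math.log(qj) for qj in q if qj > 0.0)


# Toy signals (hypothetical values chosen only for illustration).
flat = wavelet_entropy([1.0] * 8, levels=3)
mixed = wavelet_entropy([4.0, 2.0, 5.0, 7.0, 1.0, 1.0, 3.0, 9.0], levels=3)
```

A constant signal puts all of its energy into the approximation band, so its entropy is zero, while a signal with energy spread over several resolutions gives a larger value. For images, a two-dimensional decomposition from a wavelet library would normally replace `haar_step`.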

Extreme Learning Machine
Feedforward neural networks are among the most widely used neural networks today. A feedforward network consists of an input layer, one or more hidden layers and an output layer [30]. Each layer consists of several neurons; each neuron receives the output of the previous layer as input and generates output for the next layer [31], so information propagates in one direction only [32]. According to the number of hidden layers, feedforward networks are divided into single-hidden-layer and multi-hidden-layer feedforward neural networks. ELM [33] is a learning algorithm for the single-hidden-layer feedforward neural network (SLFN) and an improvement on the back-propagation (BP) algorithm [34]. It is based on the theory that when the activation function in the hidden layer of an SLFN [35] is infinitely differentiable, the input weights and the hidden layer biases can be randomly selected [36]. These parameters need no adjustment and are independent of the training data, yet the network can still accurately learn multiple different observations. Learning is faster than with the traditional back-propagation algorithm [37], and ELM tends to achieve both the minimum training error and the minimum weight norm. Bartlett's [38] theory on the generalization performance of feedforward neural networks shows that the smaller the training error [39,40] and the smaller the weight norm, the better the generalization performance of the network.
Consider samples $(x_j, y_j)$, where the input vector is $x_j = [x_{j1}, x_{j2}, \dots, x_{jn}]^T \in \mathbb{R}^n$ and the output vector is $y_j = [y_{j1}, y_{j2}, \dots, y_{jm}]^T \in \mathbb{R}^m$. An SLFN with $n$ input layer nodes and $l$ hidden layer nodes can be expressed as Equation (3) [36]:
$$ \sum_{i=1}^{l} \beta_i \, g(w_i \cdot x_j + b_i) = y_j \tag{3} $$
Here $g(x)$ is the activation function, $\beta_i = [\beta_{i1}, \beta_{i2}, \dots, \beta_{im}]^T$ is the weight vector from the $i$th hidden layer node to the output layer nodes, $w_i = [w_{i1}, w_{i2}, \dots, w_{in}]^T$ is the weight vector from the input layer nodes to the $i$th hidden layer node, and $b_i$ is the threshold of the $i$th hidden node. The ELM algorithm is summarized as: (1) The input weights $w_i$ and biases $b_i$ are randomly assigned to the $l$ hidden layer nodes.
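In the usual ELM formulation, the steps after the random assignment are to compute the hidden layer output matrix H and then solve for the output weights in the least-squares sense. The sketch below illustrates that idea under stated assumptions: a sigmoid activation, weights drawn uniformly from [-1, 1], and a small ridge term solved through the normal equations in place of a pseudoinverse, so that only the standard library is needed. The names `train_elm`, `predict` and `solve`, and the toy data, are hypothetical and are not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_output(X, W, b):
    """Hidden layer matrix H with H[j][i] = g(w_i . x_j + b_i)."""
    return [[sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + bi)
             for w, bi in zip(W, b)] for x in X]


def solve(A, B):
    """Solve A @ Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [[M[i][n + k] / M[i][i] for k in range(len(B[0]))] for i in range(n)]


def train_elm(X, Y, hidden, seed=0, ridge=1e-6):
    rng = random.Random(seed)
    n_in = len(X[0])
    # Random input weights and biases; they are never tuned afterwards.
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]
    H = hidden_output(X, W, b)
    # Assumption: solve (H^T H + ridge * I) beta = H^T Y instead of a pseudoinverse.
    HtH = [[sum(h[i] * h[k] for h in H) + (ridge if i == k else 0.0)
            for k in range(hidden)] for i in range(hidden)]
    HtY = [[sum(h[i] * y[k] for h, y in zip(H, Y)) for k in range(len(Y[0]))]
           for i in range(hidden)]
    return W, b, solve(HtH, HtY)


def predict(X, model):
    W, b, beta = model
    return [[sum(hi * row[k] for hi, row in zip(h, beta))
             for k in range(len(beta[0]))] for h in hidden_output(X, W, b)]


# Toy two-class data (hypothetical feature vectors, label 0 or 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
Y = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
model = train_elm(X, Y, hidden=10, seed=0)
labels = [1 if out[0] > 0.5 else 0 for out in predict(X, model)]
```

Only the output weights are fitted, which is why training reduces to one linear solve. With a numerical library available, a Moore-Penrose pseudoinverse of H would be the more conventional choice.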

k-fold Cross Validation
The purpose of cross-validation is to cope with datasets that contain too little data [41,42]. In simple cross-validation, the original dataset is divided into two parts: a training set and a validation set [43]. Each sample is then used only once, either for training or for validation, so the data are not fully utilized; the validation results also vary with the way the original data are divided. k-fold cross-validation overcomes these shortcomings. It avoids the selection bias that arises in simple cross-validation from choosing one part of the data as the validation set, and it gives a more reliable estimate of generalization performance [44]. The original dataset is divided into k similarly sized subsets (folds) [45]. In each training run, one fold is used as the validation set and the other folds as the training set [46,47], and the performance value $s_i$ of the trained model is obtained on the validation fold. Training is thus repeated k times [48], and each fold is used once as a validation set and k-1 times as part of the training set [49]. Finally, the performance of the model is the average $s$ of the k performance values, as shown in formula (6).
$$ s = \frac{1}{k} \sum_{i=1}^{k} s_i \tag{6} $$
The method of k-fold cross-validation is illustrated in Figure 1 [44].
Table 2 shows the results of ten runs of k-fold cross-validation (k=10). Sensitivity is the percentage of patient samples (true positives plus false negatives) that are detected as positive, indicating the rate at which diagnoses are not missed. Specificity is the percentage of healthy samples (true negatives plus false positives) that are detected as negative, indicating the rate at which healthy subjects are not misdiagnosed. Precision is the percentage of samples detected as positive (true positives plus false positives) that are truly positive. Accuracy is the percentage of all samples that are correctly detected, that is, true positives plus true negatives. The mean (M) of the ten runs is higher than 75%, and the standard deviation (SD) is generally no higher than 2.5. The values of FMI in Table 2
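The k-fold procedure described in this section can be sketched as follows. This is a minimal illustration, assuming a shuffled index split and a caller-supplied scoring callback; the names `k_fold_indices`, `cross_validate` and `train_and_score` are hypothetical, and the dummy scorer stands in for training and evaluating a real model.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k similarly sized folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, k, train_and_score, seed=0):
    """Return the average of the k fold scores together with the scores."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, validation in enumerate(folds):
        # Every fold except fold i forms the training set for this run.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        scores.append(train_and_score(training, validation))
    return sum(scores) / k, scores


def dummy_score(training, validation):
    """Placeholder scorer: the share of samples held out in this run."""
    return len(validation) / (len(training) + len(validation))


folds = k_fold_indices(296, 10)
mean_score, fold_scores = cross_validate(296, 10, dummy_score)
```

Repeating `cross_validate` with ten different seeds would give ten runs of 10-fold cross-validation, from which a mean and standard deviation can be reported.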

Comparison to State-of-the-art Approaches
We compared our WE-ELM algorithm with three state-of-the-art approaches: RBFNN, K-ELM and ELM-BA. As shown in Table 3 and Figure 3, for precision, accuracy and F1 in detecting COVID-19, WE-ELM (ours) is the best, RBFNN is second, ELM-BA is third, and K-ELM is last. For sensitivity, WE-ELM (ours) is the best, RBFNN is second, K-ELM is third, and ELM-BA is last. For specificity, WE-ELM (ours) is the best, ELM-BA is second, RBFNN is third, and K-ELM is last. Furthermore, the MCC value of WE-ELM is 53.35 ± 2.26%, and the FMI value of WE-ELM is 76.39 ± 1.41%. WE-ELM (ours) therefore performs better than the compared methods in the detection of COVID-19.
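The seven indicators compared above can all be derived from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not results from this paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute the seven indicators from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # detected patients among all patients
    specificity = tn / (tn + fp)  # detected healthy among all healthy
    precision = tp / (tp + fp)    # true patients among predicted patients
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(precision * sensitivity)  # Fowlkes-Mallows index
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
        "fmi": fmi,
    }


# Hypothetical counts for illustration only.
example = classification_metrics(tp=90, fn=10, tn=80, fp=20)
perfect = classification_metrics(tp=50, fn=0, tn=50, fp=0)
```

MCC uses all four counts, so it stays informative when the other indicators disagree, and FMI is the geometric mean of precision and sensitivity.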

Conclusions
This paper proposes a computer vision-based method for diagnosing COVID-19 from lung CT images. The method consists of three parts: WE extracts image features, ELM is used for training, and k-fold CV is used for validation. Experiments show that after 10 runs of 10-fold CV, the MCC value of our method is 53.35 ± 2.26%, and its other indicators are also better than those of the three state-of-the-art approaches compared (see Section 4.3). The algorithm proposed in this paper has shortcomings that point to future work. First, the extracted WE features only distinguish healthy from pathological images; in future studies, we will use the differences among chest CT images of COVID-19 patients to conduct a computer vision-based classification study of COVID-19 pathology. Second, the WE-ELM proposed in this paper can be applied to other computer vision-based recognition tasks, such as emotion recognition and gesture recognition. Third, we will continue to collect more data and test deep learning methods when the amount of data is sufficient.