Quality Analysis of Extreme Learning Machine based on Cuckoo Search and Invasive Weed Optimization

This paper presents an optimization-driven Extreme Learning Machine (ELM) strategy built on a feed-forward neural network (FFNN) for data classification. We propose a hybrid optimization-based ELM, named CSIWO ELM, which combines a modified Cuckoo Search with a modified Invasive Weed Optimization to optimize the input weights and hidden neurons. This intelligent hybrid optimization method searches for the optimal solution, and we measure the quality of that solution through linear function-based, objective function-based, and optimization function-based analysis. The experimental results presented in this paper show the feasibility and effectiveness of the developed CSIWO ELM method, with encouraging performance compared with other ELM methods.


Introduction
Extreme learning machines are feed-forward neural networks having a single layer or multiple layers of hidden nodes for classification, regression, clustering, sparse approximation, compression, and feature learning, where the hidden node parameters do not need to be modified. These hidden nodes might be assigned at random and never updated, or they can be inherited from their predecessors and never modified. In most cases, the weights of hidden nodes are usually learned in a single step which essentially results in a fast-learning scheme. These models are capable of producing good generalization performance and learning thousands of times quicker than backpropagation networks. These models can also outperform support vector machines in classification and regression applications, according to the research [1].
Extreme learning machine (ELM) [2] is an extremely efficient algorithm for training a single-hidden-layer feed-forward neural network. Because only the output-layer weights are learned, ELM tends to have a faster training process than back-propagation-based neural networks and support vector machines [6], while achieving good generalization. Traditional ELM and the Support Vector Machine (SVM) give equal importance to all samples, leading to results biased towards the majority class. Many ELM variants, such as Weighted ELM (WELM) and Boosting WELM (BWELM), are designed to solve the class imbalance problem effectively [7].
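The single-step training idea can be sketched in a few lines of Python (an illustrative toy example, not the paper's implementation; the network size, activation, and data are arbitrary choices):

```python
import numpy as np

# Minimal single-hidden-layer ELM sketch. Hidden weights and biases are drawn
# once at random and never updated; only the output weights are solved for in
# a single least-squares step via the Moore-Penrose pseudo-inverse.

def elm_train(X, T, n_hidden=30, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1, 1, (X.shape[1], n_hidden))   # random input weights
    e = rng.uniform(-1, 1, n_hidden)                 # random hidden biases
    Q = np.tanh(X @ A + e)                           # hidden-layer output matrix
    rho = np.linalg.pinv(Q) @ T                      # output weights in one step
    return A, e, rho

def elm_predict(X, A, e, rho):
    return np.tanh(X @ A + e) @ rho

# Toy usage: fit XOR-like targets exactly with enough hidden nodes.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
A, e, rho = elm_train(X, T)
pred = elm_predict(X, A, e, rho)
```

With more hidden nodes than samples, the pseudo-inverse yields the minimum-norm exact interpolant, which is why the toy predictions match the targets closely.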
Many real-life problems can be described as imbalanced classification problems, where the number of samples belonging to one of the classes is heavily outnumbered by the other classes. The samples with the larger and smaller class proportions are referred to as the majority and the minority class, respectively [7]. ELM randomly initializes the weights between the input and the hidden layer. These random weights are used to map the input data to the feature space, and they remain unaltered during the training phase. Due to this, some samples, usually those at the classification boundary, are misclassified in certain realizations. The methods available for imbalanced classification problems [8] can be broadly categorized as data-level methods, algorithmic-level methods, and cost-sensitive methods. The data-level methods, such as oversampling and under-sampling [9][10], alter the data space to reduce the impact of the class imbalance. The under-sampling method randomly selects a fraction of the majority class examples and balances the data distribution at the cost of information loss; for example, the EasyEnsemble and BalanceCascade [11] algorithms employ under-sampling to balance the dataset. The oversampling method randomly duplicates examples of the minority classes in order to increase the cardinality of the minority class, which may lead to over-fitting. The informed oversampling approach, on the other hand, generates synthetic minority class examples to balance the class distribution, as in the synthetic minority over-sampling technique (SMOTE) [12]. The algorithmic-level methods [13][14] directly modify the classifier design to address imbalanced learning. The cost-sensitive methods assign a higher penalty for misclassifying minority class examples than majority class examples; that is, misclassification of minority class examples bears more cost [15].
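As a minimal illustration of the data-level approach, naive random oversampling can be sketched as follows (a simplified example; SMOTE and the ensemble methods cited above are more sophisticated):

```python
import random

# Naive random oversampling: minority-class examples are duplicated at random
# until every class has the same cardinality as the majority class. Note the
# over-fitting risk mentioned in the text, since duplicates carry no new info.

def random_oversample(samples, labels, seed=42):
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(v) for v in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy imbalanced data: six majority-class and two minority-class examples.
X = [[0], [1], [2], [3], [4], [5], [6], [7]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
Xb, yb = random_oversample(X, y)
```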
The major bottleneck of ELM is the random selection of hidden nodes and input weights, which hampers its performance. To address this, our CSIWO ELM model provides a method to optimize the hidden neurons and input weights. The results that follow show the effectiveness of the hybrid CSIWO ELM approach for improving data classification [16][17][18].
We propose the new classification algorithm CSIWO ELM, which gives better classification on the heart disease dataset, and present a quality analysis of the newly developed algorithm. The results that follow show that the developed algorithm gives good classification results.
The structure of the work is as follows: we modify Cuckoo Search and Invasive Weed Optimization to select optimized input weights and hidden biases, which are then passed to the ELM FFNN to improve the classification result. We analyse the quality of the classification result using linear function-based, objective function-based, and optimization function-based analysis on the Cleveland, Switzerland, and Hungarian datasets. The major contributions of this paper are:
• An improved CSIWO ELM method introduced to optimize the hidden nodes and input weights.
• Formulation of the mathematical equations of the proposed method.
• Presentation of results showing the effectiveness of the proposed model.
• Statistical analysis of the proposed model, which validates the results.
• Quality analysis of the proposed model.

Modified CSIWO ELM-based FFNN Approach for Data Classification
This section describes the developed CSIWO ELM-based FFNN for the classification of data. The approach involves three phases: pre-processing, feature selection, and classification. First, the input data is subjected to a pre-processing phase for missing-value imputation, and the transformation is done using an exponential kernel. Then, feature selection is carried out using the Jaro-Winkler distance to select the significant features for classifying the medical data. The selected features are then provided to the FFNN for classification. Here, the proposed modified CSIWO is designed by combining the CS and IWO algorithms; the optimal solution for performing the classification is obtained using the FFNN. Figure 1 depicts the block diagram of the developed modified CSIWO ELM-based FFNN for data classification.

Consider the database used for obtaining the input data as H, represented as

H = {H_mn}; 1 ≤ m ≤ A, 1 ≤ n ≤ B

where the total number of data records and the number of features are denoted as A and B, respectively; the dataset is thus of size [A × B], and H_mn denotes the m-th data record in the n-th feature. The exponential kernel transformation applied to each element is given by

P_H(m, n) = R · exp(−H_mn / d)

where d and R represent the smoothing factor and a constant, respectively. The exponential kernel transformation improves the data classification process by improving the computation speed and simplifying the computation [4][5][6][7]. The output data from the pre-processing phase is denoted as P_H.

The Jaro-Winkler similarity between two features h_1 and h_2 is expressed as

JW(h_1, h_2) = f + β λ (1 − f), with f = (1/3) [ S/k_1 + S/k_2 + (S − W)/S ]

where the Jaro similarity metric and the number of matching sequences are represented as f and S, respectively; the number of transpositions and the common-prefix length are denoted as W and β, respectively; the lengths of the two sequences are k_1 and k_2; and the scaling factor is λ. The suitable features for the data selection process are compared using the Jaro-Winkler distance.
The output data obtained from the feature selection process is represented as, W = {W 1 , W 2 , . . . , W l }. For further processing, the selected features are provided to the data classification module.
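A plain-Python sketch of the Jaro-Winkler similarity described above (the standard formulation; the symbol names S, W, beta, and lambda follow the text, while the 0.1 default for the scaling factor and the 4-character prefix cap are common conventions assumed here):

```python
# Jaro-Winkler similarity: S = matching characters, W = transpositions,
# beta = common-prefix length (capped at 4), lam = scaling factor.

def jaro(s1, s2):
    k1, k2 = len(s1), len(s2)
    if k1 == 0 and k2 == 0:
        return 1.0
    window = max(max(k1, k2) // 2 - 1, 0)   # matching window
    m1, m2 = [False] * k1, [False] * k2
    S = 0                                   # number of matching characters
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(k2, i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                S += 1
                break
    if S == 0:
        return 0.0
    t, j = 0, 0                             # count half-transpositions
    for i in range(k1):
        if m1[i]:
            while not m2[j]:
                j += 1
            if s1[i] != s2[j]:
                t += 1
            j += 1
    W = t / 2                               # transpositions
    return (S / k1 + S / k2 + (S - W) / S) / 3

def jaro_winkler(s1, s2, lam=0.1):
    f = jaro(s1, s2)
    beta = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        beta += 1
    return f + beta * lam * (1 - f)
```

For the classic example "MARTHA" vs "MARHTA", all six characters match with one transposition, giving a Jaro score of 0.9444 and a Jaro-Winkler score of 0.9611.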

Data classification using modified CSIWO ELM-based FFNN
This section explains the developed modified CSIWO ELM-based FFNN method used for the classification of data. The FFNN is trained using CSIWO ELM; the input to the FFNN classifier is the output data obtained from the feature selection process. The developed CSIWO ELM method integrates ELM with a hybrid optimization algorithm: CSIWO is developed by integrating the CS algorithm into IWO. The CS algorithm helps to optimize the hidden neurons and input weights of the ELM. In CS, pairwise competitions between particles are randomly selected from the swarm; the winner of a competition is carried forward to the next generation, whereas the loser is updated before being transferred to the next generation. CS handles high-dimensional problems well [19][20][21]. IWO depends on the colonization behaviour of invasive weeds.
For a network with z hidden nodes, the ELM models the output as

Σ_{j=1}^{z} ρ_j g(a_j · x_i + e_j) = o_i, i = 1, …, A

where the weight vector connecting the input nodes with the j-th hidden node is represented as a_j = [a_j1, a_j2, …, a_jy]^T, the bias of the j-th hidden node is e_j, and the set of values ρ_j connects the hidden nodes with the output nodes. The minimum-norm least-squares solution for the output weight matrix ρ is

ρ = Q⁺ E*

where Q is the output matrix of the hidden layer, Q⁺ denotes its Moore-Penrose (MP) pseudo-inverse, and E* is the target output. To ensure invertibility, the regularized ELM determines the output weight as

ρ = (I/C + Qᵀ Q)⁻¹ Qᵀ E*

where I and C represent the identity matrix and the regularization constant, respectively. The FFNN classifier is trained for the classification of data using the optimal solution provided by the developed modified CSIWO algorithm.
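The regularized output-weight solve can be sketched directly from the expression above (illustrative only; the hidden-layer matrix Q, the targets, and the regularization constant C are made-up values):

```python
import numpy as np

# Regularized ELM output weights: rho = (I/C + Q^T Q)^{-1} Q^T E*,
# where Q is the hidden-layer output matrix, I the identity, and C the
# regularization constant. Adding I/C keeps the system invertible.

def regularized_output_weights(Q, targets, C=1e3):
    n_hidden = Q.shape[1]
    return np.linalg.solve(np.eye(n_hidden) / C + Q.T @ Q, Q.T @ targets)

rng = np.random.default_rng(1)
Q = np.tanh(rng.uniform(-1, 1, (50, 10)))   # 50 samples, 10 hidden nodes
E_star = rng.uniform(0, 1, (50, 2))         # 2 output nodes
rho = regularized_output_weights(Q, E_star)
```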
Training process of FFNN using the introduced modified CSIWO ELM. The training process of the FFNN based on the devised modified CSIWO approach is explicated in this section, and its algorithmic steps are represented below [22][23][24][25].
1. Modified CS algorithm
The modified CS approach is explicated in the following sections.
Algorithm 1: Modified Cuckoo Search
1  Initialize the population of j host nests
2  while (stopping criterion is not met)
3      Randomly get a cuckoo (say g) by Levy flights using equation (16)
4      Compute its fitness value F_g using the fitness function expression (13)
5      Select a nest amongst j (say u) arbitrarily
6      if F_g > F_u
7          Replace u by the new solution
8      end
9      Abandon a fraction of the worse nests and generate new ones at new locations by Levy flights
10     Keep the optimal solutions; rank the solutions and find the current optimum
11 end while

Algorithm 1 presents the modified cuckoo search approach and shows how the modification improves the performance of the model.
The steps of the CS algorithm are as follows. i) Initialization: The population of the CS approach is initialized arbitrarily. The elements optimized in a particle are the input weights and hidden biases; a candidate ELM is thus specified through its hidden biases and input weights, and the elements are initialized within the range [−1, 1]. The solution G(δ) at iteration δ is drawn from the feasible solution set J. The population of V particles is randomly initialized and updated.
ii) Fitness function computation: The fitness measure is estimated for finding the optimal solution; it should be minimal, and is calculated by the following expression.
where LF indicates the loss function, χ implies the fitness function, ψ denotes the Mean Square Error (MSE), B signifies accuracy, and µ refers to entropy. Accuracy is a closeness measure, calculated as

B = (M + N) / (M + N + O + P)

where M is the number of true positives, N the true negatives, O the false positives, and P the false negatives.
In addition, the MSE is calculated by

ψ = (1/q) Σ_{c=1}^{q} (E*_c − E_c)²

where q indicates the total number of samples, E*_c refers to the target output, and E_c denotes the final output of the FFNN for the c-th sample.
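The accuracy and MSE components of the fitness measure can be written directly from their definitions (the confusion counts and outputs below are toy values, purely for illustration):

```python
# Accuracy B from confusion counts (M, N, O, P = TP, TN, FP, FN)
# and mean square error psi between target and network outputs.

def accuracy(M, N, O, P):
    return (M + N) / (M + N + O + P)

def mse(targets, outputs):
    q = len(targets)
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / q

B = accuracy(M=40, N=45, O=10, P=5)            # 85 of 100 correct
psi = mse([1.0, 0.0, 1.0], [0.9, 0.2, 0.8])    # small squared errors
```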
iii) Computation of the update equation: Here, CS and IWO are modified to introduce the modified CSIWO model. The CS approach mimics the brood characteristics of a specific cuckoo species. The position update equation in the CS method is

G_h^{δ+1} = G_h^δ + κ ⊕ Levy(λ)

where G_h^δ denotes the solution at the current iteration and ⊕ symbolizes entry-wise multiplication. Here, the step size κ is made iterative. The distance w is given by

w = r_{m_x} − r_{m_y}    (18)

where r_{m_x} is the fitness of solution m_x, which has the best value, and r_{m_y} denotes the fitness of solution m_y, which has the second-best value. Additionally, Levy flights are significant for providing a random walk whose random steps are drawn from a Levy distribution, allowing occasional large steps.
The above equation gives the Levy flight step. iv) Re-evaluation of the fitness function: The fitness measure is computed in each iteration, and the solution with the best fitness value is taken as the optimal solution. v) Termination: The above processes are repeated until the best optimal solution is obtained.
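One Levy-flight position update can be sketched as below. This is a hedged illustration using Mantegna's algorithm for generating Levy steps; the step scale kappa, the exponent beta, and the use of the distance to the best solution are assumptions, since the paper's exact equation (16) is not reproduced here:

```python
import math, random

# Mantegna's algorithm: a Levy-distributed step is obtained as u / |v|^(1/beta)
# with u ~ N(0, sigma^2) and v ~ N(0, 1), where sigma depends on beta.

def levy_step(beta=1.5, rng=random):
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_update(position, best, kappa=0.01, rng=random):
    # G^(delta+1) = G^delta + kappa * Levy(beta) * (G^delta - G_best), entry-wise
    return [g + kappa * levy_step(rng=rng) * (g - b)
            for g, b in zip(position, best)]

random.seed(0)
new_pos = cuckoo_update([0.5, -0.2, 0.8], [0.4, 0.0, 0.7])
```

The heavy-tailed Levy distribution occasionally produces very large steps, which is what lets CS escape local optima in high-dimensional searches.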
2. Modified IWO algorithm
The general IWO model is a population-driven optimization approach motivated by the characteristics of weed colonies. Here, instead of the standard IWO method, an improved IWO technique is employed. i) Initialization: First, the population of plants is formulated by means of a chaotic mapping, specified such that d signifies the total number of solutions and G_e denotes the e-th solution.
ii) Calculation of the fitness function: The fitness measure is estimated to identify the optimum solution. The solution whose fitness function has the minimal value is taken as the best solution, as already expressed above in the fitness function equation.
iii) Update the solution: After the computation of the fitness function, the solution update is performed using the modified IWO technique. The standard update equation of the improved IWO for obtaining the best position is expressed in terms of G_h^{δ+1}, the new weed for the h-th solution at iteration δ + 1; G_best, the best weed in the entire population; and γ(δ), the standard deviation at iteration δ.
Here, g(δ) is equal to the chaotic mapping at the δ-th iteration.
Here, the modulation index k is modified such that k_max indicates the maximal value of k found until the current iteration, k_l represents the current iteration, and w specifies the distance already defined in equation (18) above.
iv) Verify feasibility of the solution: The optimum solution is computed using the fitness measure, and if a new solution is better than the previous solution, the value is updated with the new one. v) Termination: The above processes continue until the best solution is obtained.

Algorithm 2: Modified Invasive Weed Optimization
1  Initialize the population of plants
2  Estimate the fitness function based on the fitness function equation (13)
3  while (h < maximum number of weeds)
4      Update the solution by means of the update expression
5  end while
6  Verify the feasibility of the solution
7  Replace the optimal solution
8  End

Algorithm 2 shows the modified approach of invasive weed optimization. This modification helps in improving model performance [36][37][38][39][40].
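The modified IWO loop can be sketched as follows (illustrative only: the logistic map stands in for the unspecified chaotic mapping, the sphere function replaces the ELM fitness, and the seed counts and standard-deviation schedule are arbitrary choices):

```python
import random

# Modified IWO sketch: chaotic (logistic-map) initialization, then each weed
# spreads Gaussian-distributed seeds with a decaying standard deviation, and
# the fittest weeds (minimal fitness) survive to the next iteration.

def logistic_map_init(n, dim, x0=0.7):
    x, pop = x0, []
    for _ in range(n):
        sol = []
        for _ in range(dim):
            x = 4.0 * x * (1.0 - x)      # chaotic logistic mapping in (0, 1)
            sol.append(2.0 * x - 1.0)    # rescale to [-1, 1]
        pop.append(sol)
    return pop

def iwo_step(pop, fitness, sigma, seeds=3, keep=5, rng=random):
    offspring = []
    for weed in pop:
        for _ in range(seeds):
            offspring.append([g + rng.gauss(0, sigma) for g in weed])
    merged = pop + offspring             # parents kept, so best never worsens
    merged.sort(key=fitness)             # minimal fitness is best
    return merged[:keep]

random.seed(1)
sphere = lambda s: sum(g * g for g in s)     # stand-in fitness function
pop = logistic_map_init(5, dim=2)
for it in range(30):
    sigma = 0.5 * (1 - it / 30) ** 2 + 0.01  # decaying standard deviation
    pop = iwo_step(pop, sphere, sigma)
best = pop[0]
```

Because parents are retained in the selection pool, the best fitness is monotonically non-increasing over iterations.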

Result and Discussion
This section describes the results and discussion of proposed modified CSIWO ELM-based FFNN with respect to evaluation metrics by varying the training data percentage.

Experimental setup
The experimentation of the proposed modified CSIWO ELM-based FFNN approach is carried out in Python. The dataset utilized for the demonstration is the heart disease dataset, which consists of the Cleveland, Switzerland, and Hungarian datasets from the UCI machine learning repository. This is a multivariate dataset mainly used for classification tasks. It consists of 76 attributes, out of which only 14 are used. The presence of heart disease is represented using an integer from 0 to 4. The number of instances used in the proposed modified CSIWO ELM-based FFNN approach is 303.
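The missing-value imputation step of pre-processing can be sketched as below (the UCI heart-disease files mark missing entries with '?'; the three rows here are made-up illustration values, not real dataset records):

```python
import csv, io

# Column-wise mean imputation: every '?' entry is replaced by the mean of the
# known values in its column.

def impute_mean(rows):
    cols = list(zip(*rows))
    out = []
    for col in cols:
        known = [float(v) for v in col if v != '?']
        mean = sum(known) / len(known)
        out.append([float(v) if v != '?' else mean for v in col])
    return [list(r) for r in zip(*out)]   # transpose back to rows

sample = "63,1,145\n67,?,160\n41,0,?\n"
rows = list(csv.reader(io.StringIO(sample)))
clean = impute_mean(rows)
```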

Comparative analysis
This section explains the comparative analysis of the proposed modified CSIWO ELM-based FFNN based on three functions, namely linear, objective, and optimization functions, using the three datasets. The techniques employed for comparison are cuckoo search + FFNN, chaotic IWO + FFNN, IWO-CS ELM-exponential, CSIWO ELM-based FFNN-tangential, and modified CS + FFNN-exponential.
Linear function-based analysis: We can analyse the model by first representing it as a linear function and then interpreting the components of that function, provided we can determine its initial value and rate of change. The initial value, a term typically used in applications of functions, is the starting point of the relationship described by the function; for linear functions, it is typically the y-intercept. This analysis indicates the quality of the solution [41][42].
This section illustrates the analysis of the proposed modified CSIWO ELM-based FFNN based on linear functions in terms of performance measures using the three datasets. The linear activation function, also referred to as the identity function, is one in which the activation is directly proportional to the input. Linear functions have two major limitations. First, backpropagation cannot be employed, since the derivative of the function is a constant and carries no information about the input. Second, all layers of the neural network collapse into one if a linear activation function is used; specifically, a linear function reduces the neural network to a single layer. Here, the linear activation is analysed based on two types of function, exponential and tangential [43].
(i) Analysis using the Cleveland dataset: Figure 2 represents the analysis of the proposed modified CSIWO ELM-based FFNN in terms of accuracy. When the training data = 90%, the accuracy obtained by the proposed modified CSIWO ELM-based FFNN is 0.947, whereas the existing approaches attain accuracies of 0.883 for cuckoo search + FFNN, 0.898 for chaotic IWO + FFNN, 0.901 for modified CS + FFNN, 0.913 for modified chaotic IWO + FFNN, 0.929 for IWO-CS ELM, and 0.938 for CSIWO ELM-based FFNN. The proposed scheme thus shows performance enhancements over the conventional techniques of 6.715% over cuckoo search + FFNN, 5.175% over chaotic IWO + FFNN, 4.808% over modified CS + FFNN, 3.601% over modified chaotic IWO + FFNN, 1.854% over IWO-CS ELM, and 0.925% over CSIWO ELM-based FFNN. In a further comparison, the proposed scheme shows enhancements of 7.348% over cuckoo search + FFNN, 5.794% over chaotic IWO + FFNN, 4.567% over modified CS + FFNN, 3.397% over modified chaotic IWO + FFNN, 2.402% over IWO-CS ELM, and 1.495% over CSIWO ELM-based FFNN. In the same way, the sensitivity and specificity based on the linear function for the Cleveland dataset are calculated and plotted; the values are given in Table 1.
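The percentage enhancements quoted in this section can be computed as a relative gain. The sketch below assumes the improvement is taken relative to the proposed model's accuracy, which reproduces the reported figures approximately but not exactly, so the paper may use a slightly different convention (e.g. averaging over several runs):

```python
# Relative performance gain of the proposed accuracy over each baseline,
# using the 90%-training Cleveland accuracies quoted above.

def improvement(proposed, baseline):
    return (proposed - baseline) / proposed * 100.0

gains = {name: round(improvement(0.947, acc), 3)
         for name, acc in [("cuckoo search + FFNN", 0.883),
                           ("chaotic IWO + FFNN", 0.898),
                           ("IWO-CS ELM", 0.929)]}
# gains["cuckoo search + FFNN"] is about 6.76, close to the reported 6.715%.
```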
(ii) Analysis using the Switzerland dataset: Figure 4 shows the comparative analysis of the proposed modified CSIWO ELM-based FFNN in terms of accuracy using the Switzerland dataset. Considering a training data percentage of 90%, the accuracy obtained by the proposed modified CSIWO ELM-based FFNN is 0.957. The results show performance improvements of the proposed scheme over the existing methods of 6.498% over cuckoo search + FFNN, 5.399% over chaotic IWO + FFNN, and 3.981% over modified CS + FFNN (further values are listed in Table 1). In a further comparison, the proposed scheme shows enhancements of 6.592% over cuckoo search + FFNN, 5.863% over chaotic IWO + FFNN, 4.893% over modified CS + FFNN, 2.761% over modified chaotic IWO + FFNN, 2.670% over IWO-CS ELM, and 0.896% over CSIWO ELM-based FFNN. Similarly, we have calculated the sensitivity and specificity of the Switzerland dataset and, accordingly, drawn the accuracy, sensitivity, specificity, and precision graphs for all the datasets, i.e., Cleveland, Switzerland, and Hungarian, for the linear function-based analysis.
Objective function-based analysis: The objective function may involve plugging a candidate solution into a model and evaluating it against a portion of the training dataset; the cost may be an error score, often called the loss of the model. In order to find the optimal solution, we need some way of measuring the quality of any solution. This function, taking data and model parameters as arguments, can be evaluated to return a number. Any given problem contains some parameters which can be changed; our goal is to find values for these parameters that optimize this number. In this paper, we analyse the quality of the solution using the objective function on the Cleveland, Switzerland, and Hungarian datasets.

Nilesh Rathod, Sunil Wankhade
The following results show that our proposed modified CSIWO ELM-based FFNN selects the optimal parameters and evaluates the model solution against the training dataset.
(i) Analysis using the Cleveland dataset: Figure 6 represents the analysis of the proposed modified CSIWO ELM-based FFNN in terms of accuracy. When the training data = 90%, the accuracy obtained by the proposed modified CSIWO ELM-based FFNN is 0.945, whereas the existing approaches attain accuracies of 0.885 for cuckoo search + FFNN, 0.896 for chaotic IWO + FFNN, 0.904 for modified CS + FFNN, 0.910 for modified chaotic IWO + FFNN, 0.933 for IWO-CS ELM, and 0.936 for CSIWO ELM-based FFNN. The proposed scheme thus shows performance enhancements over the conventional techniques of 6.285% over cuckoo search + FFNN, 5.188% over chaotic IWO + FFNN, 4.296% over modified CS + FFNN, 3.656% over modified chaotic IWO + FFNN, 1.274% over IWO-CS ELM, and 0.855% over CSIWO ELM-based FFNN. Figure 7 illustrates the analysis of the proposed modified CSIWO ELM-based FFNN with respect to precision. With 90% of the training data taken into consideration, the precision attained by the developed modified CSIWO ELM-based FFNN is 0.906, while the existing methods attain precisions of 0.839 for cuckoo search + FFNN, 0.847 for chaotic IWO + FFNN, 0.864 for modified CS + FFNN, 0.869 for modified chaotic IWO + FFNN, 0.878 for IWO-CS ELM, and 0.888 for CSIWO ELM-based FFNN. The proposed scheme thus shows performance improvements over the conventional techniques of 7.37% over cuckoo search + FFNN, 6.41% over chaotic IWO + FFNN, 4.61% over modified CS + FFNN, 4.07% over modified chaotic IWO + FFNN, 3.02% over IWO-CS ELM, and 1.96% over CSIWO ELM-based FFNN.
Similarly, we have calculated the sensitivity and specificity of the Switzerland and Hungarian datasets and, accordingly, drawn the accuracy, sensitivity, specificity, and precision graphs for all the datasets, i.e., Cleveland, Switzerland, and Hungarian, for the objective function-based analysis. All calculated values are reflected in Table 1.
Optimization function-based analysis: Optimization is the process of finding a set of inputs to an objective function that results in the maximum function evaluation. An optimization algorithm is used to find the values of the parameters that minimize the error of the function when mapping inputs to outputs. We perform the optimization function-based analysis on the Cleveland, Switzerland, and Hungarian datasets to check which set of inputs best fits the objective function to maximize the function evaluation. The proposed modified CSIWO ELM-based FFNN selects the optimal parameters, which minimizes the error and gives better results on all datasets.
Similarly, we have calculated the sensitivity and specificity of the Switzerland and Hungarian datasets and, accordingly, drawn the accuracy, sensitivity, specificity, and precision graphs for all the datasets, i.e., Cleveland, Switzerland, and Hungarian, for the optimization function-based analysis. All calculated values are reflected in Table 1.

Comparative discussion
Table 1 portrays the comparative discussion of the proposed modified CSIWO ELM-based FFNN with respect to the evaluation metrics using three datasets, namely the Cleveland, Switzerland, and Hungarian datasets. From the discussion, it is evident that the proposed modified CSIWO ELM-based FFNN achieves the maximum accuracy, sensitivity, and specificity among the compared methods.

Statistical analysis
Pair-wise statistical tests are performed to assess the algorithmic performance by comparing the various algorithms. Here, the statistical test is performed on the accuracy, sensitivity, and specificity values. Table 3 represents the pair-wise statistical test of the algorithms based on sensitivity. The methods considered for comparison are analysed against the proposed CSIWO ELM-based FFNN to find the performance deviation of the methods using the statistical test. In Table 3, the proposed CSIWO ELM-based FFNN rejects the null hypothesis, with a test value of 0.0254 (below the 0.05 significance level), when compared with the existing methods. Table 4 represents the pair-wise statistical test of the algorithms based on specificity. In Table 4, the proposed CSIWO ELM-based FFNN rejects the null hypothesis, with a test value of 0.0203, when compared with the existing methods.
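A paired t statistic of the kind used in these pair-wise tests can be computed as below (the two score lists are invented illustration data, not values from the paper's tables):

```python
import math
from statistics import mean, stdev

# Paired t statistic: t = mean(d) / (stdev(d) / sqrt(n)), where d holds the
# per-setting differences between two algorithms' scores.

def paired_t(a, b):
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n))

# Illustrative accuracies of a proposed method vs one baseline across settings.
proposed = [0.947, 0.957, 0.945, 0.951, 0.940]
baseline = [0.883, 0.901, 0.885, 0.890, 0.879]
t = paired_t(proposed, baseline)   # large positive t: consistent improvement
```

The resulting t value would then be compared against the t distribution with n − 1 degrees of freedom to obtain a significance value.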

Conclusion and Future Scope
In this paper, a novel ELM method named Hybrid Intelligent Feed-Forward Neural Network Extreme Learning Machine based on modified Cuckoo Search and modified Invasive Weed Optimization (CSIWO-ELM) is proposed. The main feature of this model is the optimization of the hidden neurons and input weights: the modified CS and IWO select the input weights and hidden neurons that effectively give better results in the proposed model. The proposed CSIWO-ELM model is analysed using evaluation metrics for three functions, namely the linear-based, objective-based, and optimization-based functions, on three datasets. The developed modified CSIWO ELM-based FFNN achieves a maximum accuracy of 0.959, a maximum sensitivity of 0.920, and a maximum specificity of 0.940 for the Switzerland dataset based on the objective function. A statistical test is also performed on the model, which further demonstrates the performance of the proposed model.
A future improvement of the developed modified CSIWO ELM-based FFNN would be the inclusion of more advanced classifiers with other hybrid optimization methods for accurate classification of data. It would also be interesting to explore the use of ELM in deep learning techniques.