Integrating Metaheuristics and Two-Tiered Classification for Enhanced Fake News Detection with Feature Optimization

INTRODUCTION: The spread of false information persists despite the significant influence of social media on public opinion. This research presents a metaheuristic framework for detecting fake news. The method couples a hybrid metaheuristic RDAVA methodology with a Bi-LSTM classifier, leveraging the African Vulture Optimizer and the Red Deer Optimizer. OBJECTIVES: The objective of this study is to assess the effectiveness of the proposed model in identifying false material on social media by employing social network analysis tools to combat disinformation. METHODS: Using the BuzzFeed, FakeNewsNet, and ISOT datasets, the proposed model is implemented on the MATLAB platform and achieves accuracy rates of 97% on FakeNewsNet and 98% on BuzzFeed and ISOT. A comparative study with current models demonstrates its superiority. RESULTS: The proposed model outperforms previous models, reaching 98% accuracy on BuzzFeed and ISOT and 97% on FakeNewsNet. CONCLUSION: The proposed strategy shows promise in countering fake news on social media in the modern day. Its combination of social network analysis methods and metaheuristic techniques makes it a powerful instrument for identifying false news.


Introduction
The proliferation of false news on social media severely impacts every domain, so social media platforms should support identification schemes that distinguish fake news from genuine content. Social media relies on the Internet, but hoaxes and fake news existed long before the Internet [1] [2]. Fake news on the Internet is generally defined as "fictitious stories that have been purposefully created to deceive readers." News organizations and social media sites disseminate false information to attract readers or as a form of psychological warfare. However, distinguishing fake from genuine data on social media platforms is highly complicated due to the exponential proliferation of information [3] [4] [5]. As a result, social media is now contaminated with fake data, and a highly efficient model is required to detect fake news on social media sites. Detection models face severe complications due to a lack of tagged data identifying fake news [6]. Fake news can be broadly categorized as clickbait, misinformation, disinformation, hoaxes, satire, parody, deceptive news, rumours, and similar content [7] [8]. Fake news on social media is circulated many times over, leading to substantial detrimental effects on society due to misleading information [9] [10]. Fake news identification on social media is becoming a hot topic, as rapidly circulating data may contain misleading material [11] [12]. The spread of false information in today's media environment has become a prevalent issue with far-reaching effects on society. For instance, a great deal of misinformation spread extensively on social media platforms during the 2016 U.S. presidential election [48], affecting public views and influencing voting behavior. These events made clear how fake news may influence political discourse and jeopardize the democratic process. Likewise, in the field of public health, the dissemination of false information about COVID-19 therapies and vaccinations [49] has presented significant challenges for medical experts and decision-makers. Online conspiracies and false information have weakened attempts to contain the pandemic and increased vaccine mistrust. Thus, identifying false news has become inevitable and is receiving extreme attention from every field [13]. However, existing models cannot detect fake news well due to the complexity of identifying similar data [14] [15]. Better identification requires auxiliary information about individual social media contributions, because fake news primarily misguides individuals with false information. First, a detection model may perform well only on existing data but not on newly arrived data. Second, supplementary data may be complex because it is large, chaotic, sparse, and noisy. Therefore, detecting fake news on social media datasets is highly complicated. The contribution of this work is to analyze recent research on fake news detection and define the problem statement; to extract significant features from the pre-processed data for efficient data categorization (fake/genuine); to select the most relevant features among the extracted features with a new hybrid optimization model, reducing the computational complexity of the model; to develop a new two-level deep-learning classifier model for accurate data classification, with the hyper-parameters of the deep-learning model fine-tuned by the new hybrid optimization model; and, finally, to validate the performance efficiency of the projected model.

Research Objective
The primary objective of this study is to develop and apply a reliable model for identifying false information on social media websites. The study investigates how social media affects both individual and group viewpoints, with a particular emphasis on the spread of false and true information. The main research objectives of the proposed approach are:
1. Examine how individual and collective viewpoints are susceptible to the impact of social media, with a focus on the novel aspects of this influence.
2. Examine the particular difficulties brought about by the ease with which false information can proliferate on social media, highlighting the newness of these concerns.
3. Create a novel metaheuristic approach for identifying bogus news by combining African Vulture Optimization and Red Deer Optimization.
4. Utilize a unique hybrid RDAVA approach and a Bi-LSTM classifier for feature selection and two-level classification to improve accuracy.

Literature Review
The literature review section mainly provides different authors' approaches that helped classify or predict fake news.
The existing techniques, and their performance on the various datasets, are important criteria for developing novel approaches.
The semantic structure of text data can only be understood through Natural Language Processing (NLP) [45] [46] activities such as text classification, sentiment analysis, POS tagging, and machine translation. Through the analysis of syntactic structures and contextual information, these tasks aid in differentiating between various word senses. For example, text classification helps to discern between fake and real news by classifying documents according to their content. Sentiment analysis identifies the sentiments conveyed in text, and POS tagging distinguishes between word senses by giving words grammatical labels, improving the capacity to separate real from false news. NLP tasks including text categorization, sentiment analysis, and named entity recognition have been transformed by recent advances in deep learning, best demonstrated by transformer-based designs such as BERT [42] [43], GPT [44], and their variants. These architectures find complex linguistic patterns and semantic subtleties in text data by utilizing massive pre-trained language models and advanced attention mechanisms. Furthermore, neural network models augmented with attention mechanisms have proven remarkably effective at modelling long-range relationships and encoding contextual information, resulting in more precise predictions and a better comprehension of textual semantics. Our hybrid metaheuristic approach for enhanced fake news detection with feature optimization also draws insights from studies [38] [39] [40] [41] on distributed database optimization, proactive detection of online antisocial behavior, and crop yield prediction using deep learning and feature selection techniques.

Proposed Methodology
The online social media platform is a prevalent and powerful tool for transferring data to all users. However, the complex data environment of social media provides opportunities to spread false news in the form of deliberate, false, or misleading information; therefore, a highly enhanced model is required to identify fake news. Accordingly, a novel fake news detection model has been developed to detect fake news in social media networks. The proposed model's architecture is shown in Fig. 1. The dataset description is discussed in the following section. All these datasets must be analyzed to predict and identify fake news, as they contain a lot of junk data. Therefore, the datasets are processed with significant data pre-processing.

Pre-processing
Pre-processing helps to reduce the dimensionality of the collected raw data through steps such as sentence segmentation, tokenization, stopword removal, and word stemming, as discussed below in detail. Initially, the input sentences are separated using operators such as the colon (:), dot (.), and semicolon (;). White space and the tab character can also be used to divide sentences. Using these delimiters, each sentence is transformed into tokens. Then repeated words and stop words are identified and removed to improve clarity in classification. For instance, words such as "on," "in," "at," "thus," "too," "a," and "an" are stop words. These words do not carry specific meaning, so removing them does not impact performance. The stemming process then reduces each word to its base form. After these steps, feature extraction is performed; a detailed description of the feature extraction process is given in the following section.
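The pre-processing pipeline described above can be sketched in a few lines of Python. The stop-word list and the suffix-stripping rules below are illustrative assumptions, since the paper does not specify the exact lists or stemmer used.

```python
import re

# Illustrative stop-word list and suffix rules (assumptions, not the paper's).
STOP_WORDS = {"a", "an", "the", "on", "in", "at", "thus", "too", "and", "is"}
SUFFIXES = ("ing", "ed", "ly", "es", "s")

def segment_sentences(text):
    # Split on the delimiters mentioned in the text: . : ;
    return [s.strip() for s in re.split(r"[.:;]", text) if s.strip()]

def stem(word):
    # Crude suffix stripping standing in for a real stemmer (e.g. Porter).
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def preprocess(text):
    tokens = []
    for sentence in segment_sentences(text):
        for tok in sentence.lower().split():
            tok = re.sub(r"[^a-z0-9]", "", tok)   # drop punctuation
            if tok and tok not in STOP_WORDS:     # stop-word removal
                tokens.append(stem(tok))          # word stemming
    return tokens

tokens = preprocess("The reporters are reporting fake stories; readers believed them.")
# → ['reporter', 'are', 'report', 'fake', 'stori', 'reader', 'believ', 'them']
```

A real system would swap in an NLTK or spaCy tokenizer and stemmer, but the segmentation → tokenization → stop-word removal → stemming order is the same.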

Feature Extraction
Computational complexity is always higher when learning from high-dimensional data. The data may be associated with a large volume of phrases and words that make learning complex. In addition, learning performance may be degraded by redundant and irrelevant data. Therefore, an appropriate feature selection algorithm is required to reduce the feature set and overcome the ample dimensional feature space.
Initially, the Bag of Words (BoW) converts textual data into vectors of word frequencies or occurrences, providing a numerical representation of documents. This representation is then used in various NLP tasks such as text classification, sentiment analysis, and document clustering, by applying machine learning algorithms that operate on these vectorized representations to make predictions or extract insights from the text data. POS tags are then used for statistical NLP tasks, since they can discriminate between different word senses, which is highly useful for understanding texts and deriving semantic information from them. N-grams are incorporated to illustrate the context of the document and provide features for classifying it; several sets of n-gram frequency profiles are built from the training data to represent fake and truthful news articles. Finally, I-TF-IDF is used with the RDAVA algorithm for further processing, as clearly stated in the following section.
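The BoW and TF-IDF representations above can be illustrated with a minimal sketch; this is the classic tf·log(N/df) weighting, not the paper's improved variant, and the sample documents are invented for demonstration.

```python
import math
from collections import Counter

def bag_of_words(docs):
    # One Counter of term frequencies per document (the BoW representation).
    return [Counter(doc.lower().split()) for doc in docs]

def tf_idf(docs):
    # Classic TF-IDF: weight(t, d) = tf(t, d) * log(N / df(t)).
    bows = bag_of_words(docs)
    n = len(docs)
    df = Counter()
    for bow in bows:
        df.update(bow.keys())            # document frequency of each term
    return [
        {t: tf * math.log(n / df[t]) for t, tf in bow.items()}
        for bow in bows
    ]

docs = ["fake news spreads fast", "real news informs", "fake claims spread online"]
weights = tf_idf(docs)
# "news" appears in 2 of 3 documents, so it is down-weighted relative to
# "fast", which appears in only one.
```

Terms shared across many documents thus contribute little, while distinctive terms dominate the feature vector.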

I-TF-IDF using RDAVA
Improved Term Frequency-Inverse Document Frequency (I-TF-IDF) is based on evaluating a fitness function.
The Mean Absolute Difference (MAD) is taken as the fitness function and is assessed using the RDAVA algorithm. I-TF-IDF helps to select the more significant features: the TF-IDF weights are updated to emphasize the essential features according to

MAD_i = (1/t) Σ_{j=1}^{t} |x_ij − x̄_i|    (2)

where x_ij represents the feature value, in which "i" indexes a textual document and "j" a feature, t is the number of unique (selected) features, and x̄_i represents the mean of vector i. Word2Vec then generates vectors of the words that are distributed numerical representations of word features. These word features encapsulate the context of each word present in the sentence. Word embedding finally helps associate a word with other words of similar meaning through the created vectors. Feature selection with the RDAVA algorithm is then performed, as discussed in detail below.
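The MAD fitness can be computed directly from one feature's values across documents. Whether MAD is aggregated over features or over documents depends on the reading of Eq. (2), so treat the orientation here as an assumption; the code only illustrates that a feature whose values vary scores higher than a near-constant one.

```python
def mad_fitness(values):
    # Mean Absolute Difference: MAD = (1/n) * sum(|x - mean(x)|).
    # A higher MAD suggests a more discriminative feature.
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

# A feature that differs between fake and real documents scores higher
# than one that is nearly constant across the corpus.
varying  = [0.9, 0.1, 0.8, 0.2]   # ≈ 0.35
constant = [0.5, 0.5, 0.5, 0.5]   # 0.0
```

In the proposed pipeline this score would serve as the fitness value that RDAVA maximizes when selecting features.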

RDAVA-based Feature Selection
The pre-processed data, based on several word-level processes including TF-IDF, are considered. The TF-IDF-based feature extraction approach is further enhanced by incorporating the novel RDAVA algorithm, which merges the ideas of two different algorithms: Red Deer Optimization (RDO) and the African Vulture Optimization Algorithm (AVOA). RDO [47] exhibits an equilibrium between exploration and exploitation when pursuing optimal solutions; it was inspired by the red deer's foraging behavior. This behavior enables it to traverse solution spaces effectively, making it appropriate for a variety of optimization problems. RDO exhibits competitive convergence speed, solution quality, and scalability when measured against other metaheuristic algorithms such as Genetic Algorithms (GA) or Particle Swarm Optimization (PSO). For instance, RDO regularly demonstrates robust convergence to near-optimal solutions in optimization problems with high-dimensional search spaces. Its ability to adapt its search approach to the specifics of the problem also adds to its efficacy across optimization domains.
The following shows detailed information about the AVOA algorithm.

Background model of the AVOA
The AVOA algorithm is inspired by the behavior of African vultures, which execute specific strategies for hunting their prey [33]. The following rules describe the process of AVOA.
Rule 1: The population of African vultures is denoted as N, and D represents the position-space dimension of each vulture; D may vary with the problem. T is the maximum number of iterations, which helps to solve complex problems. Each vulture's position is indexed by "i" with 1 ≤ i ≤ N over iterations t (1 ≤ t ≤ T), as defined in Eq. (3). Rule 2: The entire population of vultures is divided into three categories: the first is the best feasible solution in the whole group, the second is the second-best solution, and the rest form the third group.
Rule 3: The vultures work together to hunt their prey by following the different leaders in the population.
Rule 4: The best and worst vultures in the population are identified by analyzing fitness values: the hungriest and weakest vulture is the worst, while the strongest vulture plays the leading role and is the best. In African vulture populations, the vultures always move toward the strongest vultures and stay away from the weakest. Based on the rules above, AVOA is developed as a five-stage execution strategy.
First Phase: This phase executes Rule 2 to group the population by quality, immediately after initialization. The first and second vultures are taken as the first- and second-best solutions, and the remaining vultures are gathered into the third group. Equation (4) determines toward which best vulture each vulture should move in the current iteration.
Here the fitness values are determined from the 1st-group and 2nd-group vultures, and m denotes the total number of vultures in the 1st and 2nd groups; the weakest members of the group represent the remaining vultures in the population. After this identification, the target vulture is selected by applying the significant parameters.

Second Phase: The Hunger of Vultures
The second phase models a hungry vulture, which lacks the strength to search for food at distant locations. A hungry vulture therefore behaves aggressively and stays close to other vultures. Exploration and exploitation can thus be scheduled based on the degree of hunger. Equation (6) determines the degree of hunger F_i of the i-th vulture at the t-th iteration.
Here rand1 is a random number in the range [0,1], z is a random number in the range [−1,1], and g is calculated by Eq. (5).
Here h is a random number in the range [−2,2] and k is the probability parameter of the exploitation stage: a larger k indicates the final optimization stage and a higher possibility of entering the exploration stage.
Conversely, a smaller k favors entering the exploitation phase. According to the formula, |F_i| decreases as the number of iterations increases. When |F_i| exceeds 1, the exploration process is executed and the vultures look for new locations to find food; conversely, when |F_i| falls below 1, exploitation is executed and food is sought near the current location.
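The satiation (degree-of-hunger) computation can be sketched as follows. The exact coefficients follow the standard AVOA formulation of Eqs. (5)-(6), which is an assumption here since the equations are not reproduced in the text; the random draws are passed in explicitly so the behavior is reproducible.

```python
import math

def satiation(t, T, rand1, z, h, k):
    # Degree of hunger F of a vulture at iteration t of T (reconstructed
    # Eqs. 5-6): rand1 in [0,1], z in [-1,1], h in [-2,2] are random draws;
    # k shapes the transition between exploration and exploitation.
    g = h * (math.sin(math.pi / 2 * t / T) ** k
             + math.cos(math.pi / 2 * t / T) - 1)
    return (2 * rand1 + 1) * z * (1 - t / T) + g

# |F| > 1 -> explore new locations; |F| < 1 -> exploit the current one.
early = satiation(t=1,  T=100, rand1=0.9, z=1.0, h=0.5, k=2)   # |F| > 1
late  = satiation(t=99, T=100, rand1=0.9, z=1.0, h=0.5, k=2)   # |F| < 1
```

As iterations accumulate, the (1 − t/T) term shrinks F toward zero, which is exactly the exploration-to-exploitation handover described above.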

Phase 3: Exploration Stage
AVOA provides two exploration strategies, selected with the parameter P1, which makes the decision at each step. P1 is initialized with a specific value between 0 and 1.
Here P_i(t+1) denotes the i-th vulture's position at the (t+1)-th iteration, and rand1, rand2, and rand3 are three random numbers in the range [0,1]. Equations (4) and (6) are respectively used to obtain R_i(t) and F_i. The upper and lower bounds of the problem solution are represented as ub and lb. D_i(t) is determined by Eq. (9), which gives the distance between the vulture and the current optimal vulture.
Here P_i(t) represents the i-th vulture's position at iteration t, and C is a uniformly distributed random number in the range [0,2].

Phase 4: Exploitation Stage
The contribution of phase 4 is vital to maintaining the balance between exploration and exploitation. When |F_i| lies between 0.5 and 1, the algorithm enters the medium-term exploitation phase, controlled by the parameter P2 with a range between 0 and 1. In this stage, P2 decides whether to execute rotating flight or food rivalry: a random number rand_p2 in [0,1] is generated before the vultures act. If rand_p2 ≥ P2, the food-competition process is executed; conversely, if rand_p2 < P2, the rotating flight is performed.
a) Food Competition: The strong vultures, with |F_i| between 0.5 and 1, keep the weakest vultures from sharing the food, but the weak ones compete with the strong to obtain it. This process is mathematically derived in Eq. (10):

P_i(t+1) = D_i(t) × (F_i + rand4) − d_i(t)    (10)

where D_i(t) is determined by Eq. (9), R_i(t) by Eq. (4), and rand4 is a uniformly distributed random number in the range [0,1]. Finally, d_i(t) is determined using Eq. (11).
b) Rotating Flight: When a vulture is full and energetic, it competes for food and hovers at high altitude. The vultures spiral in a rotating flight and update their positions according to Eq. (12).
Here S1 and S2 are mathematically defined in Eqs. (13) and (14), respectively.

Phase 5: Exploitation Stage (Later)
When |F_i| falls below 0.5, almost the entire vulture population has become full, but the two best vultures in the group grow weak and hungry after long activity. In this scenario, the vultures gather at the same location and attack their prey. This stage is controlled by the parameter P3, with a range between 0 and 1, so the vultures' behavior is either aggregation or attack. Upon entering this exploitation stage, a random number rand_p3 in [0,1] is generated before the vultures act. If rand_p3 ≤ P3, the attacking behavior is executed; conversely, if rand_p3 > P3, the aggregation behavior takes place.
a) Aggregation Behavior: In the late stage, the vultures can consume abundant food; many vultures gather at the food location and compete for it. Equation (13) is used to update the vulture's position.
Here Eqs. (14) and (15) are used to determine A1 and A2, respectively.
b) Attack Behavior: Similar to aggregation behavior, the attack behavior is executed until the last stage is reached, but here the vultures move to the food location to seize the remaining food, as determined by Eq. (16).
Here d_i(t) is based on Eq. (11), dim denotes the dimension of the problem, and Levy(·) denotes the Lévy flight, derived using Eq. (19), in which r1 and r2 are uniformly distributed random numbers in the range [0,1] and β is a constant with a value of 1.5; Eq. (18) determines σ.
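The phase dispatch described across Phases 2-5 can be summarized in one function. This is a simplified reading of the control flow only (the position-update equations are omitted), and the threshold directions follow the comparisons stated in the text.

```python
import random

def choose_phase(F, p2, p3, rng=random.random):
    # Dispatch between AVOA behaviors based on satiation |F| and the
    # thresholds P2/P3. rng is injectable so the choice is reproducible.
    if abs(F) >= 1:
        return "exploration"                      # look for new locations
    if abs(F) >= 0.5:                             # medium-term exploitation
        return "food competition" if rng() >= p2 else "rotating flight"
    # |F| < 0.5: late exploitation around the best vultures
    return "attack" if rng() <= p3 else "aggregation"
```

For example, `choose_phase(1.5, 0.5, 0.5)` always yields exploration, while small |F| values route into the aggregation/attack pair.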

Proposed RDAVA algorithm
The AVOA algorithm has strong exploration and exploitation strategies; however, it cannot prevent premature convergence because of an imbalance between the exploration and exploitation phases. This is a significant limitation of AVOA; therefore, a novel algorithm combining RDA and AVOA is developed, in which the RDA strategy enhances the performance of AVOA. Specifically, the RDA strategy determines the best neighboring vulture and uses complete historical vulture information to obtain a better optimal solution. As discussed, AVOA can fall into local optima because it lacks a mechanism for maintaining the individual historical data of vultures. Equation (7) is executed based on the current optimal solution and is used only when P1 ≥ rand_p1. However, AVOA-based approaches are then less effective at exploring unknown areas, which may lead to poor performance in the exploitation process. Therefore, RDAVA maintains a record of the historical optimal solution and updates the locations during exploration accordingly; thus, the exploration performance of RDAVA is much higher than that of traditional AVOA. RDAVA is executed in AVOA's exploration phase only if rand_p1 ≤ P1. In this case, the RD strategy is added to AVOA by improving Eq. (9). The formulation of RDAVA considers the stag's strategy of mating with the nearest hind: using the equation below, RDA [34] determines the stag's nearest hind in the j-th dimension.
Here d_i determines the distance between the current i-th hind and the stag, and the hind with the minimum value is selected. Similarly, based on Eq. (21), the best optimal vulture is obtained and its current information is stored for further processing. An optimal solution can thus be obtained while balancing exploration and exploitation and avoiding premature convergence. The proposed RDAVA selects the significant features, which are then transferred into the two-level architecture with Bi-LSTM for high-level classification. Table 2 shows the pseudocode of the RDAVA algorithm with TF-IDF.

Return the final best solution (the improved TF-IDF weights)
Return the best features
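The nearest-hind step that RDAVA borrows from RDA can be sketched as follows; the Euclidean distance and the variable names are illustrative assumptions consistent with Eq. (20)'s description.

```python
import math

def nearest_neighbor(stag, hinds):
    # RDA-style step reused by RDAVA: find the hind (candidate solution)
    # closest to the stag (current best) by Euclidean distance and return
    # its index, so its historical information can be stored and reused.
    def dist(hind):
        return math.sqrt(sum((h - s) ** 2 for h, s in zip(hind, stag)))
    return min(range(len(hinds)), key=lambda i: dist(hinds[i]))

best = [0.2, 0.8]
candidates = [[0.9, 0.1], [0.25, 0.75], [0.5, 0.5]]
idx = nearest_neighbor(best, candidates)   # index 1 is closest
```

In the hybrid algorithm, the selected neighbor's information supplements the AVOA exploration update, which is what keeps the search away from premature convergence.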

Two-level Classification with Bi-LSTM
The proposed two-level classification incorporates the RNN and RBM algorithms. The extracted features are passed into a two-level architecture: the feature vectors are first processed by the RNN and RBM models, and the resulting features are then transferred into the optimized Bi-LSTM for classification.

RNN-based Feature Extraction
Time-series data forms an interconnected flow in which individual predictions may produce false results if the model handles long-term dependencies inefficiently. In this scenario, an RNN provides better predictions by propagating the learned weights from the current cell to the next. It executes both forward and reverse flows, which distinguishes the RNN from other network models, specifically the Feedforward Neural Network (FNN). In an RNN, the hidden states are updated using a nonlinear activation function to predict the output: the weighted state is multiplied and fed back into the input of the neurons. The preceding timestamp significantly influences the current timestamp, which gives the network a memory mechanism, and this memory monitors the previously computed outcomes. As discussed, the RNN thus provides a better platform for predicting future data from previous data. The previous timestamp of the RNN is denoted "t−1," whose hidden-layer output flows toward the subsequent timestamp "t." The prediction model consists of weights and biases: the hidden-layer weights and biases are multiplied with the inputs and an appropriate activation function produces the expected outcome, through the input-hidden, hidden-hidden, and hidden-output connections. The biases of the hidden and output layers are denoted b_h and b_y, respectively, and the activation functions of the hidden and output layers are denoted f(·) and g(·). The hidden state h_t memorizes the network's state at time step "t" and captures the information of the previous time steps. However, the RNN has limitations with long-term sequential data. The proposed model uses RNN-based feature extraction, in which the RNN selects the significant sentences for the summary.
This model incorporates the input vectors, the RNN layers, and summary generation. For effective sentence summarization, the given text is converted into numeric values, i.e., a vector representation. Feature extraction operates on the pre-processed data, and the significant features are combined using the ⊕ concatenation operator. For instance, S_n is a sentence set with "n" sentences, where all the sub-sentences are joined with the concatenation operator; the textual features are padded with 0. The RNN network then processes the input feature vectors to generate a quality-enhanced summary. Furthermore, the sentence vectors are transferred into the hidden layer, where the learned weights and biases multiply the feature vectors. The RNN executes the back-propagation algorithm, which helps to retain useful features in the model and remove unnecessary ones.
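The recurrent update at the heart of this section is h_t = f(W_x·x_t + W_h·h_{t−1} + b_h). A minimal sketch, with tanh as the assumed activation and toy weights chosen purely for illustration:

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b_h):
    # One recurrent update: h_t = tanh(Wx @ x_t + Wh @ h_prev + b_h),
    # so the current timestamp carries memory of the preceding one.
    return [
        math.tanh(
            sum(w * x for w, x in zip(w_x[i], x_t))
            + sum(w * h for w, h in zip(w_h[i], h_prev))
            + b_h[i]
        )
        for i in range(len(b_h))
    ]

# Toy 2-unit hidden state driven by a 3-step sequence of 1-d inputs.
w_x = [[0.5], [-0.3]]
w_h = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]
h = [0.0, 0.0]
for x in ([1.0], [0.5], [-1.0]):
    h = rnn_step(x, h, w_x, w_h, b_h)   # h accumulates sequence context
```

After the loop, `h` is the fixed-size feature vector summarizing the whole sequence, which is what the proposed model feeds onward.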

RBM-based Feature Extraction
The RBM is an energy-based two-layer model with a joint probability function, incorporating a visible layer v = (v_1, v_2, …, v_m), where m is the number of visible units, and a hidden layer h = (h_1, h_2, …, h_n), where n is the number of hidden units. Each unit in the model is connected to the opposite layer, but there are no direct links between units within the same layer.
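Because of this restricted connectivity, each hidden unit's conditional probability depends only on the visible layer: P(h_j = 1 | v) = σ(b_j + Σ_i v_i·w_ij). A small sketch with invented weights:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_given_visible(v, W, b_h):
    # P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * w_ij): each hidden unit
    # is conditionally independent of the others given the visible layer.
    return [
        sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b_h))
    ]

v = [1, 0, 1]                                  # a binary visible vector
W = [[0.2, -0.5], [0.7, 0.1], [0.4, 0.3]]      # 3 visible x 2 hidden
b_h = [0.0, 0.0]
p = hidden_given_visible(v, W, b_h)
```

The symmetric expression P(v_i = 1 | h) = σ(a_i + Σ_j w_ij·h_j) governs the reverse direction, and together they make Gibbs-style feature extraction tractable.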

Bi-LSTM based classification
The LSTM model effectively processes long-term dependencies that plain RNNs cannot handle. The LSTM's cell state behaves like a conveyor belt: information flows straight down the cell state, and the LSTM adds or removes information only through carefully regulated gates. In the LSTM architecture, the network has several gates built from pointwise multiplication operators and sigmoid functions. The first step decides which information to discard; this decision is made by a sigmoid layer, the forget gate, which receives h_{t−1} and x_t and outputs a value between 0 and 1. The overall flow of information is governed by several such sigmoid values, formulated mathematically as shown below.
Here x represents the sample vector, k represents the number of classes, w is the weight vector, and j is the label. Moreover, the RDAVA algorithm is applied to the LSTM to achieve an optimized outcome: hyperparameters such as the mini-batch size, number of epochs, and momentum are considered in the tuning operation. A cross-entropy-based loss function is used to improve model performance; its mathematical form is shown below.
Here B represents the class label, y_i denotes the true class of the i-th sample, and ŷ_i the predicted outcome. Moreover, dropout is performed to prevent overfitting. Eventually, the classification results are obtained in terms of fake or real news.
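The softmax output layer and cross-entropy loss described here can be sketched concretely; the logits below are invented for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over k class scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, true_class):
    # The loss used for fine-tuning: -log of the probability the model
    # assigns to the correct class ("fake" or "real").
    return -math.log(probs[true_class])

logits = [2.0, 0.5]                       # scores for [fake, real]
probs = softmax(logits)
loss_correct = cross_entropy(probs, 0)    # confident and right: small loss
loss_wrong = cross_entropy(probs, 1)      # confident and wrong: large loss
```

Minimizing this loss over mini-batches (with the RDAVA-tuned hyperparameters) is what drives the two-level classifier toward the reported accuracy.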

Table 3: Pseudocode of RDAVA in BiLSTM
When RDAVA is included, Bi-LSTM models perform noticeably better on a variety of tasks, including time-series prediction, sentiment analysis, and sequence labelling. RDAVA enhances feature selection through the efficient identification of pertinent features, reduction of dimensionality, and improvement of the model's capacity to capture critical patterns. Furthermore, it facilitates the fine-tuning of hyperparameters, addressing concerns such as overfitting and underfitting, and optimizes the training procedure by guiding the search for optimal weights and biases. The combined use of RDAVA and Bi-LSTM thus results in accelerated convergence and enhanced solution quality.

Result and Discussion
This section describes the proposed model's performance based on performance metrics including accuracy, specificity, sensitivity, F-measure, and precision. These are computed from the error counts True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The proposed model is evaluated on the ISOT, BuzzFeed, and FakeNewsNet datasets. Each dataset is used individually with the proposed classifier to assess the performance of Feature Selection (FS) with RDAVA and of the two-level classification with the RDAVA-optimized Bi-LSTM.
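The evaluation metrics listed above follow directly from the confusion-matrix counts. A minimal sketch (the counts below are illustrative, not taken from the paper's tables):

```python
def metrics(tp, tn, fp, fn):
    # Standard metrics from the confusion-matrix counts used in the paper.
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall / true-positive rate
    specificity = tn / (tn + fp)
    f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f_measure

# Illustrative counts only: 1000 test articles, 20 misclassified.
acc, prec, sens, spec, f1 = metrics(tp=480, tn=500, fp=10, fn=10)
# acc == 0.98
```

Reporting specificity alongside sensitivity matters here because a fake-news detector that flags everything as fake would score high recall but near-zero specificity.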
In addition, Table 3 contains the comparison tables for feature selection with and without the RDAVA algorithm on the BuzzFeed, FakeNewsNet, and ISOT datasets. This comparison helps to determine the effect of improved FS on classification performance. In this analysis, the RDAVA algorithm achieves higher accuracy on BuzzFeed and ISOT, obtaining 98% on each, whereas FakeNewsNet with the RDAVA algorithm attains 97% accuracy, which is lower than on the other datasets.
The performance of the two-level classification with the RDAVA-based optimized Bi-LSTM (the proposed algorithm) is compared to other techniques, including Bi-LSTM, RNN+RBM, RNN, RBM, and ANN. As in the feature-selection-based evaluation, the classification performance is analyzed using the error-measure parameters TP, TN, FP, and FN. The BuzzFeed, FakeNewsNet, and ISOT datasets were employed to evaluate the classification performance of the proposed algorithm. Its performance on the three distinct datasets is tabulated separately to demonstrate that the proposed model is vastly superior to other comparable models. The comparative results of the proposed algorithm on ISOT, BuzzFeed, and FakeNewsNet are analyzed in Tables 4, 5, and 6, respectively.
To enhance the Bi-LSTM-based classification, RDAVA is also used for hyperparameter tuning. The proposed algorithm performs well in fake news prediction, achieving 98% accuracy on the BuzzFeed and ISOT datasets but 97% accuracy on the FakeNewsNet dataset. The two-level analyses, covering feature selection and classification, demonstrate this point. On the BuzzFeed and ISOT datasets, the RDAVA algorithm performs best for feature selection; similarly, the classification results show that RDAVA with the two-level classification and optimized Bi-LSTM is the most effective. The generalizability of the suggested system may be restricted by its dependence on particular datasets for both training and assessment: although it achieves good accuracy on the selected datasets (BuzzFeed, ISOT, FakeNewsNet), its performance may differ on datasets from other sources or with different characteristics.
In future work, the proposed model will be enhanced by considering a multimodal framework. Moreover, the feature selection model's efficiency will be further improved by incorporating a high-level feature selection approach. Future work will also execute a high-level, comprehensive performance evaluation, eventually selecting the best configuration for detecting fake news more effectively.

P(v, h) = (1/Z) exp(−E(v, h))    (26)

where Z is the partition function, used as the normalization constant. The energy function is derived as

E(v, h) = −a^T v − b^T h − h^T W v    (27)

where a and b respectively denote the biases of the visible and hidden layers, and W is the weight matrix with elements w_ij. As discussed earlier, the visible and hidden units are conditionally independent given the opposite layer, and the conditional probabilities are calculated from the energy function and the joint distribution via the activation function:

P(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij)
P(v_i = 1 | h) = σ(a_i + Σ_j w_ij h_j)

In the LSTM equations, f_t is the forget state, i_t the input state, C′_t the intermediate state, C_t the cell state, and o_t the output state; the complete output of the LSTM operation is the filtered cell state. The Bi-LSTM receives the feature vectors from the RNN and the RBM for classifying fake news. The received features are represented as x = x_1, x_2, ⋯, x_n, where x_i denotes the current word in the sentence. In the Bi-LSTM architecture, forward propagation processes the data in forward order, x_1 → x_n; similarly, backward propagation processes it in reverse order, x_n → x_1. The forward and backward passes are significant factors of the Bi-LSTM, because a traditional LSTM transmits data only from front to back and is less able to maintain context information in all cells. The Bi-LSTM thus enables the network to enrich the current word with semantic information by preserving context information in all cells. The feature vectors obtained from the RNN and the RBM are fused, using the concatenation operator ⨁, to achieve high-level classification performance. For instance, let the RNN-based feature vectors be F_R = f_1, f_2, …, f_n and the RBM features be F_B = g_1, g_2, …, g_m. The fused vectors F = F_R ⨁ F_B are transferred into the fully connected layer with the Softmax activation function, which computes the probability distribution

P(y = j | x) = exp(w_j^T x) / Σ_k exp(w_k^T x)

Several techniques have been developed to detect fake news based on several datasets. We collected research articles from 2019 to 2023 and reviewed 20 recent research articles based on datasets, performance, and techniques. The literature summary is tabulated in Table 1, which helps to understand all the techniques and datasets in the existing approaches.

Table 1. Summary of Literature

Dataset Description
The ISOT dataset is similar to the BuzzFeed dataset and is collected from news websites related to PolitiFact. The dataset has two files, True.csv and Fake.csv, each containing 12,600 articles, with the true articles drawn from reuters.com. The data mainly spans 2016 to 2017 and was obtained from a Kaggle dataset [36].