Research on the Performance of Text Mining and Processing in Power Grid Networks

This paper applies deep learning techniques to text mining for power grid networks, focusing on fundamental elements such as loss and activation functions. Through analysis and formulas, we explain how these functions contribute to deep learning. We also introduce major deep learning training models, including CNN and RNN, and provide visual aids to support understanding. To demonstrate the impact of various factors on deep learning training, we use controlled-variable experiments to analyze the influence of learning rate, batch size, and data noise on model training trends. While this paper covers the influence of hyperparameters and data noise, other factors such as CPU and memory frequency, as well as GPU performance, also play a crucial role in deep learning training. Therefore, continuous adjustment of these factors is necessary to achieve optimal training results for deep learning models in power grid networks.


Introduction
As a typical form of Internet of Things (IoT) networks [1][2][3], the power grid network is a critical infrastructure that provides electricity to consumers across various regions. Research on power grid networks focuses on improving the efficiency, reliability, and resilience of the network while also ensuring the safety of the users and the environment. Recent research in this field has explored the integration of renewable energy sources, such as wind and solar, into the power grid network. This requires the development of new technologies to manage and balance the fluctuating supply of renewable energy, which can affect the stability of the grid. Another area of research in power grid networks is the use of smart grid technologies. Smart grids incorporate advanced communication and control systems that enable real-time monitoring and optimization of the power grid network [4][5][6]. This allows for more efficient energy distribution, reduced energy waste, and improved grid reliability [7,8]. Research is also being conducted to address the cybersecurity threats to power grid networks. As power grid networks become more interconnected and reliant on digital technologies, the risk of cyber attacks increases. Developing effective cybersecurity measures is crucial to ensuring the safety and reliability of the power grid network.
* Corresponding author. Email: yuzhong_zhou@hotmail.com
Text mining plays an important role in power grid networks, where it supports system operation [9][10][11]. Text mining is a research field that focuses on using computational techniques to extract useful information and knowledge from large collections of textual data. This field encompasses a range of techniques and applications, including natural language processing (NLP), machine learning, and data mining. One area of active research in text mining is sentiment analysis, which aims to automatically identify the sentiment of a text, such as positive, negative, or neutral. This is important for applications such as opinion mining, customer feedback analysis, and social media monitoring, and researchers are developing new algorithms and techniques to improve its accuracy and effectiveness. Another area of research is topic modeling, which identifies latent topics in a collection of documents. This is useful for tasks such as information retrieval, document clustering, and trend analysis. Researchers are exploring new methods to improve the efficiency and scalability of topic modeling, such as distributed and parallel algorithms. Named entity recognition (NER) is another area of active research in text mining [12][13][14]. NER is the process of identifying and classifying named entities, such as people, organizations, and locations, in text. This is important for applications such as information extraction and text summarization. Researchers are developing new methods to improve the accuracy and robustness of NER, such as deep learning-based approaches. Finally, researchers in text mining are exploring new applications of the field, such as fake news detection, biomedical text mining, and social media analysis. These applications are becoming increasingly important in today's data-driven world, and researchers are developing new techniques to address the unique challenges posed by these domains. In short, current research in text mining focuses on developing new algorithms and techniques to extract useful information and knowledge from large collections of textual data.
Deep learning-based models have already surpassed classical machine learning-based approaches in various text mining fields [15][16][17]. One of the key steps in traditional machine learning is feature extraction, which has the disadvantage of relying mainly on human labor, and no single extraction method is universal. Deep learning does not rely on humans for feature extraction; features are extracted automatically by the model, which gives deep learning excellent expressiveness. In addition, deep learning is highly dependent on data: the larger the amount of data, the better it tends to perform, and its upper limit can sometimes be raised by tuning the parameters. In some text mining fields it has already exceeded human performance. It has a wide range of applications, such as question answering, spam filtering, and chatbots on electronic devices. The most popular model is the transformer, which is widely used in natural language processing (NLP) and underlies the well-known BERT and GPT-3 models. In recent years, a new algorithm for analyzing written language, named Neural Analysis of Sentiment (NaSent), has been proposed to better understand the emotions that flow between words.
Motivated by the above literature review, this paper applies deep learning techniques to text mining for power grid networks, focusing on fundamental elements such as loss and activation functions. Through images and formulas, we explain how these functions contribute to deep learning. We also introduce major deep learning training models, including CNN and RNN, and provide visual aids to support understanding. To demonstrate the impact of various factors on deep learning training, we use controlled-variable experiments to analyze the influence of learning rate, batch size, and data noise on model training trends. While this paper covers the influence of hyperparameters and data noise, other factors such as CPU and memory frequency, as well as GPU performance, also play a crucial role in deep learning training. Therefore, continuous adjustment of these factors is necessary to achieve optimal training results for deep learning models in power grid networks.

Basic Principle of Deep Learning
Text mining is theoretically based on statistics and computational linguistics. Deep learning-based text mining is a process that relies on information retrieval techniques to extract meaningful, implicit, and useful information from large amounts of text data. The basic principle is to transform text data into sequences that describe the content of the text, and then to use techniques such as classification and clustering to organise the text into groups and to discover new concepts and their relationships based on that structure.
The main steps of deep learning-based text mining are as follows. First, the dataset is created. Second, the dataset is scaled down to improve operational efficiency. Third, classification or clustering is performed to form the corresponding knowledge patterns. Fourth, the quality of the knowledge patterns is evaluated; depending on the result, one of the previous steps is modified and the mining is repeated. Here we take word2vec as an example to explain deep learning-based text mining and processing.
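The first two steps above (building a dataset and scaling it down) can be illustrated with a minimal sketch that turns raw text into integer sequences. The tokenizer, the `min_count` pruning rule, and the three example documents are hypothetical choices made for illustration only; they are not taken from the experiments in this paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocab(docs, min_count=1):
    """Map every token seen at least `min_count` times to an integer id.

    Id 0 is reserved for tokens that were pruned or never seen.
    Raising `min_count` shrinks the vocabulary, which is one simple way
    of scaling the dataset down.
    """
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i + 1 for i, tok in enumerate(kept)}


def encode(doc, vocab):
    """Transform a document into a sequence of integer token ids."""
    return [vocab.get(tok, 0) for tok in tokenize(doc)]


# Hypothetical maintenance-log snippets, used only as toy input.
docs = [
    "transformer oil temperature high",
    "breaker trip on feeder",
    "transformer breaker inspection",
]

vocab = build_vocab(docs, min_count=2)
sequences = [encode(d, vocab) for d in docs]
```

With `min_count=2`, only "breaker" and "transformer" survive pruning, so every other token is mapped to the reserved id 0. The resulting sequences are the kind of input that a later classification or clustering step would consume.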

Word2Vec
In Word2Vec, the CBOW (Continuous Bag of Words) and Skip-Gram models can quickly train word embeddings and compute word vectors, where the principles of CBOW and Skip-Gram are shown in Fig. 1 and Fig. 2, respectively. The concept of word embedding is that the discrete words of a text can be mapped (embedded) into another vector space. Most supervised machine learning models can be summarized as a functional form that maps an input to an output, and so can word2vec. The ultimate goal of word2vec is to obtain the word vector matrix after the model has been trained. As shown in Fig. 3, deep learning-based text mining uses a one-hot encoding as the input, which is multiplied by the weight matrix from the input layer to the hidden layer to obtain the N-dimensional word vector of the input word.
The N-dimensional word vector is then multiplied by the weight matrix from the hidden layer to the output layer to obtain a score for each candidate output word, with higher scores representing more likely predictions.
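The two matrix multiplications described above can be sketched in a few lines of plain Python. The vocabulary size `V`, the embedding dimension `N`, the random initial weights, and the final softmax normalization are assumptions made for illustration; the sketch shows only the forward pass, not training.

```python
import math
import random


def one_hot(index, size):
    """One-hot row vector with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(vec, mat):
    """Multiply a row vector (length R) by an R x C matrix."""
    cols = len(mat[0])
    return [sum(vec[r] * mat[r][c] for r in range(len(vec))) for c in range(cols)]


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
V, N = 5, 3  # hypothetical vocabulary size and word-vector dimension

# Weight matrix from input layer to hidden layer (V x N): one row per word.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
# Weight matrix from hidden layer to output layer (N x V).
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]

word_index = 2
x = one_hot(word_index, V)
hidden = vec_mat(x, W_in)        # the N-dimensional word vector of the input word
scores = vec_mat(hidden, W_out)  # one score per vocabulary word
probs = softmax(scores)          # scores normalized to a distribution
```

Because the input is one-hot, multiplying by `W_in` simply selects row `word_index` of that matrix. This is why the trained input weight matrix can be read directly as the word vector matrix.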

ACTIVATION FUNCTIONS
Activation functions are mathematical operations applied to the output of a neural network layer to introduce nonlinearity into the network. In deep learning, activation functions are used in almost every neural network layer to help the network learn complex representations of data. Several activation functions are commonly used in deep learning, including:
• Sigmoid function: This function maps any input to a value between 0 and 1, which is useful for binary classification problems, given by [4,18,19]
$$\sigma(x) = \frac{1}{1 + \exp(-x)}. \tag{4}$$
• ReLU function: The Rectified Linear Unit (ReLU) function maps any negative input to zero and any positive input to the input value. ReLU is popular because it is simple and computationally efficient, given by
$$\mathrm{ReLU}(x) = \max(0, x). \tag{5}$$
• Tanh function: The Hyperbolic Tangent (tanh) function maps any input to a value between -1 and 1, which gives zero-centered outputs, given by
$$\tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}. \tag{6}$$
• Softmax function: The Softmax function maps a vector of input values to a probability distribution over a set of classes, making it well suited to multi-class classification problems, given by
$$\mathrm{softmax}(x_i) = \frac{\exp(x_i)}{\sum_{j} \exp(x_j)}. \tag{7}$$
• Leaky ReLU function: Leaky ReLU is a variation of ReLU in which negative inputs are multiplied by a small positive slope instead of being set to zero, to prevent dead neurons in the network.
• ELU function: The Exponential Linear Unit (ELU) function is similar to Leaky ReLU, but it maps negative inputs to a small non-zero value through a scaled exponential rather than a linear slope.
• Swish function: The Swish function is a more recent activation function that is similar to ReLU but has a smoothly varying output, which can result in better performance in certain types of models.
Choosing the right activation function depends on the type of problem to be solved and the properties of the data. In this work, we choose the ReLU function as the activation function, without loss of generality.
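For reference, the activation functions listed in this section can be written as short scalar functions. This is a minimal sketch: the default slope of 0.01 for Leaky ReLU and the default `alpha` of 1.0 for ELU are common conventions assumed here, not values stated in this paper, and the Swish variant shown is the plain form x · sigmoid(x).

```python
import math


def sigmoid(x):
    """Map any real input to (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return max(0.0, x)


def tanh(x):
    """Map any real input to (-1, 1)."""
    return math.tanh(x)


def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope (assumed 0.01)."""
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    """Like ReLU for x > 0; alpha * (exp(x) - 1) for x <= 0 (alpha assumed 1.0)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def swish(x):
    """Smooth ReLU-like function: x * sigmoid(x)."""
    return x * sigmoid(x)


def softmax(xs):
    """Map a list of scores to a probability distribution."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Evaluating these functions at a few points, for example `relu(-2.0)` against `leaky_relu(-2.0)`, makes the difference in how each one treats negative inputs concrete.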

LOSS FUNCTIONS
The loss function is also a crucial part of deep learning model training. Generally speaking, the objective of a deep learning model is to reduce the error between the real value and the predicted value. The smaller the error, the closer the predicted result of the model is to the real value, and the better the model performance will be. The loss function plays an important role because it must faithfully reduce aspects of the model to a single number, and improvements in that number are used to measure improvements in model performance. The choice of loss function often determines whether the model can achieve the desired performance and carry out the required task. Here are two commonly used loss functions. One loss function is based on the mean squared error (MSE),
$$L_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{8}$$
where $y_i$ is the true label and $\hat{y}_i$ is the predicted label for the $i$-th sample, and $n$ is the number of samples.
Another typical form is based on the binary cross-entropy (BCE),
$$L_{\mathrm{BCE}} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right], \tag{9}$$
where $y_i$ is the true label (either 0 or 1) and $\hat{y}_i$ is the predicted probability that the $i$-th sample belongs to the positive class, and $n$ is the number of samples. Note that BCE loss is commonly used for binary classification problems. In this work, we adopt BCE as the loss function, without loss of generality.
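The two loss functions described above can be computed directly from their definitions. The sketch below is illustrative only; the clipping constant `eps`, which keeps the logarithm finite when a predicted probability is exactly 0 or 1, is an implementation assumption and not part of the definitions.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error over n samples."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def bce(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy for labels in {0, 1} and predicted probabilities.

    Probabilities are clipped to [eps, 1 - eps] so that log() stays finite.
    """
    n = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / n


# A confident correct prediction gives a small loss,
# while a confident wrong prediction gives a large one.
loss_good = bce([1, 0], [0.9, 0.1])
loss_bad = bce([1, 0], [0.1, 0.9])
```

An uninformative prediction of 0.5 for every sample yields a BCE of log 2, which is a useful reference point when reading loss curves.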

Evaluation Metrics and Influencing Factors of Training Results in Deep Learning
In this part, we investigate how the influencing factors affect the training results, which are measured by the following evaluation metrics:

$$\begin{aligned}
\mathrm{Acc} &= \frac{TP + TN}{N}, \\
\mathrm{Precision} &= \frac{TP}{TP + FP}, \\
\mathrm{Recall} &= \frac{TP}{TP + FN}, \\
\mathrm{F1\text{-}score} &= \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
\end{aligned} \tag{10}$$

where $TP$ denotes the number of samples that are correctly classified into the positive class, $TN$ is the number of samples that are correctly classified into the negative class, $FN$ is the number of positive samples that are misclassified as negative, $FP$ is the number of negative samples that are misclassified as positive, $N$ is the total number of samples, the F1-score is the harmonic mean of precision and recall, Acc is the fraction of all samples that are classified correctly, and Precision describes how many of all predicted positive cases are actually positive.
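As a worked illustration of these metrics, the sketch below counts TP, TN, FP, and FN from a pair of label lists and derives the metrics from the counts using their standard definitions. The label vectors are made-up toy data, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, where 1 is the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from the four counts."""
    n = tp + tn + fp + fn
    acc = (tp + tn) / n if n else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 4 positive and 4 negative samples.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
scores = classification_metrics(*counts)
```

For this toy input the counts are TP = 3, TN = 3, FP = 1, FN = 1, so accuracy, precision, recall, and F1-score all equal 0.75.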
In machine learning, the training loss is the error between the model's predictions on the training set and the true results, and it measures how well the neural network fits the training data. The valid loss is the error between the model's predictions on the validation set and the true results; it measures how far the model's behaviour on data not used for training is from the true situation, and can be regarded as an indicator of generalization ability. When both the training and valid loss are decreasing, the neural network is still learning, which is the best case. When the training loss is decreasing while the valid loss tends to be constant, the neural network is overfitting. When both the training and valid loss tend to be constant, learning has encountered some obstacle and the learning rate needs to be reduced. When both the training and valid loss [...]
The learning rate determines the step size of the parameter update in each iteration. If the learning rate is set too large, the step size of each parameter update will also become too large, which may lead to unstable oscillation. If the learning rate is set too small, the step size of each parameter update will also become too small, resulting in a slower training speed and an increase in the number of iterations required for training, which may even cause the model to fail to reach the optimal point or to fall into a local optimum prematurely. A suitable learning rate ensures that the model converges steadily to the optimal point during training while keeping the training speed acceptable. In general, the appropriate learning rate needs to be determined according to the specific problem and model structure.
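The effect of the learning rate on the update step size can be seen on a toy problem. The sketch below runs plain gradient descent on the one-dimensional function f(w) = w², whose minimum is at w = 0. The function, the starting point, and the three learning rates are hypothetical choices for illustration and are unrelated to the experiments reported in this paper.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 with a fixed learning rate; return all iterates."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w     # derivative of w**2
        w = w - lr * grad  # step size is proportional to the learning rate
        history.append(w)
    return history


too_large = gradient_descent(lr=1.1)    # overshoots: sign flips and |w| grows
too_small = gradient_descent(lr=0.001)  # barely moves in 20 steps
suitable = gradient_descent(lr=0.3)     # converges quickly towards 0
```

Each update multiplies `w` by (1 − 2·lr), so a learning rate of 1.1 yields a factor of −1.2 (oscillation with growing magnitude), 0.001 yields 0.998 (very slow progress), and 0.3 yields 0.4 (fast, stable convergence).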
Fig. 8 shows how noise affects the raw data and its impact on deep learning model training. Noise in the data may make model training more difficult because the model needs to learn how to distinguish between signal and noise. In some cases, noise may cause overfitting because the model fits the noise instead of the underlying data. As a result, noise can degrade the performance of the model. However, in some cases noise may help improve the generalization ability of the model: when the model is able to learn the common structure of the real data despite the noise, it becomes more robust and handles new data better. Understanding how noise affects the data and adopting appropriate processing methods can help improve the performance and generalization ability of deep learning models.
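A controlled noise experiment of the kind discussed above needs a reproducible way to perturb the data. The following minimal sketch adds zero-mean Gaussian noise to a list of numeric values. The text does not state which noise model was used for Fig. 8, so additive Gaussian noise, the `sigma` parameter, and the toy signal are all assumptions made for illustration.

```python
import random


def add_gaussian_noise(values, sigma, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added.

    `sigma` is the standard deviation of the noise; a fixed `seed` keeps
    the perturbation reproducible across runs of a controlled experiment.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


def mean_abs_deviation(a, b):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy signal (hypothetical), perturbed at two noise levels.
clean = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
mild = add_gaussian_noise(clean, sigma=0.05)
strong = add_gaussian_noise(clean, sigma=0.5)
```

Because both calls use the same seed, the two noisy copies share the same underlying random draws and differ only in scale, which isolates the noise level as the single variable being changed.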

Conclusion
This paper explored the application of deep learning to text mining for power grid networks, with a focus on important concepts such as loss and activation functions. Visual aids and mathematical expressions were used to demonstrate how these functions contribute to deep learning. To investigate the impact of various factors on deep learning training, controlled-variable experiments were conducted, and the influence of learning rate, batch size, and data noise on model training trends was analyzed. While hyperparameters and data noise were discussed in detail in this paper, other factors such as CPU and memory frequency, as well as GPU performance, also play a critical role in deep learning training. Therefore, constant adjustment of multiple factors is necessary to achieve optimal training results for deep learning models.

Copyright
The copyright is licensed to EAI.

Figure 3. Principle of deep learning based text mining.

Figure 8. The original data and the data after adding noise.

Fig. 6 shows the trend of training loss versus the number of normalized epochs, where the batch size is set to 4 or 8. Fig. 7 illustrates the trend of valid loss versus the number of normalized epochs, again with a batch size of 4 or 8. From Fig. 6, we can see that the training loss decreases with the number of epochs, indicating that training is effective. In particular, the training loss after 1 epoch is 0.065 for a batch size of 4 and 0.05 for a batch size of 8; after 2 epochs it is 0.023 and 0.017; and after 3 epochs it is 0.008 and 0.006, respectively. For Fig. 7, the valid loss after 1 epoch is 0.114 for a batch size of 4 and 0.088 for a batch size of 8; after 2 epochs it is 0.103 and 0.1; and after 3 epochs it is 0.113 and 0.088, respectively. Unlike the training loss, these reported valid loss values do not decrease monotonically with the number of epochs; they stay between 0.088 and 0.114.

EAI Endorsed Transactions on Scalable Information Systems | Volume 10 | Issue 5

Figure 1. Principle of CBOW.
Figure 2. Principle of SkipGram.

The CBOW model is equivalent to predicting the current word from its context, i.e., using $w_{t-2}$, $w_{t-1}$, $w_{t+1}$, $w_{t+2}$ to predict $w_t$. In contrast, the SkipGram model is equivalent to predicting the context from the current word, i.e., using $w_t$ to predict $w_{t-2}$, $w_{t-1}$, $w_{t+1}$, $w_{t+2}$.
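The difference between the two models is easiest to see in the training examples each one generates from the same sentence. The sketch below builds (context, target) examples for CBOW and (center, context word) pairs for SkipGram with a window of two words on each side, matching the $w_{t-2}, \dots, w_{t+2}$ notation above. The example sentence and the handling of sentence boundaries (shorter contexts near the edges) are assumptions for illustration.

```python
def context_window(tokens, t, window=2):
    """Words within `window` positions of index t, excluding tokens[t] itself."""
    left = tokens[max(0, t - window):t]
    right = tokens[t + 1:t + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """CBOW: the context words are the input, the current word is the target."""
    return [(context_window(tokens, t, window), tokens[t])
            for t in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """SkipGram: the current word is the input, each context word is a target."""
    return [(tokens[t], c)
            for t in range(len(tokens))
            for c in context_window(tokens, t, window)]


# Hypothetical sentence used only to show the shape of the examples.
tokens = ["the", "feeder", "breaker", "tripped", "twice"]

cbow = cbow_examples(tokens)
skipgram = skipgram_examples(tokens)
```

For the middle word "breaker", CBOW produces one example whose input is the four surrounding words, whereas SkipGram produces four separate pairs, one for each surrounding word.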