Deep Model Training and Deployment in Heterogeneous IoT Networks

As a typical form of machine learning, deep learning has attracted much attention from researchers. It can independently construct (train) basic rules from sample data during the learning process. In machine vision in particular, neural networks are usually trained by supervised learning, that is, with example data and predefined results for those examples. In this paper, we first review current research progress on deep model training and deployment in scalable Internet of Things (IoT) networks, taking into account both latency and energy consumption. We then summarize the existing challenges of model training and deployment on scalable IoT devices, and present some feasible solutions to these challenges. This study can serve as a reference for the development of deep model training and deployment in scalable IoT networks.


Introduction
Artificial intelligence (AI) usually refers to architectures constructed by machines (usually computer programs) that imitate or copy human behavior [1][2][3]. The term "AI" covers many subdomains, such as expert systems, pattern analysis systems, and robots. AI-based systems use different methods to simulate or model human behavior and decision-making structures, including statistical algorithms, heuristic programs, artificial neural networks (ANNs), and other techniques derived from machine learning [4][5][6].
Machine learning is a subfield of AI and can be classified into "supervised learning" and "unsupervised learning" [7][8][9]. In supervised learning, the sample data contains both the input data and the corresponding expected results (such as a classification) [10][11][12], while in unsupervised learning, the system must determine the possible results for the input data by itself [13][14][15]. As a typical form of machine learning, deep learning has attracted much attention from researchers [16][17][18]. Deep learning uses some form of ANN technology, which independently constructs (trains) basic rules from sample data during the learning process, so the ANN must first be trained with sample data [19][20][21]. In machine vision in particular, neural networks are usually trained by supervised learning, that is, with example data and predefined results for those examples. The trained ANN can then be used to perform related tasks, and the process of using a trained ANN is called "inference". During inference, the ANN evaluates the data provided according to the learned rules. For example, it can evaluate whether an object in an input image has a defect.

Analysis of the current state of research
The acquisition and deployment of intelligent models is the core of implementing beyond-5G (B5G) edge intelligence [22][23][24]. However, training intelligent models relies on powerful computing devices, while large-scale intelligent models are difficult to deploy on IoT devices, where computing and storage resources are extremely scarce. In this regard, researchers have carried out extensive work and proposed a series of efficient and feasible solutions, evaluated by metrics such as training latency, inference latency and energy consumption.
One line of research is the parallel training of intelligent models. To address the difficulty of storing massive data on a single node in a cloud computing center, researchers investigated a model training strategy based on data parallelism. They proposed a partitioning scheme for the data set, allocated training samples according to the memory capacity of each computing node within the central cloud, and theoretically demonstrated the convergence of parallel training under this scheme [25][26][27]. To address the difficulty of training large-scale intelligent models on a single node, researchers proposed a random model partitioning strategy that exploits the structural characteristics of neural networks. It randomly partitions the model into multiple parts stored on different computing nodes, and optimizes the transfer of gradient parameters among computing nodes based on the global topology to improve the training efficiency of intelligent models. In addition, to handle device dropout during parallel training, researchers adopted an asynchronous hierarchical training method and combined the temporal update of the global model with the generation of gradient parameters, which greatly accelerates the training process and improves the robustness of the intelligent learning system.
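The memory-based sample allocation described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the partitioning scheme of the cited work: the function name `partition_by_memory` is hypothetical, memory capacities are assumed to be positive integers (for example, gigabytes), and samples are assigned as contiguous index ranges in proportion to capacity.

```python
def partition_by_memory(num_samples, memory_capacities):
    """Split sample indices across nodes in proportion to each node's memory.

    Assumption: capacities are positive integers; this proportional rule is
    only one plausible reading of "allocate according to memory capacity".
    """
    total = sum(memory_capacities)
    # Provisional share per node, rounded down.
    counts = [num_samples * m // total for m in memory_capacities]
    # Hand out the leftover samples one at a time, largest nodes first.
    remainder = num_samples - sum(counts)
    order = sorted(range(len(counts)),
                   key=lambda i: memory_capacities[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    # Turn the counts into contiguous index ranges, one per node.
    shards, start = [], 0
    for c in counts:
        shards.append(range(start, start + c))
        start += c
    return shards


shards = partition_by_memory(1000, [8, 16, 8])
print([len(s) for s in shards])
```

Every sample is assigned to exactly one node, so the shards can be processed in parallel and their gradients combined afterwards.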
A second line of research is the communication mechanism of distributed training. To minimize the bandwidth consumed by distributed parameter transmission, researchers investigated the deep gradient compression strategy, which sparsifies the gradients and sends only some of the gradient elements at each iteration, reducing the communication overhead of interactions between computing nodes. Researchers also investigated the impact of wireless networks on distributed training, using over-the-air computation to transmit compressed and quantized parameters over multi-access channels and minimizing transmission errors through power control, to achieve low-energy and low-latency distributed training. In addition, the gradients of adjacent layers of a deep network can be merged for transmission instead of being transmitted layer by layer, which improves bandwidth utilization, and the iteration delay of the training process can be significantly reduced by optimizing the resource scheduling of the merged transmission.
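The idea of sending only some gradient elements per iteration can be sketched in Python as follows. This is a simplified stand-in, assuming plain top-k sparsification with local residual accumulation; it omits other parts of deep gradient compression, and the function name `sparsify_top_k` and its parameters are hypothetical.

```python
def sparsify_top_k(gradient, residual, k):
    """Keep the k largest-magnitude entries; withhold the rest locally.

    Returns (sparse_update, new_residual), where sparse_update maps an index
    to the value that would be transmitted in this iteration.
    """
    # Add the new gradient to what was withheld in earlier iterations.
    accumulated = [g + r for g, r in zip(gradient, residual)]
    # Indices of the k entries with the largest absolute value.
    top = sorted(range(len(accumulated)),
                 key=lambda i: abs(accumulated[i]), reverse=True)[:k]
    sparse_update = {i: accumulated[i] for i in top}
    # Transmitted entries are cleared; the others wait for a later iteration.
    new_residual = [0.0 if i in sparse_update else v
                    for i, v in enumerate(accumulated)]
    return sparse_update, new_residual


residual = [0.0] * 5
update, residual = sparsify_top_k([0.1, -2.0, 0.3, 1.5, -0.2], residual, k=2)
print(update, residual)
```

Because withheld entries accumulate in `residual`, small gradient components are delayed rather than discarded, which is why this kind of scheme can cut traffic without simply dropping information.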
A third line of research is efficient model deployment and inference for mobile devices. To overcome the shortage of computing power in mobile devices, previous work reduced the time complexity of operations by building new convolutional operators and optimized the feature extraction of intelligent models, significantly reducing the end-to-end inference latency while maintaining a given accuracy. Researchers further explored pruning techniques to remove redundant information from intelligent models and to establish a trade-off among latency, energy consumption and model size, achieving flexible and efficient deep model inference on mobile devices. In addition, to reduce the inference latency, a deployment scheme that fuses the mobile device and the central cloud was proposed. It uses the computing power of both to accelerate inference, and it reduces inference latency and energy consumption by scheduling the computation load according to the real-time channel state of the wireless network. Researchers also proposed a training and deployment mechanism based on multiple exit points to handle the heterogeneous computing power of mobile devices in B5G edge intelligence networks, and realized real-time resource scheduling through an efficient greedy strategy, which significantly reduces the overall latency of the intelligent system.
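The multiple-exit-point idea mentioned above can be illustrated with a minimal Python sketch. It is not the mechanism of the cited work: the names `early_exit_inference`, `stages`, `exit_heads` and `threshold` are hypothetical, the stages are toy functions, and the exit rule (stop at the first exit whose top softmax probability reaches a threshold) is one common choice assumed here for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def early_exit_inference(x, stages, exit_heads, threshold):
    """Run stages in order and stop at the first sufficiently confident exit.

    Returns (predicted_class, exit_depth). The last exit is always taken if
    no earlier exit reaches the confidence threshold.
    """
    for depth, (stage, head) in enumerate(zip(stages, exit_heads), start=1):
        x = stage(x)                      # one more block of the backbone
        probs = softmax(head(x))          # classifier attached at this exit
        confidence = max(probs)
        if confidence >= threshold or depth == len(stages):
            return probs.index(confidence), depth


# Toy model: each stage doubles the features, each head passes them through.
stages = [lambda v: [2 * a for a in v]] * 3
exit_heads = [lambda v: v] * 3
print(early_exit_inference([0.5, 0.1], stages, exit_heads, threshold=0.8))
```

A weaker device can use a lower threshold and leave at a shallow exit, while a stronger device can afford to run deeper, which is how exit points accommodate heterogeneous computing power.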

Challenges on model training and deployment across data centers
The analysis above shows that existing research has studied the training and deployment of intelligent models based on the central cloud in depth and from multiple perspectives, such as training and deployment latency, energy consumption, communication and computation efficiency, and data security. The performance of model training and deployment has been significantly improved by jointly optimizing the scheduling of communication and computation resources. These works provide important references for the training and deployment of intelligent models in B5G edge intelligence networks. However, B5G edge intelligence networks can also be applied to high-speed mobile scenarios, where cross-data-center characteristics have an important impact on the training and deployment of intelligent models. It is a difficult challenge to design a new intelligent model training and deployment scheme for B5G edge intelligence networks by deeply exploring the cross-data-center characteristics under high mobility and combining over-the-air computing and federated learning technologies.

EAI Endorsed Transactions on Mobile Communications and Applications, 09 2022 - 01 2023 | Volume 7 | Issue 3 | e5

Feasible solutions to model training and deployment across data centers
First, we study the efficient aggregation and processing of intelligent models, to achieve fast real-time response and decision making at the control layer and to improve the efficiency of distributed model training across data centers. Consider an over-the-air federated learning system consisting of a parameter server and $L \ge 0$ edge data centers, indexed by the set $\mathcal{L} = \{1, \ldots, L\}$. Under the coordination of the parameter server, the edge data centers collaboratively train a shared machine learning model through wireless updates. Let the parameter vector $w \in \mathbb{R}^q$ denote this federated learning model, where $q$ denotes the model size, and let $\mathcal{D}_l$ denote the local dataset of edge data center $l$, where the $d$th sample and its label are denoted by $x_d$ and $y_d$, respectively. Then, the local loss function of the model vector $w$ on $\mathcal{D}_l$ is

$$F_l(w) = \frac{1}{D_l} \sum_{(x_d, y_d) \in \mathcal{D}_l} f(w, x_d, y_d) + \rho R(w),$$

where $D_l = |\mathcal{D}_l|$, $f(w, x_d, y_d)$ denotes the sample-wise loss function that quantifies the prediction error of model $w$ on sample $x_d$ with respect to its label $y_d$, and $R(w)$ is a strongly convex regularization function scaled by the hyperparameter $\rho \ge 0$. For convenience, $f(w, x_d, y_d)$ is abbreviated as $f_d(w)$. The global loss function over all distributed datasets is

$$F(w) = \frac{1}{|\mathcal{D}|} \sum_{l \in \mathcal{L}} D_l F_l(w),$$

where $\mathcal{D} = \cup_{l \in \mathcal{L}} \mathcal{D}_l$. For simplicity of notation, it is assumed that the local datasets in all edge data centers have the same size, i.e., $D_l = |\mathcal{D}_l| = \bar{D}$, in which case $F(w) = \frac{1}{L} \sum_{l \in \mathcal{L}} F_l(w)$. The goal of the model training process is to minimize the global loss function:

$$w^\star = \arg\min_{w} F(w).$$

As an alternative to uploading all local data directly to the parameter server for centralized training, the learning process can be implemented iteratively in a distributed manner based on the gradient averaging method, as shown in Fig. 1.
In each communication round $\tau$, the machine learning model is represented by $w^{(\tau)}$, and each edge data center $l$ uses its local dataset $\mathcal{D}_l$ to compute the local gradient

$$g_l^{(\tau)} = \nabla F_l\left(w^{(\tau)}\right),$$

where $\nabla$ is the gradient operator and it is assumed that the whole local dataset is used to estimate the local gradient. Next, the edge data centers send their local gradients simultaneously to the parameter server, which averages them to obtain the global gradient $g^{(\tau)} = \frac{1}{L} \sum_{l \in \mathcal{L}} g_l^{(\tau)}$. Then, the parameter server broadcasts the global gradient estimate to the edge data centers, and each edge data center updates its local model based on this estimate: $w^{(\tau+1)} = w^{(\tau)} - \eta \cdot g^{(\tau)}$, where $\eta$ is the learning rate. This learning process is repeated until the convergence criterion is satisfied or the maximum number of iterations is reached.
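The gradient-averaging loop described above can be sketched in plain Python. The sketch makes several assumptions that are not in the paper: the sample-wise loss is a squared error on a linear model, the regularizer is $R(w) = \frac{1}{2}\|w\|^2$, the data are synthetic and noiseless, and the function names are hypothetical. Communication is idealized, with no channel or noise.

```python
import random


def local_gradient(w, dataset, rho):
    """Gradient of F_l(w) = (1/D_l) * sum 0.5*(w.x - y)^2 + rho*0.5*||w||^2."""
    grad = [rho * wj for wj in w]
    for x, y in dataset:
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(dataset)
    return grad


def federated_gradient_descent(datasets, q, eta, rho, rounds):
    """Repeat: local gradients -> server average -> broadcast -> model update."""
    w = [0.0] * q
    for _ in range(rounds):
        # Each edge data center computes its gradient on its full local dataset.
        grads = [local_gradient(w, d, rho) for d in datasets]
        # The parameter server averages the L local gradients.
        g = [sum(gl[j] for gl in grads) / len(grads) for j in range(q)]
        # Every edge data center applies the broadcast global gradient.
        w = [wj - eta * gj for wj, gj in zip(w, g)]
    return w


# Synthetic data: 4 edge data centers, 50 noiseless samples each.
rng = random.Random(0)
true_w = [2.0, -1.0]
datasets = []
for _ in range(4):
    data = []
    for _ in range(50):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, sum(a * b for a, b in zip(true_w, x))))
    datasets.append(data)

w = federated_gradient_descent(datasets, q=2, eta=0.5, rho=0.0, rounds=200)
print(w)
```

With equal dataset sizes, averaging the local gradients is the same as taking the gradient of the global loss, so the loop is ordinary gradient descent on $F(w)$ carried out without moving any raw data.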
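An efficient and feasible scheme is to make full use of the waveform superposition property of over-the-air computing, and an efficient model/gradient aggregation technique based on over-the-air computing should be studied. Let $\hat{h}_l^{(\tau)}$ denote the channel coefficient from edge data center $l$ to the parameter server in communication round $\tau$. When all edge data centers transmit their local gradients simultaneously, the signal received at the parameter server is the superposition

$$y^{(\tau)} = \sum_{l \in \mathcal{L}} \hat{h}_l^{(\tau)} \sqrt{p_l^{(\tau)}}\, g_l^{(\tau)} + z^{(\tau)},$$

where $p_l^{(\tau)}$ is the transmission power and $z^{(\tau)}$ is the additive white Gaussian noise with $z^{(\tau)} \sim \mathcal{CN}(0, N_0 I_0)$, in which $N_0$ is the noise power density and $I_0$ is the identity matrix. The global gradient estimate at the parameter server is therefore $\hat{g}^{(\tau)} = y^{(\tau)} / L$. Each edge data center can adaptively adjust its transmission power to enhance learning performance. In addition, each edge data center is limited by a maximum transmit power $\hat{P}_l$, i.e., $p_l^{(\tau)} \le \hat{P}_l$, $\forall l \in \mathcal{L}$, $\forall \tau$, and by an average power constraint $\bar{P}_l$ over the $\tau_0$ communication rounds, i.e., $\frac{1}{\tau_0} \sum_{\tau=1}^{\tau_0} p_l^{(\tau)} \le \bar{P}_l$, $\forall l \in \mathcal{L}$. In general, these constraints satisfy $\bar{P}_l \le \hat{P}_l$, $\forall l \in \mathcal{L}$.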
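A small Python simulation can make the superposition-based aggregation concrete. It is a sketch under simplifying assumptions that are not in the paper: channel gains and noise are real-valued rather than complex, and the power rule in `channel_inversion_powers` (invert the channel where the peak power allows it) is only one illustrative choice. All function names are hypothetical.

```python
import math
import random


def over_the_air_aggregate(local_grads, channel_gains, powers, noise_std, rng):
    """Simulate y = sum_l h_l * sqrt(p_l) * g_l + z and return y / L."""
    num_centers = len(local_grads)
    q = len(local_grads[0])
    # Receiver noise z, drawn independently for each gradient element.
    y = [rng.gauss(0.0, noise_std) for _ in range(q)]
    # Simultaneous transmissions add up on the multi-access channel.
    for g, h, p in zip(local_grads, channel_gains, powers):
        for j in range(q):
            y[j] += h * math.sqrt(p) * g[j]
    return [yj / num_centers for yj in y]


def channel_inversion_powers(channel_gains, max_power):
    """Assumed rule: p_l = min(1 / h_l^2, max_power)."""
    return [min(1.0 / (h * h), max_power) for h in channel_gains]


grads = [[1.0, 2.0], [3.0, 4.0]]
gains = [0.5, 2.0]
powers = channel_inversion_powers(gains, max_power=10.0)
estimate = over_the_air_aggregate(grads, gains, powers, 0.0, random.Random(0))
print(powers, estimate)
```

When every product of channel gain and amplitude equals one and the noise is zero, the estimate equals the exact gradient average. A tight peak power or a nonzero noise level makes the estimate deviate from that average, which is the error that power control tries to limit.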
Secondly, the training accuracy and convergence rate are taken as the performance metrics of federated learning, and an accurate and reliable mathematical model is established according to basic computing theory, communication theory and the federated learning framework. Let $\tau_0$ be the required total number of communication rounds, use $F^{(\tau+1)}$ as shorthand for $F\left(w^{(\tau+1)}\right)$, and let $F^\star = F(w^\star)$. After $\tau_0$ communication rounds, the optimality gap of the loss function, that is, $F^{(\tau_0+1)} - F^\star$, has an upper bound that depends on the transmission powers $\{p_l^{(\tau)}\}$, the learning rate $\eta$ and $\tau_0$:

$$F^{(\tau_0+1)} - F^\star \le G\left(\{p_l^{(\tau)}\}, \eta, \tau_0\right).$$

Since $F^\star$ is a constant, the problem of minimizing $F\left(w^{(\tau_0+1)}\right)$ can be approximated as minimizing the upper bound $G\left(\{p_l^{(\tau)}\}, \eta, \tau_0\right)$. At the same time, the optimization parameters, such as the transmission power at the transmitting end and the learning rate at the receiving end, are designed according to the application scenario. According to the different performance metrics, an optimization model of cross-data-center federated learning based on over-the-air computing is constructed, and the resulting optimization problem is modeled as

$$\min_{\{p_l^{(\tau)}\},\, \eta,\, \tau_0} \; G\left(\{p_l^{(\tau)}\}, \eta, \tau_0\right) \quad \text{s.t.} \quad \left(\{p_l^{(\tau)}\}, \eta, \tau_0\right) \in \mathcal{S}.$$

The constraint space $\mathcal{S}$ may vary according to different task requirements. However, this is a non-convex optimization problem with a large number of design parameters and a high solution complexity. Non-convex optimization, online optimization and other methods can be combined with deep reinforcement learning and similar tools to allocate wireless resources (such as time, bandwidth and transmission power) reasonably, so as to improve the convergence speed of the model and realize efficient over-the-air federated edge learning while ensuring the training accuracy.
Further, the deployment and inference of lightweight intelligent models based on model pruning should be studied, to minimize the end-to-end delay and energy consumption of the inference process. Specifically, according to the different application requirements of B5G edge intelligence, an edge-terminal joint inference method is proposed that applies network cutting/pruning to the artificial intelligence model, so as to obtain lightweight models at different compression rates from the heavyweight complex model. The artificial intelligence model is then deployed jointly across the edge server and the terminal, so that the distributed computing resources can be used for rapid model inference. Take a deep neural network as an example, as shown in Fig. 2. During model inference, the neural network can be cut into two parts. The bottom part runs on the terminal to extract the feature information of real-time data, which is compressed and transmitted to the edge server. The upper part performs model inference on the edge server. Finally, the edge server sends the inference result to the terminal device.
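The two-part split described above can be sketched in Python with a toy fully connected network. The sketch is an assumption-laden illustration rather than the proposed method: the layer weights are made up, the compression step is a simple uniform 8-bit quantization chosen for illustration, and names such as `split_inference` and `split_point` are hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def dense(weights, bias):
    """Return a layer computing relu(W v + b) on plain Python lists."""
    def layer(v):
        return relu([sum(w * a for w, a in zip(row, v)) + b
                     for row, b in zip(weights, bias)])
    return layer


def run(layers, v):
    for layer in layers:
        v = layer(v)
    return v


def quantize(features, levels=256):
    """Uniformly quantize non-negative features to integer codes plus a scale."""
    scale = max(features) / (levels - 1) or 1.0   # fall back to 1.0 if all zero
    return [round(f / scale) for f in features], scale


def dequantize(codes, scale):
    return [c * scale for c in codes]


def split_inference(layers, split_point, x):
    bottom, top = layers[:split_point], layers[split_point:]
    features = run(bottom, x)             # bottom part, on the terminal
    codes, scale = quantize(features)     # compressed payload sent uplink
    received = dequantize(codes, scale)   # reconstructed at the edge server
    return run(top, received)             # upper part, result sent back


layers = [dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
          dense([[1.0, 2.0]], [0.1])]
x = [2.0, 1.0]
print(run(layers, x), split_inference(layers, 1, x))
```

Moving `split_point` changes how much computation stays on the terminal and how large the transmitted feature vector is, which is the trade-off the deployment strategy has to balance.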
It is assumed that there are $L$ terminal devices in the system. Under the coordination of the edge server, they cooperate to complete the inference of the intelligent model $N$. According to different application requirements, $N$ is compressed by model compression technologies such as network cutting/pruning, yielding $L$ lightweight sub-networks $\{N_l\}$ and a heavyweight sub-network $N_0$. For a sub-network with structure $N_l$, the compression rate is $c(N_l)$, and the computing, storage and other resources allocated to it by its node are $b(N_l)$. The energy consumption $e(N_l)$ and the delay $\tau(N_l)$ of running the sub-network both depend on $b(N_l)$. Define $a(N_l)$ as the accuracy index of the sub-network, and $\Omega$ as the deployment strategy of the $L$ lightweight sub-networks at the different terminals.
Finally, the network-wide communication and computing resources and the model deployment strategy should be optimized jointly. To cope with different performance metric constraints, assume that the maximum computing resource that terminal $l$ can give to the model $N_l$ is $B_l$, with energy consumption limit $\gamma_{E,l}$ and delay limit $\gamma_{th,l}$. The problem can then be expressed as a weighted-sum multiobjective optimization problem, in which $\{\phi_l\}$ are the weight coefficients of the edge server, $\{\alpha_1, \alpha_2, \alpha_3\}$ are the weight coefficients that trade off the performance metrics, and $l \in \{1, \ldots, L\}$. This weighted-sum problem is non-convex and difficult to solve directly. Non-convex optimization, machine learning and other methods can be used to optimize data compression and transmission as well as the lightweight network deployment strategy. Combined with the allocation of computing and communication resources in the network, this can reduce the end-to-end transmission load and optimize the end-to-end delay and energy efficiency of model inference.
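For concreteness, one possible weighted-sum form that is consistent with the quantities defined above is sketched below. It is an assumption, not the formulation of this paper: the pairing of $\alpha_1$, $\alpha_2$, $\alpha_3$ with energy, delay and accuracy, the negative sign on the accuracy term, and the choice of $\Omega$ and $\{b(N_l)\}$ as decision variables are all illustrative.

```latex
\begin{aligned}
\min_{\Omega,\ \{b(N_l)\}} \quad
  & \sum_{l=1}^{L} \phi_l \left[ \alpha_1\, e(N_l) + \alpha_2\, \tau(N_l) - \alpha_3\, a(N_l) \right] \\
\text{s.t.} \quad
  & b(N_l) \le B_l, \\
  & e(N_l) \le \gamma_{E,l}, \\
  & \tau(N_l) \le \gamma_{th,l}, \qquad l \in \{1, \ldots, L\}.
\end{aligned}
```

In this sketch, a larger $\alpha_3$ favors accuracy over energy and delay, and the three constraints express the resource, energy and delay limits of each terminal.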

Conclusions
As a typical form of machine learning, deep learning has attracted much attention from researchers. It can independently construct (train) basic rules from sample data during the learning process. In machine vision in particular, neural networks are usually trained by supervised learning, that is, with example data and predefined results for those examples. In this paper, we first reviewed current research progress on deep model training and deployment in scalable Internet of Things (IoT) networks, taking into account both latency and energy consumption. We then summarized the existing challenges of model training and deployment on scalable IoT devices, and presented some feasible solutions to these challenges. This study can serve as a reference for the development of deep model training and deployment in scalable IoT networks.

Acknowledgements
The work in this paper was supported by the NSFC with grant number 61871235.

Copyright
The copyright is licensed to EAI.