Real-Time Task Fault-Tolerant Scheduling Algorithm for Dynamic Monitoring Platform of Distribution Network Operation under Overload of Distribution Transformer

This paper proposes a real-time task fault-tolerant scheduling algorithm for a dynamic monitoring platform of distribution network operation under overload of distribution transformers. The proposed algorithm is based on wireless communication and mobile edge computing to address the challenges faced by distribution networks in handling the increasing load demand. For the considered system, we evaluate the system performance by analyzing the communication and computing latency, from which we then derive an analytical expression of system outage probability to facilitate the performance evaluation. We further optimize the system design by allocating computing resources for multiple mobile users, where a greedy-based optimization scheme is proposed. The proposed algorithm is evaluated through simulations, and the results demonstrate its effectiveness in reducing task completion time, improving resource utilization, and enhancing system reliability. The findings of this study can provide a basis for the development of practical solutions for the dynamic monitoring of distribution networks.


Introduction
With the development of wireless communication and edge computing [1][2][3][4], real-time task fault-tolerant scheduling (RTTFTS) has become an important problem in real-time systems. It aims to provide timely execution of tasks even in the presence of failures [5][6][7]. In recent years, several studies have focused on developing efficient RTTFTS algorithms that can handle different types of faults, such as processor failures, memory failures, and communication failures. One of the early studies in RTTFTS proposed a fault-tolerant scheduling algorithm based on redundancy, which duplicated each task and assigned the copies to different processors to ensure fault tolerance. However, this approach suffers from high redundancy overhead and may not scale to large systems. In a more recent study, an RTTFTS algorithm was proposed based on mixed-criticality scheduling (MCS), which assigns different levels of criticality to tasks based on their importance and schedules them accordingly. It took advantage of MCS to provide fault tolerance by assigning backup tasks to low-criticality tasks. The results showed that this algorithm could achieve better fault tolerance than traditional approaches. Another recent RTTFTS algorithm considered both task-level and system-level fault tolerance, where the tasks were scheduled based on their deadlines and priorities while also considering the availability of redundant resources. This algorithm was evaluated using simulations and showed significant improvement in fault tolerance compared to traditional approaches. In a different approach, an RTTFTS algorithm was proposed based on dynamic partial order reduction (DPOR), which reduces the search space of scheduling algorithms by eliminating redundant schedules. This algorithm used DPOR to handle faults efficiently and reduced the computational overhead of scheduling. The results showed that this algorithm could handle more faults than traditional approaches while also reducing the scheduling overhead.
Wireless communication [8][9][10] and edge computing [11][12][13] are two key technologies that are increasingly being used in dynamic monitoring platforms. These platforms are designed to collect, process, and analyze data from a wide range of sensors and devices in real time, in order to provide insights and support decision-making across a variety of industries and applications [14][15][16]. One recent study examined the use of a dynamic monitoring platform for agricultural applications. This platform consisted of wireless sensor nodes deployed throughout a vineyard, which collected data on temperature, humidity, and soil moisture. The data was then processed using edge computing techniques, and the resulting insights were used to optimize irrigation and fertilizer use. Another study looked at a dynamic monitoring platform for traffic management. The platform used a combination of wireless sensors and edge computing to collect data on traffic flow, speed, and congestion, and to provide real-time feedback to drivers and traffic management systems. The results suggested that the platform was effective in reducing traffic congestion and improving overall traffic flow. In the field of healthcare, dynamic monitoring platforms are also being developed to support remote patient monitoring and personalized medicine. One recent study described the use of a wireless, wearable device for monitoring glucose levels in diabetic patients. The device collected data in real time and used edge computing techniques to provide personalized feedback and recommendations to patients based on their individual glucose profiles.
This paper presents a novel real-time task fault-tolerant scheduling algorithm designed for a dynamic monitoring platform of distribution network operation, which is frequently subjected to overload from distribution transformers. The proposed algorithm utilizes wireless communication and mobile edge computing to overcome the challenges associated with the increasing load demand. For the considered system, we evaluate the system performance by analyzing the communication and computing latency, from which we then derive an analytical expression of system outage probability to facilitate the performance evaluation. We further optimize the system design by allocating computing resources for multiple mobile users, where a greedy-based optimization scheme is proposed. Through simulations, the effectiveness of the proposed algorithm in reducing task completion time, improving resource utilization, and enhancing system reliability is demonstrated. The findings of this study offer a practical solution for the dynamic monitoring of distribution networks.

System Model
Fig. 1 depicts the system model of the multi-user MEC network for dynamic monitoring, where $N$ mobile users have latency-sensitive computational tasks that need to be offloaded to one edge server. Specifically, $\mathcal{U} \triangleq \{U_1, U_2, \ldots, U_N\}$ is the set of $N$ mobile users, where mobile user $U_n$ has one task of size $L$ that needs to be offloaded to and computed at the edge server. Because the tasks are latency sensitive, all tasks from the $N$ mobile users need to be finished within a given latency threshold $\gamma_t$. In the following, we detail the system latency of the considered MEC system.
In the MEC system, mobile user $U_n$ needs to offload its task to the edge server. According to the Shannon theorem, the data transmission rate of mobile user $U_n$ can be given by [17][18][19]

$$R_n = B \log_2\left(1 + \frac{p |h_n|^2}{\sigma^2}\right), \tag{1}$$

where the channel parameter of the link between mobile user $U_n$ and the edge server is represented by $h_n$, while the wireless bandwidth between them is denoted as $B$. The transmit power of the mobile user is represented by $p$, and $\sigma^2$ is the variance of the additive white Gaussian noise (AWGN) [20][21][22][23]. From (1), we can further give the transmission latency of mobile user $U_n$ as [24, 25]

$$T_n^{tran} = \frac{L}{R_n}. \tag{2}$$

The edge server receives the tasks offloaded from the mobile users and then computes them. Assume that the computational resource at the edge server can be allocated to different tasks from all users, so that all the received tasks can be computed in parallel. The corresponding computation latency of mobile user $U_n$'s task can be given by

$$T_n^{comp} = \frac{L \omega}{f_n}, \tag{3}$$

where $f_n$ is the computational resource allocated to mobile user $U_n$'s task, which satisfies $\sum_{n=1}^{N} f_n \le f_{total}$, in which $f_{total}$ is the total computational resource at the edge server, and $\omega$ is the number of CPU cycles needed to compute one bit of the computational task.

Thus, the total task offloading latency of mobile user $U_n$ can be given by [26,27]

$$T_n = T_n^{tran} + T_n^{comp} = \frac{L}{R_n} + \frac{L \omega}{f_n}. \tag{4}$$
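The latency model of this section can be checked numerically with a few lines of Python. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, the rate is the Shannon rate $B \log_2(1 + p|h_n|^2/\sigma^2)$, the transmission latency is the task size divided by the rate, and the computation latency is $L\omega/f_n$. The numbers are illustrative assumptions: a per-user bandwidth of 6 MHz, a unit channel gain, and 1 GHz of CPU for the user.

```python
import math


def transmission_rate(bandwidth_hz, tx_power, channel_gain, noise_var):
    """Shannon rate in bit/s; channel_gain stands for |h_n|^2."""
    return bandwidth_hz * math.log2(1.0 + tx_power * channel_gain / noise_var)


def offloading_latency(task_bits, cycles_per_bit, cpu_hz, rate_bps):
    """Return (transmission, computation, total) latency in seconds."""
    t_tran = task_bits / rate_bps                # time to upload the task
    t_comp = task_bits * cycles_per_bit / cpu_hz  # time to compute it at the edge
    return t_tran, t_comp, t_tran + t_comp


# Illustrative (assumed) values: B = 6 MHz, p = 1 W, |h_n|^2 = 1, sigma^2 = 0.001,
# L = 50 Mbits, omega = 10 cycles/bit, f_n = 1 GHz.
rate = transmission_rate(6e6, 1.0, 1.0, 1e-3)
t_tran, t_comp, t_total = offloading_latency(50e6, 10, 1e9, rate)
print(f"rate = {rate / 1e6:.2f} Mbit/s, total latency = {t_total:.3f} s")
```

With these values the computation latency is 0.5 s, so a threshold $\gamma_t$ below 0.5 s could never be met regardless of the channel.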

Outage Probability
In this section, we analyze the outage performance of the considered MEC system. By defining the user outage and the system outage, we are able to analyze the system performance under a given latency threshold. Specifically, the outage event of mobile user $U_n$ occurs when its total task offloading latency $T_n$ exceeds the latency threshold $\gamma_t$. Therefore, the corresponding outage probability can be given by

$$P_{out,n} = \Pr\left(T_n > \gamma_t\right). \tag{5}$$

From (5), we can further define the system outage probability $P_{out}$, given in (6). In the following, we derive a closed-form outage probability for both the users and the considered MEC system. Specifically, we can rewrite (5) by substituting the latency expressions [28,29] and, after some manipulations, express the outage event in terms of the channel gain. Note that mobile user $U_n$ experiences Rayleigh flat fading in the offloading, with an average channel gain of $\lambda_n$, which leads to the closed-form user outage probability in (14). Then, substituting (14) into (6), we can obtain the MEC system outage probability in (16). To gain more insight into the considered MEC system, we use (16) to derive an asymptotic expression $P_{out}^{asym}$ in the high-SNR regime, where the approximation $e^{-w} \simeq 1 - w$ for $w \to 0$ is used.
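The per-user part of this derivation can be sketched as follows. This is a reconstruction from the stated model, not a copy of the paper's numbered equations: it assumes the Shannon rate $B \log_2(1 + p|h_n|^2/\sigma^2)$, a computation latency of $L\omega/f_n$, and an exponentially distributed channel gain $|h_n|^2$ with mean $\lambda_n$ (Rayleigh fading). It also assumes $\gamma_t > L\omega/f_n$; otherwise the user is always in outage.

```latex
\begin{align*}
P_{out,n}
  &= \Pr\left( \frac{L}{B \log_2\left(1 + \frac{p |h_n|^2}{\sigma^2}\right)}
       + \frac{L\omega}{f_n} > \gamma_t \right) \\
  &= \Pr\left( |h_n|^2 < \frac{\sigma^2}{p}
       \left( 2^{\frac{L}{B\left(\gamma_t - L\omega/f_n\right)}} - 1 \right) \right) \\
  &= 1 - \exp\left( -\frac{\sigma^2}{p \lambda_n}
       \left( 2^{\frac{L}{B\left(\gamma_t - L\omega/f_n\right)}} - 1 \right) \right) \\
  &\simeq \frac{\sigma^2}{p \lambda_n}
       \left( 2^{\frac{L}{B\left(\gamma_t - L\omega/f_n\right)}} - 1 \right)
       \quad \text{as } p/\sigma^2 \to \infty .
\end{align*}
```

The last line uses $e^{-w} \simeq 1 - w$ for small $w$. It shows that the per-user outage probability decays in proportion to the inverse of the transmit SNR $p/\sigma^2$, and that it grows with $L$ and shrinks with $B$, $\lambda_n$, and $f_n$.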
With the asymptotic outage probability $P_{out}^{asym}$, several insights on the MEC system can be obtained. Specifically, the system outage improves with an increase in $B$, $p$, and $\lambda_n$, indicating that higher transmission rates can enhance the system performance. Moreover, the system outage deteriorates with a larger task size $L$, since a larger task increases the latency of both task transmission and computation. Furthermore, the computational resource $f_n$ affects the system outage, since a larger allocated computational resource results in a lower computation latency.
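These trends can be checked numerically for a single user. The sketch below is an illustration under stated assumptions rather than the paper's code: it assumes that the channel gain $|h_n|^2$ is exponentially distributed with mean $\lambda_n$, so that the user outage probability is $1 - \exp(-\theta_n/\lambda_n)$, where $\theta_n = \frac{\sigma^2}{p}\left(2^{L/(B(\gamma_t - L\omega/f_n))} - 1\right)$ is the smallest channel gain that still meets the deadline. All function names and parameter values are hypothetical.

```python
import math
import random


def gain_threshold(task_bits, bandwidth_hz, tx_power, noise_var,
                   cycles_per_bit, cpu_hz, latency_threshold):
    """Smallest |h_n|^2 that meets the deadline (inf if it cannot be met)."""
    t_left = latency_threshold - task_bits * cycles_per_bit / cpu_hz
    if t_left <= 0:
        return math.inf  # computation alone already exceeds the threshold
    exponent = task_bits / (bandwidth_hz * t_left)
    if exponent > 1000:
        return math.inf  # avoid float overflow; outage is practically certain
    return noise_var / tx_power * (2.0 ** exponent - 1.0)


def outage_closed_form(avg_gain, threshold):
    """P(|h_n|^2 < threshold) for an exponential gain with mean avg_gain."""
    return 1.0 if math.isinf(threshold) else 1.0 - math.exp(-threshold / avg_gain)


def outage_monte_carlo(avg_gain, threshold, trials, rng):
    """Empirical outage frequency over random Rayleigh-fading realisations."""
    misses = sum(rng.expovariate(1.0 / avg_gain) < threshold for _ in range(trials))
    return misses / trials


# Assumed values: L = 50 Mbits, p = 1 W, sigma^2 = 0.001, omega = 10,
# f_n = 1 GHz, gamma_t = 2 s, lambda_n = 1, B = 6 MHz versus 12 MHz.
rng = random.Random(0)
theta = gain_threshold(50e6, 6e6, 1.0, 1e-3, 10, 1e9, 2.0)
theta_wide = gain_threshold(50e6, 12e6, 1.0, 1e-3, 10, 1e9, 2.0)
p_cf = outage_closed_form(1.0, theta)
p_mc = outage_monte_carlo(1.0, theta, 100_000, rng)
p_cf_wide = outage_closed_form(1.0, theta_wide)
print(f"closed form = {p_cf:.4f}, Monte Carlo = {p_mc:.4f}, wider band = {p_cf_wide:.6f}")
```

The Monte Carlo estimate should agree with the closed form up to sampling noise, and doubling the bandwidth should lower the outage probability, in line with the insight that a larger $B$ improves the outage performance.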
According to the above insights, the allocation of the computational resource is of vital importance. Thus, in the next section, we propose a greedy-based method to solve the computational resource allocation problem.

Greedy-Based Computational Resource Allocation
In this section, we use a greedy-based method to allocate the total computational resource. We optimize $\{f_1, \ldots, f_N\}$ to minimize the system outage.

First, we relax the total computational resource constraint and assume that every mobile user can be assigned sufficient computational resource to finish its task within the latency threshold $\gamma_t$, given by

$$\frac{L}{R_n} + \frac{L\omega}{f_n^{need}} \le \gamma_t,$$

where $R_n$ is the data transmission rate of $U_n$ and $f_n^{need}$ is the required computational resource. Then, we can calculate the required computational resource $f_n^{need}$ as

$$f_n^{need} = \frac{L\omega}{\gamma_t - L/R_n}.$$

If the total required computational resource $\sum_{n=1}^{N} f_n^{need}$ exceeds $f_{total}$, we drop the mobile users in descending order of $f_n^{need}$ and assign no computational resource to them, until the computational resource constraint $\sum_{n=1}^{N} f_n \le f_{total}$ is met.
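The greedy procedure described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the authors' code: the required resource is taken as the smallest $f_n$ for which transmission plus computation meets the threshold, that is $L\omega/(\gamma_t - L/R_n)$, a user whose transmission alone misses the threshold is treated as needing infinite resource, and any capacity left over after dropping users stays unallocated, since the text does not say how it is used. The function names and the example rates are hypothetical.

```python
import math


def required_cpu(task_bits, cycles_per_bit, rate_bps, latency_threshold):
    """Smallest CPU frequency (Hz) that meets the deadline at the given rate."""
    t_left = latency_threshold - task_bits / rate_bps
    if t_left <= 0:
        return math.inf  # transmission alone already misses the deadline
    return task_bits * cycles_per_bit / t_left


def greedy_allocate(rates_bps, task_bits, cycles_per_bit, latency_threshold, f_total):
    """Drop the most demanding users until the remaining demand fits f_total."""
    need = [required_cpu(task_bits, cycles_per_bit, r, latency_threshold)
            for r in rates_bps]
    served = set(range(len(need)))
    # Visit users from the largest to the smallest requirement.
    for n in sorted(range(len(need)), key=lambda i: need[i], reverse=True):
        if sum(need[i] for i in served) <= f_total:
            break  # the constraint is met, stop dropping users
        served.discard(n)
    return [need[i] if i in served else 0.0 for i in range(len(need))]


# Assumed example: four users with different instantaneous rates,
# L = 50 Mbits, omega = 10, gamma_t = 2 s, f_total = 1.5 GHz.
rates = [60e6, 40e6, 30e6, 20e6]
alloc = greedy_allocate(rates, 50e6, 10, 2.0, 1.5e9)
print([round(f / 1e9, 3) for f in alloc])
```

In this example the slowest user cannot meet the threshold at any CPU frequency and is dropped first. The next most demanding user is dropped as well, after which the two remaining users fit within the 1.5 GHz budget.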

Simulation
In this section, we provide simulations to validate the proposed studies. Unless otherwise specified, "Simulation", "Analysis", and "Asymptotic" are obtained with equally allocated bandwidth and computational resource, and "Greedy" is the simulated outage with the greedy-based computational resource allocation. We set the number of mobile users to $N = 5$ and the task size to $L = 50$ Mbits. Each mobile user's transmit power $p$ is 1 W and $\sigma^2$ is 0.001. The total bandwidth is 30 MHz, and it is equally allocated to the mobile users. The total computational resource at the edge server is 5 GHz, and $\omega = 10$. For the Rayleigh flat fading channels, the average channel gain of each user is uniformly distributed as $\lambda_n \sim U(0.5, 1.5)$.

Fig. 2 and Table 1 depict the system outage probability versus the transmit SNR for $N = 3$ and $N = 5$, where the transmit SNR ranges from 0 dB to 40 dB. The figure and table show that the analytical results and the simulation data for the outage probability have the same slope in both cases. Additionally, the asymptotic system outage probability closely follows the analytical curve when the transmit SNR is large, which demonstrates the accuracy of the derived analytical and asymptotic $P_{out}$. Moreover, the outage probability of every scheme decreases as the transmit SNR increases, indicating that a higher transmit power enhances task offloading. Furthermore, the outage performance is better for $N = 3$ than for $N = 5$, as fewer resources are allocated per user when there are more mobile users, which increases the competition among users. Finally, the proposed greedy-based method outperforms the uniform allocation, showing its ability to utilize the computational resource.

In Fig. 3 and Table 2, the outage probability of the system is shown against the wireless bandwidth, with $N = 3$ and $N = 5$ mobile users. The total wireless bandwidth ranges from 20 MHz to 40 MHz. The figure and table indicate that the simulation, analytical, and asymptotic results have the same slope in both cases, which confirms the accuracy of the analytical and asymptotic methods for calculating the system outage probability. Moreover, a larger wireless bandwidth enhances the system's outage performance, indicating that a wider bandwidth improves the transmission of the task. The case with $N = 3$ users also performs better than the case with $N = 5$ users, because each user receives a larger share of the resources when there are fewer users. Furthermore, the proposed greedy-based method outperforms the uniform allocation, demonstrating its ability to allocate computational resources effectively.

Fig. 4 and Table 3 illustrate how the system outage probability is affected by the number of mobile users for wireless bandwidths of 30 MHz and 40 MHz, with the number of users ranging from 3 to 7. As shown in the figure and table, the analytical and asymptotic outage probabilities derived for both bandwidth cases agree well with the simulation results, validating their accuracy. Additionally, as the number of mobile users increases, the system outage performance deteriorates due to increased resource competition among users. Moreover, the curves with a larger bandwidth outperform those with a smaller one, indicating that a larger bandwidth improves the offloading performance. Furthermore, the proposed greedy-based method is more effective than uniform allocation in allocating computational resources.

In Fig. 5 and Table 4, we observe the impact of the latency threshold on the system outage probability for $N = 3$ and $N = 5$ mobile users, with the latency threshold ranging from 2 s to 4 s. The figure and table show that the derived analytical and asymptotic outage probabilities agree well with the simulation, thereby validating their accuracy. Moreover, an increase in the latency threshold results in a decrease in the system outage probability, suggesting that having more time to offload tasks reduces the system outage. Additionally, the curves corresponding to a smaller $N$ perform better than those corresponding to a larger $N$, implying that reducing the number of mobile users improves the offloading performance. Furthermore, the proposed greedy-based method for allocating computational resources is more effective than uniform allocation, which demonstrates its ability to optimize resource allocation.

Fig. 6 and Table 5 show the outage probability of the system versus the task size $L$, where $L$ varies from 30 Mb to 70 Mb and the number of mobile users is 3 and 5, respectively. Both the figure and the table indicate that the analytical and asymptotic results are close to the simulation for both $N = 3$ and $N = 5$ as the task size varies, which supports the correctness of the derived expressions for $P_{out}$. Moreover, the outage probabilities increase as the task size increases, since a larger task requires more communication resources, which degrades the communication performance of the system. Furthermore, the outage probabilities with $N = 5$ are higher than those with $N = 3$, as more mobile devices cause more intense resource competition, which increases the system outage probability.

Fig. 7 and Table 6 show the impact of the computational resource on the system outage probability, where the computational resource varies from 1 GHz to 9 GHz and the number of mobile users is 3 and 5, respectively. From Fig. 7 and Table 6, we can see that the analytical and asymptotic results are close to the simulation for both $N = 3$ and $N = 5$ as the computational resource varies, which illustrates the accuracy of our methods for various computational resources. Moreover, for either $N = 3$ or $N = 5$, the system outage probability decreases as the computational resource increases, because a larger computational resource reduces the computation latency. Furthermore, the system with $N = 5$ has a larger outage probability than that with $N = 3$, since the increase in the number of users exacerbates the competition for the limited resources in the system.

Conclusions
In this paper, a novel real-time task fault-tolerant scheduling algorithm was presented for a dynamic monitoring platform of distribution network operation. The platform was frequently subjected to overload from distribution transformers. To address this issue, the proposed algorithm utilized wireless communication and mobile edge computing. For the considered system, we evaluated the system performance by analyzing the communication and computing latency, from which we then derived an analytical expression of system outage probability to facilitate the performance evaluation. We further optimized the system design by allocating computing resources for multiple mobile users, where a greedy-based optimization scheme was proposed. Simulations were conducted to demonstrate the effectiveness of the proposed algorithm in reducing task completion time, improving resource utilization, and enhancing system reliability. Overall, this study offered a practical solution for the dynamic monitoring of distribution networks.

Copyright
The copyright is licensed to EAI.

Figure 1. System model of multi-user MEC for dynamic monitoring.

Figure 3. $P_{out}$ versus the bandwidth $B$.

Figure 4. System outage probability versus the number of mobile users $N$.

Figure 5. System outage probability versus the latency threshold $\gamma_t$.

Figure 6. System outage probability versus the task size $L$.

Figure 7. System outage probability versus total computational resource $f_{total}$.

Table 1. Numerical $P_{out}$ versus SNR.

Table 2. Numerical $P_{out}$ versus $B$.

Table 3. Data for Fig. 4.

Table 4. Data for Fig. 5.

Table 5. Data for Fig. 6.

Table 6. Data for Fig. 7.