Computer Science, University of Pittsburgh

Energy-aware Fault-tolerant and Real-time Wireless Sensor Network for Control System
Thesis Proposal
Wenchen Wang, Computer Science, University of Pittsburgh
Committee:
Dr. Daniel Mosse, Computer Science, University of Pittsburgh (Advisor)
Dr. Rami Melhem, Computer Science, University of Pittsburgh
Dr. Youtao Zhang, Computer Science, University of Pittsburgh
Dr. Daniel Cole, Mechanical Eng and Materials Science, University of Pittsburgh
Hello everyone, and welcome to my thesis proposal. Today I am going to present our proposal on an energy-aware, fault-tolerant and real-time wireless sensor network for control systems.

Outline
Background and Motivation: wireless control systems, major challenges, thesis statement
Preliminary work: fault-tolerant network design; network reconfiguration for time-correlated faults
Proposed work: network reconfiguration for space-correlated faults; real-time network flow scheduling
Timeline
In this talk, I will first introduce the background and motivation for my thesis work. Then I will talk about two pieces of preliminary work, followed by two pieces of proposed work. The last part covers the proposed timeline and a summary.

Background and Motivation
Next I’ll introduce the background and motivation of this proposal

Wireless Control Systems
E-health, smart home, power grid, etc.
Wireless control systems are used in many fields, such as e-health, smart homes and smart power grids.

Wired vs. Wireless Control System (WCS)
Wired Control System: Sensors, Remote Controller, Actuator, Plant (signals: measurements, control signal). Not easy to deploy and maintain.
Wireless Control System (WCS): Sensors, Remote Controller, Actuator, Plant, connected through a Wireless Network (signals: measurements, control signal).
Traditionally, control systems are connected by wires. We only consider control systems whose controller is on the remote side. The plant is a physical system. The sensors attached to it periodically send measurement data, such as temperature or pressure, to the remote controller. The controller calculates the action the physical system should take, called the control signal, and sends it to the actuator, which actuates the physical system accordingly. However, a wired control system is not easy to deploy and maintain, so wires are being replaced with wireless communication.

Wired vs. Wireless Control System (WCS)
Wired Control System: Sensors, Remote Controller, Actuator, Plant (signals: measurements, control signal). Not easy to deploy and maintain.
Wireless Control System (WCS): Sensors, Remote Controller, Actuator, Plant (signals: measurements, control signal), with delay and message loss on the wireless links.
Network imperfections: delay and message loss
While the wireless control system addresses the problems of the traditional wired control system, it has its own limitations, because a wireless network can cause network delay and message losses. The controller may get delayed data, wrong data or no data at all, which affects the calculation of the control signal. The actuator may receive an inaccurate control signal, or the control signal may be lost during transmission. Network delay and message loss therefore harm the control system. Together they are called network imperfections.

Major Challenges of WCS
Instability [Zhang CS’01, Jusuf ICCSII’12]
When the physical system is unstable, the plant or the device can be damaged, leading to serious safety issues and financial loss.
Performance Degradation [Li ICCPS’16]
Network imperfections can induce additional error, called network-induced error.
Figure: wired control system output vs. wireless control system output; the difference between the two is the network-induced error.
Network imperfections bring two major challenges for the control system. The first challenge is instability. When the network induces too much delay and too many message losses, the physical system can become unstable, and the plant or part of the plant can be damaged. The other challenge is performance degradation. Network imperfections induce additional error, called network-induced error, which degrades control system performance. For example, the black line is the wired control system output and the red line is the wireless control system output. The difference between them is the network-induced error.
Instability: use the state equation $\dot{x} = Ax + Bu$ to quantify it; the system is stable when the real part of every eigenvalue of $A$ is negative.

Current Solutions
Control system solution [De AC’ 08, Shi IJC’10]
Network solution: fault-tolerance [Han RTAS’11], real-time scheduling [Hong ECRTS’15]
Network and control system co-design solution (limited works): simulator development [Li ICCPS’15], redesign of network protocols [Gatsis ICCPS’16]
Limitations
No study from the network perspective that addresses the control system stability issue
No research addressing time/space-correlated link failures in WCS
Lack of research on the impact of network real-time performance on control quality
The current solutions for the two challenges can be classified as control system solutions, network solutions, and network and control system co-design solutions. The control system solutions and the network solutions each treat the other system as a black box and do not consider its performance, so there is no actual interaction between the two systems. For the co-design solutions, there are only limited works: current research focuses on simulator development and on redesigning network protocols according to the control system's demands. The existing solutions have limitations. There is no study from the network perspective that addresses the control system stability issue, and no research on time/space-correlated link failures in WCS. Time-correlated link failures mean that link quality varies over time. Space-correlated link failures are normally caused by an interference source: links close to the source suffer from low link quality. There is also a lack of research on the impact of network real-time performance on control quality.
Reference: Model predictive control [De AC’ 08]: Lyapunov-based model predictive control of nonlinear systems subject to data losses. Kalman filter design [Shi IJC’ 10]: Kalman filter-based identification for systems with randomly missing measurements in a network environment.

My proposal
Diagram: Sensors, Controller, Actuator, Plant, Wireless Network (signals: measurements, control signal); energy consumption is considered throughout.
Instability
P1: how do we guarantee control system stability?
Performance Degradation
P2: how do we reduce network-induced error for a single control system?
P3: how do we reduce the total network-induced error for multiple control systems?
Our proposal is to design a fault-tolerant and real-time wireless network for the control system to address the two major challenges. We propose three sub-problems under these two challenges. The first problem, guaranteeing control system stability, belongs to the first challenge. The last two problems belong to the second challenge: reducing network-induced error for a single control system, and reducing the total network-induced error for multiple control systems. During the design of the wireless network we need to consider network energy consumption, since the sensors are all powered by batteries. All three problems consider network energy consumption during network design.

Thesis statement
Is it possible to build a power-aware fault-tolerant real-time wireless sensor network for control systems?
P1: how do we guarantee control system stability?
Fault-tolerant network design (completed) [Wang IRI’16]
P2: how do we reduce network-induced error for a single control system?
Network reconfiguration: time-correlated faults (completed) [Wang RTAS’17: WiP, Wang ECRTS’17 submitted]
Network reconfiguration: space-correlated faults (future)
P3: how do we reduce total network-induced error for multiple control systems?
Real-time network flow scheduling (future) [Wang RTAS’17: WiP]
Our thesis question is whether it is possible to build a power-aware fault-tolerant real-time wireless sensor network for control systems. We present four solutions to the three sub-problems. As I mentioned before, too many message losses can make the control system unstable, so we studied a fault-tolerant network design to guarantee control system stability. Since network interference can produce time-correlated link failures that affect control system performance, we explored a network reconfiguration framework to reduce network-induced error for a single control system. After studying time-correlated faults, we will study another network reconfiguration scheme to address space-correlated faults. When the network is shared among multiple control systems, the order of measurement transmissions affects the overall performance of the control systems, because each control system's performance is affected by the delay of its measurement transmissions. We will explore a real-time network scheduling scheme to reduce the total network-induced error for multiple control systems.

Fault-tolerant Network Design (Completed)
P1: how do we guarantee control system stability? Next I will introduce our first preliminary work to solve the first problem, which is the control system stability guarantee

Background
Based on a fault-tolerant wireless protocol: ridesharing [Gobriel SECON’06]
TDMA scheduling
A node has one primary parent and multiple backup parents
Link failures
Link success ratio (LSR)
A link fails with probability (1-LSR)
Network reliability
Delivery ratio (DR)
This work is based on ridesharing, a fault-tolerant wireless protocol that uses TDMA scheduling. TDMA scheduling reserves time slots for each node's transmissions. In this protocol, each node has one primary parent and multiple backup parents. We only consider link failures in this work, and we use LSR as an indicator of network interference. The link success ratio is the ratio of the number of messages received over a link to the number of messages transmitted over it. We assume a link fails with probability (1-LSR). We use the delivery ratio to indicate network reliability. The delivery ratio is the percentage of messages received by the remote controller.
Preliminary Work 1

Background: Our Control System
Primary heat exchanger system (PHX) in a small modular reactor (SMR) of a nuclear power plant (NPP)
Transfers power from inside the reactor to the outside
Temperature and mass flow rate
Next I will introduce the control system we are working on. This is a nuclear power plant with several small modular nuclear reactors. Using multiple SMRs instead of one large nuclear reactor is a new trend in today's nuclear power plants. This is a small modular nuclear reactor. We focus on the primary heat exchanger system, which is the green block. The PHX transfers power from inside the reactor to the outside. The sensors attached to the PHX periodically transmit temperature and mass flow rate measurements to the remote controller.

Problem Statement
Control system stability requirement, network health (NH):
$NH = p_1 \Delta_{network}^2 + p_2 \Delta_{network} + p_3 - (1 - DR)$
where $p_1$, $p_2$, $p_3$ are constants, $\Delta_{network}$ is the network delay and $DR$ is the delivery ratio. When $NH \geq 0$, the control system is stable.
Objective
Satisfy the stability requirement: $NH \geq 0$
Minimum energy consumption
Solution
Fault-tolerant node placement design
Computation model design to select the best node placement with the minimum number of relay nodes
This is the control system stability requirement derived by the nuclear control engineers. We call it network health. It is expressed in terms of network delay and delivery ratio. $p_1$, $p_2$ and $p_3$ are characteristic constants that are specific to a given control system. The assumption is that the control system is stable when $NH \geq 0$. Our objective is to satisfy the stability requirement with minimum energy consumption. To achieve this objective, we propose a solution with two steps. First, we propose a fault-tolerant node placement design. Then we present a computation model that selects the best node placement with the minimum number of relay nodes.
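The network health requirement can be sketched in a few lines of Python. This is only an illustration of the formula on the slide: the constants `P1`, `P2`, `P3` and the delay and delivery-ratio values are hypothetical placeholders, not the values derived for the PHX.

```python
def network_health(delta_network: float, dr: float,
                   p1: float, p2: float, p3: float) -> float:
    """NH = p1 * delay^2 + p2 * delay + p3 - (1 - DR)."""
    return p1 * delta_network ** 2 + p2 * delta_network + p3 - (1.0 - dr)


def is_stable(nh: float) -> bool:
    """The stability requirement from the slide: NH >= 0."""
    return nh >= 0.0


# Hypothetical characteristic constants (assumptions for illustration only).
P1, P2, P3 = -2.0, -0.5, 0.3

nh_good = network_health(delta_network=0.1, dr=0.9, p1=P1, p2=P2, p3=P3)
nh_bad = network_health(delta_network=0.1, dr=0.6, p1=P1, p2=P2, p3=P3)
print(nh_good, is_stable(nh_good))
print(nh_bad, is_stable(nh_bad))
```

With these placeholder constants, lowering the delivery ratio from 0.9 to 0.6 at the same delay moves NH from positive to negative.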

Fault-tolerant Node Placement Design
K-connected region
K-edge disjoint paths from sensors to virtual roots
Consumes fewer nodes, less flexible
Relay region
One line of primary nodes
Several lines of backup nodes
Nodes placed as close as possible
More flexible, consumes more nodes
Node placement set creation
Activate backup paths / backup nodes
Now I will introduce our fault-tolerant node placement design. This is the remote controller, and this is the nuclear power plant. These are the sensors that send measurement data all the way to the remote controller. We combine two different node placement strategies, because each has its own advantages and disadvantages. We divide the network area into two regions, the k-connected region and the relay region. In the k-connected region, there are k-edge disjoint paths from the sensors to virtual roots; this consumes fewer nodes but is less flexible. In the relay region, there is one line of primary nodes and several lines of backup nodes; this is more flexible but consumes more nodes. Nodes are placed as close as possible to achieve the minimum node placement in the relay region. After placing the nodes in the network area, we activate different numbers of backup paths in the k-connected region and different numbers of backup nodes in the relay region, in order to generate different network node placement designs. We call these node placements a node placement set.

Computation Model Network health estimation on the node placement set
Delivery ratio
Network delay
Choose the best node placement design for a given average LSR
NH ≥ 0
Minimum number of relay nodes -> minimum energy consumption
Based on the fault-tolerant node placement design, we then propose a computation model to estimate the network health of each node placement design in the node placement set. Since network health depends on delivery ratio and network delay, we need to estimate both for each node placement design. Then we are able to choose the best node placement design: the placement that meets the stability requirement with the minimum number of relay nodes, which means with minimum energy consumption.

Computation Model: Delivery Ratio Estimation
Expected number of messages received by the remote controller (RC)
$DR = \sum_{i=1}^{m} \left( p_{RC}(i) \times i \right)$, where $p_{RC}(i)$ is the probability that the remote controller receives $i$ messages and $m$ is the total number of messages sent from the sensors
State: $\langle m_0, m_1, \dots, m_n \rangle, p_i$ (the message-receiving situation for a level, stored as a sorted array, together with its probability)
Figure: level l with three nodes that received 2, 1 and 3 messages; state $\langle 1, 2, 3 \rangle, 0.2$. State-generation probabilities depend on LSR.
Algorithm complexity: 𝑙∗ 𝑘=1 𝑚 𝑖=1 𝑛 𝑃(𝑘, 𝑖)
We estimate the delivery ratio as the expected number of messages received by the remote controller, as shown in this equation. $p_{RC}(i)$ is the probability that the remote controller receives $i$ messages, and $m$ is the total number of messages sent from the sensors. We introduce the concept of a state to represent the message-receiving situation for a level. $m_i$ is the number of messages received by one of the nodes in that level. The array is sorted by the number of received messages. $p_i$ is the probability of reaching this state. For example, level l has three nodes. The first node received 2 messages, the second 1 and the third 3, so the sorted array is (1, 2, 3), and 0.2 is the probability of reaching this state. Each level has multiple states, since each level can have different message-receiving situations. We propose an algorithm to calculate $p_{RC}(i)$. The basic idea is that, in each level, we enumerate all possible states that can be generated from the previous level. For example, level (l+1) has two states. Each state in level (l+1) can generate one or more states for the next level. We call this process state generation. To reduce computation, we then combine states with the same message-receiving array, and then generate the next level's states until we reach the remote controller level. The probabilities of the final states at the remote controller level correspond to $p_{RC}(i)$.

Computation Model: Delivery Ratio Estimation
Expected number of messages received by the remote controller (RC)
$DR = \sum_{i=1}^{m} \left( p_{RC}(i) \times i \right)$, where $p_{RC}(i)$ is the probability that the remote controller receives $i$ messages and $m$ is the total number of messages sent from the sensors
State: $\langle m_0, m_1, \dots, m_n \rangle, p_i$ (the message-receiving situation for a level, stored as a sorted array, together with its probability)
$p_{RC}(i)$ calculation
Probabilities of the final states at the RC level correspond to $p_{RC}(i)$ $(1 \leq i \leq m)$
Figure: state 1 of level (l+1) goes through state-generation into intermediate states 2 to 6 of level l with probabilities $p_1$ to $p_5$; state-combination merges them into final states 7, 8 and 9 of level l with probabilities $p_1 + p_3$, $p_2 + p_4$ and $p_5$. State-generation probabilities depend on LSR.
Algorithm complexity: 𝑙∗ 𝑘=1 𝑚 𝑖=1 𝑛 𝑃(𝑘, 𝑖)
We propose an algorithm to calculate $p_{RC}(i)$. The basic idea is that, in each level, we enumerate all possible states that can be generated from the previous level. For example, level (l+1) has two states. Each state in level (l+1) can generate one or more states for the next level. We call this process state-generation. To reduce the computation cost, we then combine states with the same message-receiving array. These are the final states of level l. This process is called state-combination. Then we generate the next level's states using the same process until we reach the remote controller level. The probabilities of the final states at the remote controller level correspond to $p_{RC}(i)$.
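The state-combination step and the final expectation can be sketched as follows. This is a minimal illustration, not the full algorithm: the state-generation step, which depends on LSR and the ridesharing topology, is left out, and the arrays and probabilities below are made-up placeholders arranged so that five intermediate states collapse into three, as in the figure.

```python
from collections import defaultdict


def combine_states(states):
    """State-combination: merge states whose sorted message-receiving
    arrays are identical, summing their probabilities."""
    merged = defaultdict(float)
    for counts, prob in states:
        merged[tuple(sorted(counts))] += prob
    return dict(merged)


def expected_received(p_rc):
    """Sum of p_RC(i) * i, where p_rc maps i to its probability."""
    return sum(i * p for i, p in p_rc.items())


# Hypothetical intermediate states of one level: (messages per node, probability).
INTERMEDIATE = [
    ((2, 1, 3), 0.10),  # p1
    ((0, 2, 3), 0.20),  # p2
    ((3, 2, 1), 0.30),  # p3, same sorted array as p1
    ((3, 0, 2), 0.15),  # p4, same sorted array as p2
    ((1, 1, 3), 0.25),  # p5
]

FINAL = combine_states(INTERMEDIATE)
print(FINAL)

# Hypothetical distribution of the number of messages reaching the RC.
print(expected_received({0: 0.1, 1: 0.3, 2: 0.6}))
```

Sorting each array before using it as a dictionary key is what makes two states with the same message counts in a different node order collapse into one.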

Computation Model: Network Delay and NH
Worst-case network delay ($\Delta_{network}$) estimation
$\Delta_{network} = \Delta_{slot} \times N$, where $\Delta_{slot}$ is the TDMA scheduling time slot and $N$ is the total number of nodes
NH estimation
$NH = p_1 \Delta_{network}^2 + p_2 \Delta_{network} + p_3 - (1 - DR)$
Node placement selection with the minimum number of nodes, given LSR
We estimate the worst-case network delay as the total number of nodes times the TDMA scheduling time slot. In this way, we are able to estimate the control system stability requirement, network health, for each node placement. Given LSR, we can select the initial node placement with the minimum number of nodes.
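The selection step can be sketched as a small search over the node placement set. The slot length, the constants `P1`, `P2`, `P3` and the per-placement delivery ratios below are hypothetical values chosen only to make the example run; in the actual work the delivery ratios come from the computation model for a given LSR.

```python
def worst_case_delay(slot: float, n_nodes: int) -> float:
    """Worst-case network delay: TDMA slot length times total number of nodes."""
    return slot * n_nodes


def select_placement(candidates, slot, p1, p2, p3):
    """Return the smallest node count whose estimated NH is >= 0, or None.

    candidates: iterable of (n_nodes, estimated_delivery_ratio).
    """
    feasible = []
    for n_nodes, dr in candidates:
        delay = worst_case_delay(slot, n_nodes)
        nh = p1 * delay ** 2 + p2 * delay + p3 - (1.0 - dr)
        if nh >= 0.0:
            feasible.append(n_nodes)
    return min(feasible) if feasible else None


# Hypothetical inputs (assumptions, not values from the evaluation).
SLOT = 0.01                     # seconds per TDMA slot
P1, P2, P3 = -1.0, -0.2, 0.25   # characteristic constants
CANDIDATES = [(20, 0.70), (28, 0.90), (35, 0.95), (48, 0.99)]

BEST = select_placement(CANDIDATES, SLOT, P1, P2, P3)
print(BEST)
```

In this made-up instance the 48-node placement has the highest delivery ratio but fails the requirement, because its extra delay outweighs the reliability gain.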

Evaluation Computation Model Simulation Comparison
Metrics and their meaning
DR: delivery ratio
NH: network health
MinCMR: minimum number of nodes of computation model results
MinSR: minimum number of nodes of simulation results
Computation Model
Up to 3 lines of backup nodes
Up to 4-edge disjoint paths
Simulation
Up to 7 lines of backup nodes
TOSSIM simulator [Levis SenSys’03]
Comparison
Let's switch gears now: I'll show you some results. We evaluate the computation model results and the simulation results, and then compare the two. For the computation model, we evaluate up to 3 lines of backup nodes in the relay region and up to 4-edge disjoint paths in the k-connected region. For the simulation, we evaluate up to 7 lines of backup nodes on the TOSSIM simulator. We use a noise trace from the Washington University testbed [ICCPS15]. The CPM model is the closest-fit pattern matching noise model proposed by Stanford University.

Computation Model Results
These are the computation model results for delivery ratio estimation and network health estimation. For the delivery ratio estimation, each line represents the delivery ratio of a fixed k-connected region. The inflection points happen when there is a complete line of backup nodes. The slope decreases when adding more lines of backup nodes, because the probability of using the last node in a level to handle messages decreases as the number of backup nodes in each level increases. With the network health estimation results, we are able to select the best node placement design for a given LSR. For example, when the average LSR is 0.9, the best node placement design has 28 nodes, because it meets the system stability requirement with the minimum number of relay nodes; when the average LSR is 0.8, the minimum node placement is 35; when the average LSR is 0.7, the minimum node placement is 48.
With the NH computation results, we can estimate the best node placement design for a given LSR

Simulation Results Adding more nodes does not always help
These are the simulation results for delivery ratio and network health. We found that adding more nodes does not always help. For the delivery ratio results, DR already reaches its maximum when there are 52 nodes in the network, and adding more nodes after that does not increase network reliability. For the NH results, NH decreases after a certain point: because DR remains the same, adding more nodes just induces more network delay into the system.
When there are 52 nodes, DR reaches its maximum
NH decreases

Minimum Number of Nodes Comparison: Computation vs. Simulation
RSSI   LSR    LSR stdv   MinCMR   MinSR   Diff
-64    0.93   0.020      26       26      0%
-70    0.88   0.024      29       30      -3.4%
-76    0.82   0.031      33       32      3.0%
-82    0.77   0.035      37       39      -5.4%
-84    0.71   0.037      46       42      8.7%
This is the comparison of the minimum number of nodes between the computation and simulation results. MinCMR is the minimum number of nodes from the computation results and MinSR is the minimum number of nodes from the simulation results, for different RSSI values. We conclude that our computation model is accurate, with an average difference of 4.1%. The difference increases as the LSR standard deviation increases, since our computation model assumes LSR follows a uniform distribution, whereas LSR follows the CPM model in our simulation. The CPM model is the closest-fit pattern matching noise model proposed by Stanford University.
Computation model is accurate with average 4.1% difference.
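As a quick arithmetic check on the table, the Diff column and the 4.1% average can be reproduced with a few lines of Python. Two things here are assumptions rather than statements from the slide: the formula Diff = (MinCMR - MinSR) / MinCMR, which is inferred because it reproduces the listed percentages, and MinSR = 26 in the first row, which is implied by the 0% difference.

```python
# (RSSI, MinCMR, MinSR); the first row's MinSR of 26 is an assumption
# implied by the reported 0% difference.
ROWS = [(-64, 26, 26), (-70, 29, 30), (-76, 33, 32), (-82, 37, 39), (-84, 46, 42)]


def diff_percent(min_cmr: int, min_sr: int) -> float:
    """Assumed definition: relative difference with respect to MinCMR, in percent."""
    return 100.0 * (min_cmr - min_sr) / min_cmr


DIFFS = [round(diff_percent(cmr, sr), 1) for _, cmr, sr in ROWS]
MEAN_ABS = sum(abs(diff_percent(cmr, sr)) for _, cmr, sr in ROWS) / len(ROWS)

print(DIFFS)
print(round(MEAN_ABS, 1))
```

Under these assumptions the mean of the absolute differences rounds to 4.1, which matches the average reported on the slide.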

Network Reconfiguration: Time-correlated Faults (Completed)
P2: how do we reduce network-induced error for a single control system? Here comes our second prelim work to solve the second problem: network-induced error reduction for a single control system.

Background
Sensors sense and send measurements periodically to the controller with the sensing sampling period
Controller calculates the control signal with the control sampling period
Diagram: Sensors, Controller, Actuator, Plant, Wireless Network (signals: measurements, control signal); sensing sampling period 0.05s, control sampling period 0.1s
In a wireless control system, sensors and controllers are able to sense and calculate at different frequencies. In detail, the sensors sense and send measurements periodically to the controller with the sensing sampling period, and the controller calculates the control signal with the control sampling period. For example, in the nuclear power plant system, the sensing sampling period can be 0.05s and the control sampling period 0.1s.
Preliminary Work 2

Problem Statement Trade-off between delivery ratio and delay
Higher delivery ratio -> more redundant nodes -> more delay
Optimal network configuration
Time-correlated link failures [Baccour TOSN’12]
Network reconfiguration
Objective: network-induced error reduction for a single control system
Solution
Network reconfiguration framework
We found that there is a trade-off between delivery ratio and delay in the node placement design of our first preliminary work: the higher the delivery ratio, the more redundant nodes are needed, and the more network delay there is. This motivates us to find an optimal network configuration that minimizes the network-induced error for the control system. Another difficulty is that interference and noise can produce time-correlated link failures, which means that the LSR changes over time. In this example, the LSR stays around a certain value for a certain duration and then changes to another value for another period of time. This motivates us to explore network reconfiguration at run time. In this work, our objective is to reduce the network-induced error for a single control system.

Network Reconfiguration Framework
Input: network configuration set
The network node placement set
Offline
Optimal network configuration table indexed by LSR values
Online
LSR estimation at run time
Centralized network reconfiguration
RC broadcasts the new network configuration to all the nodes in the network
This is our proposed network reconfiguration framework. Its input is a network configuration set. In our case, the network configuration set is the network node placement set from the previous preliminary work. The framework has an offline part and an online part. In the offline part, we generate an optimal network configuration table and store the table in the remote controller. At run time, the network estimates the LSR and sends the current LSR estimate to the remote controller. The remote controller chooses a new network configuration and broadcasts it to all the nodes in the network so that they reconfigure. This is a general framework that can be used in any wireless control system.

Offline Computation Network imperfection model
Define the total induced delay to the control system, the ∆ estimation, as
$\Delta = \Delta_{network} + \left\lceil \frac{n_{loss}\,\Delta_{ssp}}{\Delta_{csp}} \right\rceil \Delta_{csp}$, with $n_{loss} \sim dr$
where $\Delta_{csp}$ is the control sampling period, $n_{loss}$ is the number of consecutive message losses and $\Delta_{ssp}$ is the sensing sampling period.
In the offline computation, we propose a network imperfection model. It converts the network delay and message losses into the total induced delay to the control system, which we call the ∆ estimation. It depends on the network delay and the number of consecutive message losses. $n_{loss}$ can be estimated from the delivery ratio dr.

Define the total induced delay to the control system, the ∆ estimation, as
$\Delta = \Delta_{network} + \left\lceil \frac{n_{loss}\,\Delta_{ssp}}{\Delta_{csp}} \right\rceil \Delta_{csp}$, with $n_{loss} \sim dr$
where $\Delta_{csp}$ is the control sampling period, $n_{loss}$ is the number of consecutive message losses and $\Delta_{ssp}$ is the sensing sampling period.
Estimate ∆ for each network placement
Optimal network placement
Given LSR, the placement with minimum ∆ estimation
Optimal network placement table indexed by LSR values
By using the computation model from the first preliminary work, we are able to estimate ∆ for each network node placement design. Then we can select the optimal network configuration for a given LSR value, where the optimal network placement means the placement with the minimum ∆ estimation. This gives us the optimal network configuration table indexed by LSR values. Optimal in this work does not mean the actual optimum; it is an estimated optimum. The actual optimum cannot be known in advance, since network message loss and network delay are unpredictable.
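The network imperfection model and the offline table construction can be sketched as below. This is an illustration under stated assumptions: the ceiling form of the loss term is a reading of the slide's formula, times are kept in integer milliseconds so that the ceiling is exact, and the per-placement delays and loss counts in `ESTIMATES` are invented. How `n_loss` is derived from the delivery ratio is not shown on the slide, so it is taken here as a given input.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def total_induced_delay_ms(delta_network_ms: int, n_loss: int,
                           delta_ssp_ms: int, delta_csp_ms: int) -> int:
    """Delta = network delay + ceil(n_loss * ssp / csp) * csp, in milliseconds."""
    return delta_network_ms + ceil_div(n_loss * delta_ssp_ms, delta_csp_ms) * delta_csp_ms


def build_offline_table(estimates, delta_ssp_ms, delta_csp_ms):
    """For each LSR value, keep the node count with the minimum estimated Delta.

    estimates: {lsr: [(n_nodes, delta_network_ms, n_loss), ...]}
    """
    table = {}
    for lsr, placements in estimates.items():
        best = min(
            placements,
            key=lambda p: total_induced_delay_ms(p[1], p[2], delta_ssp_ms, delta_csp_ms),
        )
        table[lsr] = best[0]
    return table


# Sampling periods from the talk: sensing 0.05 s, control 0.1 s.
SSP_MS, CSP_MS = 50, 100

# Hypothetical per-placement estimates (assumptions for illustration).
ESTIMATES = {
    0.9: [(20, 200, 0), (30, 300, 0)],
    0.7: [(20, 200, 5), (30, 300, 1)],
}

OFFLINE_TABLE = build_offline_table(ESTIMATES, SSP_MS, CSP_MS)
print(OFFLINE_TABLE)
```

Because the loss term is rounded up to a whole number of control sampling periods, several different loss counts map to the same ∆, which is one way to see why the resulting table can show plateaus.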

Online Reconfiguration
Offline table at the remote controller (LSR: estimated node placement)
0.8: Placement 1
0.5: Placement 8
0.2: Placement 20
Flow: the network sends the estimated LSR to the remote controller (LSR estimation); the remote controller looks up the optimal estimated placement, runs the online reconfiguration algorithm and sends the new node placement back to the network.
For the online part of our framework, we put the offline table in the remote controller. The network estimates the LSR periodically and sends the current estimated LSR to the remote controller. The remote controller finds the optimal estimated node placement by looking up the offline table. It then runs a centralized online reconfiguration algorithm to select a new node placement and broadcasts the new node placement to all the nodes in the network so that they reconfigure. There are two main pieces of work in the online part: LSR estimation and the online reconfiguration algorithm.
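The table lookup at the remote controller can be sketched as follows. The three table entries are the ones shown on the slide. The lookup rule for an estimated LSR that falls between table entries is not specified in the talk, so picking the nearest LSR key is an assumption made only for this example.

```python
# Offline table from the slide: LSR value -> placement index.
OFFLINE_TABLE = {0.8: 1, 0.5: 8, 0.2: 20}


def lookup_placement(table: dict, estimated_lsr: float) -> int:
    """Return the placement stored under the table key closest to the estimate.

    Nearest-key matching is an assumed rule, not one stated in the talk.
    """
    nearest_key = min(table, key=lambda lsr: abs(lsr - estimated_lsr))
    return table[nearest_key]


print(lookup_placement(OFFLINE_TABLE, 0.55))
print(lookup_placement(OFFLINE_TABLE, 0.30))
```

A more conservative rule would round the estimate down to the next lower LSR key, trading extra nodes for reliability; either rule fits the same table.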

LSR Estimation
During the LSR interval (LSRI), each node records its own average LSR over all its receiving links
Every LSRI, each node sends out its own LSR
A parent node averages all its children's LSRs and its own LSR
RC estimates the average LSR over all the links
Figure: with $LSRI = 20s$, nodes 1 and 2 are children of node 3. During $\Delta T = [0s, 19s]$ each node records $LSR_1$, $LSR_2$, $LSR_3$. At $T = 20s$ node 3 computes $LSR_3 = (LSR_1 + LSR_2 + LSR_3)/3$.
In order to address time-correlated link failures, we need to estimate the LSR at run time. During the LSR interval, each node records its own average LSR over all its receiving links. For example, suppose the LSRI is 20s. From time 0 to time 19, node 1 records its own LSR over the green links. At time 20, each node sends its own LSR to its parents. Each parent node averages all its children's LSRs and its own LSR. In this way, the remote controller estimates the average LSR over all the links in the network. What is the difference between this LSR estimation and others?
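The per-interval aggregation can be sketched as a recursive average up the routing tree. This is a simplified illustration: it assumes each node has a single parent, whereas in the ridesharing protocol a node also has backup parents, and the topology and LSR values are invented.

```python
def link_success_ratio(received: int, transmitted: int) -> float:
    """LSR of a link: received messages divided by transmitted messages."""
    return received / transmitted if transmitted else 0.0


def aggregate_lsr(node, children, own_lsr):
    """Value a node reports at the end of an LSRI: the average of its own LSR
    and the values reported by its children."""
    reports = [aggregate_lsr(child, children, own_lsr)
               for child in children.get(node, [])]
    return (own_lsr[node] + sum(reports)) / (1 + len(reports))


# Hypothetical topology matching the figure: nodes 1 and 2 report to node 3.
CHILDREN = {3: [1, 2]}
OWN_LSR = {1: 0.9, 2: 0.7, 3: 0.8}

print(aggregate_lsr(3, CHILDREN, OWN_LSR))
```

Since each parent averages already-averaged values, deeper subtrees are weighted differently from a flat mean over all links, so the value reaching the remote controller is an approximation of the network-wide average.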

Centralized Reconfiguration algorithms
Direct Jump to Optimal (DO)
Multiplicative Increase Conservative Decrease (MICD)
Adaptive Control (AC): $curr_{node} = \alpha \times curr_{node} + (1 - \alpha) \times est_{node}$
Figure (three panels: DO, MICD, AC): number of nodes vs. time ($t_1$ to $t_4$), with y-axis ticks at 20 and 30 and the current and estimated node counts marked.
After the remote controller knows the current estimated LSR, it knows the estimated optimal number of nodes from the offline table. We studied three centralized reconfiguration algorithms that select a new network node placement for reconfiguration. When the estimated number of nodes is greater than the current number of nodes, DO jumps directly to the estimated number of nodes. MICD multiplicatively increases the number of nodes, which means it grows exponentially, to ensure network reliability. AC adds nodes quickly at first and then slowly approaches the estimated number.

Centralized Reconfiguration algorithms
Direct Jump to Optimal (DO)
Multiplicative Increase Conservative Decrease (MICD)
Adaptive Control (AC)
Considering consecutive losses (CL)
Add k more nodes whenever there are m consecutive losses
CL-DO, CL-MICD and CL-AC
Figure (three panels: DO, MICD, AC): number of nodes vs. time ($t_1$ to $t_4$), with y-axis ticks at 20 and 30 and the estimated and current node counts marked.
When the estimated number of nodes is less than the current number of nodes, DO again jumps directly to the estimated number of nodes. MICD conservatively decreases the number of nodes by 1 at a time to ensure network reliability. AC first decreases the number of nodes quickly and then slowly approaches the estimated number. Time-correlated link failures can cause consecutive message losses, and according to our network imperfection model, consecutive message losses affect control system performance. We therefore propose a scheme that adds k more nodes whenever there are m consecutive losses. Adding this scheme to the three algorithms above gives three more algorithms: CL-DO, CL-MICD and CL-AC.
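One step of each reconfiguration algorithm can be sketched as below. Only the AC update rule and the "decrease by 1" behaviour of MICD are stated in the talk. The multiplicative factor, the cap `max_nodes`, the rounding of AC towards the estimate, and the values of `alpha`, `m` and `k` are all hypothetical choices made so the example runs.

```python
import math


def do_step(curr: int, est: int) -> int:
    """Direct Jump to Optimal: go straight to the estimated node count."""
    return est


def micd_step(curr: int, est: int, factor: float = 1.5, max_nodes: int = 50) -> int:
    """Multiplicative Increase Conservative Decrease.

    factor and max_nodes are assumed parameters, not values from the talk.
    """
    if est > curr:
        return min(math.ceil(curr * factor), max_nodes)
    if est < curr:
        return curr - 1
    return curr


def ac_step(curr: int, est: int, alpha: float = 0.5) -> int:
    """Adaptive Control: curr = alpha * curr + (1 - alpha) * est.

    Rounding towards the estimate (an assumption) guarantees progress.
    """
    blended = alpha * curr + (1 - alpha) * est
    return math.ceil(blended) if est > curr else math.floor(blended)


def cl_adjust(n_nodes: int, consecutive_losses: int,
              m: int = 3, k: int = 2, max_nodes: int = 50) -> int:
    """CL scheme: add k nodes once m consecutive losses have been seen."""
    if consecutive_losses >= m:
        return min(n_nodes + k, max_nodes)
    return n_nodes


# Increase case: current 20 nodes, estimate 30 nodes.
print(do_step(20, 30), micd_step(20, 30), ac_step(20, 30))
# Decrease case: current 30 nodes, estimate 20 nodes.
print(do_step(30, 20), micd_step(30, 20), ac_step(30, 20))
```

A CL variant such as CL-AC would apply `cl_adjust` to the result of `ac_step`, with the caller resetting the loss counter after nodes are added.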

Evaluation Case study: one PHX Simulator: WCPS [Li ICCPS’15]
Offline simulation
Static RSSI
Online simulation
Dynamic RSSI: dynamic LSR over time
LSRI
Metrics
RMS error (RMSE): network-induced error (compared with the wired control system)
Network lifetime (days)
For the evaluation, we did a case study on one PHX. We use the WCPS simulator, which runs a network simulator and a control system simulator together. We did two types of simulation. For the offline simulation, we simulate different static RSSI values to evaluate network imperfection. For the online simulation, one input is a sequence of dynamic RSSI values that changes the LSR over time. This is an example of RSSI changing over time. The other input is a set of different LSRI values. The online simulation evaluates the online reconfiguration algorithms. We evaluate two metrics. RMS error is the indicator of network-induced error compared with the wired control system. Network lifetime is used to evaluate network energy consumption.

Offline Table Number of optimal nodes increases, as the LSR decreases
This is the offline table generated from the network imperfection model. The plateaus are due to the ceiling in our imperfection model.

Network Imperfection Model vs. Offline Simulation Results
We compare the network imperfection model results with the offline simulation results. The first figure shows the network imperfection model results, which are the computed network induced delay. The second shows the simulation results for the power output RMSE, which is the network-induced error. Since the network induced delay is statistically correlated with the power output RMSE, we conclude that our network imperfection model estimates the network-induced error accurately.
Network imperfection model is accurate
Network induced delay is statistically correlated with the power output RMSE (Pearson correlation $r = 0.993$, $p < 0.001$)

Online Results: sensitivity analysis of LSRI
These are the online simulation results for the sensitivity analysis of LSRI. "Static" means the number of nodes in the network is fixed. We tested 20 to 50 nodes and chose the best static scheme, the one with the minimum RMSE. Even the best static scheme is significantly worse than the dynamic schemes, with higher RMSE and shorter network lifetime, which demonstrates that our reconfiguration algorithms are necessary.
Best static scheme performs worse than the dynamic schemes

We also found that the LSRI value affects the performance of the schemes that do not consider consecutive losses (CL). As the LSRI increases, the RMSEs of these schemes increase, because LSR estimation is less accurate when the LSRI value is high.

LSRI value affects the performance of schemes that do not consider CL.

But the schemes that consider consecutive message losses are not affected by the LSRI values. As the LSRI increases, their RMS error stays the same, because these schemes add additional nodes to the network to guarantee network reliability. The side effect is that they consume more energy.

Schemes considering CL are not affected by the LSRI values.

Network Reconfiguration: Space-correlated Faults (Future)
P2: how do we reduce network-induced error for a single control system? Next I will introduce our first proposed work to solve the second problem.

Motivation and Problem Statement
Spatial link failures caused by interference sources (IS) affect network reliability [Low CIMCA’05, Fadel CC’15] -> control system performance. Examples of IS: mobile phone, WiFi, radio jammer. A mobile IS has not been fully researched in WSNs. Objective: network-induced error reduction for a single control system.

The motivation of this proposed work is spatial … Examples of IS are …. What we want to study is a mobile interference source. In the figure, this is the IS and the grey area is the interference region. As the interference source moves forward, the links within the grey region are affected and their quality drops. A mobile IS has not been fully researched in WSNs. Most existing work focuses on tracking the IS rather than on tolerating the space-correlated link failures. Proposed Work 1

Methodology
Build a space-correlated fault model with one moving IS: the IS moves with a certain speed, and the model determines which links fail with what probability. Study strategies to tolerate space-correlated link failures: a distributed network reconfiguration algorithm. Conduct a case study in an NPP with a single PHX: compare the network reconfiguration strategies with the baseline of the second preliminary work.

To achieve our objective, we propose three future tasks. First, we will build … the model will determine which links fail with what probability: the links that are closer to the interference source will have lower quality than the other links. Second, we will study … I will change the original node placement: instead of placing nodes as close together as possible, we will place them with space between them. We will propose a distributed network reconfiguration algorithm, instead of the centralized algorithms proposed in the second preliminary work.
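One way to picture the first task is the toy model below. It is only a sketch under assumptions that are not in the proposal: the IS moves in a straight line at constant velocity, the interference region is a disc of fixed radius, and the failure probability of a link falls linearly with the distance from the IS to the link midpoint. The class name, the function name, and the parameter `p_max` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class InterferenceSource:
    x: float       # start position (m)
    y: float
    vx: float      # constant velocity (m/s), an assumption
    vy: float
    radius: float  # radius of the interference region (m), an assumption

    def position_at(self, t):
        """Position of the IS after t seconds of straight-line motion."""
        return (self.x + self.vx * t, self.y + self.vy * t)


def link_failure_probability(link_midpoint, source, t, p_max=0.9):
    """Hypothetical model: links nearer the IS fail with higher probability."""
    distance = math.dist(link_midpoint, source.position_at(t))
    if distance >= source.radius:
        return 0.0  # outside the interference region
    return p_max * (1.0 - distance / source.radius)


source = InterferenceSource(x=0.0, y=0.0, vx=10.0, vy=0.0, radius=10.0)
p_now = link_failure_probability((20.0, 0.0), source, t=0.0)    # IS still far away
p_later = link_failure_probability((20.0, 0.0), source, t=2.0)  # IS has arrived
```

The same link goes from unaffected to heavily affected as the IS passes, which is the space-correlated behaviour a reconfiguration strategy would have to tolerate.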

Real-time Network Flow Scheduling (Future)
P3: how do we reduce total network-induced error for multiple control systems? This is our second proposed work

Motivation: Observations
Test the network-induced error on one PHX, using different reference functions with one ramp, each defined by a power change amount (PCA) and a power change duration (PCD), under different delivery ratios and delays. Example: Ramp30, PCA: 10 MW, PCD: 30 s.

We ran a test on one PHX with different reference functions under different network delays and delivery ratios. A reference function is the function that the operator wants the control system output to follow. These examples come from the nuclear control people. For example, if the nuclear operator wants the power of the PHX to change from 42 MW to 32 MW within 30 s, they set the reference function to ramp30. In our case, a reference function with one ramp is determined by the power change amount and the power change duration. For example, the PCA of ramp30 is 10 MW and its PCD is 30 s. Proposed Work 2
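The ramp30 example can be written down as a small function. This is a sketch that assumes a single downward linear ramp starting at a given time; the function and parameter names are illustrative and not taken from the simulator.

```python
def ramp_reference(t, start_power_mw, pca_mw, pcd_s, t_start_s=0.0):
    """Single downward ramp: power drops by pca_mw over pcd_s seconds."""
    if t <= t_start_s:
        return start_power_mw
    if t >= t_start_s + pcd_s:
        return start_power_mw - pca_mw
    elapsed_fraction = (t - t_start_s) / pcd_s
    return start_power_mw - pca_mw * elapsed_fraction


# ramp30 from the example: 42 MW down to 32 MW within 30 s.
def ramp30(t):
    return ramp_reference(t, start_power_mw=42.0, pca_mw=10.0, pcd_s=30.0)
```

A shorter PCD or a larger PCA makes the slope steeper, which is the sense in which a reference function is more demanding.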

Motivation: Observations
For reference functions with shorter PCDs, the network delay becomes a more significant factor (PCA: 10 MW; DR: 0.9). For reference functions with higher PCAs, the network delay becomes a more significant factor (PCD: 30 s; DR: 0.9).

The first figure shows the RMSE results for different PCDs with fixed PCA and delivery ratio. For example, the RMSE of PCD 15 with delay 0.2 is similar to the RMSE of PCD 45 with delay 0.4. But when the PCD is greater than 60, the RMSEs are similar. The second figure shows the RMSE results for different PCAs with fixed PCD and delivery ratio. Note that the PCA values decrease along the x-axis, to be visually consistent with the first figure. For example, the RMSE of PCA 10 with delay 0.2 is similar to the RMSE of PCA 6 with delay 0.5. But when the PCA is less than 4 MW, the RMSEs are similar.

Motivation: NPP demands
Multiple Small Modular Reactors (SMRs) in an NPP: different PHXs may have different power demands. Dynamic application demands -> different reference functions over time. Cross-layer real-time scheduling: inject the application demands into the network layer to change measurement deadlines dynamically; assign smaller deadlines to more urgent application demands (smaller PCDs or larger PCAs).

In addition to the observations, our second motivation comes from NPP demands. There are multiple small modular reactors in a nuclear power plant, and multiple PHXs will use one shared wireless network to transmit measurements. Each SMR may have a different power demand, which means different reference functions over time. These two motivations lead us to cross-layer real-time scheduling. We will inject the application demands into the network layer to change measurement deadlines dynamically, assigning smaller deadlines to more urgent application demands (smaller PCDs or larger PCAs). As far as we know, cross-layer real-time scheduling has not yet been researched in wireless control systems.

Problem Statement
Network flow: a set of $m$ end-to-end network flows $F = \{F_1, F_2, \dots, F_m\}$. Each $F_i$ is associated with a source $s_i$, a destination $d_i$, a period $p_i$, and a deadline $D_i$.
Control systems application demands: control systems have different reference functions with multiple ramps.
Objective: reduce the total network-induced error for multiple control systems: $error = \sum_{i=1}^{n} RMSE_i$

Each flow delivers one measurement to the remote controller. And each network flow associates with one… ; Different physical systems may have different reference functions with multiple ramps. For example, ref1 has 3 ramps and ref2 has 2 ramps. From time t0 to t1, the network flows of ref1 have smaller deadlines than the network flows of ref2. From time t1 to t2, the network flows of control system 2 have smaller deadlines than the network flows of control system 1.

Figure: required power over time for ref1 and ref2, with time marks $t_0$ to $t_5$.

Methodology
Define the deadline for each network flow according to the offline control system analysis (related to PCA and PCD). Study a cross-layer real-time scheduling algorithm to schedule network flows dynamically. Conduct a case study in an NPP with three PHXs and evaluate the results on WCPS.

To achieve the objective, we propose three future tasks. Study a cross-layer …, according to the application demands.

Summary and Timeline

Summary
Challenges, problems, and solutions:
Instability: stability guarantee -> Fault-tolerance Network Design
Performance degradation: network-induced error reduction for a single control system -> Network reconfiguration: time-correlated faults; Network reconfiguration: space-correlated faults
Performance degradation: network-induced error reduction for multiple control systems -> Real-time network flow scheduling

We aim to address two challenges through three sub-problems, and we propose four solutions for the three problems.

Timeline
Date: content. Deliverable results.
May - Aug. 2017: Dynamic network flow scheduling algorithm design and implementation on WCPS. Deliverable: network deadline formulation and a WCS with the function of dynamic network flow scheduling.
Sep. - Dec. 2017: Measure the performance of WCS with dynamic network flow scheduling. Deliverable: a paper for publication.
Jan. - Feb. 2018: Finish the implementation of the bitvector protocol [Wang ICESS’15] and the space-correlated fault model on WCPS. Deliverable: a WCS with a fault-tolerance protocol to deal with space-correlated link failures.
March 2018: Come up with a network reconfiguration algorithm and implement it on WCPS. Deliverable: a WCS with the function of network reconfiguration for space-correlated link failures.
April 2018: Measure the performance of WCS with the network reconfiguration mechanism.
May - Jun. 2018: Thesis writing. Deliverable: thesis ready for defense.
Jul. - Aug. 2018: Thesis revising. Deliverable: completed thesis.

Energy-aware Fault Tolerance and Real-time Wireless Sensor Network for Control System
Challenges, problems, and solutions:
Instability: stability guarantee -> Fault-tolerance Network Design
Performance degradation: network-induced error reduction for a single control system -> Network reconfiguration: time-correlated faults; Network reconfiguration: space-correlated faults
Performance degradation: network-induced error reduction for multiple control systems -> Real-time network flow scheduling

Backup Slides

Contributions and Impact
A computation model to satisfy control system stability with minimum energy consumption.
A network reconfiguration framework to address time-correlated and space-correlated link failures in wireless control systems.
Exploration of cross-layer network flow scheduling to enhance the overall performance of multiple control systems.

Comparison: Computation vs. Simulation

Transform network delay and message losses into the total network-induced delay $T_{used} - T_{sensed}$. Define the total network-induced delay $\Delta$ as

$\Delta = \Delta_{network} + \left\lceil \frac{n_{loss}\,\Delta_{ssp}}{\Delta_{csp}} \right\rceil \Delta_{csp}$

where $\Delta_{csp}$ is the control sampling period, $n_{loss}$ is the number of consecutive message losses, and $\Delta_{ssp}$ is the sensing sampling period. The number of consecutive message losses is

$n_{loss} = \sum_{i=0}^{n} i\,(1 - dr)^{i}, \quad (1 - dr)^{i} \ge t$

Each network configuration corresponds to a different $\Delta$ estimate for different LSR values.

Figure: timeline (0.0 to 0.7) of the messages $M_0, \dots, M_5$ sent by the sensors and the messages used at the remote controller ($M_0, M_1, M_1, M_1, M_4, M_5$), with delays $\Delta_0 = 0.2$, $\Delta_1 = 0.2$, $\Delta_2 = 0.3$, $\Delta_3 = 0.4$, $\Delta_4 = 0.2$, $\Delta_5 = 0.2$.
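The delay formula on this slide can be checked against the figure values with a few lines of code. The sketch reads the formula as the network delay plus the number of lost sensing periods rounded up to whole control sampling periods; the ceiling is an assumption about the original notation. It also works in integer milliseconds to avoid floating-point rounding and takes the figure's sensing period to be 100 ms. The function names are illustrative.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def total_delay_ms(network_delay_ms, n_loss, ssp_ms, csp_ms):
    """Network delay plus the loss-induced delay, rounded up to whole
    control sampling periods (csp)."""
    return network_delay_ms + ceil_div(n_loss * ssp_ms, csp_ms) * csp_ms


# Figure values: 200 ms network delay, sensing and control periods of 100 ms.
# Zero, one, and two consecutive losses give 200, 300, and 400 ms.
figure_delays = [total_delay_ms(200, n, 100, 100) for n in (0, 1, 2)]

# With a longer control period (300 ms), one to three losses all cost the same.
plateau = [total_delay_ms(200, n, 100, 300) for n in (1, 2, 3)]
```

The second case shows how the rounding produces plateaus: several loss counts map to the same total delay until the next control sampling period is crossed.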

Static scheme is worse than the dynamic schemes. LSRI affects the performance of schemes that do not consider CL. Schemes considering CL are not affected by the LSRI values.

“Static” means the number of nodes in the network is fixed. We tested 20 to 50 nodes and chose the static scheme with minimum RMSE among these tests. Even the best static scheme is significantly worse than the dynamic schemes: it consumes the most network energy and has the highest RMSE, demonstrating that our reconfiguration schemes are necessary and perform better. When the LSRI increases, the RMSEs of the schemes DO, MICD and AC increase, because the estimation is less accurate at high LSRI values. As we can see in this figure, the yellow line is the real LSR and the black line is the estimate with an LSRI of 2 s, which tracks the yellow line better than the other lines. The CL-* schemes (CL-DO, CL-MICD and CL-AC) are not affected by the LSRI values, because they add additional nodes to make the network robust, but they consume more energy.

Sensitivity analysis of 𝛼 values

Online Results: AC vs CL-AC (LSRI=2s)
CL-* schemes add more nodes to the network when there are consecutive losses.

Interference Source Examples
An operator walks around with a mobile phone [Baccour TOSN’12]. A mobile robot connected with WiFi [Lin RTSS’09]. A mobile radio jammer [Wei FGCS’16]. Interference example: office building [Lin RTSS’09], >20% difference. PDR: packet reception ratio.

Reference: Towards Stable Network Performance in Wireless Sensor Networks [Lin RTSS’09]: “It is essential to differentiate between stable and unstable links, and give preference to stable links. We notice that these two types of links have different qualities over a long term period, e.g. tens of minutes. Therefore, we propose a new link metric competence to characterize the long term link quality. The competence metric can help choose those good and stable links for routing, and drop those currently good, but unstable ones. However, a system using only long term estimations would react too slowly to link quality changes. To react quickly and provide stable performance, we combine competence with current short term estimations in novel routing algorithm designs, selecting links that are good in both the short and the long term.”

Interference Source Examples
Microwave interference on IEEE 802.15.4 [Guo TIM’12]. PER: packet error ratio, 1-PDR. Reference: Impacts of 2.4-GHz ISM band interference on IEEE 802.15.4 wireless sensor network reliability in buildings.

Distributed Network Reconfiguration Algorithm
The primary node in each level decides how many nodes to activate or deactivate. Compared with the centralized algorithms in preliminary work 2:
More accurate: reconfiguration according to the average LSR estimation is not enough; local information will improve the detection and tolerance of space-correlated faults.
Low overhead (saves network bandwidth): no need to send LSR estimations to the remote controller periodically; no need to broadcast the new configuration to all the nodes in the network.

Motivation: observations
Set a deadline according to the application demands: a smaller deadline for reference functions with a shorter PCD or a more aggressive PCA. Schedule the network flows dynamically across layers. Delay has a more significant effect on the control system performance (PCD: 30 s; PCA: 10 MW).
