Growth Codes: Maximizing Sensor Network Data Persistence Abhinav Kamra, Vishal Misra, Dan Rubenstein Department of Computer Science, Columbia University.

Abhinav Kamra, Vishal Misra, Dan Rubenstein (Department of Computer Science, Columbia University); Jon Feldman (Google Labs). ACM SIGCOMM 2006.

Outline
- Problem Description
- Solution Approach: Growth Codes
- Experiments and Simulations
- Conclusions

Background: A generic sensor network
[Figure: sensor nodes holding sensed data x1 through x13; data follows a multi-hop path to the sink(s).]
- A few node failures can break the data flow.
- Generic aim: collect data from all nodes at the sink(s).

Data Persistence
We define the data persistence of a sensor network to be the fraction of data generated within the network that eventually reaches the sink. Focus of work: maximizing data persistence.

Specific Context: Disaster Scenarios
- e.g., monitoring earthquakes, fires, floods, war zones
- Problems in this setting: congestion near the sink(s)
  - All nodes simultaneously forward data
  - This overwhelms the sink(s)' capacity
[Figure: congestion near the sink, depicted as a virtual queue.]

Specific Context: Disaster Scenarios (2)
- Problems in this setting: network collapsing, with nodes failing rapidly
  - Pre-computed routes may fail
  - Data from failed nodes can be lost
- Data recovery from a subset of nodes is acceptable

Challenges
- Networking challenges:
  - Disaster scenarios: feedback is often infeasible
  - Frequent disruptions to the routing tree, if one is set up
  - Difficult to predict node failures: sink locations unknown, surviving routes unknown
  - Difficult to synchronize nodes' clocks
- Coding challenges:
  - Data source is distributed (among all sensor nodes)
  - Prior approaches (Turbo codes, LDPC codes) aim at fast complete recovery
  - Sensor nodes have very limited memory, CPU, and bandwidth

Objectives
- Maximize data persistence: the fraction of data that eventually reaches the sink(s)
- Preserve data from failed sensor nodes
- Deliver data to the sink(s) as fast as possible
[Figure: data persistence example; 6 of 10 symbols reach the sink, so persistence = 60%.]

Limitations of Previous Work
- Channel coding based (e.g., Turbo Codes [Anderson-ISIT94], LT Codes [Luby02])
  - Aim for complete recovery in minimum time
  - Difficult to implement with distributed sources
- Routing based (e.g., Directed Diffusion [Govindan00], Cougar [Yao-SIGMOD02])
  - Conjecture: too fragile (easily disrupted) for disaster scenarios

Our Approach
Two main ideas:
- Randomized routing and replication
  - Avoid actively maintaining routes
  - Replicate data to increase data survival
- Distributed channel codes (Growth Codes)
  - Expedite data delivery and survivability
  - First (to our knowledge) distributed channel codes

Outline
- Problem Description
- Our Solution: Growth Codes
- Experiments and Simulations
- Conclusions

Network Assumptions
- N-node sensor network
- Limited storage: each node stores a small number of data units
- Large storage at the sink(s): a sink receives codewords from random node(s)
- All sensed data assumed independent (no source coding)

Terminology
- Codewords
  - Linear (XOR) combinations of randomly selected groupings of data units
  - Either original data or XOR'd conglomerates of original data
  - XORing two codewords cancels the symbols they share, e.g., C = (A ⊕ B) ⊕ (A ⊕ B ⊕ C)
- Degree of a codeword
  - The number of symbols XOR'd together to form the codeword
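XOR is linear over bit strings, so combining two codewords cancels every symbol they have in common. The sketch below reproduces the slide's example C = (A ⊕ B) ⊕ (A ⊕ B ⊕ C). Representing a codeword as a (set of symbol names, XOR value) pair, and the 4-bit values, are illustrative assumptions and not the paper's data layout.

```python
from functools import reduce
from operator import xor

# A codeword is modeled as (frozenset of component symbol names, XOR of their values).
def make_codeword(values: dict, names: list) -> tuple:
    """Build a codeword of degree len(names) from the named symbols."""
    return frozenset(names), reduce(xor, (values[n] for n in names), 0)

def combine(c1: tuple, c2: tuple) -> tuple:
    """XOR two codewords: symbols present in both cancel (symmetric difference)."""
    (names1, v1), (names2, v2) = c1, c2
    return names1 ^ names2, v1 ^ v2

values = {"A": 0b1011, "B": 0b0110, "C": 0b1100}  # hypothetical 4-bit data units
ab = make_codeword(values, ["A", "B"])            # degree 2
abc = make_codeword(values, ["A", "B", "C"])      # degree 3
names, value = combine(ab, abc)
print(sorted(names), value == values["C"])        # ['C'] True
```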

Growth Codes
- The degree of a codeword "grows" with time
- At each point in time, a codeword of a specific degree has the most utility for a decoder (on average)
- This "most useful" degree grows monotonically with time
[Figure: R, the number of symbols the sink has decoded, versus time; thresholds R1 through R4 separate the regions where degree d = 1, 2, 3, 4 is most useful.]

Ideas of Proposed Method
- Growth Codes were designed for sensor networks in catastrophic or emergency scenarios.
- Goal: make each newly received encoded packet useful, i.e., decodable immediately.
- Avoid newly received encoded packets that are useless, i.e., that yield no newly decoded data.
http://www.powercam.cc/slide/284

Ideas of Proposed Method
- A received encoded packet of degree d is immediately useful if d - 1 of the data used to form it are already decoded/known.
- Example: already decoded data are x1, x2, x3, x5; the newly received packet is y4 = x3 ⊕ x5 ⊕ x6 (d = 3). Since d - 1 = 2 of its data (x3, x5) are already decoded, x6 = y4 ⊕ x3 ⊕ x5.

Ideas of Proposed Method
- A received encoded packet is useless if all d data used to form it are already known.
- Example: already decoded data are x1, x2, x3, x5; the newly received packet is y1 = x1 ⊕ x3 (d = 2). Both of its data are already decoded, so the new packet is useless.
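The two cases above can be checked mechanically by comparing a packet's components with the set of decoded data. In the sketch below, the third label ("stored for later", for packets with two or more unknowns) and the function names are assumptions added for illustration. The symbol values are hypothetical.

```python
from functools import reduce
from operator import xor

def classify(components: set, decoded: set) -> str:
    """Classify a received packet of degree d = len(components)."""
    unknown = components - decoded
    if not unknown:
        return "useless"                 # all d data already known
    if len(unknown) == 1:
        return "immediately decodable"   # exactly d - 1 data already known
    return "stored for later"            # two or more unknowns (assumed handling)

def decode_one(components: set, value: int, decoded_values: dict) -> tuple:
    """Recover the single unknown datum by XORing out the known ones."""
    unknown = [c for c in components if c not in decoded_values]
    if len(unknown) != 1:
        raise ValueError("packet is not immediately decodable")
    known = reduce(xor, (decoded_values[c] for c in components if c in decoded_values), 0)
    return unknown[0], value ^ known

true = {"x1": 5, "x2": 9, "x3": 12, "x5": 3, "x6": 7}  # hypothetical values
decoded = {k: true[k] for k in ("x1", "x2", "x3", "x5")}
y4 = ({"x3", "x5", "x6"}, true["x3"] ^ true["x5"] ^ true["x6"])
y1 = ({"x1", "x3"}, true["x1"] ^ true["x3"])
print(classify(y4[0], set(decoded)))  # immediately decodable
print(classify(y1[0], set(decoded)))  # useless
print(decode_one(*y4, decoded))       # ('x6', 7)
```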

Ideas of Proposed Method
- Consider the degree of an encoded packet when the decoder has already decoded r original data.
- The probability that a newly received encoded packet is immediately decodable by the decoder depends on both its degree and r.
[Figure: importance of an immediately decodable packet versus the number of decoded original data r, for low-degree and high-degree packets.]
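The probability formula itself did not survive on this slide. Here is one way to derive it, assuming a degree-d packet XORs d distinct data chosen uniformly at random from N. Under that assumption, the packet is immediately decodable exactly when d - 1 of its data are among the r decoded ones and the last one is not: P(d, r) = C(r, d - 1)(N - r) / C(N, d). Comparing consecutive degrees gives P(d + 1, r) ≥ P(d, r) iff r ≥ (dN - 1)/(d + 1), so the most useful degree grows with r. The function names are hypothetical.

```python
from math import comb

def p_decodable(d: int, r: int, n: int) -> float:
    """P(random degree-d packet is immediately decodable | r of n data decoded).

    Assumes the d data in the packet are distinct and chosen uniformly at random.
    """
    if d < 1 or d > n:
        return 0.0
    return comb(r, d - 1) * (n - r) / comb(n, d)

def best_degree(r: int, n: int) -> int:
    """Degree maximizing p_decodable for the current r (ties go to the lower degree)."""
    return max(range(1, n + 1), key=lambda d: (p_decodable(d, r, n), -d))

def threshold(d: int, n: int) -> float:
    """r at which degree d + 1 becomes at least as useful as degree d."""
    return (d * n - 1) / (d + 1)

print([best_degree(r, 100) for r in (10, 49, 50, 70)])  # [1, 1, 2, 3]
```

With N = 100 the first two thresholds are 49.5 and about 66.3, which matches the switch from degree 1 to 2 at r = 50 and to degree 3 by r = 70.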

Figure 1: Localized view of the network (nodes 1, 2, 3, 8). In the beginning, nodes 1 and 3 exchange degree 1 codewords (x1, x3). Later on, node 1 is destroyed, but symbol x1 survives in the network (e.g., in codeword x1 ⊕ x4), and the nodes are now exchanging degree 2 codewords such as x4 ⊕ x3, x2 ⊕ x8, and x8 ⊕ x7. In general, the nodes exchange degree 1 codewords in the beginning and gradually increase the degree over time; even when a node fails, its data survives in another node's storage.

Figure 2: Growth Codes in action: the sink receives low-degree codewords in the beginning and progressively higher-degree codewords later on.

Growth Codes: Encoding
- The thresholds R_i describe what the sink has decoded; what about encoding?
- To decode R_i symbols, the sink needs to receive some K_i codewords, sampled uniformly
- Sensor nodes estimate K_i and switch degrees accordingly
- Optimal transition points are a function of N, the size of the network
- Exact value of K_1 computed; upper bounds for K_i, i > 1, computed
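Here is one way to estimate a quantity like K_1 with a coupon-collector argument. Assume, for illustration, that each degree-1 codeword reaching the sink carries a symbol drawn uniformly at random (with replacement) from the N symbols. Then the expected number of codewords needed to see r distinct symbols is the sum over j = 0 … r - 1 of N/(N - j). This sketch is not presented as the paper's exact derivation, and the function name is hypothetical. With r = N it reduces to N·H_N.

```python
def expected_codewords(n: int, r: int) -> float:
    """Expected number of uniformly sampled degree-1 codewords needed before the
    sink holds r distinct symbols out of n (coupon-collector partial sum)."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    # With j distinct symbols already seen, the next codeword is new with
    # probability (n - j) / n, so it takes n / (n - j) codewords on average.
    return sum(n / (n - j) for j in range(r))

print(round(expected_codewords(1000, 500), 1))  # about 692.6
```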

Implementation of Growth Codes
- Time divided into rounds
- Each node exchanges degree 1 codewords with a random neighbor until round K_1
- Between rounds K_{i-1} and K_i, nodes exchange degree i codewords
- The sink receives codewords as they get exchanged in the network
[Figure: Growth Code degree distribution at time k.]
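The round-based schedule above maps a time to a codeword degree by finding how many transition points have already passed. The sketch assumes a node knows its own increasing list [K_1, K_2, …]. The numbers in the example are hypothetical and are not values from the paper.

```python
from bisect import bisect_right

def degree_at(t: float, transitions: list) -> int:
    """Degree a node uses at time t, given its increasing transition times [K_1, K_2, ...].

    Degree 1 before K_1; degree i between K_{i-1} and K_i; after the last listed
    transition the degree stays at len(transitions) + 1.
    """
    return bisect_right(transitions, t) + 1

K = [500, 700, 800]  # hypothetical transition rounds for some network size N
print([degree_at(t, K) for t in (0, 499, 500, 750, 10_000)])  # [1, 1, 2, 3, 4]
```

Because each node evaluates this against its own clock, the transitions need not happen simultaneously across nodes.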

High Level View of the Protocol
[Figure: nodes 1 to 4 exchanging data.]
- Nodes send data at random times (current implementation: exponentially distributed timers)

High Level View of the Protocol (2)
- After time K_1, nodes start sending degree 2 codewords
- To build a degree 2 codeword, the sender picks a random stored symbol (from the degree 1 codewords it holds) and XORs it with its own symbol
- Even if node 3 fails, node 3's data survives
[Figure: nodes 1 to 4, with a timeline marking 0, K_1, K_2, K_3.]
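The degree 2 construction above can be sketched as follows. The degree 2 case follows the slide: the node's own symbol is XORed with one random stored symbol. The generalization to degree d (own symbol plus d - 1 distinct random stored symbols) and the fallback to a lower degree when storage is small are assumptions. The function name and data are hypothetical.

```python
import random
from functools import reduce
from operator import xor

def node_codeword(own_id: int, own_value: int, storage: dict, degree: int,
                  rng: random.Random) -> tuple:
    """Return (frozenset of symbol ids, XOR value) for a codeword of the requested degree.

    Assumed rule: always include the node's own symbol, then XOR in degree - 1
    distinct symbols picked at random from storage (excluding its own).
    """
    others = [sid for sid in storage if sid != own_id]
    k = min(degree - 1, len(others))  # storage may hold fewer symbols than requested
    chosen = rng.sample(others, k)
    ids = frozenset([own_id, *chosen])
    value = reduce(xor, (storage[sid] for sid in chosen), own_value)
    return ids, value

# Node 3 holds its own symbol plus x1 and x8 received from neighbours (hypothetical data).
storage = {3: 0b1010, 1: 0b0001, 8: 0b0110}
ids, value = node_codeword(3, storage[3], storage, degree=2, rng=random.Random(0))
print(sorted(ids), bin(value))
```

Any neighbour that stores the resulting codeword keeps a copy of x3 (in XORed form), which is why node 3's data can outlive node 3.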

High Level View of the Protocol (3)
- After time K_1, nodes start sending degree 2 codewords
- After time K_2, nodes start sending degree 3 codewords
- After time K_i, nodes start sending degree i+1 codewords (times K_i can be out of sync at different nodes)
- Note: no need to tightly synchronize clocks

The Intuition behind Growth Codes
[Figure: the set of symbols decoded at the sink and incoming codewords over time.]
- When very few symbols have been decoded, low-degree codewords are easy to decode

The Intuition behind Growth Codes (2)
- When a significant number of symbols have been decoded, low-degree codewords are often redundant
- Higher-degree codewords are more likely to be useful
[Figure: the set of symbols decoded at the sink and incoming codewords.]

Outline
- Problem Description
- Growth Codes
- Simulations and Experiments
- Conclusions

Simulations/Experiments: Compare data persistence of various approaches
1. Simulations:
  - Centralized setting: compare GC with other channel coding schemes
  - Distributed simulation: assess large-scale performance of coding vs. no coding
2. Experiments on motes:
  - Compare time of complete recovery for GC vs. routing
  - Measure resilience to node failures

Centralized Simulation (to compare with other channel coding schemes, for which only centralized versions exist)
- Single source, single sink
- Source generates random codewords according to the coding scheme (GC, Soliton)
- Zero failure rate
[Figure: comparison with various coding schemes (N = 1500), one source and one sink.]
- No coding is fast in the beginning; its slowdown is explained by the Coupon Collector's problem
- Soliton/R-Soliton: poor partial recovery (reason: high-degree codewords sent too early)
- Growth Codes come closest to the theoretical upper bound (reason: right degree at the right time)

Growth Codes vs. No Coding (Varying N)
Distributed Simulation (to assess the performance gain of coding)
- N sources, single sink
- Random graph topology (average degree 10)
- Sink receives 1 codeword per time unit
- Complete recovery takes:
  - O(N log N) time without coding (Coupon Collector's effect)
  - Linear time with Growth Codes
- Soliton/R-Soliton: cannot be compared in a distributed setup
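The O(N log N) cost without coding can be seen in an idealized model. In this model the sink receives one uncoded symbol per time unit, chosen uniformly at random from the N symbols, and topology and failures are ignored. This is an assumption made for illustration, not the distributed simulator used on this slide. The expected time is N·H_N ≈ N ln N, so time per symbol keeps growing with N.

```python
import random

def rounds_without_coding(n: int, rng: random.Random) -> int:
    """Time units until the sink has seen all n symbols when each unit delivers
    one uncoded symbol chosen uniformly at random (coupon collector)."""
    seen = set()
    t = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        t += 1
    return t

def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))

rng = random.Random(1)
for n in (100, 1000):
    avg = sum(rounds_without_coding(n, rng) for _ in range(100)) / 100
    # Compare the simulated mean with the coupon-collector expectation n * H_n.
    print(f"N={n}: simulated {avg:.0f}, expected {n * harmonic(n):.0f}, per symbol {avg / n:.1f}")
```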

Recovery Rate: Without coding, a lot of data is lost during the disaster, even when randomized replication is used.

Effect of Topology: 500 nodes placed at random in a 1 × 1 square; nodes are connected if they are within a distance of 0.3 of each other. R: the radius of the network.
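The topology described above is a random geometric graph. Below is a minimal sketch of building one with the stated parameters. The seed, the use of Euclidean distance with "within 0.3" read as ≤ 0.3, and the function name are assumptions. The average degree printed is whatever this sample produces and is not a figure from the paper.

```python
import math
import random

def random_geometric_graph(n: int, radius: float, rng: random.Random):
    """Place n nodes uniformly in the unit square and connect pairs within `radius`."""
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return pos, adj

pos, adj = random_geometric_graph(500, 0.3, random.Random(42))
avg_degree = sum(len(v) for v in adj.values()) / len(adj)
print(f"average degree: {avg_degree:.1f}")
```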

Resilience to Random Failures: 500-node random topology network. Each second, each node fails with a probability of 0.0005 (about 1 failure every 4 seconds in the beginning).
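The "1 every 4 seconds" figure follows from the failure probability. The derivation below assumes each surviving node fails independently in each second with probability 0.0005.

```latex
\begin{aligned}
\mathbb{E}[\text{failures in the first second}] &= 500 \times 0.0005 = 0.25
  \;\Rightarrow\; \text{about one failure every } 1/0.25 = 4\ \text{s},\\
\mathbb{E}[\text{nodes alive after } t \text{ seconds}] &= 500\,(1 - 0.0005)^{t}.
\end{aligned}
```

For example, after 1000 s roughly 500 × 0.9995^1000 ≈ 303 nodes are expected to remain. The failure rate therefore slows over time, which is why the 4-second figure applies only in the beginning.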

Experiments with Motes
- Crossbow micaz
- 2.4 GHz IEEE 802.15.4
- 250 kbps high data rate radio

Experiments with (micaz) motes (to measure data persistence over time)
- GC vs. TinyOS's "MultiHop" routing protocol
- No routing state at time 0 (scenario where sensor nodes are deployed rapidly)
- "MultiHop" for persistence: takes a long time to complete route setup
- Comparison with the GC simulator validates the simulator's performance
[Figure: experimental topology with sink S.]

Motes experiments: Resilience to node failures
- Nodes generate data every 300 seconds
- 3 nodes fail just after the 3rd data generation
[Figure: timeline at 0, 300, 600, 900 s showing when nodes generate data, "MultiHop" sets up routing, "MultiHop" repairs routes, nodes send data to the sink, and 3 random nodes fail; experimental topology with sink S.]

Motes experiments: Resilience to node failures
- 1st generation: GC is faster; MH takes time to set up routes
- 2nd generation: routing is already set up, so MH is very fast
- 3rd generation: MH needs to repair routes
[Figure: timeline at 0, 300, 600, 900 s showing data generation, "MultiHop" route setup and repair, data delivery to the sink, and the failure of 3 random nodes.]

Conclusions
- Data persistence in sensor networks:
  - First distributed channel codes (GC)
  - Protocol requires minimal configuration and is robust to node failures
- Simulations and experiments on micaz motes show:
  - GC achieves complete recovery faster
  - GC recovers more partial data at any time

Iterative Decoding
[Figure: 5 original symbols x1 … x5 and 4 received codewords, each the XOR of its component original symbols; iterative decoding separates recovered symbols from unused codewords.]

Online Decoding at the Sink
[Figure: the sink holds recovered symbols x1, x6, x3 and the undecoded codeword x2 ⊕ x5. When the new codeword x2 ⊕ x6 arrives, the sink recovers x2 = x6 ⊕ (x2 ⊕ x6), which in turn yields x5 = x2 ⊕ (x2 ⊕ x5).]
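The online decoding step above can be sketched as a peeling decoder. Each arriving codeword has its already recovered symbols XORed out. If exactly one unknown remains, that symbol is recovered, and every stored codeword containing it is re-examined, so recoveries can cascade. If nothing remains, the codeword is redundant. The class and method names are hypothetical, and this is an illustration rather than the authors' sink implementation.

```python
class OnlineDecoder:
    """Online peeling decoder for XOR codewords (illustrative sketch)."""

    def __init__(self) -> None:
        self.recovered: dict = {}  # symbol name -> decoded value
        self.pending: list = []    # undecoded codewords: (set of unknown names, value)

    def _reduce(self, comps: set, value: int) -> tuple:
        """XOR out every already-recovered symbol; return remaining unknowns and value."""
        unknown = {s for s in comps if s not in self.recovered}
        for s in comps - unknown:
            value ^= self.recovered[s]
        return unknown, value

    def receive(self, components, value: int) -> None:
        work = [(set(components), value)]
        while work:
            comps, val = self._reduce(*work.pop())
            if not comps:
                continue                           # redundant: everything already known
            if len(comps) > 1:
                self.pending.append((comps, val))  # keep until more symbols are known
                continue
            (sym,) = comps
            self.recovered[sym] = val
            # Re-examine stored codewords that contain the newly recovered symbol.
            retry = [(c, v) for c, v in self.pending if sym in c]
            self.pending = [(c, v) for c, v in self.pending if sym not in c]
            work.extend(retry)

true = {"x1": 4, "x2": 11, "x3": 6, "x5": 9, "x6": 13}  # hypothetical symbol values
sink = OnlineDecoder()
for s in ("x1", "x6", "x3"):
    sink.receive({s}, true[s])
sink.receive({"x2", "x5"}, true["x2"] ^ true["x5"])  # stays undecoded
sink.receive({"x2", "x6"}, true["x2"] ^ true["x6"])  # yields x2, then x5
print(sink.recovered == true, sink.pending)          # True []
```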
