
1 Growth Codes: Maximizing Sensor Network Data Persistence
Vishal Misra (DNA Research Group)
Joint work with Abhinav Kamra, Jon Feldman (Google) and Dan Rubenstein

2 MSR Cambridge, 7/13/06: A generic sensor network
[Figure: sensor nodes and sink(s); sensed data follows a multi-hop path to the sink(s).]

3 An abstract channel
[Figure: sensor nodes and sinks. A node on the route to a sink fails and communication dies.]
- Nodes fail
- Erasure channel: need some reliability mechanism

4 Data Persistence
We define the data persistence of a sensor network to be the fraction of data generated within the network that eventually reaches the sink.
Focus of work: maximizing data persistence

5 Specific Context
- Sensor networks in a disaster setting
  - Monitoring earthquakes, fires, floods, etc.
- Network might get destroyed before delivering data
- Disaster event might cause spikes in sensed data: congestion near sinks
- Partial recovery of data is also useful

6 Increasing persistence
- Open-loop approach (coding)
  - Apply channel codes to recover from errors
- Closed-loop approach (networking)
  - Employ feedback to retransmit lost data
  - Exploit topology awareness to route along surviving paths

7 Traditional approaches
- Coding: erasure codes
  - Gallager codes [1962], rediscovered as LDPC
  - RS codes [1960, Reed and Solomon]
  - Tornado codes [1997, Luby et al.]
  - Luby Transform codes [1998, Luby] (we come back to them later)
  - Raptor codes [2001, Shokrollahi]
- Networking: reliable transport protocols for sensor networks
  - PSFQ [2002, Wan et al.]
  - RMST [2003, Stann et al.]
  - ESRT [2003, Akyildiz et al.]

8 Why our problem is different (coding perspective)
- Traditional approaches implement single-source channel coding
  - Our data source is distributed
- Traditional approaches aim at full recovery from errors (erasures)
  - In sensor networks, partial recovery is useful and important

9 Why our problem is different (networking perspective)
- In disaster scenarios we need quick delivery of data
  - Feedback often infeasible
  - Often, no time to set up routing trees
  - Approach should employ minimal configuration
- Difficult to predict which nodes will survive
  - Sinks might get destroyed; location of sinks unknown
  - Surviving routes unknown
- Feedback-based approaches may not scale
- Sensor nodes have limited resources to implement complex functionality

10 Our Approach
- Two main ideas
  - Randomized routing and replication
    - Push data in random directions to ensure survival
  - Distributed channel codes that optimize data delivery (Growth Codes)
    - Based on LDPC erasure codes

11 Solution Features
- Data replication (for persistence)
- Explicit routing not required
  - Can employ it if present
- No feedback from sink necessary
- Partial data recovery
- Completely distributed

12 First Idea: Random Replication
- Nodes exchange sensed data with random neighbors
- The process iterates and sensed data is copied across the network
  - Sensed data goes on a "random walk" through the network
- Process is robust to localized failures
- Can be thought of as a replication code
Replication codes are naïve: can we do better?

13 Brief Segue: Digital Fountain
- Digital Fountain:
  - Source splits message into smaller data symbols
  - Data symbols are encoded into codewords
  - Potentially infinitely many unique codewords
  - Clients can decode the original data from sufficiently many unique codewords
- Low-overhead, erasure-resistant channel codes

14 Luby Transform (LT) Codes
- Rateless erasure codes
- LT codes are universal in the sense that they
  - are near optimal for every erasure channel
  - are very efficient as the data length grows

15 Erasure Codes: LT-Codes
[Figure: file F split into n = 5 input blocks b_1, ..., b_5.]

16 LT-Codes: Encoding
[Figure: input blocks b_1, ..., b_5 of F and the first codeword c_1 of E(F).]
1. Pick degree d_1 from a pre-specified distribution (here d_1 = 2).
2. Select d_1 input blocks uniformly at random (here b_1 and b_4).
3. Compute their sum (XOR).
4. Output the sum and the block IDs.
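
A minimal Python sketch of the four encoding steps on this slide. The function name `lt_encode`, the integer block values, and the toy degree distribution `toy_dist` are illustrative assumptions, not taken from the talk; a practical LT encoder would draw degrees from a Robust Soliton distribution instead.

```python
import random
from functools import reduce

def lt_encode(blocks, degree_dist, rng):
    """Return one codeword as (XOR of the chosen blocks, tuple of chosen block IDs)."""
    degrees = list(degree_dist.keys())
    weights = list(degree_dist.values())
    d = rng.choices(degrees, weights=weights, k=1)[0]        # 1. pick a degree
    ids = sorted(rng.sample(range(len(blocks)), d))          # 2. pick d distinct blocks
    value = reduce(lambda a, b: a ^ b, (blocks[i] for i in ids))  # 3. XOR them
    return value, tuple(ids)                                 # 4. output sum and IDs

# Hypothetical input: n = 5 blocks, each a small integer payload.
blocks = [0x11, 0x22, 0x33, 0x44, 0x55]
toy_dist = {1: 0.3, 2: 0.5, 3: 0.2}   # assumed distribution, for illustration only
rng = random.Random(0)
codewords = [lt_encode(blocks, toy_dist, rng) for _ in range(7)]
```

Because the encoder can be called any number of times, the code is rateless: the source keeps emitting codewords until the receiver has enough.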

17 LT-Codes: Encoding
[Figure: F = b_1, ..., b_5 encoded into E(F) = c_1, ..., c_7.]

18 LT-Codes: Decoding
[Animation: the receiver iteratively recovers input blocks b_1, ..., b_5 from codewords c_1, ..., c_7.]
Key to efficiency: the right degree distribution
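
The iterative decoding shown in the animation can be sketched as a "peeling" decoder: find a codeword with exactly one unknown block, recover that block, substitute it into the remaining codewords, and repeat. The function name `peel_decode` and the example codewords are assumptions made for illustration.

```python
def peel_decode(codewords):
    """Peeling decoder. codewords: iterable of (value, block_ids).
    Returns a dict {block_id: value} of the blocks that could be recovered."""
    recovered = {}
    pending = [(value, set(ids)) for value, ids in codewords]
    progress = True
    while progress:
        progress = False
        remaining = []
        for value, ids in pending:
            # Remove blocks we already know from this codeword.
            for i in list(ids):
                if i in recovered:
                    value ^= recovered[i]
                    ids.discard(i)
            if len(ids) == 1:            # exactly one unknown: it is now decoded
                (i,) = ids
                recovered[i] = value
                progress = True
            elif len(ids) > 1:           # still ambiguous: keep for a later pass
                remaining.append((value, ids))
            # len(ids) == 0: codeword was redundant, drop it
        pending = remaining
    return recovered

# Hypothetical example with 5 blocks (IDs 0..4).
b = [1, 2, 4, 8, 16]
received = [
    (b[4], (4,)),
    (b[1] ^ b[4], (1, 4)),
    (b[0] ^ b[1], (0, 1)),
    (b[0] ^ b[2] ^ b[3], (0, 2, 3)),
    (b[3] ^ b[4], (3, 4)),
]
decoded = peel_decode(received)
```

If no degree-1 codeword is ever available, the loop stops early and returns only a partial result, which is why the degree distribution matters so much.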

19 Degree Distribution for LT-Codes
- Soliton distribution:
  - Average degree H(N) ~ ln(N)
  - In expectation, exactly one degree-1 symbol in each round of decoding
  - Distribution very fragile in practice; fixed with the Robust Soliton
- A soliton wave is one where dispersion balances refraction perfectly.
- Soliton distribution: input symbols are added to the ripple at the same rate as they are processed
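
As a quick check of the "average degree H(N)" claim, here is a small sketch of the ideal Soliton distribution, rho(1) = 1/N and rho(d) = 1/(d(d-1)) for d = 2..N. The helper names are assumptions; the Robust Soliton correction is deliberately left out.

```python
def ideal_soliton(n):
    """Ideal Soliton distribution over degrees 1..n as a dict {degree: probability}."""
    dist = {1: 1.0 / n}
    for d in range(2, n + 1):
        dist[d] = 1.0 / (d * (d - 1))
    return dist

def harmonic(n):
    """n-th harmonic number H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

rho = ideal_soliton(1500)
total_probability = sum(rho.values())
mean_degree = sum(d * p for d, p in rho.items())
```

The mean works out to 1/N + (1 + 1/2 + ... + 1/(N-1)) = H(N), which grows like ln(N).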

20 Thought: Sensor Digital Fountain?
[Figure: sensor nodes and sinks; information survives losses.]

21 LT codes for sensor networks?
- Sensed data could be the data units, but...
- How do we achieve a given degree distribution?
  - LT codes are designed for centralized sources
  - Sensor networks have distributed data sources
- As a thought experiment, assume that we can magically implement distributed LT codes

22 Perfect Source Simulation: Sampling ideal distributions (N = 1500)
[Plot annotations:]
- Initially, no coding does better than Robust Soliton!
- Robust Soliton improves as more codewords are received

23 Toy problem
- Suppose a sink could ask for a codeword of a chosen degree, with its symbols still chosen at random. Which degree would be the most useful?
- A: Time dependent!

24 Coupon Collector's Problem
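
For reference, a small sketch of the classical coupon collector calculation, assuming the sink receives uncoded symbols drawn uniformly at random with replacement. The function name `expected_draws` is an assumption.

```python
def expected_draws(n, r):
    """Expected number of uniform draws (with replacement) needed to see
    r distinct items out of n. Once j items are known, a draw is new with
    probability (n - j) / n, so it takes n / (n - j) draws on average."""
    return sum(n / (n - j) for j in range(r))

n = 1500
first_half = expected_draws(n, n // 2)   # cost of the first 750 distinct symbols
everything = expected_draws(n, n)        # cost of all 1500, equal to n * H(n)
```

The first half of the symbols arrives cheaply, while the last few dominate the total. That is the motivation for switching to higher-degree codewords later on.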

25 Growth Codes
- Degree of a codeword "grows" with time
- At each point in time, a codeword of a specific degree has the most utility for a decoder (on average)
- This "most useful" degree grows monotonically with time
- R: number of decoded symbols the sink has
[Figure: time axis marked with thresholds R_1, R_2, R_3, R_4 and the corresponding degrees d = 1, 2, 3, 4.]
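
One way to see why the most useful degree grows is a simple counting argument, assumed here rather than quoted from the slides. Suppose a degree-d codeword XORs d symbols chosen uniformly from N, and counts as useful when exactly d-1 of them are already recovered. With r symbols recovered, that happens with probability C(r, d-1)(N-r)/C(N, d). The function names below are hypothetical.

```python
from math import comb

def p_decodable(n, r, d):
    """Probability that a random degree-d codeword reveals exactly one new
    symbol when r of the n symbols are already recovered."""
    return comb(r, d - 1) * (n - r) / comb(n, d)

def best_degree(n, r):
    """Degree with the highest chance of being immediately decodable."""
    return max(range(1, n + 1), key=lambda d: p_decodable(n, r, d))

n = 100
schedule = [best_degree(n, r) for r in range(n)]   # best degree for r = 0 .. n-1
```

Under this model degree 1 is best while r is at most (N-1)/2, after which degree 2 takes over, and the best degree never decreases as r grows.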

26 Growth Codes: Encoding
- R_i is what the sink has received
- What about encoding?
- To decode R_i symbols, the sink needs to receive some K_i codewords, sampled uniformly
- Sensor nodes estimate K_i and transition accordingly
- Optimal transition points are a function of N, the size of the network
- Exact value of K_1 computed; upper bounds for K_i, i > 1, computed

27 Distributed Implementation of Growth Codes
- Time divided into rounds
- Each node exchanges degree-1 codewords with a random neighbor until round K_1
- Between rounds K_{i-1} and K_i, nodes exchange degree-i codewords
- Sink receives codewords as they get exchanged in the network
- Growth Code degree distribution at time k: $\pi(k) := \{\pi_i\}$, where $\pi_i = \max\left(0, \min\left(\frac{K_i - K_{i-1}}{k}, \frac{k - K_{i-1}}{k}\right)\right)$
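
A short sketch that evaluates the degree distribution formula from this slide. The transition points in `K` are made-up numbers for illustration, and K_0 = 0 is an assumption needed to make the i = 1 term well defined.

```python
def growth_degree_distribution(k, K):
    """Fraction of degree-i codewords among the first k codewords.
    K is the increasing list [K_1, K_2, ...] of transition points; k >= 1."""
    pis = []
    prev = 0  # K_0 = 0 (assumption)
    for K_i in K:
        pis.append(max(0.0, min((K_i - prev) / k, (k - prev) / k)))
        prev = K_i
    return pis

K = [100, 150, 180]                       # hypothetical transition points
early = growth_degree_distribution(50, K)   # before K_1: only degree 1
later = growth_degree_distribution(120, K)  # between K_1 and K_2: degrees 1 and 2
```

For any k up to the last transition point the fractions sum to 1: degrees below the current one contribute their full interval, the current degree contributes the part of its interval elapsed so far, and higher degrees contribute nothing.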

28 Sensor Network Model
- N-node sensor network
- Limited storage at each sensor node
- Large storage at sink
- All sensed data assumed independent
- Do not consider source coding
[Figure: example network of numbered nodes and a sink; each node stores a few symbols x_i.]

29 High Level View of the Protocol
- In the beginning: nodes 1 and 3 exchanging codewords
  [Figure: nodes 1, 2, 3 and 8; copies of x_1 and x_3 spread between them.]
- Later on: node 1 is destroyed, but symbol x_1 survives in the network. Nodes are now exchanging degree-2 codewords
  [Figure: surviving nodes hold codewords such as x_4⊕x_3, x_8⊕x_7, x_1⊕x_4, x_2⊕x_8, x_6⊕x_3 and x_4⊕x_5, alongside x_3 and x_8.]

30 Iterative Decoding
- 5 original symbols x_1 ... x_5
- 4 codewords received
- Each codeword is the XOR of its component original symbols
[Figure: received codewords, recovered symbols, and unused codewords.]

31 Online Decoding at the Sink
[Figure, before: recovered symbols x_1, x_3, x_6; undecoded codeword x_2⊕x_5; new codeword x_2⊕x_6 arrives at the sink.]
[Figure, after: x_2 = x_6 ⊕ (x_2⊕x_6), then x_5 = x_2 ⊕ (x_2⊕x_5); recovered symbols now include x_2 and x_5.]
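
The example on this slide can be replayed with a small online decoder. This is a sketch under assumed names (`Sink`, `receive`) and made-up symbol values; it keeps undecodable codewords around and revisits them whenever a new symbol is recovered.

```python
class Sink:
    def __init__(self):
        self.recovered = {}   # symbol id -> value
        self.undecoded = []   # list of (value, set of symbol ids)

    def receive(self, value, ids):
        """Process one incoming codeword, decoding as much as possible."""
        queue = [(value, set(ids))]
        while queue:
            value, ids = queue.pop()
            # Cancel out every symbol that is already known.
            for i in [i for i in ids if i in self.recovered]:
                value ^= self.recovered[i]
                ids.remove(i)
            if len(ids) == 1:
                (i,) = ids
                self.recovered[i] = value
                # Re-examine stored codewords that contain the new symbol.
                keep = []
                for cw in self.undecoded:
                    (queue if i in cw[1] else keep).append(cw)
                self.undecoded = keep
            elif len(ids) > 1:
                self.undecoded.append((value, ids))

# Hypothetical symbol values; the IDs follow the slide.
x = {1: 0xA1, 2: 0xB2, 3: 0xC3, 5: 0xD5, 6: 0xE6}
sink = Sink()
sink.recovered = {1: x[1], 3: x[3], 6: x[6]}
sink.undecoded = [(x[2] ^ x[5], {2, 5})]
sink.receive(x[2] ^ x[6], [2, 6])   # new codeword x_2 XOR x_6
```

After the call, x_2 has been recovered from the new codeword, which in turn unlocks x_5 from the stored one.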

32 Revisiting earlier simulation (N = 1500)

33 Time to recover all data
Phase transition in obtaining the last few data units (coupon collector's problem)

34 Recovery Rate
Without coding, a lot of data is lost during the disaster even when using randomized replication

35 Effect of Topology
500 nodes placed at random in a 1x1 square; nodes are connected if within a distance of 0.3
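
A topology like the one described here can be generated as follows. This is only a sketch: the function name, the seed, and the use of uniform placement are assumptions, and the simulator used for the talk is not shown in the slides.

```python
import math
import random

def random_geometric_graph(n, radius, rng):
    """Place n nodes uniformly in the unit square and connect every pair
    whose Euclidean distance is at most `radius`."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= radius:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return pts, nbrs

pts, nbrs = random_geometric_graph(500, 0.3, random.Random(1))
mean_degree = sum(len(v) for v in nbrs.values()) / len(nbrs)
```

With a radius of 0.3 each node has a large neighborhood, so the graph is dense compared with a typical multi-hop deployment.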

36 Resilience to Random Failures
500-node random-topology network. Each node fails every second with a probability of 0.0005 (1 failure every 4 seconds in the beginning)
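
The "1 every 4 seconds" figure follows from the per-node failure probability. A small sketch of the arithmetic, assuming nodes fail independently of each other; the function name `expected_alive` is an assumption.

```python
def expected_alive(n, p, t):
    """Expected number of nodes still alive after t seconds when each node
    independently fails with probability p in every second."""
    return n * (1 - p) ** t

n, p = 500, 0.0005
failures_per_second = n * p                      # about 0.25 at the start
seconds_per_failure = 1 / failures_per_second    # about 4 seconds
alive_after_10_min = expected_alive(n, p, 600)
```

The failure rate slows over time because fewer nodes remain to fail, which is why the slide qualifies the figure with "in the beginning".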

37 Experiments with Motes
- Crossbow MICAz
- 2.4 GHz IEEE 802.15.4
- 250 kbps high-data-rate radio

38 Motes experiment

39 Motes experiment: continued

40 Conclusions
- Developed distributed channel codes to maximize data persistence in (sensor) networks
  - First (to our knowledge) time-varying LDPC codes
  - Proved optimality of Growth Codes
- Protocol requires minimal configuration (only a rough estimate of network size needed)
- Tested system with simulations and an implementation on mica motes
- More information: http://dna-wsl.cs.columbia.edu (tech report available; paper appearing in SIGCOMM 2006)

