Network Coding Network coding is useful

Increasing network capacity → existing focus. Improving data persistence and robustness → important new direction.

Growth Codes: Maximizing Sensor Network Data Persistence
Abhinav Kamra, Vishal Misra, Jon Feldman, and Dan Rubenstein

A generic sensor network
Sensor nodes produce sensed data. Data follows a multi-hop path to the sink(s).

An abstract channel
Nodes fail. When a node on the route to the sink fails, communication dies. The path between sensor nodes and sinks therefore behaves like an erasure channel: some reliability mechanism is needed.

Data Persistence
We define the data persistence of a sensor network to be the fraction of data generated within the network that eventually reaches the sink. Focus of this work: maximizing data persistence.

Specific Context
Sensor networks in a disaster setting: monitoring earthquakes, fires, floods, etc. The network might get destroyed before delivering data. The disaster event might cause spikes in sensed data, leading to congestion near sinks. Partial recovery of data is also useful.

Increasing persistence
Closed-loop approach (networking): employ feedback to retransmit lost data; exploit topology awareness to route along surviving paths.

Why our problem is different (networking perspective)
Disaster scenarios need quick delivery of data. Feedback is often infeasible. Often there is no time to set up routing trees, so the approach should employ minimal configuration. It is difficult to predict which nodes will survive. Sinks might get destroyed, so the location of sinks is unknown and surviving routes are unknown. Feedback-based approaches may not scale. Sensor nodes have limited resources to implement complex functionality.

Increasing persistence
Closed-loop approach (networking): employ feedback to retransmit lost data; exploit topology awareness to route along surviving paths. Open-loop approach (coding): apply channel codes to recover from errors.

Traditional approaches
Coding (erasure codes): Gallager codes [1962], rediscovered as LDPC; RS codes [1960, Reed and Solomon]; Tornado codes [1997, Luby et al.]; Luby Transform codes [1998, Luby] (we come back to them later); Raptor codes [2001, Shokrollahi]. Networking (reliable transport protocols for sensor networks): PSFQ [2002, Wan et al.]; RMST [2003, Stann et al.]; ESRT [2003, Akyildiz et al.].

Why our problem is different (coding perspective)
Traditional approaches implement single-source channel coding; our data source is distributed. Traditional approaches aim at full recovery from errors (erasures); in sensor networks, partial recovery is useful and important.

Our Approach
Two main ideas. (1) Randomized routing and replication: push data in random directions to ensure survival. (2) Distributed channel codes that optimize data delivery (Growth Codes), based on LDPC erasure codes.

Solution Features
Data replication (for persistence). Explicit routing is not required, but can be employed if present. No feedback from the sink is necessary. Partial data recovery. Completely distributed.

First Idea: Random Replication
Nodes transfer sensed data with random neighbors. The process iterates, and sensed data is copied across the network. Sensed data goes on a “random walk” through the network. The process is robust to localized failures. It can be thought of as a replication code. Naïve: can we do better?
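The replication process above can be sketched as a small simulation. This is an illustrative sketch only, not the authors' protocol: the function name `replicate`, the per-node `capacity` limit, the one-way copy (rather than a two-way exchange), and the random eviction policy are all assumptions made here. Symbols are identified by the id of the node that sensed them.

```python
import random

def replicate(neighbors, storage, rounds, capacity, seed=0):
    """Each round, every node copies one stored symbol to a random neighbor.

    neighbors: dict node -> list of neighbor nodes
    storage:   dict node -> set of symbol ids held by that node
    capacity:  assumed per-node storage limit (hypothetical parameter)
    """
    rng = random.Random(seed)
    for _ in range(rounds):
        for node, peers in neighbors.items():
            if not peers or not storage[node]:
                continue
            peer = rng.choice(peers)
            symbol = rng.choice(sorted(storage[node]))
            if symbol in storage[peer]:
                continue  # the neighbor already holds a copy
            if len(storage[peer]) >= capacity:
                # Assumed policy: evict a random symbol, never the peer's own reading.
                evictable = sorted(storage[peer] - {peer})
                if not evictable:
                    continue
                storage[peer].discard(rng.choice(evictable))
            storage[peer].add(symbol)
    return storage

# Usage: a ring of 6 nodes, each starting with only its own reading.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
final = replicate(ring, {i: {i} for i in range(6)}, rounds=20, capacity=3)
```

After a few rounds, copies of each reading sit on several nodes, which is what makes the data survive localized failures.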

Brief Segue: Digital Fountain
The source splits the message into smaller data symbols. Data symbols are encoded into codewords; there are potentially infinitely many unique codewords. Clients can decode the original data from sufficiently many unique codewords. These are low-overhead, erasure-resistant channel codes.

Luby Transform (LT) Codes
Rateless erasure codes. LT codes are universal in the sense that they are near optimal for every erasure channel and are very efficient as the data length grows.

Erasure Codes: LT-Codes
Start off with a file F = b1 b2 b3 b4 b5, split into n = 5 input blocks.

LT-Codes: Encoding
F = b1 b2 b3 b4 b5; E(F) = c1, ... To produce codeword c1: pick degree d1 from a pre-specified distribution (e.g. d1 = 2). Select d1 input blocks uniformly at random (e.g. pick b1 and b4). Compute their sum (XOR). Output the sum and the block IDs. For each encoded block, sample the degree distribution to figure out its degree, then pick that many neighbors.

LT-Codes: Encoding
F = b1 b2 b3 b4 b5 is encoded as E(F) = c1 c2 c3 c4 c5 c6 c7. Note that some encoded blocks have “degree 1”: they are exactly equal to their neighbor on the graph.
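The encoding step described on these slides (pick a degree, pick that many input blocks uniformly at random, XOR them, output the sum with the block IDs) can be sketched as follows. The function name `lt_encode`, the use of integers as block payloads, and the `degree_dist` weight list are illustrative assumptions; indices are 0-based, so b1 corresponds to index 0.

```python
import random
from functools import reduce

def lt_encode(blocks, degree_dist, rng):
    """Produce one codeword as (block ids, XOR of those blocks).

    blocks:      list of integer payloads (assumed representation)
    degree_dist: degree_dist[d - 1] is the weight of degree d
    rng:         a random.Random instance
    """
    degrees = list(range(1, len(degree_dist) + 1))
    d = rng.choices(degrees, weights=degree_dist, k=1)[0]   # sample the degree
    ids = sorted(rng.sample(range(len(blocks)), d))         # d distinct blocks
    payload = reduce(lambda a, b: a ^ b, (blocks[i] for i in ids))
    return ids, payload

# Usage: force degree 2, as in the slide's d1 = 2 example.
rng = random.Random(1)
F = [0x11, 0x22, 0x33, 0x44, 0x55]
ids, c1 = lt_encode(F, [0, 1, 0, 0, 0], rng)
```

A degree-1 codeword returned by this sketch is simply a copy of one input block, matching the note above.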

LT-Codes: Decoding
[Figure: step-by-step decoding at the receiver of input blocks b1 b2 b3 b4 b5 from codewords c1 c2 c3 c4 c5 c6 c7.] Key to efficiency: the right degree distribution.

Degree Distribution for LT-Codes
Soliton distribution: average degree H(N) ~ ln(N). In expectation, exactly one degree-1 symbol in each round of decoding. The distribution is very fragile in practice; this is fixed with the Robust Soliton distribution. A soliton wave is one where dispersion balances refraction perfectly. Soliton distribution: input symbols are added to the ripple at the same rate as they are processed. LT code goal: minimize the number of codewords required to retrieve all data (not in line with our goal). Our goal: maximize the amount of information recovered when only a small number of codewords are received.
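As a check on the "average degree H(N)" claim, the ideal soliton distribution can be written out and its mean computed exactly. The slide does not state the formula; the sketch below uses the standard definition from Luby's LT codes paper, rho(1) = 1/N and rho(d) = 1/(d(d-1)) for d = 2..N. Function names are chosen here for illustration.

```python
from fractions import Fraction

def ideal_soliton(n):
    """Return [rho(1), ..., rho(n)] for the ideal soliton distribution."""
    return [Fraction(1, n)] + [Fraction(1, d * (d - 1)) for d in range(2, n + 1)]

def harmonic(n):
    """H(n) = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, i) for i in range(1, n + 1))

def mean_degree(n):
    """Expected codeword degree under the ideal soliton distribution."""
    return sum(d * p for d, p in enumerate(ideal_soliton(n), start=1))

# The probabilities sum to 1 and the mean equals H(n) exactly.
assert sum(ideal_soliton(10)) == 1
assert mean_degree(10) == harmonic(10)
```

The mean works out to 1/N + H(N-1) = H(N), which grows like ln(N).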

Thought: Sensor Digital Fountain?
[Figure: sensor nodes and sinks.] Information survives losses.

LT codes for sensor networks?
Sensed data could be the data units, but how do we achieve a given degree distribution? LT codes are designed for centralized sources; sensor networks have distributed data sources. As a thought experiment, assume that magically we can implement distributed LT codes.

Perfect Source Simulation: Sampling ideal distributions (N = 1500)
Initially, no coding does better than Robust Soliton! Robust Soliton improves as more codewords are received. Values of kj are calculated in the paper. These are upper bounds, and hence the degree distribution does not perform as well for some region of k.

Toy problem
Suppose a sink could ask for a codeword of a chosen degree, with its symbols still chosen randomly. Which degree would be the most useful? Answer: it is time dependent!

Coupon Collector’s Problem
No coding: if the N original symbols are generated uniformly at random, the sink needs to receive O(N log N) symbols to recover all N original symbols.
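The O(N log N) figure follows from the standard coupon collector argument: once j distinct symbols are held, a uniform random symbol is new with probability (N - j)/N, so the expected wait for the next new one is N/(N - j). A minimal sketch of that arithmetic (the function name and the optional `target` parameter are illustrative):

```python
from fractions import Fraction

def expected_draws(n, target=None):
    """Expected number of uniform draws (with replacement) from n symbols
    needed to have seen `target` distinct symbols (default: all n)."""
    target = n if target is None else target
    return sum(Fraction(n, n - j) for j in range(target))

# For n = 4 the full collection takes 4 * (1 + 1/2 + 1/3 + 1/4) = 25/3 draws on average.
assert expected_draws(4) == Fraction(25, 3)

half = float(expected_draws(1500, 750))   # cost of the first half
full = float(expected_draws(1500))        # cost of everything
```

For N = 1500, the network size used in the simulations, collecting the first half of the symbols is cheap, while the last few dominate the total cost. That is the gap coding aims to close.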

Growth Codes
The degree of a codeword “grows” with time. At each point in time, a codeword of one specific degree has the most utility for a decoder (on average). This “most useful” degree grows monotonically with time. [Figure: timeline with degree d=1, d=2, d=3, d=4 in successive intervals ending at R1, R2, R3, R4, where R is the number of decoded symbols the sink has.]

Growth Codes: Encoding
Ri is what the sink has received. What about encoding? To decode Ri symbols, the sink needs to receive some Ki codewords, sampled uniformly. Sensor nodes estimate Ki and transition accordingly. The optimal transition points are a function of N, the size of the network. The exact value of K1 is computed; upper bounds on Ki for i > 1 are computed.

Distributed Implementation of Growth Codes
Time is divided into rounds. Each node exchanges degree-1 codewords with a random neighbor until round $K_1$. Between rounds $K_{i-1}$ and $K_i$, nodes exchange degree-$i$ codewords. The sink receives codewords as they get exchanged in the network. Growth Code degree distribution at time $k$: $\pi^*(k)$: $\pi_i = \max\left(0, \min\left(\frac{K_i - K_{i-1}}{k}, \frac{k - K_{i-1}}{k}\right)\right)$, with $R_1 = \frac{N-1}{2}, \ldots, R_i = \frac{iN-1}{i+1}$.
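The schedule on this slide can be sketched in a few lines, reading the slide's formulas as pi_i(k) = max(0, min((K_i - K_{i-1})/k, (k - K_{i-1})/k)) and R_i = (iN - 1)/(i + 1). The transition points K_i come from the paper and are not reproduced on the slides, so the values used below are made up for illustration; K_0 = 0 and the function names are also assumptions.

```python
def decoded_thresholds(n, max_degree):
    """R_i = (i*n - 1)/(i + 1) for i = 1..max_degree."""
    return [(i * n - 1) / (i + 1) for i in range(1, max_degree + 1)]

def degree_distribution(k, K):
    """pi_i(k) for i = 1..len(K): fraction of the first k codewords with degree i.

    K = [K_1, K_2, ...] are the transition points; K_0 = 0 is assumed.
    """
    bounds = [0] + list(K)
    return [max(0.0, min((bounds[i] - bounds[i - 1]) / k,
                         (k - bounds[i - 1]) / k))
            for i in range(1, len(bounds))]

def current_degree(k, K):
    """Degree used in round k: degree i while K_{i-1} < k <= K_i."""
    for i, k_i in enumerate(K, start=1):
        if k <= k_i:
            return i
    return len(K) + 1

K = [10, 16, 20]                      # hypothetical transition points
pi = degree_distribution(13, K)       # 10 degree-1 and 3 degree-2 codewords so far
```

With these made-up points, round 13 falls between K_1 and K_2, so nodes are sending degree-2 codewords and the distribution puts weight 10/13 on degree 1 and 3/13 on degree 2.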

Sensor Network Model
N-node sensor network. Limited storage at each sensor node; large storage at the sink. All sensed data is assumed independent. Source coding is not considered. [Figure: numbered sensor nodes holding symbols such as x1, x2, x3, x4, x6, x9, x10, and a sink.]

High Level View of the Protocol
In the beginning: nodes 1 and 3 are exchanging codewords (x1, x3). Later on: node 1 is destroyed, but symbol x1 survives in the network. Nodes are now exchanging degree-2 codewords. [Figure: nodes 1, 2, 3, 8 holding codewords such as x4⊕x3, x8, x8⊕x7, x1⊕x4, x2⊕x8, x3, x6⊕x3, x4⊕x5.]

Iterative Decoding
5 original symbols x1 … x5; 4 codewords received. Each codeword is the XOR of its component original symbols. [Figure: received codewords, recovered symbols, and unused codewords.] The same decoder is used for LT and Tornado codes. There is actually a third set of codewords, which were discarded because all their component symbols were already decoded. This set is different from the unused codewords, which have more than one of their component symbols not decoded yet.

Online Decoding at the Sink
[Figure: the sink holds recovered symbols x1, x3, x5 and the undecoded codeword x2⊕x6. A new codeword x2⊕x5 arrives: x2 = (x2⊕x5) ⊕ x5 is recovered, and then x6 = (x2⊕x6) ⊕ x2.]
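The decoding described on the last two slides can be sketched as an online "peeling" decoder: strip already-recovered symbols from each codeword, recover any codeword left with a single unknown, and repeat. This is a minimal illustration, not the authors' implementation; the class name `Sink`, integer payloads, and the method names are assumptions.

```python
class Sink:
    """Online iterative (peeling) decoder for XOR codewords."""

    def __init__(self):
        self.recovered = {}   # symbol id -> value
        self.pending = []     # undecoded codewords: (set of unknown ids, payload)

    def receive(self, ids, payload):
        """Accept one codeword: the XOR of the symbols listed in ids."""
        ids = set(ids)
        for i in list(ids):
            if i in self.recovered:          # strip symbols we already know
                payload ^= self.recovered[i]
                ids.discard(i)
        if not ids:
            return                           # nothing new: discard the codeword
        self.pending.append((ids, payload))
        self._peel()

    def _peel(self):
        """Repeat passes over pending codewords until no symbol is recovered."""
        progress = True
        while progress:
            progress = False
            remaining = []
            for ids, payload in self.pending:
                for i in list(ids):
                    if i in self.recovered:
                        payload ^= self.recovered[i]
                        ids.discard(i)
                if len(ids) == 1:            # exactly one unknown: recover it
                    self.recovered[ids.pop()] = payload
                    progress = True
                elif ids:                    # still more than one unknown
                    remaining.append((ids, payload))
            self.pending = remaining

# Usage with made-up values: a stored x2^x6 is unlocked once x2^x5 arrives.
x = {1: 11, 2: 22, 3: 33, 5: 55, 6: 66}
sink = Sink()
for i in (1, 3, 5):
    sink.receive([i], x[i])
sink.receive([2, 6], x[2] ^ x[6])
sink.receive([2, 5], x[2] ^ x[5])
```

Codewords whose symbols are all already known are dropped on arrival, which corresponds to the discarded set mentioned on the Iterative Decoding slide.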

Revisiting earlier simulation (N = 1500)

Time to recover all data
Phase transition in obtaining last few data units (coupon collector’s problem)

Recovery Rate
Without coding, a lot of data is lost during the disaster, even when using randomized replication.

Effect of Topology
500 nodes placed at random in a 1x1 square; nodes are connected if within a distance of 0.3.
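A topology of this kind (a random geometric graph) can be generated with a short sketch like the one below. The function name and the seed parameter are illustrative; whether the original experiments treated the distance threshold as inclusive is not stated, so `<=` is an assumption.

```python
import math
import random

def random_geometric_graph(n, radius, seed=0):
    """Place n nodes uniformly in the unit square; link pairs within `radius`."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) <= radius:
                adj[i].append(j)
                adj[j].append(i)
    return pos, adj

# Usage: the 500-node, radius-0.3 setting from the slide.
pos, adj = random_geometric_graph(500, 0.3)
mean_neighbors = sum(len(v) for v in adj.values()) / len(adj)
```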

Resilience to Random Failures
500 node random topology network Nodes fail every second with a probability of (1 every 4 seconds in the beginning)

Experiments with Motes
Crossbow MICAz: 2.4 GHz, IEEE 802.15.4, 250 kbps high-data-rate radio

Motes experiment

Motes experiment: continued

Conclusions
Developed distributed channel codes to maximize data persistence in (sensor) networks. First (to our knowledge) time-varying LDPC codes. Proved optimality of Growth Codes. The protocol requires minimal configuration (only a rough estimate of network size is needed). Tested the system with simulations and an implementation on mica motes.

Limitations
Current approaches offer limited improvement over no coding. They ignore correlation between data: highly correlated data → high coding efficiency. They ignore the broadcast nature of wireless channels: there are more coding opportunities when leveraging opportunistic listening. Coding and random replication are expensive for power-constrained devices.

Practical Data-Centric Storage
Cheng Tien Ee, Sylvia Ratnasamy, Scott Shenker. UC Berkeley, ICSI

Problem
We are interested in an event that occurred within a sensor network: where, and what sort of elephant, has been sighted? Where do we store the information? How do we retrieve information from the sensor network? Flooding the query is highly inefficient! [Figure: querying node and answering node.]

Possible Solutions
Flood the query; the node with the answer replies: large overhead. Store the data at the node whose id is H(k), where k is the data id: requires point-to-point routing. Store the data at the beacon node whose id is H(k): beacon nodes become a bottleneck. Is there an alternative that doesn’t require point-to-point routing, balances load, and has small overhead?

Data Centric Storage (DCS)
Data-driven networking → we control data location. Associate data with, and store it at, a particular location. Data and queries are sent to the same location. Reduces the number of packet transmissions*. [Figure: detecting node and querying node both reach the node that stores elephant sightings.] * Under certain conditions; see [29] Ratnasamy et al., "Data-centric storage in sensornets with GHT, a geographic hash table".

DCS Requirements
What is required for DCS to work? A common reference system: all nodes need to locate the same storage node. A data-to-location mapping: for a given piece of data, where do we store it? [Figure: querying node, detecting node, storage location.]

DCS Requirements (contd.)
To obtain a common reference: build DCS over a GPS-enabled system. Issues with the data-to-location mapping: (A) How do we obtain the network boundary? It is hard to obtain the range of the data-to-location mapping. (B) How do we handle “holes” in the network? Complex solutions exist (see the GHT paper*). Is there a solution that doesn’t require point-to-point routing and is simple and easy to deploy? [Figure: (A) range of locations and storage location X; (B) storage location X.] * [29] Ratnasamy et al., "Data-centric storage in sensornets with GHT, a geographic hash table".

Outline
PathDCS algorithm. Supporting algorithms. High-level simulation. Packet-level simulation. Deployment.

PathDCS Algorithm (Sketch)
[Figure: path from the data source via beacons to the destination.]
Segment 1: beacon = id closest to h(keydata, 1); hops = all the way to beacon.
Segment 2: beacon = id closest to h(keydata, 2); hops = [h(keydata, 2) % max_hops_2] + 1.
Segment 3: beacon = id closest to h(keydata, 3); hops = [h(keydata, 3) % max_hops_3] + 1.

PathDCS Algorithm (contd.)
Define a storage location based on existing paths. Beacons act as reference locations. Same destination location regardless of packet origin. A node always exists at the storage location. (A) No need to know the network boundary for the data-to-location mapping. (B) No need to handle “holes” in the network. [Figure: (A) range of locations and storage location X; (B) storage location X.]
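The segment rule from the algorithm sketch can be illustrated with a small routine that walks tree routes toward successive beacons. This is a sketch under stated assumptions, not the PathDCS implementation: the hash `h` (SHA-256 of "key:segment"), the notion of "closest id" as absolute difference after reducing modulo the id space, and the `next_hop` table layout are all choices made here for illustration. Nodes are named by their ids.

```python
import hashlib

def h(key, segment):
    """Assumed hash of (key, segment) to an integer."""
    digest = hashlib.sha256(f"{key}:{segment}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def closest_beacon(beacon_ids, target, id_space):
    """Beacon whose id is closest to target (assumed metric: absolute difference)."""
    target %= id_space
    return min(beacon_ids, key=lambda b: (abs(b - target), b))

def storage_node(source, key, next_hop, beacon_ids, id_space, max_hops):
    """Walk segment 1 all the way to its beacon, then a bounded number of hops
    toward each later beacon. max_hops[s - 2] bounds segment s (s >= 2).
    next_hop[beacon][node] is the neighbor one hop closer to that beacon."""
    node = source
    for segment in range(1, len(max_hops) + 2):
        beacon = closest_beacon(beacon_ids, h(key, segment), id_space)
        hops = None if segment == 1 else h(key, segment) % max_hops[segment - 2] + 1
        while node != beacon and (hops is None or hops > 0):
            node = next_hop[beacon][node]
            if hops is not None:
                hops -= 1
    return node

# Usage: a 4-node line 100 - 200 - 300 - 400 with beacons at both ends.
next_hop = {
    100: {200: 100, 300: 200, 400: 300},
    400: {100: 200, 200: 300, 300: 400},
}
dest = storage_node(300, "elephant", next_hop, [100, 400], 4000, [3])
```

Because the first segment always ends at the first beacon, every source reaches the same storage node for a given key, and that node is always a real node on an existing path.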

Issues
How to select beacons? How to maintain beacons? How to route data and queries? How to achieve successful lookups?

Beacon Election
Each node is assigned a random identifier (id), e.g. hash(MAC addr). Divide the identifier space into equal-sized partitions; number of partitions = number of beacons. The node with the greatest id in its partition becomes the beacon for that partition. Beacon ids for each partition are advertised in distance-vector packets. [Figure: id space up to 4000, in increasing id order, split into partitions 1 to 4 with boundaries at 1000, 2000, 3000; node X’s id (1130) and node Y’s id (1850) are marked. Packet fields: for each partition, beacon id and hops to beacon.]
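The election rule can be sketched centrally as below. In the actual protocol the result emerges from distance-vector advertisements rather than from one node seeing all ids, so this is only a sketch of the outcome; the function name and 0-based partition indices are assumptions. The ids 1130 and 1850 are taken from the slide's figure, and the other two are made up.

```python
def elect_beacons(node_ids, id_space, num_partitions):
    """Return {partition index: beacon id}; the greatest id in each partition wins.

    Partitions are 0-based here, so the slide's "partition 2" is index 1.
    Partitions containing no node get no beacon.
    """
    size = id_space // num_partitions
    beacons = {}
    for nid in node_ids:
        part = min(nid // size, num_partitions - 1)
        if part not in beacons or nid > beacons[part]:
            beacons[part] = nid
    return beacons

# Nodes X (1130) and Y (1850) share a partition, so Y becomes its beacon.
beacons = elect_beacons([1130, 1850, 420, 3100], id_space=4000, num_partitions=4)
assert beacons == {0: 420, 1: 1850, 3: 3100}
```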

Beacon handoff / takeover
Beacons can become overloaded or fail with time. A 1-hop neighbor takes over, either with an explicit handoff or after a timeout period. Proximity of the new beacon → limited changes in paths → limited changes in storage locations. [Figure: beacon handoff. Key: edge in both old and new paths; edge in old path; edge in new path.]

Routing Data and Queries
Tree routing: routes packets from all nodes to beacons; a common routing primitive in today’s sensor networks; uses the MT/ETX metric when determining the best end-to-end path. How many beacons and path segments? More beacons → more balanced load but more overhead. More path segments → more balanced load but larger routing stretch. 20 beacons and 2 path segments offer a reasonable tradeoff between balancing load and routing stretch.

Lookup Success
A lookup is successful when data and queries arrive at the same node. In stable networks with no routing changes → 100% success. In dynamic networks, we need additional schemes. Local replication: replicates data in the one-hop neighborhood for robustness; also helps in retrieving data if paths fluctuate slightly. Data refreshing.

Data Refreshing
The storage location is a function of the current network routing state. Routing state changes → paths fluctuate → storage location changes. Data is periodically pushed into the network to the next storage location. [Figure: before the route update, data is stored at node A on the path to the beacon; after the route update, the new storage location is node B.]

High-Level Simulation
Fixed # beacons, increasing fraction of nodes acting as beacons. Evaluates load-balancing ability (storage and transmission). Doesn’t take into account low-level effects. Parameters: 5000 nodes; mean neighbors = 14.5; no. of beacons = 20. Results: storage load balance is okay; transmission load balance is close to “direct”; stretch grows with # path segments (maintained at 2 path segments, stretch is about 2.4).

Packet-level Simulation + Deployment
Need to consider the effects of path fluctuation, due to varying link quality and node failure. Metrics: route completion = Pr(packet reaches destination); lookup success = Pr(query finds data | query reaches destination). Also of interest: the destination location spread for each data item. How far apart are destination locations? What is the distribution?
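The two metrics differ in their denominators: route completion is over all packets, while lookup success is conditioned on the query having reached its destination. A minimal sketch of computing both from per-query outcomes (the function name and the log format are assumptions made for illustration):

```python
def lookup_metrics(trials):
    """trials: list of (reached_destination, found_data) booleans, one per query.

    Returns (route completion, lookup success), where lookup success is
    computed only over queries that reached their destination.
    """
    reached = [t for t in trials if t[0]]
    route_completion = len(reached) / len(trials)
    if not reached:
        return route_completion, float("nan")
    lookup_success = sum(1 for t in reached if t[1]) / len(reached)
    return route_completion, lookup_success

# Made-up outcomes: 3 of 4 queries arrive, and 2 of those 3 find the data.
rc, ls = lookup_metrics([(True, True), (True, False), (False, False), (True, True)])
assert rc == 0.75 and abs(ls - 2 / 3) < 1e-12
```

Separating the two makes it possible to attribute failures either to the underlying routing primitive or to the storage scheme itself.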

Packet-Level Simulation
Ran the actual implementation code in simulation. Models a lossy medium, queue overflow, etc. Parameters: 500 nodes; network diameter: 18; mean neighbors: 10.4; 5 beacons. Route completion: 86% (dependent on the underlying routing primitive).

Lookup Success Rate
[Figure: lookup success vs. data refresh interval (sec).] Refreshing data more frequently increases successful queries. Faster route adaptation results in a lower success rate and higher variation in success rate.

Destination Spread
80% of packets land within one hop of the mode node → one-hop replication is good enough. More dynamic routing does not affect the resulting destination spread.

Overhead
Additional parameters: 100 data items/keys; refresh interval: 100 seconds; distance-vector advertisement interval: 10 seconds. Overhead reduces with increasing application rate. The cost of refreshing data is lower than that of initial data replication and forwarding.

Deployment
Deployed on Intel Berkeley’s Mirage micaZ testbed. Parameters: 100 nodes; network diameter: 6; mean neighbors: 11.8. Route completion: 96-8%. Lookup success > 95%. [Figure: lookup success vs. data refresh interval (sec).] 1-hop replication is good enough.

Summary
PathDCS is simple and easily deployable. It builds on a commonly used routing primitive in sensornets: trees. It has good enough load-balancing ability. Adjustable data refreshing and local replication mechanisms can counter the effects of path fluctuations.
