

Distributed Storage Allocations for Optimal Delay
Derek Leong (1), Alexandros G. Dimakis (2), Tracey Ho (1)
(1) California Institute of Technology; (2) University of Southern California
ISIT

Slide 2: Motivation
How should we store a data object over a set of mobile nodes, subject to a given storage budget, so as to achieve the optimal recovery delay?
When is coding beneficial? When will uncoded replication suffice?

Slide 3: Introduction: Network Model
Consider a network of n mobile storage nodes.
Assume that the contacts between any given pair of nodes form a Poisson process with rate parameter λ; the time between contacts is therefore exponentially distributed with mean 1/λ.

Slide 4: Introduction: Storage Allocation
A source node creates a data object of unit size, and subsequently disseminates an encoded representation of it to other nodes for storage, subject to a given total storage budget T.
At the end of the dissemination process, node 1 stores $x_1$ amount of data, node 2 stores $x_2$ amount of data, and so on, such that $\sum_{i=1}^{n} x_i \le T$.

Slide 5: Introduction: Recovery by a Data Collector
At some time after the completion of the data dissemination process, a data collector node begins to recover the data object by contacting other nodes and accessing the coded data stored in them.
We make the simplifying assumption that the stored data is instantaneously transmitted on contact.
Let the random variable D denote the recovery delay incurred by the data collector.

Slide 6: Introduction: Objectives
We seek an allocation $(x_1, \dots, x_n)$ of the given budget T that produces the optimal recovery delay; specifically, we consider two objectives involving the recovery delay D:
  (i) maximization of the probability of successful recovery by a given deadline d, or recovery probability
  (ii) minimization of the expected recovery delay
Therefore, for each objective, we need to find
  (i) an optimal allocation of the given budget over the nodes, and
  (ii) an optimal coding scheme
that jointly optimize the objective.

Slide 7: Introduction: Objectives
Using an appropriate code, successful recovery occurs whenever the data collector accesses at least a unit amount of data (= size of the original data object).
References:
  A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” Trans. Inf. Theory, Sep
  A. Jiang, “Network coding for joint storage and transmission with minimum cost,” in Proc. ISIT, Jul

Slide 8: Introduction: Objectives
Therefore, assuming the use of an appropriate code (e.g. random linear coding, MDS code), we can express the recovery delay D as
$D = \min\{ d \ge 0 : \sum_{i \in \mathbf{r}_d} x_i \ge 1 \}$,
where $\mathbf{r}_d$ is the set of all nodes contacted by the data collector by time d.
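As a rough illustration of this definition, the sketch below computes the recovery delay for one realization of the contact process. The function name `recovery_delay` and the list-based inputs are assumptions made for this example; they do not come from the slides.

```python
from math import inf

def recovery_delay(x, contact_times):
    """Earliest time at which the accessed data totals at least one unit.

    x[i] is the amount stored on node i; contact_times[i] is the
    (hypothetical) time at which the collector first meets node i.
    Returns math.inf if contacting every node is still not enough.
    """
    accessed = 0.0
    # Visit nodes in the order in which the collector meets them.
    for t, amount in sorted(zip(contact_times, x)):
        accessed += amount
        # Small tolerance guards against floating-point round-off.
        if accessed >= 1.0 - 1e-12:
            return t
    return inf

# Three nodes storing half the object each: the second contact completes recovery.
print(recovery_delay([0.5, 0.5, 0.5], [3.0, 1.0, 2.0]))
```

Sorting by contact time mirrors the assumption that stored data is transmitted instantaneously on contact, so only the order and timing of first contacts matter.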

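Slide 9: Maximizing Recovery Probability
Let $W_1, \dots, W_n$ be i.i.d. random variables denoting the times at which the data collector first contacts nodes 1, …, n, respectively, where $W_i \sim \mathrm{Exponential}(\lambda)$.
Observe that the data collector contacts each node by the specified deadline d > 0 independently with probability $p_{\lambda,d}$ given by
$p_{\lambda,d} = P[W_i \le d] = 1 - e^{-\lambda d}$.
It follows that the probability of contacting exactly a subset $\mathbf{r}$ of the n nodes by time d is $p_{\lambda,d}^{|\mathbf{r}|} (1 - p_{\lambda,d})^{n - |\mathbf{r}|}$; the recovery probability can therefore be obtained by summing over all subsets $\mathbf{r}$ that allow successful recovery:
$P[D \le d] = \sum_{\mathbf{r} \subseteq \{1,\dots,n\} : \sum_{i \in \mathbf{r}} x_i \ge 1} p_{\lambda,d}^{|\mathbf{r}|} (1 - p_{\lambda,d})^{n - |\mathbf{r}|}$.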

Slide 10: Recovery Probability: Related Work (recap)
Discussion between R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley, 2005:
  Found examples for the suboptimality of symmetric allocations.
  Conjectured that there exists a symmetric optimal allocation when the number of nodes n → ∞.
S. Jain, M. Demmer, R. Patra, K. Fall, “Using redundancy to cope with failures in a delay tolerant network,” SIGCOMM 2005:
  Considered the allocation of a transmission budget over different routes in a DTN.
  Experimentally evaluated the performance of symmetric allocations along with other heuristics.
  Related theoretical claims and proofs incomplete/inaccurate.

Slide 11: Recovery Probability: Illustrative Example (recap)
n = 5 nodes, access probability $p_{\lambda,d}$ = 2/3, budget T = 7/3

Allocation   x_1    x_2    x_3    x_4    x_5    Recovery probability
A            7/15   7/15   7/15   7/15   7/15   192/243 ≈ 0.790
B            7/6    7/6    0      0      0      8/9 ≈ 0.889
C            2/3    2/3    1/3    1/3    1/3    220/243 ≈ 0.905
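The recovery probability of any allocation in this example can be checked by brute-force enumeration of the subsets of accessed nodes. The sketch below does this with exact fractions; the function name `recovery_probability` and the dictionary of allocations are illustrative choices, and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import combinations

def recovery_probability(x, p):
    """Sum p^|r| (1-p)^(n-|r|) over every subset r storing at least one unit."""
    n = len(x)
    total = Fraction(0)
    for size in range(n + 1):
        for r in combinations(range(n), size):
            if sum(x[i] for i in r) >= 1:
                total += p**size * (1 - p)**(n - size)
    return total

p = Fraction(2, 3)  # access probability from the example
allocations = {
    "A": [Fraction(7, 15)] * 5,
    "B": [Fraction(7, 6)] * 2 + [Fraction(0)] * 3,
    "C": [Fraction(2, 3)] * 2 + [Fraction(1, 3)] * 3,
}
results = {name: recovery_probability(x, p) for name, x in allocations.items()}
for name, prob in results.items():
    print(name, prob, round(float(prob), 5))
```

Under these parameters the asymmetric allocation C comes out ahead of both symmetric allocations A and B, which is the point of the example.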

Slide 12: Recovery Probability: Optimal Symmetric Allocation (recap)
We are particularly interested in symmetric allocations because they are easy to describe and implement.
Let $\bar{\mathbf{x}}(n,T,m)$ denote the symmetric allocation in which m nonempty nodes each store T/m and the remaining n − m nodes store nothing.
Successful recovery for the symmetric allocation $\bar{\mathbf{x}}(n,T,m)$ occurs if and only if at least $\lceil m/T \rceil$ out of the m nonempty nodes are accessed.
Therefore, the recovery probability of $\bar{\mathbf{x}}(n,T,m)$ is given by $P[\mathrm{Bin}(m, p_{\lambda,d}) \ge \lceil m/T \rceil]$.
Source: D. Leong, A. G. Dimakis, and T. Ho, “Symmetric allocations for distributed storage,” in Proc. GLOBECOM, Dec

Slide 13: Recovery Probability: Optimal Symmetric Allocation (recap)
The problem is nontrivial even when restricted to symmetric allocations.
The recovery probability for the symmetric allocation with m nonempty nodes (each storing T/m) is
$P_S = \sum_{k = \lceil m/T \rceil}^{m} \binom{m}{k} p_{\lambda,d}^{k} (1 - p_{\lambda,d})^{m-k}$,
where m is the number of nonempty nodes in the symmetric allocation.
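To see why the choice of m is nontrivial, one can evaluate the binomial tail for every m and compare. The following sketch does so for the parameters of the earlier illustrative example (n = 5, T = 7/3, p = 2/3); the helper name `symmetric_recovery_probability` is an assumption of this example.

```python
from fractions import Fraction
from math import ceil, comb

def symmetric_recovery_probability(m, T, p):
    """P[Bin(m, p) >= ceil(m / T)] for m nonempty nodes storing T/m each."""
    k = ceil(Fraction(m) / T)  # minimum number of nonempty nodes to access
    return sum(comb(m, j) * p**j * (1 - p)**(m - j) for j in range(k, m + 1))

n, T, p = 5, Fraction(7, 3), Fraction(2, 3)
by_m = {m: symmetric_recovery_probability(m, T, p) for m in range(1, n + 1)}
best = max(by_m.values())
best_m = [m for m, prob in by_m.items() if prob == best]
for m, prob in by_m.items():
    print(m, prob)
print("best m:", best_m)
```

The recovery probability is not monotone in m here: it rises, falls, and rises again as m grows, because the threshold $\lceil m/T \rceil$ jumps at irregular points.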

Slide 14: Recovery Probability: Optimal Symmetric Allocation (recap)
Maximal spreading (with coding) is optimal among symmetric allocations when the contact rate λ or recovery deadline d is sufficiently large:
  PROPOSITION 1. If […], then either […] or […] is an optimal symmetric allocation.
Minimal spreading (uncoded replication) is optimal among symmetric allocations when the contact rate λ or recovery deadline d is sufficiently small:
  PROPOSITION 2. If […], then […] is an optimal symmetric allocation.
Source: D. Leong, A. G. Dimakis, and T. Ho, “Symmetric allocations for distributed storage,” in Proc. GLOBECOM, Dec

Slide 15: Recovery Probability: Optimal Symmetric Allocation (recap)
Figure regions: maximal spreading (with coding) is optimal among symmetric allocations; minimal spreading (uncoded replication) is optimal among symmetric allocations; other symmetric allocations may be optimal in the gap.
When […] exactly, we observe numerically that minimal spreading is optimal among symmetric allocations for most values of T; the optimal symmetric allocation changes continually over the intervals […], while […] is optimal for […].

Slide 16: Minimizing Expected Recovery Delay
By considering the derivative of $P[D \le d]$ with respect to d, we obtain the following expression for the expected recovery delay:
$E[D] = \frac{1}{\lambda} \sum_{\mathbf{r} \subseteq \{1,\dots,n\} : \sum_{i \in \mathbf{r}} x_i < 1} \frac{1}{(n - |\mathbf{r}|) \binom{n}{|\mathbf{r}|}}$.
Observe that given n, λ, and T, the optimal allocation depends only on n and T, but not λ; this is in contrast with the maximization of $P[D \le d]$, for which the optimal allocation depends on all parameters n, λ, T, and d.
CONJECTURE: A symmetric optimal allocation always exists for any n and T.
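One way to make the independence from λ concrete is to compute the product λ·E[D] directly. The sketch below assumes the standard identity E[D] = ∫ P[D > d] dd with $p_{\lambda,d} = 1 - e^{-\lambda d}$, which turns each non-recovering subset r into a term 1/((n − |r|)·C(n, |r|)); the function name is hypothetical and the enumeration is only feasible for small n.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def expected_delay_times_rate(x):
    """lambda * E[D], summing over subsets r that do not yet allow recovery."""
    n = len(x)
    if sum(x) < 1:
        raise ValueError("total stored data is below one unit; recovery is impossible")
    total = Fraction(0)
    # The full node set always allows recovery, so sizes stop at n - 1.
    for size in range(n):
        for r in combinations(range(n), size):
            if sum(x[i] for i in r) < 1:
                total += Fraction(1, (n - size) * comb(n, size))
    return total

# Allocations from the illustrative example (n = 5, T = 7/3).
A = [Fraction(7, 15)] * 5
B = [Fraction(7, 6)] * 2 + [Fraction(0)] * 3
C = [Fraction(2, 3)] * 2 + [Fraction(1, 3)] * 3
for name, x in (("A", A), ("B", B), ("C", C)):
    print(name, expected_delay_times_rate(x))
```

Dividing the result by any λ gives E[D], so the ranking of allocations is the same for every contact rate. In this small instance the minimal-spreading allocation B gives the smallest expected delay of the three, even though C had the highest recovery probability at p = 2/3.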

Slide 17: Expected Delay: Related Work
T. Spyropoulos, K. Psounis, C. S. Raghavendra, “Spray and Wait: An efficient routing scheme for intermittently connected mobile networks,” ACM SIGCOMM Workshop on DTN 2005:
  Spray a fixed number of uncoded replicas into the network, and wait for one of them to come into contact with the data collector.
  Showed that this fixed budget approach performs very well compared to other heuristics.

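Slide 18: Expected Delay: Optimal Symmetric Allocation
Finding the optimal symmetric allocation:
The expected recovery delay for the symmetric allocation with m nonempty nodes is the expected time of the $\lceil m/T \rceil$-th contact among those m nodes,
$E[D] = \frac{1}{\lambda} \sum_{i=0}^{\lceil m/T \rceil - 1} \frac{1}{m - i} = \frac{1}{\lambda} \left( H_m - H_{m - \lceil m/T \rceil} \right)$,
where m is the number of nonempty nodes in the symmetric allocation and $H_k$ is the k-th harmonic number.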

Slide 19: Expected Delay: Optimal Symmetric Allocation
We are able to characterize the optimal symmetric allocation completely:
  RESULT 1. Suppose […], where […]. If […], then […] is an optimal symmetric allocation. If […], then either […] or […] is an optimal symmetric allocation.
If T is an integer (i.e. ℓ = 1), then the symmetric allocation with m = T nonempty nodes, which corresponds to minimal spreading (uncoded replication), is optimal.
As the fractional part of T increases (i.e. ℓ increases), the amount of spreading (with coding) in the optimal symmetric allocation increases.
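The integer-budget case can be checked numerically with a small search over m. This sketch assumes the order-statistics form of the expected delay, in which a symmetric allocation over m nodes waits for the ⌈m/T⌉-th of m i.i.d. exponential contact times; the function names are illustrative, and the search is a brute-force stand-in for the closed-form characterization on the slide.

```python
from fractions import Fraction
from math import ceil

def symmetric_expected_delay_times_rate(m, T):
    """lambda * E[D] when m nonempty nodes each store T/m."""
    k = ceil(Fraction(m) / T)  # number of nonempty nodes that must be contacted
    if k > m:
        raise ValueError("budget T is below 1; recovery is impossible")
    # Mean of the k-th smallest of m i.i.d. Exponential(1) variables.
    return sum(Fraction(1, m - i) for i in range(k))

def best_symmetric_m(n, T):
    """Number of nonempty nodes minimizing the expected delay (first minimizer)."""
    return min(range(1, n + 1),
               key=lambda m: symmetric_expected_delay_times_rate(m, T))

print(best_symmetric_m(20, 5))                          # integer budget
print(symmetric_expected_delay_times_rate(5, 5))        # minimal spreading, T = 5
print(symmetric_expected_delay_times_rate(20, 5))       # maximal spreading, T = 5
```

For an integer budget such as T = 5 the search returns m = T, in line with the statement that uncoded replication is optimal in that case.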

Slide 20: Expected Delay: Optimal Symmetric Allocation
Proof idea: eliminating candidates for the optimal symmetric allocation.
1. We can show that an optimal m* can be found from among […] candidates: […]
2. For […], where […], the expected recovery delay is given by […]
3. Using a geometrical argument, we show that the choice of […] minimizes the expected recovery delay among all […], where […]
4. To demonstrate the optimality of […], i.e. […], we apply the following bounds for the harmonic number $H_n$: […]

Slide 21: Simulation Study
We apply our theoretical insights to the design of a simple data dissemination and storage protocol for a delay tolerant network.
Simulations allow us to capture the transient dynamics of the data dissemination process, and its interaction with the data recovery process.
Our goal is to understand how different symmetric allocations perform under different circumstances:
  Random waypoint mobility model vs real-world mobility traces
  Low vs high mobility
  Low vs high connectivity
  Starting recovery immediately vs after some time

Slide 22: Simulation Study: Generalized Spray and Wait
Our protocol extends SPRAY AND WAIT by allowing nodes to store coded packets that are each 1/w the size of the original data object.
Successful recovery occurs when the data collector accesses at least w such packets (choosing w = 1 produces the original protocol).
Different symmetric allocations of the given budget T can be realized by changing the value of parameter w.
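The correspondence between w and a symmetric allocation can be sketched as follows. This is a simplified reading that assumes each nonempty node holds exactly one coded packet and that the number of packets is the largest integer fitting within the budget; neither detail is spelled out on the slide, and the function name `spray_parameters` is hypothetical.

```python
from fractions import Fraction
from math import floor

def spray_parameters(T, w):
    """Map the protocol parameter w to a symmetric allocation of budget T.

    Assumption: one coded packet of size 1/w per nonempty node.
    """
    packets = floor(T * w)  # number of packets, hence nonempty nodes
    return {
        "nonempty_nodes": packets,
        "per_node_storage": Fraction(1, w),
        "packets_needed": w,  # the collector must gather w packets
    }

for w in (1, 2, 4):
    print(w, spray_parameters(5, w))
```

With T = 5, w = 1 gives five full replicas (minimal spreading), while larger w spreads smaller coded packets over more nodes.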

Slide 23: Simulation Study: Random Waypoint: Key Observations
Number of wireless mobile nodes n = 100.
Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.
Recovery probability performance is consistent with our analysis: the phase transition in the optimal symmetric allocation is clearly discernible in most plots.
Expected recovery delay performance is consistent with our analysis: minimal spreading is optimal in most plots.
High-mobility scenario plots appear to be vertically scaled versions of the baseline scenario plots: speeding up of time.
Effect of increased connectivity appears less straightforward, e.g. the phase transition is not evident for recovery starting at time 0: is the data dissemination process impeded by greater interference?
In the high recovery probability regime, maximal spreading (with coding) can lead to a significant reduction in the required wait time.
Recovery start time appears to have a limited impact on how different allocations perform relative to each other: the most noticeable effect of starting recovery at time 0 is the reduced spread in performance.

Slide 24: Simulation Study: Mobility Traces: Key Observations
Number of wireless taxi cabs n = 100.
Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.
Plots show distinct “jumps” in wait times: reduced mobility of cabs at night.
Despite nonideal conditions, many of our previous observations still apply here.
In the high recovery probability regime, maximal spreading (with coding) can lead to a significant reduction in the required wait time.

Slide 25: Summary: Theoretical Analysis
The optimal symmetric allocations are not the same for both objectives:
  (i) Maximization of Recovery Probability: For any budget T, there is a phase transition from a regime where minimal spreading (uncoded replication) is optimal to a regime where maximal spreading (with coding) is optimal, as the access probability p (or the deadline d) increases.
  (ii) Minimization of Expected Recovery Delay: With the averaging over both regimes, minimal spreading (uncoded replication) turns out to be optimal whenever the budget T is an integer; the amount of spreading in the optimal symmetric allocation increases with the fractional part of T.
Performance gap between minimal spreading and maximal spreading can be quite substantial, e.g. for the required wait time in both the low and high recovery probability regimes.

Slide 26: Summary: Simulation Study
Results of the simulation study are consistent with our analytical findings.
Provides clear evidence that the choice of storage allocation can have a significant impact on the recovery delay performance.
Shows how mobility, connectivity, and recovery start time may affect performance.

Slide 27: Future Work
The simple contact model assumed here can be generalized to the case where a variable amount of data is transmitted during each contact between nodes.
Allow nonuniform contact rates $\lambda_i$ between the data collector and individual nodes.

Slide 28: Thank you!

Slide 29: Additional Simulation Results

Slide 30: Simulation Study: Random Waypoint (Budget T = 5)
Number of wireless mobile nodes n = 100. Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.

Slide 31: Simulation Study: Random Waypoint (Budget T = 10)
Number of wireless mobile nodes n = 100. Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.

Slide 32: Simulation Study: Random Waypoint (Budget T = 20)
Number of wireless mobile nodes n = 100. Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.

Slide 33: Simulation Study: Mobility Traces (Budget T = 5)
Number of wireless taxi cabs n = 100. Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.

Slide 34: Simulation Study: Mobility Traces (Budget T = 10)
Number of wireless taxi cabs n = 100. Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.

Slide 35: Simulation Study: Mobility Traces (Budget T = 20)
Number of wireless taxi cabs n = 100. Plots show how the required wait time varies with the desired recovery probability $P_S$; each line represents a specific choice of parameter w.