Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California Institute of Technology NetCod 2009 2009-06-16.

Slides:



Advertisements
Similar presentations
On Complexity, Sampling, and -Nets and -Samples. Range Spaces A range space is a pair, where is a ground set, it’s elements called points and is a family.
Advertisements

Occupancy Problems m balls being randomly assigned to one of n bins. (Independently and uniformly) The questions: - what is the maximum number of balls.
1 Decomposing Hypergraphs with Hypertrees Raphael Yuster University of Haifa - Oranim.
Lesson 08 Linear Programming
Approximation Algorithms Chapter 14: Rounding Applied to Set Cover.
MS&E 211 Quadratic Programming Ashish Goel. A simple quadratic program Minimize (x 1 ) 2 Subject to: -x 1 + x 2 ≥ 3 -x 1 – x 2 ≥ -2.
An Approximate Truthful Mechanism for Combinatorial Auctions An Internet Mathematics paper by Aaron Archer, Christos Papadimitriou, Kunal Talwar and Éva.
Gibbs sampler - simple properties It’s not hard to show that this MC chain is aperiodic. Often is reversible distribution. If in addition the chain is.
MIT and James Orlin © Game Theory 2-person 0-sum (or constant sum) game theory 2-person game theory (e.g., prisoner’s dilemma)
Optimization of thermal processes2007/2008 Optimization of thermal processes Maciej Marek Czestochowa University of Technology Institute of Thermal Machinery.
The Estimation Problem How would we select parameters in the limiting case where we had ALL the data? k → l  l’ k→ l’ Intuitively, the actual frequencies.
Fundamentals of Data Analysis Lecture 12 Methods of parametric estimation.
CWI PNA2, Reading Seminar, Presented by Yoni Nazarathy EURANDOM and the Dept. of Mechanical Engineering, TU/e Eindhoven September 17, 2009 An Assortment.
Bounds on Code Length Theorem: Let l ∗ 1, l ∗ 2,..., l ∗ m be optimal codeword lengths for a source distribution p and a D-ary alphabet, and let L ∗ be.
EE462 MLCV Lecture Introduction of Graphical Models Markov Random Fields Segmentation Tae-Kyun Kim 1.
1 By Gil Kalai Institute of Mathematics and Center for Rationality, Hebrew University, Jerusalem, Israel presented by: Yair Cymbalista.
1 University of Southern California Towards A Formalization Of Teamwork With Resource Constraints Praveen Paruchuri, Milind Tambe, Fernando Ordonez University.
1 Learning Entity Specific Models Stefan Niculescu Carnegie Mellon University November, 2003.
Finite Mathematics & Its Applications, 10/e by Goldstein/Schneider/SiegelCopyright © 2010 Pearson Education, Inc. 1 of 68 Chapter 9 The Theory of Games.
Linear Programming Econ Outline  Review the basic concepts of Linear Programming  Illustrate some problems which can be solved by linear programming.
1 Lecture 4 Maximal Flow Problems Set Covering Problems.
Active Learning for Probabilistic Models Lee Wee Sun Department of Computer Science National University of Singapore LARC-IMS Workshop.
Data Selection In Ad-Hoc Wireless Sensor Networks Olawoye Oyeyele 11/24/2003.
Randomized Algorithms - Treaps
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm 1 Chapter 1: The DP Algorithm To do:  sequential decision-making  state.
Section 15.8 The Binomial Distribution. A binomial distribution is a discrete distribution defined by two parameters: The number of trials, n The probability.
Distributed Storage Allocations for Optimal Delay Derek Leong 1, Alexandros G. Dimakis 2, Tracey Ho 1 1 California Institute of Technology 2 University.
1 Chapter 5 Flow Lines Types Issues in Design and Operation Models of Asynchronous Lines –Infinite or Finite Buffers Models of Synchronous (Indexing) Lines.
1 Sampling Distributions Lecture 9. 2 Background  We want to learn about the feature of a population (parameter)  In many situations, it is impossible.
Trust-Aware Optimal Crowdsourcing With Budget Constraint Xiangyang Liu 1, He He 2, and John S. Baras 1 1 Institute for Systems Research and Department.
Markov Decision Processes1 Definitions; Stationary policies; Value improvement algorithm, Policy improvement algorithm, and linear programming for discounted.
Edge-disjoint induced subgraphs with given minimum degree Raphael Yuster 2012.
Basic Concepts in Number Theory Background for Random Number Generation 1.For any pair of integers n and m, m  0, there exists a unique pair of integers.
Discrete Probability Distributions. Random Variable Random variable is a variable whose value is subject to variations due to chance. A random variable.
Notes 5IE 3121 Knapsack Model Intuitive idea: what is the most valuable collection of items that can be fit into a backpack?
King Saud University Women Students
Optimal Content Delivery with Network Coding Derek Leong, Tracey Ho California Institute of Technology Rebecca Cathey BAE Systems CISS 2009 March 19, 2009.
Chin-Yu Huang Department of Computer Science National Tsing Hua University Hsinchu, Taiwan Optimal Allocation of Testing-Resource Considering Cost, Reliability,
Erasure Coding for Real-Time Streaming Derek Leong and Tracey Ho California Institute of Technology Pasadena, California, USA ISIT
Percolation Processes Rajmohan Rajaraman Northeastern University, Boston May 2012 Chennai Network Optimization WorkshopPercolation Processes1.
Consistency An estimator is a consistent estimator of θ, if , i.e., if
On Coding for Real-Time Streaming under Packet Erasures Derek Leong *#, Asma Qureshi *, and Tracey Ho * * California Institute of Technology, Pasadena,
Statistics Chapter 6 / 7 Review. Random Variables and Their Probability Distributions Discrete random variables – can take on only a countable or finite.
Linear Program Set Cover. Given a universe U of n elements, a collection of subsets of U, S = {S 1,…, S k }, and a cost function c: S → Q +. Find a minimum.
Implicit Hitting Set Problems Richard M. Karp Erick Moreno Centeno DIMACS 20 th Anniversary.
Collection of Data Jim Bohan
Christoph Lenzen, STOC What is Load Balancing? work sharing low-congestion routing optimizing storage utilization hashing.
Stochastic Optimization
A Brief Maximum Entropy Tutorial Presenter: Davidson Date: 2009/02/04 Original Author: Adam Berger, 1996/07/05
Linear Programming Chap 2. The Geometry of LP  In the text, polyhedron is defined as P = { x  R n : Ax  b }. So some of our earlier results should.
Approximation Algorithms based on linear programming.
 This will explain how consumers allocate their income over many goods.  This looks at individual’s decision making when faced with limited income and.
Chapter5 Statistical and probabilistic concepts, Implementation to Insurance Subjects of the Unit 1.Counting 2.Probability concepts 3.Random Variables.
Unconstrained Submodular Maximization Moran Feldman The Open University of Israel Based On Maximizing Non-monotone Submodular Functions. Uriel Feige, Vahab.
Fundamentals of Data Analysis Lecture 11 Methods of parametric estimation.
Theory of Computational Complexity Probability and Computing Chapter Hikaru Inada Iwama and Ito lab M1.
EMGT 6412/MATH 6665 Mathematical Programming Spring 2016
EE465: Introduction to Digital Image Processing
Golrezaei, N. ; Molisch, A.F. ; Dimakis, A.G.
Symmetric Allocations for Distributed Storage
Polyhedron Here, we derive a representation of polyhedron and see the properties of the generators. We also see how to identify the generators. The results.
Polyhedron Here, we derive a representation of polyhedron and see the properties of the generators. We also see how to identify the generators. The results.
The Lagrange Multiplier Method
The Binomial Distribution
CUBE MATERIALIZATION E0 261 Jayant Haritsa
I.4 Polyhedral Theory (NW)
I.4 Polyhedral Theory.
Biointelligence Laboratory, Seoul National University
Bernoulli Trials Two Possible Outcomes Trials are independent.
Presentation transcript:

Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California Institute of Technology NetCod

Motivation

0.1

A B C

Success probability = × × 0 successful 0-subsets × × 2 successful 1-subsets × × 7 successful 2-subsets × × 9 successful 3-subsets × × 5 successful 4-subsets × × 1 successful 5-subsets = 0.99 A

Motivation Success probability = × × 0 successful 0-subsets × × 0 successful 1-subsets × × 0 successful 2-subsets × × 10 successful 3-subsets × × 5 successful 4-subsets × × 1 successful 5-subsets = B

Motivation Success probability = × × 0 successful 0-subsets × × 0 successful 1-subsets × × 6 successful 2-subsets × × 10 successful 3-subsets × × 5 successful 4-subsets × × 1 successful 5-subsets = C

MotivationA B C

0.1 accessmodel

Problem Description x How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? x Storage Allocation Access by the Data Collector Objective

Problem Description x How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? x Storage Allocation Source s has a data object of unit size It can use n storage nodes to store x 1, x 2, …, x n amount of data But faces an aggregate storage budget T, i.e. Access by the Data Collector Objective

Problem Description x How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? x Storage Allocation Access by the Data Collector Data collector t attempts to recover the data object by accessing a subset r of storage nodes It succeeds when the total amount of data accessed is at least the size of the data object, i.e. Objective

Problem Description x How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? x Storage Allocation Access by the Data Collector Objective We seek the optimal allocation that maximizes the probability of successful recovery

Problem Description x How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? x Difficulty Problem is nonconvex Large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise)

[1] Deterministic Allocation with Probabilistic Access Data collector accesses each storage node independently with constant probability p

Symmetric allocations can be suboptimal † Given n = 5 storage nodes, budget T = 12 / 5, and p = 0.9, the nonsymmetric allocation performs better than the optimal symmetric allocation Finding the optimal symmetric allocation is also nontrivial [1] Deterministic Allocation with Probabilistic Access † Originally from a discussion among R. Karp, R. Kleinberg, † C. Papadimitriou, E. Friedman, and others † at UC Berkeley

[2] Deterministic Allocation with Fixed Access Data collector accesses an r -subset of storage nodes, selected uniformly at random from the collection of all possible r -subsets, where r < n is a constant

[2] Deterministic Allocation with Fixed Access Equivalently, we can seek the allocation that minimizes the budget T, among all allocations that achieve a given probability of successful recovery

[2] Deterministic Allocation with Fixed Access Example: ( n, r ) = (6,2) Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?

[2] Deterministic Allocation with Fixed Access Question: What is the optimal symmetric allocation? For most choices of ( n, r, T ), the optimal allocation either concentrates the budget over a minimal number of nodes, or spreads it out maximally An example of an exception is ( n, r, T ) = (15, 3, 4.6) for which the optimal number of nodes to use, 9, is neither of the extremes

[2] Deterministic Allocation with Fixed Access For Probability-1 Recovery, the problem reduces to a simple LP Result 1: If we require all possible r -subsets to allow successful recovery, then we need a minimum budget of which corresponds to the allocation i.e. it is optimal to spread the budget maximally We can also bound the success probability above which this allocation is optimal

[3] Symmetric Probabilistic Allocation with Fixed Access Each storage node is used independently with constant probability s / n to store the same amount of data 1 / `, and the total storage used must be at most budget T in expectation

[3] Symmetric Probabilistic Allocation with Fixed Access Probability of successful recovery can be written as where “Bin( n, p )” denotes the binomial random variable with n trials and success probability p Reparameterizing in terms of budget T gives the success probability,, each nonempty node stores 1 / ` amount of data

[3] Symmetric Probabilistic Allocation with Fixed Access Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability xXXxx P ( r, T, ` ) > 0.9 for some `, the choice of x x x x x x x x x x ` = r is optimal, i.e. it is best to spread the budget maximally each nonempty node stores 1 / ` amount of data

[3] Symmetric Probabilistic Allocation with Fixed Access As we increase the budget T, we observe a sharp change in the optimal allocation For small budgets and therefore low success probabilities, it is optimal to store the data object in its entirety ( ` = 1) and hope the data collector accesses at least one of the nonempty nodes For large budgets and therefore high success probabilities, it is optimal to store only 1 / r amount of data in each node used ( ` = r ) and hope the data collector accesses r of them r = 5

[3] Symmetric Probabilistic Allocation with Fixed Access We conjecture that for any r and T, the optimal choice of ` that maximizes success probability P ( r, T, ` ) is either ` = 1 or ` = r r = 5 each nonempty node stores 1 / ` amount of data

[3] Symmetric Probabilistic Allocation with Fixed Access We conjecture that for any r and T, the optimal choice of ` that maximizes success probability P ( r, T, ` ) is either ` = 1 or ` = r each nonempty node stores 1 / ` amount of data r = 5 store less store more increasing budget per node

Summary & Future Work [1] Deterministic Allocation with Probabilistic Access Suboptimality of symmetric allocations [2] Deterministic Allocation with Fixed Access Optimal allocation for high probability recovery Extreme point solutions not necessarily optimal for symmetric allocations Is there always a symmetric optimal allocation? [3]iSymmetric Probabilistic Allocation with Fixed Access Optimal allocation in high-probability regime Is there a phase transition in optimal allocation with increasing budget?

Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California Institute of Technology NetCod