Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California Institute of Technology NetCod 2009 2009-06-16.


1 Distributed Storage Allocation Problems Derek Leong, Alexandros G. Dimakis, Tracey Ho California Institute of Technology NetCod 2009 2009-06-16

2 Motivation

3 0.1

4 A B C

5 Allocation A: Success probability = $0.9^0 \times 0.1^5 \times 0 + 0.9^1 \times 0.1^4 \times 2 + 0.9^2 \times 0.1^3 \times 7 + 0.9^3 \times 0.1^2 \times 9 + 0.9^4 \times 0.1^1 \times 5 + 0.9^5 \times 0.1^0 \times 1 = 0.99$, where the last factor of the k-th term is the number of successful k-subsets.

6 Motivation: Allocation B: Success probability = $0.9^0 \times 0.1^5 \times 0 + 0.9^1 \times 0.1^4 \times 0 + 0.9^2 \times 0.1^3 \times 0 + 0.9^3 \times 0.1^2 \times 10 + 0.9^4 \times 0.1^1 \times 5 + 0.9^5 \times 0.1^0 \times 1 = 0.99144$, where the last factor of the k-th term is the number of successful k-subsets.

7 Motivation: Allocation C: Success probability = $0.9^0 \times 0.1^5 \times 0 + 0.9^1 \times 0.1^4 \times 0 + 0.9^2 \times 0.1^3 \times 6 + 0.9^3 \times 0.1^2 \times 10 + 0.9^4 \times 0.1^1 \times 5 + 0.9^5 \times 0.1^0 \times 1 = 0.9963$, where the last factor of the k-th term is the number of successful k-subsets.

8 Motivation: Success probability by allocation: A = 0.99, B = 0.99144, C = 0.9963.
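The three success probabilities above can be reproduced by enumerating subsets of nodes. The following is a minimal sketch, assuming each node is accessed independently with probability 0.9 and that recovery succeeds when the accessed amounts sum to at least 1. The transcript does not preserve the pictured allocations, so `A`, `B`, and `C` below are hypothetical allocations chosen only because they yield the same successful-subset counts; `success_probability` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations


def success_probability(allocation, p):
    """Probability that the accessed nodes together hold at least 1 unit.

    Each node is accessed independently with probability p; the object has
    unit size, so a subset succeeds when its stored amounts sum to >= 1.
    """
    n = len(allocation)
    total = Fraction(0)
    for k in range(n + 1):
        # Count the k-subsets whose stored data suffices for recovery.
        successful = sum(
            1 for subset in combinations(allocation, k) if sum(subset) >= 1
        )
        total += p**k * (1 - p) ** (n - k) * successful
    return total


p = Fraction(9, 10)
# Hypothetical allocations (assumptions, not taken from the slides).
A = [Fraction(6, 5), Fraction(6, 5), 0, 0, 0]  # two nodes, each enough alone
B = [Fraction(12, 25)] * 5                     # five equal shares
C = [Fraction(3, 5)] * 4 + [0]                 # four equal shares

for name, alloc in (("A", A), ("B", B), ("C", C)):
    print(name, float(success_probability(alloc, p)))
```

Exact rational arithmetic is used so that borderline comparisons against 1 are not affected by floating-point rounding.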

9 Access model (figure; label 0.1)

10 Problem Description: How do we use storage nodes to store a data object reliably, subject to an aggregate storage budget? Parts: Storage Allocation; Access by the Data Collector; Objective.

11 Problem Description: Storage Allocation. Source s has a data object of unit size. It can use n storage nodes to store $x_1, x_2, \ldots, x_n$ amounts of data, but faces an aggregate storage budget T, i.e. $\sum_{i=1}^{n} x_i \le T$.

12 Problem Description: Access by the Data Collector. Data collector t attempts to recover the data object by accessing a subset r of storage nodes. It succeeds when the total amount of data accessed is at least the size of the data object, i.e. $\sum_{i \in r} x_i \ge 1$.

13 Problem Description: Objective. We seek the optimal allocation that maximizes the probability of successful recovery.

14 Problem Description: Difficulty. The problem is nonconvex. There is a large space of possible symmetric and nonsymmetric allocations (an allocation is symmetric if all its nonzero elements are equal, and nonsymmetric otherwise).

15 [1] Deterministic Allocation with Probabilistic Access: The data collector accesses each storage node independently with constant probability p.

16 [1] Deterministic Allocation with Probabilistic Access: Symmetric allocations can be suboptimal.† Given n = 5 storage nodes, budget T = 12/5, and p = 0.9, the nonsymmetric allocation performs better than the optimal symmetric allocation. Finding the optimal symmetric allocation is also nontrivial. († Originally from a discussion among R. Karp, R. Kleinberg, C. Papadimitriou, E. Friedman, and others at UC Berkeley.)

17 [2] Deterministic Allocation with Fixed Access: The data collector accesses an r-subset of storage nodes, selected uniformly at random from the collection of all possible r-subsets, where r < n is a constant.
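Under this fixed-access model, the success probability of a deterministic allocation is simply the fraction of r-subsets that hold at least one unit of data. A minimal sketch follows, assuming a unit-size data object; the function name and both example allocations are illustrative and not taken from the slides.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def fixed_access_success(allocation, r):
    """Fraction of r-subsets whose stored amounts sum to at least 1."""
    n = len(allocation)
    successful = sum(
        1 for subset in combinations(allocation, r) if sum(subset) >= 1
    )
    return Fraction(successful, comb(n, r))


# Hypothetical allocations for n = 6 nodes and r = 2 (assumptions).
spread = [Fraction(1, 2)] * 6      # budget 3: every pair holds exactly 1
concentrated = [1, 1, 0, 0, 0, 0]  # budget 2: a pair needs a full node

print(fixed_access_success(spread, 2))
print(fixed_access_success(concentrated, 2))
```

Brute-force enumeration is only practical for small n, but it makes the definition of success under fixed access concrete.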

18 [2] Deterministic Allocation with Fixed Access: Equivalently, we can seek the allocation that minimizes the budget T among all allocations that achieve a given probability of successful recovery.

19 [2] Deterministic Allocation with Fixed Access: Example: (n, r) = (6, 2). Question: For any budget T, is there always a symmetric allocation that produces the maximum success probability?

20 [2] Deterministic Allocation with Fixed Access: Question: What is the optimal symmetric allocation? For most choices of (n, r, T), the optimal symmetric allocation either concentrates the budget over a minimal number of nodes or spreads it out maximally. An exception is (n, r, T) = (15, 3, 4.6), for which the optimal number of nodes to use, 9, is neither of the extremes.
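The exception quoted above can be checked with a short search over symmetric allocations. This sketch assumes that a symmetric allocation over m nodes stores T/m in each of them, so recovery needs at least ceil(m/T) nonempty nodes among the r accessed; the count of nonempty nodes drawn is then hypergeometric. The function names are illustrative.

```python
from fractions import Fraction
from math import ceil, comb


def symmetric_success(n, r, T, m):
    """Success probability when m of n nodes each store T/m.

    The collector draws r nodes uniformly at random; recovery needs at
    least ceil(m / T) of the drawn nodes to be nonempty.
    """
    needed = ceil(Fraction(m) / T)
    hits = sum(
        comb(m, k) * comb(n - m, r - k) for k in range(needed, min(r, m) + 1)
    )
    return Fraction(hits, comb(n, r))


def best_symmetric(n, r, T):
    """Return (m, probability) maximizing success over m = 1..n."""
    return max(
        ((m, symmetric_success(n, r, T, m)) for m in range(1, n + 1)),
        key=lambda pair: pair[1],
    )


print(best_symmetric(15, 3, Fraction(23, 5)))  # T = 4.6
```

For (n, r, T) = (15, 3, 4.6) the search selects m = 9, ahead of both m = 4 (each node stores at least the whole object) and m = 13 (the widest spread that still allows recovery from 3 nodes).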

21 [2] Deterministic Allocation with Fixed Access: For probability-1 recovery, the problem reduces to a simple LP. Result 1: If we require all possible r-subsets to allow successful recovery, then we need a minimum budget of $T = n/r$, which corresponds to the allocation $x_1 = x_2 = \cdots = x_n = 1/r$, i.e. it is optimal to spread the budget maximally. We can also bound the success probability above which this allocation is optimal.
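One way to see where the minimum budget in Result 1 comes from is to write out the LP and add up its constraints. This is a sketch of a standard counting argument, not necessarily the authors' proof; the bold $\mathbf{r}$ for an r-subset is notation introduced here.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{i=1}^{n} x_i \\
\text{subject to}\quad & \sum_{i \in \mathbf{r}} x_i \ge 1
                         \quad \text{for every $r$-subset } \mathbf{r} \subseteq \{1,\dots,n\}, \\
                       & x_i \ge 0 \quad \text{for } i = 1,\dots,n.
\end{aligned}

% Each node lies in \binom{n-1}{r-1} of the \binom{n}{r} subsets, so summing
% all subset constraints gives
\binom{n-1}{r-1} \sum_{i=1}^{n} x_i \;\ge\; \binom{n}{r}
\quad\Longrightarrow\quad
\sum_{i=1}^{n} x_i \;\ge\; \frac{\binom{n}{r}}{\binom{n-1}{r-1}} \;=\; \frac{n}{r}.
```

The allocation $x_i = 1/r$ for all $i$ meets every constraint with equality and uses exactly $n/r$, so the bound is tight.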

22 [3] Symmetric Probabilistic Allocation with Fixed Access: Each storage node is used independently with constant probability s/n to store the same amount of data 1/ℓ, and the total storage used must be at most budget T in expectation.

23 [3] Symmetric Probabilistic Allocation with Fixed Access: The probability of successful recovery can be written in terms of a binomial random variable, where “Bin(n, p)” denotes the binomial random variable with n trials and success probability p. Reparameterizing in terms of budget T gives the success probability P(r, T, ℓ), where each nonempty node stores 1/ℓ amount of data.
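The formulas on this slide did not survive in the transcript. The following is a reconstruction sketch from the model as stated, not a quotation of the slide: if each node is nonempty with probability $s/n$ and stores $1/\ell$, the expected storage is $n \cdot (s/n) \cdot (1/\ell) = s/\ell$, so spending the full budget corresponds to $s = \ell T$. The data collector then needs at least $\ell$ nonempty nodes among the $r$ it accesses. Keeping $n$ explicit and capping the probability at 1 are assumptions made here.

```latex
\Pr[\text{success}]
  \;=\; \Pr\!\left[\,\mathrm{Bin}\!\left(r,\ \tfrac{s}{n}\right) \ge \ell\,\right],
\qquad
P(r, T, \ell)
  \;=\; \Pr\!\left[\,\mathrm{Bin}\!\left(r,\ \min\!\left\{\tfrac{\ell T}{n},\, 1\right\}\right) \ge \ell\,\right].
```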

24 [3] Symmetric Probabilistic Allocation with Fixed Access: Result 2: For any r ≥ 2, and at any budget T large enough to support a success probability P(r, T, ℓ) > 0.9 for some ℓ, the choice of ℓ = r is optimal, i.e. it is best to spread the budget maximally. (Each nonempty node stores 1/ℓ amount of data.)

25 [3] Symmetric Probabilistic Allocation with Fixed Access: As we increase the budget T, we observe a sharp change in the optimal allocation. For small budgets, and therefore low success probabilities, it is optimal to store the data object in its entirety (ℓ = 1) and hope the data collector accesses at least one of the nonempty nodes. For large budgets, and therefore high success probabilities, it is optimal to store only 1/r amount of data in each node used (ℓ = r) and hope the data collector accesses r of them. (Figure: r = 5.)
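The change in the optimal ℓ with increasing budget can be explored numerically. This is a minimal sketch under an assumed form of the success probability, Pr[Bin(r, q) ≥ ℓ] with q = min(ℓT/n, 1), which follows from the model description but is not preserved in the transcript. The value n = 20 and the budgets swept are arbitrary illustrative choices, and the function names are hypothetical.

```python
from math import comb


def success_probability(r, T, ell, n):
    """Pr[Bin(r, q) >= ell] with q = min(ell * T / n, 1) (assumed form)."""
    q = min(ell * T / n, 1.0)
    return sum(
        comb(r, k) * q**k * (1 - q) ** (r - k) for k in range(ell, r + 1)
    )


def best_ell(r, T, n):
    """Return the ell in 1..r with the highest success probability."""
    return max(
        range(1, r + 1), key=lambda ell: success_probability(r, T, ell, n)
    )


n, r = 20, 5  # n = 20 is an arbitrary illustrative choice
for T in (0.5, 1.0, 2.0, 3.0, 3.95):
    print(T, best_ell(r, T, n))
```

A sweep over a finer grid of budgets would show where the preferred ℓ switches for a given n and r.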

26 [3] Symmetric Probabilistic Allocation with Fixed Access: We conjecture that for any r and T, the optimal choice of ℓ that maximizes the success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r. (Figure: r = 5; each nonempty node stores 1/ℓ amount of data.)

27 [3] Symmetric Probabilistic Allocation with Fixed Access: We conjecture that for any r and T, the optimal choice of ℓ that maximizes the success probability P(r, T, ℓ) is either ℓ = 1 or ℓ = r. (Figure: r = 5; each nonempty node stores 1/ℓ amount of data; annotations "store less", "store more", and "increasing budget per node".)

28 Summary & Future Work: [1] Deterministic Allocation with Probabilistic Access: suboptimality of symmetric allocations. [2] Deterministic Allocation with Fixed Access: optimal allocation for high-probability recovery; extreme point solutions not necessarily optimal for symmetric allocations; is there always a symmetric optimal allocation? [3] Symmetric Probabilistic Allocation with Fixed Access: optimal allocation in the high-probability regime; is there a phase transition in the optimal allocation with increasing budget?
