Replica Placement Strategy for Wide-Area Storage Systems
Byung-Gon Chun and Hakim Weatherspoon
RADS Final Presentation, December 9, 2004


2 Environment
Store large quantities of data persistently and availably.
Storage strategy:
  – Redundancy: duplicate data to protect against data loss
  – Place data throughout the wide area for availability and durability; avoid correlated failures
  – Continuously repair lost redundancy as needed; detect permanent node failures and trigger data recovery

3 Assumptions
Data is maintained on nodes, in the wide area, and at well-maintained sites.
Sites contribute resources:
  – Nodes (storage, CPU)
  – Network bandwidth
Nodes collectively maintain data:
  – Adaptive: constant change; self-organizing, self-maintaining
Costs:
  – Data recovery: the process of maintaining data availability
  – Limit the wide-area bandwidth used to maintain data

4 Challenge
Avoid correlated failures/downtime through careful data placement:
  – Minimize the cost of resources used to maintain data (storage, bandwidth)
  – Maximize data availability

5 Outline
Analysis of correlated failures
  – Show that correlated failures exist and are significant
Effects of a common subnet (administrative area, geographic location, etc.)
  – Pick a threshold and extra redundancy
Effects of extra redundancy
  – Vary extra redundancy
  – Compare random, random with constraints, and oracle placement
  – Show that the margin between oracle and random placement is small

6 Analysis of PlanetLab
Trace characteristics
Trace-driven simulation: model maintaining data on PlanetLab
Create trace using all-pairs ping*
  – Collected from February 16, 2003 to October 6, 2004
Measure:
  – Correlated failures vs. time
  – Probability of k nodes down simultaneously
  – {5th percentile, median} number of available replicas vs. time
  – Cumulative number of triggered data recoveries vs. time
*Jeremy Stribling, http://infospect.planet-lab.org/pings
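One of the measures above is the probability that k nodes are down simultaneously. That measure can be checked against what independent failures would predict. The Python sketch below is an illustration, not the authors' analysis code. It assumes the all-pairs ping trace has already been reduced to a per-slot up/down boolean for each node, which is a hypothetical representation. It compares the empirical distribution with a Poisson-binomial prediction built from each node's own downtime fraction. A large excess of slots with many nodes down is one signature of correlated failure.

```python
from collections import Counter

def down_count_distribution(trace):
    """trace: list of time slots; each slot is a list of booleans (True = node up).
    Returns {k: fraction of slots in which exactly k nodes were down}."""
    counts = Counter(sum(1 for up in slot if not up) for slot in trace)
    total = len(trace)
    return {k: c / total for k, c in sorted(counts.items())}

def independent_prediction(trace, k):
    """P(exactly k nodes down) if every node failed independently with its own
    empirical downtime fraction (Poisson-binomial, computed by dynamic programming)."""
    num_nodes = len(trace[0])
    p_down = [sum(1 for slot in trace if not slot[i]) / len(trace)
              for i in range(num_nodes)]
    dist = [1.0]  # dist[j] = P(j nodes down among the nodes processed so far)
    for p in p_down:
        new = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            new[j] += pj * (1 - p)   # this node is up
            new[j + 1] += pj * p     # this node is down
        dist = new
    return dist[k] if k < len(dist) else 0.0

# Toy trace (hypothetical): nodes 0 and 1 always fail together.
trace = [
    [True, True, True, True],
    [False, False, True, True],
    [True, True, True, True],
    [False, False, True, True],
]
print(down_count_distribution(trace))    # empirical
print(independent_prediction(trace, 2))  # independence model
```

On this toy trace, exactly two nodes are down in 50% of the slots. Independent failures with the same per-node downtime would predict 25%.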

7 Analysis of PlanetLab II: Correlated Failures

8 Analysis I: Node Characteristics

9 Analysis II: Correlated Failures

10 Correlated Failures

11 Correlated Failures (machines with downtime ≤ 1000 slots)

12 Availability Trace

13 Replica Placement Strategies
Random
RandomSite
  – Avoid placing multiple replicas at the same site
  – A site in PlanetLab is identified by its 2-byte (/16) IP address prefix
RandomBlacklist
  – Avoid using blacklisted machines: the top k machines with the longest downtime
RandomSiteBlacklist
  – Combine RandomSite and RandomBlacklist
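The four strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. Nodes are IPv4 address strings, and a site is the first two bytes of the address, as on the slide. The helper names (`site_of`, `make_blacklist`, `place`) are hypothetical. Shuffling the candidates and then greedily skipping sites that are already used is one simple way to enforce the site constraint. The talk does not say how its own implementation sampled.

```python
import random

def site_of(ip):
    """Site = first two bytes (the /16 prefix) of an IPv4 address string."""
    return ".".join(ip.split(".")[:2])

def make_blacklist(downtime, k):
    """downtime: {ip: slots down}. Blacklist the k nodes with the most downtime."""
    return frozenset(sorted(downtime, key=downtime.get, reverse=True)[:k])

def place(nodes, n, rng, distinct_sites=False, blacklist=frozenset()):
    """Pick n replica hosts at random from `nodes`.

    Random:              distinct_sites=False, empty blacklist
    RandomSite:          distinct_sites=True
    RandomBlacklist:     blacklist=make_blacklist(...)
    RandomSiteBlacklist: both constraints together
    """
    candidates = [ip for ip in nodes if ip not in blacklist]
    rng.shuffle(candidates)
    chosen, used_sites = [], set()
    for ip in candidates:
        if distinct_sites and site_of(ip) in used_sites:
            continue  # at most one replica per /16 site
        chosen.append(ip)
        used_sites.add(site_of(ip))
        if len(chosen) == n:
            return chosen
    raise ValueError("not enough eligible nodes for n replicas")

nodes = ["10.1.0.1", "10.1.0.2", "10.2.0.1", "10.3.0.1"]
print(place(nodes, 3, random.Random(0), distinct_sites=True))
```

With the site constraint, the three replicas in this example must land on three different /16 prefixes. Which of the two 10.1 hosts is chosen depends on the random seed.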

14 Comparison of Simple Strategies (m=1, th=9, n=14, |blacklist|=35)
Strategy              # of repairs   Improvement (%)
Random                9075           –
RandomSite            8581           5.44
RandomBlacklist       8691           4.23
RandomSiteBlacklist   8160           10.08
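The improvement column is the percentage reduction in repairs relative to Random. A short Python check using this slide's four repair counts (9075, 8581, 8691, 8160) reproduces the reported figures.

```python
# Repair counts from the comparison of simple strategies.
repairs = {
    "Random": 9075,
    "RandomSite": 8581,
    "RandomBlacklist": 8691,
    "RandomSiteBlacklist": 8160,
}

def improvement(baseline, value):
    """Percentage reduction in repairs relative to the baseline strategy."""
    return 100.0 * (baseline - value) / baseline

for name, count in repairs.items():
    if name != "Random":
        print(f"{name}: {improvement(repairs['Random'], count):.2f}%")
```

This prints 5.44%, 4.23%, and 10.08% for RandomSite, RandomBlacklist, and RandomSiteBlacklist respectively.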

15 Simulation Setup
Placement algorithm
  – Random vs. oracle
  – Oracle strategies: Max-Lifetime-Availability; Min-Max-TTR, Min-Sum-TTR, Min-Mean-TTR
Simulation parameters
  – Replication m = 1, threshold th = 9, total replicas n = 15
  – Initial repository size: 2 TB
  – Write rate: 1 Kbps per node and 10 Kbps per node
  – 300 storage nodes
  – The system grows by 3 TB and 30 TB per year, respectively
Metrics
  – Number of available nodes
  – Number of data repairs
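The "number of data repairs" metric comes from threshold-triggered repair, which can be sketched as a small trace-driven simulation. This Python sketch rests on assumptions the slides do not spell out:
- With m = 1, every replica is a full copy.
- A repair fires whenever fewer than th replicas sit on up nodes.
- A repair abandons replicas on down nodes and creates new ones on up nodes until n are available, or until no spare up nodes remain.

The oracle shown is a hypothetical reading of Max-Lifetime-Availability: it prefers the nodes that stay up in the most future slots. The TTR-based oracles are not modeled. The names `simulate` and `oracle_max_availability` are illustrative.

```python
import random

def simulate(trace, n=15, th=9, choose=None, seed=0):
    """Count threshold-triggered repairs for one object.

    trace: list of sets; trace[t] holds the ids of nodes up in slot t.
    choose(candidates, k, t): returns k node ids to host new replicas.
    """
    rng = random.Random(seed)
    if choose is None:
        # Random placement: uniform sample from the candidate nodes.
        choose = lambda cands, k, t: rng.sample(sorted(cands), k)
    replicas = set(choose(trace[0], min(n, len(trace[0])), 0))
    repairs = 0
    for t, up in enumerate(trace):
        alive = replicas & up
        if len(alive) < th:
            repairs += 1
            # Assumption: replicas on down nodes are abandoned, and new replicas
            # are placed on up nodes until n are available (if enough exist).
            spare = up - alive
            k = min(n - len(alive), len(spare))
            replicas = alive | set(choose(spare, k, t))
    return repairs

def oracle_max_availability(trace):
    """Hypothetical oracle: prefer candidates that are up in the most future slots."""
    def choose(cands, k, t):
        future = trace[t:]
        score = {node: sum(node in up for up in future) for node in cands}
        return sorted(cands, key=lambda node: (-score[node], node))[:k]
    return choose

# Toy trace (hypothetical): nodes 0-9 are always up; nodes 10-19 are up only in slot 0.
trace = [set(range(20))] + [set(range(10)) for _ in range(9)]
print(simulate(trace, n=5, th=3, choose=oracle_max_availability(trace)))
print(simulate(trace, n=5, th=3))
```

On this toy trace, the oracle never repairs. Random placement repairs at most once, because after the first repair every replica sits on an always-up node.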

16 Comparison of Simple Strategies (m=1, th=9)

17 Results: Random Placement (1 Kbps)

18 Results: Oracle Max-Lifetime-Availability (1 Kbps)

19 Results: Breakdown of Random (1 Kbps)

20 Results: Random (10 Kbps)

21 Results: Breakdown of Random (10 Kbps)

22 Conclusion
Correlated downtimes do exist.
Random placement is sufficient:
  – A minimum data availability threshold and extra redundancy are sufficient to absorb most correlation.

