Glacier: Highly durable, decentralized storage despite massive correlated failures
Andreas Haeberlen, Alan Mislove, Peter Druschel
Rice University, Houston, TX
2nd Symposium on Networked Systems Design & Implementation (NSDI), Boston, MA, May 2-4, 2005
© 2005 Andreas Haeberlen, Rice University
Slide 2: Introduction
- Many distributed applications require storage
- Cooperative storage: aggregate the storage of participating nodes in a structured overlay network
- Advantages: resilient, highly scalable
- Examples: Farsite, PAST, OceanStore
Slide 3: Motivation
- Common assumption: high node diversity, hence failure independence. Unrealistic!
- The node population may have low diversity (e.g., in operating systems)
- Worms can cause large-scale correlated Byzantine failures
- Reactive systems are too slow to prevent data loss
Slide 4: Related Work
- Phoenix and OceanStore use introspection: build a failure model, then store data on nodes with low correlation
- Limitations:
  - The model must reflect all possible correlations
  - Even small inaccuracies may lead to data loss
  - Users have an incentive to report incorrect data
Slide 5: Our Approach: Glacier
- Create massive redundancy to ensure that data survives any correlated failure with high probability
- Assumption: the magnitude of the failure can be bounded by a fraction f_max of the nodes
- Challenges:
  - Minimize storage and bandwidth requirements
  - Withstand attacks and Byzantine failures
Slide 6: Glacier: Insertion
When a new object is inserted:
1. Apply an erasure code to the object
2. Attach a manifest containing the hashes of all fragments
3. Send each fragment to a different node
- There is no remote delete operation, but object lifetimes can be limited: storage is lease-based, and unused storage is reclaimed
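The three insertion steps can be sketched in a few lines of Python. This is a minimal illustration, not Glacier's code: `erasure_encode` is a stand-in (plain replication) for a real (n, r) erasure code, and the fragment count matches the paper's example configuration.

```python
import hashlib

FRAGMENTS = 48   # n: fragments created per object (paper's example config)
NEEDED = 5       # r: any r fragments suffice to reconstruct the object

def erasure_encode(data: bytes, n: int = FRAGMENTS) -> list[bytes]:
    """Stand-in for a real (n, r) erasure code: plain replication, used
    here only so the manifest logic below has fragments to work with."""
    return [data for _ in range(n)]

def make_manifest(fragments: list[bytes]) -> list[str]:
    """One cryptographic hash per fragment; a node can verify any
    fragment it receives or recovers against the manifest."""
    return [hashlib.sha1(frag).hexdigest() for frag in fragments]

obj = b"example user object"
frags = erasure_encode(obj)
manifest = make_manifest(frags)

# Each fragment would now be sent, together with the manifest, to a
# different overlay node; a node accepts it only if the hash matches.
assert hashlib.sha1(frags[0]).hexdigest() == manifest[0]
```

Because the manifest binds every fragment to a hash, nodes never have to trust the sender: a corrupted or forged fragment simply fails verification.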
Slide 7: Glacier: Maintenance
- Nodes whose identifiers are close in the overlay store similar sets of fragments
- Periodic maintenance: ask a peer node for its list of fragments, compare it with the local list, and recover any missing fragments
- Fragments remain on their nodes during offline periods
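The list-comparison step of a maintenance round reduces to a set difference; a minimal sketch, with hypothetical fragment keys:

```python
def missing_fragments(local: set[str], peer: set[str]) -> set[str]:
    """Fragment keys the peer advertises but this node lacks; because
    nearby nodes store fragments of the same objects, the peer's list
    approximates the set this node should hold."""
    return peer - local

# One maintenance round between two neighboring nodes
local_frags = {"frag:a", "frag:b", "frag:d"}
peer_frags = {"frag:a", "frag:b", "frag:c", "frag:d"}
to_recover = missing_fragments(local_frags, peer_frags)
print(sorted(to_recover))  # ['frag:c'] would then be fetched and verified
```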
Slide 8: Glacier: Recovery
- During a failure, some fragments are damaged or lost, and communication may not be possible
- Unaffected nodes do not take any special action:
  - Failed nodes are eventually repaired
  - Maintenance gradually restores the lost fragments
- (Figure: timeline of an insertion, a correlated failure at T_fail, and an offline period)
Slide 9: Glacier: Durability
- Example configuration: 48 fragments, any 5 sufficient for recovery
- Bad news: storage overhead of 9.6x
- Good news: a single object survives a 60% correlated failure with high probability
- (Table: durability as a function of f_max, erasure code, fragment count, and storage; more storage buys higher durability)
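Under the simplifying assumption that a correlated failure of fraction f hits each fragment-holding node independently with probability f, the quoted durability is a binomial tail sum; `object_durability` is an illustrative helper, not Glacier's actual analysis code:

```python
from math import comb

def object_durability(n: int, r: int, f: float) -> float:
    """P(object survives) when each of its n fragment holders fails
    independently with probability f and any r fragments suffice.
    The object is lost only if fewer than r fragments survive."""
    p_loss = sum(comb(n, k) * (1 - f) ** k * f ** (n - k) for k in range(r))
    return 1.0 - p_loss

# Paper's example: 48 fragments, any 5 sufficient, 60% correlated failure
p = object_durability(n=48, r=5, f=0.6)
print(f"P(survival) = {p:.9f}")   # roughly six nines for a single object
```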
Slide 10: Aggregation
- If objects are small: a huge number of fragments, and high overhead for storage and management
- Solution: aggregate objects before storing them in Glacier
- Challenges: the environment is untrusted, so aggregates must be self-authenticating
- (Figure: an aggregation layer inserted between the application and Glacier)
Slide 11: Aggregation: Links
- The mapping from objects to aggregates is crucial: it needs durability and authentication
- Solution: link aggregates together, forming a DAG
  - The mapping can be recovered by traversing the DAG
  - The DAG forms a hash tree, so it is easy to authenticate
  - The top-level pointer is kept in Glacier itself
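The link structure can be sketched as a content-addressed DAG in which each aggregate records the hashes of its children; `insert_aggregate`, `recover`, and the in-memory `store` dict below are hypothetical stand-ins, not Glacier's API:

```python
import hashlib
import json

store: dict[str, dict] = {}   # in-memory stand-in for storage in Glacier

def agg_hash(aggregate: dict) -> str:
    """Content hash over a canonical serialization of the aggregate."""
    blob = json.dumps(aggregate, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()

def insert_aggregate(objects: list[str], child_hashes: list[str]) -> str:
    """Bundle objects plus links (hashes) to earlier aggregates."""
    agg = {"objects": objects, "children": sorted(child_hashes)}
    h = agg_hash(agg)
    store[h] = agg
    return h

def recover(top_hash: str) -> list[str]:
    """Traverse the DAG from the top-level pointer, verifying each
    aggregate against the hash recorded by its parent (or the root)."""
    agg = store[top_hash]
    assert agg_hash(agg) == top_hash   # the DAG doubles as a hash tree
    objs = list(agg["objects"])
    for child in agg["children"]:
        objs.extend(recover(child))
    return objs

a = insert_aggregate(["msg1", "msg2"], [])
b = insert_aggregate(["msg3"], [a])           # b links to a
assert sorted(recover(b)) == ["msg1", "msg2", "msg3"]
```

Because every link is a hash, authenticating the single top-level pointer transitively authenticates the entire object-to-aggregate mapping.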
Slide 12: Evaluation
- Two sets of experiments:
  - Trace-driven simulations (scalability, churn, ...)
  - An actual deployment: ePOST
- ePOST: a cooperative, serverless email system
  - In production use: initially 17 users, 20 nodes
  - Based on FreePastry, PAST, Scribe, and POST
  - Glacier added for durability
- Glacier configuration in ePOST: 48 fragments, 0.2 encoding rate, f_max = 0.6
- Days of practical experience (incl. some failures)
Slide 13: Evaluation: Storage
- Inherent storage overhead: 48/5 = 9.6x
- For 1.3 GB of data, the actual storage overhead on disk was about 12.6x
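The arithmetic can be checked in a few lines; note that the absolute on-disk figure printed at the end is derived here from the 12.6x overhead and the 1.3 GB of data, not taken verbatim from the slide:

```python
n, r = 48, 5                 # fragments per object / fragments needed
inherent = n / r             # each fragment is 1/r of the object's size
print(inherent)              # 9.6

data_gb = 1.3                # user data stored in the ePOST deployment
actual_overhead = 12.6       # measured overhead, incl. manifests and metadata
print(round(data_gb * actual_overhead, 2))   # 16.38 GB implied on disk
```

The gap between 9.6x and 12.6x is the cost of manifests and other per-fragment metadata on top of the erasure-coded payload.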
Slide 14: Evaluation: Network load
- During stable periods, traffic is comparable to PAST
- In the ePOST experiment, a misconfiguration caused frequent traffic spikes: long offline periods were mistaken for permanent failures
Slide 15: Evaluation: Recovery
- Experiment: created a 'clone' of the ePOST ring with only 13 of the 31 nodes (a 58% failure!)
- Started the recovery process on a freshly installed node:
  - The user entered an address and the date of last use
  - Glacier located the head of the aggregate tree and recovered it
  - The system was again ready for use; no data was lost
Slide 16: Conclusions
- Large-scale correlated failures are a realistic threat to distributed storage systems
- Glacier provides hard durability guarantees with minimal assumptions about the failure model
- Glacier transforms abundant but unreliable disk space into reliable storage, at low bandwidth cost
- Thank you!
Slide 17: Glacier is available!
- Download:
- Serverless and secure
- Easy to set up
- Uses Glacier for durability