Dynamo: Highly Available Key-Value Store
Dennis Kafura – CS5204 – Operating Systems

Context
Core e-commerce services at Amazon need scalable and reliable storage for massive amounts of data:
- hundreds of services
- hundreds of thousands of concurrent sessions on key services at peak times
Size and scalability require a storage architecture that is:
- highly decentralized
- high in component count
- built from commodity hardware
A high component count creates reliability problems; Dynamo "treats failure handling as the normal case" and addresses these problems through replication.
Replication, in turn, raises issues of:
- consistency (replicas differ after failure)
- performance: when to enforce consistency (on read or on write) and who enforces it (the client or the storage system)

System Elements
Dynamo maintains the state of services with:
- high reliability requirements
- latency-sensitive performance
- a controllable tradeoff between consistency and performance
It is used only internally, so it:
- can leverage the characteristics of its services and workloads
- runs in a non-hostile environment (no security requirements)
Simple key-value interface:
- applications do not require richer (e.g., database) semantics or a hierarchical name space
- a key is a unique identifier for a data item; a value is a binary object (blob)
- no operations span multiple data items
Dynamo adopts a weaker model of consistency (eventual consistency) in favor of higher availability.
Service level agreements (SLAs):
- stated at the 99.9th percentile
- key factor: service latency at a given request rate
- example: a response time of 300 ms for 99.9% of requests at a peak client load of 500 requests per second
- state-management (storage) efficiency is a key factor in meeting SLAs

Design Considerations
Consistency vs. availability:
- strict consistency means data is unavailable when one of its replicas has failed
- to improve availability, use a weaker form of consistency (eventual consistency) and allow optimistic updates (changes propagate in the background)
- this can lead to conflicting changes, which must be detected and resolved
Conflicts:
- Dynamo applications require "always writeable" storage
- conflict detection/resolution is therefore performed on reads
Other factors:
- incremental scalability
- symmetry/decentralization (P2P organization and control)
- heterogeneity (not all servers are the same)

Design Overview
(figure only: summary of the techniques covered on the following slides)

Partitioning
Interface:
- get(key): returns a context and either a single object or a list of conflicting objects
- put(key, context, object): the context comes from a previous read
Object placement/replication:
- an MD5 hash of the key yields a 128-bit identifier
- consistent hashing maps that identifier onto a ring of nodes; the N nodes encountered clockwise from the key's position form its preference list (see the sketch below)
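
To make the placement concrete, here is a minimal consistent-hashing sketch in Python. It is illustrative only: the names (Ring, preference_list, the node ids) are assumptions rather than code from the paper, and real Dynamo assigns multiple tokens per node.

```python
import bisect
import hashlib

def md5_ring_position(value: str) -> int:
    """Map a string onto the 128-bit ring via MD5, as Dynamo does for keys."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = n_replicas  # N in the paper
        # One ring position per node here; Dynamo uses many tokens per node.
        self.positions = sorted((md5_ring_position(n), n) for n in nodes)

    def preference_list(self, key: str):
        """Return the N distinct nodes found walking clockwise from the key's hash."""
        start = bisect.bisect(self.positions, (md5_ring_position(key),))
        nodes = []
        for i in range(len(self.positions)):
            node = self.positions[(start + i) % len(self.positions)][1]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == self.n_replicas:
                break
        return nodes

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("cart:12345"))  # the three replica holders for this key
```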

Versioning
In failure-free operation, a put is propagated to the replica nodes in the key's preference list. What should the system do when some of those replicas cannot be reached?

Versioning
Object content is treated as immutable; an update operation creates a new version.

Versioning
Versioning can lead to inconsistency due to network partitioning: puts accepted on either side of a partition produce versions the other side has not seen.

Versioning
Versioning can also lead to inconsistency due to concurrent updates: two clients can put different values (a and b) against the same context, producing sibling versions.

Object Resolution
- Uses vector clocks to track version ancestry.
- Conflicting versions are passed to the application as the output of a get operation.
- The application resolves the conflict and puts a new (reconciled) version.
- Inconsistent versions are rare: 99.94% of get operations saw exactly one version.
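
A minimal vector-clock comparison may help; this is a sketch under the usual definition of vector-clock dominance, and the function names are mine, not the paper's.

```python
def descends(a: dict, b: dict) -> bool:
    """True if the version with clock `a` supersedes (or equals) clock `b`:
    `a` is at least as large as `b` on every node's counter."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def conflicting(a: dict, b: dict) -> bool:
    """Neither clock descends from the other: the updates were concurrent."""
    return not descends(a, b) and not descends(b, a)

# Two updates coordinated by different nodes from the same ancestor conflict,
# so get() would return both versions for the application to reconcile:
v1 = {"node-a": 2, "node-b": 1}  # update written via node-a
v2 = {"node-a": 1, "node-b": 2}  # update written via node-b
print(conflicting(v1, v2))       # True
```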

Handling get/put Operations
Operations are handled by a coordinator, the first among the top N nodes in the preference list. The coordinator is located either:
- via a load balancer (no Dynamo-specific code is needed in the application, but this may add an extra level of indirection), or
- by a direct call to the coordinator (via a Dynamo-specific client library)
Quorum voting (see the sketch below):
- R nodes must respond to a get operation
- W nodes must acknowledge a put operation
- R + W > N
- (N, R, W) can be chosen to achieve the desired tradeoff; a common configuration is (3, 2, 2)
"Sloppy quorum":
- uses the top N' healthy nodes in the preference list; the coordinator is the first in this group
- a replica sent to a stand-in node carries a "hint" naming the (unavailable) node that should hold it
- hinted replicas are stored by the available node and forwarded when the original node recovers
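
The quorum arithmetic is simple enough to check directly. The sketch below (names are mine) shows why R + W > N matters: any read set and any write set must then overlap in at least one node, so a read contacts at least one replica that saw the latest completed write.

```python
def overlap(n: int, r: int, w: int) -> int:
    """Guaranteed overlap between any read set of size r and any write set of
    size w drawn from n replicas (pigeonhole); quorum intersection needs > 0."""
    return r + w - n

for n, r, w in [(3, 2, 2), (3, 1, 1)]:
    ok = overlap(n, r, w) > 0
    print(f"(N={n}, R={r}, W={w}): overlap={max(overlap(n, r, w), 0)}, quorum={ok}")
# (N=3, R=2, W=2): overlap=1, quorum=True   <- the common Dynamo configuration
# (N=3, R=1, W=1): overlap=0, quorum=False  <- reads may miss the latest write
```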

Replica Synchronization
- Detection of inconsistent replicas is accelerated using Merkle trees.
- Each node maintains a separate tree for each key range it hosts.
- Drawback: maintaining the Merkle trees adds overhead.
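
A minimal Merkle-tree sketch (the hash function and layout are my choices; Dynamo keeps one tree per key range and walks subtrees to localize differences):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash the leaves, then combine pairwise level by level up to the root.
    Two replicas whose roots match hold identical data for the key range;
    if the roots differ, they recurse only into the differing subtrees."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:            # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

replica_a = [b"k1=v1", b"k2=v2", b"k3=v3", b"k4=v4"]
replica_b = [b"k1=v1", b"k2=STALE", b"k3=v3", b"k4=v4"]
print(merkle_root(replica_a) != merkle_root(replica_b))  # True: replicas diverged
```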

Ring Membership
- Nodes are explicitly added to and removed from the ring.
- Membership, partitioning, and placement information propagates via periodic pairwise exchanges (a gossip protocol).
- Existing nodes transfer key ranges to a newly added node, or receive key ranges from exiting nodes.
- Each node eventually knows the key ranges of its peers and can forward requests to them.
- Some "seed" nodes are well known to every node.
- Node failures are detected by lack of responsiveness; recovery is detected by periodic retry.
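
A toy gossip reconciliation round (my own formulation of the idea, not Dynamo's wire protocol): each node periodically merges its membership view with one random peer, keeping the higher-versioned entry per node, so views converge without a central master.

```python
def reconcile(view_a: dict, view_b: dict) -> dict:
    """Merge two membership views of the form {node_id: (version, status)},
    keeping the entry with the higher version for each node."""
    merged = dict(view_a)
    for node, (version, status) in view_b.items():
        if node not in merged or merged[node][0] < version:
            merged[node] = (version, status)
    return merged

a = {"n1": (3, "up"), "n2": (1, "up")}
b = {"n2": (2, "down"), "n3": (1, "up")}
merged = reconcile(a, b)  # both peers adopt the merged view after the exchange
print(merged)             # {'n1': (3, 'up'), 'n2': (2, 'down'), 'n3': (1, 'up')}
```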

Partition/Placement Strategies

Strategy | Placement                | Partition
---------|--------------------------|----------------------------------------
1        | T random tokens per node | consecutive tokens delimit a partition
2        | T random tokens per node | Q equal-sized partitions
3        | Q/S tokens per node      | Q equal-sized partitions

(S = number of nodes)
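
To illustrate strategy 3, here is a sketch of Q fixed, equal-sized partitions spread across S nodes, Q/S each. The round-robin assignment is a simplification of mine; the paper's actual token assignment differs.

```python
# Strategy 3 sketch: the ring is cut into Q equal partitions up front, and each
# of the S nodes owns Q/S of them. Membership changes move whole partitions,
# which also line up naturally with files for transfer and archiving.
Q = 8  # fixed number of equal-sized partitions
S = 4  # current number of nodes

nodes = [f"node-{i}" for i in range(S)]
assignment = {p: nodes[p % S] for p in range(Q)}  # round-robin: Q/S per node
for partition, owner in sorted(assignment.items()):
    print(f"partition {partition} -> {owner}")
```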

Strategy Performance Factors
Strategy 1:
- Bootstrapping a new node is lengthy: the node must acquire its key ranges from other nodes, which scan and transmit those ranges only as background activities; this has taken a full day during peak periods.
- Numerous nodes may have to adjust their Merkle trees when a node joins or leaves the system.
- The archival process is difficult: key ranges may be in transit, and there is no obvious synchronization/checkpointing structure.

Strategy Performance Factors
Strategy 2:
- decouples partitioning and placement
- allows the placement scheme to be changed at run time
Strategy 3:
- also decouples partitioning and placement
- faster bootstrapping/recovery and easier archiving, because key ranges can be segregated into different files that can be transferred or archived separately

Partition Strategies: Performance
- The strategies have different tuning parameters, so a direct comparison is not meaningful.
- For a fair comparison, evaluate the skew in their load distributions for a fixed amount of space used to maintain membership information.
- By this measure, strategy 3 is superior.

Client- vs. Server-Side Coordination
- Any node can coordinate read requests; write requests must be handled by a coordinator for the key.
- The coordination state machine can run in a load-balancing server or be incorporated into the client.
- Client-driven coordination has lower latency because it avoids the extra network hop of redirection.