VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Amazon’s Dynamo Lecturer.

Presentation transcript:

VICTORIA UNIVERSITY OF WELLINGTON Te Whare Wananga o te Upoko o te Ika a Maui SWEN 432 Advanced Database Design and Implementation Amazon’s Dynamo Lecturer : Dr. Pavle Mogin

Advanced Database Design and Implementation 2015: Amazon’s Dynamo

Plan for Amazon’s Dynamo
  Context
  Data Model
  Partitioning and Replication
  Data Versioning
  Executing get() and put()
  Membership changes
  Replica Synchronization and Anti-Entropy Algorithm
    – Readings: Have a look at Readings on the Home Page

Context
  Dynamo is one of the CDBMSs used at Amazon
    – The others are SimpleDB and S3 (Simple Storage Service)
    – Dynamo is used for simple services requiring data access via the primary key, like the Shopping Cart application
  At Amazon, Dynamo is used to manage services that:
    – Have very high reliability requirements, and
    – Need tight control over the trade-offs between availability, consistency, cost-effectiveness, and performance
  Dynamo has been in use since 2006 and has influenced the design of a number of other NoSQL CDBMSs

Design Requirements
  Technical context:
    – The infrastructure is made up of tens of thousands of servers and network components located in many data centres around the world
    – Commodity hardware is used
    – Component failure is a “standard mode of operation”
    – Amazon uses a highly decentralized, loosely coupled, service-oriented architecture consisting of hundreds of services
  Business considerations:
    – A strict internal service level agreement (SLA) has to be met for practically all customers, regardless of the amount of processing their requests need
        A simple SLA: a response time of 300 ms for 99.9% of requests at a peak client load of 500 requests per second
    – High reliability, since even the slightest outage has significant financial consequences and impacts users’ trust
    – High scalability to support continuous growth
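The sample SLA above can be checked against measured latencies with a few lines of Python. This is only a sketch: the function name meets_sla, its parameters, and the sample data are hypothetical and not part of the lecture.

```python
def meets_sla(latencies_ms, limit_ms=300.0, fraction=0.999):
    """Return True if at least `fraction` of the requests finished within `limit_ms`."""
    if not latencies_ms:
        return True  # no requests, nothing violated
    within = sum(1 for t in latencies_ms if t <= limit_ms)
    return within / len(latencies_ms) >= fraction


# Hypothetical measurements: 10,000 requests, 5 of them slower than 300 ms.
sample = [20.0] * 9995 + [450.0] * 5
print(meets_sla(sample))  # 9995 / 10000 = 0.9995, which is at least 0.999
```

The point of stating the SLA at the 99.9th percentile rather than as an average is that a handful of very slow requests cannot hide behind many fast ones.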

System Design (Data Model and API)
  Data model: key/value
    – Most services at Amazon only need to store and retrieve data by primary key and do not require complex querying and data management functionality
    – The value part is a BLOB
    – Updates are limited to one key/value pair with no references
  Operations:
    – get(key), returning a list of objects and a context
    – put(key, context, value), with no return value
    – “context” is the system metadata containing a version vector
    – The get() operation may return more than one value if there is a conflict between objects with the given key
    – Dynamo treats key and value as opaque arrays of bytes
    – The key is hashed by the MD5 algorithm to determine the storage node
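The shape of this API can be illustrated with a single-process toy store. It is a sketch under loud assumptions: TinyStore and key_position are hypothetical names, and the context here is simplified to the set of version ids the client has read, whereas the real context carries a version vector.

```python
import hashlib
import itertools


def key_position(key: bytes) -> int:
    """Map an opaque key to a 128-bit integer using MD5, as the slide describes."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


class TinyStore:
    """Toy, single-process stand-in for the get()/put() interface (assumption)."""

    def __init__(self):
        self._versions = {}              # key -> {version_id: value}
        self._ids = itertools.count(1)   # source of fresh version ids

    def get(self, key):
        versions = self._versions.get(key, {})
        # A list of values plus a context describing what the client has seen.
        return list(versions.values()), frozenset(versions)

    def put(self, key, context, value):
        versions = self._versions.setdefault(key, {})
        for version_id in context:       # the new value supersedes what was read
            versions.pop(version_id, None)
        versions[next(self._ids)] = value  # no return value, as in the slide


store = TinyStore()
store.put(b"cart", frozenset(), b"book")
_, ctx = store.get(b"cart")
store.put(b"cart", ctx, b"book,pen")     # first client updates
store.put(b"cart", ctx, b"book,cd")      # second client used the same stale context
values, ctx = store.get(b"cart")         # two conflicting values are returned
store.put(b"cart", ctx, b"book,pen,cd")  # the application merges them
```

The second put() is accepted even though its context is stale, which is why the following get() hands both values back to the application.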

Design (Partitioning and Replication)
  To provide for incremental scalability, Dynamo uses consistent hashing to dynamically partition data across the available storage hosts
    – Each physical node contains a number of virtual nodes according to its performance
  Dynamo uses optimistic replication to ensure availability and durability in an environment where machine crashes are a standard mode of operation
    – Each data object is replicated n times
        A typical value for n at Amazon is 3
    – Each node contains a list of nodes, called the preference list, for each key k to be stored
        A node from the top of the preference list becomes responsible for storing and replicating an update to the object with the key k
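A minimal sketch of a consistent hashing ring with virtual nodes and a preference list follows. The class name Ring, the token scheme (MD5 of "host#i"), and the host names are assumptions made for illustration; real deployments choose and store tokens differently.

```python
import bisect
import hashlib


def md5_int(text: str) -> int:
    """Position on the ring: the MD5 digest read as an integer."""
    return int.from_bytes(hashlib.md5(text.encode()).digest(), "big")


class Ring:
    def __init__(self, vnodes_per_host):
        # vnodes_per_host: {"host name": number of virtual nodes}.
        # A more powerful host gets more virtual nodes, hence more key ranges.
        self._tokens = sorted(
            (md5_int(f"{host}#{i}"), host)
            for host, count in vnodes_per_host.items()
            for i in range(count)
        )
        self._positions = [position for position, _ in self._tokens]

    def preference_list(self, key: str, n: int = 3):
        """Walk clockwise from the key and collect the first n distinct hosts."""
        start = bisect.bisect_right(self._positions, md5_int(key))
        hosts = []
        for step in range(len(self._tokens)):
            _, host = self._tokens[(start + step) % len(self._tokens)]
            if host not in hosts:        # skip further virtual nodes of a host
                hosts.append(host)
                if len(hosts) == n:
                    break
        return hosts


ring = Ring({"A": 4, "B": 4, "C": 8, "D": 4})   # hypothetical hosts
replicas = ring.preference_list("cart:42")       # 3 distinct hosts for this key
```

Skipping repeated hosts while walking the ring keeps the n replicas on n different physical machines even though each machine appears on the ring several times.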

System Design (Data Versioning)
  Dynamo is designed to be an eventually consistent system that is always available for updates:
    – An update operation returns before all replica nodes have received and applied the update
    – Also, an update is accepted from a client even if it is apparent that the client is not aware of the latest version of the object
  To handle multiple versions of an object:
    – Dynamo uses a version vector (called a vector clock), and
    – Always creates a new and immutable version of the updated object
  Many of the object versions are reconciled syntactically by Dynamo itself
    – This is possible whenever two replicas have ordered version vectors
  But some reads may return a set of conflicting object versions that have to be reconciled semantically by an application that knows the schema and business logic
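Syntactic reconciliation can be sketched as follows: a version is dropped when another version's vector clock dominates it, and whatever survives is either a single version or a set of conflicts for the application. The function names and the node names Sx, Sy, Sz are illustrative assumptions.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())


def reconcile(versions):
    """Keep only versions that are not dominated by another version.

    versions: list of (vector_clock, value) pairs, where a vector clock is a
    dict mapping a node name to its update counter.
    """
    survivors = []
    for clock, value in versions:
        dominated = any(
            descends(other, clock) and other != clock for other, _ in versions
        )
        if not dominated and (clock, value) not in survivors:
            survivors.append((clock, value))
    return survivors


d1 = ({"Sx": 1}, "v1")
d2 = ({"Sx": 2}, "v2")            # written after d1 on the same node
d3 = ({"Sx": 2, "Sy": 1}, "v3")   # d2 updated via node Sy
d4 = ({"Sx": 2, "Sz": 1}, "v4")   # d2 updated concurrently via node Sz
result = reconcile([d1, d2, d3, d4])
```

Here d1 and d2 are ordered before d3 and d4 and disappear without any help from the application, while d3 and d4 are concurrent and both remain as conflicting versions.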

Design (get() and put())
  Dynamo allows any storage node to receive a get() or put() request for any key
    – The node then uses the preference list to forward the request to a healthy prioritized storage host (the coordinator)
  To provide a consistent view to clients, Dynamo applies a quorum consistency protocol
    – Values of r = 2, w = 2, and n = 3 satisfy Amazon’s SLA, where r and w are the minimum numbers of storage hosts that must take part in a successful read or write, respectively
    – Parameters r, w, and n are configurable by the application
    – Applications needing the highest level of availability may set w = 1:
        Then, a write request is rejected only if all nodes in the system are unavailable
    – To achieve a higher level of durability, w should be greater than 1
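Why r = 2, w = 2, n = 3 gives a consistent view can be checked by brute force: every possible read set must share at least one replica with every possible write set, which holds exactly when r + w > n. That condition comes from the Dynamo paper rather than from this slide, and the function name below is hypothetical.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every choice of r readers intersects every choice of w writers."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


print(quorums_overlap(3, 2, 2))  # True: any two pairs out of three replicas meet
print(quorums_overlap(3, 2, 1))  # False: the single written replica can be missed
```

With w = 1 a write is very likely to succeed, but a read with r = 2 may contact only replicas that have not yet seen it, which is the availability versus consistency trade-off the slide refers to.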

Design (Membership Changes)
  Dynamo uses a gossip network communication protocol to transfer messages between nodes
  Node outages (due to a failure or maintenance) are often transient, although they may last for extended intervals
  A node outage rarely signifies a permanent departure and therefore should not result in rebalancing of the partition assignment
  For these reasons, Dynamo uses an explicit mechanism for adding nodes to and removing nodes from a Dynamo consistent hashing ring

Design (Node Addition and Removal) (1)
  An administrator uses a command line tool to connect to a node and issues a command for a node addition or removal
  The node stores the membership change
  The gossip protocol is used to propagate membership changes
    – Each second, a node chooses a random peer with which to exchange information about membership changes
  When a new node joins the consistent hashing ring, a token is chosen for each of its virtual nodes and stored permanently
    – Tokens are spread to other nodes by gossip together with the membership change information
    – With this information, nodes are able to send a request to a node responsible for the key range
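How quickly a membership change spreads by gossip can be explored with a small simulation. This is a sketch with assumptions: one loop iteration stands for one second, both peers merge their views on contact, and the function name, the seed, and the change label are hypothetical.

```python
import random


def gossip_rounds(num_nodes: int, seed: int = 0, max_rounds: int = 1000):
    """Return the number of rounds until every node knows the change, or None."""
    rng = random.Random(seed)                  # seeded for repeatable runs
    views = [set() for _ in range(num_nodes)]  # changes known to each node
    views[0].add("add node X")                 # recorded at the contacted node
    for round_number in range(1, max_rounds + 1):
        for node in range(num_nodes):
            peer = rng.choice([p for p in range(num_nodes) if p != node])
            merged = views[node] | views[peer]  # both sides exchange what they know
            views[node] = set(merged)
            views[peer] = set(merged)
        if all(views):                          # no node has an empty view
            return round_number
    return None


print(gossip_rounds(16))
```

No node needs a full list of who has already heard the news; random pairwise exchanges are enough for the change to reach everyone after a small number of rounds.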

Design (Node Addition and Removal) (2)
  Adding nodes to the system changes the ownership of key ranges on the ring
  When a node determines it is no longer responsible for a key range, it transfers the objects to the new node
  At the removal of a node, database objects are relocated in a reverse process
  Temporary failure detection is performed during gossiping
    – To avoid failed attempts during get() and put() operations and data transfers, a node A considers a node B temporarily inaccessible if node B does not respond to node A’s gossip message

Design (Handling of Failures)
  Hinted handoff is a technique used to compensate for not relocating database objects of temporarily failed nodes
  Dynamo’s quorum is a sloppy one, since the first n healthy nodes from the preference list for the key are used when executing a read or write operation
    – Some of these n nodes may not even be responsible for the key
  Hence, a new object may be written on a node j that is not responsible for the key, instead of on the node i that is the intended recipient of the object’s replica
  The new object is stored along with a hint about the intended recipient node i of the replica
  When the node i revives, the node j sends the object to it and deletes its own copy of the object
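The steps above can be sketched in a few lines. Everything here is illustrative: the Node class, the write and deliver_hints functions, and the choice of D as the failed node are assumptions, and real nodes communicate over the network rather than through shared objects.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}   # replicas this node is responsible for
        self.hints = []   # (intended node, key, value) kept on behalf of others


def write(preference_list, key, value, n=3):
    """Sloppy quorum write: use the first n healthy nodes of the preference list."""
    intended = preference_list[:n]
    missing = [node for node in intended if not node.up]
    targets = [node for node in preference_list if node.up][:n]
    for node in targets:
        if node in intended:
            node.store[key] = value
        else:
            # Stand-in node j: keep the object together with a hint naming node i.
            node.hints.append((missing.pop(0), key, value))
    return [node.name for node in targets]


def deliver_hints(node):
    """Hand hinted objects back to revived nodes and drop the local copies."""
    remaining = []
    for target, key, value in node.hints:
        if target.up:
            target.store[key] = value
        else:
            remaining.append((target, key, value))
    node.hints = remaining


c, d, e, f, g = (Node(name) for name in "CDEFG")
d.up = False                              # hypothetical: D is temporarily down
written_to = write([c, d, e, f, g], "k", "o")
d.up = True                               # D revives
deliver_hints(f)
```

Keeping hinted objects apart from the node's own replicas makes it cheap to find and return them once the intended node is reachable again.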

Hinted Handoff (Example)
  [Figure: consistent hashing ring with nodes A, B, C, D, E, F, G, H. Labels: Replication factor n = 3; Temporarily down; The preference list for the key k: C, D, E, F, G, ...; The object (k, o) stored here; Hinted handoff; Not responsible for range BC; Coordinator]

Design (Replica Synchronization)
  Hinted handoff works well if system membership changes are infrequent and failures are transient
  There are scenarios under which hinted replicas may become unavailable before they can be returned to the original replica node
  To detect inconsistencies between replicas faster and to minimize the amount of data transferred between nodes, Dynamo uses Merkle trees
  Merkle trees are used to discover differences in the key sets of the same key range held on different nodes

Design (Merkle Trees) (1)
  A Merkle tree is a full binary hash tree whose leaves are hashes of individual keys
  Parent nodes are hashes of their concatenated children
  Let k be the number of keys and h the height of the tree; then:
    – The number of tree leaves is 2^(h-1) > k, and
    – The k-th key has to be replicated r = 2^(h-1) - k times in order to get a full tree
  Example:
    – k = 5, h = 4, the number of replicated keys is r = 3
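The construction can be sketched as below. Assumptions: SHA-256 stands in for the hash function (the slides do not name one), keys are sorted before hashing, and the last leaf is repeated until the leaf count is a power of two, so a key count that is already a power of two gets no padding. The function names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_merkle(keys):
    """Return the tree as a list of levels: leaves first, the root level last."""
    if not keys:
        raise ValueError("at least one key is required")
    leaves = [digest(key.encode()) for key in sorted(keys)]
    while len(leaves) & (len(leaves) - 1):   # leaf count is not yet a power of two
        leaves.append(leaves[-1])            # repeat the last key's leaf
    levels = [leaves]
    while len(levels[-1]) > 1:
        below = levels[-1]
        # Each parent is the hash of its two concatenated children.
        levels.append(
            [digest(below[i] + below[i + 1]) for i in range(0, len(below), 2)]
        )
    return levels


levels = build_merkle(["a", "b", "c", "d", "e"])   # k = 5
height = len(levels)                               # h = 4
padding = len(levels[0]) - 5                       # r = 3
```

For k = 5 this yields 8 leaves on 4 levels with the last key repeated 3 times, matching the example on the slide.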

Design (Merkle Trees) (2)
  Two Merkle trees have the same respective node values if they are produced from the same set of keys by applying the same hashing function
  Merkle trees are used to compare the key ranges of two replicas in the following way:
    – If the tree roots are the same, then the key ranges contain the same keys
    – If the tree roots are different, then their subtrees have to be compared
  By applying the rule above recursively, one finally finds a missing key
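The recursive comparison can be sketched as follows. To keep both trees the same shape even when the key sets differ, this sketch hashes fixed buckets of keys into the leaves instead of single keys; that choice, the bucket count, SHA-256, and all names are assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 8  # assumed fixed number of leaves (a power of two)


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str) -> int:
    return digest(key.encode())[0] % NUM_BUCKETS


def build_tree(keys):
    """Levels of the tree: levels[0] holds the leaves, levels[-1] the root."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for key in keys:
        buckets[bucket_of(key)].append(key)
    level = [digest("\n".join(sorted(bucket)).encode()) for bucket in buckets]
    levels = [level]
    while len(level) > 1:
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(a, b, depth=None, index=0):
    """Descend only into subtrees whose hashes differ; return the leaf indexes."""
    if depth is None:
        depth = len(a) - 1                    # start at the root
    if a[depth][index] == b[depth][index]:
        return []                             # identical subtree, nothing to do
    if depth == 0:
        return [index]                        # a leaf that differs
    return (differing_buckets(a, b, depth - 1, 2 * index)
            + differing_buckets(a, b, depth - 1, 2 * index + 1))


tree1 = build_tree(["159", "973", "414", "003"])   # hypothetical keys
tree2 = build_tree(["159", "973", "414"])
diff = differing_buckets(tree1, tree2)             # only the bucket of "003"
```

Equal roots end the comparison after a single hash exchange, and a single missing key costs roughly one pair of hashes per tree level, which is what keeps the data transfer small.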

Merkle Trees (Example) (1)
  This is an extremely simplified example, far from reality
  The only aim of the example is to give you an idea of how Merkle trees might be built and used in synchronizing replicas
  Assume:
    – Replica 1 contains the following keys: 159, 973, 414, 003
    – Replica 2 contains the following keys: 159, 973, 414
    – We use the hash function h(k) = k mod 7
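One possible way to work through this example in code is shown below. The leaf hashes follow directly from h(k) = k mod 7, but the rule for combining children (concatenate their decimal digits, then apply h) and the padding of Replica 2 with a repeat of its last key are assumptions; the original figure may have used a different scheme.

```python
def h(x: int) -> int:
    return x % 7


def parent(left: int, right: int) -> int:
    # Assumed combination rule: hash the concatenated decimal digits.
    return h(int(f"{left}{right}"))


def tree(keys):
    """Levels of the toy Merkle tree: leaves first, the root level last."""
    leaves = [h(key) for key in keys]
    while len(leaves) & (len(leaves) - 1):   # pad to a power of two
        leaves.append(leaves[-1])            # by repeating the last key's hash
    levels = [leaves]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append(
            [parent(below[i], below[i + 1]) for i in range(0, len(below), 2)]
        )
    return levels


replica1 = tree([159, 973, 414, 3])   # leaves 5, 0, 1, 3 -> parents 1, 6 -> root 2
replica2 = tree([159, 973, 414])      # leaves 5, 0, 1, 1 -> parents 1, 4 -> root 0
```

Under these assumptions the roots differ (2 versus 0), the left subtrees agree (1 and 1), the right subtrees differ (6 versus 4), and within the right subtree only the last leaf differs (3 versus 1), which points at key 003 as the one missing from Replica 2.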

Merkle Trees (Example) (2)
  [Figure: the Merkle trees built for the two replicas]

Design (Anti-Entropy Algorithm)
  Each physical node maintains a separate Merkle tree for each key range hosted by one of its virtual nodes
  Two nodes exchange the roots of the Merkle trees for the key ranges they have in common
  By applying the tree traversal scheme described, the nodes determine whether they have any differences
    – If a difference exists, the nodes apply a corresponding corrective action by copying the missing object

Failure of a Whole Data Centre
  A highly available storage system should be able to handle the failure of an entire data centre
    – Data centre failures happen due to power outages, cooling failures, network failures, and natural disasters
  Dynamo is configured in such a way that each object is replicated across multiple data centres
    – Nodes in the preference list for a key belong to multiple data centres

Summary (1)
  Dynamo is one of Amazon’s CDBMSs
  It has been in use since 2006 and has influenced a number of other CDBMSs, including Cassandra
  Data model: key-value with a very simple API
  Data partitioning and replication: consistent hash ring with optimistic replication
  Data versioning: vector clocks

Summary (2)
  Network communication: gossip protocol
  Handling of failures: hinted handoff is used to compensate for not relocating database objects of temporarily failed nodes
  Replica synchronization: Merkle trees are used to detect inconsistencies
  Anti-entropy algorithm: two nodes exchange Merkle trees for the key ranges they have in common, find differences in the key ranges, and apply corrective actions