AMAZON’S KEY-VALUE STORE: DYNAMO. DeCandia, Hastorun, Jampani, Kakulapati, Lakshman, Pilchin, Sivasubramanian, Vosshall, Vogels: Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007.


AMAZON’S KEY-VALUE STORE: DYNAMO DeCandia, Hastorun, Jampani, Kakulapati, Lakshman, Pilchin, Sivasubramanian, Vosshall, Vogels: Dynamo: Amazon's Highly Available Key-value Store. SOSP 2007. Adapted from Amazon's Dynamo presentation.

Motivation Reliability at massive scale. The slightest outage has significant financial consequences. High write availability. Amazon's platform: tens of thousands of servers and network components, geographically dispersed. Provide persistent storage in spite of failures. Sacrifice consistency to achieve performance, reliability, and scalability.

Dynamo Design Rationale Most services need only key-based access: best-seller lists, shopping carts, customer preferences, session management, sales rank, product catalog, and so on. The prevalent application design based on RDBMS technology would be catastrophic at this scale. Dynamo therefore provides a primary-key-only interface.

Dynamo Design Overview Data partitioning using consistent hashing. Data replication. Consistency via version vectors. Replica synchronization via a quorum protocol. Gossip-based failure detection and membership protocol.

System Requirements Data & query model: – Read/write operations via primary key – No relational schema: the value is an opaque object (blob) – Objects are typically under 1 MB. Consistency guarantees: – Weak consistency – Single-key updates only – No isolation guarantees (e.g., for read-modify-write sequences). Efficiency: – SLAs stated at the 99.9th percentile of operations. Notes: – Commodity hardware – Minimal security measures, since the system is for internal use only.

Service Level Agreements (SLA) An application can deliver its functionality within a bounded time only if every dependency in the platform delivers its functionality with even tighter bounds. Example SLA: a service guarantees that it will provide a response within 300 ms for 99.9% of its requests at a peak client load of 500 requests per second.
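As a back-of-the-envelope illustration (not from the paper), a hypothetical Python helper for checking measured latencies against such a 99.9th-percentile target:

    import math

    def p999(latencies_ms):
        # The SLA is stated at the 99.9th percentile, not the mean: sort the
        # samples and take the value that 99.9% of requests fall at or below.
        s = sorted(latencies_ms)
        return s[max(0, math.ceil(0.999 * len(s)) - 1)]

    # A target of "300 ms for 99.9% of requests" means p999(samples) <= 300.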

System Interface Two basic operations: – Get(key): locates the object's replicas and returns the object along with a context (opaque metadata that includes the version) – Put(key, context, object): writes the replicas to disk; the context carries the version (a vector clock). Hash(key) → 128-bit identifier.
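A minimal Python sketch of this interface shape, with hypothetical names (key_to_id, DynamoClient); the paper applies an MD5 hash to the key to produce the 128-bit identifier:

    import hashlib

    def key_to_id(key: str) -> int:
        # MD5 yields 128 bits, matching the identifier space described above.
        return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

    class DynamoClient:
        def get(self, key):
            """Locate the replicas for key; return (object versions, context).
            The opaque context encodes version metadata (a vector clock)."""
            raise NotImplementedError

        def put(self, key, context, obj):
            """Write obj to the replicas for key, passing back the context
            from an earlier get() so the store can track causality."""
            raise NotImplementedError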

Partition Algorithm Consistent hashing: the output range of the hash function is treated as a fixed circular space or “ring,” a la Chord. “Virtual nodes”: each physical node is responsible for more than one position (virtual node) on the ring, to deal with non-uniform data and load distribution.
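A minimal consistent-hashing sketch with virtual nodes, using hypothetical names (h, Ring, coordinator); each physical node hashes several "<node>#<i>" tokens onto the ring:

    import bisect, hashlib

    def h(s: str) -> int:
        return int.from_bytes(hashlib.md5(s.encode()).digest(), "big")

    class Ring:
        def __init__(self, nodes, vnodes_per_node=8):
            # Each physical node owns several positions ("virtual nodes") on the ring.
            self.tokens = sorted((h(f"{n}#{i}"), n)
                                 for n in nodes for i in range(vnodes_per_node))

        def coordinator(self, key: str) -> str:
            # The first token clockwise from hash(key) owns the key.
            ids = [token for token, _ in self.tokens]
            i = bisect.bisect_right(ids, h(key)) % len(self.tokens)
            return self.tokens[i][1]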

Virtual Nodes

Advantages of using virtual nodes The number of virtual nodes that a node is responsible for can be chosen based on its capacity, accounting for heterogeneity in the physical infrastructure. A physical node's load is spread across the ring, so no single node becomes a hot spot. If a node becomes unavailable, the load it handled is evenly dispersed across the remaining available nodes. When a node becomes available again, it accepts a roughly equivalent amount of load from each of the other available nodes.

Replication Each data item is replicated at N hosts. Preference list: the list of nodes responsible for storing a particular key. Some fine-tuning is needed to account for virtual nodes.

Preference Lists The list of nodes responsible for storing a particular key. To tolerate failures, the preference list contains more than N nodes. Because of virtual nodes, the list skips ring positions as needed so that it contains only distinct physical nodes.
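A sketch of how such a list could be built, reusing the hypothetical Ring and h helpers from the partitioning sketch above:

    import bisect

    def preference_list(ring, key, n_replicas=3):
        # Walk clockwise from the key's position on the ring, skipping tokens
        # owned by a physical node already chosen, until N distinct nodes remain.
        ids = [token for token, _ in ring.tokens]
        start = bisect.bisect_right(ids, h(key))
        nodes = []
        for step in range(len(ring.tokens)):
            node = ring.tokens[(start + step) % len(ring.tokens)][1]
            if node not in nodes:
                nodes.append(node)
            if len(nodes) == n_replicas:
                break
        return nodes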

Data Versioning A put() call may return to its caller before the update has been applied at all replicas, so a get() call may return multiple versions of the same object. Challenge: an object may have several distinct, divergent versions. Solution: use vector clocks to capture causality between the different versions of the same object.

Vector Clock A vector clock is a list of (node, counter) pairs, and every version of every object is associated with one vector clock. If all counters in the first object's clock are less than or equal to the corresponding counters in the second clock, then the first version is an ancestor of the second and can be forgotten. Otherwise the two versions are concurrent, and the application reconciles the divergent versions and collapses them into a single new version.
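A small illustration of the dominance test, using hypothetical helpers (descends, is_concurrent) and made-up clocks:

    def descends(a: dict, b: dict) -> bool:
        # True if clock a dominates clock b: every counter in b is <= the
        # corresponding counter in a, so b is an ancestor and can be discarded.
        return all(a.get(node, 0) >= counter for node, counter in b.items())

    def is_concurrent(a: dict, b: dict) -> bool:
        # Neither clock dominates the other: divergent versions that the
        # application must reconcile into a single new version.
        return not descends(a, b) and not descends(b, a)

    # Example: Sx wrote twice, then Sy and Sz each wrote from that ancestor.
    d1 = {"Sx": 2}                 # ancestor
    d2 = {"Sx": 2, "Sy": 1}        # descends d1
    d3 = {"Sx": 2, "Sz": 1}        # descends d1, concurrent with d2
    assert descends(d2, d1) and is_concurrent(d2, d3)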

Vector clock example

Routing requests Either route the request through a generic load balancer that selects a node based on load information, or use a partition-aware client library that routes requests directly to the relevant node. A gossip protocol propagates membership changes: every second each node contacts a peer chosen at random, and the two nodes reconcile their membership-change histories.
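A toy reconciliation step for that gossip exchange, assuming (purely as an illustration) that each node's view is a map from node name to the latest membership-change version it has seen:

    def reconcile(view_a, view_b):
        # Merge two membership-change histories: for every node keep the entry
        # with the higher version, so both peers converge on the same view.
        merged = dict(view_a)
        for node, version in view_b.items():
            if version > merged.get(node, -1):
                merged[node] = version
        return merged

    # Example: A has seen node3's latest change (version 7); B has not.
    a = {"node1": 4, "node2": 2, "node3": 7}
    b = {"node1": 4, "node2": 5}
    assert reconcile(a, b) == {"node1": 4, "node2": 5, "node3": 7}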

Sloppy Quorum R and W are the minimum numbers of nodes that must participate in a successful read and write operation, respectively. Setting R + W > N yields a quorum-like system. In this model, the latency of a get (or put) operation is dictated by the slowest of the R (or W) replicas. For this reason, R and W are usually configured to be less than N, to provide better latency and availability.
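A minimal sketch of the coordinator-side quorum logic, with hypothetical replica objects exposing store() and load():

    def quorum_write(replicas, key, value, clock, w):
        # Send the write to all nodes in the preference list and report success
        # once W of them acknowledge; latency tracks the slowest of those W.
        acks = sum(1 for replica in replicas if replica.store(key, value, clock))
        return acks >= w

    def quorum_read(replicas, key, r):
        # Collect versions from R replicas; causally divergent versions are all
        # returned so the client (or coordinator) can reconcile them.
        return [replica.load(key) for replica in replicas[:r]]

    # A common Dynamo configuration is N = 3, R = 2, W = 2, so R + W > N.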

Highlights of Dynamo High write availability. Optimistic replication, with vector clocks for conflict resolution. Consistent hashing (a la Chord) in a controlled environment. Quorums for relaxed consistency.

CASSANDRA (FACEBOOK) Lakshman and Malik: Cassandra - A Decentralized Structured Storage System. LADIS 2009.

Data Model A key-value store, though closer to Bigtable: basically a distributed multi-dimensional map indexed by a key. The value is structured into columns, which are grouped into column families: simple and super (a column family within a column family). An operation is atomic on a single row. API: insert, get, and delete.
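A toy in-memory rendering of this data model (illustrative only; super column families would add one more level of nesting):

    from collections import defaultdict

    # row key -> column family -> column name -> value
    table = defaultdict(lambda: defaultdict(dict))

    def insert(row_key, column_family, column, value):
        table[row_key][column_family][column] = value   # atomic per row

    def get(row_key, column_family, column):
        return table[row_key][column_family].get(column)

    def delete(row_key, column_family, column):
        table[row_key][column_family].pop(column, None)

    insert("user42", "profile", "name", "Ada")
    assert get("user42", "profile", "name") == "Ada"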

System Architecture Like Dynamo (and Chord): uses an order-preserving hash function on a fixed circular space. The node responsible for a key is called the coordinator. To handle non-uniform data distribution, the system keeps track of the data distribution and reorganizes it if necessary.

Replication Each item is replicated at N hosts. Replica placement can be Rack Unaware, Rack Aware (within a data center), or Datacenter Aware. The system has an elected leader: when a node joins the system, the leader assigns it a range of data items and their replicas. Each node is aware of every other node in the system and the ranges they are responsible for.

Membership and Failure Detection A gossip-based mechanism maintains cluster membership, and a node determines which nodes are up or down using a failure detector. The Φ accrual failure detector returns a suspicion level, Φ, for each monitored node: if a node is suspected when Φ = 1, 2, or 3, the likelihood that the suspicion is a mistake is roughly 10%, 1%, and 0.1%, respectively. Every node maintains a sliding window of interarrival times of gossip messages from other nodes, uses it to estimate the distribution of interarrival times, and computes Φ from that distribution, approximated as an exponential distribution.
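A toy version of the exponential approximation described above, with hypothetical names (PhiAccrualDetector, heartbeat, phi):

    import math
    from collections import deque

    class PhiAccrualDetector:
        def __init__(self, window=1000):
            self.arrivals = deque(maxlen=window)   # sliding window of interarrival times
            self.last = None

        def heartbeat(self, now):
            if self.last is not None:
                self.arrivals.append(now - self.last)
            self.last = now

        def phi(self, now):
            # Exponential approximation: P(no message yet) = exp(-t / mean),
            # and phi = -log10 of that probability, so phi = 1, 2, 3 means the
            # chance of a wrong suspicion is about 10%, 1%, 0.1%.
            if not self.arrivals or self.last is None:
                return 0.0
            mean = sum(self.arrivals) / len(self.arrivals)
            t = now - self.last
            return (t / mean) * math.log10(math.e)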

Operations Reads and writes use quorums of R and W nodes. If R + W > N, then a read will return the latest written value. – Read operations return the value with the highest timestamp, so they may return older versions – Read Repair: with every read, send the newest version to any out-of-date replicas – Anti-Entropy: compute a Merkle tree to catch any out-of-sync data (expensive). Each write goes first into a persistent commit log and then into an in-memory data structure.
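A sketch of read repair under this scheme, again with hypothetical replica objects exposing load() and store() on (timestamp, value) pairs:

    def read_with_repair(replicas, key, r):
        # Ask R replicas, answer with the newest (highest-timestamp) value, and
        # push that version back to any replica that returned an older one.
        responses = [(replica, replica.load(key)) for replica in replicas[:r]]
        newest_ts, newest_val = max((resp for _, resp in responses), key=lambda p: p[0])
        for replica, (ts, _) in responses:
            if ts < newest_ts:
                replica.store(key, newest_val, newest_ts)   # read repair
        return newest_val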