CS 440 Database Management Systems

CS 440 Database Management Systems NoSQL & NewSQL

Motivation: Web 2.0 applications serve thousands or millions of users, and users perform both reads and updates. How do we scale the DBMS? Vertical scaling (moving the application to a larger computer with more cores and/or CPUs) is limited and expensive! Horizontal scaling distributes the data and workload over many servers (nodes).

DBMS over a cluster of servers. Client-Server: the client ships a query to a single site, and all query processing happens at that server. Collaborating-Server: a query can span multiple sites.

Data partitioning to improve performance. Sharding (horizontal partitioning): partition records by some key and store them on different nodes. Vertical partitioning: store sets of attributes (columns) on different nodes; the decomposition must be lossless-join, e.g., by keeping tids with each fragment. Each node handles a portion of the read/write requests.
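A minimal sketch of key-based sharding, assuming a hypothetical four-node cluster and a `route` helper (the node names, key format, and hashing choice are illustrative, not from the slides):

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # hypothetical cluster

def route(shard_key: str) -> str:
    """Map a record's shard key to the node that stores and serves it."""
    h = int(hashlib.md5(shard_key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

# Each node handles only the reads/writes whose keys hash to it.
print(route("customer:42"))
```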

Replication (e.g., item R1 stored at both node A and node B) gives increased availability and faster query (request) evaluation: each node has more information locally and does not need to communicate with others. Replication can be synchronous or asynchronous; the schemes vary in how current the copies are.

Replication: consistency of copies. Synchronous: all copies of a modified data item must be updated before the modifying Xact commits (the Xact could be a single write operation), so the copies stay consistent. Asynchronous: copies of a modified data item are only periodically updated, so different copies may get out of sync in the meantime; copies may be inconsistent over periods of time.

Consistency. Users and developers should see the distributed DBMS as a coherent, consistent single-machine DBMS. Developers then do not need to know how to write concurrent programs => the DBMS is easier to use. The DBMS should support ACID transactions: multiple nodes (servers) run parts of the same Xact, and they all must commit, or none should commit.

Xact commit over clusters. Assumptions: each node logs actions at that site, but there is no global log; there is a special node, called the coordinator, which starts and coordinates the commit process; nodes communicate by sending messages. Algorithm?

Two-Phase Commit (2PC) Node at which Xact originates is coordinator; other nodes at which it executes are subordinates. When an Xact wants to commit: Coordinator sends prepare msg to each subordinate. Subordinate force-writes an abort or prepare log record and then sends a no or yes msg to coordinator.

Two-Phase Commit (2PC) When an Xact wants to commit: If coordinator gets unanimous yes votes, force-writes a commit log record and sends commit msg to all subs. Else, force-writes abort log rec, and sends abort msg. Subordinates force-write abort/commit log rec based on msg they get, then send ack msg to coordinator. Coordinator writes end log rec after getting all acks.
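A minimal single-process simulation of the two phases described above. The `Node` class, method names, and in-memory logs are illustrative assumptions; a real 2PC implementation force-writes log records to stable storage and exchanges network messages:

```python
class Node:
    """In-memory stand-in for a subordinate site: it keeps a local log and votes."""
    def __init__(self, name, will_vote_yes=True):
        self.name, self.will_vote_yes, self.log = name, will_vote_yes, []

    def on_prepare(self, xid):
        # Force-write prepare (or abort) before answering, so the vote survives a crash.
        self.log.append(("PREPARE" if self.will_vote_yes else "ABORT", xid))
        return "yes" if self.will_vote_yes else "no"

    def on_decision(self, xid, decision):
        self.log.append((decision, xid))      # force-write commit/abort, then ack
        return "ack"

def two_phase_commit(coordinator_log, subordinates, xid):
    # Phase 1 (voting): the coordinator asks every subordinate to prepare.
    votes = [s.on_prepare(xid) for s in subordinates]

    # Phase 2 (termination): decide, force-write the decision, notify everyone.
    decision = "COMMIT" if all(v == "yes" for v in votes) else "ABORT"
    coordinator_log.append((decision, xid))
    acks = [s.on_decision(xid, decision) for s in subordinates]
    if len(acks) == len(subordinates):
        coordinator_log.append(("END", xid))  # end record only after all acks arrive
    return decision

subs = [Node("A"), Node("B"), Node("C", will_vote_yes=False)]
print(two_phase_commit([], subs, xid=42))     # ABORT, because C votes no
```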

Comments on 2PC Two rounds of communication: first, voting; then, termination. Both initiated by coordinator. Any node can decide to abort an Xact. Every msg reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log. All commit protocol log recs for an Xact contain Xactid and Coordinatorid. The coordinator’s abort/commit record also includes ids of all subordinates.

Restart after a failure at a node If we have a commit or abort log rec for Xact T, but not an end rec, must redo/undo T. If this node is the coordinator for T, keep sending commit/abort msgs to subs until acks received. If we have a prepare log rec for Xact T, but not commit/abort, this node is a subordinate for T. Repeatedly contact the coordinator to find status of T, then write commit/abort log rec; redo/undo T; and write end log rec. If we don’t have even a prepare log rec for T, unilaterally abort and undo T. This site may be coordinator! If so, subs may send msgs.
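The restart rules above amount to a case analysis over the local log; here is a sketch under the assumption that the log is simply a list of (record_type, xid) pairs:

```python
def recovery_action(local_log, xid, is_coordinator):
    """Decide what to do for transaction xid after a crash, using only the local log."""
    recs = {rec for (rec, x) in local_log if x == xid}
    if "END" in recs:
        return "nothing to do"
    if "COMMIT" in recs or "ABORT" in recs:
        # Redo/undo locally; a coordinator also keeps resending its decision until acked.
        return "redo/undo; resend decision to subordinates" if is_coordinator else "redo/undo"
    if "PREPARE" in recs:
        # Subordinate in doubt: it must ask the coordinator for the outcome.
        return "ask coordinator for commit/abort, then redo/undo and write end record"
    # Not even a prepare record: safe to abort and undo unilaterally.
    return "abort and undo"
```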

2PC: discussion. 2PC guarantees ACID properties, but it is expensive (communication overhead => I/O accesses for the forced log writes). It relies on a central coordinator, which is both a performance bottleneck and a single point of failure: the other nodes depend on the coordinator, so if it slows down, 2PC slows down with it. Solution: Paxos, a distributed consensus protocol.

Eventual consistency. “It guarantees that, if no additional updates are made to a given data item, all reads to that item will eventually return the same value.” (Peter Bailis et al., Eventual Consistency Today: Limitations, Extensions, and Beyond, ACM Queue.) The copies are not in sync over periods of time, but they will eventually have the same value: they converge. There are several methods to implement eventual consistency; we discuss vector clocks as used in Amazon Dynamo: http://aws.amazon.com/dynamodb/

Vector clocks Each data item D has a set of [server, timestamp] pairs D([s1,t1], [s2,t2],...) Example: A client writes D1 at server SX: D1 ([SX,1]) Another client reads D1, writes back D2; also handled by server SX: D2 ([SX,2]) (D1 garbage collected) Another client reads D2, writes back D3; handled by server SY: D3 ([SX,2], [SY,1]) Another client reads D2, writes back D4; handled by server SZ: D4 ([SX,2], [SZ,1]) Another client reads D3, D4: CONFLICT !

Vector clock: interpretation. A vector clock D([S1,v1],[S2,v2],...) means a value that represents version v1 for S1, version v2 for S2, etc. If server Si updates D, it must increment vi if an entry (Si,vi) exists; otherwise, it must create a new entry (Si,1).
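A sketch of this update rule, representing a vector clock as a Python dict from server id to version (the representation is an assumption made for illustration):

```python
def advance(clock: dict, server: str) -> dict:
    """Return the clock after `server` writes the item: bump its entry, or create (server, 1)."""
    new = dict(clock)
    new[server] = new.get(server, 0) + 1
    return new

# The write sequence from the earlier slide:
d1 = advance({}, "SX")    # {'SX': 1}
d2 = advance(d1, "SX")    # {'SX': 2}
d3 = advance(d2, "SY")    # {'SX': 2, 'SY': 1}
d4 = advance(d2, "SZ")    # {'SX': 2, 'SZ': 1}, written in parallel with d3
```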

Vector clock: conflicts. A data item D is an ancestor of D’ if for every (S,v) ∈ D there exists (S,v’) ∈ D’ such that v ≤ v’; in that case they are on the same branch and there is no conflict. Otherwise, D and D’ are on parallel branches, which means they have a conflict that needs to be reconciled semantically.
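This ancestor test translates directly into code; continuing the dict representation from the previous sketch:

```python
def is_ancestor(d: dict, d_prime: dict) -> bool:
    """D is an ancestor of D' if every (S, v) in D has a matching (S, v') in D' with v <= v'."""
    return all(s in d_prime and v <= d_prime[s] for s, v in d.items())

def in_conflict(a: dict, b: dict) -> bool:
    """Parallel branches (a conflict) when neither clock is an ancestor of the other."""
    return not is_ancestor(a, b) and not is_ancestor(b, a)

# D3 ([SX,2],[SY,1]) and D4 ([SX,2],[SZ,1]) from the earlier example do conflict:
print(in_conflict({"SX": 2, "SY": 1}, {"SX": 2, "SZ": 1}))   # True
```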

Vector clock: conflict examples

Data item 1              | Data item 2              | Conflict?
([SX,3],[SY,6])          | ([SX,3],[SZ,2])          | Yes
([SX,3])                 | ([SX,5])                 | No
([SX,3],[SY,6],[SZ,2])   | ([SX,3],[SY,10])         | Yes
([SX,3],[SY,6],[SZ,2])   | ([SX,3],[SY,20],[SZ,2])  | No

Vector clock: reconciling conflicts. The client sends the read request to a coordinator, which forwards it to all N replicas; once it gets R (< N) responses, it returns the data item. This method is called a sloppy quorum. If there is a conflict, the system informs the developer and returns all conflicting versions with their vector clocks; the developer has to take care of the conflict! Example: updating a shopping cart. Mark deletions with a flag; merge insertions and deletions. What about a deletion in one branch and an addition in the other? The developer may not know which happened earlier, so it is a business-logic decision => Amazon prefers to keep the item in the shopping cart!
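A toy sketch of one possible reconciliation policy for the shopping-cart example. The 'in'/'deleted' representation and the additions-win rule are assumptions chosen to illustrate the kind of business-logic decision the slide describes:

```python
def merge_carts(cart_a: dict, cart_b: dict) -> dict:
    """Reconcile two conflicting cart versions.

    Each cart maps item -> 'in' or 'deleted' (deletion is a tombstone flag,
    not a real removal). When the branches disagree, the addition wins,
    which is why a "removed" item can reappear in the merged cart.
    """
    merged = {}
    for item in set(cart_a) | set(cart_b):
        states = {cart_a.get(item), cart_b.get(item)} - {None}
        merged[item] = "in" if "in" in states else "deleted"
    return merged

# One branch deleted the book while the other kept it and added a lamp:
print(merge_carts({"book": "in", "pen": "deleted"},
                  {"book": "deleted", "lamp": "in"}))
# book stays 'in' (addition wins), pen stays deleted, lamp is included
```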

Vector clocks: discussion. They do not incur the communication overhead and waiting time of 2PC and ACID transactions => better running time. However, developers have to resolve the conflicts, which may be hard for complex applications. Dynamo's argument: conflicts rarely happened in their applications of interest; still, their experiments are not exhaustive. There is not (yet) a final answer on choosing between ACID and eventual consistency: know what you gain and what you sacrifice, and make the decision based on your application(s).

CAP Theorem. It is about the properties of distributed data systems and was proposed by Eric Brewer in 1999-2000. Consistency: all replicas should have the same value. Availability: all read/write operations should return successfully. Tolerance to partitions: the system should tolerate network partitions. The “CAP Theorem”: a distributed data system can have only two of the aforementioned properties. It is not really a theorem; the concepts are not formalized.

CAP Theorem illustration. Suppose node A stores copies of R1 and R2, and node B stores copies of R1 and R3 (R1 is replicated on both). With both nodes available and no network partition, updating A's copy of R1 without propagating it leaves the copies inconsistent => sacrificing consistency (C). To keep the copies consistent, one node could shut down => sacrificing availability (A). Or the nodes must communicate on every update, which a network partition prevents => sacrificing tolerance to partitions (P). And if A (or B) shuts down, read/write requests to R2 (or R3) will not be answered successfully.

CAP Theorem: examples. Consistency and availability, without tolerance to partitions: a single-machine DBMS. Consistency and tolerance to partitions, without availability: the majority protocol in a distributed DBMS, which makes minority partitions unavailable. Availability and tolerance to partitions, without consistency: DNS.

Justification for NoSQL based on CAP. Distributed data systems cannot forfeit tolerance to partitions (P), so they must choose between consistency (C) and availability (A). Availability is more important for the business (it keeps customers buying stuff!), so we should sacrifice consistency.

Criticism of CAP. There has been much criticism, including from Brewer himself in a 2012 article in Computer magazine. CAP is not really a “theorem,” as the concepts are not well defined; a version was formalized and proved later, but under more limited conditions. Moreover, C, A, and P are not binary: availability is measured over a period of time, and subsystems may make their own individual choices.