CS 540 Database Management Systems: NoSQL & NewSQL. Some slides due to Magda Balazinska.

Motivation
Web 2.0 applications serve thousands or millions of users, and users perform both reads and updates.
How do we scale the DBMS?
– Vertical scaling: move the application to a larger computer with more cores and/or CPUs. Limited and expensive!
– Horizontal scaling: distribute the data and workload over many servers (nodes).

DBMS over a cluster of servers
– Client-Server: the client ships a query to a single site; all query processing happens at that server.
– Collaborating-Server: a query can span multiple sites.
[Diagram: a client shipping a query to a single server vs. servers collaborating on a query]

Data partitioning to improve performance
– Sharding: horizontal partitioning by some key; records are stored on different nodes.
– Vertical partitioning: sets of attributes (columns) are stored on different nodes; tuple ids (TIDs) keep the partitioning lossless-join.
Each node handles a portion of the read/write requests.
[Diagram: a relation split across nodes, with tuple ids t1–t4]
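A minimal sketch of hash-based sharding (one common way to shard by key; not from the slides). The node names and record-key format are hypothetical:

```python
import hashlib

NODES = ["node_a", "node_b", "node_c"]  # hypothetical cluster members

def shard_for(key: str, nodes=NODES) -> str:
    """Map a record key to the node that stores it (hash partitioning)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

# Each node then handles reads/writes only for the keys that map to it.
print(shard_for("customer:42"))  # picks one of the three nodes
```

Range partitioning by the same key is a common alternative when range scans matter.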

Replication
– Gives increased availability.
– Faster query (request) evaluation: each node has more information locally and does not need to communicate with others.
Synchronous vs. asynchronous replication: the two vary in how current the copies are.
[Diagram: replicas R1, R2, R3 stored on nodes A and B]

Replication: consistency of copies
Synchronous: all copies of a modified data item must be updated before the modifying Xact commits.
– The Xact could be a single write operation.
– Copies stay consistent.
Asynchronous: copies of a modified data item are only periodically updated; different copies may get out of sync in the meantime.
– Copies may be inconsistent over periods of time.
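A toy sketch contrasting the two policies (not any specific system), assuming hypothetical Replica objects: the synchronous write updates every copy before acknowledging, while the asynchronous write updates one copy and queues the rest for later propagation.

```python
from collections import deque

class Replica:
    def __init__(self, name):
        self.name, self.data = name, {}
    def apply(self, key, value):
        self.data[key] = value

replicas = [Replica("A"), Replica("B"), Replica("C")]
propagation_queue = deque()  # pending updates for lazy (async) propagation

def write_sync(key, value):
    # All copies are updated before the write is acknowledged: copies stay consistent.
    for r in replicas:
        r.apply(key, value)
    return "committed"

def write_async(key, value):
    # Only one copy is updated now; the others catch up later and may lag.
    replicas[0].apply(key, value)
    propagation_queue.append((key, value))
    return "committed (other copies will converge later)"
```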

Consistency
Users and developers see the distributed DBMS as a coherent, consistent single-machine DBMS.
– Developers do not need to know how to write concurrent programs => easier to use.
The DBMS should support ACID transactions.
– Multiple nodes (servers) run parts of the same Xact.
– They must all commit, or none should commit.

Xact commit over clusters
Assumptions:
– Each node logs actions at that site, but there is no global log.
– There is a special node, called the coordinator, which starts and coordinates the commit process.
– Nodes communicate by sending messages.
Algorithm?

Two-Phase Commit (2PC)
The node at which the Xact originates is the coordinator; the other nodes at which it executes are subordinates.
When an Xact wants to commit:
1. The coordinator sends a prepare msg to each subordinate.
2. Each subordinate force-writes an abort or prepare log record and then sends a no or yes msg to the coordinator.

Two-Phase Commit (2PC), continued
3. If the coordinator gets unanimous yes votes, it force-writes a commit log record and sends a commit msg to all subs. Else, it force-writes an abort log rec and sends an abort msg.
4. Subordinates force-write an abort/commit log rec based on the msg they get, then send an ack msg to the coordinator.
5. The coordinator writes an end log rec after getting all acks.
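A sketch of the coordinator's side of the protocol above; `send`, `receive`, and `log.force_write` are assumed helpers for messaging and forced logging, and timeouts/failures are omitted.

```python
def two_phase_commit(log, subordinates, send, receive):
    """Coordinator's side of 2PC (sketch; no timeouts or crashes handled)."""
    # Phase 1: voting.
    for sub in subordinates:
        send(sub, "prepare")
    votes = [receive(sub) for sub in subordinates]   # each reply is "yes" or "no"

    # Phase 2: termination.
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    log.force_write(decision)                        # decision must survive a crash
    for sub in subordinates:
        send(sub, decision)
    for sub in subordinates:
        receive(sub)                                 # wait for all acks
    log.force_write("end")
    return decision
```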

Comments on 2PC
– Two rounds of communication: first voting, then termination. Both are initiated by the coordinator.
– Any node can decide to abort an Xact.
– Every msg reflects a decision by the sender; to ensure that this decision survives failures, it is first recorded in the local log.
– All commit protocol log recs for an Xact contain the Xact id and the coordinator id. The coordinator's abort/commit record also includes the ids of all subordinates.
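For completeness, a matching sketch of a subordinate's side, under the same assumed `send`/`receive`/`log.force_write` helpers; note how every message is preceded by a forced write of the sender's decision to its local log, as noted above.

```python
def subordinate_2pc(log, coordinator, can_commit, send, receive):
    """Subordinate's side of 2PC (sketch; no timeouts or crashes handled)."""
    assert receive(coordinator) == "prepare"
    if not can_commit:
        log.force_write("abort")          # record the decision before replying
        send(coordinator, "no")
        return "aborted"                  # a "no" voter can abort unilaterally

    log.force_write("prepare")
    send(coordinator, "yes")
    decision = receive(coordinator)       # "commit" or "abort"
    log.force_write(decision)
    send(coordinator, "ack")
    return decision
```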

Restart after a failure at a node
– If we have a commit or abort log rec for Xact T, but no end rec, we must redo/undo T. If this node is the coordinator for T, keep sending commit/abort msgs to the subs until acks are received.
– If we have a prepare log rec for Xact T, but no commit/abort rec, this node is a subordinate for T. Repeatedly contact the coordinator to find the status of T, then write a commit/abort log rec, redo/undo T, and write an end log rec.
– If we don't have even a prepare log rec for T, unilaterally abort and undo T. This node may be the coordinator! If so, subs may send msgs.
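A sketch of the restart decision above, assuming the relevant part of the local log for Xact T has been summarized into a set of record types (a simplification of a real log scan):

```python
def restart_action(log_records: set, is_coordinator: bool) -> str:
    """Decide what to do for Xact T after a crash, given the commit-protocol
    records found for T in the local log (simplified model)."""
    if ("commit" in log_records or "abort" in log_records) and "end" not in log_records:
        action = "redo T" if "commit" in log_records else "undo T"
        if is_coordinator:
            action += "; resend the decision to subs until all acks arrive"
        return action
    if "prepare" in log_records:
        return "ask the coordinator for T's status, then write commit/abort, redo/undo T, write end"
    return "unilaterally abort and undo T"

print(restart_action({"prepare"}, is_coordinator=False))
print(restart_action({"prepare", "commit"}, is_coordinator=True))
```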

2PC: discussion
Guarantees ACID properties, but expensive.
– Communication overhead and forced log writes (I/O).
Relies on a central coordinator: both a performance bottleneck and a single point of failure.
– Other nodes depend on the coordinator, so if it slows down, 2PC will be slow.
– Solution: Paxos, a distributed consensus protocol.

Eventual consistency
“It guarantees that, if no additional updates are made to a given data item, all reads to that item will eventually return the same value.” (Peter Bailis et al., Eventual Consistency Today: Limitations, Extensions, and Beyond, ACM Queue)
The copies are not in sync over periods of time, but they will eventually have the same value: they will converge.
There are several methods to implement eventual consistency; we discuss vector clocks as used in Amazon Dynamo.

Vector clocks
Each data item D has a set of [server, timestamp] pairs: D([S_1,t_1], [S_2,t_2], ...).
Example:
– A client writes D1; the write is handled by server S_X: D1([S_X,1]).
– Another client reads D1 and writes back D2; also handled by server S_X: D2([S_X,2]) (D1 is garbage collected).
– Another client reads D2 and writes back D3; handled by server S_Y: D3([S_X,2], [S_Y,1]).
– Another client reads D2 and writes back D4; handled by server S_Z: D4([S_X,2], [S_Z,1]).
– Another client reads D3 and D4: CONFLICT!

Vector clock: interpretation
A vector clock D[(S_1,v_1),(S_2,v_2),...] means a value that represents version v_1 for S_1, version v_2 for S_2, etc.
If server S_i updates D, then:
– it must increment v_i, if (S_i,v_i) exists;
– otherwise, it must create a new entry (S_i,1).
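A minimal sketch of this update rule, representing a vector clock as a Python dict from (hypothetical) server name to version, and replaying the example from the previous slide:

```python
def updated_clock(clock: dict, server: str) -> dict:
    """Return the vector clock after `server` handles a write (update rule above)."""
    new_clock = dict(clock)
    new_clock[server] = new_clock.get(server, 0) + 1  # increment, or create (server, 1)
    return new_clock

# Replaying the Dynamo-style example:
d1 = updated_clock({}, "S_X")        # {'S_X': 1}
d2 = updated_clock(d1, "S_X")        # {'S_X': 2}
d3 = updated_clock(d2, "S_Y")        # {'S_X': 2, 'S_Y': 1}
d4 = updated_clock(d2, "S_Z")        # {'S_X': 2, 'S_Z': 1}
```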

Vector clock: conflicts
A data item D is an ancestor of D’ if for all (S,v) ∈ D there exists (S,v’) ∈ D’ such that v ≤ v’.
– They are on the same branch; there is no conflict.
Otherwise, D and D’ are on parallel branches, which means they have a conflict that needs to be reconciled semantically.
(A code sketch of this test appears after the examples below.)

Vector clock: conflict examples
Data item 1            Data item 2                      Conflict?
([S_X,3],[S_Y,6])      ([S_X,3],[S_Z,2])                Yes
([S_X,3])              ([S_X,5])                        No
([S_X,3],[S_Y,6])      ([S_X,3],[S_Y,6],[S_Z,2])        No
([S_X,3],[S_Y,10])     ([S_X,3],[S_Y,6],[S_Z,2])        Yes
([S_X,3],[S_Y,10])     ([S_X,3],[S_Y,20],[S_Z,2])       No
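A sketch of the ancestor/conflict test from the previous slide, using dicts for the clocks; the rows of the table above serve as a check:

```python
def is_ancestor(d: dict, d_prime: dict) -> bool:
    """D is an ancestor of D' if every (S, v) in D has (S, v') in D' with v <= v'."""
    return all(s in d_prime and v <= d_prime[s] for s, v in d.items())

def in_conflict(d1: dict, d2: dict) -> bool:
    """Two versions conflict when neither is an ancestor of the other."""
    return not is_ancestor(d1, d2) and not is_ancestor(d2, d1)

# The rows of the table above, with the expected answers.
examples = [
    ({"S_X": 3, "S_Y": 6},  {"S_X": 3, "S_Z": 2},             True),   # Yes
    ({"S_X": 3},            {"S_X": 5},                        False),  # No
    ({"S_X": 3, "S_Y": 6},  {"S_X": 3, "S_Y": 6, "S_Z": 2},    False),  # No
    ({"S_X": 3, "S_Y": 10}, {"S_X": 3, "S_Y": 6, "S_Z": 2},    True),   # Yes
    ({"S_X": 3, "S_Y": 10}, {"S_X": 3, "S_Y": 20, "S_Z": 2},   False),  # No
]
for d1, d2, expected in examples:
    assert in_conflict(d1, d2) == expected
```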

Vector clock: reconciling conflicts
– The client sends the read request to the coordinator.
– The coordinator sends the read request to all N replicas.
– Once it gets R responses (R < N), it returns the data item. This method is called a sloppy quorum.
– If there is a conflict, it informs the developer and returns all conflicting versions with their vector clocks. The developer has to take care of the conflict!
Example: updating a shopping cart (see the sketch below).
– Mark deletions with a flag; merge insertions and deletions.
– Deletion in one branch and addition in the other? The developer may not know which happened earlier. It is a business-logic decision => Amazon prefers to keep the item in the shopping cart!
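One possible cart reconciliation in the spirit described above (a sketch, not Dynamo's actual code): each conflicting version maps items to a deletion flag (tombstone), insertions and deletions are merged, and an item deleted in one branch but still live in another is kept.

```python
def merge_carts(versions):
    """Merge conflicting shopping-cart versions.
    Each version maps item -> {"deleted": bool}; deletions are tombstone flags.
    If any branch still has the item (not deleted), the merged cart keeps it."""
    merged = {}
    for cart in versions:
        for item, state in cart.items():
            if item not in merged:
                merged[item] = dict(state)
            else:
                # Item stays live if it is live in at least one branch.
                merged[item]["deleted"] = merged[item]["deleted"] and state["deleted"]
    return merged

v1 = {"book": {"deleted": False}, "pen": {"deleted": True}}
v2 = {"book": {"deleted": True},  "mug": {"deleted": False}}
print(merge_carts([v1, v2]))
# {'book': {'deleted': False}, 'pen': {'deleted': True}, 'mug': {'deleted': False}}
```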

Vector clocks: discussion
This approach avoids the communication overhead and waiting time of 2PC and ACID.
– Better running time.
Developers have to resolve the conflicts.
– This may be hard for complex applications.
– Dynamo's argument: conflicts rarely happened in their applications of interest.
– Their experiments are not exhaustive.
There is not (yet) a final answer on choosing between ACID and eventual consistency.
– Know what you gain and what you sacrifice; make the decision based on your application(s).

CAP Theorem
About the properties of distributed data systems. Proposed by Eric Brewer.
– Consistency: all replicas should have the same value.
– Availability: all read/write operations should return successfully.
– Tolerance to partitions: the system should tolerate network partitions.
“CAP Theorem”: a distributed data system can have only two of the aforementioned properties.
– Not really a theorem; the concepts are not formalized.

CAP Theorem illustration
Setup: replicas R1, R2, R3 are stored on nodes A and B.
– Both nodes available, no network partition: updating A.R1 => inconsistency; sacrificing consistency (C).
– To make it consistent => one node shuts down; sacrificing availability (A).
– To make it consistent => the nodes must communicate; sacrificing tolerance to partitions (P).
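A toy sketch of the same trade-off (not from the slides): a two-replica register that, during a simulated partition, either accepts the write and lets copies diverge (giving up C) or rejects it (giving up A).

```python
class ReplicatedRegister:
    """Toy two-replica register; `partitioned` simulates a network partition."""
    def __init__(self):
        self.copies = {"A": 0, "B": 0}
        self.partitioned = False

    def write(self, node, value, policy):
        if not self.partitioned:
            for n in self.copies:          # no partition: update all copies
                self.copies[n] = value
            return "ok"
        if policy == "AP":                 # stay available, give up consistency
            self.copies[node] = value      # the copies now disagree
            return "ok (copies may diverge)"
        return "error: write rejected during partition"   # "CP": give up availability
```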

CAP Theorem: examples
Consistency and availability, but no tolerance to partitions:
– a single-machine DBMS.
Consistency and tolerance to partitions, but no availability:
– the majority protocol in a distributed DBMS; it makes minority partitions unavailable.
Availability and tolerance to partitions, but no consistency:
– DNS.

Justification for NoSQL based on CAP
Distributed data systems cannot forfeit tolerance to partitions (P).
– They must choose between consistency (C) and availability (A).
Availability is more important for the business!
– It keeps customers buying stuff!
So we should sacrifice consistency.

Criticism of CAP
Many have criticized it, including Brewer himself in a 2012 article in Computer magazine.
– It is not really a “theorem”, as the concepts are not well defined. A version was formalized and proved later, but under more limited conditions.
– C, A, and P are not binary; for example, availability is measured over a period of time.
– Subsystems may make their own individual choices.