
Preventive Replication in Database Cluster
Esther Pacitti, Cedric Coulon, Patrick Valduriez, M. Tamer Özsu*
LINA / INRIA – Atlas Group, University of Nantes - France
* University of Waterloo - Canada
July 2005

2 LINA / INRIA – Atlas Group
Outline
- Motivations
- Cluster Architecture
- Preventive Replication
- Multi-Master
- Partially Replicated configurations
- Replication Manager Architecture
- Optimizations
- RepDB* Prototype
- Experiments
- Conclusions, Current and Future Work

3 LINA / INRIA – Atlas Group
Motivations
- Applications and data are asynchronously replicated among a set of cluster nodes connected by a fast and reliable network, to improve the response time of user requests
- Lazy preventive replication is used to enforce data consistency
[Figure: a cluster of n PC nodes serving external user requests]

4 LINA / INRIA – Atlas Group
Cluster Architecture
[Figure: cluster system architecture]

5 LINA / INRIA – Atlas Group
Preventive Replication (1)
Properties:
- Strong consistency
- Non-blocking
- High scale-up and speed-up
- High data availability

6 LINA / INRIA – Atlas Group
Preventive Replication (2)
Assumptions:
- The network interface provides FIFO reliable multicast
- Max is an upper bound on the time needed to multicast a message from a node i until it is received at a node j
- Clocks are ε-synchronized
- Each transaction carries a timestamp value C (its arrival time)

7 LINA / INRIA – Atlas Group
Preventive Replication (3)
Consistency Criteria:
- Total order enforcement: transactions are received in the same order at all involved nodes, and this order corresponds to the execution order
- To enforce total order, transactions are chronologically ordered at each node using their delivery_time value: delivery_time = C + Max + ε
[Figure: T is received at node i; nodes i and j each wait until T's delivery_time before executing it]
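A minimal sketch of this chronological scheduling, assuming millisecond timestamps and invented values for Max and ε (the class and method names are illustrative, not RepDB*'s actual code):

    import java.util.PriorityQueue;

    // A transaction stamped with its arrival time C at the origin node.
    class Transaction {
        final long c;          // timestamp C (ms)
        final String sql;
        Transaction(long c, String sql) { this.c = c; this.sql = sql; }
    }

    // Orders incoming transactions by delivery_time = C + Max + epsilon.
    class DeliveryScheduler {
        static final long MAX = 20;      // assumed multicast upper bound (ms)
        static final long EPSILON = 5;   // assumed clock precision (ms)

        static long deliveryTime(Transaction t) { return t.c + MAX + EPSILON; }

        private final PriorityQueue<Transaction> queue = new PriorityQueue<>(
            (a, b) -> Long.compare(deliveryTime(a), deliveryTime(b)));

        synchronized void receive(Transaction t) { queue.add(t); notifyAll(); }

        // Returns the oldest transaction once its delivery time has expired.
        synchronized Transaction nextDeliverable() throws InterruptedException {
            while (true) {
                while (queue.isEmpty()) wait();
                long waitMs = deliveryTime(queue.peek()) - System.currentTimeMillis();
                if (waitMs <= 0) return queue.poll();
                wait(waitMs);  // an older transaction may still arrive meanwhile
            }
        }
    }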

8 LINA / INRIA – Atlas Group
Preventive Replication (4)
Whenever a node i receives T:
- Propagation: it multicasts T to all nodes, including itself
- Scheduling: at each node, T's delivery_time expires only when T is the oldest pending transaction
- Execution: when T's delivery_time expires, T is entirely executed
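Putting the three phases together, the refresher on one node might look like this sketch (reusing Transaction and DeliveryScheduler above; multicast() and execute() are stand-ins for the group-communication layer and the local DBMS):

    // Hypothetical sketch of Propagation / Scheduling / Execution on one node.
    class RefresherNode {
        private final DeliveryScheduler scheduler = new DeliveryScheduler();

        void multicast(Transaction t) { /* FIFO reliable multicast to all nodes, incl. self */ }
        void execute(Transaction t)   { /* run T entirely against the local DBMS */ }

        // Propagation: an incoming transaction is multicast to every node.
        void onIncoming(Transaction t) { multicast(t); }

        // The network layer feeds delivered transactions to the scheduler.
        void onDelivery(Transaction t) { scheduler.receive(t); }

        // Scheduling + Execution: T runs only once it is the oldest pending
        // transaction and its delivery time has expired.
        void refresherLoop() throws InterruptedException {
            while (true) {
                execute(scheduler.nextDeliverable());
            }
        }
    }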

9 LINA / INRIA – Atlas Group Partial Architecture

10 LINA / INRIA – Atlas Group
Preventive Replication (5)
- PRIMARY copies (R): can be updated only on the master node
- Secondary copies (r): read-only
- MULTIMASTER copies (R1): can be updated on more than one node
[Figure: example configurations — bowtie (primaries R and S with secondaries r', s', r'', s''), fully replicated (R1,S1 through R4,S4), and two partially replicated layouts]
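The three copy types could be modeled roughly as follows (a sketch with invented names, not RepDB*'s actual code):

    // Sketch: which copies a node is allowed to update.
    enum CopyKind { PRIMARY, SECONDARY, MULTIMASTER }

    class Copy {
        final String table;    // e.g., "R"
        final CopyKind kind;
        Copy(String table, CopyKind kind) { this.table = table; this.kind = kind; }

        boolean updatableOn(boolean masterNode) {
            switch (kind) {
                case PRIMARY:     return masterNode;  // updated only on the master node
                case SECONDARY:   return false;       // read-only
                case MULTIMASTER: return true;        // updatable on more than one node
                default:          return false;
            }
        }
    }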

11 LINA / INRIA – Atlas Group
Preventive Replication (6)
- Introduces a Max + ε delay time
  - Negligible in cluster networks
  - Critical in bursty workloads
- Data placement restrictions (lazy-master, fully replicated)
- In fully replicated configurations:
  - Overhead of message exchanges
  - Not all nodes may have enough space to store all replicas
=> Free data placement

12 LINA / INRIA – Atlas Group
Partially Replicated Configurations (1)
- When the data are not fully replicated, some transactions cannot be executed on target nodes
Example: UPDATE r SET c1 WHERE c2 IN (SELECT c3 FROM s);
[Figure: transaction T1(R, S) arrives at N1, which holds R1 and S1; N2 holds only R2 and N3 holds only S2]

13 LINA / INRIA – Atlas Group
Partially Replicated Configurations (2)
- On target nodes, T1 waits after its selection (Step 3)
- At the end of the execution on the origin node, a refresh transaction (RT1) is multicast to the target nodes (Step 4)
- RT1 is executed to update the replicated data (Step 5)
[Figure: five steps over nodes N1 (R1, S1), N2 (R2), N3 (S2) — Step 1: the client submits T1(rS, wR) to N1; Step 2: T1 is multicast; Step 3: target nodes stand by; Step 4: N1 multicasts RT1(wR); Step 5: target nodes perform RT1 and the client receives its answer]
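On the origin node, the refresh transaction could be built along these lines (a sketch; RefreshTransaction and captureWrites() are invented names, with write-set capture standing in for the Log Monitor of the RepDB* prototype):

    import java.util.List;

    // RT carries only T's writes, under T's original timestamp, so target
    // nodes missing some copies can still apply the updates in order.
    class RefreshTransaction {
        final long c;                // same timestamp C as the original T
        final List<String> writes;   // T's write statements only
        RefreshTransaction(long c, List<String> writes) { this.c = c; this.writes = writes; }
    }

    class OriginNode {
        // Step 4: after T finishes locally, multicast RT to the target nodes.
        RefreshTransaction buildRefresh(Transaction t) {
            return new RefreshTransaction(t.c, captureWrites(t));
        }

        List<String> captureWrites(Transaction t) {
            return List.of();        // stand-in for monitoring the DBMS log
        }
    }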

14 LINA / INRIA – Atlas Group
Data Placement
- Tables must have a primary key
- A node i cannot hold a primary copy of a table whose foreign keys reference tables that are not held by node i
Example: ITEM and ORDER — if N3 holds ORDER but not ITEM, an order could be placed on N3 for an item that does not exist there
[Figure: nodes N1, N2, N3, with ORDER placed on N3 without ITEM]
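The placement rule can be checked mechanically; a sketch with invented names:

    import java.util.Map;
    import java.util.Set;

    // A node may hold a primary copy of a table only if it also holds every
    // table that this table references through foreign keys.
    class PlacementChecker {
        static boolean validNode(Set<String> tablesOnNode,
                                 Set<String> primaryCopiesOnNode,
                                 Map<String, Set<String>> foreignKeys) {
            for (String table : primaryCopiesOnNode) {
                Set<String> referenced = foreignKeys.getOrDefault(table, Set.of());
                if (!tablesOnNode.containsAll(referenced)) {
                    return false;  // e.g., ORDER on N3 without ITEM
                }
            }
            return true;
        }
    }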

15 LINA / INRIA – Atlas Group Replication Manager Architecture

16 LINA / INRIA – Atlas Group
Optimization: Eliminating delay times (1)
- In a cluster network, messages are naturally totally ordered
- Schedule a transaction in parallel with its execution (see the sketch below):
  - Submit a transaction for execution as soon as it is received
  - Schedule only the commit order of transactions: a transaction can be committed only after Max + ε
  - Abort and re-execute all younger transactions when a transaction is received out of order
- Concurrent execution of non-conflicting transactions
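A sketch of the optimized refresher (invented names, reusing Transaction and DeliveryScheduler from above): execution starts immediately, the commit still waits for Max + ε, and an out-of-order delivery aborts the younger transactions:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.TreeSet;

    class OptimizedRefresher {
        // Transactions currently executing, ordered by timestamp C (oldest first).
        private final TreeSet<Transaction> running =
            new TreeSet<>(Comparator.comparingLong((Transaction t) -> t.c));

        synchronized void onDelivery(Transaction t) {
            // T arrived out of order: abort and re-execute every younger
            // transaction that was started optimistically.
            for (Transaction younger : new ArrayList<>(running.tailSet(t, false))) {
                running.remove(younger);
                abortAndReexecute(younger);
            }
            running.add(t);
            startExecution(t);   // execute in parallel with scheduling
        }

        // Commit order is still enforced by the delivery time.
        void tryCommit(Transaction t) throws InterruptedException {
            long waitMs = DeliveryScheduler.deliveryTime(t) - System.currentTimeMillis();
            if (waitMs > 0) Thread.sleep(waitMs);
            commit(t);
            synchronized (this) { running.remove(t); }
        }

        void startExecution(Transaction t)    { /* submit to the DBMS, hold the commit */ }
        void abortAndReexecute(Transaction t) { /* rollback, then resubmit in order */ }
        void commit(Transaction t)            { /* commit in the DBMS */ }
    }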

17 LINA / INRIA – Atlas Group
Optimization: Eliminating delay times (2)
[Figure: timelines — in preventive replication, T goes through Scheduling, then Execution; in optimized preventive replication, Scheduling and Execution overlap, followed by Validation, with an Abort when T arrives out of order]

18 LINA / INRIA – Atlas Group
Optimization Example (3)

19 LINA / INRIA – Atlas Group
Optimization: Eliminating delay times (4)
- Without the optimization, the refreshment time of a transaction T is always Max + ε + t
- With the optimization, the refreshment time of T is max(Max + ε, t), where t is the time spent executing T
- For example, if Max + ε = 20 ms and t = 50 ms, refreshment takes 50 ms instead of 70 ms

20 LINA / INRIA – Atlas Group
RepDB* Prototype: Architecture
[Figure: RepDB* sits between JDBC clients and the DBMS; its components are a Replica Interface, a JDBC server, a DBMS-specific Log Monitor, and a Propagator, Receiver, Refresher, and Deliver module connected over the network]

21 LINA / INRIA – Atlas Group
RepDB* Prototype: Implementation
- Java (around … lines)
- The DBMS is treated as a black box
- JDBC interface (RMI-JDBC)
- Uses the Spread toolkit to manage the network (Center for Networking and Distributed Systems - CNDS)
- Simulation version (SimJava)

22 LINA / INRIA – Atlas Group
Replicas definition (1)
- A file contains the replica placement specification, listing which copies of R, S, and T each node holds

23 LINA / INRIA – Atlas Group
Interface: Applications / RepDB* (2)

    Connection c;
    Statement s;
    // Load the RepDB* JDBC driver
    Class.forName("org.atlas.repdb.jdbc.Driver");
    c = DriverManager.getConnection(
            "jdbc:repdb://node0:4444/", "login", "password");
    s = c.createStatement();
    // The leading "R, S T" token names the replicas involved
    // (R and S are written, T is read)
    s.executeUpdate(
        "R, S T " +
        "UPDATE R SET att2 = 1 WHERE att1 IN " +
        "(SELECT att3 FROM T); " +
        "UPDATE S SET att2 = 1 WHERE att1 NOT IN " +
        "(SELECT att3 FROM T);");
    s.close();
    c.close();

24 LINA / INRIA – Atlas Group
Experiments (1): TPC-C benchmark
- 1 / 5 / 10 warehouses
- 10 clients per warehouse
- Transaction arrival rate: 1 s / 200 ms / 100 ms
- 4 types of transactions:
  - New-order: read-write, high frequency (45%)
  - Payment: read-write, high frequency (45%)
  - Order-status: read-only, low frequency (5%)
  - Stock-level: read-only, low frequency (5%)

25 LINA / INRIA – Atlas Group
Experiments (2)
- Cluster of 64 nodes
- PostgreSQL
- 1 Gb/s network
- 2 configurations:
  - Fully Replicated (FR)
  - Partially Replicated (PR): each type of TPC-C transaction runs on ¼ of the nodes

26 LINA / INRIA – Atlas Group
Experiments (3): Scale up
[Graphs: a) Fully Replicated (FR), b) Partially Replicated (PR)]

27 LINA / INRIA – Atlas Group
Experiments (4): Speed up
- In addition, 128 clients are launched that submit Order-status transactions (read-only)
[Graphs: a) Fully Replicated (FR), b) Partially Replicated (PR)]

28 LINA / INRIA – Atlas Group
Experiments (5): Unordered messages
[Graphs: a) Fully Replicated (FR), b) Partially Replicated (PR)]

29 LINA / INRIA – Atlas Group
Experiments (6): Delay vs. transaction size

30 LINA / INRIA – Atlas Group
Conclusions
- Preventive replication provides:
  - Strong consistency
  - Conflict prevention for partially replicated databases
  - Full node autonomy
  - Scale-up and speed-up
- Experiments show that the configuration and the placement of the copies should be tuned to the selected types of transactions

31 LINA / INRIA – Atlas Group
Current and Future Work
- Preventive replication for P2P systems:
  - Small and dynamic multi-master groups
  - Max computed dynamically
  - Small and dynamic slave groups
- Optimistic replication
- Distributed semantic reconciliation

32 LINA / INRIA – Atlas Group Thanks ! Merci ! Obrigado ! Questions ?