Preventive Replication in Database Cluster Esther Pacitti, Cedric Coulon, Patrick Valduriez, M. Tamer Özsu* LINA / INRIA – Atlas Group University of Nantes.


Slide 1: Preventive Replication in Database Cluster
Esther Pacitti, Cedric Coulon, Patrick Valduriez, M. Tamer Özsu*
LINA / INRIA – Atlas Group, University of Nantes, France
* University of Waterloo, Canada
July 2005

Slide 2: Outline
- Motivations
- Cluster Architecture
- Preventive Replication
- Multi-Master
- Partially Replicated configurations
- Replication Manager Architecture
- Optimizations
- RepDB* Prototype
- Experiments
- Conclusions, Current and Future Work

Slide 3: Motivations
- Applications and data are asynchronously replicated among a set of cluster nodes connected by a fast and reliable network, to improve the response times of user requests
- Lazy preventive replication is used to enforce data consistency
[Figure: external user requests submitted to a cluster of n PC nodes]

Slide 4: Cluster Architecture
[Figure: cluster system architecture]

Slide 5: Preventive Replication (1)
Properties:
- Strong consistency
- Non-blocking
- High scale-up and speed-up
- High data availability

Slide 6: Preventive Replication (2)
Assumptions:
- The network interface provides FIFO reliable multicast
- Max is the upper bound of the time needed to multicast a message from a node i and have it received at a receiving node j
- Clocks are ε-synchronized
- Each transaction has a timestamp C (its arrival time)

Slide 7: Preventive Replication (3)
Consistency criterion:
- Total order enforcement: transactions are delivered in the same order at all involved nodes, and this order is their execution order
- To enforce total order, transactions are chronologically ordered at each node using their delivery_time value: delivery_time = C + Max + ε
[Figure: T is received at node i and propagated to node j; each node waits until delivery_time before executing T]
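
The ordering rule above can be sketched in Java as follows. This is a minimal illustration, not RepDB* code: the names DeliveryOrder, Txn, MAX_MS and EPS_MS are hypothetical, times are assumed to be in milliseconds, and breaking ties between equal delivery_time values by origin node id is an assumption the slide does not state. Each node computes delivery_time = C + Max + ε and keeps received transactions in a priority queue ordered by that value, so nodes that receive the same transactions order them the same way.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class DeliveryOrder {
    // Hypothetical bounds in milliseconds: Max (multicast delay bound) and ε (clock skew bound)
    static final long MAX_MS = 10;
    static final long EPS_MS = 2;

    // A transaction with its origin timestamp C and, as an assumed tie-breaker, its origin node id
    record Txn(String id, long c, int originNode) {
        long deliveryTime() { return c + MAX_MS + EPS_MS; }
    }

    // Chronological order by delivery_time; equal values are broken by origin node id (assumption)
    static final Comparator<Txn> ORDER =
        Comparator.comparingLong(Txn::deliveryTime).thenComparingInt(Txn::originNode);

    public static void main(String[] args) {
        PriorityQueue<Txn> pending = new PriorityQueue<>(ORDER);
        // Messages may arrive in any order; the queue restores the chronological order
        pending.add(new Txn("T2", 105, 1));
        pending.add(new Txn("T1", 100, 2));
        pending.add(new Txn("T3", 105, 0));
        while (!pending.isEmpty()) {
            Txn t = pending.poll();
            System.out.println(t.id() + " delivery_time=" + t.deliveryTime());
        }
    }
}
```

Running main prints T1 first (delivery_time 112), then T3 and T2 (both 117, ordered by origin node id).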

Slide 8: Preventive Replication (4)
Whenever a node i receives a transaction T:
- Propagation: node i multicasts T to all nodes, including itself
- Scheduling: at each node, T is released for execution only when its delivery_time has expired and it is the oldest pending transaction
- Execution: when T's delivery_time expires, T is executed in its entirety
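
A single node's scheduler can be sketched like this, under stated assumptions: the names are hypothetical, and an explicit now argument stands in for the node's ε-synchronized clock. The queue holds the transactions received through the multicast, including those the node sent to itself, and only the oldest transaction is released, once its delivery_time has passed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class PreventiveScheduler {
    record Txn(String id, long deliveryTime) {}

    // Pending transactions at one node, oldest delivery_time first
    private final PriorityQueue<Txn> pending =
        new PriorityQueue<>(Comparator.comparingLong(Txn::deliveryTime));

    // T arrives through the reliable FIFO multicast (also at its own origin node)
    void receive(Txn t) { pending.add(t); }

    // Returns, in order, the transactions that may run at local time 'now':
    // the head of the queue is released only once its delivery_time has expired
    List<Txn> releaseReady(long now) {
        List<Txn> ready = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().deliveryTime() <= now) {
            ready.add(pending.poll()); // each released transaction is executed entirely, in order
        }
        return ready;
    }

    public static void main(String[] args) {
        PreventiveScheduler node = new PreventiveScheduler();
        node.receive(new Txn("T2", 120));
        node.receive(new Txn("T1", 112));
        System.out.println(node.releaseReady(115)); // only T1 has expired
        System.out.println(node.releaseReady(125)); // then T2
    }
}
```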

Slide 9: Partial Architecture
[Figure: architecture for partially replicated configurations]

Slide 10: Preventive Replication (4)
[Figure: example configurations: Bowtie (primary copies R and S, secondary copies r', s' and r'', s''), Fully replicated (R1, S1 to R4, S4), and two Partially replicated configurations mixing multimaster, primary and secondary copies]
- Primary copies (R): can be updated only on the master node
- Secondary copies (r): read-only
- Multimaster copies (R1): can be updated on more than one node
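
The three kinds of copies can be modeled with a small enum and an update check. This is an illustrative sketch with hypothetical names (CopyTypes, canUpdate) and hypothetical node names; it only encodes the rules listed on the slide.

```java
import java.util.Map;

class CopyTypes {
    // The three kinds of copies listed on the slide
    enum CopyType { PRIMARY, SECONDARY, MULTIMASTER }

    // Can a transaction updating this relation run at 'node'?
    // Primary copies are updated only on their master node, secondary copies are read-only,
    // and multimaster copies can be updated on every node holding one.
    // A node holding no copy (null) cannot update the relation either.
    static boolean canUpdate(Map<String, CopyType> placement, String node) {
        CopyType type = placement.get(node);
        return type == CopyType.PRIMARY || type == CopyType.MULTIMASTER;
    }

    public static void main(String[] args) {
        // Hypothetical lazy-master placement of R: primary on N1, secondary copies on N2 and N3
        Map<String, CopyType> r = Map.of(
            "N1", CopyType.PRIMARY, "N2", CopyType.SECONDARY, "N3", CopyType.SECONDARY);
        System.out.println(canUpdate(r, "N1")); // true
        System.out.println(canUpdate(r, "N2")); // false

        // Fully replicated R: multimaster copies R1 to R4
        Map<String, CopyType> full = Map.of(
            "N1", CopyType.MULTIMASTER, "N2", CopyType.MULTIMASTER,
            "N3", CopyType.MULTIMASTER, "N4", CopyType.MULTIMASTER);
        System.out.println(canUpdate(full, "N3")); // true
    }
}
```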

Slide 11: Preventive Replication (5)
- Introduces a Max + ε delay time
  - Negligible in cluster networks
  - Critical in bursty workloads
- Data placement restrictions: only lazy-master and fully replicated configurations
- In fully replicated configurations:
  - Overhead of message exchanges
  - Not all nodes may have enough space to store all replicas => need for free data placement

Slide 12: Partially Replicated Configurations (1)
- When data are not fully replicated, some transactions cannot be executed entirely on their target nodes
- Example: UPDATE R SET c1 WHERE c2 IN (SELECT c3 FROM S);
[Figure: transaction T1(R, S) over nodes N1, N2 and N3; N1 holds R1 and S1, while R2 and S2 are held on the other two nodes]
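
Whether a node can run such a transaction on its own can be checked from the relations the transaction reads and writes. The sketch below is hypothetical (the read/write-set model and the names are not from RepDB*): T1 reads S and writes R, so a node holding only a copy of R is a target of T1 but cannot execute it locally.

```java
import java.util.Set;

class PartialReplication {
    // A transaction described by the relations it reads and writes (hypothetical model)
    record Txn(String id, Set<String> readSet, Set<String> writeSet) {}

    // A node can execute T by itself only if it holds every relation T reads or writes
    static boolean canExecuteLocally(Txn t, Set<String> held) {
        return held.containsAll(t.readSet()) && held.containsAll(t.writeSet());
    }

    // A node is a target of T if it holds a copy of at least one relation T writes
    static boolean isTarget(Txn t, Set<String> held) {
        return t.writeSet().stream().anyMatch(held::contains);
    }

    public static void main(String[] args) {
        // T1 reads S and writes R, as in UPDATE R ... WHERE c2 IN (SELECT c3 FROM S)
        Txn t1 = new Txn("T1", Set.of("S"), Set.of("R"));
        Set<String> n1 = Set.of("R", "S"); // R1, S1
        Set<String> onlyR = Set.of("R");   // a node holding R2 only
        System.out.println(canExecuteLocally(t1, n1));    // true
        System.out.println(isTarget(t1, onlyR));          // true: its copy of R must be refreshed
        System.out.println(canExecuteLocally(t1, onlyR)); // false: S is missing there
    }
}
```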

Slide 13: Partially Replicated Configurations (2)
- On target nodes, T1 waits after its selection (Step 3)
- At the end of its execution on the origin node, a refresh transaction (RT1) is multicast to the target nodes (Step 4)
- RT1 is executed to update the replicated data (Step 5)
[Figure, Steps 1 to 5: a client submits T1(r_S, w_R), which reads S and writes R, to a cluster where N1 holds R1 and S1, N2 holds R2 and N3 holds S2; T1 is put on standby at the target nodes, RT1(w_R) is multicast and performed, and the client receives the answer to T1]
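
The refresh step can be sketched by modeling a copy of R as a map from primary key to one column value. This is an assumption-laden illustration (hypothetical names, a single updated column): the origin node executes T1 entirely and packages its writes on R as RT1, and a target node that put T1 on standby applies RT1 instead of executing T1, since it lacks S.

```java
import java.util.HashMap;
import java.util.Map;

class RefreshTransaction {
    // Hypothetical model: a copy of R maps primary key -> value of the updated column.
    // The refresh transaction RT carries only T's writes: the new values of the rows T updated.
    record Refresh(String txnId, Map<Integer, Integer> writes) {}

    // Origin node: executes T entirely (its new values, computed from S, are passed in)
    // and builds RT(w_R) to be multicast to the target nodes
    static Refresh executeAtOrigin(String txnId, Map<Integer, Integer> r,
                                   Map<Integer, Integer> updates) {
        r.putAll(updates);
        return new Refresh(txnId, Map.copyOf(updates));
    }

    // Target node: T was put on standby; applying RT replaces executing T
    static void applyRefresh(Refresh rt, Map<Integer, Integer> rCopy) {
        rCopy.putAll(rt.writes());
    }

    public static void main(String[] args) {
        Map<Integer, Integer> rAtN1 = new HashMap<>(Map.of(1, 0, 2, 0)); // R1
        Map<Integer, Integer> rAtN2 = new HashMap<>(Map.of(1, 0, 2, 0)); // R2, no copy of S here
        Refresh rt1 = executeAtOrigin("T1", rAtN1, Map.of(2, 1));
        applyRefresh(rt1, rAtN2);
        System.out.println(rAtN1.equals(rAtN2)); // true: both copies of R agree
    }
}
```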

Slide 14: Data Placement
- Tables must have a primary key
- A node i cannot hold primary copies of tables whose foreign keys reference tables that node i does not hold
- Example with tables ITEM and ORDER: on N3, which holds ORDER but not ITEM, an order could be placed on an item that does not exist
[Figure: placement of ITEM and ORDER on nodes N1, N2 and N3]
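
The two placement rules can be checked mechanically for each node. The sketch below is hypothetical (method and parameter names are illustrative) and reproduces the ITEM / ORDER example, where ORDER has a foreign key referencing ITEM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PlacementCheck {
    // Returns the placement rules violated by one node (an empty list means a valid placement).
    // primaryCopies: tables whose primary copy the node holds; held: all tables it holds;
    // hasPk: tables declaring a primary key; fks: table -> tables its foreign keys reference.
    static List<String> violations(Set<String> primaryCopies, Set<String> held,
                                   Set<String> hasPk, Map<String, Set<String>> fks) {
        List<String> errors = new ArrayList<>();
        for (String t : held) {
            if (!hasPk.contains(t)) errors.add(t + " has no primary key");
        }
        for (String t : primaryCopies) {
            for (String ref : fks.getOrDefault(t, Set.of())) {
                if (!held.contains(ref)) errors.add(t + " references " + ref + ", which is not held");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> fks = Map.of("ORDER", Set.of("ITEM"));
        Set<String> pk = Set.of("ITEM", "ORDER");
        // N3 holds the primary copy of ORDER but not ITEM
        System.out.println(violations(Set.of("ORDER"), Set.of("ORDER"), pk, fks));
        // prints [ORDER references ITEM, which is not held]
        System.out.println(violations(Set.of("ORDER"), Set.of("ORDER", "ITEM"), pk, fks));
        // prints []
    }
}
```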

Slide 15: Replication Manager Architecture
[Figure: replication manager architecture]

Slide 16: Optimization: Eliminating delay times (1)
- In a cluster network, messages are usually received in total order
- Schedule a transaction in parallel with its execution
- Submit a transaction for execution as soon as it is received
- Schedule the commit order of the transactions: a transaction can be committed only after Max + ε
- Abort and re-execute all younger transactions when a transaction is received out of order
- Concurrent execution of non-conflicting transactions
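
The optimized scheduling can be sketched as follows, treating execution symbolically as log entries. This is a simplified illustration with hypothetical names: it assumes unique delivery_time values and, as the slide states, aborts all younger uncommitted transactions (not only conflicting ones) when an older transaction arrives out of order. Under the Max + ε assumption, an older transaction arrives before any younger one reaches its commit point, so only uncommitted transactions ever need to be aborted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class OptimisticScheduler {
    // Uncommitted transactions keyed by delivery_time (assumed unique for simplicity)
    private final TreeMap<Long, String> running = new TreeMap<>();
    final List<String> log = new ArrayList<>(); // trace of execute / abort / commit actions

    // Start executing T as soon as it is received instead of waiting for Max + ε
    void receive(String id, long deliveryTime) {
        // Younger transactions already started must be aborted, then re-executed after T
        List<Long> younger = new ArrayList<>(running.tailMap(deliveryTime, false).keySet());
        for (Long dt : younger) log.add("abort " + running.get(dt));
        running.put(deliveryTime, id);
        log.add("execute " + id);
        for (Long dt : younger) log.add("execute " + running.get(dt));
    }

    // Commit in delivery_time order, and only once Max + ε has elapsed (delivery_time <= now)
    void commitReady(long now) {
        while (!running.isEmpty() && running.firstKey() <= now) {
            log.add("commit " + running.pollFirstEntry().getValue());
        }
    }

    public static void main(String[] args) {
        OptimisticScheduler s = new OptimisticScheduler();
        s.receive("T2", 120);
        s.receive("T1", 112); // T1 is older but arrives out of order
        s.commitReady(125);
        System.out.println(s.log);
    }
}
```

Running main records: execute T2, abort T2, execute T1, execute T2, commit T1, commit T2.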

Slide 17: Optimization: Eliminating delay times (2)
[Figure: timelines of Scheduling, Execution of T and Validation in preventive replication vs. optimized preventive replication, where execution overlaps scheduling and may end in an abort]

Slide 18: Optimization Example (3)

Slide 19: Optimization: Eliminating delay times (4)
Let t be the time spent executing a transaction T.
- Without the optimization, the refreshment time of T is always Max + ε + t
- With the optimization, the refreshment time of T is max(Max + ε, t)
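
The two formulas can be compared numerically. The values below (Max = 10 and ε = 2, in hypothetical time units) are illustrative, not measurements from the experiments. Since a + b - max(a, b) = min(a, b), the optimization saves min(Max + ε, t) per transaction.

```java
class RefreshmentTime {
    // Without the optimization, T waits Max + ε and then runs for t
    static long withoutOptimization(long max, long eps, long t) { return max + eps + t; }

    // With the optimization, waiting and execution overlap
    static long withOptimization(long max, long eps, long t) { return Math.max(max + eps, t); }

    public static void main(String[] args) {
        long max = 10, eps = 2; // hypothetical values
        for (long t : new long[] {5, 30}) {
            System.out.println("t=" + t
                + ": without=" + withoutOptimization(max, eps, t)
                + ", with=" + withOptimization(max, eps, t));
        }
        // prints t=5: without=17, with=12
        //        t=30: without=42, with=30
    }
}
```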

Slide 20: RepDB* Prototype: Architecture
[Figure: clients access RepDB* through a JDBC Replica Interface (JDBC server); RepDB* comprises a DBMS-specific Log Monitor, Propagator, Receiver, Refresher and Deliver, connected to the network and, through JDBC, to the DBMS]

Slide 21: RepDB* Prototype: Implementation
- Java (around 10,000 lines)
- The DBMS is treated as a black box
- JDBC interface (RMI-JDBC)
- Network managed with the Spread toolkit (Center for Networking and Distributed Systems, CNDS)
- Simulation version (SimJava)
- http://www.sciences.univnantes.fr/ATLAS/RepDB

Slide 22: Replicas definition (1)
- A file contains the replica placement specification
[Figure: example placement file for tables R, S and T]

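Slide 23: Interface: Applications / RepDB* (2)

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RepDBClient {
    public static void main(String[] args) throws Exception {
        // Load the RepDB* JDBC driver
        Class.forName("org.atlas.repdb.jdbc.Driver");
        // Connect to the RepDB* replica interface on node0, port 4444;
        // the statement and the connection are closed automatically
        try (Connection c = DriverManager.getConnection(
                 "jdbc:repdb://node0:4444/", "login", "password");
             Statement s = c.createStatement()) {
            // The leading " R, S T " part appears to declare the relations accessed:
            // R and S (updated) and T (read)
            s.executeUpdate(
                " R, S T " +
                "UPDATE R SET att2 = 1 WHERE att1 IN " +
                "(SELECT att3 FROM T); " +
                "UPDATE S SET att2 = 1 WHERE att1 NOT IN " +
                "(SELECT att3 FROM T);");
        }
    }
}
```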

Slide 24: Experiments (1): TPC-C benchmark
- 1 / 5 / 10 warehouses
- 10 clients per warehouse
- Transaction arrival rate: one every 1 s / 200 ms / 100 ms
- 4 types of transactions:
  - New-order: read-write, high frequency (45%)
  - Payment: read-write, high frequency (45%)
  - Order-status: read-only, low frequency (5%)
  - Stock-level: read-only, low frequency (5%)
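
The transaction mix above can be reproduced by a client-side generator such as the one below. It is a sketch, not the TPC-C driver used in the experiments: the names are hypothetical, and only the 45/45/5/5 mix and the 10 clients per warehouse come from the slide.

```java
import java.util.Random;

class TpccMix {
    enum TxType { NEW_ORDER, PAYMENT, ORDER_STATUS, STOCK_LEVEL }

    // Picks a transaction type following the mix: 45% / 45% / 5% / 5%
    static TxType pick(Random rnd) {
        int p = rnd.nextInt(100); // uniform in [0, 100)
        if (p < 45) return TxType.NEW_ORDER;
        if (p < 90) return TxType.PAYMENT;
        if (p < 95) return TxType.ORDER_STATUS;
        return TxType.STOCK_LEVEL;
    }

    // 10 clients per warehouse: 1 / 5 / 10 warehouses give 10 / 50 / 100 clients
    static int clients(int warehouses) { return 10 * warehouses; }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] counts = new int[TxType.values().length];
        for (int i = 0; i < 10_000; i++) counts[pick(rnd).ordinal()]++;
        for (TxType t : TxType.values()) System.out.println(t + ": " + counts[t.ordinal()]);
        System.out.println("clients for 10 warehouses: " + clients(10));
    }
}
```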

Slide 25: Experiments (2)
- Cluster of 64 nodes
- PostgreSQL 7.3.2
- 1 Gb/s network
- 2 configurations:
  - Fully Replicated (FR)
  - Partially Replicated (PR): each type of TPC-C transaction runs on ¼ of the nodes

Slide 26: Experiments (3): Scale-up
[Figure panels: a) Fully Replicated (FR); b) Partially Replicated (PR)]

Slide 27: Experiments (4): Speed-up
- 128 clients are launched that submit Order-status (read-only) transactions
[Figure panels: a) Fully Replicated (FR); b) Partially Replicated (PR)]

Slide 28: Experiments (5): Unordered messages
[Figure panels: a) Fully Replicated (FR); b) Partially Replicated (PR)]

Slide 29: Experiments (6): Delay vs. transaction size

Slide 30: Conclusions
- Preventive replication:
  - Strong consistency
  - Prevents conflicts for partially replicated databases
  - Full node autonomy
  - Scale-up and speed-up
- Experiments show that the configuration and the placement of the copies should be tuned to the selected types of transactions

Slide 31: Current and Future Work
- Preventive replication for P2P systems
- Small and dynamic multi-master groups
- Max is computed dynamically
- Small and dynamic slave groups
- Optimistic replication
- Distributed semantic reconciliation

Slide 32: Thanks! Merci! Obrigado! Questions?

