Database Replication

Replication
Replication is the process of sharing information to ensure consistency between redundant resources, such as software or hardware components, in order to improve reliability, fault tolerance, or accessibility.

Data Replication
Replication can be data replication, if the same data is stored on multiple storage devices, or computation replication, if the same computing task is executed many times. A computational task is typically replicated in space, i.e. executed on separate devices, but it can also be replicated in time, if it is executed repeatedly on a single device.

Database Replication
Database replication is the creation and maintenance of multiple copies of the same database. In most implementations, one database server maintains the master copy of the database and additional database servers maintain slave copies. Database writes are sent to the master database server and are then replicated by the slave database servers.

Database Reads on Replicated Servers
Database reads are divided among all of the database servers, which results in a large performance advantage due to load sharing. In addition, database replication can improve availability, because a slave database server can be configured to take over the master role if the master database server becomes unavailable.
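
The routing described on this slide can be sketched in a few lines of Java. This is only an illustration under simple assumptions: the class ReadWriteRouter and the server names are hypothetical, servers are identified by plain strings, and reads are spread round-robin, which is one of several possible load-sharing policies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical router: the names are illustrative, not part of any real driver.
class ReadWriteRouter {
    private final String master;
    private final List<String> servers;                 // master plus slaves; all can serve reads
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.servers = new ArrayList<>();
        this.servers.add(master);
        this.servers.addAll(slaves);
    }

    // Writes always go to the server holding the master copy.
    String routeWrite() {
        return master;
    }

    // Reads are divided round-robin among all servers.
    String routeRead() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```

With one master and two slaves, three consecutive calls to routeRead() visit each server once, while routeWrite() always returns the master.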

Transparency
Access to a replicated entity is typically uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. In a failure scenario, the failover of replicas should also be hidden as much as possible.

Transparency
Isolation is the most often relaxed ACID property in a DBMS (Database Management System). To maintain the highest level of isolation, a DBMS must either acquire locks on data, which may result in a loss of concurrency, or implement multiversion concurrency control, which may require additional logic to function correctly.

Active vs. Passive Replication
It is common to talk about active and passive replication in systems that replicate data or services. In active replication, the same request is processed at every replica. In passive replication, each request is processed on a single replica and the resulting state is then transferred to the other replicas. If at any time one master replica is designated to process all requests, this is the primary-backup scheme (master-slave scheme) predominant in high-availability clusters.
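
The difference between the two styles can be shown with a toy replica that holds a single integer. The sketch below is an assumption-laden illustration: CounterReplica and ReplicationStyles are hypothetical names, the request is just "add a delta", and no network or failure handling is modeled.

```java
import java.util.List;

// Hypothetical replica whose whole state is one integer.
class CounterReplica {
    int state = 0;

    // Deterministic request handler: the same request always has the same effect.
    void process(int delta) {
        state += delta;
    }

    // Used by passive replication to overwrite the state with a transferred copy.
    void installState(int newState) {
        state = newState;
    }
}

class ReplicationStyles {
    // Active replication: every replica processes the same request itself.
    static void active(List<CounterReplica> replicas, int delta) {
        for (CounterReplica r : replicas) {
            r.process(delta);
        }
    }

    // Passive replication: one replica processes the request,
    // then its resulting state is transferred to the others.
    static void passive(CounterReplica primary, List<CounterReplica> backups, int delta) {
        primary.process(delta);
        for (CounterReplica b : backups) {
            b.installState(primary.state);
        }
    }
}
```

Both methods leave all replicas in the same state. They differ in where the work happens: active replication repeats the computation everywhere, while passive replication computes once and ships the result.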

Multi-Master Replication
On the other hand, if any replica can process a request and then distribute the new state, this is a multi-primary scheme (called multi-master in the database field). In the multi-primary scheme, some form of distributed concurrency control must be used, such as a distributed lock manager.

Load Balancing
Load balancing is different from task replication, since it distributes a load of different (not the same) computations across machines and allows a single computation to be dropped in case of failure. Load balancing does, however, sometimes use data replication (especially multi-master) internally to distribute its data among machines.

Backup
Backup is different from replication, since it saves a copy of data unchanged for a long period of time. Replicas, on the other hand, are frequently updated and quickly lose any historical state.

Replication in Distributed Systems
Whether one replicates data or computation, the objective is to have some group of processes that handle incoming events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read requests, and apply updates. When we replicate computation, the usual goal is to provide fault tolerance. The underlying needs are the same in both cases: if the replicas see the same events in equivalent orders, they stay in consistent states, and hence any replica can respond to queries.

Replication Models
A number of widely cited models exist for data replication in distributed systems, each having its own properties and performance: transactional replication, state machine replication, and virtual synchrony.

Transactional Replication
This is the model for replicating transactional data, for example a database or some other form of transactional storage structure. The one-copy serializability model is employed in this case; it defines the legal outcomes of a transaction on replicated data in accordance with the overall ACID properties that transactional systems seek to guarantee.

State Machine Replication
This model assumes that the replicated process is a deterministic finite state machine and that atomic broadcast of every event is possible. It is based on a distributed computing problem called distributed consensus and has a great deal in common with the transactional replication model. The term is sometimes mistakenly used as a synonym for active replication.
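
The core idea, that deterministic replicas fed the same events in the same order end in the same state, can be illustrated with a small Java sketch. It assumes the ordered log is already agreed on (the atomic broadcast and consensus steps are not modeled), and the class KvStateMachine and its "set"/"add" command format are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical deterministic state machine over a small key-value store.
class KvStateMachine {
    private final Map<String, Integer> data = new TreeMap<>();

    // Deterministic transition: the same command applied to the same state
    // always produces the same next state.
    void apply(String command) {
        String[] parts = command.split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed command: " + command);
        }
        int amount = Integer.parseInt(parts[2]);
        if (parts[0].equals("set")) {
            data.put(parts[1], amount);
        } else if (parts[0].equals("add")) {
            data.merge(parts[1], amount, Integer::sum);
        } else {
            throw new IllegalArgumentException("unknown command: " + command);
        }
    }

    // Returns a copy of the current state for comparison between replicas.
    Map<String, Integer> snapshot() {
        return new TreeMap<>(data);
    }

    // Builds a replica by applying an already-ordered log from the start.
    static KvStateMachine replay(List<String> orderedLog) {
        KvStateMachine machine = new KvStateMachine();
        for (String command : orderedLog) {
            machine.apply(command);
        }
        return machine;
    }
}
```

Two replicas built with replay() from the same log have equal snapshots. If the same commands are applied in a different order, the states can diverge, which is why the model depends on an agreed event order.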

Virtual Synchrony
This computational model is used when a group of processes cooperate to replicate in-memory data or to coordinate actions. The model defines a new distributed entity called a process group. A process can join a group, which is much like opening a file: the process is added to the group, but is also provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send events (multicasts) to the group and will see incoming events in the identical order, even if events are sent concurrently. Membership changes are handled as a special kind of platform-generated event that delivers a new membership view to the processes in the group.

Performance Comparison
Transactional replication is the slowest, at least when one-copy serializability guarantees are desired (better performance can be obtained when a database uses log-based replication, but at the cost of possible inconsistencies if a failure causes part of the log to be lost). Virtual synchrony is the fastest of the three models, but the handling of failures is less rigorous than in the transactional model. State machine replication lies somewhere in between; the model is faster than transactions, but much slower than virtual synchrony.

Master/Slave
Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. Each slave outputs a message stating that it has received the update successfully, which allows subsequent updates to be sent (and potentially re-sent until successfully applied).
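
The log-and-acknowledge cycle can be sketched as follows. This is a simplified, single-threaded illustration: LogShippingMaster and LogShippingSlave are hypothetical names, lost deliveries are simulated with a counter, and the acknowledgement is modeled as a boolean return value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical slave that drops a fixed number of deliveries before succeeding.
class LogShippingSlave {
    final List<String> applied = new ArrayList<>();
    private int failuresLeft;   // simulated lost deliveries

    LogShippingSlave(int failuresToSimulate) {
        this.failuresLeft = failuresToSimulate;
    }

    // Returns true (the acknowledgement) only when the update was applied.
    boolean receive(String update) {
        if (failuresLeft > 0) {
            failuresLeft--;
            return false;
        }
        applied.add(update);
        return true;
    }
}

// Hypothetical master that logs updates and ships them in log order.
class LogShippingMaster {
    final List<String> log = new ArrayList<>();

    void write(String update) {
        log.add(update);
    }

    // Sends each update, re-sending until it is acknowledged, and moves on to
    // the next update only after the acknowledgement. Returns the number of sends.
    int replicateTo(LogShippingSlave slave, int maxAttemptsPerUpdate) {
        int attempts = 0;
        for (String update : log) {
            boolean acked = false;
            for (int i = 0; i < maxAttemptsPerUpdate && !acked; i++) {
                attempts++;
                acked = slave.receive(update);
            }
            if (!acked) {
                throw new IllegalStateException("slave did not acknowledge: " + update);
            }
        }
        return attempts;
    }
}
```

Because the next update is sent only after the previous one is acknowledged, the slave applies updates in the same order in which the master logged them.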

Master/Master
Master/master replication, in which updates can be submitted to any database node and then ripple through to the other servers, is often desired, but it introduces substantially increased costs and complexity, which may make it impractical in some situations. The most common challenge in multi-master replication is transactional conflict prevention or resolution. Most synchronous, or eager, replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions.

Lazy Replication
A lazy replication system, by contrast, would allow both transactions to commit and run conflict resolution during resynchronization. The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes, or on much more complex logic, provided that it decides consistently on all nodes.
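
One simple resolution rule combines the two criteria mentioned here: the later timestamp wins, and the node hierarchy breaks ties. The sketch below is an assumption, not a description of any particular product: VersionedValue and LazyConflictResolver are hypothetical names, a lower nodeRank means a higher position in the hierarchy, and ranks are assumed to be unique per node.

```java
// Hypothetical record version carrying the metadata needed for resolution.
class VersionedValue {
    final String value;
    final long timestamp;   // commit time of the transaction that wrote the value
    final int nodeRank;     // position of the origin node in the hierarchy (lower wins ties)

    VersionedValue(String value, long timestamp, int nodeRank) {
        this.value = value;
        this.timestamp = timestamp;
        this.nodeRank = nodeRank;
    }
}

class LazyConflictResolver {
    // Picks the surviving version. The result does not depend on argument order,
    // so every node that runs this rule reaches the same decision.
    static VersionedValue resolve(VersionedValue a, VersionedValue b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp > b.timestamp ? a : b;
        }
        return a.nodeRank <= b.nodeRank ? a : b;
    }
}
```

The important property is that the decision is a pure function of the two versions, so all nodes converge on the same value once they have seen both updates.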

Multi-Master Replication
Multi-master replication is a method of replication employed by databases to transfer data or changes to data across multiple computers within a group. Multi-master replication can be contrasted with a master-slave method (also known as single-master replication). The term multi-master can also be applied to systems in general where a single piece of information can be updated by one of several systems. That is, no one system can be said to own the information and be able to control its consistency and accuracy.

Benefits of Multi-Master Replication
If one master fails, other masters will continue to update the database. Masters can be located at several physical sites, i.e. distributed across the network.

Disadvantages of Multi-Master Replication
Most multi-master replication systems are only loosely consistent, i.e. lazy and asynchronous, violating ACID properties. Eager replication systems are complex and introduce some communication latency. Issues such as conflict resolution can become intractable as the number of nodes involved rises and the required latency decreases.

Methods
Log based: A database transaction log is referenced to capture changes made to the database. For log-based transaction capturing, database changes can only be distributed asynchronously.
Trigger based: Triggers at the subscriber capture changes made to the database and submit them to the publisher. With trigger-based transaction capturing, database changes can be distributed either synchronously or asynchronously.

Real World Example (For Lab)
We perform one-phase transactions using Java Database Connectivity (JDBC), which is part of the Java Software Development Kit (SDK). For a one-phase implementation, we add a JDBC driver to the SDK. We call Class.forName to load the driver and DriverManager.getConnection() to get the connection, and we then use the connection in processing. All objects are interfaces implemented by the JDBC driver.
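
A minimal sketch of these steps is shown below. The java.sql calls are standard JDBC, but everything else is an assumption made for illustration: the class OnePhaseJdbcExample, the accounts table with id and balance columns, and whatever driver class, URL, user, and password the caller passes in.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class OnePhaseJdbcExample {
    // Loads the driver and opens a connection. All four arguments are placeholders
    // that depend on the database used in the lab.
    static Connection open(String driverClass, String url, String user, String password)
            throws ClassNotFoundException, SQLException {
        Class.forName(driverClass);   // load the JDBC driver class
        return DriverManager.getConnection(url, user, password);
    }

    // One-phase (local) transaction on a single database: both updates are
    // committed together, or rolled back together on error.
    // The accounts table and its columns are hypothetical.
    static void transfer(Connection conn, int fromId, int toId, long cents) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
             PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
            debit.setLong(1, cents);
            debit.setInt(2, fromId);
            debit.executeUpdate();
            credit.setLong(1, cents);
            credit.setInt(2, toId);
            credit.executeUpdate();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Connection and PreparedStatement are interfaces; the concrete objects behind them come from whichever JDBC driver is loaded.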

Two-Phase Commit Real World Example
To perform XA (two-phase) transactions, we require an implementation of the javax packages. This is provided by an application server and is part of the JEE implementation stack. Your instructor will show a real world example.

Two-Phase Commit
Server 1 (Savings): update Savings, decrease by $500.
Server 2 (Checking): update Checking, increase by $500.
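
The savings-to-checking transfer can be simulated in memory to show the two phases. This sketch is not XA or JTA code: Account2pc and TwoPhaseCoordinator are hypothetical names, both "servers" live in one process, and the logging and failure recovery that a real two-phase commit needs are left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical participant: one account held on one server.
class Account2pc {
    final String name;
    long balance;
    private Long pending;   // staged balance, null when nothing is prepared

    Account2pc(String name, long balance) {
        this.name = name;
        this.balance = balance;
    }

    // Phase 1: stage the change and vote yes only if it can be applied.
    boolean prepare(long delta) {
        if (balance + delta < 0) {
            return false;   // vote no: the account would be overdrawn
        }
        pending = balance + delta;
        return true;
    }

    // Phase 2, commit: make the staged change visible.
    void commit() {
        if (pending != null) {
            balance = pending;
            pending = null;
        }
    }

    // Phase 2, rollback: discard the staged change.
    void rollback() {
        pending = null;
    }
}

class TwoPhaseCoordinator {
    // Commits on both participants only if both voted yes in phase 1.
    static boolean transfer(Account2pc from, Account2pc to, long amount) {
        boolean allPrepared = from.prepare(-amount) && to.prepare(amount);
        List<Account2pc> participants = Arrays.asList(from, to);
        for (Account2pc p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }
}
```

Starting from a savings balance of 1000 and a checking balance of 200, a transfer of 500 commits on both sides and leaves 500 and 700. A further transfer of 600 is refused in phase 1, so neither balance changes.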