Types of Distributed Database Systems

Slides:



Advertisements
Similar presentations
1 Term 2, 2004, Lecture 9, Distributed DatabasesMarian Ursu, Department of Computing, Goldsmiths College Distributed databases 3.
Advertisements

Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Distributed Locking. Distributed Locking (No Replication) Assumptions Lock tables are managed by individual sites. The component of a transaction at a.
What we will cover…  Distributed Coordination 1-1.
Database Replication techniques: a Three Parameter Classification Authors : Database Replication techniques: a Three Parameter Classification Authors :
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
Chapter 25 Distributed Databases and Client-Server Architectures Copyright © 2004 Pearson Education, Inc.
Transaction Management and Concurrency Control
CS 582 / CMPE 481 Distributed Systems
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Overview Distributed vs. decentralized Why distributed databases
Lecture-12 Concurrency Control in Distributed Databases
Manajemen Basis Data Pertemuan 10 Matakuliah: M0264/Manajemen Basis Data Tahun: 2008.
Chapter 18.2: Distributed Coordination Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Chapter 18 Distributed Coordination Chapter.
Chapter 18: Distributed Coordination (Chapter 18.1 – 18.5)
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Lecture-10 Distributed Database System A distributed database system consists of loosely.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Concurrency Control in Distributed Databases
Alexandria Dodd Janelle Toungett
DATABASE MANAGEMENT SYSTEMS 2 ANGELITO I. CUNANAN JR.
Distributed Databases
04/20/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
Distributed Deadlocks and Transaction Recovery.
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
III. Current Trends: 2 - Distributed DBMSsSlide 1/47 III. Current Trends Distributed DBMSs: Advanced Concepts 3C13/D63C13/D6.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
DISTRIBUTED DATABASE SYSTEM.  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts 1 Chapter 19: Distributed Databases Heterogeneous and Homogeneous Databases Distributed.
Lecture 5: Sun: 1/5/ Distributed Algorithms - Distributed Databases Lecturer/ Kawther Abas CS- 492 : Distributed system &
Session-8 Data Management for Decision Support
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Distributed Database Systems Overview
DISTRIBUTED COMPUTING
Concurrency Control in Distributed Databases Gul Sabah Arif.
Operating Systems Distributed Coordination. Topics –Event Ordering –Mutual Exclusion –Atomicity –Concurrency Control Topics –Event Ordering –Mutual Exclusion.
Replicated Databases. Reading Textbook: Ch.13 Textbook: Ch.13 FarkasCSCE Spring
A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 7 Part 1. Distributed Databases Part 2. IrisNet Query Processing.
Distributed Databases
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
Databases Illuminated
Distributed database system
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
MBA 664 Database Management Systems Dave Salisbury ( )
Distributed Transaction Management. Outline Introduction Concurrency Control Protocols  Locking  Timestamping Deadlock Handling Replication.
1 Transaction System Processes. 2 Data Servers Used in LANs, where there is a very high speed connection between the clients and the server, the client.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
Distributed DBMS, Query Processing and Optimization
Highly Available Services and Transactions with Replicated Data Jason Lenthe.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
Distributed Databases
Distributed Databases
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 22: Distributed.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Distributed Databases and Client-Server Architectures
CHAPTER 25 - Distributed Databases and Client–Server Architectures
Lecture 17- Distributed Databases (continued)
Parallel and Distributed Databases
Chapter 19: Distributed Databases
Outline Announcements Fault Tolerance.
CSIS 7102 Spring 2004 Lecture 6: Distributed databases
Distributed Database Systems
Chapter 10 Transaction Management and Concurrency Control
Distributed Databases
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Database Architecture
Distributed Database Management Systems
Presentation transcript:

Types of Distributed Database Systems Homogeneous All sites of the database system have identical setup, i.e., same database system software. The underlying operating system may be different. For example, all sites run Oracle or DB2, or Sybase or some other database system. The underlying operating systems can be a mixture of Linux, Window, Unix, etc.

Types of Distributed Database Systems Heterogeneous Federated: Each site may run different database system but the data access is managed through a single conceptual schema. This implies that the degree of local autonomy is minimum. Each site must adhere to a centralized access policy. There may be a global schema. Multidatabase: There is no one conceptual global schema. For data access a schema is constructed dynamically as needed by the application software.

Types of Distributed Database Systems Federated Database Management Systems Issues Differences in data models: Relational, Objected oriented, hierarchical, network, etc. Differences in constraints: Each site may have their own data accessing and processing constraints. Differences in query language: Some site may use SQL, some may use SQL-89, some may use SQL-92, and so on.

Concurrency Control and Recovery Distributed Databases encounter a number of concurrency control and recovery problems which are not present in centralized databases. Some of them are listed below. Dealing with multiple copies of data items Failure of individual sites Communication link failure Distributed commit Distributed deadlock

Concurrency Control and Recovery Details Dealing with multiple copies of data items: The concurrency control must maintain global consistency. Likewise the recovery mechanism must recover all copies and maintain consistency after recovery. Failure of individual sites: Database availability must not be affected due to the failure of one or two sites and the recovery scheme must recover them before they are available for use.

Concurrency Control and Recovery Details (contd.) Communication link failure: This failure may create network partition which would affect database availability even though all database sites may be running. Distributed commit: A transaction may be fragmented and they may be executed by a number of sites. This require a two or three-phase commit approach for transaction commit. Distributed deadlock: Since transactions are processed at multiple sites, two or more sites may get involved in deadlock. This must be resolved in a distributed manner.

Concurrency Control in Distributed Databases Single-Lock-Manager Approach Distributed Lock Manager Primary copy Majority protocol Biased protocol Quorum consensus

Single-Lock-Manager Approach System maintains a single lock manager that resides in a single chosen site, say Si (Primary Site Technique) When a transaction needs to lock a data item, it sends a lock request to Si and lock manager determines whether the lock can be granted immediately If yes, lock manager sends a message to the site which initiated the request If no, request is delayed until it can be granted, at which time a message is sent to the initiating site

Single-Lock-Manager Approach (Cont.) The transaction can read the data item from any one of the sites at which a replica of the data item resides. Writes must be performed on all replicas of a data item Advantages of scheme: Simple implementation Simple deadlock handling Disadvantages of scheme are: Bottleneck: lock manager site becomes a bottleneck Vulnerability: system is vulnerable to lock manager site failure.

Distributed Lock Manager In this approach, functionality of locking is implemented by lock managers at each site Lock managers control access to local data items But special protocols may be used for replicas Advantage: work is distributed and can be made robust to failures Disadvantage: deadlock detection is more complicated Lock managers cooperate for deadlock detection Several variants of this approach Primary copy Majority protocol Biased protocol Quorum consensus

Primary Copy Choose one replica of data item to be the primary copy. Site containing the replica is called the primary site for that data item Different data items can have different primary sites When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q. Implicitly gets lock on all replicas of the data item Benefit Concurrency control for replicated data handled similarly to unreplicated data - simple implementation. Drawback If the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible.

Majority Protocol Local lock manager at each site administers lock and unlock requests for data items stored at that site. When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si ‘s lock manager. If Q is locked in an incompatible mode, then the request is delayed until it can be granted. When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock request has been granted.

Majority Protocol (Cont.) In case of replicated data If Q is replicated at n sites, then a lock request message must be sent to more than half of the n sites in which Q is stored. The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q. When writing the data item, transaction performs writes on all replicas. Benefit Can be used even when some sites are unavailable Drawback Requires 2(n/2 + 1) messages for handling lock requests, and (n/2 + 1) messages for handling unlock requests. Potential for deadlock even with single item - e.g., each of 3 transactions may have locks on 1/3rd of the replicas of a data.

Biased Protocol Local lock manager at each site as in majority protocol, however, requests for shared locks are handled differently than requests for exclusive locks. Shared locks. When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q. Exclusive locks. When transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q. Advantage - imposes less overhead on read operations. Disadvantage - additional overhead on writes

Quorum Consensus Protocol A generalization of both majority and biased protocols Each site is assigned a weight. Let S be the total of all site weights Choose two values read quorum Qr and write quorum Qw Such that Qr + Qw > S and 2 * Qw > S Quorums can be chosen (and S computed) separately for each item Each read must lock enough replicas that the sum of the site weights is >= Qr Each write must lock enough replicas that the sum of the site weights is >= Qw

Timestamping Timestamp based concurrency-control protocols can be used in distributed systems Each transaction must be given a unique timestamp Main problem: how to generate a timestamp in a distributed fashion Each site generates a unique local timestamp using either a logical counter or the local clock. Global unique timestamp is obtained by concatenating the unique local timestamp with the unique identifier.

Timestamping (Cont.) A site with a slow clock will assign smaller timestamps Still logically correct: serializability not affected But: “disadvantages” transactions To fix this problem Define within each site Si a logical clock (LCi), which generates the unique local timestamp Require that Si advance its logical clock whenever a request is received from a transaction Ti with timestamp < x,y> and x is greater that the current value of LCi. In this case, site Si advances its logical clock to the value x + 1.

Replication with Weak Consistency Many commercial databases support replication of data with weak degrees of consistency (I.e., without a guarantee of serializabiliy) E.g.: master-slave replication: updates are performed at a single “master” site, and propagated to “slave” sites. Propagation is not part of the update transaction: its is decoupled May be immediately after transaction commits May be periodic Data may only be read at slave sites, not updated No need to obtain locks at any remote site Particularly useful for distributing information E.g. from central office to branch-office Also useful for running read-only queries offline from the main database

Replication with Weak Consistency (Cont.) Replicas should see a transaction-consistent snapshot of the database That is, a state of the database reflecting all effects of all transactions up to some point in the serialization order, and no effects of any later transactions. E.g. Oracle provides a create snapshot statement to create a snapshot of a relation or a set of relations at a remote site snapshot refresh either by recomputation or by incremental update Automatic refresh (continuous or periodic) or manual refresh

Multimaster and Lazy Replication With multimaster replication (also called update-anywhere replication) updates are permitted at any replica, and are automatically propagated to all replicas Basic model in distributed databases, where transactions are unaware of the details of replication, and database system propagates updates as part of the same transaction Coupled with 2 phase commit Many systems support lazy propagation where updates are transmitted after transaction commits Allows updates to occur even if some sites are disconnected from the network, but at the cost of consistency

Concurrency Control and Recovery Distributed Concurrency control based on a distributed copy of a data item Primary site technique: A single site is designated as a primary site which serves as a coordinator for transaction management.

Concurrency Control and Recovery Transaction management: Concurrency control and commit are managed by this site. In two phase locking, this site manages locking and releasing data items. If all transactions follow two-phase policy at all sites, then serializability is guaranteed.

Recovery in a Distributed Database Single Lock Manager Approach: (Primary Site Approach) All transaction management activities go to primary site which is likely to overload the site. If the primary site fails, the entire system is inaccessible. To aid recovery a backup site is designated which behaves as a shadow of primary site. In case of primary site failure, backup site can act as primary site.

Concurrency Control and Recovery Primary Copy Technique: In this approach, instead of a site, a data item partition is designated as primary copy. To lock a data item just the primary copy of the data item is locked. Advantages: Since primary copies are distributed at various sites, a single site is not overloaded with locking and unlocking requests. Disadvantages: Identification of a primary copy is complex. A distributed directory must be maintained, possibly at all sites.

Recovery in a Distributed Database Recovery from a coordinator failure In both approaches a coordinator site or copy may become unavailable. This will require the selection of a new coordinator. Primary site approach with no backup site: Aborts and restarts all active transactions at all sites. Elects a new coordinator and initiates transaction processing. Primary site approach with backup site: Suspends all active transactions, designates the backup site as the primary site and identifies a new back up site. Primary site receives all transaction management information to resume processing. Primary and backup sites fail or no backup site: Use election process to select a new coordinator site.

Concurrency Control and Recovery Concurrency control based on voting: There is no primary copy of coordinator. Send lock request to sites that have data item. If majority of sites grant lock then the requesting transaction gets the data item. Locking information (grant or denied) is sent to all these sites. To avoid unacceptably long wait, a time-out period is defined. If the requesting transaction does not get any vote information then the transaction is aborted.

Client-Server Database Architecture It consists of clients running client software, a set of servers which provide all database functionalities and a reliable communication infrastructure.

Client-Server Database Architecture Clients reach server for desired service, but server does reach clients. The server software is responsible for local data management at a site, much like centralized DBMS software. The client software is responsible for most of the distribution function. The communication software manages communication among clients and servers.

Client-Server Database Architecture The processing of a SQL queries goes as follows: Client parses a user query and decomposes it into a number of independent sub-queries. Each subquery is sent to appropriate site for execution. Each server processes its query and sends the result to the client. The client combines the results of subqueries and produces the final result.

Recap Distributed Database Concepts Data Fragmentation, Replication and Allocation Types of Distributed Database Systems Query Processing Concurrency Control and Recovery 3-Tier Client-Server Architecture