Presentation on theme: "Types of Distributed Database Systems"— Presentation transcript:
Types of Distributed Database Systems: Homogeneous
- All sites of the database system have an identical setup, i.e., the same database system software.
- The underlying operating system may be different.
- For example, all sites run Oracle, or all run DB2, Sybase, or some other database system.
- The underlying operating systems can be a mixture of Linux, Windows, Unix, etc.
Types of Distributed Database Systems: Heterogeneous
- Federated: Each site may run a different database system, but data access is managed through a single conceptual schema.
  - This implies that the degree of local autonomy is minimal. Each site must adhere to a centralized access policy. There may be a global schema.
- Multidatabase: There is no single conceptual global schema. For data access, a schema is constructed dynamically as needed by the application software.
Types of Distributed Database Systems: Federated Database Management Systems Issues
- Differences in data models: relational, object-oriented, hierarchical, network, etc.
- Differences in constraints: each site may have its own data access and processing constraints.
- Differences in query language: some sites may use SQL, some SQL-89, some SQL-92, and so on.
Concurrency Control and Recovery
Distributed databases encounter a number of concurrency control and recovery problems that are not present in centralized databases. Some of them are listed below.
- Dealing with multiple copies of data items
- Failure of individual sites
- Communication link failure
- Distributed commit
- Distributed deadlock
Concurrency Control and Recovery Details
- Dealing with multiple copies of data items: The concurrency control mechanism must maintain global consistency. Likewise, the recovery mechanism must recover all copies and maintain consistency after recovery.
- Failure of individual sites: Database availability must not be affected by the failure of one or two sites, and the recovery scheme must recover failed sites before they are made available for use again.
Concurrency Control and Recovery Details (contd.)
- Communication link failure: This failure may create a network partition, which would affect database availability even though all database sites may be running.
- Distributed commit: A transaction may be fragmented, and its fragments may be executed by a number of sites. This requires a two-phase or three-phase commit protocol to commit the transaction.
- Distributed deadlock: Since transactions are processed at multiple sites, two or more sites may become involved in a deadlock. This must be resolved in a distributed manner.
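The two-phase commit named under "Distributed commit" can be illustrated with a minimal in-memory sketch. The `Participant` class and its `prepare`/`commit`/`abort` methods are hypothetical names chosen for illustration; a real implementation would also force log records to stable storage at each step and handle timeouts and coordinator failure, which this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical site executing one fragment of a distributed transaction."""
    name: str
    can_commit: bool = True          # whether the local fragment succeeded
    state: str = "active"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: vote. A real site would force a <ready T> record to disk here.
        self.state = "ready" if self.can_commit else "aborted"
        self.log.append(self.state)
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"
        self.log.append("commit")

    def abort(self) -> None:
        self.state = "aborted"
        self.log.append("abort")


def two_phase_commit(participants) -> str:
    # Phase 1 (voting): every site is asked to prepare; all votes are collected.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (decision): the same global outcome is sent to every site.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote is enough to abort the transaction at every site, which is what keeps the fragments atomic as a whole.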
Concurrency Control in Distributed Databases
- Single-lock-manager approach
- Distributed lock manager
  - Primary copy
  - Majority protocol
  - Biased protocol
  - Quorum consensus
Single-Lock-Manager Approach
- The system maintains a single lock manager that resides at a single chosen site, say Si (the primary site technique).
- When a transaction needs to lock a data item, it sends a lock request to Si, and the lock manager determines whether the lock can be granted immediately.
  - If yes, the lock manager sends a message to the site that initiated the request.
  - If no, the request is delayed until it can be granted, at which time a message is sent to the initiating site.
Single-Lock-Manager Approach (Cont.)
- The transaction can read the data item from any one of the sites at which a replica of the data item resides.
- Writes must be performed on all replicas of a data item.
- Advantages of the scheme:
  - Simple implementation
  - Simple deadlock handling
- Disadvantages of the scheme:
  - Bottleneck: the lock manager site becomes a bottleneck.
  - Vulnerability: the system is vulnerable to failure of the lock manager site.
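The grant-or-delay behaviour of the single lock manager at Si can be sketched as follows. The class name `SingleLockManager` and the "S"/"X" mode strings are assumptions for illustration; "returning True" stands in for the grant message sent back to the requesting site, and queued requests model the delayed ones.

```python
from collections import defaultdict, deque


class SingleLockManager:
    """Hypothetical sketch of the one lock manager living at the chosen site Si."""

    def __init__(self):
        self.holders = defaultdict(dict)   # item -> {txn: mode}
        self.waiting = defaultdict(deque)  # item -> FIFO queue of (txn, mode)

    def _compatible(self, item, txn, mode):
        others = {t: m for t, m in self.holders[item].items() if t != txn}
        if not others:
            return True
        # Only shared ("S") locks are compatible with one another.
        return mode == "S" and all(m == "S" for m in others.values())

    def request(self, txn, item, mode):
        """True means granted now (Si messages the site); False means delayed."""
        if not self.waiting[item] and self._compatible(item, txn, mode):
            self.holders[item][txn] = mode
            return True
        self.waiting[item].append((txn, mode))
        return False

    def release(self, txn, item):
        """Release txn's lock; return the delayed transactions granted as a result."""
        self.holders[item].pop(txn, None)
        granted = []
        queue = self.waiting[item]
        while queue and self._compatible(item, *queue[0]):
            t, m = queue.popleft()
            self.holders[item][t] = m
            granted.append(t)
        return granted
```

Because every lock decision is made in one place, a deadlock check only needs this one table, which is the "simple deadlock handling" advantage; the same single object is also the bottleneck and single point of failure.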
Distributed Lock Manager
- In this approach, the locking functionality is implemented by lock managers at each site.
  - Lock managers control access to local data items.
  - But special protocols may be used for replicas.
- Advantage: work is distributed and can be made robust to failures.
- Disadvantage: deadlock detection is more complicated.
  - Lock managers cooperate for deadlock detection.
- Several variants of this approach:
  - Primary copy
  - Majority protocol
  - Biased protocol
  - Quorum consensus
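One way the cooperating lock managers can detect a deadlock is to merge their local wait-for graphs and search the union for a cycle. The sketch below assumes a single detector that receives every site's graph (a centralized-detector variant, not the only option); the function name and graph format are hypothetical. Note that in the test, no single site's graph contains a cycle, which is exactly why local detection alone is insufficient.

```python
def find_global_deadlock(local_graphs):
    """Union per-site wait-for graphs and return one cycle of transactions, or None.

    Each local graph maps a transaction to the set of transactions it waits for.
    """
    graph = {}
    for local in local_graphs:
        for txn, waits_for in local.items():
            graph.setdefault(txn, set()).update(waits_for)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the DFS stack, finished
    color = {}
    stack = []

    def visit(txn):
        color[txn] = GREY
        stack.append(txn)
        for nxt in sorted(graph.get(txn, ())):
            if color.get(nxt, WHITE) == GREY:
                return stack[stack.index(nxt):]   # back edge: cycle found
            if color.get(nxt, WHITE) == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(graph):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None
```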
Primary Copy
- Choose one replica of a data item to be the primary copy.
  - The site containing that replica is called the primary site for that data item.
  - Different data items can have different primary sites.
- When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q.
  - It implicitly gets a lock on all replicas of the data item.
- Benefit: concurrency control for replicated data is handled in the same way as for unreplicated data, which gives a simple implementation.
- Drawback: if the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible.
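A minimal sketch of primary-copy locking: a directory maps each item to its primary site, and the lock is taken only there. The class `PrimaryCopyLocking`, its fields, and the restriction to exclusive locks are simplifying assumptions made for illustration.

```python
class PrimaryCopyLocking:
    """Hypothetical sketch: each item's lock is managed only at its primary site."""

    def __init__(self, primary_site_of, replicas_of):
        self.primary_site_of = primary_site_of  # item -> primary site
        self.replicas_of = replicas_of          # item -> all sites holding a replica
        self.up = {s for sites in replicas_of.values() for s in sites}
        self.locks = {}                         # item -> txn holding an exclusive lock

    def lock(self, txn, item):
        site = self.primary_site_of[item]
        if site not in self.up:
            # The drawback on the slide: other replicas may be up, yet Q is inaccessible.
            raise RuntimeError(f"{item} inaccessible: primary site {site} is down")
        if self.locks.get(item) not in (None, txn):
            return False                        # conflict; the caller must wait
        self.locks[item] = txn                  # implicitly covers every replica
        return True
```

The failure case in the test shows the drawback directly: S2 and S3 still hold replicas of Q, but no lock on Q can be obtained once S1 is down.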
Majority Protocol
- The local lock manager at each site administers lock and unlock requests for data items stored at that site.
- When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si's lock manager.
  - If Q is locked in an incompatible mode, the request is delayed until it can be granted.
  - When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock request has been granted.
Majority Protocol (Cont.)
- In the case of replicated data:
  - If Q is replicated at n sites, a lock request message must be sent to more than half of the n sites at which Q is stored.
  - The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q.
  - When writing the data item, the transaction performs writes on all replicas.
- Benefit: can be used even when some sites are unavailable.
- Drawbacks:
  - Requires 2(⌊n/2⌋ + 1) messages for handling lock requests and (⌊n/2⌋ + 1) messages for handling unlock requests.
  - Potential for deadlock even with a single item; e.g., each of 3 transactions may hold locks on one third of the replicas of a data item.
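The majority threshold, the message counts, and the single-item deadlock can be checked with a small sketch. The function names are hypothetical, and `try_majority_lock` simplifies by contacting every replica site and taking whichever replica locks are free.

```python
def majority(n: int) -> int:
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


def message_counts(n: int):
    """(lock messages, unlock messages): a request plus a reply per majority site,
    then one unlock message to each of those sites."""
    m = majority(n)
    return 2 * m, m


def try_majority_lock(txn, replica_locks):
    """replica_locks maps site -> current holder of Q's replica there (None if free).

    Take every free replica lock; txn may operate on Q only with a majority.
    """
    for site, holder in replica_locks.items():
        if holder is None:
            replica_locks[site] = txn
    held = sum(1 for h in replica_locks.values() if h == txn)
    return held >= majority(len(replica_locks))
```

With n = 3 and three transactions each holding one replica, `try_majority_lock` returns False for all of them; if each then waits for the others, none can proceed, which is the single-item deadlock described above.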
Biased Protocol
- The local lock manager at each site works as in the majority protocol; however, requests for shared locks are handled differently from requests for exclusive locks.
- Shared locks: when a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q.
- Exclusive locks: when a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q.
- Advantage: imposes less overhead on read operations.
- Disadvantage: additional overhead on writes.
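The asymmetry between shared and exclusive requests comes down to which lock managers must be contacted. This sketch uses a hypothetical function `sites_to_lock` and an assumed set of reachable sites; its treatment of an unreachable replica site (the exclusive lock simply cannot be obtained) follows from the "all sites" rule rather than from anything stated on the slide.

```python
def sites_to_lock(mode, replica_sites, up_sites):
    """Return the sites whose lock managers must grant the lock (biased protocol).

    Shared ("S"): any one reachable replica site is enough.
    Exclusive ("X"): every replica site must grant the lock.
    """
    if mode == "S":
        for site in replica_sites:
            if site in up_sites:
                return [site]
        raise RuntimeError("no replica site reachable for a shared lock")
    if mode == "X":
        missing = [s for s in replica_sites if s not in up_sites]
        if missing:
            raise RuntimeError(f"exclusive lock impossible, sites down: {missing}")
        return list(replica_sites)
    raise ValueError(f"unknown lock mode {mode!r}")
```

Reads contact one site and writes contact all n, which is the "less overhead on reads, more on writes" trade-off in concrete terms.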
Quorum Consensus Protocol
- A generalization of both the majority and biased protocols.
- Each site is assigned a weight.
  - Let S be the total of all site weights.
- Choose two values, a read quorum Qr and a write quorum Qw, such that Qr + Qw > S and 2 * Qw > S.
  - Quorums can be chosen (and S computed) separately for each item.
- Each read must lock enough replicas that the sum of their site weights is >= Qr.
- Each write must lock enough replicas that the sum of their site weights is >= Qw.
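The two quorum conditions and the weight check can be written directly. The function names are hypothetical. With three sites of weight 1 (S = 3), Qr = Qw = 2 reproduces the majority protocol and Qr = 1, Qw = 3 reproduces the biased protocol, which is the sense in which quorum consensus generalizes both.

```python
def valid_quorums(weights, qr, qw):
    """Qr + Qw > S makes every read overlap the latest write;
    2 * Qw > S makes any two writes overlap."""
    s = sum(weights.values())
    return qr + qw > s and 2 * qw > s


def has_quorum(weights, locked_sites, quorum):
    """True if the replicas locked so far carry at least `quorum` total weight."""
    return sum(weights[site] for site in locked_sites) >= quorum
```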
Timestamping
- Timestamp-based concurrency-control protocols can be used in distributed systems.
- Each transaction must be given a unique timestamp.
- Main problem: how to generate a timestamp in a distributed fashion.
  - Each site generates a unique local timestamp using either a logical counter or the local clock.
  - A globally unique timestamp is obtained by concatenating the unique local timestamp with the unique site identifier.
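Concatenating a local timestamp with a site identifier can be modelled as a pair compared lexicographically. This sketch assumes a logical counter as the local source (the slide allows a local clock instead) and places the site identifier in the low-order position, so it only breaks ties between sites; the class name is hypothetical.

```python
import itertools


class TimestampGenerator:
    """Hypothetical per-site generator of globally unique timestamps."""

    def __init__(self, site_id):
        self.site_id = site_id
        self._counter = itertools.count(1)   # the local logical counter

    def next_timestamp(self):
        # Tuples compare left to right: local timestamp first, site id as tie-breaker.
        return (next(self._counter), self.site_id)
```

Two sites can produce the same local value, but because the site identifiers differ, the resulting pairs never collide.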
Timestamping (Cont.)
- A site with a slow clock will assign smaller timestamps.
  - This is still logically correct: serializability is not affected.
  - But it "disadvantages" that site's transactions.
- To fix this problem:
  - Define within each site Si a logical clock (LCi), which generates the unique local timestamp.
  - Require that Si advance its logical clock whenever a request is received from a transaction Ti with timestamp <x, y> and x is greater than the current value of LCi.
  - In this case, site Si advances its logical clock to the value x + 1.
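The clock-advancing rule for a slow site can be sketched as below. The class `LogicalClock` and its method names are hypothetical; `on_request` applies the rule exactly as stated (if x > LCi, set LCi to x + 1).

```python
class LogicalClock:
    """Hypothetical sketch of the logical clock LCi kept at site Si."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.value = 0

    def new_timestamp(self):
        self.value += 1
        return (self.value, self.site_id)

    def on_request(self, timestamp):
        # A request from Ti carries <x, y>; if x exceeds LCi, jump to x + 1.
        x, _y = timestamp
        if x > self.value:
            self.value = x + 1
```

After seeing a request stamped (41, 1), a lagging site jumps its clock to 42, so its next local transaction is stamped (43, 2) and is no longer systematically older than the other sites' transactions.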
Replication with Weak Consistency
- Many commercial databases support replication of data with weak degrees of consistency (i.e., without a guarantee of serializability).
- E.g., master-slave replication: updates are performed at a single "master" site and propagated to "slave" sites.
  - Propagation is not part of the update transaction: it is decoupled.
    - It may occur immediately after the transaction commits.
    - It may be periodic.
  - Data may only be read at slave sites, not updated.
    - There is no need to obtain locks at any remote site.
- Particularly useful for distributing information, e.g., from a central office to branch offices.
- Also useful for running read-only queries offline from the main database.
Replication with Weak Consistency (Cont.)
- Replicas should see a transaction-consistent snapshot of the database.
  - That is, a state of the database reflecting all effects of all transactions up to some point in the serialization order, and no effects of any later transactions.
- E.g., Oracle provides a create snapshot statement to create a snapshot of a relation or a set of relations at a remote site.
  - A snapshot is refreshed either by recomputation or by incremental update.
  - Refresh can be automatic (continuous or periodic) or manual.
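Master-slave propagation with transaction-consistent snapshots can be sketched with a commit log that slaves replay in whole-transaction units. The `Master`/`Slave` classes and the log-replay refresh are illustrative assumptions; this is one way to do incremental refresh, not a model of Oracle's snapshot mechanism.

```python
class Master:
    """Hypothetical master: updates happen only here; each commit is logged."""

    def __init__(self):
        self.data = {}
        self.log = []  # committed transactions in commit order, each a dict of writes

    def commit(self, writes):
        self.data.update(writes)
        self.log.append(dict(writes))


class Slave:
    """Hypothetical read-only replica refreshed outside the update transactions."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of logged transactions applied so far

    def refresh(self, master, upto=None):
        # Incremental refresh: apply whole transactions in commit order, so the
        # replica always reflects a transaction-consistent prefix of the history.
        end = len(master.log) if upto is None else min(upto, len(master.log))
        for writes in master.log[self.applied:end]:
            self.data.update(writes)
        self.applied = max(self.applied, end)

    def read(self, key):
        return self.data.get(key)
```

Between refreshes a slave may return stale values, but never a mix of one transaction's writes with another's missing, because a transaction's writes are only applied together.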
Multimaster and Lazy Replication
- With multimaster replication (also called update-anywhere replication), updates are permitted at any replica and are automatically propagated to all replicas.
  - This is the basic model in distributed databases, where transactions are unaware of the details of replication and the database system propagates updates as part of the same transaction.
    - Coupled with two-phase commit.
- Many systems support lazy propagation, where updates are transmitted after the transaction commits.
  - This allows updates to occur even if some sites are disconnected from the network, but at the cost of consistency.
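Lazy propagation can be sketched with a per-replica queue of undelivered updates. The `Replica` and `LazyMultimaster` classes are hypothetical, and the sketch deliberately omits conflict resolution for concurrent updates to the same item, which real systems must add.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.connected = True


class LazyMultimaster:
    """Hypothetical sketch of update-anywhere replication with lazy propagation."""

    def __init__(self, names):
        self.replicas = {n: Replica(n) for n in names}
        self.pending = {n: [] for n in names}  # updates not yet delivered to each replica

    def update(self, at, key, value):
        # The local transaction commits at once; other replicas are updated later.
        self.replicas[at].data[key] = value
        for name in self.replicas:
            if name != at:
                self.pending[name].append((key, value))

    def propagate(self):
        # Deliver queued updates to every replica that is currently reachable.
        for name, replica in self.replicas.items():
            if replica.connected:
                for key, value in self.pending[name]:
                    replica.data[key] = value
                self.pending[name].clear()
```

An update at A succeeds even while C is disconnected; until C reconnects and propagation runs, C returns an older value, which is the consistency cost mentioned above.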
Concurrency Control and Recovery
- Distributed concurrency control based on a distinguished copy of a data item.
- Primary site technique: a single site is designated as the primary site, which serves as the coordinator for transaction management.
Concurrency Control and Recovery
- Transaction management: concurrency control and commit are managed by this site.
- In two-phase locking, this site manages locking and releasing data items. If all transactions follow the two-phase policy at all sites, then serializability is guaranteed.
Recovery in a Distributed Database
- Single-lock-manager approach (primary site approach):
  - All transaction management activities go to the primary site, which is likely to overload that site.
  - If the primary site fails, the entire system is inaccessible.
  - To aid recovery, a backup site is designated, which behaves as a shadow of the primary site.
  - In case of primary site failure, the backup site can act as the primary site.
Concurrency Control and Recovery
- Primary copy technique: in this approach, instead of a site, a data item partition is designated as the primary copy. To lock a data item, only the primary copy of that data item is locked.
- Advantages: since primary copies are distributed across various sites, no single site is overloaded with locking and unlocking requests.
- Disadvantages: identification of a primary copy is complex. A distributed directory must be maintained, possibly at all sites.
Recovery in a Distributed Database
- Recovery from a coordinator failure: in both approaches, a coordinator site or copy may become unavailable. This requires the selection of a new coordinator.
- Primary site approach with no backup site: abort and restart all active transactions at all sites, elect a new coordinator, and initiate transaction processing.
- Primary site approach with a backup site: suspend all active transactions, designate the backup site as the primary site, and identify a new backup site. The new primary site receives all transaction management information needed to resume processing.
- Primary and backup sites both fail, or there is no backup site: use an election process to select a new coordinator site.
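The recovery cases above can be sketched as a single decision function. Several details are assumptions, not from the slides: the election rule (the highest-numbered live site wins, loosely in the style of the bully algorithm), reusing that rule to pick the new backup site, and treating the "primary and backup both failed" case like the no-backup case (abort and restart). All names are hypothetical.

```python
def elect(candidates):
    # Assumed election rule for this sketch: the highest-numbered live site wins.
    return max(candidates) if candidates else None


def recover_from_primary_failure(backup, live_sites, active_txns):
    """Apply the recovery cases after the primary site has failed.

    Returns (new_primary, new_backup, (action, transactions)).
    """
    live = set(live_sites)
    if backup is not None and backup in live:
        # Backup survived: suspend active transactions, promote it, pick a new backup.
        new_primary = backup
        new_backup = elect(live - {new_primary})
        return new_primary, new_backup, ("suspend", sorted(active_txns))
    # No backup, or both failed: abort/restart everything and elect a coordinator.
    new_primary = elect(live)
    return new_primary, None, ("abort_and_restart", sorted(active_txns))
```

The backup case avoids aborting work because the backup already shadows the primary's transaction-management state; without that shadow there is nothing to resume from.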
Concurrency Control and Recovery
- Concurrency control based on voting:
  - There is no primary copy or coordinator.
  - Send a lock request to the sites that have the data item.
  - If a majority of sites grant the lock, the requesting transaction gets the data item.
  - Locking information (granted or denied) is sent to all these sites.
  - To avoid an unacceptably long wait, a time-out period is defined. If the requesting transaction does not get any vote information within this period, the transaction is aborted.
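The voting rule with a time-out can be simulated by sorting votes by arrival time. Vote arrival times are simulated numbers, and `vote_for_lock` is a hypothetical name. The sketch also aborts when the votes that arrive before the time-out do not form a majority, which is a slightly broader rule than the slide's "no vote information" case.

```python
def vote_for_lock(votes, n_sites, timeout):
    """Decide a lock request under voting-based concurrency control.

    votes: list of (site, arrival_time, granted) replies from sites holding the item.
    Returns "granted" once a majority of n_sites grant before `timeout`, else "abort".
    """
    needed = n_sites // 2 + 1
    grants = 0
    for _site, arrival, granted in sorted(votes, key=lambda v: v[1]):
        if arrival > timeout:
            break  # votes after the time-out are ignored
        if granted:
            grants += 1
            if grants >= needed:
                return "granted"
    return "abort"
```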
Client-Server Database Architecture
- It consists of clients running client software, a set of servers that provide all database functionality, and a reliable communication infrastructure.
Client-Server Database Architecture
- Clients reach servers for the desired service, but servers do not reach clients.
- The server software is responsible for local data management at a site, much like centralized DBMS software.
- The client software is responsible for most of the distribution functions.
- The communication software manages communication among clients and servers.
Client-Server Database Architecture
- The processing of an SQL query goes as follows:
  - The client parses a user query and decomposes it into a number of independent subqueries. Each subquery is sent to the appropriate site for execution.
  - Each server processes its subquery and sends the result to the client.
  - The client combines the results of the subqueries and produces the final result.
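The three query-processing steps can be sketched in memory, under the assumption that an EMPLOYEE-like relation is horizontally fragmented across two sites. In that case, decomposition is just sending the same selection to every fragment, and combining is a union. The site names, rows, and functions are hypothetical; real decomposition depends on the fragmentation schema and on joins across sites.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of an EMPLOYEE relation, one per server site.
SITES = {
    "site1": [{"name": "Ann", "dno": 1, "salary": 50000}],
    "site2": [{"name": "Bob", "dno": 2, "salary": 42000},
              {"name": "Cy", "dno": 2, "salary": 61000}],
}


def server_execute(site, predicate):
    """Each server evaluates its subquery against its local fragment only."""
    return [row for row in SITES[site] if predicate(row)]


def client_query(predicate):
    """Client: one subquery per site, sent in parallel, then results combined."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: server_execute(s, predicate), SITES)
        # Combine step: union of the fragment results.
        rows = [row for part in partials for row in part]
    return sorted(row["name"] for row in rows)
```

For example, `client_query(lambda r: r["salary"] > 45000)` gathers matching rows from both sites and returns the combined, sorted list of names.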
Recap
- Distributed database concepts
- Data fragmentation, replication, and allocation
- Types of distributed database systems
- Query processing
- Concurrency control and recovery
- 3-tier client-server architecture