Distributed Databases

Slides:



Advertisements
Similar presentations
TRANSACTION PROCESSING SYSTEM ROHIT KHOKHER. TRANSACTION RECOVERY TRANSACTION RECOVERY TRANSACTION STATES SERIALIZABILITY CONFLICT SERIALIZABILITY VIEW.
Advertisements

Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
Transaction.
Data and Database Administration Chapter 12. Outline What is Concurrency Control? Background Serializability  Locking mechanisms.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Chapter 13 (Web): Distributed Databases
Chapter 12 (Online): Distributed Databases
Jan. 2014Dr. Yangjun Chen ACS Database recovery techniques (Ch. 21, 3 rd ed. – Ch. 19, 4 th and 5 th ed. – Ch. 23, 6 th ed.)
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
Transaction Management and Concurrency Control
CS 582 / CMPE 481 Distributed Systems
Distributed Database Management Systems
Transaction Management and Concurrency Control
Transaction Management and Concurrency Control
Overview Distributed vs. decentralized Why distributed databases
Chapter 8 : Transaction Management. u Function and importance of transactions. u Properties of transactions. u Concurrency Control – Meaning of serializability.
1 © Prentice Hall, 2002 Chapter 13: Distributed Databases Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Distributed Transactions Transaction may access data at several sites. Each site has a local.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Lecture-10 Distributed Database System A distributed database system consists of loosely.
Chapter 13 (Web): Distributed Databases
16: Distributed Systems1 DISTRIBUTED SYSTEM STRUCTURES NETWORK OPERATING SYSTEMS The users are aware of the physical structure of the network. Each site.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Transaction Management and Concurrency Control.
TRANSACTION PROCESSING TECHNIQUES BY SON NGUYEN VIJAY RAO.
Distributed Databases
Distributed Deadlocks and Transaction Recovery.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
DISTRIBUTED DATABASE SYSTEM.  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that.
Transaction Communications Yi Sun. Outline Transaction ACID Property Distributed transaction Two phase commit protocol Nested transaction.
Session-8 Data Management for Decision Support
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Distributed Database Systems Overview
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
Operating Systems Distributed Coordination. Topics –Event Ordering –Mutual Exclusion –Atomicity –Concurrency Control Topics –Event Ordering –Mutual Exclusion.
Multiprossesors Systems.. What are Distributed Databases ? “ A Logically interrelated collection of shared data ( and a description of this data) physically.
Distributed Databases
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
Databases Illuminated
 Distributed file systems having transaction facility need to support distributed transaction service.  A distributed transaction service is an extension.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
MBA 664 Database Management Systems Dave Salisbury ( )
Distributed Database: Part 2. Distributed DBMS Distributed database requires distributed DBMS Distributed database requires distributed DBMS Functions.
© 2011 Pearson Education, Inc. Publishing as Prentice Hall 1 Chapter 12 (Online): Distributed Databases Modern Database Management 10 th Edition Jeffrey.
R*: An overview of the Architecture By R. Williams et al. Presented by D. Kontos Instructor : Dr. Megalooikonomou.
Transaction Management Transparencies. ©Pearson Education 2009 Chapter 14 - Objectives Function and importance of transactions. Properties of transactions.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
9 1 Chapter 9_B Concurrency Control Database Systems: Design, Implementation, and Management, Rob and Coronel.
10 1 Chapter 10_B Concurrency Control Database Systems: Design, Implementation, and Management, Rob and Coronel.
Antidio Viguria Ann Krueger A Nonblocking Quorum Consensus Protocol for Replicated Data Divyakant Agrawal and Arthur J. Bernstein Paper Presentation: Dependable.
SHUJAZ IBRAHIM CHAYLASY GNOPHANXAY FIT, KMUTNB JANUARY 05, 2010 Distributed Database Systems | Dr.Nawaporn Wisitpongphan | KMUTNB Based on article by :
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
Database Administration: The Complete Guide to Practices and Procedures Chapter 19 Data Movement and Distribution.
© 2009 Pearson Education, Inc. Publishing as Prentice Hall 1 Lecture 11 Distributed Databases Modern Database Management 9 th Edition Jeffrey A. Hoffer,
Chapter 13 Managing Transactions and Concurrency Database Principles: Fundamentals of Design, Implementation, and Management Tenth Edition.
CMS Advanced Database and Client-Server Applications Distributed Databases slides by Martin Beer and Paul Crowther Connolly and Begg Chapter 22.
Distributed Databases
Distributed Databases
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Database System Implementation CSE 507
Commit Protocols CS60002: Distributed Systems
Outline Announcements Fault Tolerance.
Distributed Databases
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Distributed Databases Recovery
UNIVERSITAS GUNADARMA
Transactions, Properties of Transactions
Presentation transcript:

Distributed Databases

Outline Distributed DBMS (DBMS) Features Detailed insight of its Objectives

Distributed DBMS (1) A distributed DBMS coordinates the data access at various nodes for a distributed database. Although each site will have a DBMS to manage local database at that site, a distributed DBMS will perform the following: Keep track of data location in a data dictionary. Determine location from which requested data can be retrieved. Determine where each part of a distributed query can be processed

Distributed DBMS (2) If required translate request at one node (using a local DBMS) into proper request to another node (using a different DBMS+data model) and return data to requesting node in its acceptable format. Provide data management functions like security, dead lock control, concurrency, recovery, etc. Provide consistency among data copies. Use of global primary keys. Scalability, replication and transparency. Permit different DBMS on different nodes.

A Popular DDBMS Architecture Distributed/ data repository Database Site n site2 Distributed DBMS Application Programs Local Comm. controller User 1 User 2 User n

DDBMS Architecture Description (1) The architecture shown on previous slide shows a system in which each site has a local DBMS, a copy of DDBMS and associated DD/D or data repository The DD/D has data definitions and info about location of distributed data. User requests for data are first processed by DDBMS which determines whether a local* or global* transaction is involved. DD/D data dictionary/directory Local transaction- a transaction that requires reference only to data that are stored at the site where the transaction originates Global transaction- a transaction that requires a reference to data at one or more non-local sites to satisfy the request.

DDBMS Architecture Description(2) For local transactions, DDBMS passes the local requests to local DBMS. For global transactions, DDBMS routes the request to other sites. In this architecture, it is assumed that copies of distributed DBMS and DD/D exist at each site. In another setup, these may be at a central site, but this setup will be vulnerable to failures. Other strategies are also possible

Key Objectives of DDBMS Location Transparency Replication Transparency Failure Transparency Commit protocol Concurrency Transparency Query Optimization

Location Transparency It implies that User of the data need not to know where the data are physically located. User being unaware of data distribution assumes one single database (physically). Example: User at region A wants to see data of customers whose sales have exceeded $10,000 Select * from customer where total_sales>10000 Now distributed DBMS at local site (region A) consults the distributed DD/D and determines that this request has to be routed to Region C. On display of results to user, it appears as if data were locally retrieved (unless there is long comm. delay)

Data Distribution Example RegionA RegionB RegionC RegionA parts RegionB parts RegionC parts Account ing Customers Engg Parts PriceList PriceList PriceList

Replication Transparency(1) Although a given data item may be replicated at several nodes in the network, with replication transparency, a user may treat it as if it were a single item at a single node. Also called fragmentation transparency. Example: User wants to view the Price List file. An identical copy of this file exists at all three nodes (three regions) – case of full replication. Now distributed DBMS will consult DD/D and determine that this is a local transaction. User need not be aware that same data is stored at other sites.

Replication Transparency(2) In case requested data are not at the local site, the distributed DBMS decides the optimum route to the target site. Consider another situation when one or more users try to update replicated data. user at region C wants to change price of a part. This update must be reflected at all replicated sites for data integrity. With replication transparency this is easily handled.

Failure Transparency (1) Each site in a distributed environment is prone to same type of failures as in centralized systems (like disk failure, erroneous data,etc.) An additional risk is communication link failure A robust system must be able to Detect failure Reconfigure system to continue computation Recover when processor or link is repaired The 1st & 2nd tasks are jobs of comm. controller or processor and 3rd one is job of DDBMS DDBMS has a software module – Transaction Manager

Failure Transparency (2) Transaction Manager: Maintain a log of transactions and “before & after images” of database Maintain concurrency control scheme to ensure data integrity during parallel execution of transactions at that site For global transactions, transaction managers at each participating site cooperate to ensure all update operations are synchronized Example A person at site1 wants to change a part’s price in the StandardPriceList (copies are present at three different sites) This global transaction must update every copy of record for that part at the three sites Suppose, site1 and site2 are successfully updated but due to transmission failure, update does not occur at site3 With failure transparency, either all actions of a transaction are committed or none of them are committed

Commit protocol Commit Protocol Two-phase Commit (an algorithm) Ensures data integrity for distributed update operations, the cooperating transaction managers execute a commit protocol Ensures that a global transaction is successfully completed at all sites or else aborted Two-phase Commit (an algorithm) Most widely used protocol Coordinates updates in a distributed environment Ensures that concurrent transactions at multiple sites are processed as though they were executes in same serial order at all sites How does it work?

How does the protocol work?(1) Site originating the global transaction sends a request to each of the sites that will process some portion of the transaction. Each site processes sub-transaction but does not commit the result to database (holds in temporary file). Each site locks its portion of database being updated. Each site notifies originating site about completion of sub-transaction. When all sites have responded, originating site initiates two-phase commit protocol Preparation Phase Final Commit Phase

How does the protocol work?(2) Preparation Phase Message is broadcasted to every participating sites, asking whether it is willing to commit its portion of transaction at that site Each site returns “OK” or “not OK” Originating site collects all messages.

How does the protocol work?(3) Final Commit Phase If all are “OK”, it broadcasts a message to all sites to commit the portion of transaction handled at each site If one or more sites respond with “not OK”, it broadcasts a message to all sites to abort the transaction A transaction can fail during the commit phase Such transaction will be in limbo

How does the protocol work?(4) A limbo transaction can be identified by polling or by a timeout. With a timeout (no confirmation of commit for a specified time period) its not possible to distinguish between a failed or a busy site. Polling is expensive in terms of network load and processing time In case a commit confirmation is not received from one or more sites, the originating sites forces all sites to undo changes by abort message

Improvement Strategies for two-phase commit Two-phase commit protocol is slow due to delays caused by extensive coordination among sites. Some improvements develop –ed are: Read-only optimization This approach identifies read-only portions of a transaction and eliminates need of confirmation messages on them, e.g., a transaction may include reading of data before inserting new data. (inventory balance check before creating new order, so data read can happen without callback confirmation) Lazy commit optimization This approach allows those sites which can update to proceed to update and the ones which can not update are allowed to “catch up” later Linear commit optimization This approach allows each part of transaction (sub-transaction) to be committed in sequence rather than holding up a whole transaction when sub-transaction parts are delayed from being processed

Concurrency Transparency Concurrency control When multiple users access and update a database, data integrity may be lost unless locking mechanisms are used to protect data from effects of concurrent updates Concurrency control is more complex in a distributed database as users are geographically distributed at sites Data are often replicated at several sites Concurrency Transparency Allows a distributed system to run many transactions Allows each transaction to appear as if it were the only activity in the system When several transactions are processed concurrently, results must be same as if each transaction were processed in serial order Transaction managers at each site must cooperate to provide concurrency control in distributed Database

Concurrency Control Approaches 1st approach - Locking Data retrieved by a user for updating must be locked or denied to other users until that update is completed or aborted. Transactions can place locks on data resources. Locks are implemented at the following levels Database Table Block or Page Record level Field level Lock Types Shared locks (S lock or read lock) Exclusive locks (X lock or write lock) Details on page 474 (Hoffer - 6/e)

Concurrency Control Approaches 2nd approach – Versioning Uses optimistic approach (opposite to pessimistic locking approach) that most of the time other users do not want the same record Each transaction is restricted to a view of database as of the time that transaction started and when a transaction modifies a record, the DBMS creates a new record version instead of overwriting the old record. When there is no conflict (only one user transaction changes database records) the changes are merged directly to database If two users make conflicting changes, changes made by one of the users are committed to database – usually the earlier time stamped transaction gets priority. Other user is told about the conflict.

Concurrency Control Approaches 3rd approach - Timestamping Every transaction gets a globally unique timpestamp (clock time when transaction occurred) Ensures that even if two simultaneous events/transactions occur at different sites, each will get a unique timestamp Timestamp ensures serial order processing of transactions Every record carries timestamp of transaction that last updated it. A transaction can not process a record until its timestamp is later than that carried in the record If a transaction timestamp is earlier than that carried in record, it is assigned a new timestamp and restarted

Query Optimization The decision of DDBMS of how to process a query is affected by the way user formulates a query and intelligence of the distributed DBMS to develop a sensible plan A distributed DBMS typically uses the following 3 steps to develop a query processing plan Query decomposition Query is simplified and rewritten into structured relational algebra form Data localization Query is transformed from a query referencing data across the network into one or more fragments which each explicitly reference data at only one site Global optimization In this final step, decisions are made about the order in which to execute query fragments, where to move data between sites and where parts of the query will be executed

Query Optimization One of the techniques of processing a distributed query efficiently is to use semijoin operation. In Semijoin, only the joining attribute is sent to sent from one site to another rather than sending all selected attributes from every qualified row. After that then only the required rows are returned Hence there is minimal transfer of data over the network. A DDBMS also uses a cost model to predict execution time of alternative execution plans.

Reading assignment Evolution of DDBMS

That’s it for today…!