Introduction to Distributed Databases
Yiwei Wu

Introduction
A distributed database is a database in which portions of the database are stored on multiple computers within a network.
[Figure: centralized DB compared with distributed DB]

Introduction (cont.)
Advantages: reflects organizational structure, local autonomy, improved availability, improved performance, economics, modularity.
Disadvantages: complexity, economics, security, difficulty of maintaining integrity, inexperience.

Types of DDBS
Homogeneous: uses one DBMS for all the servers in the system (e.g., Oracle or MS-SQL).
Heterogeneous: uses two or more different DBMSs for different database servers (e.g., Oracle, MS-SQL, and PostgreSQL).

Data Fragmentation
Horizontal fragment: a subset of the tuples (rows) of a relation (table).
Vertical fragment: a subset of the attributes (columns) of a relation (table).
Mixed fragment: a fragment that is both horizontally and vertically fragmented.
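The three fragment types can be illustrated with a minimal sketch. It assumes a hypothetical employee relation held as a list of dictionaries. The attribute names, the sample rows, and the helpers `horizontal_fragment` and `vertical_fragment` are illustrative and do not come from the slides.

```python
# Hypothetical relation: each dict is one tuple (row).
employees = [
    {"ssn": 1, "name": "Ann", "dno": 10, "salary": 5000},
    {"ssn": 2, "name": "Bob", "dno": 20, "salary": 4000},
    {"ssn": 3, "name": "Cy", "dno": 10, "salary": 4500},
]

def horizontal_fragment(relation, predicate):
    """Keep whole tuples that satisfy the predicate (a selection)."""
    return [dict(row) for row in relation if predicate(row)]

def vertical_fragment(relation, attributes):
    """Keep only the named attributes of every tuple (a projection)."""
    return [{a: row[a] for a in attributes} for row in relation]

# Horizontal: only the rows of department 10.
dept10 = horizontal_fragment(employees, lambda r: r["dno"] == 10)
# Vertical: only the ssn and name columns of every row.
names = vertical_fragment(employees, ["ssn", "name"])
# Mixed: a vertical fragment of a horizontal fragment.
mixed = vertical_fragment(dept10, ["ssn", "salary"])
```

Both vertical fragments in this sketch keep the key attribute `ssn`, so the original rows could be rebuilt by joining the fragments on it.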

Replication
Full replication: the whole database is replicated at every site in the distributed system.
No replication: each fragment is stored at exactly one site.
Partial replication: some fragments of the database are replicated while others are not.
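As a rough illustration of the three schemes, the sketch below classifies a placement map. The function `replication_scheme`, the fragment names, and the site names are hypothetical. The placement is assumed to be a dictionary from fragment name to the set of sites holding a copy.

```python
def replication_scheme(placement, sites):
    """Classify a fragment-to-sites placement as full, none, or partial."""
    all_sites = set(sites)
    copies = [set(holders) for holders in placement.values()]
    if all(c == all_sites for c in copies):
        return "full"      # every fragment is stored at every site
    if all(len(c) == 1 for c in copies):
        return "none"      # every fragment is stored at exactly one site
    return "partial"       # some fragments are replicated, others are not

sites = ["A", "B", "C"]
scheme = replication_scheme({"F1": {"A", "B"}, "F2": {"C"}}, sites)
```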

Query Processing
Site 1: Employee, 10,000 records of 100 bytes each. R(Employee) = (Fname, Lname, SSN, ….. Dno)
Site 2: Department, 100 records of 35 bytes each. R(Department) = (Dnumber, Dname, ….)
Site 3: the result of query Q is needed here.

Distributed Query
Transfer Employee to Site 3.
Transfer Department to Site 3.
Perform the join at Site 3.
Cost: (10,000 * 100) + (100 * 35) = 1,000,000 + 3,500 = 1,003,500 bytes transferred.
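The transfer cost of this strategy is simple arithmetic over the relation sizes given for Site 1 and Site 2. The short sketch below reproduces it. The variable names are illustrative.

```python
# Sizes taken from the Query Processing slide.
employee_bytes = 10_000 * 100    # 10,000 Employee records, 100 bytes each
department_bytes = 100 * 35      # 100 Department records, 35 bytes each

# Shipping both whole relations to Site 3 costs the sum of their sizes.
ship_both_cost = employee_bytes + department_bytes
print(ship_both_cost)  # 1003500
```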

Semijoin
The idea of the semijoin operation is to reduce the number of tuples in a relation before transferring it to another site.
Step 1: Project the join attribute of Department at Site 2 and transfer it to Site 1. Cost = 4 * 100 = 400 bytes.
Step 2: Join with Employee at Site 1 and transfer the result to Site 3. Cost = 34 * 10,000 = 340,000 bytes.
Total cost = 400 + 340,000 = 340,400 bytes.
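The sketch below reproduces the semijoin cost and shows the reduction step on a toy relation. The byte sizes (4 bytes per join-attribute value, 34 bytes per result row) are the figures used on the slide. The helper `semijoin` and the sample rows are hypothetical.

```python
DEPT_ROWS = 100          # Department records at Site 2
EMP_ROWS = 10_000        # Employee records at Site 1
JOIN_ATTR_BYTES = 4      # size of one projected join-attribute value
RESULT_ROW_BYTES = 34    # size of one row shipped to Site 3

def semijoin(rows, attr, keys):
    """Keep only the rows whose join attribute appears in the shipped keys."""
    keys = set(keys)
    return [row for row in rows if row[attr] in keys]

step1_cost = JOIN_ATTR_BYTES * DEPT_ROWS     # Site 2 -> Site 1
step2_cost = RESULT_ROW_BYTES * EMP_ROWS     # Site 1 -> Site 3
total_cost = step1_cost + step2_cost
print(total_cost)  # 340400

# Toy data: only employees whose dno matches a shipped key survive.
toy_employees = [{"ssn": 1, "dno": 5}, {"ssn": 2, "dno": 7}, {"ssn": 3, "dno": 9}]
reduced = semijoin(toy_employees, "dno", [5, 7])
```

The step 2 figure assumes, as the slide does, that all 10,000 Employee rows contribute a result row.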

Transaction
Two-phase commit protocol, Phase 1: Obtaining a Decision
The coordinator Ci asks all participants to prepare to commit transaction T.
Ci adds the corresponding record to the log and forces the log to stable storage.
Ci sends prepare messages to all sites at which T executed.
Upon receiving the message, the transaction manager at each site determines whether it can commit the transaction.
If it cannot, it adds a corresponding record to the log and sends an abort T message to Ci.
If the transaction can be committed, it adds a corresponding record to the log, forces all records for T to stable storage, and sends a ready T message to Ci.

Two-phase commit protocol (cont.), Phase 2: Recording the Decision
T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted.
The coordinator adds a decision record (commit or abort) to the log and forces the record onto stable storage. Once the record reaches stable storage, the decision is irrevocable, even if failures occur.
The coordinator sends a message to each participant informing it of the decision (commit or abort).
Participants take the appropriate action locally.
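The two phases can be sketched as a small in-memory simulation. This is a simplified illustration, not a fault-tolerant implementation. The class `Participant`, the function `two_phase_commit`, and the log entry labels are hypothetical. Python lists stand in for logs on stable storage, and failures and timeouts are not modeled.

```python
class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.log = []  # stands in for a log forced to stable storage

    def prepare(self, txn):
        """Phase 1: vote on whether the transaction can be committed."""
        if self.can_commit:
            self.log.append(("ready", txn))
            return "ready"
        self.log.append(("no", txn))
        return "abort"

    def finish(self, txn, decision):
        """Phase 2: act locally on the coordinator's decision."""
        self.log.append((decision, txn))

def two_phase_commit(txn, participants):
    coordinator_log = [("prepare", txn)]
    # Phase 1: collect a vote from every participating site.
    votes = [p.prepare(txn) for p in participants]
    # Phase 2: commit only if every site answered ready.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append((decision, txn))  # decision is now irrevocable
    for p in participants:
        p.finish(txn, decision)
    return decision, coordinator_log

sites = [Participant("s1", True), Participant("s2", False)]
decision, coordinator_log = two_phase_commit("T", sites)
```

In this run one site cannot commit, so the decision is abort and every participant records it.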

Concurrency Control: Algorithms
Pessimistic: synchronize the execution of user requests before the transaction starts. Examples: two-phase locking protocol, timestamp ordering protocol.
Optimistic: execute the requests and then perform a validation check to ensure that the execution has not compromised the consistency of the database. Examples: locking-based and timestamp-ordering-based variants.
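The optimistic approach can be illustrated with a minimal single-process sketch. It assumes a version number per data item and validates at commit time that nothing the transaction read has changed. The classes `Store` and `OptimisticTxn` are hypothetical, and this version-check scheme is one possible validation method, not necessarily the one the slides have in mind.

```python
class Store:
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def read(self, key):
        return self.data.get(key, (None, 0))

class OptimisticTxn:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # buffered writes, applied only at commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation: every item read must still be at the version we saw.
        for key, seen in self.read_versions.items():
            if self.store.read(key)[1] != seen:
                return False  # conflict detected, caller should retry
        # Write phase: install the buffered writes and bump the versions.
        for key, value in self.writes.items():
            _, version = self.store.read(key)
            self.store.data[key] = (value, version + 1)
        return True

store = Store()
t1, t2 = OptimisticTxn(store), OptimisticTxn(store)
t1.read("x"); t2.read("x")       # both see version 0
t1.write("x", 1); t2.write("x", 2)
first = t1.commit()              # succeeds
second = t2.commit()             # fails validation, x changed since t2 read it
```

A pessimistic protocol such as two-phase locking would instead make `t2` wait for a lock on `x` before reading it.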

Concurrency Control: Replication
Primary site technique: a simple extension of the centralized locking approach.
Primary site with backup site: all locking information is maintained at both the primary and the backup sites.
Primary copy technique: failure of one site affects only the transactions that access locks on items whose primary copies reside at that site; other transactions are not affected.
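The failure behavior of the primary copy technique can be sketched as follows. The item names, site names, transaction names, and the helper `affected_transactions` are hypothetical. The sketch only models which site manages the lock for each item.

```python
# Hypothetical mapping: data item -> site holding its primary copy.
primary_copy = {"x": "S1", "y": "S2", "z": "S2"}

# Hypothetical mapping: transaction -> items it needs to lock.
accesses = {"T1": {"x"}, "T2": {"y", "z"}, "T3": {"x", "y"}}

def affected_transactions(failed_site, accesses, primary_copy):
    """Transactions that lock at least one item whose primary copy is at the failed site."""
    return sorted(
        txn
        for txn, items in accesses.items()
        if any(primary_copy[item] == failed_site for item in items)
    )

blocked = affected_transactions("S1", accesses, primary_copy)
```

Here `T2` is unaffected by the failure of `S1` because all of its lock requests go to `S2`. Under the plain primary site technique, a failure of the single lock site would affect every transaction.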

Deadlock Handling
Centralized approach: a global wait-for graph is constructed and maintained at a single site, the deadlock-detection coordinator.
[Figure: local wait-for graphs and the global wait-for graph]
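A minimal sketch of the centralized approach: the coordinator merges the local wait-for graphs into a global graph and looks for a cycle. The functions `merge_wait_for` and `has_deadlock` and the sample graphs are hypothetical. Each graph is assumed to map a waiting transaction to the set of transactions it waits for.

```python
def merge_wait_for(local_graphs):
    """Union the edges of all local wait-for graphs into one global graph."""
    global_graph = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            global_graph.setdefault(waiter, set()).update(holders)
    return global_graph

def has_deadlock(graph):
    """Return True if the wait-for graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True        # back edge: a cycle of waiting transactions
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))

site1 = {"T1": {"T2"}}   # at site 1, T1 waits for T2
site2 = {"T2": {"T1"}}   # at site 2, T2 waits for T1
global_graph = merge_wait_for([site1, site2])
deadlocked = has_deadlock(global_graph)
```

Neither local graph contains a cycle on its own, which is why the global graph is needed to detect this deadlock.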

Recovery
It is quite difficult to determine whether a site is down without exchanging numerous messages with other sites.
When a transaction updates data at several sites, it cannot commit until it is sure that its effect at every site cannot be lost.
The two-phase commit protocol is often used to ensure the correctness of distributed commit.

3-tier Client-Server Architecture
The first, or presentation, tier (the client or front end) deals with the interaction with the user.
The second tier, commonly called the application tier, processes the requests of all clients.
The third, or database, tier contains the database management system that manages all persistent data.

3-tier Architecture – Cont.

Summary
Distributed DBMSs offer site autonomy and distributed administration.
Storage techniques, concurrency control, and recovery must be revisited for the distributed setting.

Thank You
