Published by Nelson Blair. Modified over 9 years ago.
1
Distributed Databases
John Ortiz
2
Lecture 24: Distributed Databases
- Distributed Database (DDB): a collection of interrelated databases interconnected by a computer network
- Distributed Database Management System (DDBMS): the software that manages a distributed database
- World Wide Web technology does not yet constitute a DDB by our definition
3
Advantages of a DDB
- Supports various levels of transparency
  - Distribution (network) transparency: the degree to which the user is unaware of the networked nature of the DB
  - Replication transparency: the degree to which the user is unaware that copies of the DB exist
  - Fragmentation transparency: the degree to which the user is unaware that the DB is broken into pieces
4
Advantages of a DDB
- Increased reliability and availability
  - Reliability: the probability that a system is running at a particular point in time
  - Availability: the probability that a system is continuously available during a time interval
5
Advantages of a DDB
- Improved performance
  - Supports data localization: data is kept near where it is most often used, to reduce the effects of network delay
- Easier expansion
  - Adding more data, increasing the DB size, and adding resources are all easier
- Reduced operating costs (compared with a mainframe system)
  - It is cheaper to add workstations than a new mainframe computer
6
Advantages of a DDB
- No single point of failure
  - When one computer fails, others can take its place
7
Disadvantages of a DDB
- Significant increase in complexity
  - Normalization, query optimization, security, transaction processing, concurrency control, crash recovery, etc. ALL become much more difficult to handle
- Increased storage requirements
  - Since multiple copies of various portions of the DB exist, more storage space is required
8
Data Fragmentation
- Fragmentation is the division of the database into pieces stored at different sites
- Horizontal fragmentation: a subset of the tuples in a particular relation
  - The result of a query that SELECTS some tuples, but not others, is a horizontal "fragment"
  - In a DDB, the output of such a query may be stored as a separate DB at a separate site
  - Requires a UNION to recombine the information
9
Data Fragmentation
- Vertical fragmentation: a subset of the attributes of a particular relation
  - The result of a query that PROJECTS certain, specific attributes
  - Requires an outer join (or an outer union) to recombine the information
- Hybrid fragmentation: can you guess?
  - Includes both horizontal and vertical fragmentation
- A complete fragmentation simply means that all tuples and attributes are covered by the fragments
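The two kinds of fragment and their recombination can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EMPLOYEE tuples, the attribute names, and the helper functions are all hypothetical, and each vertical fragment is assumed to keep the key (ssn) so that tuples can be matched up again.

```python
# Hypothetical EMPLOYEE relation; the attribute names are illustrative only.
EMPLOYEE = [
    {"ssn": 1, "name": "Ann", "dept": 5, "salary": 50000},
    {"ssn": 2, "name": "Bob", "dept": 4, "salary": 42000},
    {"ssn": 3, "name": "Cy", "dept": 5, "salary": 61000},
]


def horizontal_fragment(relation, predicate):
    """SELECT: keep only the tuples that satisfy the predicate."""
    return [dict(t) for t in relation if predicate(t)]


def vertical_fragment(relation, attributes):
    """PROJECT: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attributes} for t in relation]


def outer_join(left, right, key="ssn"):
    """Full outer join on the key; unmatched attributes are simply absent."""
    merged = {}
    for t in left + right:
        merged.setdefault(t[key], {}).update(t)
    return list(merged.values())


def by_key(relation, key="ssn"):
    """Sort tuples by key so that relations can be compared."""
    return sorted(relation, key=lambda t: t[key])


# Horizontal fragments (could live at two sites), recombined with a UNION.
dept5 = horizontal_fragment(EMPLOYEE, lambda t: t["dept"] == 5)
others = horizontal_fragment(EMPLOYEE, lambda t: t["dept"] != 5)
union = dept5 + others

# Vertical fragments, recombined with an outer join on the shared key.
names = vertical_fragment(EMPLOYEE, ["ssn", "name"])
pay = vertical_fragment(EMPLOYEE, ["ssn", "dept", "salary"])
rebuilt = outer_join(names, pay)
```

Both by_key(union) and by_key(rebuilt) equal the original EMPLOYEE list, which is what makes these fragmentations complete.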
10
Data Fragmentation
- A fragmentation schema is a definition of the set of fragments that includes all attributes and tuples sufficient to reconstruct the DB
- An allocation schema describes which fragments are at which sites
11
Data Replication
- Replication is the creation of copies of the DB
- A DDB may be fully replicated (a copy of the entire DB is made at each site)
  - Why would you want to make a full copy of a DDB?
- A DDB may have no replication (each fragment is stored at one and only one site)
- Naturally, a DDB may be partially replicated
- A replication schema is a description of which pieces are copied at which sites
12
Data Replication
- Replication creates new consistency and redundancy problems
- Every piece of data that is replicated is redundant, and therefore liable to become inconsistent
- The copies may be updated separately, which causes inconsistency
- How much inconsistency is acceptable?
13
Synchronization
- Synchronization is the process of updating the individual replicas
- Since pieces are stored in different places, the DDB must periodically be made consistent
- Synchronization can be expensive in terms of network resources and time
- It is not simply copying one replica to another: the most recent updates on both copies being synchronized must be accounted for
- Pp. 775-778 in the text have an example of a DDB
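Why synchronization is more than copying one replica over the other can be shown with a minimal sketch. The slides do not say how updates are reconciled, so everything here is an assumption: each record carries an update timestamp, the newest version wins, and deletions are ignored. The function name and the sample data are hypothetical.

```python
def synchronize(replica_a, replica_b):
    """Merge two replicas so that both hold the newest version of every key.

    Each replica maps key -> (timestamp, value). The timestamps and the
    newest-version-wins rule are assumptions made for this sketch only.
    """
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        # Take B's version when A lacks the key or holds an older version.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    # Both sites adopt the merged state (returned as two separate copies).
    return dict(merged), dict(merged)


# Each site has seen updates that the other has not.
site1 = {"ann": (10, "ann@base1"), "bob": (12, "bob@base1")}
site2 = {"ann": (15, "ann@base2"), "cy": (11, "cy@base2")}

site1, site2 = synchronize(site1, site2)
```

Copying site1 over site2 would have lost the newer address for ann and the new entry for cy; copying the other way would have lost bob.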
14
US Air Force Email
- We have noted in the past that there are many types of databases, such as spreadsheets, address books, and even documents (such as MS Word)
- Consider the AF, with approximately 500,000 people who all have email addresses and need to communicate
- They have constructed a global email address book and make use of replication
- The AF is divided into levels: global, command, base
15
US Air Force Email
- Initially the bases were each set up with email and interconnected via the network
- However, you had to know the email address of anyone at a different base
- Eventually, each command (a group of related bases) set up an address book covering all of its bases
- Each base maintains a complete replica of the entire command's address book
  - Why not just a piece?
16
US Air Force Email
- The DB is synchronized each night
- So, when someone moves, their email address is removed from the local copy
- All the other bases will still have that "old" email address until the next day, at which point the DDB is consistent again
- I believe that now the entire AF address book is available at each base
  - Not sure how often it is synchronized, perhaps weekly
17
US Air Force Email
- Search for an email address is quick, since a local copy is kept
- This reduces network traffic considerably compared with everyone having to search a centralized DB for email addresses
18
Query Processing in DDB
- When we looked at query processing before, the largest delay was with the disk
- Now, that same concept is extended to include network delay, which can be much longer
- Suppose the EMPLOYEE DB (10,000 records, 100 bytes each) is at site 1, and the DEPARTMENT DB (100 records, 35 bytes each) is at site 2
- YOU are at site 3
- Assume the result is 400,000 bytes
19
Query Processing in DDB
SELECT E_Name FROM EMPLOYEE WHERE DeptNum = 5
There are 3 strategies:
1) Transfer both DBs to site 3 and perform the query there (1,003,500 bytes transferred)
2) Transfer EMPLOYEE to site 2, perform the query, and transfer the result to site 3 (1,400,000 bytes transferred)
3) Transfer DEPARTMENT to site 1, perform the query, and transfer the result to site 3 (403,500 bytes transferred)
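The byte counts for the three strategies follow directly from the sizes given on the previous slide. A small Python check of the arithmetic, using only those figures (the variable names and strategy labels are chosen here for illustration):

```python
# Sizes taken from the slides.
EMPLOYEE_BYTES = 10_000 * 100    # 10,000 records of 100 bytes at site 1
DEPARTMENT_BYTES = 100 * 35      # 100 records of 35 bytes at site 2
RESULT_BYTES = 400_000           # assumed size of the query result

# Bytes that must cross the network under each strategy.
strategies = {
    "1: both DBs to site 3": EMPLOYEE_BYTES + DEPARTMENT_BYTES,
    "2: EMPLOYEE to site 2, result to site 3": EMPLOYEE_BYTES + RESULT_BYTES,
    "3: DEPARTMENT to site 1, result to site 3": DEPARTMENT_BYTES + RESULT_BYTES,
}

# The cheapest strategy is the one that moves the fewest bytes.
best = min(strategies, key=strategies.get)

for name, cost in strategies.items():
    print(f"{name}: {cost:,} bytes")
```

Strategy 3 wins because it moves the small relation to the large one rather than the other way around.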
20
Query Processing using Semijoin
- Rather than sending the entire set of records to be joined, we could send just the joining attribute(s)
- The join is then performed there, and the join attributes together with the projected attributes are transferred to the requesting site
- The semijoin is symbolized as: ⋉
- NOTE: R ⋉ S ≠ S ⋉ R
- Substantially reduces the amount of data transferred
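A semijoin keeps the tuples of one relation that have a match in the other, and only the joining column needs to travel between sites. The sketch below is a single-process illustration with made-up relations and attribute names; the network transfer is only implied by the comment.

```python
def semijoin(r, s, attr):
    """R semijoin S: the tuples of R that have a match in S on attr."""
    # In a DDB, only this set of join values would be shipped from S's site.
    s_values = {t[attr] for t in s}
    return [t for t in r if t[attr] in s_values]


# Hypothetical relations stored at two different sites.
employee = [
    {"name": "Ann", "dept": 5},
    {"name": "Bob", "dept": 4},
    {"name": "Cy", "dept": 9},    # no department 9 exists
]
department = [
    {"dept": 4, "dname": "Admin"},
    {"dept": 5, "dname": "Research"},
    {"dept": 7, "dname": "Sales"},    # no employee works here
]

emp_side = semijoin(employee, department, "dept")    # EMPLOYEE tuples only
dept_side = semijoin(department, employee, "dept")   # DEPARTMENT tuples only
```

The two results contain different tuples with different attributes, which is why the order of the operands matters.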
21
Concurrency Control and Recovery
- Dealing with multiple copies
- Failure of individual sites
- Failure of the network
- Distributed commit is more complicated
- Deadlock is more difficult to detect and prevent
- A number of techniques have been proposed to deal with these problems
22
Distinguished Copy
- The locks for a data item are associated with the distinguished copy
- There are several distinguished copy variations:
  - Primary site (with backup): one site is the chosen one and coordinates locking activities (centralized locking)
  - Primary copy: various fragments at different sites are chosen as the distinguished copy, which distributes the locking problem
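The difference between the two variations comes down to where a lock request is routed. The following is a rough sketch rather than a real lock manager: the class names, the item names, and the exclusive-lock-only rule are all assumptions, and a refused request simply returns False where a real system would queue or abort the requester.

```python
class LockSite:
    """A site that manages locks for the distinguished copies it holds."""

    def __init__(self, name):
        self.name = name
        self.locks = {}    # item -> id of the transaction holding the lock

    def acquire(self, item, txn):
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True
        return False       # item is already locked by another transaction

    def release(self, item, txn):
        if self.locks.get(item) == txn:
            del self.locks[item]


class DistinguishedCopyLocking:
    """Routes every lock request to the site of the item's distinguished copy."""

    def __init__(self, primary_of):
        self.primary_of = primary_of    # item -> LockSite

    def lock(self, item, txn):
        return self.primary_of[item].acquire(item, txn)

    def unlock(self, item, txn):
        self.primary_of[item].release(item, txn)


# Primary copy: different items have their distinguished copy at different sites.
site_a, site_b = LockSite("A"), LockSite("B")
manager = DistinguishedCopyLocking({"EMP_frag1": site_a, "DEPT_frag1": site_b})

first = manager.lock("EMP_frag1", "T1")      # granted by site A
second = manager.lock("EMP_frag1", "T2")     # refused: T1 holds the lock
third = manager.lock("DEPT_frag1", "T2")     # granted by site B
```

Mapping every item to the same LockSite would turn this into the primary site variation, with all locking traffic going through one site.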
23
Distributed Recovery
- Very complex
- Suppose that X sends a request to Y: there may be a number of reasons the request was not granted
  - The message was never delivered
  - Site Y is down
  - Site Y sent a response, but the response was not delivered
24
Summary
- Re-read the first 23 slides!
- Advantages/Disadvantages of a DDB
- The 3 transparencies: network, replication, fragmentation
- Fragmentation
- Replication and Synchronization
- Query Processing in a DDB
- Semijoin
- Concurrency Control and Recovery
25
Primary Site Technique