Distributed Databases
Distributed databases are the logical next step for geographically dispersed organisations. The goal is to provide location transparency.
Starting point: a set of decentralised DBs located in different places, each developed for the specific information needs of its site.
Aim: to integrate these decentralised DBs into a coherent distributed database (DDB).
Advantages of Distributed DBs
– increased reliability of systems and availability of data
– local control preserved
– modular growth possible at each site and at new sites
– optimised communication costs
– faster response times
Control in Normal (Centralised) DBs
Transaction control: the ability of the DBMS to ensure the successful completion of transactions
– commit transactions
– roll back to the previous state
Concurrency control: the ability of the DBMS to arbitrate between concurrent uses of the data
– simultaneous access
– simultaneous update
– deletion
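As a minimal sketch of transaction control, Python's built-in `sqlite3` module can stand in for a centralised DBMS: using a connection as a context manager commits on success and rolls back if an exception escapes. The `account` table, its CHECK constraint and the `transfer` function are hypothetical, chosen only to force a failure part-way through a transaction.

```python
import sqlite3

# In-memory database standing in for one centralised DBMS (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move an amount between accounts as one transaction: both updates or neither."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the earlier credit was rolled back too.
        return False


def balances(conn):
    """Current balance of every account, keyed by id."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A failed transfer leaves both balances unchanged: the credit already applied to the destination account is rolled back together with the rejected debit, returning the database to its previous state.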
Control in Distributed DBs
Different portions of the overall database reside at different locations. These portions are controlled by different processors, sometimes running different DBMSs. Because there is a common schema, a query can involve any portion of the DB residing at any location.
Options for Distributed DBs
The key issue is physical design (data structure): the performance of the DB (response time, etc.) depends on good design. There are a number of options:
– data replication
– horizontal partitioning
– vertical partitioning
– combinations of the above
Data replication
Store a separate copy of the full tables at each location. If a copy is stored at every site, this is full replication.
Advantages:
– reliability
– fast response
Disadvantages:
– storage requirements
– complexity and cost of updating
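The trade-off of full replication can be sketched in a few lines: reads are served from the local copy, while every update must reach every copy. This is a toy model, not a real DBMS mechanism; the `Site` and `ReplicatedTable` classes are hypothetical, and the write-all policy (refuse an update while any copy is unreachable) is an assumption made to keep the copies consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One location holding its own full copy of a table (hypothetical model)."""
    name: str
    up: bool = True
    rows: dict = field(default_factory=dict)


class ReplicatedTable:
    """Full replication: every site stores every row."""

    def __init__(self, sites):
        self.sites = {s.name: s for s in sites}
        self.copy_writes = 0  # one write per copy: a rough proxy for update cost

    def write(self, key, value):
        # Assumed write-all policy: refuse the update if any copy is unreachable
        # rather than let the copies drift apart.
        if not all(s.up for s in self.sites.values()):
            raise RuntimeError("update refused: a replica is unavailable")
        for site in self.sites.values():
            site.rows[key] = value
            self.copy_writes += 1

    def read(self, local, key):
        # Prefer the local copy (fast response); fall back to any live copy (reliability).
        candidates = [self.sites[local]] + list(self.sites.values())
        for site in candidates:
            if site.up:
                return site.rows[key]
        raise RuntimeError("no replica available")
```

One update costs as many writes as there are sites, while a read survives the loss of the local site as long as any other copy is up.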
Horizontal partitioning
Some of the rows of a table are stored at one location; others are stored at other locations, e.g. the rows for customers who bank at a particular branch are stored at that branch.
Advantages:
– efficiency
– local optimisation
– security
Disadvantages:
– inconsistent access speed
– backup vulnerability
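A minimal sketch of horizontal partitioning, assuming a hypothetical `customers` table whose `branch` column decides which site stores each row. Each fragment is a subset of the rows, and the full table is recovered as the union of the fragments.

```python
# Hypothetical customer rows; "branch" decides which site stores each row.
customers = [
    {"cust_no": 1, "name": "Adams", "branch": "Detroit"},
    {"cust_no": 2, "name": "Baker", "branch": "Chicago"},
    {"cust_no": 3, "name": "Clark", "branch": "Detroit"},
]


def partition_horizontally(rows, key):
    """Split a table by rows: each site receives the rows whose `key` value names it."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def reconstruct(fragments):
    """The full table is the union of all horizontal fragments."""
    return [row for fragment in fragments.values() for row in fragment]
```

Queries about one branch's customers touch only that branch's fragment, which is where the efficiency and local optimisation come from; a query over all customers must visit every site.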
Vertical partitioning
Some columns are projected into base relations at different sites. All of these relations share a common domain (key), so the full table can be reconstructed by joining them.
Advantages:
– tailor-made support for functional areas
– the same advantages as horizontal partitioning
Disadvantages:
– some queries might be very slow
– users must understand some design issues
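A minimal sketch of vertical partitioning: each site holds a projection of some columns, every projection keeps the shared key, and a join on that key rebuilds the full rows. The `employees` table and the "personnel" and "payroll" fragments are hypothetical examples of functional areas.

```python
# Hypothetical employee table; emp_no is the common key shared by every fragment.
employees = [
    {"emp_no": 10, "name": "Adams", "dept": "Sales", "salary": 50000},
    {"emp_no": 20, "name": "Baker", "dept": "Payroll", "salary": 60000},
]


def project(rows, columns):
    """Vertical fragment: keep only the given columns (the key must be among them)."""
    return [{c: row[c] for c in columns} for row in rows]


def rejoin(left, right, key):
    """Rebuild full rows by joining two vertical fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


personnel_site = project(employees, ["emp_no", "name", "dept"])
payroll_site = project(employees, ["emp_no", "salary"])
```

A query needing only salaries stays at the payroll site, but one needing name and salary together has to join across sites, which is why some queries can be very slow.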
Combinations of the three methods
Most of the time, companies use several methods, because each method is efficient in certain situations (e.g. local customers, information originating at a certain site, shared processes that require the same data at all sites), and there may also be security requirements. Identifying the optimal distribution is a design issue: the aim is to place data at the sites where it is used most.
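The "data at the sites where it is used most" rule can be sketched as a simple greedy allocation. This is only an illustration under stated assumptions: the access counts are invented, and the `share_threshold` parameter is a hypothetical knob that replicates a fragment everywhere when no single site dominates its use, as with data needed by shared processes.

```python
def allocate(access_counts, share_threshold=0.6):
    """Choose site(s) for each fragment from per-site access counts.

    access_counts maps fragment -> {site: number of accesses}. A fragment goes
    to its heaviest user if that site accounts for at least share_threshold of
    all accesses; otherwise it is treated as shared and replicated at every site.
    """
    placement = {}
    for fragment, by_site in access_counts.items():
        total = sum(by_site.values())
        best = max(by_site, key=by_site.get)
        if by_site[best] / total >= share_threshold:
            placement[fragment] = [best]          # partition: store where used most
        else:
            placement[fragment] = sorted(by_site)  # widely shared: replicate
    return placement


# Invented usage figures, for illustration only.
usage = {
    "detroit_customers": {"Detroit": 900, "Chicago": 40},
    "chicago_customers": {"Detroit": 30, "Chicago": 700},
    "product_catalogue": {"Detroit": 500, "Chicago": 520},
}
```

The result mixes methods: the two customer fragments are partitioned horizontally by site, while the evenly used catalogue is replicated.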
Distributed DBMS
The DBMS has additional roles to play in the case of a distributed DB:
– determine the location of the data to be retrieved
– translate the request into the language used by the local DBMS
– deal with the normal data management functions: security, locking, query optimisation, etc.
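The first two roles can be sketched as a directory lookup followed by a translation step. The `DIRECTORY` contents and the `route` function are hypothetical; the translation shown is only the row-limiting clause, which genuinely differs between dialects (Oracle 12c and later accept `FETCH FIRST n ROWS ONLY`, SQLite uses `LIMIT n`).

```python
# Hypothetical global directory: which site holds each table, and which
# DBMS dialect the local DBMS at that site speaks.
DIRECTORY = {
    "supplier": ("Detroit", "oracle"),
    "shipment": ("Detroit", "oracle"),
    "part": ("Chicago", "sqlite"),
}


def route(table, limit):
    """Locate `table` and phrase a 'first N rows' request in that site's dialect."""
    site, dialect = DIRECTORY[table]  # KeyError if the table is unknown
    limit = int(limit)
    if dialect == "oracle":
        sql = f"SELECT * FROM {table} FETCH FIRST {limit} ROWS ONLY"
    else:
        sql = f"SELECT * FROM {table} LIMIT {limit}"
    return site, sql
```

The user writes one logical request; the distributed DBMS decides where it runs and what the local DBMS there actually receives, which is what location transparency means in practice.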
Heterogeneous Distributed DBMS
A different DBMS runs at each site, with a master DBMS controlling the interactions among the parts. This is not practical today because of compatibility problems; more often, each DBMS follows the same data architecture.
Problems with global transactions
DBMSs can be radically different, e.g. relational versus network data models. Only some state-of-the-art commercial products have translating capabilities. One alternative solution is to put some essential data, together with the directory of data locations, on a central server. Real distributed DBMSs solve these problems for their users with the help of the network operating system (NOS).
Commit Protocol
Its purpose is to ensure the integrity of the data in update operations. It is a well-defined procedure based on the exchange of messages ("ok" or "not ok"): each global transaction is either completed at every site or aborted.
Two-phase commit:
– the site originating the transaction sends requests to all sites involved in the update
– all sites attempt to process their part of the transaction without committing the data (temp files)
– they notify the originating site whether their part is OK or not
– if the originating site collects OKs from all sites, it sends the order to commit the data; if any site replies "not ok", it orders all sites to roll back
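The two-phase commit steps can be sketched as a coordinator and a set of participants. This is a simplified, in-process model with hypothetical class and function names: it leaves out timeouts, logging to stable storage and coordinator failure, which a real implementation must handle.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A site taking part in a global transaction (hypothetical model)."""
    name: str
    can_commit: bool = True  # whether this site's part of the work succeeds
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)  # stands in for the temp files

    def prepare(self, updates):
        # Phase 1: do the work without committing it, then vote "ok" / "not ok".
        if not self.can_commit:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        self.committed.update(self.pending)
        self.pending = {}

    def abort(self):
        self.pending = {}


def two_phase_commit(participants, updates):
    """Coordinator at the originating site; updates maps site name -> {key: value}."""
    votes = [p.prepare(updates.get(p.name, {})) for p in participants]
    if all(votes):
        for p in participants:  # Phase 2: every site said "ok", so commit everywhere
            p.commit()
        return "committed"
    for p in participants:  # at least one "not ok": roll back everywhere
        p.abort()
    return "aborted"
```

If a single site votes "not ok", no site keeps any part of the transaction, so the global transaction is never left half-done.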
Timestamping
An alternative to locking, which carries the possibility of deadlocks. Timestamping ensures that transactions are processed in a serial order, so locking is not needed.
– all updated records carry the timestamp of the transaction that last modified them
– if a new transaction attempts to update a record with an earlier timestamp than its own, the update is OK
– if the record carries a later timestamp than the transaction, update access is denied; the transaction is re-stamped and restarted
Example:
– a record in the DB carries timestamp 168; an update by transaction 170 is OK, and the record now carries 170
– an update by transaction 165 is then denied; the transaction is re-stamped and restarted (i.e. it does it again)
Advantage: costly deadlock situations are avoided.
Disadvantage: transactions may sometimes be restarted even though they did not conflict with previous ones.
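The rule and the example above can be sketched as follows. Like the slides, this considers only the timestamps of updates (full timestamp ordering also tracks read timestamps). Allowing an update when the stamps are equal is an assumption the slides do not settle, and `Record`, `try_update`, `run_with_restart` and the `clock` iterator are hypothetical names.

```python
import itertools


class Record:
    def __init__(self, value, timestamp):
        self.value = value
        self.timestamp = timestamp  # stamp of the last transaction that updated it


def try_update(record, txn_stamp, new_value):
    """Allow the update only if the record's stamp is not later than the transaction's."""
    if record.timestamp <= txn_stamp:  # equal stamps allowed: an assumption
        record.value = new_value
        record.timestamp = txn_stamp
        return True
    return False  # denied: the record was updated by a later transaction


def run_with_restart(record, txn_stamp, new_value, clock):
    """On denial, re-stamp the transaction with a fresh stamp from `clock` and retry."""
    while not try_update(record, txn_stamp, new_value):
        txn_stamp = next(clock)
    return txn_stamp


# The slide's example: a record last updated by transaction 168.
rec = Record("old", 168)
try_update(rec, 170, "new")   # True: the record now carries 170
try_update(rec, 165, "late")  # False: 165 is earlier than 170
```

No transaction ever waits for another, so deadlock cannot occur; the price is that a late-stamped transaction is restarted even if it would not actually have conflicted.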
Effect of design on speed
How to design fast queries: a simple example with two sites in a relational DB.
– supplier (supplier#, ..., city): 10,000 records stored in Detroit
– part (part#, ..., colour): 100,000 records stored in Chicago
– shipment (supplier#, ..., part#): 1,000,000 records stored in Detroit
– each record is 100 characters long, and there are 10 red parts
– data transmission is 10,000 characters/second, with a 1-second delay in any communication
– data processing time is negligible
Exercise: write the SQL statement that finds the suppliers in Cleveland who ship red parts, then imagine how the query can be carried out between the two sites.
SQL statement
select supplier.supplier#
from supplier, part, shipment
where supplier.city = 'Cleveland'
and supplier.supplier# = shipment.supplier#
and shipment.part# = part.part#
and part.colour = 'Red'
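Using the figures from the example, the cost of moving data between the sites can be estimated with a simple model: each transfer costs one communication delay plus the number of characters divided by the transmission rate. This is a sketch under assumptions: the initial request message and the return of the final answer are ignored, and `transfer_time` and the strategy labels are hypothetical.

```python
RATE = 10_000     # characters per second (from the example)
DELAY = 1.0       # seconds of delay per communication (from the example)
RECORD_LEN = 100  # characters per record (from the example)


def transfer_time(records, messages=1):
    """Seconds to ship `records` records: messages * delay + characters / rate."""
    return messages * DELAY + records * RECORD_LEN / RATE


strategies = {
    "ship the 10 red parts from Chicago to Detroit": transfer_time(10),
    "ship the whole part table to Detroit": transfer_time(100_000),
    "ship the whole shipment table to Chicago": transfer_time(1_000_000),
}
```

The three strategies cost about 1.1 seconds, 1,001 seconds and 10,001 seconds respectively: moving only the few selected red parts is roughly four orders of magnitude cheaper than moving the shipment table. Shipping Detroit's preliminary result to Chicago cannot be priced here, since the example does not give the number of shipments by Cleveland suppliers.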
Conclusions
It is reasonably easy to optimise a query with two tables, but very complex with more than two (try it with 30!).
Rules:
– break queries down into components isolated at different sites (minimise communication time and traffic)
– determine which site has the potential to yield FEWER selected records
– move the preliminary results to the site where the rest of the work can be performed (i.e. try to move as few records as possible)
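The rules can be applied to the supplier/part/shipment query by running it in three steps across two in-memory SQLite databases standing in for the Chicago and Detroit sites. Assumptions: the column names use `_no` instead of `#` (not a legal unquoted SQLite identifier), the rows are invented, the `red_part` temporary table and the function name are hypothetical, and `DISTINCT` is added so each supplier is listed once.

```python
import sqlite3

# Chicago site: holds the part table (invented rows).
chicago = sqlite3.connect(":memory:")
chicago.execute("CREATE TABLE part (part_no TEXT PRIMARY KEY, colour TEXT)")
chicago.executemany("INSERT INTO part VALUES (?, ?)",
                    [("P1", "Red"), ("P2", "Blue"), ("P3", "Red")])

# Detroit site: holds the supplier and shipment tables (invented rows).
detroit = sqlite3.connect(":memory:")
detroit.execute("CREATE TABLE supplier (supplier_no TEXT PRIMARY KEY, city TEXT)")
detroit.execute("CREATE TABLE shipment (supplier_no TEXT, part_no TEXT)")
detroit.executemany("INSERT INTO supplier VALUES (?, ?)",
                    [("S1", "Cleveland"), ("S2", "Detroit"), ("S3", "Cleveland")])
detroit.executemany("INSERT INTO shipment VALUES (?, ?)",
                    [("S1", "P1"), ("S1", "P2"), ("S2", "P3"), ("S3", "P2")])


def cleveland_suppliers_of_red_parts():
    # Step 1 (Chicago): the component that yields FEWER records, i.e. the red parts.
    red = chicago.execute("SELECT part_no FROM part WHERE colour = 'Red'").fetchall()
    # Step 2: move only this small preliminary result to Detroit.
    detroit.execute("CREATE TEMP TABLE IF NOT EXISTS red_part (part_no TEXT)")
    detroit.execute("DELETE FROM red_part")
    detroit.executemany("INSERT INTO red_part VALUES (?)", red)
    # Step 3 (Detroit): finish the join where the large tables already live.
    rows = detroit.execute(
        "SELECT DISTINCT supplier.supplier_no "
        "FROM supplier "
        "JOIN shipment ON supplier.supplier_no = shipment.supplier_no "
        "JOIN red_part ON shipment.part_no = red_part.part_no "
        "WHERE supplier.city = 'Cleveland' "
        "ORDER BY supplier.supplier_no"
    ).fetchall()
    return [r[0] for r in rows]
```

Only the red part numbers cross between the sites; the supplier and shipment tables never leave Detroit, which is the "move as few records as possible" rule in action.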