Chapter Objectives
- Understand the concepts of a distributed DBMS
- Understand the various transparency features of distributed databases
- Understand distributed database design issues
What Is A Distributed DBMS?
- Decentralization of business operations and globalization of businesses created a demand for distributing data and processes across multiple locations.
- Distributed database management systems (DDBMS) are designed to meet the information requirements of such multi-location organizations.
- A DDBMS manages the storage and processing of logically related data over interconnected computer systems in which both data and processing functions are distributed among several sites.
- Distributed processing shares the database's logical processing among two or more physically independent sites that are connected through a network.
DDBMS Advantages
- Data are located near the site with the greatest demand
- Faster data access
- Faster data processing
- Growth facilitation
- Improved communications
- Reduced operating costs
- User-friendly interface
- Less danger of single-point failure
- Processor independence
DDBMS Disadvantages
- Complexity of management and control
- Security
- Lack of standards
- Increased storage requirements
- Greater difficulty in managing the data environment
- Increased training costs
Distributed Database
- A distributed database stores a logically related database over two or more physically independent sites connected via a computer network.
Figure 10.2
Distributed Database vs. Distributed Processing
Distributed processing
- Does not require a distributed database
- May be based on a single database on a single computer
- Copies or parts of the database processing functions must be distributed to all data storage sites
Distributed database
- Requires distributed processing
Both
- Require a network to connect components
Functions of DDBMS
- Application/end user interface
- Validation to analyze data requests
- Transformation to determine request components
- Query optimization to find the best access strategy
- Mapping to determine the data location
- I/O interface to read or write data
- Formatting to prepare the data for presentation
- Security to provide data privacy
- Backup and recovery
- DB administration
- Concurrency control
- Transaction management
What Is A Distributed DBMS?
Figure 10.3 Centralized Database Management System
Figure 10.4 Fully Distributed Database Management System
DDBMS Components
- Computer workstations that form the network system.
- Network hardware and software components that reside in each workstation.
- Communications media that carry the data from one workstation to another.
- Transaction processor (TP): receives and processes the application's data requests.
- Data processor (DP): stores and retrieves data located at the site. Also known as data manager (DM).
Levels of Data & Process Distribution
Combining the levels of data and process distribution gives four possible configurations, of which three are practical:
- SPSD: Single-site processing, single-site data (centralized)
- MPSD: Multiple-site processing, single-site data
- MPMD: Multiple-site processing, multiple-site data (fully distributed)
- SPMD: Single-site processing, multiple-site data (logically unsound)
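The naming scheme above is mechanical, so it can be written as a small function. This is only an illustrative sketch: the function name `classify_configuration` and its parameters are assumptions, not part of any DBMS.

```python
def classify_configuration(processing_sites: int, data_sites: int) -> str:
    """Return the acronym (SPSD, MPSD, MPMD, SPMD) for the given site counts."""
    if processing_sites < 1 or data_sites < 1:
        raise ValueError("need at least one processing site and one data site")
    # "S" = single site, "M" = multiple sites; P = processing, D = data.
    process = "SP" if processing_sites == 1 else "MP"
    data = "SD" if data_sites == 1 else "MD"
    return process + data


for counts in [(1, 1), (3, 1), (3, 3)]:
    print(counts, classify_configuration(*counts))
```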
Levels of Data & Process Distribution
Single-Site Processing, Single-Site Data (SPSD)
- All processing is done on a single CPU or host computer.
- All data are stored on the host computer's local disk.
- The DBMS is located on the host computer.
- The DBMS is accessed by dumb terminals.
- This is an example of a centralized DBMS.
Figure 10.6 Nondistributed (Centralized) DBMS
Levels of Data & Process Distribution
Multiple-Site Processing, Single-Site Data (MPSD)
- Typically, MPSD requires a network file server on which conventional applications are accessed through a LAN.
- A popular variation of the MPSD approach is known as a client/server architecture.
Figure 10.7 Multiple-Site Processing, Single-Site Data
Levels of Data & Process Distribution
Multiple-Site Processing, Multiple-Site Data (MPMD)
- Fully distributed DBMS with support for multiple DPs and TPs at multiple sites.
- Homogeneous DDBMSs integrate only one type of centralized DBMS over the network.
- Heterogeneous DDBMSs integrate different types of centralized DBMSs over a network.
Distributed DB Transparency
A DDBMS ensures that database operations are transparent to the end user. The different types of transparency are:
- Distribution transparency
- Transaction transparency
- Failure transparency
- Performance transparency
- Heterogeneity transparency
Distribution Transparency
Distribution transparency allows us to manage a physically dispersed database as though it were a centralized database.
Three levels of distribution transparency:
- Fragmentation transparency
- Location transparency
- Local mapping transparency
Distribution Transparency
Example: Employee data (EMPLOYEE) are distributed over three locations: New York, Atlanta, and Miami. Depending on the level of distribution transparency support, three different cases of queries are possible.

Distributed DBMS: EMPLOYEE table
    Fragment    Location
    E1          New York
    E2          Atlanta
    E3          Miami
Distribution Transparency
When the DBMS supports fragmentation transparency, the user views a single logical database and needs to know neither the fragment names nor their locations:
    SELECT * FROM EMPLOYEE WHERE SALARY > 50000;
Distribution Transparency
When the DBMS supports location transparency, the user needs to know the fragment names but need not know the actual location of the fragments:
    SELECT * FROM E1 WHERE SALARY > 50000
    UNION
    SELECT * FROM E2 WHERE SALARY > 50000
    UNION
    SELECT * FROM E3 WHERE SALARY > 50000;
Distribution Transparency
When the DBMS supports local mapping transparency, the user needs to know the fragment names as well as the actual location of the fragments:
    SELECT * FROM E1 NODE NY WHERE SALARY > 50000
    UNION
    SELECT * FROM E2 NODE ATL WHERE SALARY > 50000
    UNION
    SELECT * FROM E3 NODE MIA WHERE SALARY > 50000;
Distribution Transparency
- Distribution transparency is supported by a distributed data dictionary, which captures the distributed global schema.
- A local transaction processor uses this global schema to translate user requests into subqueries (remote requests) that will be processed by different data processors.
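The translation step described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular DDBMS implements it: the `DATA_DICTIONARY` structure and the `to_subqueries` function are hypothetical, and the fragment and node names are taken from the EMPLOYEE example.

```python
# Hypothetical distributed data dictionary:
# global table name -> list of (fragment name, node holding that fragment).
DATA_DICTIONARY = {
    "EMPLOYEE": [("E1", "NY"), ("E2", "ATL"), ("E3", "MIA")],
}


def to_subqueries(table: str, predicate: str) -> list[tuple[str, str]]:
    """Translate a request on a global table into one (node, subquery) pair per fragment."""
    return [
        (node, f"SELECT * FROM {fragment} WHERE {predicate}")
        for fragment, node in DATA_DICTIONARY[table]
    ]


# The user writes one query against EMPLOYEE; the TP fans it out to the DPs.
for node, sql in to_subqueries("EMPLOYEE", "SALARY > 50000"):
    print(node, "->", sql)
```

The user-facing query mentions only EMPLOYEE; the fragment names and nodes come entirely from the dictionary, which is what fragmentation transparency asks for.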
Transaction Transparency
- A distributed transaction updates and/or requests data from multiple remote sites.
- Transaction transparency ensures that the transaction will be completed only if all database sites involved in the transaction complete their part of the transaction.
- It maintains the integrity of a distributed database.
- Example: Giving a 5% raise to all employees in the previous example involves updating the database at multiple locations. If the transaction cannot be committed at one location, it must be rolled back at all locations.
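The all-or-nothing rule in the raise example can be sketched with three in-memory SQLite databases standing in for the three sites. Everything here is an assumption made for illustration: the helper names, the sample salaries, and the salary cap that is used only to force a failure at one site. The sketch also does not survive a crash between the individual commits; a production DDBMS would typically rely on a protocol such as two-phase commit, which is not implemented here.

```python
import sqlite3


def make_site(rows):
    """Create an in-memory database standing in for one site's EMPLOYEE fragment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EMPLOYEE ("
        "ID INTEGER PRIMARY KEY, "
        "SALARY REAL CHECK (SALARY <= 100000))"  # hypothetical cap, only to force a failure
    )
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", rows)
    conn.commit()
    return conn


def give_raise_everywhere(sites, factor):
    """Apply the raise at every site; commit only if every site succeeded."""
    try:
        for conn in sites:
            conn.execute("UPDATE EMPLOYEE SET SALARY = SALARY * ?", (factor,))
    except sqlite3.Error:
        # One site failed, so undo the pending update at every site.
        for conn in sites:
            conn.rollback()
        return False
    for conn in sites:
        conn.commit()
    return True


sites = [
    make_site([(1, 60000.0)]),  # New York
    make_site([(2, 99000.0)]),  # Atlanta: a 5% raise would exceed the cap
    make_site([(3, 50000.0)]),  # Miami
]
ok = give_raise_everywhere(sites, 1.05)
ny_salary = sites[0].execute("SELECT SALARY FROM EMPLOYEE WHERE ID = 1").fetchone()[0]
print(ok, ny_salary)
```

Because the update at the second site violates the cap, the update already applied at the first site is rolled back and its salary is left unchanged.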
Distributed DB Transparency
- Failure transparency ensures that the DDBMS continues to operate when a node fails.
- Performance transparency ensures that system performance will not degrade because of the distributed nature of the database.
- Query optimization becomes very complex in a distributed database because data are fragmented and replicated across multiple remote nodes.
- Heterogeneity transparency allows the integration of different types of DBMSs (multi-vendor, multi-model) under a common global schema.
- The DDBMS transparently translates user requests from one local schema to another.
Distributed Database Design
All design principles and concepts discussed in the context of a centralized database also apply to a distributed database. Three additional issues are relevant to the design of a distributed database:
- Data fragmentation
- Data replication
- Data allocation
Data Fragmentation
- Data fragmentation allows us to break a single object (a database or a table) into two or more fragments.
- Three types of fragmentation strategies are available to distribute a table: horizontal, vertical, and mixed.
- Horizontal fragmentation divides a table into fragments consisting of sets of tuples (rows).
- Each fragment has unique rows and is stored at a different node.
- Example: A bank may distribute its customer table by location.
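A minimal sketch of horizontal fragmentation, using one in-memory SQLite database to stand in for what would be separate nodes. The table layout, the sample rows, and the `CUSTOMER_NY` / `CUSTOMER_ATL` fragment names are assumptions made for illustration. Each fragment receives the rows for one branch, and a UNION of the fragments reconstructs the original table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = "(CUST_ID INTEGER PRIMARY KEY, NAME TEXT, BRANCH TEXT)"
conn.execute(f"CREATE TABLE CUSTOMER {schema}")
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?)",
    [(1, "Ann", "NY"), (2, "Bob", "ATL"), (3, "Cy", "NY")],
)

# One fragment per branch; every fragment keeps all columns but only its own rows.
branches = ("NY", "ATL")
for branch in branches:
    conn.execute(f"CREATE TABLE CUSTOMER_{branch} {schema}")
    conn.execute(
        f"INSERT INTO CUSTOMER_{branch} SELECT * FROM CUSTOMER WHERE BRANCH = ?",
        (branch,),
    )

# Reconstruction: the union of the horizontal fragments gives back the full table.
union_sql = " UNION ".join(f"SELECT * FROM CUSTOMER_{b}" for b in branches)
rebuilt = conn.execute(union_sql + " ORDER BY CUST_ID").fetchall()
original = conn.execute("SELECT * FROM CUSTOMER ORDER BY CUST_ID").fetchall()
ny_rows = conn.execute("SELECT COUNT(*) FROM CUSTOMER_NY").fetchone()[0]
print(rebuilt == original, ny_rows)
```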
Data Fragmentation
- Vertical fragmentation divides a table into fragments consisting of sets of columns.
- Each fragment is located at a different node and consists of unique columns, with the exception of the primary key column, which is common to all fragments.
- Example: The Customer table may be divided into two fragments. One fragment, consisting of Cust ID, name, and address, may be located in the Service building; the other, with Cust ID, credit limit, balance, and dues, may be located in the Collection building.
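The Service/Collection example can be sketched the same way with SQLite. The column types, sample rows, and the fragment names `CUST_SERVICE` and `CUST_COLLECTION` are assumptions for illustration. Both fragments carry the primary key, so a join on `CUST_ID` reconstructs the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (CUST_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT, "
    "CREDIT_LIMIT REAL, BALANCE REAL, DUES REAL)"
)
conn.executemany(
    "INSERT INTO CUSTOMER VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "Ann", "1 Main St", 5000.0, 120.0, 20.0),
     (2, "Bob", "2 Oak Ave", 3000.0, 0.0, 0.0)],
)

# Each vertical fragment keeps the primary key plus its own subset of columns.
conn.execute("CREATE TABLE CUST_SERVICE AS SELECT CUST_ID, NAME, ADDRESS FROM CUSTOMER")
conn.execute(
    "CREATE TABLE CUST_COLLECTION AS "
    "SELECT CUST_ID, CREDIT_LIMIT, BALANCE, DUES FROM CUSTOMER"
)

# Reconstruction: join the fragments on the shared primary key.
rebuilt = conn.execute(
    "SELECT s.CUST_ID, s.NAME, s.ADDRESS, c.CREDIT_LIMIT, c.BALANCE, c.DUES "
    "FROM CUST_SERVICE s JOIN CUST_COLLECTION c ON s.CUST_ID = c.CUST_ID "
    "ORDER BY s.CUST_ID"
).fetchall()
original = conn.execute("SELECT * FROM CUSTOMER ORDER BY CUST_ID").fetchall()
print(rebuilt == original)
```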
Data Fragmentation
- Mixed fragmentation combines the horizontal and vertical strategies.
- A fragment may consist of a subset of rows and a subset of columns of the original table.
- Example: The Customer table may be divided by state and grouped by columns. The Service building in Texas will store customer-service-related information for customers from Texas.
Data Replication
- Data replication involves storing multiple copies of a fragment in different locations. For example, one copy may be stored in New York and another in San Francisco.
- It improves response time and data availability.
- Data replication requires the DDBMS to maintain data consistency among the replicas.
- A fully replicated database stores multiple copies of each database fragment at multiple sites.
- A partially replicated database stores multiple copies of some database fragments at multiple sites.
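The consistency requirement can be illustrated with a toy "write to every copy, read from any copy" scheme. This is only one simple strategy, chosen here for brevity; the slides do not say which strategy a DDBMS uses, and the class below is hypothetical and ignores site failures and concurrent writers.

```python
class ReplicatedFragment:
    """Keeps one copy of a fragment per site and applies every write to all copies."""

    def __init__(self, sites):
        # One independent key -> value store per site.
        self.copies = {site: {} for site in sites}

    def write(self, key, value):
        # Updating every replica keeps the copies consistent with each other.
        for copy in self.copies.values():
            copy[key] = value

    def read(self, key, site):
        # Any site can answer a read from its local copy.
        return self.copies[site][key]


fragment = ReplicatedFragment(["New York", "San Francisco"])
fragment.write("emp-1", 52000)
print(fragment.read("emp-1", "New York"), fragment.read("emp-1", "San Francisco"))
```

Reads become local, which is where the response-time gain comes from, while each write has to reach every copy, which is the cost of keeping the replicas consistent.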
Data Allocation
- The data allocation decision involves determining the location of the fragments so as to achieve the design goals of cost, response time, and availability.
- Three data allocation strategies are: centralized, partitioned, and replicated.
- A centralized allocation strategy stores the entire database in a single location.
- A partitioned strategy divides the database into disjoint parts (fragments) and allocates the fragments to different locations.
- In a replicated strategy, copies of one or more database fragments are stored at several sites.
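The three strategies can be told apart from a fragment-to-sites mapping alone. The function below is a hypothetical sketch of that classification; the mapping format and the function name are assumptions, and it says nothing about how to choose an allocation that meets cost or response-time goals.

```python
def allocation_strategy(allocation: dict[str, set[str]]) -> str:
    """Classify a fragment -> sites mapping as centralized, partitioned, or replicated."""
    if not allocation or any(not sites for sites in allocation.values()):
        raise ValueError("every fragment must be stored at one or more sites")
    # Any fragment stored at more than one site means copies exist.
    if any(len(sites) > 1 for sites in allocation.values()):
        return "replicated"
    # No copies: either everything is at one site, or fragments are spread out.
    all_sites = set().union(*allocation.values())
    return "centralized" if len(all_sites) == 1 else "partitioned"


print(allocation_strategy({"E1": {"NY"}, "E2": {"ATL"}, "E3": {"MIA"}}))
```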