Distributed DBMSs – Concepts and Design Chapter 22 in Textbook
Overview: Concepts. What is a distributed DBMS? Distributed Processing. Homogeneous vs. Heterogeneous. Functions of a DDBMS. Components of a DDBMS. Advantages and Disadvantages. DDBMS Design. Fragmentation. Replication. Allocation. DDBMS Transparencies. Date’s 12 Rules for a DDBMS.
Concepts: Centralized DBMS: a system with a single logical database located at one site under the control of a single DBMS. Distributed DB: a logically interrelated collection of shared data physically distributed over a computer network. Applications can be classified into local applications and global applications.
Distributed DBMS: The software system that manages the distributed DB, makes the distribution transparent to users, and allows users to access data at their own site as well as at remote sites. Transparent distribution is the fundamental principle of a DDBMS.
Characteristics of a DDBMS: A collection of logically related shared data. The data is split into a number of fragments. Fragments may be replicated. Fragments and replicas are allocated to sites. The sites are linked by a communications network. The data at each site is under the control of a DBMS. The DBMS at each site can handle local applications. Each DBMS participates in at least one global application.
Distributed DBMS Topology: [Figure: Site 1, Site 2, Site 3 and Site 4 connected by a computer network.] The data itself is distributed, and access to it can be local or remote.
Distributed Processing: [Figure: Site 1, Site 2, Site 3 and Site 4 connected by a computer network.] The data itself is centralized, but access to it can be local or remote.
Homogeneous vs. Heterogeneous DDBMS: Homogeneous system: all sites use the same DBMS product. Heterogeneous system: sites may run different DBMS products and use different data models. Possible differences between data in different DBs: data type differences, value differences and semantic differences.
Functions of a DDBMS: Provide access to remote sites and allow the transfer of queries and data among the network's sites. Store data distribution details. Distributed data processing. Security control. Concurrency control. Recovery services.
Components of a DDBMS: [Figure: Site 1 and Site 3 connected by a computer network; the labelled components are DDBMS, DC, LDBMS, GSC and DB.] Legend: GSC = global system catalog; DC = data communication component.
Advantages of DDBMS: Reflects organizational structure. Improved shareability and local autonomy. Improved availability. Improved reliability. Improved performance.
Disadvantages of DDBMS: Complexity. Cost. Security. Integrity control more difficult. Lack of standards. Lack of experience. DB design more complex.
Distributed Relational DB Design: We have a group of tables and we want to distribute them among a group of sites. The design consists of 3 major steps: 1. Fragmentation: divide a relation into a number of sub-relations (fragments), horizontally or vertically. 2. Replication: make a copy of a fragment. 3. Allocation: decide at which site each fragment and each replica is to be stored.
Distributed Relational DB Design: When we fragment, replicate and allocate, we try to achieve: Locality of reference. Improved reliability and availability. Good performance. Balanced storage capacities and costs. Minimal communication costs.
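Rules of Fragmentation: 1. Completeness: nothing (rows or columns) gets lost when we fragment. 2. Reconstruction: we can get back the original table after we have fragmented it. 3. Disjointness: no row or column appears in more than one fragment (the one exception: in vertical fragmentation the primary key is repeated in every fragment so that the table can be reconstructed).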
Horizontal Fragmentation: PropertyForRent is fragmented on the type of property.
P1 = σ_{type='House'}(PropertyForRent)
P2 = σ_{type='Flat'}(PropertyForRent)
Fragment P1
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
Fragment P2
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
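The horizontal fragmentation above, together with the three rules of fragmentation, can be sketched in a few lines of Python. This is only an illustration: the rows are abbreviated to (propertyNo, type, branchNo), and the helper names `select`, `is_complete`, `is_reconstructible` and `is_disjoint` are made up for this sketch rather than taken from the slides.

```python
# Rows abbreviated to (propertyNo, type, branchNo); values copied from the slide.
PROPERTY_FOR_RENT = [
    ("PA14", "House", "B007"),
    ("PG21", "House", "B003"),
    ("PL94", "Flat", "B005"),
    ("PG4", "Flat", "B003"),
    ("PG36", "Flat", "B003"),
    ("PG16", "Flat", "B003"),
]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


# Horizontal fragments defined by a selection on the type column.
p1 = select(PROPERTY_FOR_RENT, lambda r: r[1] == "House")
p2 = select(PROPERTY_FOR_RENT, lambda r: r[1] == "Flat")


def union_of(fragments):
    """Set of all rows that appear in any fragment."""
    return set().union(*map(set, fragments))


def is_complete(relation, fragments):
    """Completeness: every row of the relation is in at least one fragment."""
    return set(relation) <= union_of(fragments)


def is_reconstructible(relation, fragments):
    """Reconstruction: the union of the fragments gives back the relation."""
    return union_of(fragments) == set(relation)


def is_disjoint(fragments):
    """Disjointness: no row appears in more than one fragment."""
    return sum(len(set(f)) for f in fragments) == len(union_of(fragments))


print(is_complete(PROPERTY_FOR_RENT, [p1, p2]),
      is_reconstructible(PROPERTY_FOR_RENT, [p1, p2]),
      is_disjoint([p1, p2]))
```

For horizontal fragments, reconstruction is a plain union, which is why the check compares the union of the fragments with the original relation.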
Original Staff Table
StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SL21     John   White  Manager     M    1 Oct 93   30000   B005
SG37     Ann    Beech  Assistant   F    10 Nov 60  12000   B003
SG14     David  Ford   Supervisor  M    24 Mar 58  18000   B003
SG5      Susan  Brand  Assistant   F    3 Jun 40   24000   B007
Vertical Fragmentation
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
Fragment S1
StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000
Fragment S2
StaffNo  FName  LName  BranchNo
SL21     John   White  B005
SG37     Ann    Beech  B003
SG14     David  Ford   B003
SG5      Susan  Brand  B007
Mixed Fragmentation: Vertical then Horizontal
S1 = Π_{staffNo, position, sex, DOB, salary}(Staff)
S2 = Π_{staffNo, fName, lName, branchNo}(Staff)
S2.1 = σ_{branchNo='B005'}(S2)
S2.2 = σ_{branchNo='B003'}(S2)
S2.3 = σ_{branchNo='B007'}(S2)
Fragment S1
StaffNo  Position    Sex  DOB        Salary
SL21     Manager     M    1 Oct 93   30000
SG37     Assistant   F    10 Nov 60  12000
SG14     Supervisor  M    24 Mar 58  18000
SG5      Assistant   F    3 Jun 40   24000
Fragment S2.1
StaffNo  FName  LName  BranchNo
SL21     John   White  B005
Fragment S2.2
StaffNo  FName  LName  BranchNo
SG14     David  Ford   B003
SG37     Ann    Beech  B003
Fragment S2.3
StaffNo  FName  LName  BranchNo
SG5      Susan  Brand  B007
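As a rough illustration of the mixed fragmentation above, the following Python sketch projects Staff into S1 and S2, splits S2 by branch, and then rebuilds the original table by taking the union of the horizontal fragments and joining the result with S1 on staffNo. The helpers `project`, `select` and `natural_join` are hypothetical names introduced for this sketch; the data values are the ones shown on the slides.

```python
STAFF_COLUMNS = ("staffNo", "fName", "lName", "position",
                 "sex", "dob", "salary", "branchNo")
STAFF = [
    dict(zip(STAFF_COLUMNS, row))
    for row in [
        ("SL21", "John", "White", "Manager", "M", "1 Oct 93", 30000, "B005"),
        ("SG37", "Ann", "Beech", "Assistant", "F", "10 Nov 60", 12000, "B003"),
        ("SG14", "David", "Ford", "Supervisor", "M", "24 Mar 58", 18000, "B003"),
        ("SG5", "Susan", "Brand", "Assistant", "F", "3 Jun 40", 24000, "B007"),
    ]
]


def project(rows, columns):
    """Relational projection: keep only the named columns."""
    return [{c: row[c] for c in columns} for row in rows]


def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def natural_join(left, right, key):
    """Join two row lists on a shared key column (the key is unique in `right`)."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Step 1: vertical fragmentation. The primary key staffNo is kept in both.
s1 = project(STAFF, ("staffNo", "position", "sex", "dob", "salary"))
s2 = project(STAFF, ("staffNo", "fName", "lName", "branchNo"))

# Step 2: horizontal fragmentation of S2 by branch.
s2_1 = select(s2, lambda r: r["branchNo"] == "B005")
s2_2 = select(s2, lambda r: r["branchNo"] == "B003")
s2_3 = select(s2, lambda r: r["branchNo"] == "B007")

# Reconstruction: union the horizontal fragments, then join with S1.
rebuilt = natural_join(s1, s2_1 + s2_2 + s2_3, "staffNo")
print(len(rebuilt), "rows rebuilt")
```

The repeated staffNo column is what makes the join possible, which is the exception to the disjointness rule.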
Derived Horizontal Fragmentation: the horizontal fragmentation of a child table, T1, that follows from the horizontal fragmentation of a related parent table, T2. It is not explicitly specified in the design but is implied by the fragmentation of T2. T1 (child) has a foreign key that references T2 (parent). The relationship between T1 and T2 is either 1-to-1 or many-to-1. The child fragments are defined with the semijoin operation: T1_i = T1 ⋉ T2_i, where T2_i is the i-th fragment of T2.
Derived Horizontal Fragmentation: The design requires the Staff table to be horizontally fragmented.
S1 = σ_{branchNo='B003'}(Staff)
S2 = σ_{branchNo='B005'}(Staff)
S3 = σ_{branchNo='B007'}(Staff)
Staff
StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SL21     John   White  Manager     M    1 Oct 93   30000   B005
SG37     Ann    Beech  Assistant   F    10 Nov 60  12000   B003
SG14     David  Ford   Supervisor  M    24 Mar 58  18000   B003
SG5      Susan  Brand  Assistant   F    3 Jun 40   24000   B007
Derived Horizontal Fragmentation
Fragment S1
StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SG37     Ann    Beech  Assistant   F    10 Nov 60  12000   B003
SG14     David  Ford   Supervisor  M    24 Mar 58  18000   B003
Fragment S2
StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SL21     John   White  Manager     M    1 Oct 93   30000   B005
Fragment S3
StaffNo  FName  LName  Position    Sex  DOB        Salary  BranchNo
SG5      Susan  Brand  Assistant   F    3 Jun 40   24000   B007
Derived Horizontal Fragmentation: After fragmenting Staff, we find that a table is related to it, PropertyForRent: Staff (1) handles (N) PropertyForRent. Because Staff is now fragmented, it makes sense to fragment PropertyForRent too.
S1 = σ_{branchNo='B003'}(Staff)
S2 = σ_{branchNo='B005'}(Staff)
S3 = σ_{branchNo='B007'}(Staff)
Pi = PropertyForRent ⋉_{staffNo} Si, for i = 1, 2, 3
Derived Horizontal Fragmentation
Fragment P1
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PG4         6 Lawrence   Glasgow   G11 9QX   Flat   3      350   CO40     SG14     B003
PG21        18 Dale Rd   Glasgow   G12       House  5      600   CO87     SG37     B003
PG36        2 Manor Rd   Glasgow   G32 4QX   Flat   3      375   CO93     SG37     B003
PG16        5 Novar Dr   Glasgow   G12 9AX   Flat   4      450   CO93     SG14     B003
Fragment P2
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PL94        6 Argyll St  London    NW2       Flat   4      400   CO87     SL41     B005
Fragment P3
PropertyNo  Street       City      PostCode  Type   Rooms  Rent  OwnerNo  StaffNo  BranchNo
PA14        16 Holhead   Aberdeen  AB7 5SU   House  6      650   CO46     SA9      B007
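The derived fragments above can be reproduced with a small semijoin sketch in Python. One caveat: the four-row Staff table shown in the slides does not list staff members SL41 and SA9, although properties PL94 and PA14 refer to them. The sketch assumes they belong to branches B005 and B007, as the BranchNo column of the property rows suggests. The `semijoin` helper is a hypothetical name for this illustration.

```python
# (staffNo, branchNo) pairs. The last two rows are an assumption: SL41 and SA9
# are referenced by PropertyForRent but are not in the slides' Staff table.
STAFF = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SG5", "branchNo": "B007"},
    {"staffNo": "SL41", "branchNo": "B005"},  # assumed
    {"staffNo": "SA9", "branchNo": "B007"},   # assumed
]

# PropertyForRent abbreviated to its key and its foreign key to Staff.
PROPERTY_FOR_RENT = [
    {"propertyNo": "PA14", "staffNo": "SA9"},
    {"propertyNo": "PL94", "staffNo": "SL41"},
    {"propertyNo": "PG4", "staffNo": "SG14"},
    {"propertyNo": "PG36", "staffNo": "SG37"},
    {"propertyNo": "PG21", "staffNo": "SG37"},
    {"propertyNo": "PG16", "staffNo": "SG14"},
]


def semijoin(child, parent, key):
    """Rows of `child` that have at least one matching row in `parent`."""
    parent_keys = {row[key] for row in parent}
    return [row for row in child if row[key] in parent_keys]


# Primary horizontal fragmentation of the parent (Staff) by branch.
staff_fragments = {
    branch: [s for s in STAFF if s["branchNo"] == branch]
    for branch in ("B003", "B005", "B007")
}

# Derived horizontal fragmentation of the child: Pi = PropertyForRent semijoin Si.
property_fragments = {
    branch: semijoin(PROPERTY_FOR_RENT, fragment, "staffNo")
    for branch, fragment in staff_fragments.items()
}

for branch, rows in property_fragments.items():
    print(branch, sorted(r["propertyNo"] for r in rows))
```

Because each property references exactly one staff member (a many-to-1 relationship), every property lands in exactly one derived fragment.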
Transparencies in a DDBMS: 4 main transparencies: 1. Distribution Transparency. a. Fragmentation. b. Location. c. Replication. d. Local Mapping. e. Naming. 2. Transaction Transparency. 3. Performance Transparency. 4. DBMS Transparency.
1. Distribution Transparency: Allows the user to perceive the DB as a single, logical entity. Types: a. Fragmentation: the user does not need to know the data is fragmented. b. Location: the user does not need to know the location of fragments. c. Replication: the user does not need to know the fragments are replicated. d. Local Mapping: the user specifies the fragment and its location. e. Naming: the DDBMS makes sure every item name is unique. Consider the distribution of the STAFF relation:
S1 = Π_{staffNo, position, sex, DOB, salary}(STAFF)
S2 = Π_{staffNo, fName, lName, branchNo}(STAFF)
S21 = σ_{branchNo='B003'}(S2)
S22 = σ_{branchNo='B005'}(S2)
S23 = σ_{branchNo='B007'}(S2)
a. Fragmentation Transparency: The highest level of distribution transparency. The user does not need to know that the data is fragmented and treats the DDB like a centralized DB. Database accesses are based on the global schema. The fragmentation of the data can be changed without impacting the user. Example:
SELECT fName, lName FROM Staff WHERE position = 'Manager';
b. Location Transparency: The middle level of distribution transparency. The user must know that the data is fragmented but still does not need to know the location of the data. Data location can be changed without impact on the user. Example:
SELECT fName, lName FROM S21 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S22 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S23 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager');
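To see that the fragment-level query above returns the same answer as the global-schema query used for fragmentation transparency, the two can be run side by side. The sketch below uses Python's built-in sqlite3 module with all fragments in a single in-memory database, so it does not model separate sites at all. The view named Staff stands in for the global schema and is an assumption of this sketch, as is the assignment of rows to fragments, which follows the S21/S22/S23 definitions given earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S1  (staffNo TEXT PRIMARY KEY, position TEXT, sex TEXT, DOB TEXT, salary INTEGER);
CREATE TABLE S21 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S22 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);
CREATE TABLE S23 (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT, branchNo TEXT);

INSERT INTO S1 VALUES
  ('SL21', 'Manager',    'M', '1 Oct 93',  30000),
  ('SG37', 'Assistant',  'F', '10 Nov 60', 12000),
  ('SG14', 'Supervisor', 'M', '24 Mar 58', 18000),
  ('SG5',  'Assistant',  'F', '3 Jun 40',  24000);
INSERT INTO S21 VALUES ('SG37', 'Ann', 'Beech', 'B003'), ('SG14', 'David', 'Ford', 'B003');
INSERT INTO S22 VALUES ('SL21', 'John', 'White', 'B005');
INSERT INTO S23 VALUES ('SG5', 'Susan', 'Brand', 'B007');

-- Global relation rebuilt from the fragments: union the horizontal
-- fragments, then join with the vertical fragment on the primary key.
CREATE VIEW Staff AS
SELECT h.staffNo, h.fName, h.lName, v.position, v.sex, v.DOB, v.salary, h.branchNo
FROM S1 AS v
JOIN (SELECT * FROM S21 UNION ALL SELECT * FROM S22 UNION ALL SELECT * FROM S23) AS h
  ON h.staffNo = v.staffNo;
""")

# Fragmentation transparency: the user queries the global schema only.
GLOBAL_QUERY = "SELECT fName, lName FROM Staff WHERE position = 'Manager'"

# Location transparency: the user names the fragments but not their sites.
FRAGMENT_QUERY = """
SELECT fName, lName FROM S21 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S22 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S23 WHERE staffNo IN (SELECT staffNo FROM S1 WHERE position = 'Manager')
"""

global_rows = conn.execute(GLOBAL_QUERY).fetchall()
fragment_rows = conn.execute(FRAGMENT_QUERY).fetchall()
print(global_rows, fragment_rows)
```

If the fragmentation changed, only the view definition would need to change; the global query would stay the same, whereas the fragment-level query would have to be rewritten.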
c. Replication Transparency: The user is unaware of replication and of location but knows that the data is fragmented. It is at the same level as location transparency.
d. Local Mapping Transparency: The lowest level of distribution transparency. The user knows that the data is fragmented and knows the location of the data. Example:
SELECT fName, lName FROM S21 AT SITE 3 WHERE staffNo IN (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S22 AT SITE 5 WHERE staffNo IN (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager')
UNION
SELECT fName, lName FROM S23 AT SITE 7 WHERE staffNo IN (SELECT staffNo FROM S1 AT SITE 5 WHERE position = 'Manager');
e. Naming Transparency: Each item in a distributed database must have a unique name, and the DDBMS must ensure that no two sites violate that. Solutions: 1. Create a central name server. Drawbacks: it becomes a bottleneck and works against local autonomy. 2. Prefix each object with the identifier of the site. Drawback: loss of distribution transparency.
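The site-prefix approach can be sketched as follows. The site identifiers and object names are invented for illustration. The alias table at the end is not part of the slides: it is one possible way, assumed here, of letting users keep a site-free name while the system resolves it to the prefixed one.

```python
def qualified_name(site_id: str, local_name: str) -> str:
    """Prefix a locally chosen name with the identifier of its site."""
    return f"{site_id}.{local_name}"


# Two sites independently create an object called "Staff" (hypothetical data).
created = [("SITE3", "Staff"), ("SITE5", "Staff")]
global_names = [qualified_name(site, name) for site, name in created]

# The local names clash, but the prefixed names are unique without any
# central name server.
assert len({name for _, name in created}) == 1
assert len(set(global_names)) == 2

# Assumed remedy for the lost transparency: user-facing aliases that the
# system maps to the site-prefixed names.
aliases = {"GlasgowStaff": "SITE3.Staff", "LondonStaff": "SITE5.Staff"}


def resolve(name: str) -> str:
    """Return the site-prefixed name for an alias, or the name unchanged."""
    return aliases.get(name, name)


print(global_names, resolve("GlasgowStaff"))
```

The prefix exposes the site to the user, which is exactly the loss of distribution transparency the slide mentions.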
2. Transaction Transparency: All transactions must ensure the consistency and integrity of the DDB. Each transaction that needs to access data at multiple sites is divided into multiple sub-transactions. Even when a transaction is split, atomicity has to be maintained.
3. Performance Transparency: The DDBMS performs as if it were a centralized DBMS. Performance should not suffer because the system is distributed (network communication cost). When a site issues a query, the system must figure out the fastest way of executing it. The Distributed Query Processor (DQP) must figure out: which fragment to access; which copy of the fragment to access (if replication is used); where the fragments are located.
3. Performance Transparency: Consider the following distributed DB:
Property(propertyNo, city): 10,000 records in London
Client(clientNo, maxPrice): 100,000 records in Glasgow
Viewing(propertyNo, clientNo): 1,000,000 records in London
The London site wants to list the properties in Aberdeen that have been viewed by clients who have a maximum price limit greater than 200,000.
SELECT p.propertyNo FROM Property p INNER JOIN (Client c INNER JOIN Viewing v ON c.clientNo = v.clientNo) ON p.propertyNo = v.propertyNo WHERE p.city = 'Aberdeen' AND c.maxPrice > 200000;
3. Performance Transparency: After the query is issued, the DDBMS must determine the most cost-effective strategy for executing it. Strategies: 1. Move the Client table to London and process the query there. 2. Move the Property and Viewing relations to Glasgow, process the query there, then return the result. 3. Join Property and Viewing at London, project only the property number and client number, move the result to Glasgow to join with the clients whose maximum price is greater than 200,000, then return the result. 4. Select the clients at Glasgow whose maximum price is greater than 200,000, move them to London, and join them with Viewing and the Aberdeen properties.
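A back-of-the-envelope comparison of the four strategies can be sketched by counting only communication time. Only the table sizes come from the slides. Everything else is an assumption made for this illustration: the per-message delay, the transmission rate, the row sizes, the number of qualifying clients and the number of Aberdeen viewings. Local processing time and the cost of returning the final result are ignored.

```python
# Assumed network parameters (not from the slides).
ACCESS_DELAY_S = 1.0        # fixed delay per message, in seconds
RATE_CHARS_PER_S = 10_000   # transmission rate, in characters per second

# Assumed row sizes, in characters.
ROW_CHARS = 100             # a full row of any table
PROJECTED_ROW_CHARS = 10    # a (propertyNo, clientNo) pair

# Table sizes from the slides.
PROPERTY_ROWS = 10_000
CLIENT_ROWS = 100_000
VIEWING_ROWS = 1_000_000

# Assumed selectivities.
QUALIFYING_CLIENTS = 10       # clients with maxPrice > 200000
ABERDEEN_VIEWINGS = 100_000   # viewings of Aberdeen properties


def transfer_seconds(rows, chars_per_row, messages=1):
    """Communication time: per-message delay plus volume divided by rate."""
    return messages * ACCESS_DELAY_S + rows * chars_per_row / RATE_CHARS_PER_S


costs = {
    "1: move Client to London":
        transfer_seconds(CLIENT_ROWS, ROW_CHARS),
    "2: move Property and Viewing to Glasgow":
        transfer_seconds(PROPERTY_ROWS + VIEWING_ROWS, ROW_CHARS, messages=2),
    "3: move projected Aberdeen viewings to Glasgow":
        transfer_seconds(ABERDEEN_VIEWINGS, PROJECTED_ROW_CHARS),
    "4: move selected clients to London":
        transfer_seconds(QUALIFYING_CLIENTS, ROW_CHARS),
}

cheapest = min(costs, key=costs.get)
for name, seconds in costs.items():
    print(f"{name}: {seconds:,.1f} s")
print("cheapest:", cheapest)
```

Under these assumptions, strategy 4 ships the least data because the selection is applied before anything crosses the network. With different selectivities the ranking could change, which is why the choice is left to the query processor.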
4. DBMS Transparency: Hides the fact that different sites may have different local DBMSs. It applies only to heterogeneous DDBMSs.
Date’s 12 Rules for a DDBMS: 1. Local autonomy. 2. No reliance on a central site. 3. Continuous operation. 4. Location independence. 5. Fragmentation independence. 6. Replication independence. 7. Distributed query processing. 8. Distributed transaction processing. 9. Hardware independence. 10. Operating system independence. 11. Network independence. 12. Database independence.