
Distributed DB 12. 1 CSE2132 Database Systems Week 12 Lecture Distributed Database.


1 Distributed DB 12.1 CSE2132 Database Systems Week 12 Lecture: Distributed Database

2 Distributed DB 12.2 Evolution of Distributed DBMS
• CENTRALISED systems - all system components located on a single computer:
  – data
  – DBMS
  – secondary storage (disks etc.)
  – access via serially connected 'dumb terminals' - all processing occurs at the central site
• 1980s changes:
  – business operations decentralised geographically; businesses needed to become 'lean-and-meaner, quick reacting, dispersed operations'
  – technological change: low-cost, powerful computing platforms
• These changes led to DISTRIBUTED SYSTEMS

3 Distributed DB 12.3 Distributed Data Base Management Systems
• system components distributed over multiple sites, interconnected via a communication system (network)
• managed by a Distributed Data Base Management System (DDBMS)
• DDBMS advantages:
  – data located near the greatest demand site
  – faster data access - desired data subset locally available
  – faster data processing - system processing load spread over multiple CPUs
  – growth facilitation - easy to add new sites to the network
  – less danger of a single point of failure
• DDBMS disadvantages:
  – complexity of management and control
  – security - weaker due to distribution (more people involved) and network traffic
  – lack of standards - many communication protocols exist, e.g. TCP/IP, NetBIOS, DECnet

4 Distributed DB 12.4 Distributed Processing
• the database's logical processing is shared among multiple CPUs
• the actual database resides on a single computer

5 Distributed DB 12.5 Distributed Database
• the actual database is stored across two or more independent CPUs - each part is known as a database fragment

6 Distributed DB 12.6 Distribution Options
• Two important components of a DDBMS are:
  – Transaction Processor (TP) - receives and processes the application's data requests
  – Data Processor (DP) - stores and retrieves data located at the site
• Data and processing distribution options:
  – Single site processing, single site data
      - all processing done on a single CPU (host computer)
      - all data stored on the host computer's local disk(s)
      - e.g. traditional mainframe / minicomputer DBMS with dumb terminals, or single-user microcomputer DBMS

7 Distributed DB 12.7 Distribution Options
  – Multi site processing, single site data
      - multiple processes run on different computers
      - all data stored on a single computer's local disk(s), e.g. a LAN file server
      - TP acts as a redirector - routes all network data requests to the file server
      - file server appears to the end user as a hard disk, e.g. F:
      - all data selection, search and update occur at the workstation
      - the entire file must be transported across the network
      - inefficient, costly (communication)

8 Distributed DB 12.8 Distribution Options (1)
• Client-server is a term used very loosely - difficult to define clearly
  – Client-server is similar to file server, except that database processing occurs at the server site - known as a database server.
  – Client-server is about a split in processing rather than a split in data.
  – A 3 tier architecture is often employed. This requires communications middleware to resolve translation issues by the use of an agreed protocol (e.g. TCP/IP, IPX/SPX or NetBEUI).
  – Client-server approaches overlap with, and are used in conjunction with, distributed databases.
• Multiple site processing, multiple site data
  – This describes a fully distributed DBMS. Such systems can be classified as homogeneous or heterogeneous.

9 Distributed DB 12.9 Distribution Options (2)
• Homogeneous DDBMS
  – The same DBMS runs at each site. There is still much complexity to deal with, but accessing data over many sites is simpler than in the heterogeneous case.
• Heterogeneous DDBMS
  – Supports different DBMSs, and even different DBMS models (relational, network, hierarchical), at each site. Such implementations of a DDBMS operate under certain restrictions (e.g. access to other databases is read only, or only other relational databases can be accessed).

10 Distributed DB 12.10 Date's Distributed Database Rules
• A DISTRIBUTED DATABASE SHOULD LOOK EXACTLY LIKE A NON DISTRIBUTED 'LOCAL' DATABASE
• LOCAL ACCESS TO LOCAL DATA
• Many vendors claim to be delivering distributed database applications - criteria are needed: DATE'S 12 RULES
  – 0 - TO THE USER A DISTRIBUTED DATABASE SHOULD LOOK NO DIFFERENT TO A NON DDB - distribution transparency, database treated as a single logical database
  – 1 - LOCAL AUTONOMY, local data owned and managed locally
  – 2 - NO RELIANCE ON CENTRAL SITE
  – 3 - CONTINUOUS OPERATION, failure transparency - site independence, even in the event of a node failure the system continues to operate
  – 4 - LOCATION INDEPENDENCE, location transparency
  – 5 - FRAGMENTATION INDEPENDENCE, fragmentation transparency

11 Distributed DB 12.11 Date's rules continued
  – 6 - REPLICATION INDEPENDENCE
  – 7 - DISTRIBUTED QUERY PROCESSING, multiple site queries
  – 8 - DISTRIBUTED TRANSACTION MANAGEMENT, transaction transparency - multiple site updates
  – 9 - HARDWARE INDEPENDENCE
  – 10 - OPERATING SYSTEM INDEPENDENCE
  – 11 - COMMUNICATION NETWORK INDEPENDENCE
  – 12 - DATABASE INDEPENDENCE

12 Distributed DB 12.12 Levels of Distribution Transparency
• Can be used as a method of classification: determine the highest level of transparency the DDBMS supports
  – FRAGMENTATION transparency (highest level)
      No need to specify fragment names or locations
        select * from employee where dob < '01/01/40';
  – LOCATION transparency
      Specify fragment names but not locations
        select * from e1 where dob < '01/01/40'
        UNION
        select * from e2 where dob < '01/01/40'
        UNION
        select * from e3 where dob < '01/01/40';

13 Distributed DB 12.13 Levels of Distribution Transparency
  – Local MAPPING transparency (lowest level)
      Need to specify both fragment and location (using pseudo-SQL)
        select * from e1 node melbourne where dob < '01/01/40'
        UNION
        select * from e2 node sydney where dob < '01/01/40'
        UNION
        select * from e3 node adelaide where dob < '01/01/40';
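The difference between location transparency and fragmentation transparency can be sketched with Python's built-in sqlite3 module. This is only an illustration on a single in-memory database: the column names, sample rows, and ISO date strings are assumptions, and a real DDBMS would hold e1, e2 and e3 at different sites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three fragments of the employee relation (all local here; hypothetical columns).
for frag in ("e1", "e2", "e3"):
    conn.execute(f"CREATE TABLE {frag} (emp_id INTEGER PRIMARY KEY, name TEXT, dob TEXT)")
conn.execute("INSERT INTO e1 VALUES (1, 'Ann', '1938-05-02')")
conn.execute("INSERT INTO e2 VALUES (2, 'Bob', '1955-11-20')")
conn.execute("INSERT INTO e3 VALUES (3, 'Cy', '1939-12-31')")

# Location transparency: the user names every fragment and unions them.
location_level = conn.execute(
    "SELECT * FROM e1 WHERE dob < '1940-01-01' "
    "UNION SELECT * FROM e2 WHERE dob < '1940-01-01' "
    "UNION SELECT * FROM e3 WHERE dob < '1940-01-01' "
    "ORDER BY 1"
).fetchall()

# Fragmentation transparency: a view hides the fragments behind one logical relation.
conn.execute(
    "CREATE VIEW employee AS "
    "SELECT * FROM e1 UNION ALL SELECT * FROM e2 UNION ALL SELECT * FROM e3"
)
fragmentation_level = conn.execute(
    "SELECT * FROM employee WHERE dob < '1940-01-01' ORDER BY 1"
).fetchall()

print(location_level == fragmentation_level)
```

Both queries return the same rows; the only difference is how much of the physical layout the user has to spell out.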

14 Distributed DB 12.14 DDBMS Operations
• Join operation (most vendors supply)
  – Easier to achieve
  – Query optimisation critical
• Update operation
  – e.g. debit / credit of two accounts at different sites
  – more difficult to manage
  – needs sophisticated transaction management - most popular strategy: TWO PHASE COMMIT
• Two Phase Commit requires three operations:
  – DO - performs the operation and records before and after images in the transaction log
  – UNDO - undoes an operation using log entries created in DO
  – REDO - redoes an operation using log entries created in DO
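The DO / UNDO / REDO operations can be sketched as a tiny in-memory store with a log of before and after images. The class and method names are hypothetical, and the sketch assumes that a before image of None means "the key did not exist"; a real DBMS writes its log to stable storage.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    key: str
    before: object  # before image (None means the key did not exist)
    after: object   # after image


class LoggedStore:
    def __init__(self):
        self.data = {}
        self.log = []

    def do(self, key, value):
        # DO: record before and after images, then perform the operation.
        self.log.append(LogEntry(key, self.data.get(key), value))
        self.data[key] = value

    def undo(self):
        # UNDO: restore before images, newest entry first.
        for entry in reversed(self.log):
            if entry.before is None:
                self.data.pop(entry.key, None)
            else:
                self.data[entry.key] = entry.before

    def redo(self):
        # REDO: reapply after images, oldest entry first.
        for entry in self.log:
            self.data[entry.key] = entry.after


store = LoggedStore()
store.data["acct_A"] = 100   # starting balance, set outside any transaction
store.do("acct_A", 70)       # debit 30
store.undo()
balance_after_undo = store.data["acct_A"]
store.redo()
balance_after_redo = store.data["acct_A"]
```

Because UNDO and REDO work only from the log, a site can roll a sub-transaction back or forward after the original operation has finished.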

15 Distributed DB 12.15 Two Phase Commit
• The site originating the transaction (coordinator) sends a request to the other sites (subordinates); each site processes its sub-transaction but does not commit.
• Phase 1 - Preparation:
  – 1. coordinator sends PREPARE TO COMMIT to all subordinates
  – 2. subordinates receive the message, write log entries and reply to the coordinator - READY to COMMIT or NOT READY
  – 3. coordinator checks that all nodes are ready to commit - if not, it broadcasts an ABORT; if all are ready:
• Phase 2 - Final COMMIT
  – 1. coordinator broadcasts a COMMIT message to all subordinates and awaits a reply
  – 2. subordinate receives COMMIT and updates the database
  – 3. subordinates reply with COMMITTED or NOT COMMITTED to the coordinator
  – If any subordinate did not commit, the coordinator sends ABORT, forcing an UNDO
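The two phases above can be sketched as a single-process simulation. The class, flag, and function names are hypothetical, and the sketch leaves out networking, timeouts, and crash recovery, which are what make the protocol hard in practice.

```python
class Subordinate:
    def __init__(self, name, ready=True, commit_ok=True):
        self.name = name
        self.ready = ready          # simulated answer in phase 1
        self.commit_ok = commit_ok  # simulated outcome in phase 2
        self.state = "INIT"
        self.log = []

    def prepare(self):
        # Phase 1, step 2: write a log entry and reply READY or NOT READY.
        self.log.append("PREPARE")
        self.state = "READY" if self.ready else "NOT READY"
        return self.state

    def commit(self):
        # Phase 2, steps 2-3: update the database and report the outcome.
        self.log.append("COMMIT")
        self.state = "COMMITTED" if self.commit_ok else "NOT COMMITTED"
        return self.state

    def abort(self):
        # ABORT forces an UNDO of the sub-transaction.
        self.log.append("ABORT")
        self.state = "ABORTED"


def two_phase_commit(subordinates):
    # Phase 1: the coordinator collects a vote from every subordinate.
    votes = [s.prepare() for s in subordinates]
    if any(v != "READY" for v in votes):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    # Phase 2: broadcast COMMIT and check every reply.
    replies = [s.commit() for s in subordinates]
    if any(r != "COMMITTED" for r in replies):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    return "COMMITTED"


ok_sites = [Subordinate("melbourne"), Subordinate("sydney")]
bad_sites = [Subordinate("melbourne"), Subordinate("sydney", ready=False)]
outcome_ok = two_phase_commit(ok_sites)
outcome_bad = two_phase_commit(bad_sites)
```

In the second run a single NOT READY vote is enough to abort the whole transaction, which is the all-or-nothing guarantee the protocol exists to provide.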

16 Distributed DB 12.16 DISTRIBUTION STRATEGIES
The overall data model - the company view
• Distribution principles
  – Examine geography / frequency of access
  – Guiding principle - minimise network traffic and communication costs

17 Distributed DB 12.17
• The sales team
  – Situated in Sydney
  – Needs most access to customer / order / order-line / product
• The supply branch
  – Situated in Melbourne
  – Needs most access to warehouse / inventory / product
• Data model partitioning
  – What do we do with RELATIONS at the BOUNDARY?
  – FRAGMENTATION vs REPLICATION

18 Distributed DB 12.18
• Possible starting points:
  – Which site accesses it most?
      - Storage at a single site
      - Minimises update complexity
  – Is there a case for fragmentation?
      - Minimises local access time
      - Minimises network traffic
  – Is there a case for replication?
      - Minimises local access time
      - Minimises network traffic
• Fragmentation
  – Horizontal fragmentation
      - Based on SELECTION, e.g. fragment the customer table on customer_city

19 Distributed DB 12.19
  – Vertical fragmentation
      - Based on PROJECTION, e.g. fragment the product table on the attributes needed by each group
          SALES : P_CODE, DESC, UNIT_PRICE
          SUPPLY : P_CODE, PACK_SIZE
  – Hybrid fragmentation
      - Assume 3 warehouses
          Footscray - p_code < 100
          Collingwood - p_code = 100
          Dandenong - p_code > 100
      - Further subdivide the vertical supply fragment horizontally
          SELECT P_CODE < 100 --> FOOTSCRAY
          SELECT P_CODE = 100 --> COLLINGWOOD
          SELECT P_CODE > 100 --> DANDENONG
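The three fragmentation styles can be sketched on a small in-memory product table. The sample rows and helper names are hypothetical; the attribute lists and warehouse predicates follow the slide.

```python
# Hypothetical product rows; attribute names follow the slide.
products = [
    {"p_code": 50, "desc": "bolt", "unit_price": 0.10, "pack_size": 100},
    {"p_code": 100, "desc": "nut", "unit_price": 0.05, "pack_size": 200},
    {"p_code": 150, "desc": "washer", "unit_price": 0.02, "pack_size": 500},
]


def horizontal(rows, predicate):
    # Horizontal fragmentation: SELECTION of whole rows.
    return [r for r in rows if predicate(r)]


def vertical(rows, attrs):
    # Vertical fragmentation: PROJECTION onto a subset of attributes.
    # The key (p_code) must be in every fragment so rows can be rejoined.
    return [{a: r[a] for a in attrs} for r in rows]


# Vertical fragments for the two groups.
sales = vertical(products, ["p_code", "desc", "unit_price"])
supply = vertical(products, ["p_code", "pack_size"])

# Hybrid: the supply fragment is split again, one piece per warehouse.
footscray = horizontal(supply, lambda r: r["p_code"] < 100)
collingwood = horizontal(supply, lambda r: r["p_code"] == 100)
dandenong = horizontal(supply, lambda r: r["p_code"] > 100)

# Reconstruction: union the horizontal pieces, then join on the key.
supply_rebuilt = footscray + collingwood + dandenong
sales_by_code = {r["p_code"]: r for r in sales}
rebuilt = [{**sales_by_code[r["p_code"]], **r} for r in supply_rebuilt]
```

The reconstruction step is a quick check that the fragmentation is lossless: the predicates cover every row exactly once, and every vertical fragment carries the key.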

20 Distributed DB 12.20 Replication
• Efficient retrievals vs multi-site updating
  – Multi-site updating is costly and accident prone
• Updating techniques
  – Conservative - don't commit until all sites accept
  – Primary node - one site accepts updates and broadcasts them
  – Majority voting
  – Snapshots, etc.
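The difference between the conservative and majority-voting techniques can be sketched as two acceptance rules over the same set of replicas. The names are hypothetical, and the sketch omits version numbers and the catch-up of replicas that missed an update, which a real majority scheme needs.

```python
class Replica:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.value = None

    def accepts(self):
        # A replica accepts an update only if it is reachable.
        return self.available


def replicated_update(replicas, value, policy):
    votes = [r.accepts() for r in replicas]
    if policy == "conservative":
        ok = all(votes)                    # every site must accept
    elif policy == "majority":
        ok = sum(votes) > len(votes) / 2   # more than half must accept
    else:
        raise ValueError(f"unknown policy: {policy}")
    if ok:
        for replica, vote in zip(replicas, votes):
            if vote:
                replica.value = value
    return ok


sites = [Replica("sydney"), Replica("melbourne"), Replica("adelaide", available=False)]
conservative_result = replicated_update(sites, 42, "conservative")
majority_result = replicated_update(sites, 42, "majority")
```

With one of three sites down, the conservative rule blocks the update while majority voting lets it through, at the cost of leaving the unavailable replica stale.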

21 Distributed DB 12.21 Query Optimization
• Query - list the supplier numbers for Cleveland suppliers of red parts
  – SUPPLIER(SUPPLIER#, CITY)     10,000 rows     stored at DETROIT
  – PART(PART#, COLOUR)           100,000 rows    stored at CHICAGO
  – SHIPMENT(SUPPLIER#, PART#)    1,000,000 rows  stored at DETROIT
      SELECT S.SUPPLIER#
      FROM SUPPLIER S, PART P, SHIPMENT H
      WHERE S.SUPPLIER# = H.SUPPLIER#
        AND H.PART# = P.PART#
        AND S.CITY = 'CLEVELAND'
        AND P.COLOUR = 'RED';
• Time for the query varies from 1 second to 2.3 days, depending on the query plan selected
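A rough cost model shows why the choice of plan matters so much. The access delay, data rate, tuple size, number of red parts, and number of Cleveland shipments below are all illustrative assumptions that do not appear on the slide; they were chosen so that the cheapest and most expensive plans land near the 1 second and 2.3 days quoted above.

```python
# All constants are assumptions for illustration, not figures from the slide.
ACCESS_DELAY_S = 1.0            # fixed delay per message
DATA_RATE_BPS = 10_000          # network data rate, bits per second
TUPLE_BITS = 100                # size of one tuple
RED_PARTS = 10                  # parts with COLOUR = 'RED'
CLEVELAND_SHIPMENTS = 100_000   # shipments made by Cleveland suppliers


def communication_time(messages, tuples_shipped):
    # Total time = per-message delays + time to push the shipped bits.
    return messages * ACCESS_DELAY_S + tuples_shipped * TUPLE_BITS / DATA_RATE_BPS


plans = {
    "ship only red parts Chicago -> Detroit": communication_time(1, RED_PARTS),
    "ship all parts Chicago -> Detroit": communication_time(1, 100_000),
    "ship suppliers and shipments Detroit -> Chicago": communication_time(2, 10_000 + 1_000_000),
    "ask Chicago about each Cleveland shipment": communication_time(2 * CLEVELAND_SHIPMENTS, 0),
}

best = min(plans, key=plans.get)
worst = max(plans, key=plans.get)

for name, seconds in sorted(plans.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:,.1f} s ({seconds / 86_400:.2f} days)")
```

Under these assumptions the gap comes almost entirely from the number of messages, not the volume of data: the slowest plan ships nothing in bulk but pays the access delay 200,000 times.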

