Parallel and Distributed Databases CS263 Lecture 16.


1 Parallel and Distributed Databases CS263 Lecture 16

2 LECTURE PLAN
• Parallel DBMS - What and Why?
• What is a Client/Server DBMS?
• Why do we need Distributed DBMSs?
• Date’s rules for a Distributed DBMS
• Benefits of a Distributed DBMS
• Issues associated with a Distributed DBMS
• Disadvantages of a Distributed DBMS

3 PARALLEL DATABASE SYSTEM

4 PARALLEL DBMSs: WHY DO WE NEED THEM?
More and More Data! We have databases that hold very large amounts of data, in the order of 10^12 bytes: 1,000,000,000,000 bytes!
Faster and Faster Access! We have data applications that need to process data at very high speeds: 10,000s of transactions per second!
SINGLE-PROCESSOR DBMSs AREN’T UP TO THE JOB!

5 PARALLEL DBMSs: BENEFITS OF A PARALLEL DBMS
INTERQUERY PARALLELISM: It is possible to process a number of transactions in parallel with each other. Improves Throughput.
INTRAQUERY PARALLELISM: It is possible to process ‘sub-tasks’ of a transaction in parallel with each other. Improves Response Time.
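The two kinds of parallelism can be sketched in Python as follows. This is only an illustration of the structure, not of how a parallel DBMS is built: the in-memory `ROWS` table and the function names are made up for the example, and CPython threads do not give true CPU parallelism, so no real speed gain should be expected from running it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table" of (account, branch, balance) rows.
ROWS = [(n, "STRATFORD" if n % 2 else "BARKING", float(n)) for n in range(1, 1001)]


def total_balance(rows, branch):
    """One 'query': sum the balances held at a branch."""
    return sum(balance for _, b, balance in rows if b == branch)


def interquery(branches, workers=4):
    """Interquery: run several independent queries at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda br: total_balance(ROWS, br), branches))


def intraquery(branch, workers=4):
    """Intraquery: split ONE query into sub-tasks over partitions of the
    table, run the sub-tasks side by side, then combine the partial sums."""
    partitions = [ROWS[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda part: total_balance(part, branch), partitions)
        return sum(partials)


print(interquery(["STRATFORD", "BARKING"]))
print(intraquery("BARKING"))
```

Interquery parallelism leaves each query untouched and simply runs more of them at once; intraquery parallelism has to partition the work and merge the partial results, which is where the extra complexity lies.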

6 PARALLEL DBMSs: HOW TO MEASURE THE BENEFITS
Speed-Up. As you multiply resources by a certain factor, the time taken to execute a transaction should be reduced by the same factor:
• 10 seconds to scan a DB of 10,000 records using 1 CPU
• 1 second to scan a DB of 10,000 records using 10 CPUs
Scale-Up. As you multiply resources, the size of a task that can be executed in a given time should be increased by the same factor:
• 1 second to scan a DB of 1,000 records using 1 CPU
• 1 second to scan a DB of 10,000 records using 10 CPUs
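The two measures can be written as simple ratios. The sketch below uses the usual textbook definitions (speed-up is old elapsed time over new elapsed time for the same task; scale-up is small-task time on the small system over large-task time on the large system); the function names are chosen for this example only.

```python
def speed_up(time_on_small_system: float, time_on_large_system: float) -> float:
    """Same task on both systems: how many times faster is the large system?"""
    return time_on_small_system / time_on_large_system


def scale_up(small_task_on_small_system: float, large_task_on_large_system: float) -> float:
    """Task and system both grown by the same factor: 1.0 means the elapsed
    time stayed constant (linear scale-up); below 1.0 is sub-linear."""
    return small_task_on_small_system / large_task_on_large_system


def is_linear_speed_up(resource_factor: float, measured: float) -> bool:
    """Linear (ideal) speed-up: the gain equals the factor by which resources grew."""
    return measured == resource_factor


# Speed-up figures from the slide: 10,000 records, 10 s on 1 CPU, 1 s on 10 CPUs.
s = speed_up(10.0, 1.0)
print(s, is_linear_speed_up(10, s))

# Scale-up figures from the slide: 1,000 records on 1 CPU in 1 s,
# 10,000 records on 10 CPUs in 1 s.
print(scale_up(1.0, 1.0))
```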

7 PARALLEL DBMSs: SPEED-UP
[Graph: number of transactions/second against number of CPUs, comparing linear speed-up (ideal) with sub-linear speed-up. Labelled points: 1000/sec at 5 CPUs, 2000/sec at 10 CPUs, 1600/sec at 16 CPUs.]

8 PARALLEL DBMSs: SCALE-UP
[Graph: number of transactions/second against number of CPUs and database size, comparing linear scale-up (ideal) with sub-linear scale-up. Labels: 5 CPUs with a 1 GB database, 10 CPUs with a 2 GB database; 1000/sec and 900/sec.]

9 Shared Memory – Parallel Database Architecture
[Diagram: several CPUs attached to a single shared MEMORY.]

10 Shared Disk – Parallel Database Architecture
[Diagram: several CPUs, each with its own memory (M), attached to shared disks.]

11 Shared Nothing – Parallel Database Architecture
[Diagram: several CPUs, each with its own memory (M) and its own disk.]

12 MAINFRAME DATABASE SYSTEM

13 [Diagram: DUMB TERMINALS linked by a SPECIALISED NETWORK CONNECTION to a MAINFRAME COMPUTER, which holds the PRESENTATION LOGIC, BUSINESS LOGIC and DATA LOGIC.]

14 CLIENT/SERVER DATABASE SYSTEM

15 CLIENT/SERVER DBMS: CLIENT PROCESS
• Manages user interface
• Accepts user data
• Processes application/business logic
• Generates database requests (SQL)
• Transmits database requests to server
• Receives results from server
• Formats results according to application logic
• Presents results to the user

16 CLIENT/SERVER DBMS: SERVER PROCESS
• Accepts database requests
• Processes database requests
• Performs integrity checks
• Handles concurrent access
• Optimises queries
• Performs security checks
• Enacts recovery routines
• Transmits result of database request to client

17 CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT)
[Diagram: CLIENT #1, CLIENT #2 and CLIENT #3 send Data Requests to the D/BASE SERVER and receive Data Responses. Labels: PRESENTATION LOGIC, BUSINESS LOGIC, DATA LOGIC.]

18 CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT)
[Diagram: CLIENT #1, CLIENT #2 and CLIENT #3 send Data Requests to the D/BASE SERVER and receive Data Responses. Labels: PRESENTATION LOGIC, BUSINESS LOGIC, DATA LOGIC, PL/SQL.]

19 DISTRIBUTED PROCESSING ARCHITECTURE
[Diagram: LAN clients at Leyton, Stratford, Barking and Leytonstone linked by a WIDE AREA NETWORK; a single DBMS serves all sites.]

20 DISTRIBUTED DATABASE SYSTEM

21 DISTRIBUTED DATABASES: WHAT IS A DISTRIBUTED DATABASE?
• A distributed database system is a collection of logically related databases that co-operate in a transparent manner.
• Transparent implies that each user within the system may access all of the data within all of the databases as if they were a single database.
• There should be ‘location independence’, i.e. because the user is unaware of where the data is located, it is possible to move the data from one physical location to another without affecting the user.

22 DISTRIBUTED DATABASE ARCHITECTURE
[Diagram: sites at Leyton, Leytonstone, Stratford and Barking linked by a WIDE AREA NETWORK; each site has its own clients and its own DBMS.]

23 M:N CLIENT/SERVER DBMS ARCHITECTURE
[Diagram: CLIENT #1, CLIENT #2 and CLIENT #3 connected to D/BASE SERVER #1 and D/BASE SERVER #2.]
NOT TRANSPARENT!

24 COMPONENTS OF A DDBMS
[Diagram: Site 1 and Site 2 joined by a Computer Network. One site shows GSC, DDBMS, DC and LDBMS components with a DB; the other shows GSC, DDBMS and DC.]
Key: LDBMS = Local DBMS; DC = Data Communications; GSC = Global Systems Catalog; DDBMS = Distributed DBMS

25 DISTRIBUTED DATABASES: ADVANTAGES
• Reduced Communication Overhead: Most data access is local, less expensive and performs better.
• Improved Processing Power: Instead of one server handling the full database, we now have a collection of machines handling the same database.
• Removal of Reliance on a Central Site: If a server fails, then the only part of the system that is affected is the relevant local site. The rest of the system remains functional and available.

26 DISTRIBUTED DATABASES: ADVANTAGES
• Expandability: It is easier to accommodate increasing the size of the global (logical) database.
• Local autonomy: The database is brought nearer to its users. This can effect a cultural change as it allows potentially greater control over local data.

27 DISTRIBUTED DATABASES: DATE’S TWELVE RULES FOR A DDBMS
A distributed system looks exactly like a non-distributed system to the user!
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. Database independence

28 DISTRIBUTED DATABASES: ISSUES
• Data Allocation
• Data Fragmentation
• Distributed Catalogue Management
• Distributed Transactions
• Distributed Queries – (see chapter 20)

29 DISTRIBUTED DATABASES: DATA ALLOCATION METRICS
1. Locality of reference: Is the data near to the sites that need it?
2. Reliability and availability: Does the strategy improve fault tolerance and accessibility?
3. Performance: Does the strategy result in bottlenecks or under-utilisation of resources?
4. Storage costs: How does the strategy affect the availability and cost of data storage?
5. Communication costs: How much network traffic will result from the strategy?

30 DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES - CENTRALISED
Metrics: Locality of Reference, Reliability/Availability, Storage Costs, Performance, Communication Costs
Ratings shown: Lowest, Unsatisfactory, Highest

31 DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES - PARTITIONED/FRAGMENTED
Locality of Reference     High
Reliability/Availability  Low (item) – High (system)
Storage Costs             Lowest
Performance               Satisfactory
Communication Costs       Low

32 DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES - COMPLETE REPLICATION
Metrics: Locality of Reference, Reliability/Availability, Storage Costs, Performance, Communication Costs
Ratings shown: Highest, High, High (update) – Low (read)

33 DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES - SELECTIVE REPLICATION
Locality of Reference     High
Reliability/Availability  Low (item) – High (system)
Storage Costs             Average
Performance               Satisfactory
Communication Costs       Low

34 DISTRIBUTED DATABASES: WHY FRAGMENT DATA?
• Usage: Applications are usually interested in ‘views’ not whole relations.
• Efficiency: It’s more efficient if data is close to where it is frequently used.
• Parallelism: It is possible to run several ‘sub-queries’ in tandem.
• Security: Data not required by local applications is not stored at the local site.

35 DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION
ACCOUNT  CUSTOMER  BRANCH     BALANCE
200      JONES     STRATFORD  1000.00
324      GRAY      BARKING     200.00
345      SMITH     STRATFORD    23.17
350      GREEN     BARKING     340.14
400      ONO       BARKING     500.00
456      KHAN      STRATFORD   333.00
Horizontal Fragmentation: Consists of a Restriction on a Relation, e.g. σ branch = ‘Stratford’ (Account)

36 DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION
STRATFORD BRANCH
ACCT NO.  CUSTOMER  BRANCH     BALANCE
200       JONES     STRATFORD  1000.00
345       SMITH     STRATFORD    23.17
456       KHAN      STRATFORD   333.00
BARKING BRANCH
ACCT NO.  CUSTOMER  BRANCH   BALANCE
324       GRAY      BARKING   200.00
350       GREEN     BARKING   340.14
400       ONO       BARKING   500.00
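As a rough illustration, horizontal fragmentation can be mimicked in Python with a restriction (row filter) per branch, and the original relation recovered by taking the union of the fragments. The tuple layout and the helper name `restrict` are assumptions made for this sketch; the rows are the ones shown on the slides.

```python
# Account relation from the slide: (account no., customer, branch, balance).
ACCOUNT = [
    (200, "JONES", "STRATFORD", 1000.00),
    (324, "GRAY", "BARKING", 200.00),
    (345, "SMITH", "STRATFORD", 23.17),
    (350, "GREEN", "BARKING", 340.14),
    (400, "ONO", "BARKING", 500.00),
    (456, "KHAN", "STRATFORD", 333.00),
]


def restrict(relation, predicate):
    """Relational restriction (selection): keep the rows satisfying the predicate."""
    return [row for row in relation if predicate(row)]


# One horizontal fragment per branch, each stored at its own site.
stratford_fragment = restrict(ACCOUNT, lambda row: row[2] == "STRATFORD")
barking_fragment = restrict(ACCOUNT, lambda row: row[2] == "BARKING")

# The fragments are disjoint and complete, so their union rebuilds the relation.
reconstructed = sorted(stratford_fragment + barking_fragment)

print(stratford_fragment)
print(reconstructed == sorted(ACCOUNT))
```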

37 DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION
S#   NAME   SITE       PHONE NO       LOGIN    PASSWORD
200  JONES  STRATFORD  0208-500-9000  JON200T  XXYY22
324  GRAY   BARKING    0208-545-7528  GRA324S  ZZEE56
456  KHAN   STRATFORD  0208-500-5821  KHA456T  KJTR78
Vertical Fragmentation: Consists of a Projection on a Relation, e.g. π S#, NAME, SITE, PHONE NO (Student)

38 DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION
STUDENT ADMINISTRATION
S#   NAME   SITE       PHONE NO.
200  JONES  STRATFORD  0208-500-9000
324  GRAY   BARKING    0208-545-7528
456  KHAN   STRATFORD  0208-500-5821
NETWORK ADMINISTRATION
S#   LOGIN-ID  PASSWORD
200  JON200T   XXYY22
324  GRA324S   ZZEE56
456  KHA456T   KJTR78
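Vertical fragmentation can be sketched the same way: each fragment is a projection that keeps the key S#, and the original relation is recovered by joining the fragments on that key. The dictionary layout and the helper names `project` and `join_on_key` are assumptions for this example; the rows are the ones shown on the slides.

```python
STUDENT = [
    {"S#": 200, "NAME": "JONES", "SITE": "STRATFORD", "PHONE NO": "0208-500-9000",
     "LOGIN-ID": "JON200T", "PASSWORD": "XXYY22"},
    {"S#": 324, "NAME": "GRAY", "SITE": "BARKING", "PHONE NO": "0208-545-7528",
     "LOGIN-ID": "GRA324S", "PASSWORD": "ZZEE56"},
    {"S#": 456, "NAME": "KHAN", "SITE": "STRATFORD", "PHONE NO": "0208-500-5821",
     "LOGIN-ID": "KHA456T", "PASSWORD": "KJTR78"},
]


def project(relation, attributes):
    """Relational projection: keep only the named attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]


def join_on_key(left, right, key):
    """Join two fragments on their shared key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# Every vertical fragment must repeat the key, or the join is impossible.
student_admin = project(STUDENT, ["S#", "NAME", "SITE", "PHONE NO"])
network_admin = project(STUDENT, ["S#", "LOGIN-ID", "PASSWORD"])

rebuilt = join_on_key(student_admin, network_admin, "S#")
print(rebuilt == STUDENT)
```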

39 DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT
Centralised Global Catalog: One site maintains the full global catalog. All changes to any local system catalog have to be propagated to the site maintaining the global catalog. Drawbacks: poor performance, a single point of failure, and compromised site autonomy.
Dispersed Catalog: There is no physical global catalog. Each time a remote data item is required, the catalogs from ALL other sites are examined for the item. This has severe performance penalties.

40 DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT
Replicated Global Catalog: Each site maintains its own copy of the global catalog. Although this greatly speeds up remote data location, it is very inefficient to maintain. Details of every data item added, changed or deleted locally have to be propagated to ALL other sites.
Local-Master Catalog: Each site maintains both its local system catalog and a catalog of all of its data items that are replicated at other sites. This avoids compromising site autonomy, is fairly efficient, and is not a single point of failure.
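The trade-off between the dispersed and replicated strategies can be made concrete with a toy model that counts the work each one does. Everything here is hypothetical (the `Site` class, the item names, counting one message per remote site); it is a sketch of the cost argument, not of any real DDBMS catalog.

```python
class Site:
    def __init__(self, name, local_items):
        self.name = name
        self.local_catalog = set(local_items)  # items stored at this site
        self.global_catalog = {}               # item -> owning site (replicated strategy only)


def locate_dispersed(item, requester, sites):
    """Dispersed catalog: examine the catalog of every other site.
    Returns (owning site name or None, number of remote catalogs examined)."""
    owner, examined = None, 0
    for site in sites:
        if site is requester:
            continue
        examined += 1
        if item in site.local_catalog:
            owner = site.name
    return owner, examined


def add_item_replicated(item, owner, sites):
    """Replicated global catalog: a local change is propagated to ALL other
    sites. Returns the number of propagation messages sent."""
    owner.local_catalog.add(item)
    messages = 0
    for site in sites:
        site.global_catalog[item] = owner.name
        if site is not owner:
            messages += 1
    return messages


def locate_replicated(item, requester):
    """Replicated global catalog: a lookup needs no remote access at all."""
    return requester.global_catalog.get(item)


stratford = Site("Stratford", {"ACCOUNT_STRATFORD"})
barking = Site("Barking", {"ACCOUNT_BARKING"})
leyton = Site("Leyton", set())
sites = [stratford, barking, leyton]

print(locate_dispersed("ACCOUNT_BARKING", stratford, sites))
print(add_item_replicated("STUDENT", leyton, sites))
print(locate_replicated("STUDENT", stratford))
```

With the dispersed catalog the cost falls on every remote lookup; with the replicated catalog lookups are local but every catalog change costs one message per other site.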

41 DISTRIBUTED DATABASES: DISTRIBUTED TRANSACTIONS - ATOMIC DISTRIBUTED TRANSACTION
[Diagram: Stratford clients submit a global transaction to the Stratford DBMS; its parts (a), (b) and (c) run against the Stratford DB, the Barking DB (via the Barking DBMS) and the Leyton DB (via the Leyton DBMS).]
Global Transaction:
(a) Debit Stratford A/C £500
(b) Credit Barking A/C £350
(c) Credit Leyton A/C £150

42 TWO-PHASE COMMIT (2PC) - OK

43 TWO-PHASE COMMIT (2PC) - ABORT ‘Global Abort’
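A minimal sketch of two-phase commit, applied to the global transaction from the earlier slide (debit Stratford £500, credit Barking £350, credit Leyton £150). The `Participant` class, the rule that a site votes "no" when a debit would overdraw the account, and the returned strings are all assumptions for illustration; logging, timeouts and failure recovery, which a real 2PC implementation needs, are left out.

```python
class Participant:
    """One site taking part in the global transaction."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = None

    def prepare(self, delta):
        """Phase 1: vote. Here a site refuses if the change would overdraw it."""
        if self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        """Phase 2 (all voted yes): make the pending change permanent."""
        self.balance += self.pending
        self.pending = None

    def abort(self):
        """Phase 2 (someone voted no): discard the pending change."""
        self.pending = None


def two_phase_commit(work):
    """work is a list of (participant, delta) pairs; the caller is the coordinator."""
    votes = [site.prepare(delta) for site, delta in work]  # phase 1: collect votes
    if all(votes):
        for site, _ in work:                               # phase 2: global commit
            site.commit()
        return "GLOBAL COMMIT"
    for site, _ in work:                                   # phase 2: global abort
        site.abort()
    return "GLOBAL ABORT"


stratford = Participant("Stratford", 600)
barking = Participant("Barking", 0)
leyton = Participant("Leyton", 0)
transfer = [(stratford, -500), (barking, 350), (leyton, 150)]

first = two_phase_commit(transfer)   # Stratford can cover the debit
second = two_phase_commit(transfer)  # now it cannot, so every site rolls back
print(first, second, stratford.balance, barking.balance, leyton.balance)
```

The point of the second run is atomicity: one "no" vote leaves all three balances exactly as they were, even though Barking and Leyton were willing to commit.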

44 DISTRIBUTED DATABASES: DISADVANTAGES OF DDBMSs
• Architectural complexity.
• Cost.
• Security.
• Integrity control more difficult.
• Lack of standards.
• Lack of experience.
• Database design more complex.

