1 Distributed Databases (DDBs) Chap. 25. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.

Slides:



Advertisements
Similar presentations
1 Term 2, 2004, Lecture 9, Distributed DatabasesMarian Ursu, Department of Computing, Goldsmiths College Distributed databases 3.
Advertisements

V. Megalooikonomou Distributed Databases (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.
Enterprise Systems Distributed databases and systems - DT
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
Transaction.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Chapter 13 (Web): Distributed Databases
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
Chapter 25 Distributed Databases and Client-Server Architectures Copyright © 2004 Pearson Education, Inc.
1 Distributed Databases Chapter Two Types of Applications that Access Distributed Databases The application accesses data at the level of SQL statements.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Distributed Database Management Systems
Overview Distributed vs. decentralized Why distributed databases
1 © Prentice Hall, 2002 Chapter 13: Distributed Databases Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden.
1 Distributed Databases Chapter What is a Distributed Database? Database whose relations reside on different sites Database some of whose relations.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Lecture-10 Distributed Database System A distributed database system consists of loosely.
Chapter 13 (Web): Distributed Databases
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Distributed databases
Alexandria Dodd Janelle Toungett
Distributed Databases
Distributed Databases and DBMSs: Concepts and Design
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
Database Design – Lecture 16
III. Current Trends: 1 - Distributed DBMSsSlide 1/32 III. Current Trends Part 1: Distributed DBMSs: Concepts and Design Lecture 12 (2 hours) Lecturer:
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
DISTRIBUTED DATABASE SYSTEM.  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that.
Lecture 5: Sun: 1/5/ Distributed Algorithms - Distributed Databases Lecturer/ Kawther Abas CS- 492 : Distributed system &
Session-8 Data Management for Decision Support
Lecture 16- Distributed Databases Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
DISTRIBUTED DATABASES
Distributed Database Systems Overview
Chapter 15 Recovery. Topics in this Chapter Transactions Transaction Recovery System Recovery Media Recovery Two-Phase Commit SQL Facilities.
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Distributed Databases
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
CS338Parallel and Distributed Databases11-1 Parallel and Distributed Databases Lecture Topics Multi-CPU and distributed systems Monolithic system Client–server.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
MBA 664 Database Management Systems Dave Salisbury ( )
Chapter 19 Distributed Databases. 2 Distributed Database System n A distributed DBS consists of loosely coupled sites that share no physical component.
Distributed Database: Part 2. Distributed DBMS Distributed database requires distributed DBMS Distributed database requires distributed DBMS Functions.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
Distributed DBMS, Query Processing and Optimization
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
© 2009 Pearson Education, Inc. Publishing as Prentice Hall 1 Lecture 11 Distributed Databases Modern Database Management 9 th Edition Jeffrey A. Hoffer,
CMS Advanced Database and Client-Server Applications Distributed Databases slides by Martin Beer and Paul Crowther Connolly and Begg Chapter 22.
Distributed Databases
Distributed Databases
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Distributed Databases and Client-Server Architectures
CHAPTER 25 - Distributed Databases and Client–Server Architectures
Distributed Database Concepts
Parallel and Distributed Databases
Database System Implementation CSE 507
Chapter 19: Distributed Databases
Distributed Databases
Distributed Databases Recovery
Introduction of Week 14 Return assignment 12-1
Distributed Databases
Presentation transcript:

1 Distributed Databases (DDBs) Chap. 25

2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations Multiple CPU's – each has DBMS, but data distributed

3 Advantages of DDBs Distributed nature of some DB applications (bank branches) Increased reliability and availability if site failure - also replicate data at > 1 site Data sharing but also local control Improved performance - smaller DBs exist at each site Easier expansion

4 Disadvantages: Increased complexity Additional functions needed: global vs. local queries –access remote sites, transmit queries and data keep track of data and replication execution strategies if data at > 1 site –which copy to access –maintain consistency of copies recover from site crashes

5 Architectures – parallel vs distributed Distributed Systems goal: to offer local DB autonomy at geographically distributed locations Parallel Systems goal: to construct a faster centralized computer –Improve performance through parallelization –Distribution of data governed by performance –Processing, I/O simultaneously

6 Parallel DBSs Shared-memory multiprocessor –get N times as much work with N CPU's access –MIMD, SIMD - equal access to same data, massively parallel Parallel shared nothing –data split among CPUs, each has own CPU, divide work for transactions, communicate over high speed networks LANs - homogeneous machines CPU + memory - called a site

7 Distributed DBSs (DDBS) Distributed DB - share nothing –lower communication rates –WAN –heterogeneous machines –Homogeneous DDBS homogeneous – same DBMSs –Heterogeneous DDBS different DBMSs - need ODBC, standard SQL

Slide 25- 8

9 Heterogeneous distributed DBSs HDDBs –Data distributed and each site has own DBMS ORACLE at one site, DB2 at another, etc. –need ODBC, standard SQL –usually transaction manager responsible for cooperation among sites – must coordinate distributed transaction –need data conversion and to access data at other sites

10 Federated DB - FDBS federated DB is a multidatabase that is autonomous collection of cooperating DBSs that are heterogeneous preexisting DBs form new database Each DB specifies import/export schema (view) –keeps a partial view of total schema Each DB has its own local users, local transparency and DBA – appears centralized for local autonomous users – appears distributed for global users

Must change classroom Wed. –SEC

12 FDBSs - Data Integration Fully-integrated, logical composite of all constituent databases Potential problems –Incompatible data types or query syntax –Semantically equivalent by differently named parts of schemas –Use views for data integration reference –Five level schema architecture Component – translate local schema into common database model

DDBs Client-Server –Client sends request for service (strict – fixed roles) –3-tier architecture P2P –Every site can act as server to store part of DB and as client to request service 13

Slide

15 DDBS Issues in DDBS in slides that follow

16 Data fragments Can distribute a whole relation at a site or Data fragments – logical units of the DB assigned for storage at various sites – horizontal fragmentation - subset of tuples in the relation (select) – vertical fragmentation - keeps only certain attributes of relation (project) need a PK

17 Fragments cont’d horizontal fragments: – complete - set of fragments whose conditions include every tuple – if disjoint - tuples only member of 1 fragment salary < 5000 and dno=4 vertical fragments – –Complete can be obtained by: L1 U L2 U... Ln - attributes of R where Li intersect Lj = PK(R)

18 Replication Full vs. partial replication Which copy to access Improves performance for global queries but updates a problem Ensure consistency of replicated copies of data

19 Example replication/fragmentation Example of fragments for company DB: site 1 - company headquarters gets entire DB site 2, 3 – horizontal fragments based on dept. no.

Slide

Slide

22 To process a query Must use data dictionary that includes info on data distribution among servers Parse user query –decomposed into independent site queries –each site query sent to appropriate server site –site processes local query, sends result to result site –result site combines results of subqueries Ensure atomicity

23 Additional Issues in DDBSs Distributed transaction if data distributed Transaction manager (TM) knows where data is stored TM decides which site starts local components of transaction Distributed query processing

24 Distributed Query Processing If Query (read-only) - no 2PC needed If horizontally fragmented – decompose query into subqueries that can be processed in parallel –one site (result site) accepts all results and puts together Must also do order by, count

Query Parallelism Decompose query into parts that can be executed in parallel at several sites –Intra query parallelism If shared nothing & horizontally fragmented: Select name, phone from account where age > 65 –Decompose into K different queries –Result site accepts all and puts together (order by, count) 25

Query Parallelism What if a join and table is fragmented? –Difficult problem if table fragments at different sites –Must get all values of join attributes at one site –Then broadcast to relevant sites value of join attribute If site 1 has values 1-10 and site 2 has 11-20, only sent to those sites Result tuples returned, join performed –Example: A |X|B Similar to semi-join |X A1 B1 A2 B2 A3 B3 each at a different site A1,A2,A3 sent to S1 S1 sends FK to B1,B2,B3 sites B1,B2,B3 site send tuples where PK=A1,A2,A3 join performed at S1 26

27 Distributed Query Processing If the data is not fragmented –process whatever operations you can locally (select, project) Query can involve joins between data at multiple sites Must transfer data over network Transfer of data highest cost –Algorithms to minimize amount of data transferred (NP-hard)

28 Distributed Query Processing Assume R1 |X| R2 → Site 3 with R1 at S1 and R2 at S2 Can do: a)R1 → S3 and R2 → S3, do join at S3 b)R1 → S2, execute join at S2, result to S3 c)R2 → S1, execute join at S1, result to S3

Semi-join |X Send join attribute from Si to Sj Select tuples at Sj who match keys sent, send those tuples back to Si Perform join at Si 29

30 Example using Semi-join  dname, lname Department |X| mgr=ssn Employee Assume Department is R1 and Employee is R2: 1) At S2: E‘ ← (  ssn Emp) //project join attribute Send to S1 2) At S1: E’’ ← (  mgrssn, dname (Department |X| mgrssn=ssn E')) join with department and send only needed attributes Send to S2  3) At S3:  dname, lname (E'' |X| mgrssn=ssn Employee) join with employee and send final attributes needed Send to S3

31 Distributed Query Processing Henver & Yao Cost: –time to compute results –time to transmit data Local data requires only processing time Assume transmission time is most costly Goal is to minimize transmission time

32 Distributed Query Processing Assume you have: R3=R1|X| att=att R2 R1 is at S1, R2 at S2 and result relation R3 sent to result site S Amount of data to transmit: –Assume R1 is 1M bytes –R2 is 250K bytes –R3 is 50k bytes –Using semi-join:  att R1 is 100K  att R2 is 50K  att R1|X|R2 is 10K  att R2|X|R1 is 25K

33 Distributed Query Processing Show all possibilities to previous problem and determine best distributed query processing strategy

Why is Checkpoints/recovery important? Checkpoints/recovery HPC – IBM used different chkpt / recovery schemes Took several days to recover from a crash because different schemes used, no prior knowledge Needs to be studied 34

35 Concurrency Control and Recovery in Distributed Databases Distributed transactions inherit ACID properties problems: – failure at individual sites and communication links – distributed deadlock (usually timeout) – distributed commit (use 2PC) – concurrency control with multiple copies of data

36 Distributed concurrency control using locking Centralized vs. distributed solutions Definition: distinguished copy –Locks associated with this copy

37 Distributed Locking - Primary site technique Primary site technique Distinguished copy of all data at this site All locks kept at this site, all requests sent here Site acts as coordinator for all DB items If write lock, DBMS must update all copies If read lock from primary site, can access local copy disadv: all locking requests sent to same site - bottleneck if failure at primary site? must abort and restart all Ts

38 Distributed Locking - Primary site with backup Primary site with backup all lock information at both sites, so failure is less of a problem disadv: slower - 2 sites to communicate with

39 Distributed Locking - Primary copy technique Primary copy technique - distribute responsibility distinguished copies of different items at different sites (can also choose a new coordinator if a failure - election to choose self as new site)

40 Distributed Locking - Voting Voting - no distinguished copy lock request sent to all sites with copy of data if granted lock by majority of copies, hold lock and inform sites with copies lock is granted if don't get a majority, timeout and cancel request sent to all sites disadv: lots of message traffic

41 Distributed Commit - Chap if updating data at different sites for same transaction, when to commit? difficult part - commit together so have atomicity states of basic transactions: –active - can enter other 2 states –committed - cannot leave this state –aborted - cannot leave this state

42 States commit Active abort

43 Commit problems To commit a distributed transaction need a coordinator (originator) site (S1 in our example) Given transaction T with with components T1 and T2 – where T1 executes at site S1 – T2 executes at site S2 a) if S1 commits locally, sends commit message to S2 – BUT if S2 crashes, T2 aborts before completes subtask – not atomic b) if both T1 and T2 ready to commit S1 sends message to commit – BUT if T2 crashes during commit, not atomic if can’t recover

44 To solve problem How to solve these problems? Need an additional state ready state when T is done and ready to commit, it goes to the ready state in the ready state, a transaction can go to the commit or abort state place prepare log on log buffer, forced to log file - can reconstruct the transaction

45 States commit ActiveReady to commit abort

46 Use Two-Phase Commit (2PC) Used by commercial DBS to achieve coordinated commit of distributed transactions Need a coordinator - usually a site that is participating in the transaction Only needed if updates

47 2PC Phase 1 – – When each subquery concludes, signal coordinator –If coordinator doesn’t hear from everyone (time out) send abort –Else coordinator receives done from all sends messages to all sites asking them to prepare site transactions Each site writes changes to log file if log successful –Send “ready to commit” and enter prepared state else if fails or cannot commit for other reasons –send “cannot commit”

48 2PC Phase 2 – –If coordinator does not receive prepare to commit from all, coordinator sends abort –Otherwise all prepare requests were successful responses, so coordinator sends commit Sites record commit in log, update DB subquery concluded ready to commit (prepared state) commit commit recorded

49 Solved by 2PC? a) if S1 commits locally, sends commit to S2 – BUT if S2 crashes, T2 aborts and it is not atomic With 2PC –if T1 completes Transaction T1 is prepared –If T2 crashes T2 is unsuccessful T1 can still abort from prepared state

50 Solved by 2PC? b) If both T1 and T2 ready to commit, S1 sends message to commit – BUT if T2 crashes before commit completed, not atomic With 2PC –if both T1 and T2 successfully complete Transaction T2 and T1 are both prepared T1 able to commit. If T2 crashes during commit can recover prepared state and commit – uses log

51 Any problems with 2PC? A blocking protocol (block until commit or rollback received) In most practical applications, blocking caused by 2PC are rare so most systems use 2PC However, when could 2PC block?

52 2PC blocking However, when could 2PC block? –Problem can occur when site in prepared state –Site voted to commit –Cannot commit until receive commit from coordinator –Cannot change vote and abort –Participant is blocked until receive message from coordinator – what if coordinator fails? (or coordinator can block while waiting for replies from participants – what about time out?) Solution? –Divide second commit phase into prepare-to-commit and commit

3PC subquery concluded ready to commit (prepared state) prepare to commit commit commit recorded In prepare to commit –Sets maximum time-out period (if coordinator crashes and don’t hear from coordinator) –Allows sites to communicate with each other –Can still commit or abort and release locks 53