1 Distributed Databases architecture, fragmentation, allocation Lecture 1.

Slides:



Advertisements
Similar presentations
Distributed Database Systems
Advertisements

V. Megalooikonomou Distributed Databases (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.
Enterprise Systems Distributed databases and systems - DT
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
Distributed Database Systems Dr. Mohamed Osman Hegazi.
Transaction.
Chapter 13 (Web): Distributed Databases
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
CS 347Notes 021 CS 347: Parallel and Distributed Data Management Notes02: Distributed DB Design Hector Garcia-Molina.
1 Minggu 12, Pertemuan 23 Introduction to Distributed DBMS (Chapter , 22.6, 3rd ed.) Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Chapter 25 Distributed Databases and Client-Server Architectures Copyright © 2004 Pearson Education, Inc.
ABCSG - Distributed Database 1 Data Management Distributed Database Data Replication.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Overview Distributed vs. decentralized Why distributed databases
1 Distributed Databases Review CS347 June 6, 2001.
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
Chapter 12 Distributed Database Management Systems
©Silberschatz, Korth and Sudarshan18.1Database System Concepts Centralized Systems Run on a single computer system and do not interact with other computer.
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
1 Distributed Databases CS347 Lecture 13 May 23, 2001.
...Looking back Why use a DBMS? How to design a database? How to query a database? How does a DBMS work?
Distributed Databases
Distributed databases
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
Database Design – Lecture 16
DISTRIBUTED DATABASES IN ADBMS Shilpa Seth
DISTRIBUTED DATABASE DESIGN
Session-9 Data Management for Decision Support
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Massively Distributed Database Systems - Distributed DBS Spring 2014 Ki-Joune Li Pusan National University.
Lecture 5: Sun: 1/5/ Distributed Algorithms - Distributed Databases Lecturer/ Kawther Abas CS- 492 : Distributed system &
10 1 Chapter 10 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
DDBMS Distributed Database Management Systems Fragmentation
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Kjell Orsborn UU - DIS - UDBL DATABASE SYSTEMS - 10p Course No. 2AD235 Spring 2002 A second course on development of database systems Kjell.
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
CS338Parallel and Distributed Databases11-1 Parallel and Distributed Databases Lecture Topics Multi-CPU and distributed systems Monolithic system Client–server.
1 Distributed Databases Chapter 21, Part B. 2 Introduction v Data is stored at several sites, each managed by a DBMS that can run independently. v Distributed.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
1 ICS 214B: Transaction Processing and Distributed Data Management Lecture 9: Fragmentation and Distributed Query Processing Professor Chen Li.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
MBA 664 Database Management Systems Dave Salisbury ( )
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Distributed Database Management Systems. Reading Textbook: Ch. 1, Ch. 3 Textbook: Ch. 1, Ch. 3 For next class: Ch. 4 For next class: Ch. 4 FarkasCSCE.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
1 Chapter 22 Distributed DBMS Concepts and Design CS 157B Edward Chen.
Distributed Database Design Bayu Adhi Tama, MTI Fasilkom-Unsri Adapted from Connolly, et al., Database Systems 4 th Edition, Pearson Education Limited,
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
Em Spatiotemporal Database Laboratory Pusan National University File Processing : Database Management System Architecture 2004, Spring Pusan National University.
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
CS742 – Distributed & Parallel DBMSPage 2. 1M. Tamer Özsu Outline Introduction & architectural issues  Data distribution  Fragmentation  Data Allocation.
LM 9. Distributed Database Dr. Lei Li 1. Note: The content of the slides including figures are mainly based on a publicly available textbook chapter:
Distributed Databases and Client-Server Architectures
Distributed Database Concepts
Parallel Databases.
Distributed Database Management Systems
CHAPTER 5: PHYSICAL DATABASE DESIGN AND PERFORMANCE
Distributed Databases
The Gamma Database Machine Project
Distributed Databases
Presentation transcript:

1 Distributed Databases architecture, fragmentation, allocation Lecture 1

2 Centralized DBMS Software: Application SQL Front End Query Processor Transaction Proc. File Access P M... Simplifications: single front end one place to keep locks if processor fails, system fails, …..

3 Distributed DB Multiple processors, memories, and disks –Opportunity for parallelism (+) –Opportunity for enhanced reliability (+) –Synchronization issues (-) Heterogeneity and autonomy of “components” –Autonomy example: may not get statistics for query optimization from a site

4 Heterogeneity Select new investments Application RDBMS Files Stock ticker tape Portfolio History of dividends, ratios,...

5 Data management with multiple processors and possible autonomy, heterogeneity. Impacts: Data organization Query processing Access structures Concurrency control Recovery Big Picture

6 Outline Introductory topics –Database architectures –Distributed versus Parallel DB systems Distributed database design –Fragmentation –Allocation

7 Common DB architectures Shared memory PPP... M P M P M P M Shared disk

8 Common DB architectures P M... P M P M Shared nothing Number of other “hybrid” architectures are possible.

9 Selecting the “right” architecture Reliability Scalability Geographic distribution of data Performance Cost

10 Parallel vs. Distributed DB system Typically, parallel DBs: –Fast interconnect –Homogeneous software –Goals: High performance and Transparency Typically, distributed DBs: –Geographically distributed –Disconnected operation possible –Goal: Data sharing (heterogeneity, autonomy)

11 Typical query processing scenarios Parallel DB: –Distribute/partition/sort…. data to make certain DB operations (e.g., Join) fast Distributed DB: –Given data distribution, find query processing strategy to minimize cost (e.g. communication cost)

12 Distributed DB Design Top-down approach: have a database how to split and allocate to individual sites Multi-databases (or bottom-up): combine existing databases how to deal with heterogeneity & autonomy

13 Two issues in top-down design Fragmentation Allocation Note: issues not independent, but studied separately for simplicity.

14 Example Employee relation E (#,name,loc,sal,…) 40% of queries: 40% of queries: Qa: select * Qb: select * from E where loc=Sa where loc=Sb and… and... Motivation: Two sites: Sa, Sb Qa   Qb Sa Sb

15 # Name Loc Sal Sa10 SallySb25 TomSa15 Joe 5 8 Sa10 TomSa15 Joe7Sb25Sally.. F = {F 1,F 2 } At Sa At Sb E F 1 =  loc=Sa (E)F 2 =  loc=Sb (E)  primary horizontal fragmentation

16 Fragmentation Horizontal Primary depends on local attributes R Derived depends on foreign relation Vertical R

17 Horizontal partitioning techniques Round robin Hash partitioning Range partitioning

18 Round robin RF 0 F 1 F 2t1 t2 t3t4...t5 Evenly distributes data Good for scanning full relation Not good for point or range queries

19 Hash partitioning RF 0 F 1 F 2 t1  h(k 1 )=2t1 t2  h(k 2 )=0t2 t3  h(k 3 )=0t3 t4  h(k 4 )=1t4... Good for point queries on key; also for joins Not good for range queries; point queries not on key Good hash function even distribution

20 Range partitioning RF 0 F 1 F 2 t1: A=5t1 t2: A=8t2 t3: A=2t3 t4: A=3t4... Good for some range queries on A Need to select good vector: else create imbalance  data skew  execution skew 47 partitionin g vector V 0 V 1

21 Which are good fragmentations? Example 2: F = {F 3,F 4 } F 1 =  sal 5 (E) Example 1: F = {F 1,F 2 } F 1 =  sal 20 (E) Problem: Some tuples lost! Tuples with 5 < sal < 10 are duplicated

22 Prefer to deal with replication explicitly Example: F = { F 5, F 6, F 7 } F 5 =  sal<=5 (E) F 6 =  5 < sal<10 (E) F 7 =  sal>=10 (E)  Then replicate F 6 if desired as part of allocation

23 Horizontal Fragmentation Desiderata R  F = {F 1,F 2,….} (1) Completeness  t  R,  F i  F such that t  F i (2) Disjointness F i  F j = Ø,  i,j such that i  j (3) Reconstruction   such that R =  F i i

24 Generating horizontal fragments Given simple predicates P r = {p 1, p 2,.. p m } and relation R. Generate “minterm” predicates M = {m | m =  p k *, 1  k  m}, where p k * is either p k or ¬ p k Eliminate useless minterms and simplify M to get M’. Generate fragments  m (R) for each m  M’.

25 Example: say queries use predicates A 5, Loc = S A, Loc = S B Eliminate and simplify minterms A 5  Loc = S A  Loc = S B A 5  Loc = S A  ¬(Loc = S B ) Final set of fragments (5 < A < 10)  (Loc = S A ) (5 < A < 10)  (Loc = S B ) (A  5)  (Loc = S A ) (A  5)  (Loc = S B ) (A  10)  (Loc = S A ) (A  10)  (Loc = S B ) Example Work out details for all minterms. 5 < A < 10

26 More on Horizontal Fragmentation Elimination of useless fragments/predicates depends on application semantics: –e.g.: if Loc  S A and  S B is possible, must retain fragments such as (5 <A < 10)  (Loc  S A )  (Loc  S B ) Minterm-based fragmentation generates complete, disjoint, and reconstructible fragments. Justify this statement.

27 Choosing simple predicates E (#,name,loc,sal,…) with common queries Qa: select * from E where loc = S A and… Qb: select * from E where loc = S B and… Qc: select * from E where sal < 10 and… Three choices for P r and hence F[P r ]: –P r = {} F 1 = F[P r ] = {E} –P r = {Loc = S A, Loc = S B } F 2 = F[P r ] = {  loc=SA (E),  loc=SB (E)} –P r = {Loc = S A, Loc = S B, Sal < 10} F 3 = F[P r ] = {  loc=SA  sal< 10 (E),  loc=SB  sal< 10 (E),  loc=SA  sal  10 (E),  loc=SA  sal  10 (E)}

28 Loc=S A  sal < 10 Loc=S A  sal  10 Loc=S B  sal < 10 Loc=S B  sal  10 F1F1 F3F3 F2F2 Q a : Select … loc = S A... Q b : Select … loc = S B... Prefer F 2 to F 1 and F 3

29 Desiderata for simple predicates Completeness Set of predicates P r is complete if for every F i  F[P r ], every t  F i has equal probability of access by every major application. Minimality Set of predicates P r is minimal if no P r ’  P r is complete. To get complete and minimal P r use predicates that are “relevant” in frequent queries Different from completeness of fragmentation

30 Derived horizontal fragmentation Example: Two relations Employee and Jobs E(#, NAME, SAL, LOC) J(#, DES,…) Fragment E into {E 1, E 2 } by LOC Common query: “Given employee name, list projects (s)he works in”

31 E1E1 (at S a ) (at S b ) E2E2 J

32 E1E1 (at S a ) (at S b ) E2E2 J1J1 J2J2 J 1 = J E 1 J 2 = J E 2

33 Derived horizontal fragmentation R, fragmented as F = {F 1, F 2, …, F n }  S, derive D = {D 1, D 2, …, D n } where D i =S F i Convention: R is called the owner relation S is called the member relation

34 Completeness of derived fragmentation J 1 U J 2  J (incomplete fragmentation) For completeness, enforce referential integrity constraint join attribute of member relation  joint attribute of owner relation Example: Say J is

35 E1E1 E2E2 J1J1 J J2J2 Fragmentation is not disjoint! Common way to enforce disjointness: make join attribute key of owner relation.

36 Vertical fragmentation E1E1 E E2E2 Example: R[T]  R 1 [T 1 ], R 2 [T 2 ],…, R n [T n ] T i  T  Just like normalization of relations

37 Properties R[T]  R i [T i ], i = 1.. n Completeness:  T i = T Reconstruction: R i = R (lossless join) –One way to guarantee lossless join: repeat key in each fragment, i.e., key  T i  i Disjointness: T i  T j = {key} –Check disjointness only on non-key attributes

38 E 1 (#,NM,LOC) E 2 (#,SAL) Example: E(#,NM,LOC,SAL)E 1 (#,NM) E 2 (#,LOC) E 3 (#,SAL) Which is the right vertical fragmentation? ….. Grouping Attributes

39 A 1 A 2 A 3 A 4 A 5 A A A A A A i,j  a measure of how “often” A i and A j are accessed by the same query Hand constructed using knowledge of queries and their frequencies Attribute affinity matrix Cluster attributes based on affinity R 1 [K,A 1,A 2,A 3 ] R 2 [K,A 4,A 5 ]

40 Allocation Example: E  F 1 =  loc=Sa (E); F 2 =  loc=Sb (E) Site a Site b Fragment E Do we replicate fragments? Where do we place each copy of each fragment? Site c F1F1 F1F1 F2F2

41 Issues Origin of queries Communication cost and size of answers, relations, etc. Storage capacity, storage cost at sites, and size of fragments Processing power at the sites Query processing strategy –How are joins done? Where are answers collected? Fragment replication –Update cost, concurrency control overhead

42 Optimization problem What is the best placement of fragments and/or best number of copies to: –minimize query response time –maximize throughput –minimize “some cost” –... Subject to constraints –Available storage –Available bandwidth, processing power,… –Keep 90% of response time below X –... Very hard problem