Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.

Presentation transcript:

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline
Introduction
Background
Distributed Database Design
Database Integration
Semantic Data Control
Distributed Query Processing
➡ Overview
➡ Query decomposition and localization
➡ Distributed query optimization
Multidatabase query processing
Distributed Transaction Management
Data Replication
Parallel Database Systems
Distributed Object DBMS
Peer-to-Peer Data Management
Web Data Management
Current Issues

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/2 Query Processing in a DDBMS
[Figure: high-level user query → query processor → low-level data manipulation commands for the D-DBMS]

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/3 Distributed Query Processing Methodology
[Figure: layered processing pipeline]
Input: calculus query on distributed relations
CONTROL SITE
➡ Query Decomposition (uses the GLOBAL SCHEMA) → algebraic query on distributed relations
➡ Data Localization (uses the FRAGMENT SCHEMA) → fragment query
➡ Global Optimization (uses STATS ON FRAGMENTS) → optimized fragment query with communication operations
LOCAL SITES
➡ Local Optimization (uses the LOCAL SCHEMAS) → optimized local queries

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/4 Step 1 – Query Decomposition
Input: Calculus query on global relations
Normalization
➡ manipulate query quantifiers and qualification
Analysis
➡ detect and reject "incorrect" queries
➡ possible for only a subset of relational calculus
Simplification
➡ eliminate redundant predicates
Restructuring
➡ calculus query → algebraic query
➡ more than one translation is possible
➡ use transformation rules

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/5 Normalization
Lexical and syntactic analysis
➡ check validity (similar to compilers)
➡ check for attributes and relations
➡ type checking on the qualification
Put into normal form
➡ Conjunctive normal form: (p_11 ∨ p_12 ∨ … ∨ p_1n) ∧ … ∧ (p_m1 ∨ p_m2 ∨ … ∨ p_mn)
➡ Disjunctive normal form: (p_11 ∧ p_12 ∧ … ∧ p_1n) ∨ … ∨ (p_m1 ∧ p_m2 ∧ … ∧ p_mn)
➡ ORs mapped into union
➡ ANDs mapped into join or selection

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/6 Analysis
Refute incorrect queries
Type incorrect
➡ If any of its attribute or relation names are not defined in the global schema
➡ If operations are applied to attributes of the wrong type
Semantically incorrect
➡ Components do not contribute in any way to the generation of the result
➡ Only a subset of relational calculus queries can be tested for correctness
✦ Those that do not contain disjunction and negation
➡ To detect
✦ connection graph (query graph)
✦ join graph

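Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/7 Analysis – Example
  SELECT ENAME, RESP
  FROM   EMP, ASG, PROJ
  WHERE  EMP.ENO = ASG.ENO
  AND    ASG.PNO = PROJ.PNO
  AND    PNAME = "CAD/CAM"
  AND    DUR ≥ 36
  AND    TITLE = "Programmer"
[Figure: Query graph. Nodes EMP, ASG, PROJ and RESULT. Edge between EMP and ASG labeled EMP.ENO=ASG.ENO; edge between ASG and PROJ labeled ASG.PNO=PROJ.PNO; edge from EMP to RESULT labeled ENAME; edge from ASG to RESULT labeled RESP. Selection labels: TITLE="Programmer" on EMP, DUR≥36 on ASG, PNAME="CAD/CAM" on PROJ.]
[Figure: Join graph. Nodes EMP, ASG, PROJ. Edge between EMP and ASG labeled EMP.ENO=ASG.ENO; edge between ASG and PROJ labeled ASG.PNO=PROJ.PNO.]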

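Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/8 Analysis
If the query graph is not connected, the query may be wrong or use a Cartesian product.
  SELECT ENAME, RESP
  FROM   EMP, ASG, PROJ
  WHERE  EMP.ENO = ASG.ENO
  AND    PNAME = "CAD/CAM"
  AND    DUR > 36
  AND    TITLE = "Programmer"
[Figure: Query graph. EMP and ASG are connected to each other and to RESULT (ENAME, RESP); PROJ carries the selection PNAME="CAD/CAM" but has no edge to any other node, so the graph is disconnected.]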
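The connectivity test on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the book's algorithm: it checks only the join graph (relations as nodes, join predicates as edges) and ignores the RESULT node; the function name `is_connected` and the pair-based encoding of predicates are assumptions made for the example.

```python
from collections import defaultdict

def is_connected(relations, join_predicates):
    """Return True if the join graph over `relations` is connected.

    `join_predicates` holds one (relation_a, relation_b) pair per join
    predicate found in the WHERE clause (hypothetical encoding).
    """
    adjacency = defaultdict(set)
    for a, b in join_predicates:
        adjacency[a].add(b)
        adjacency[b].add(a)

    relations = list(relations)
    if not relations:
        return True
    # Depth-first traversal starting from an arbitrary relation.
    seen = {relations[0]}
    stack = [relations[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(relations)

# Query from the previous slide: both join predicates are present.
connected_query = is_connected(
    ["EMP", "ASG", "PROJ"],
    [("EMP", "ASG"), ("ASG", "PROJ")],
)
# Query from this slide: ASG.PNO = PROJ.PNO is missing.
disconnected_query = is_connected(
    ["EMP", "ASG", "PROJ"],
    [("EMP", "ASG")],
)
print(connected_query, disconnected_query)
```

A disconnected graph does not prove the query is wrong; it only flags that a join predicate may be missing or that a Cartesian product is intended.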

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/9 Simplification
Why simplify?
➡ Remember the example
How? Use transformation rules
➡ Elimination of redundancy
✦ idempotency rules
   p_1 ∧ ¬(p_1) ⇔ false
   p_1 ∧ (p_1 ∨ p_2) ⇔ p_1
   p_1 ∨ false ⇔ p_1
   …
➡ Application of transitivity
➡ Use of integrity rules

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/10 Simplification – Example
  SELECT TITLE
  FROM   EMP
  WHERE  EMP.ENAME = "J. Doe"
  OR     (NOT(EMP.TITLE = "Programmer")
  AND    (EMP.TITLE = "Programmer"
  OR     EMP.TITLE = "Elect. Eng.")
  AND    NOT(EMP.TITLE = "Elect. Eng."))
⇓
  SELECT TITLE
  FROM   EMP
  WHERE  EMP.ENAME = "J. Doe"
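The equivalence of the two qualifications can be checked by brute force over all truth assignments. The sketch below treats the three atomic predicates as independent Booleans (a simplifying assumption; the equivalence holds even without using the fact that an employee has only one title). The function and parameter names are illustrative only.

```python
from itertools import product

def original(name_is_doe, is_programmer, is_elect_eng):
    """Qualification of the original query."""
    return name_is_doe or (
        (not is_programmer)
        and (is_programmer or is_elect_eng)
        and (not is_elect_eng)
    )

def simplified(name_is_doe, is_programmer, is_elect_eng):
    """Qualification after simplification: only ENAME = "J. Doe" remains."""
    return name_is_doe

# Compare both qualifications on all 2^3 combinations of truth values.
equivalent = all(
    original(*values) == simplified(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)
```

The parenthesized part is always false: NOT Programmer AND (Programmer OR Elect. Eng.) reduces to NOT Programmer AND Elect. Eng., which contradicts NOT Elect. Eng.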

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/11 Restructuring
Convert relational calculus to relational algebra
Make use of query trees
Example: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.
  SELECT ENAME
  FROM   EMP, ASG, PROJ
  WHERE  EMP.ENO = ASG.ENO
  AND    ASG.PNO = PROJ.PNO
  AND    ENAME ≠ "J. Doe"
  AND    PNAME = "CAD/CAM"
  AND    (DUR = 12 OR DUR = 24)
[Figure: query tree, root to leaves]
  Project: Π_{ENAME}
  Select:  σ_{DUR=12 OR DUR=24}, then σ_{PNAME="CAD/CAM"}, then σ_{ENAME≠"J. DOE"}
  Join:    PROJ ⋈_{PNO} (ASG ⋈_{ENO} EMP)

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/12 Restructuring – Transformation Rules
Commutativity of binary operations
➡ R × S ⇔ S × R
➡ R ⋈ S ⇔ S ⋈ R
➡ R ∪ S ⇔ S ∪ R
Associativity of binary operations
➡ (R × S) × T ⇔ R × (S × T)
➡ (R ⋈ S) ⋈ T ⇔ R ⋈ (S ⋈ T)
Idempotence of unary operations
➡ Π_{A'}(Π_{A''}(R)) ⇔ Π_{A'}(R)
➡ σ_{p1(A1)}(σ_{p2(A2)}(R)) ⇔ σ_{p1(A1) ∧ p2(A2)}(R)
   where R[A] and A' ⊆ A, A'' ⊆ A and A' ⊆ A''
Commuting selection with projection

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/13 Restructuring – Transformation Rules
Commuting selection with binary operations
➡ σ_{p(A)}(R × S) ⇔ (σ_{p(A)}(R)) × S
➡ σ_{p(Ai)}(R ⋈_{(Aj,Bk)} S) ⇔ (σ_{p(Ai)}(R)) ⋈_{(Aj,Bk)} S
➡ σ_{p(Ai)}(R ∪ T) ⇔ σ_{p(Ai)}(R) ∪ σ_{p(Ai)}(T)
   where Ai belongs to R and T
Commuting projection with binary operations
➡ Π_C(R × S) ⇔ Π_{A'}(R) × Π_{B'}(S)
➡ Π_C(R ⋈_{(Aj,Bk)} S) ⇔ Π_{A'}(R) ⋈_{(Aj,Bk)} Π_{B'}(S)
➡ Π_C(R ∪ S) ⇔ Π_C(R) ∪ Π_C(S)
   where R[A] and S[B]; C = A' ∪ B' where A' ⊆ A, B' ⊆ B

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/14 Example
Recall the previous example: Find the names of employees other than J. Doe who worked on the CAD/CAM project for either one or two years.
  SELECT ENAME
  FROM   PROJ, ASG, EMP
  WHERE  ASG.ENO = EMP.ENO
  AND    ASG.PNO = PROJ.PNO
  AND    ENAME ≠ "J. Doe"
  AND    PROJ.PNAME = "CAD/CAM"
  AND    (DUR = 12 OR DUR = 24)
[Figure: query tree, root to leaves]
  Project: Π_{ENAME}
  Select:  σ_{DUR=12 ∨ DUR=24}, then σ_{PNAME="CAD/CAM"}, then σ_{ENAME≠"J. DOE"}
  Join:    PROJ ⋈_{PNO} (ASG ⋈_{ENO} EMP)

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/15 Equivalent Query
[Figure: query tree]
  Π_{ENAME}( σ_{PNAME="CAD/CAM" ∧ (DUR=12 ∨ DUR=24) ∧ ENAME≠"J. Doe"}( ASG ⋈_{PNO,ENO} (PROJ × EMP) ) )

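Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/16 Restructuring
[Figure: rewritten query tree, with selections and projections pushed toward the leaves]
  Π_{ENAME}(
    Π_{PNO}(σ_{PNAME="CAD/CAM"}(PROJ))
    ⋈_{PNO}
    Π_{PNO,ENAME}(
      Π_{PNO,ENO}(σ_{DUR=12 ∨ DUR=24}(ASG))
      ⋈_{ENO}
      Π_{ENO,ENAME}(σ_{ENAME≠"J. Doe"}(EMP))
    )
  )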
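To see that restructuring preserves the result, the initial tree and the rewritten tree can be evaluated on a small instance. The rows below are invented purely for illustration, and `select`, `project` and `join` are hypothetical helpers written for this sketch, not part of any library.

```python
# Hypothetical sample data (not from the slides).
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
]
ASG = [
    {"ENO": "E1", "PNO": "P1", "DUR": 12},
    {"ENO": "E2", "PNO": "P1", "DUR": 24},
    {"ENO": "E2", "PNO": "P2", "DUR": 6},
    {"ENO": "E3", "PNO": "P1", "DUR": 36},
]
PROJ = [
    {"PNO": "P1", "PNAME": "CAD/CAM"},
    {"PNO": "P2", "PNAME": "Maintenance"},
]

def select(rows, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, attributes):
    """Relational projection with duplicate elimination."""
    result = []
    for row in rows:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

def join(left, right, attribute):
    """Equi-join of two relations on a shared attribute name."""
    return [{**l, **r} for l in left for r in right if l[attribute] == r[attribute]]

# Initial tree: joins first, then the three selections, then the projection.
joined = join(PROJ, join(ASG, EMP, "ENO"), "PNO")
filtered = select(joined, lambda r: r["ENAME"] != "J. Doe")
filtered = select(filtered, lambda r: r["PNAME"] == "CAD/CAM")
filtered = select(filtered, lambda r: r["DUR"] in (12, 24))
initial_result = project(filtered, ["ENAME"])

# Rewritten tree: selections and projections applied before the joins.
emp_side = project(select(EMP, lambda r: r["ENAME"] != "J. Doe"), ["ENO", "ENAME"])
asg_side = project(select(ASG, lambda r: r["DUR"] in (12, 24)), ["PNO", "ENO"])
proj_side = project(select(PROJ, lambda r: r["PNAME"] == "CAD/CAM"), ["PNO"])
inner = project(join(asg_side, emp_side, "ENO"), ["PNO", "ENAME"])
rewritten_result = project(join(proj_side, inner, "PNO"), ["ENAME"])

print(initial_result == rewritten_result, rewritten_result)
```

The rewritten plan joins far smaller intermediate relations, which is the point of pushing unary operations down the tree.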

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/17 Step 2 – Data Localization
Input: Algebraic query on distributed relations
Determine which fragments are involved
Localization program
➡ substitute for each global relation its localization program
✦ A localization program is a relational algebra program that reconstructs a global relation from its fragments
✓ Union for horizontal fragmentation; Join for vertical fragmentation
✦ The resulting localized query is a relational algebra query whose operands are the fragments of relations instead of the relations themselves
✦ Replication is not taken into account in this chapter
➡ Optimize
✦ For each type of fragmentation, use reduction techniques to generate simpler queries
✦ To do so, use appropriate heuristics

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/18 Example
Assume
➡ EMP is fragmented into EMP_1, EMP_2, EMP_3 as follows:
✦ EMP_1 = σ_{ENO≤"E3"}(EMP)
✦ EMP_2 = σ_{"E3"<ENO≤"E6"}(EMP)
✦ EMP_3 = σ_{ENO>"E6"}(EMP)
➡ ASG is fragmented into ASG_1 and ASG_2 as follows:
✦ ASG_1 = σ_{ENO≤"E3"}(ASG)
✦ ASG_2 = σ_{ENO>"E3"}(ASG)
Replace EMP by (EMP_1 ∪ EMP_2 ∪ EMP_3) and ASG by (ASG_1 ∪ ASG_2) in any query
[Figure: localized query tree]
  Π_{ENAME}( σ_{DUR=12 ∨ DUR=24}( σ_{PNAME="CAD/CAM"}( σ_{ENAME≠"J. DOE"}(
    PROJ ⋈_{PNO} ( (ASG_1 ∪ ASG_2) ⋈_{ENO} (EMP_1 ∪ EMP_2 ∪ EMP_3) ) ) ) ) )

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/19 Reduction for PHF
Reduction with selection
➡ Relation R and F_R = {R_1, R_2, …, R_w} where R_j = σ_{p_j}(R)
   σ_{p_i}(R_j) = ∅ if ∀x in R: ¬(p_i(x) ∧ p_j(x))
➡ Example
  SELECT *
  FROM   EMP
  WHERE  ENO = "E5"
[Figure: generic query σ_{ENO="E5"}(EMP_1 ∪ EMP_2 ∪ EMP_3) ⇒ reduced query σ_{ENO="E5"}(EMP_2)]
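The selection-reduction rule can be sketched as follows, assuming disjoint fragments with EMP_3 holding ENO > "E6" and each fragment qualification encoded as a range (low, high] on ENO. The encoding and the names `contains` and `relevant_fragments` are assumptions made for this illustration.

```python
# Fragment qualifications as (low, high] ranges on ENO; None means unbounded.
# Plain string comparison is adequate here only because the employee
# numbers have a single digit ("E1" … "E9").
EMP_FRAGMENTS = {
    "EMP1": (None, "E3"),   # ENO <= "E3"
    "EMP2": ("E3", "E6"),   # "E3" < ENO <= "E6"
    "EMP3": ("E6", None),   # ENO > "E6"
}

def contains(eno_range, eno):
    """True if the value satisfies the fragment's range qualification."""
    low, high = eno_range
    return (low is None or eno > low) and (high is None or eno <= high)

def relevant_fragments(fragments, eno):
    """Fragments whose qualification does not contradict ENO = eno."""
    return [name for name, eno_range in fragments.items() if contains(eno_range, eno)]

kept = relevant_fragments(EMP_FRAGMENTS, "E5")
print(kept)
```

Every fragment not returned yields an empty selection and can be dropped from the union before the query is shipped to the sites.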

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/20 Reduction for PHF
Reduction with join
➡ Possible if fragmentation is done on the join attribute, i.e., the selection attribute used for the fragmentation is the same as the join attribute
➡ Algorithm
✦ Distribute joins over unions
   (R_1 ∪ R_2) ⋈ S ⇔ (R_1 ⋈ S) ∪ (R_2 ⋈ S)
✦ Eliminate useless joins as follows: given R_i = σ_{p_i}(R) and R_j = σ_{p_j}(R),
   R_i ⋈ R_j = ∅ if ∀x in R_i, ∀y in R_j: ¬(p_i(x) ∧ p_j(y))
   That is, the qualifications of the joined fragments are in contradiction

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/21 Reduction for PHF
Assume EMP is fragmented as before and
➡ ASG_1: σ_{ENO≤"E3"}(ASG)
➡ ASG_2: σ_{ENO>"E3"}(ASG)
Consider the query
  SELECT *
  FROM   EMP, ASG
  WHERE  EMP.ENO = ASG.ENO
Distribute join over unions
Apply the reduction rule
[Figure: generic query (EMP_1 ∪ EMP_2 ∪ EMP_3) ⋈_{ENO} (ASG_1 ∪ ASG_2) ⇒ reduced query (EMP_1 ⋈_{ENO} ASG_1) ∪ (EMP_2 ⋈_{ENO} ASG_2) ∪ (EMP_3 ⋈_{ENO} ASG_2)]
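A rough sketch of the join reduction: after distributing the join over the unions, every EMP_i ⋈ ASG_j pair is kept only if the two ENO ranges can overlap. It assumes the disjoint fragmentation with EMP_3 holding ENO > "E6"; the (low, high] range encoding and the helper names are assumptions for this example.

```python
from itertools import product

# Fragment qualifications as (low, high] ranges on ENO; None means unbounded.
EMP_FRAGMENTS = {"EMP1": (None, "E3"), "EMP2": ("E3", "E6"), "EMP3": ("E6", None)}
ASG_FRAGMENTS = {"ASG1": (None, "E3"), "ASG2": ("E3", None)}

def may_overlap(range_a, range_b):
    """True unless the two (low, high] ranges are provably disjoint."""
    lows = [low for low in (range_a[0], range_b[0]) if low is not None]
    highs = [high for high in (range_a[1], range_b[1]) if high is not None]
    if not lows or not highs:
        return True  # unbounded on one side in both ranges: they intersect
    return max(lows) < min(highs)

def reduced_joins(left, right):
    """Fragment pairs whose qualifications are not in contradiction."""
    return [
        (left_name, right_name)
        for (left_name, left_range), (right_name, right_range)
        in product(left.items(), right.items())
        if may_overlap(left_range, right_range)
    ]

surviving = reduced_joins(EMP_FRAGMENTS, ASG_FRAGMENTS)
print(surviving)
```

Of the six candidate joins, three survive, and each can run in parallel at the site holding its fragments.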

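Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/22 Provides Parallelism
[Figure: union of fragment joins that can be evaluated in parallel: (EMP_3 ⋈_{ENO} ASG_1) ∪ (EMP_2 ⋈_{ENO} ASG_2) ∪ (EMP_1 ⋈_{ENO} ASG_1) ∪ (EMP_3 ⋈_{ENO} ASG_2)]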

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/23 Eliminates Unnecessary Work
[Figure: reduced query (EMP_2 ⋈_{ENO} ASG_2) ∪ (EMP_1 ⋈_{ENO} ASG_1) ∪ (EMP_3 ⋈_{ENO} ASG_2)]

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/24 Reduction for VF
Find useless (not empty) intermediate relations
Relation R defined over attributes A = {A_1, ..., A_n} vertically fragmented as R_i = Π_{A'}(R) where A' ⊆ A:
   Π_{D,K}(R_i) is useless if the set of projection attributes D is not in A'
Example: EMP_1 = Π_{ENO,ENAME}(EMP); EMP_2 = Π_{ENO,TITLE}(EMP)
  SELECT ENAME
  FROM   EMP
[Figure: generic query Π_{ENAME}(EMP_1 ⋈_{ENO} EMP_2) ⇒ reduced query Π_{ENAME}(EMP_1)]
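One way to read the vertical-fragmentation rule is that a fragment is useful only if it supplies at least one projected non-key attribute. The sketch below implements that reading; the attribute-set encoding and the name `useful_fragments` are assumptions for illustration, and the fallback for key-only projections is a design choice of this sketch rather than something stated on the slide.

```python
# Vertical fragments of EMP, each described by its attribute set.
VERTICAL_FRAGMENTS = {
    "EMP1": {"ENO", "ENAME"},
    "EMP2": {"ENO", "TITLE"},
}
KEY = {"ENO"}  # the key is replicated in every vertical fragment

def useful_fragments(fragments, key, projected):
    """Fragments that contribute a projected non-key attribute."""
    needed = set(projected) - set(key)
    if not needed:
        # Only key attributes requested: any single fragment suffices.
        return [next(iter(fragments))]
    return [name for name, attrs in fragments.items() if needed & attrs]

kept = useful_fragments(VERTICAL_FRAGMENTS, KEY, ["ENAME"])
print(kept)
```

Because only EMP1 is returned for the projection on ENAME, the join with EMP2 on ENO can be removed from the localized query.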

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/25 Reduction for DHF
Rule:
➡ Distribute joins over unions
➡ Apply the join reduction for horizontal fragmentation (using the qualification of the primary fragments!)
Example
  ASG_1: ASG ⋉_{ENO} EMP_1
  ASG_2: ASG ⋉_{ENO} EMP_2
  EMP_1: σ_{TITLE="Programmer"}(EMP)
  EMP_2: σ_{TITLE≠"Programmer"}(EMP)
Query
  SELECT *
  FROM   EMP, ASG
  WHERE  ASG.ENO = EMP.ENO
  AND    EMP.TITLE = "Mech. Eng."

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/26 Reduction for DHF
[Figure: Generic query: (ASG_1 ∪ ASG_2) ⋈_{ENO} σ_{TITLE="Mech. Eng."}(EMP_1 ∪ EMP_2)]
[Figure: Selections first: (ASG_1 ∪ ASG_2) ⋈_{ENO} σ_{TITLE="Mech. Eng."}(EMP_2)]

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/27 Reduction for DHF
[Figure: Joins over unions: (ASG_1 ⋈_{ENO} σ_{TITLE="Mech. Eng."}(EMP_2)) ∪ (ASG_2 ⋈_{ENO} σ_{TITLE="Mech. Eng."}(EMP_2))]
[Figure: Elimination of the empty intermediate relations (left sub-tree): ASG_2 ⋈_{ENO} σ_{TITLE="Mech. Eng."}(EMP_2)]
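The three steps for derived fragmentation (selections first, joins over unions, elimination of empty joins) can be sketched as below. The representation is an assumption for this example: primary fragment qualifications are Python predicates on TITLE, and each derived ASG fragment records which EMP fragment it was derived from; `reduce_dhf` is a hypothetical helper name.

```python
# Qualifications of the primary (EMP) fragments, as predicates on TITLE.
EMP_QUALIFICATION = {
    "EMP1": lambda title: title == "Programmer",
    "EMP2": lambda title: title != "Programmer",
}
# Each derived fragment ASG_i = ASG semijoin EMP_i inherits EMP_i's qualification.
ASG_OWNER = {"ASG1": "EMP1", "ASG2": "EMP2"}

def reduce_dhf(query_title):
    """Return the (ASG_i, EMP_j) joins left after reduction.

    Step 1 (selections first): keep EMP fragments compatible with
    TITLE = query_title.
    Steps 2 and 3 (joins over unions, then elimination): ASG_i joined with
    EMP_j is empty unless ASG_i was derived from EMP_j.
    """
    emp_kept = [name for name, holds in EMP_QUALIFICATION.items() if holds(query_title)]
    return [
        (asg_name, emp_name)
        for asg_name, owner in ASG_OWNER.items()
        for emp_name in emp_kept
        if owner == emp_name
    ]

remaining = reduce_dhf("Mech. Eng.")
print(remaining)
```

For the example query only the join of ASG2 with the selected part of EMP2 remains, matching the final tree on the slide.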

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/28 Reduction for Hybrid Fragmentation Combine the rules already specified: ➡ Remove empty relations generated by contradicting selections on horizontal fragments; ➡ Remove useless relations generated by projections on vertical fragments; ➡ Distribute joins over unions in order to isolate and remove useless joins.

Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/29 Reduction for Hybrid Fragmentation – Example
Consider the following hybrid fragmentation:
  EMP_1 = σ_{ENO≤"E4"}(Π_{ENO,ENAME}(EMP))
  EMP_2 = σ_{ENO>"E4"}(Π_{ENO,ENAME}(EMP))
  EMP_3 = Π_{ENO,TITLE}(EMP)
and the query
  SELECT ENAME
  FROM   EMP
  WHERE  ENO = "E5"
[Figure: generic query Π_{ENAME}(σ_{ENO="E5"}((EMP_1 ∪ EMP_2) ⋈_{ENO} EMP_3)) ⇒ reduced query Π_{ENAME}(σ_{ENO="E5"}(EMP_2))]