Tutorial 2: Distributed Query Processing


Tutorial 2: Distributed Query Processing INFS3200/7907 Advanced Database Systems Semester 1, 2007

Overview
Distributed query processing: How does a DDBMS process a query? Semi-join
Distributed database design: Derived horizontal fragmentation

Questions
Q2: Distributed Query Optimisation
Q3: Query Processing using Semijoin
Q5 (a, c, d, e and f): Derived Horizontal Fragmentation
Q1: Primary Horizontal Fragmentation

Q2: Distributed Query Optimisation
Question: Give operator trees for the following query:
1. After query decomposition using the global schema
2. After localization
3. After reduction

Question 2 (continued)
Database Schemas
Results (Olympiad, EventID, CompID, Pos)
Olympiad is one of the 4-year intervals between Olympic Games. CompID (Competitor ID) is a foreign key referencing table Competitors.
Competitors (CompID, Name, Country, FirstOlympiad, Other)
FirstOlympiad is the first Olympiad in which the competitor competed. Other is a group of other attributes irrelevant to this question.

Question 2 (continued): Fragmentation
BOC (Beijing Olympic Committee):
σ Olympiad=2008 (Results)
∏ CompID, Name, Country (Competitors)
IOC (International Olympic Committee):
σ Olympiad<2008 (Results)
∏ CompID, FirstOlympiad, Other (Competitors)

Question 2 (continued): Fragmentation at BOC
σ Olympiad=2008 (Results): horizontal fragment rb (the rows in Beijing)
SELECT * FROM Results WHERE Olympiad = 2008
∏ CompID, Name, Country (Competitors): vertical fragment cb (the columns in Beijing)
SELECT CompID, Name, Country FROM Competitors

Question 2 (continued): Fragmentation at IOC
σ Olympiad<2008 (Results): horizontal fragment ri (the rows at the IOC)
SELECT * FROM Results WHERE Olympiad < 2008
∏ CompID, FirstOlympiad, Other (Competitors): vertical fragment ci (the columns at the IOC)
SELECT CompID, FirstOlympiad, Other FROM Competitors

Question 2 (continued)
Query:
SELECT c.Name, c.FirstOlympiad
FROM Competitors c, Results r
WHERE r.Pos = 1 AND r.Olympiad = 2008 AND c.CompID = r.CompID

The query tree after optimization using the global schema (root first, children indented):
∏ Competitors.Name, Competitors.FirstOlympiad
  ⋈ Competitors.CompID = Results.CompID
    ∏ Name, FirstOlympiad, CompID
      Competitors
    ∏ CompID
      σ Results.Pos=1
        σ Results.Olympiad=2008
          Results

Question 2 (continued): After localization
The global relations are replaced by their fragments.
Results is replaced by the UNION of its horizontal fragments:
rb: σ Olympiad=2008 (Results)
ri: σ Olympiad<2008 (Results)
Competitors is replaced by the JOIN of its vertical fragments:
cb: ∏ CompID, Name, Country (Competitors)
ci: ∏ CompID, FirstOlympiad, Other (Competitors)

The query tree after localization (root first, children indented):
∏ Competitors.Name, Competitors.FirstOlympiad
  ⋈ Competitors.CompID = Results.CompID
    ∏ Name, FirstOlympiad, CompID
      ⋈ CompID = CompID
        cb
        ci
    ∏ CompID
      σ Results.Pos=1
        σ Results.Olympiad=2008
          ∪
            rb
            ri

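Question 2 (continued): After reduction
After localization, but before reduction:
∏ c.Name, c.FirstOlympiad
  ⋈ c.CompID = r.CompID
    ∏ c.Name, c.FirstOlympiad, c.CompID
      ⋈ cb.CompID = ci.CompID
        cb
        ci
    ∏ r.CompID
      σ r.Pos=1
        σ r.Olympiad=2008
          ∪
            rb
            ri

After reduction:
∏ c.Name, c.FirstOlympiad
  ⋈ c.CompID = rb.CompID
    ⋈ cb.CompID = ci.CompID
      ∏ cb.Name, cb.CompID
        cb
      ∏ ci.FirstOlympiad, ci.CompID
        ci
    ∏ rb.CompID
      σ rb.Pos=1
        rb

The selection Olympiad=2008 contradicts the defining predicate of ri (Olympiad<2008), so ri contributes no tuples and is removed. The same selection is redundant on rb, whose tuples all have Olympiad=2008. The projections are pushed down onto cb and ci so that only the attributes needed by the query are kept.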
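The equivalence of the localized and the reduced plan can be checked on a small instance. The following is a minimal sketch in Python, not part of the tutorial: the rows are made up for illustration (only the schemas and fragment definitions come from the slides), and the helper names `project`, `join_on_compid` and `answer` are hypothetical.

```python
# Hypothetical sample rows; only the schemas come from the tutorial.
results = [
    {"Olympiad": 2008, "EventID": "E1", "CompID": "C1", "Pos": 1},
    {"Olympiad": 2008, "EventID": "E1", "CompID": "C2", "Pos": 2},
    {"Olympiad": 2004, "EventID": "E1", "CompID": "C2", "Pos": 1},
]
competitors = [
    {"CompID": "C1", "Name": "Ann", "Country": "AU",
     "FirstOlympiad": 2004, "Other": "-"},
    {"CompID": "C2", "Name": "Bob", "Country": "NL",
     "FirstOlympiad": 2000, "Other": "-"},
]


def project(rows, attrs):
    """Relational projection: keep only the listed attributes."""
    return [{a: row[a] for a in attrs} for row in rows]


def join_on_compid(left, right):
    """Join two lists of rows on their common CompID attribute."""
    return [{**l, **r} for l in left for r in right
            if l["CompID"] == r["CompID"]]


# Fragments as defined in the tutorial.
rb = [r for r in results if r["Olympiad"] == 2008]
ri = [r for r in results if r["Olympiad"] < 2008]
cb = project(competitors, ["CompID", "Name", "Country"])
ci = project(competitors, ["CompID", "FirstOlympiad", "Other"])


def answer(result_rows, competitor_rows, check_olympiad):
    """Names and first Olympiads of competitors who finished first."""
    winners = {r["CompID"] for r in result_rows
               if r["Pos"] == 1
               and (not check_olympiad or r["Olympiad"] == 2008)}
    return sorted((c["Name"], c["FirstOlympiad"])
                  for c in competitor_rows if c["CompID"] in winners)


# Localized plan: union of rb and ri, join of the full cb and ci.
localized = answer(rb + ri, join_on_compid(cb, ci), check_olympiad=True)

# Reduced plan: ri dropped, Olympiad selection dropped, projections pushed down.
reduced = answer(
    rb,
    join_on_compid(project(cb, ["CompID", "Name"]),
                   project(ci, ["CompID", "FirstOlympiad"])),
    check_olympiad=False,
)

print(localized, reduced)
```

On this instance both plans return the same single row, while the reduced plan never touches ri and moves fewer attributes of cb and ci.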

Semijoin
The semijoin, written R ⋉ S where R and S are relations, is similar to the natural join. Its result, however, is only the set of tuples in R for which there is a tuple in S that is equal on their common attribute names (typically a foreign key). R ⋉ S is not equivalent to S ⋉ R.

Question 3: Query Processing using Semijoin
Schemas
Results (EventID, CompID, Pos)
Records (EventID, Time)

Results
EventID      CompID       Pos
SwM100Free   Thorpe       3
SwM100Free   Hoogenband   1
SwM100Free   Schoman      2
SwM100Free   Iles         7
ShotputW     Cumba        1
ShotputW     Ostapchuk    4
ShotputW     Li           9

Records
EventID      Time
SwM100Free   0:47.84
SwW100Free   0:53.77

Q3a: Compute Results ⋉ Records.
Can Results be semijoined with Records? Yes: the two relations share the attribute EventID, which acts as the foreign key. From the perspective of Results, the outcome of Results ⋉ Records is the set of tuples in Results whose EventID value also appears in Records. Only SwM100Free appears in both.

Results ⋉ Records
EventID      CompID       Pos
SwM100Free   Thorpe       3
SwM100Free   Hoogenband   1
SwM100Free   Schoman      2
SwM100Free   Iles         7

Q3b: Compute Records ⋉ Results.
Can Records be semijoined with Results? Yes: the two relations share the attribute EventID. From the perspective of Records, the outcome of Records ⋉ Results is the set of tuples in Records whose EventID value also appears in Results. Only SwM100Free appears in both.

Records ⋉ Results
EventID      Time
SwM100Free   0:47.84
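Both semijoins of Question 3 can be reproduced with a few lines of Python. This is a minimal sketch, not the tutorial's own code: relations are modelled as lists of tuples, the function name `semijoin` and its parameters are hypothetical, and Cumba's position (1) is taken from the Results table shown in Question 1.

```python
# Relations from Question 3, modelled as lists of tuples.
results = [
    ("SwM100Free", "Thorpe", 3),
    ("SwM100Free", "Hoogenband", 1),
    ("SwM100Free", "Schoman", 2),
    ("SwM100Free", "Iles", 7),
    ("ShotputW", "Cumba", 1),
    ("ShotputW", "Ostapchuk", 4),
    ("ShotputW", "Li", 9),
]
records = [
    ("SwM100Free", "0:47.84"),
    ("SwW100Free", "0:53.77"),
]


def semijoin(r, s, r_key=0, s_key=0):
    """Return the tuples of r whose key value also occurs in s.

    r_key and s_key are the column positions of the common attribute
    (EventID is column 0 in both relations here).
    """
    s_values = {row[s_key] for row in s}
    return [row for row in r if row[r_key] in s_values]


results_semi_records = semijoin(results, records)
records_semi_results = semijoin(records, results)

for row in results_semi_records:
    print(row)
print(records_semi_results)
```

The two calls return different relations with different schemas, which illustrates why R ⋉ S is not equivalent to S ⋉ R.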

Q3c: Assume Results is at site 1 and Records is at site 2, and a query Results ⋈ Records has been issued at site 2. Give steps for a query processing strategy using semijoin, and check whether the semijoin is a beneficial option in this case (ignore local processing cost).
Site 1 holds Results (7 tuples, 3 attributes per tuple). Site 2 holds Records (2 tuples, 2 attributes per tuple).
Strategy 1 (naïve)
Send the entire table Results from site 1 to site 2.
Cost: 21 cells (7 tuples * 3 attributes/tuple)

Q3c (continued)
Strategy 2 (semijoin)
1. Send the cell values under EventID in Records (2 cells) from site 2 to site 1.
2. At site 1, compute Results ⋉ Records using those values.
3. Send the outcome of the semijoin (4 tuples * 3 attributes/tuple = 12 cells) back to site 2, where the join with Records is completed.
The total transmission cost is 2 + 12 = 14 cells, which is cheaper than Strategy 1 at 21 cells, so the semijoin is beneficial in this case.
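The cost comparison can be recomputed from the data. The sketch below is an illustration only: it counts transmitted cells as the slides do, ignores local processing cost, and uses a hypothetical helper named `cells`. Cumba's position (1) is taken from the Results table shown in Question 1.

```python
results = [
    ("SwM100Free", "Thorpe", 3),
    ("SwM100Free", "Hoogenband", 1),
    ("SwM100Free", "Schoman", 2),
    ("SwM100Free", "Iles", 7),
    ("ShotputW", "Cumba", 1),
    ("ShotputW", "Ostapchuk", 4),
    ("ShotputW", "Li", 9),
]
records = [
    ("SwM100Free", "0:47.84"),
    ("SwW100Free", "0:53.77"),
]


def cells(rows):
    """Number of cells (attribute values) needed to ship these rows."""
    return sum(len(row) for row in rows)


# Strategy 1: ship all of Results from site 1 to site 2.
naive_cost = cells(results)

# Strategy 2: ship the EventID values of Records to site 1 ...
record_event_ids = {row[0] for row in records}
to_site1 = len(record_event_ids)

# ... reduce Results there, and ship only the matching tuples back.
reduced_results = [row for row in results if row[0] in record_event_ids]
to_site2 = cells(reduced_results)

semijoin_cost = to_site1 + to_site2
print(naive_cost, semijoin_cost)
```

The semijoin pays off here because only 4 of the 7 Results tuples match; if nearly every tuple matched, the extra transfer of the EventID values could make Strategy 2 the more expensive one.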

A Quick Recap of Fragmentation
Types of fragmentation: horizontal fragmentation, vertical fragmentation, hybrid fragmentation

Types of Fragmentation: Horizontal Fragmentation
Characteristics: A relation (table) is divided along tuples. The fragments are mutually exclusive (disjoint).
How? By the selection (σ) operation in relational algebra.
The relation is reconstructed from its fragments by UNION.

Types of Fragmentation: Vertical Fragmentation
Characteristics: A relation is partitioned along attributes. Each fragment must contain the primary key.
How? By the projection (∏) operation in relational algebra.
The relation is reconstructed from its fragments by JOIN.

Types of Fragmentation: Hybrid Fragmentation
The combination of horizontal and vertical fragmentation.
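The reconstruction rules from this recap can be illustrated with a short Python sketch. It is an illustration under stated assumptions: the horizontal split by EventID is chosen here only as an example, and the `competitors` rows are made up (only the attribute names come from the Question 2 schema).

```python
# Horizontal fragmentation: split Results along tuples with a selection.
results = [
    ("SwM100Free", "Thorpe", 3),
    ("SwM100Free", "Hoogenband", 1),
    ("ShotputW", "Ostapchuk", 4),
    ("ShotputW", "Li", 9),
]
h1 = [r for r in results if r[0] == "SwM100Free"]
h2 = [r for r in results if r[0] != "SwM100Free"]

# The fragments are disjoint and their UNION rebuilds the relation.
fragments_disjoint = not (set(h1) & set(h2))
union_rebuilds = set(h1) | set(h2) == set(results)

# Vertical fragmentation: split along attributes with a projection.
# Hypothetical rows; every fragment keeps the primary key CompID.
competitors = [
    {"CompID": 1, "Name": "Ann", "Country": "AU", "FirstOlympiad": 2004},
    {"CompID": 2, "Name": "Bob", "Country": "NL", "FirstOlympiad": 2000},
]
v1 = [{k: c[k] for k in ("CompID", "Name", "Country")} for c in competitors]
v2 = [{k: c[k] for k in ("CompID", "FirstOlympiad")} for c in competitors]

# The JOIN on the primary key rebuilds the relation.
rebuilt = [{**a, **b} for a in v1 for b in v2 if a["CompID"] == b["CompID"]]
join_rebuilds = rebuilt == competitors

print(fragments_disjoint, union_rebuilds, join_rebuilds)
```

Without the primary key in both vertical fragments there would be no attribute to join on, which is why each vertical fragment must contain it.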

Question 5: Derived Horizontal Fragmentation
Q (a): Write an SQL query to find the CompID and Pos of competitors in swimming events.
Database Schemas
Events (EventID, EName, Venue, Indoors?, Sport)
Results (EventID, CompID, Pos)
Solution:
SELECT Results.CompID, Results.Pos
FROM Results, Events
WHERE Events.EventID = Results.EventID AND Events.Sport = 'Swim'
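The query can be tried against an in-memory SQLite database using Python's standard `sqlite3` module. This is a minimal sketch under assumptions: the Events table is trimmed to the two attributes the query uses, the handful of rows is chosen for illustration from the event and competitor names that appear in the Question 5 solution, and an ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Events is trimmed to the attributes the query needs (assumption).
    CREATE TABLE Events (EventID TEXT PRIMARY KEY, Sport TEXT);
    CREATE TABLE Results (EventID TEXT, CompID TEXT, Pos INTEGER);

    INSERT INTO Events VALUES ('M100FS', 'Swim'), ('WSP', 'Track');
    INSERT INTO Results VALUES
        ('M100FS', 'Thorpe', 3),
        ('M100FS', 'Hoogenband', 1),
        ('WSP', 'Li', 9);
    """
)

rows = conn.execute(
    """
    SELECT Results.CompID, Results.Pos
    FROM Results, Events
    WHERE Events.EventID = Results.EventID
      AND Events.Sport = 'Swim'
    ORDER BY Results.Pos
    """
).fetchall()

print(rows)
conn.close()
```

Note the single quotes around 'Swim': in standard SQL, double quotes denote identifiers rather than string literals.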

Question 5 (continued)
Q (c): How many fragments of Events are generated by applying the predicates
P1: Sport = 'Swim'
P2: Sport = 'Track'
Solution: There are two fragments in total, one with Sport = 'Swim':
SELECT * FROM Events WHERE Sport = 'Swim'
and the other with Sport = 'Track':
SELECT * FROM Events WHERE Sport = 'Track'

Question 5 (continued)
Q (d): What is the relationship between Events and Results, that is, which is the owner and which is the member?
Derived horizontal fragmentation is defined on a member relation of a link according to a selection operation specified on its owner.
Database Schema
Events (EventID, EName, Venue, Indoors?, Sport)
Results (EventID, CompID, Pos)
Solution: Events is the owner and Results is the member, because each Results tuple depends on an Events tuple (a weak-entity relationship).

Question 5 (continued)
Q (e): What is the join predicate of the relations Events and Results?
Database Schema
Events (EventID, EName, Venue, Indoors?, Sport)
Results (EventID, CompID, Pos)
Solution: Events.EventID = Results.EventID

Question 5 (continue) Q (f): Find the derived horizontal fragments of the Results relation on the basis of the predicates used in (c) (P1: Sport = ‘Swim’, P2: Sport = ‘Track’ )

Question 5 (continued)
Solution for Q (f):
Join predicate: Events.EventID = Results.EventID
Each derived fragment of Results is the semijoin of Results with the corresponding fragment of Events.

Events (owner)
EventID values: M100FS, M100Sp, W200B, W1500, MSP, WSP, MMar
Sport values: Swim, Track

Results (member)
EventID   CompID       Pos
M100FS    Thorpe       3
M100FS    Hoogenband   1
M100FS    Schoeman     2
M100FS    Iles         7
WSP       Cumba        1
WSP       Ostapchuk    4
WSP       Li           9

Fragment 1 (Sport = 'Swim')
EventID   CompID       Pos
M100FS    Thorpe       3
M100FS    Hoogenband   1
M100FS    Schoeman     2
M100FS    Iles         7

Fragment 2 (Sport = 'Track')
EventID   CompID       Pos
WSP       Cumba        1
WSP       Ostapchuk    4
WSP       Li           9
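Derived horizontal fragmentation can be sketched as a semijoin of the member with each owner fragment. The Python below is an illustration, not the tutorial's code: the function name `derived_fragment` is hypothetical, and Events is reduced to the two rows whose Sport value is implied by the fragments on the slide (M100FS as Swim, WSP as Track).

```python
# Owner relation, reduced to (EventID, Sport) for the two events that
# occur in Results; the Sport values are implied by the slide's fragments.
events = [("M100FS", "Swim"), ("WSP", "Track")]

# Member relation (EventID, CompID, Pos) as shown in the solution.
results = [
    ("M100FS", "Thorpe", 3),
    ("M100FS", "Hoogenband", 1),
    ("M100FS", "Schoeman", 2),
    ("M100FS", "Iles", 7),
    ("WSP", "Cumba", 1),
    ("WSP", "Ostapchuk", 4),
    ("WSP", "Li", 9),
]


def derived_fragment(member, owner_fragment):
    """Semijoin of member with an owner fragment on EventID (column 0)."""
    owner_ids = {row[0] for row in owner_fragment}
    return [row for row in member if row[0] in owner_ids]


# Primary horizontal fragments of the owner (predicates P1 and P2).
events_swim = [e for e in events if e[1] == "Swim"]
events_track = [e for e in events if e[1] == "Track"]

# Derived horizontal fragments of the member.
results_swim = derived_fragment(results, events_swim)
results_track = derived_fragment(results, events_track)

print(results_swim)
print(results_track)
```

Because every Results tuple joins with exactly one Events tuple, the two derived fragments are disjoint and together contain all of Results.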

Q1: Primary Horizontal Fragmentation
Primary horizontal fragmentation (of a relation) is performed using predicates that are defined on that relation.
For example, for relation S:
EventID      CompID       Pos
SwM100Free   Thorpe       3
SwM100Free   Hoogenband   1
SwM100Free   Schoman      2
SwM100Free   Iles         7
ShotputW     Cumba        1
ShotputW     Ostapchuk    4
ShotputW     Li           9
Predicates:
p1: Pos≤3
p2: Pos>3

Question 1a (continued)
Question (a): How do we check whether Pr (Pr = {p1, p2}) is complete and minimal?
Predicates:
p1: Pos≤3
p2: Pos>3

Question 1a (continued)
Primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema.
Database Schema
Relation Results (EventID, CompID, Pos)
In this question there are two selection operations:
SELECT * FROM Results WHERE Pos <= 3
SELECT * FROM Results WHERE Pos > 3

Question 1a (continued)
COM_MIN Algorithm
Given: a relation Results and a set of simple predicates Pr (p1: Pos≤3, p2: Pos>3)
Output: a complete and minimal set of simple predicates Pr’ for Pr
Rule 1: a relation or fragment is partitioned into at least two parts which are accessed differently by at least one application.

Question 1a (continued)
COM_MIN Algorithm
Given: relation S and Pr = {p1, p2}
Output: Pr’ = Pr
Each tuple of S satisfies either p1 or p2:

p1: Pos≤3
EventID      CompID       Pos
SwM100Free   Thorpe       3
SwM100Free   Hoogenband   1
SwM100Free   Schoman      2
ShotputW     Cumba        1

p2: Pos>3
EventID      CompID       Pos
SwM100Free   Iles         7
ShotputW     Ostapchuk    4
ShotputW     Li           9
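One part of the check can be automated: that every tuple of S satisfies exactly one of the predicates, so the resulting fragments cover S without overlapping. The Python sketch below does only that. It is not the COM_MIN algorithm itself, whose relevance test (Rule 1) depends on how applications access the data, which the slides do not specify. Cumba's position (1) is taken from the table of relation S in Question 1.

```python
# Relation S (EventID, CompID, Pos) from Question 1.
s = [
    ("SwM100Free", "Thorpe", 3),
    ("SwM100Free", "Hoogenband", 1),
    ("SwM100Free", "Schoman", 2),
    ("SwM100Free", "Iles", 7),
    ("ShotputW", "Cumba", 1),
    ("ShotputW", "Ostapchuk", 4),
    ("ShotputW", "Li", 9),
]


def p1(row):
    """Simple predicate p1: Pos <= 3."""
    return row[2] <= 3


def p2(row):
    """Simple predicate p2: Pos > 3."""
    return row[2] > 3


# Every tuple should satisfy exactly one predicate.
exactly_one = all((p1(row) + p2(row)) == 1 for row in s)

fragment_p1 = [row for row in s if p1(row)]
fragment_p2 = [row for row in s if p2(row)]

print(exactly_one, len(fragment_p1), len(fragment_p2))
```

Both fragments are non-empty, so neither predicate can be dropped without merging tuples that the other predicate separates.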

Question 1b
Question (b): Apply the PHORIZONTAL algorithm.
Sometimes the set of predicates produced by the COM_MIN algorithm can be large and redundant. How, then, can we minimize the number of predicates while maintaining completeness?

Question 1b (continued)
PHORIZONTAL Algorithm
Makes use of COM_MIN to perform fragmentation.
Input: a relation S and a set of simple predicates Pr
Output: a set of minterm predicates M according to which relation S is to be fragmented

Question 1b (continued)
Workings
Step 1: Pr’ ← COM_MIN(S, Pr)
According to the result of Q1(a), Pr’ = Pr = {p1, p2}

Question 1b (continued)
Step 2: determine the set M of minterm predicates.
A minterm is a logical expression of n variables consisting only of the logical AND (∧) operator and complements (¬).
m1: (Pos≤3) ∧ (Pos>3) = ∅
m2: ¬(Pos≤3) ∧ (Pos>3) = (Pos>3)
m3: (Pos≤3) ∧ ¬(Pos>3) = (Pos≤3)
m4: ¬(Pos≤3) ∧ ¬(Pos>3) = ∅

Question 1b (continued)
Step 3: determine the set I of implications among pi ∈ Pr
i1: (Pos≤3) → ¬(Pos>3)
i2: ¬(Pos≤3) → (Pos>3)
i3: (Pos>3) → ¬(Pos≤3)
i4: ¬(Pos>3) → (Pos≤3)

Question 1b (continued)
Final step: according to I, m1 ((Pos≤3) ∧ (Pos>3)) and m4 (¬(Pos≤3) ∧ ¬(Pos>3)) are contradictory and are eliminated from M. Thus, we are left with M = {m2, m3} and define two fragments FS = {S1, S2} according to M.
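The enumeration and pruning of minterms can be sketched in Python. This is an illustration rather than the PHORIZONTAL algorithm as specified: instead of reasoning with the implication set I, it tests each minterm for satisfiability by brute force over an assumed finite domain (positions 1 to 20), and the names `predicates`, `minterms` and `satisfiable` are hypothetical.

```python
from itertools import product

# Simple predicates over the attribute Pos.
predicates = {
    "Pos<=3": lambda pos: pos <= 3,
    "Pos>3": lambda pos: pos > 3,
}

# Assumption: positions are positive integers; 1..20 is enough to
# exercise both sides of the boundary at 3.
domain = range(1, 21)


def minterms(preds):
    """Yield every conjunction of the predicates in plain or negated form.

    Each minterm is a tuple of (predicate name, required truth value).
    """
    names = list(preds)
    for signs in product([True, False], repeat=len(names)):
        yield tuple(zip(names, signs))


def satisfiable(minterm):
    """A minterm is kept if some domain value makes every literal hold."""
    return any(
        all(predicates[name](value) == sign for name, sign in minterm)
        for value in domain
    )


kept = [m for m in minterms(predicates) if satisfiable(m)]
for m in kept:
    print(m)
```

Of the four candidate minterms, only the two that assert one predicate and negate the other survive, matching M = {m2, m3}.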

Question 1b (continued)
The two fragments defined by M:

m3: Pos≤3
EventID      CompID       Pos
SwM100Free   Thorpe       3
SwM100Free   Hoogenband   1
SwM100Free   Schoman      2
ShotputW     Cumba        1

m2: Pos>3
EventID      CompID       Pos
SwM100Free   Iles         7
ShotputW     Ostapchuk    4
ShotputW     Li           9

Thank You