Distributed Database Management Systems


Distributed Database Management Systems Lecture 30

In the previous lecture: locking-based concurrency control (CC); timestamp-ordering-based CC; concluded TM.

In this lecture: basic concepts of query optimization; query processing (QP) in centralized and distributed databases.

Introduction: SQL is one of the success factors of RDBMSs. The query processor transforms complex queries into concise and simple ones.

Query processing is a critical performance issue. QP is a complex problem, especially in a DDBS environment.

The main function of QP is to transform an SQL query into an equivalent relational algebra query (a lower-level language). The transformation must achieve both correctness and efficiency.

Correctness is straightforward, since well-defined transformation rules exist. Efficiency is harder: an SQL query can have many equivalent relational algebra expressions.

Consider the tables:
EMP(eNo, eName, title)
ASG(eNo, pNo, resp, dur)
PROJ(pNo, pName, budget, loc)
Query: get the names of employees who are managing a project.

SELECT eName FROM EMP, ASG WHERE EMP.eNo = ASG.eNo AND resp = 'Manager'

Two equivalent relational algebra expressions for this query:
(1) π_eName(σ_{resp = 'Manager' ∧ EMP.eNo = ASG.eNo}(EMP × ASG))
(2) π_eName(EMP ⋈_eNo (σ_{resp = 'Manager'}(ASG)))
The second clearly needs fewer computing resources, since it avoids the Cartesian product.

Centralized QP has to choose the best query execution plan. Distributed QP is more complex: it also involves selecting the sites at which the query is executed.

The same query in a DDBS. Suppose EMP and ASG are horizontally fragmented (HF) as:
EMP1 = σ_{eNo ≤ 'E3'}(EMP)
EMP2 = σ_{eNo > 'E3'}(EMP)
ASG1 = σ_{eNo ≤ 'E3'}(ASG)
ASG2 = σ_{eNo > 'E3'}(ASG)
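A minimal sketch of this horizontal fragmentation follows. The helper name `fragment_by_eno` and the sample tuples are hypothetical, and the sketch assumes that eNo values such as 'E1' to 'E9' compare correctly as strings. The union of the two fragments reconstructs the original relation.

```python
def fragment_by_eno(relation, boundary="E3"):
    """Split a relation into two horizontal fragments on eNo (the first column)."""
    low = [t for t in relation if t[0] <= boundary]   # eNo <= 'E3'
    high = [t for t in relation if t[0] > boundary]   # eNo >  'E3'
    return low, high


# Hypothetical sample tuples in the EMP(eNo, eName, title) layout.
EMP = [("E1", "Ali", "Engineer"), ("E3", "Omar", "Engineer"), ("E5", "Lina", "Analyst")]

EMP1, EMP2 = fragment_by_eno(EMP)

# The fragments are disjoint and their union gives back EMP.
reconstructed = sorted(EMP1 + EMP2)
print(EMP1, EMP2, reconstructed == sorted(EMP))
```

Because EMP and ASG are fragmented on the same predicate, matching tuples always fall into fragments with the same index, so EMP1 only needs to be joined with ASG1 and EMP2 with ASG2.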

Further suppose that these fragments are stored at sites 1, 2, 3 and 4, and that the result is expected at site 5.

Strategy 1 (figure):
Site 1: ASG1' = σ_{resp = 'Manager'}(ASG1), sent to site 3
Site 2: ASG2' = σ_{resp = 'Manager'}(ASG2), sent to site 4
Site 3: EMP1' = EMP1 ⋈_eNo ASG1', sent to site 5
Site 4: EMP2' = EMP2 ⋈_eNo ASG2', sent to site 5
Site 5: result = EMP1' ∪ EMP2'

Strategy 2 (figure):
Site 1 sends ASG1, site 2 sends ASG2, site 3 sends EMP1 and site 4 sends EMP2 to site 5.
Site 5: result = (EMP1 ∪ EMP2) ⋈_eNo σ_{resp = 'Manager'}(ASG1 ∪ ASG2)

Let us assume:
size(EMP) = 400 tuples, size(ASG) = 1000 tuples
tuple access cost = 1 unit, tuple transfer cost = 10 units
There are 20 managers
Data is distributed evenly across all sites

Strategy 1:
produce ASG': 20 * 1 = 20
transfer ASG' to the sites of EMP: 20 * 10 = 200
produce EMP': (10 + 10) * 1 * 2 = 40
transfer EMP' to the result site: 20 * 10 = 200
Total: 460

Strategy 2:
transfer EMP to site 5: 400 * 10 = 4000
transfer ASG to site 5: 1000 * 10 = 10000
produce ASG' by selecting from ASG: 1000 * 1 = 1000
join EMP and ASG': 400 * 20 * 1 = 8000
Total: 23000
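The two totals can be recomputed from the stated assumptions. The sketch below only restates the slide arithmetic; the constant and function names are illustrative, and reading the join cost of 8000 as 400 * 20 tuple accesses is an interpretation of the slide figure.

```python
TUPLE_ACCESS = 1      # cost units per tuple access
TUPLE_TRANSFER = 10   # cost units per tuple transfer
SIZE_EMP = 400
SIZE_ASG = 1000
MANAGERS = 20         # ASG tuples with resp = 'Manager', 10 per ASG fragment


def strategy1_cost():
    """Select and join at the data sites, ship only the small results."""
    produce_asg = MANAGERS * TUPLE_ACCESS            # 20
    transfer_asg = MANAGERS * TUPLE_TRANSFER         # 200
    produce_emp = (10 + 10) * TUPLE_ACCESS * 2       # 40
    transfer_emp = MANAGERS * TUPLE_TRANSFER         # 200
    return produce_asg + transfer_asg + produce_emp + transfer_emp


def strategy2_cost():
    """Ship both full relations to site 5 and evaluate the query there."""
    transfer_emp = SIZE_EMP * TUPLE_TRANSFER         # 4000
    transfer_asg = SIZE_ASG * TUPLE_TRANSFER         # 10000
    select_asg = SIZE_ASG * TUPLE_ACCESS             # 1000
    join = SIZE_EMP * MANAGERS * TUPLE_ACCESS        # 8000
    return transfer_emp + transfer_asg + select_asg + join


cost1 = strategy1_cost()
cost2 = strategy2_cost()
print(cost1, cost2, cost2 // cost1)  # 460 23000 50
```

Under these assumptions the second strategy is 50 times more expensive, mostly because of the 14000 units spent shipping whole relations.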

Query optimization is an important aspect of QP. The goal is to minimize resource consumption: total cost = I/O cost + CPU cost + communication cost. Only the first two components apply in a centralized DB.

Communication cost dominates in WANs. It is less dominant in LANs, so there the total cost (all three components) should be considered. QO can also aim to maximize throughput.
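As a rough illustration of why the network type matters, the sketch below treats total cost as a weighted sum of the three components. The weights and the workload figures are hypothetical and chosen only to show the effect; real systems calibrate them for their own hardware and network.

```python
def total_cost(io_ops, cpu_ops, tuples_sent, w_io, w_cpu, w_comm):
    """Weighted sum of I/O, CPU and communication cost."""
    return w_io * io_ops + w_cpu * cpu_ops + w_comm * tuples_sent


def communication_share(io_ops, cpu_ops, tuples_sent, w_io, w_cpu, w_comm):
    """Fraction of the total cost that is due to communication."""
    total = total_cost(io_ops, cpu_ops, tuples_sent, w_io, w_cpu, w_comm)
    return (w_comm * tuples_sent) / total


# Hypothetical workload: the same plan costed under two networks.
workload = dict(io_ops=1000, cpu_ops=5000, tuples_sent=200)

# Hypothetical weights: sending a tuple is assumed far more expensive on a WAN.
wan_share = communication_share(**workload, w_io=1, w_cpu=0.1, w_comm=100)
lan_share = communication_share(**workload, w_io=1, w_cpu=0.1, w_comm=1)

print(f"WAN: {wan_share:.0%} of cost is communication, LAN: {lan_share:.0%}")
```

With the WAN weights an optimizer could nearly ignore local processing, while with the LAN weights ignoring I/O and CPU would give misleading plan rankings.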

Operators' complexity:
Select, Project (without duplicate elimination): O(n)
Project (with duplicate elimination), Group: O(n log n)
Join, Semi-join, Division, Set operators: O(n log n)
Cartesian product: O(n^2)

Characterization of Query Processors

Types of optimization:
Exhaustive search: compute the cost of each strategy to find the optimal one. This may be very costly when there are many options and many fragments.
Heuristics: avoid costing every strategy, at the risk of missing the optimal one.
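The trade-off between the two approaches can be sketched on a toy join-ordering problem. Everything in this example is an assumption made for illustration: the cardinality of PROJ, the join selectivities, the cost measure (sum of intermediate result sizes for a left-deep order) and the greedy rule. It is not the algorithm of any particular system.

```python
from fractions import Fraction
from itertools import permutations

# Cardinalities: EMP and ASG follow the lecture example, PROJ is hypothetical.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 50}
# Hypothetical join selectivities; pairs without an entry have no join predicate.
SEL = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 50),
}


def selectivity(joined, rel):
    """Combined selectivity of rel against the relations joined so far."""
    s = Fraction(1)
    for r in joined:
        s *= SEL.get(frozenset((r, rel)), Fraction(1))
    return s


def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order."""
    joined, size, cost = [order[0]], Fraction(CARD[order[0]]), Fraction(0)
    for rel in order[1:]:
        size = size * CARD[rel] * selectivity(joined, rel)
        cost += size
        joined.append(rel)
    return cost


def exhaustive(relations):
    """Cost every permutation (n! of them) and keep the cheapest."""
    return min(permutations(relations), key=plan_cost)


def greedy(relations):
    """Heuristic: start small, then always add the cheapest next join."""
    remaining = set(relations)
    order = [min(remaining, key=CARD.get)]
    remaining.remove(order[0])
    size = Fraction(CARD[order[0]])
    while remaining:
        nxt = min(remaining, key=lambda r: size * CARD[r] * selectivity(order, r))
        size = size * CARD[nxt] * selectivity(order, nxt)
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


best = exhaustive(CARD)
quick = greedy(CARD)
worst = max(permutations(CARD), key=plan_cost)
print(plan_cost(best), plan_cost(quick), plan_cost(worst))  # 2000 2000 21000
```

Here the heuristic happens to find an optimal order while costing far fewer plans. In general it carries no such guarantee, whereas the exhaustive search always finds the optimum but has to cost n! orders.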

Optimization timing:
Static: performed during compilation. The sizes of intermediate tables are not always known. The cost is justified by repeated execution.
Dynamic: performed during execution. The sizes of intermediate tables are known. Re-optimization may be required.

Statistics:
Relation/fragment: cardinality, size of a tuple, fraction of tuples participating in a join with another relation.
Attribute: cardinality of the domain, actual number of distinct values.
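A sketch of how an optimizer might turn such statistics into size estimates follows. The formulas are the common textbook ones (a uniform value distribution for selections, a join selectivity factor for joins). The function names, the 50 distinct resp values, the join selectivity of 1/400 and the 40-byte tuple size are all hypothetical, picked so that the results line up with the lecture's 20 managers.

```python
from fractions import Fraction


def selection_cardinality(card, distinct_values):
    """Estimated tuples matching attr = value, assuming values are uniformly spread."""
    return Fraction(card, distinct_values)


def join_cardinality(card_r, card_s, join_selectivity):
    """Estimated size of R join S: selectivity factor times |R| times |S|."""
    return join_selectivity * card_r * card_s


def size_in_bytes(card, tuple_size):
    """Estimated volume of a relation, useful for costing a transfer."""
    return card * tuple_size


# ASG has 1000 tuples; assume 50 distinct resp values (hypothetical statistic).
managers = selection_cardinality(1000, 50)

# Each selected ASG tuple matches exactly one of the 400 EMP tuples,
# so the join selectivity factor is assumed to be 1/400.
joined = join_cardinality(400, managers, Fraction(1, 400))

# Hypothetical tuple size of 40 bytes for the join result.
volume = size_in_bytes(joined, 40)

print(managers, joined, volume)  # 20 20 800
```

Estimates like these are what a static optimizer must rely on, since the true sizes of intermediate tables are only known once the query runs.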

Decision sites:
Centralized: simple, but needs knowledge about the entire distributed database.
Distributed: sites cooperate to determine the schedule, and each needs only local information.
Hybrid: one site determines the global schedule, and each site optimizes its local subqueries.

Other factors: network topology, replicated fragments, use of semijoins.

Layers of query processing (figure):
SQL query on distributed relations
-> QUERY DECOMPOSITION (uses the global schema)
-> algebraic query on distributed relations
-> DATA LOCALIZATION (uses the fragment schema)
-> fragment query
-> GLOBAL OPTIMIZATION (uses statistics on fragments)
-> optimized fragment query with communication operations
-> LOCAL OPTIMIZATION
-> optimized local query