Spring 2003 ECE569 Lecture 06.1 ECE 569 Database System Engineering Spring 2003 Topic VIII: Query Execution and Optimization Yanyong Zhang www.ece.rutgers.edu/~yyzhang


Select Operation
- Simple condition (C = A θ V)
  - Selectivity of condition C for relation R = |{t ∈ R | C(t)}| / |R|, i.e. the number of tuples satisfying the condition divided by the total number of tuples.
  - If there are i distinct values of V with a uniform distribution, the average selectivity is 1/i.
- Linear search
  - Retrieve every tuple in the relation and test the predicate.
  - Cost = N_Pages
- Equality predicate with index on A
  - Locate all the tuples with search key V using the index.
  - Cost = ??
- Inequality with B+-tree on A
  - Locate the first tuple t with search key V using the index.
  - Retrieve tuples in ascending order starting with t if θ is > or ≥. Otherwise retrieve tuples in descending order.
  - Cost = ?? (depends on selectivity)
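The selectivity definition and the linear search can be sketched in a few lines of Python. This is only an illustration: the relation `R`, its attribute name `a`, and the helper names are hypothetical and not part of the lecture.

```python
def selectivity(relation, predicate):
    """Fraction of tuples in `relation` that satisfy `predicate`."""
    if not relation:
        return 0.0
    matching = sum(1 for t in relation if predicate(t))
    return matching / len(relation)


def linear_search(relation, predicate):
    """Linear search: retrieve every tuple and test the predicate."""
    return [t for t in relation if predicate(t)]


# Hypothetical relation: 8 tuples, attribute "a" takes 4 distinct values uniformly.
R = [{"a": v} for v in (1, 2, 3, 4, 1, 2, 3, 4)]

# With i = 4 uniformly distributed values, an equality condition selects 1/i.
print(selectivity(R, lambda t: t["a"] == 3))  # 0.25
print(linear_search(R, lambda t: t["a"] >= 3))
```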

Select Operation (cont’d)
- Conjunctive conditions (C = C1 ∧ C2 ∧ ... ∧ Ck)
  - Use one of the access methods above to retrieve the tuples satisfying one condition Ci. For each tuple, test the remaining conditions.
    - Choose the Ci with the lowest selectivity to reduce cost.
  - If secondary indices containing tuple pointers exist for all or some of the attributes, retrieve all pointers that satisfy the individual conditions. Intersect the pointer sets and retrieve the tuples.
- Disjunctive conditions (C1 ∨ C2 ∨ ... ∨ Ck)
  - If there is an access path for every condition, select the records for each condition and perform a union to eliminate duplicates.
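The pointer-intersection strategy for conjunctive conditions might look like the following sketch. It assumes equality conditions only and models a secondary index as a dictionary from attribute value to a set of record ids; the sample `customers` rows and all function names are hypothetical.

```python
from collections import defaultdict


def build_secondary_index(relation, attr):
    """Map each value of `attr` to the set of record ids (tuple pointers)."""
    index = defaultdict(set)
    for rid, t in enumerate(relation):
        index[t[attr]].add(rid)
    return index


def conjunctive_select(relation, indexes, conditions):
    """Evaluate attr1 = v1 AND attr2 = v2 ... by intersecting pointer sets.

    `conditions` maps attribute name -> required value; every attribute
    is assumed to have an entry in `indexes`.
    """
    pointer_sets = [indexes[attr].get(value, set())
                    for attr, value in conditions.items()]
    if not pointer_sets:
        return list(relation)
    rids = set.intersection(*pointer_sets)
    # Only now are the tuples themselves fetched.
    return [relation[rid] for rid in sorted(rids)]


# Hypothetical sample data.
customers = [
    {"cid": "c001", "ccity": "Duluth", "discnt": 10},
    {"cid": "c002", "ccity": "Dallas", "discnt": 12},
    {"cid": "c003", "ccity": "Duluth", "discnt": 8},
    {"cid": "c004", "ccity": "Duluth", "discnt": 10},
]
indexes = {a: build_secondary_index(customers, a) for a in ("ccity", "discnt")}

result = conjunctive_select(customers, indexes, {"ccity": "Duluth", "discnt": 10})
print([t["cid"] for t in result])  # ['c001', 'c004']
```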

Join Operation
- T = R ⋈ S, with join condition A = B
- Nested loop
  - Algorithm
      while records remain in R do
        fetch next record r from R
        while records remain in S do    (S is scanned from its start for each r)
          fetch next record s from S
          if (r[A] == s[B]) then
            insert t into T where t[R] = r and t[S] = s
        end while
      end while
  - Analysis
    - r_R: number of records in R
    - b_R: number of blocks in R (b_S is defined likewise for S)
    - Cost = r_R (b_S + 1)
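A minimal in-memory sketch of the nested-loop join follows, together with the cost expression from the analysis. Relations are modeled as lists of dictionaries, which is an assumption made for illustration; the sample data are hypothetical. Merging the two dictionaries assumes R and S do not share attribute names.

```python
def nested_loop_join(R, S, a, b):
    """Join R and S on R.a = S.b by comparing every pair of records."""
    T = []
    for r in R:            # outer loop over R
        for s in S:        # S is rescanned for every record of R
            if r[a] == s[b]:
                T.append({**r, **s})
    return T


def nested_loop_cost(r_R, b_S):
    """Block accesses according to the slide's analysis: r_R * (b_S + 1)."""
    return r_R * (b_S + 1)


# Hypothetical sample data.
dept = [{"dno": 1, "dname": "toys"}, {"dno": 2, "dname": "shoes"}]
emp = [{"name": "ann", "edno": 2}, {"name": "bob", "edno": 1}, {"name": "cy", "edno": 2}]

joined = nested_loop_join(emp, dept, "edno", "dno")
print([(t["name"], t["dname"]) for t in joined])
print(nested_loop_cost(100, 100))  # 10100
```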

Join Operation (cont’d)
- T = R ⋈ S, with join condition A = B
- Nested loop with multiple buffers
  - Use one buffer to sequence through the blocks of S.
  - Use n_b − 2 buffers for R.
  - Algorithm
      while records remain in R do
        read in n_b − 2 buffers of tuples from R
        while records remain in S do
          fetch next record s from S
          for every record r of R in a buffer do
            if (r[A] == s[B]) then insert t into T
          end for
        end while
      end while
  - Analysis
    - Every block of R is read only once.
    - Every block of S is read ⌈b_R / (n_b − 2)⌉ times.
    - Cost = b_R + ⌈b_R / (n_b − 2)⌉ × b_S
    - The outer loop should scan the smaller relation.
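The effect of choosing the smaller relation for the outer loop can be checked numerically. The sketch below derives the cost from the two statements in the analysis (each block of R is read once; S is scanned once per group of n_b − 2 blocks of R). The buffer count of 12 is a hypothetical value; the page counts 10 and 20,000 are those of dept and emp in the later example.

```python
def block_nested_loop_cost(b_R, b_S, n_b):
    """Block reads for a nested-loop join with n_b buffers.

    R (outer) is read once; S (inner) is scanned once for every
    chunk of n_b - 2 blocks of R.
    """
    if n_b < 3:
        raise ValueError("need at least 3 buffers")
    chunk = n_b - 2
    passes_over_S = -(-b_R // chunk)  # integer ceiling of b_R / chunk
    return b_R + passes_over_S * b_S


# Smaller relation (10 pages) as the outer operand.
print(block_nested_loop_cost(10, 20_000, 12))   # 20010
# Larger relation (20,000 pages) as the outer operand.
print(block_nested_loop_cost(20_000, 10, 12))   # 40000
```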

Join Operation (cont’d)
- Index method
  - Requires an index (or hash key) on one of the join attributes. (Assume there is an index on B of S.)
  - Algorithm
      while records remain in R do
        fetch next record r from R
        use the index to retrieve all records s in S with s[B] == r[A]
        for each record s retrieved do
          insert t into T
        end for
      end while
  - Analysis
    - x_B: average number of block accesses needed to retrieve a record using the access path for attribute B
    - Cost = b_R + r_R x_B + b_T
    - These disk accesses may be slower than those of the nested-loop join.
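As an illustration of the index method, the sketch below uses an in-memory dictionary as a stand-in for the index on B of S. A real system would use a disk-resident B+-tree or hash file; the dictionary, the sample data, and the function names are assumptions for this example only.

```python
from collections import defaultdict


def index_join(R, S, a, b):
    """Join R and S on R.a = S.b, probing an index on S.b for each r."""
    # Stand-in for a pre-existing index on attribute b of S.
    index_on_b = defaultdict(list)
    for s in S:
        index_on_b[s[b]].append(s)

    T = []
    for r in R:
        # Probe the index instead of scanning all of S.
        for s in index_on_b.get(r[a], []):
            T.append({**r, **s})
    return T


def index_join_cost(b_R, r_R, x_B, b_T):
    """Cost from the analysis: read R, probe once per record, write T."""
    return b_R + r_R * x_B + b_T


# Hypothetical sample data.
orders = [{"ordno": 1, "ocid": "c1"}, {"ordno": 2, "ocid": "c9"}, {"ordno": 3, "ocid": "c1"}]
customers = [{"cid": "c1", "cname": "Tip"}, {"cid": "c2", "cname": "Top"}]

print(index_join(orders, customers, "ocid", "cid"))
```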

Join Operation (cont’d)
- Sort-merge join
  - Requires that the records in R be ordered by their values in A and that S be ordered according to B.
  - The algorithm below assumes A and B are candidate keys.
      fetch next record r from R
      fetch next record s from S
      while (r ≠ NULL and s ≠ NULL) do
        if (r(A) > s(B)) then
          fetch next record s from S
        else if (r(A) < s(B)) then
          fetch next record r from R
        else  /* r(A) == s(B) */
          insert t into T
          fetch next record r from R
          fetch next record s from S
      end while
  - Analysis
    - The records of each file are scanned only once.
    - Cost = b_R + b_S + b_T
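The merge phase can be sketched with two cursors. As in the slide's algorithm, this version assumes both inputs are already sorted on the join attributes and that those attributes are candidate keys (no duplicate values); the list-of-dictionaries representation and the sample data are hypothetical.

```python
def merge_join(R, S, a, b):
    """Merge-join R (sorted on a) with S (sorted on b); keys are unique."""
    T = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][a] > S[j][b]:
            j += 1                      # advance S
        elif R[i][a] < S[j][b]:
            i += 1                      # advance R
        else:
            T.append({**R[i], **S[j]})  # keys match: emit and advance both
            i += 1
            j += 1
    return T


# Hypothetical sample data, already sorted on the join attributes.
R = [{"A": 1, "x": "r1"}, {"A": 3, "x": "r3"}, {"A": 4, "x": "r4"}]
S = [{"B": 2, "y": "s2"}, {"B": 3, "y": "s3"}, {"B": 4, "y": "s4"}]

print(merge_join(R, S, "A", "B"))
```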

Projection Operations
- Projection: π_P(R)
  - If P includes a candidate key for R
    - No need to check for duplicates.
  - Otherwise, one of the following can be used to eliminate duplicates:
    - If the result is hashed, check for duplicates as tuples are inserted.
    - Sort the resulting relation and eliminate duplicates, which are now adjacent.
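Both duplicate-elimination strategies are short enough to sketch. Here a Python `set` plays the role of the hashed result file, and `sorted` stands in for an external sort; the sample relation is hypothetical.

```python
def project_hash(relation, attrs):
    """Projection with hash-based duplicate elimination."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:          # duplicate check at insertion time
            seen.add(row)
            result.append(row)
    return result


def project_sort(relation, attrs):
    """Projection with sort-based duplicate elimination."""
    rows = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for row in rows:
        if not result or result[-1] != row:   # duplicates are adjacent
            result.append(row)
    return result


# Hypothetical sample data.
agents = [
    {"aid": "a1", "acity": "New York"},
    {"aid": "a2", "acity": "Duluth"},
    {"aid": "a3", "acity": "New York"},
]

print(project_hash(agents, ["acity"]))
print(project_sort(agents, ["acity"]))
```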

Set Operations
- R ∪ S
  - Hash the records of R and S to the same file. Do not insert duplicates.
  - Alternatively, concatenate the files, sort, and remove adjacent duplicates.
- R ∩ S
  - Scan the smaller file and attempt to locate each record in the larger file. (If found, add the tuple to the result.)
- R – S
  - Copy the records from R to the result.
  - Hash the records of S to the result. If a tuple is found, delete it.
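The hash-based variants of the three set operations can be sketched with Python dictionaries and sets standing in for hashed files. Records are modeled as hashable tuples, and the inputs are assumed to be duplicate-free; both are assumptions for illustration.

```python
def union(R, S):
    """Hash records of R and S into one structure, skipping duplicates."""
    result = {}
    for t in list(R) + list(S):
        result.setdefault(t, None)
    return list(result)


def intersection(R, S):
    """Scan the smaller input and look each record up in the larger one."""
    small, large = (R, S) if len(R) <= len(S) else (S, R)
    lookup = set(large)
    return [t for t in small if t in lookup]


def difference(R, S):
    """Copy R to the result, then delete every record that also occurs in S."""
    result = dict.fromkeys(R)
    for t in S:
        result.pop(t, None)
    return list(result)


# Hypothetical sample data.
R = [("p1",), ("p2",), ("p3",)]
S = [("p2",), ("p4",)]

print(union(R, S))
print(intersection(R, S))
print(difference(R, S))
```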

Query Optimization Rules of Thumb
- R1: Selections and projections are processed on the fly and almost never generate intermediate relations. Selections are processed as relations are accessed for the first time. Projections are processed as the results of other operators are generated.
- R2: Cross products are never formed, unless the query itself asks for them. Relations are always combined through joins in the query.
- R3: The inner operand of each join is a database relation, never an intermediate result.

Heuristic Optimization of Query Trees
- Consider the following schema
    customers (cid, cname, ccity, discnt)
    products (pid, pname, pcity, pquantity, price)
    agents (aid, aname, acity, percent)
    orders (ordno, month, ocid, oaid, opid, quantity, oprice)
- Query R:
    (select pid from products)
    except
    (select opid
     from customers, orders, agents
     where ccity = “Duluth” and acity = “New York” and cid = ocid and aid = oaid)

Heuristic (cont’d)
- Translate the query into algebra
    R: π_pid(products) − π_opid(σ_{ccity=“Duluth” ∧ acity=“New York” ∧ cid=ocid ∧ aid=oaid}((customers × orders) × agents))
- Query tree
    [Figure: query tree for R. Root: −. Left branch: π_pid over products. Right branch: π_opid over σ_{ccity=“Duluth” ∧ acity=“New York” ∧ cid=ocid ∧ aid=oaid} over (customers × orders) × agents.]

Heuristic (cont’d)
1. Replace σ_{F ∧ F’}(R) with σ_F(σ_F’(R)) wherever possible (this allows flexibility in scheduling selects).
    [Figure: query tree after step 1. Node labels: −, π_pid, π_opid, σ_ccity=“Duluth”, σ_acity=“New York”, σ_cid=ocid, σ_aid=oaid, ×, ×. Leaves: products, customers, orders, agents.]

Heuristic (cont’d)
2. Move select operations as close to the leaves as possible.
    [Figure: query tree after step 2. Node labels: −, π_pid, π_opid, σ_ccity=“Duluth”, σ_acity=“New York”, σ_cid=ocid, σ_aid=oaid, ×, ×. Leaves: products, customers, orders, agents.]
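Why moving selects toward the leaves helps can be seen on a toy instance. The sketch below compares selecting after the cartesian product with selecting on customers first. The rows are hypothetical and the relations are cut down to the attributes needed; only the schema names come from the lecture.

```python
from itertools import product

# Hypothetical sample rows.
customers = [
    {"cid": "c1", "ccity": "Duluth"},
    {"cid": "c2", "ccity": "Dallas"},
    {"cid": "c3", "ccity": "Duluth"},
]
orders = [
    {"ordno": 1, "ocid": "c1", "opid": "p1"},
    {"ordno": 2, "ocid": "c2", "opid": "p2"},
    {"ordno": 3, "ocid": "c3", "opid": "p1"},
    {"ordno": 4, "ocid": "c2", "opid": "p3"},
]


def cross(R, S):
    """Cartesian product of two relations."""
    return [{**r, **s} for r, s in product(R, S)]


def select(R, pred):
    """Selection: keep the tuples satisfying pred."""
    return [t for t in R if pred(t)]


# Select applied above the product.
late = select(cross(customers, orders),
              lambda t: t["ccity"] == "Duluth" and t["cid"] == t["ocid"])

# Select on ccity pushed down to the customers leaf.
duluth = select(customers, lambda t: t["ccity"] == "Duluth")
early = select(cross(duluth, orders), lambda t: t["cid"] == t["ocid"])

print(late == early)                                              # True
print(len(cross(customers, orders)), len(cross(duluth, orders)))  # 12 8
```

Both plans return the same tuples, but the pushed-down plan builds a smaller intermediate product.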

Heuristic (cont’d)
3. Rearrange the tree so that the most restrictive select executes first. The most restrictive select produces the smallest result, i.e. it is the one with the smallest selectivity. Assume the most restrictive select is probably σ_acity=“New York”.
    [Figure: query tree after step 3. Node labels: −, π_pid, π_opid, σ_ccity=“Duluth”, σ_acity=“New York”, σ_cid=ocid, σ_aid=oaid, ×, ×. Leaves: products, customers, orders, agents.]

Heuristic (cont’d)
4. Replace each cartesian product and adjacent select with a join.
    [Figure: query tree after step 4. Node labels: −, π_pid, π_opid, σ_ccity=“Duluth”, σ_acity=“New York”, ⋈ on cid = ocid, ⋈ on aid = oaid. Leaves: products, customers, orders, agents.]

Heuristic (cont’d)
5. Project out unnecessary attributes as soon as possible.
    [Figure: query tree after step 5. Node labels: −, π_pid, π_opid, σ_ccity=“Duluth”, σ_acity=“New York”, ⋈ on cid = ocid, ⋈ on aid = oaid, plus the added projections π_aid, π_oaid,ocid,opid and π_cid. Leaves: products, customers, orders, agents.]

Heuristic (cont’d)
6. Map subtrees to execution methods such as:
  - A single selection or projection
  - A selection followed by a projection
  - A join, union, or set difference with two operands. Each input can be preceded by selections and/or projections. The output can also be followed by a selection and/or projection.

Heuristic (cont’d)
    [Figure: query tree with the same nodes as after step 5. Node labels: −, π_pid, π_opid, σ_ccity=“Duluth”, σ_acity=“New York”, ⋈ on cid = ocid, ⋈ on aid = oaid, π_aid, π_oaid,ocid,opid, π_cid. Leaves: products, customers, orders, agents.]

Example
- Characteristics of DBMS
  - Available join methods: (1) nested loop; (2) sort-merge join
- Query
    SELECT emp.name, dept.dname, acnt.type
    FROM emp, dept, acnt
    WHERE emp.dno = dept.dno AND dept.ano = acnt.ano
      AND emp.age >= 50 AND acnt.balance >= 10000

Example (cont’d)
- π_name,dname,type(σ_{emp.dno=dept.dno ∧ dept.ano=acnt.ano ∧ emp.age>=50 ∧ acnt.balance>=10000}((emp × dept) × acnt))
- Now decide the order of the joins
  - acnt is the third relation
    - emp ⋈ dept
    - dept ⋈ emp
  - emp is the third relation
    - dept ⋈ acnt
    - acnt ⋈ dept
    [Figure: query tree with σ_emp.age>=50 over emp and σ_acnt.balance>=10000 over acnt, joined (⋈) with dept.]
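The four candidate orders follow from the rule that cross products are never formed: emp and acnt share no join condition, so they cannot be the first pair. A small sketch that enumerates the join orders under that rule is shown below; the join-graph encoding and function names are assumptions for illustration.

```python
from itertools import permutations

# Join conditions from the query: emp.dno = dept.dno and dept.ano = acnt.ano.
join_edges = {frozenset(("emp", "dept")), frozenset(("dept", "acnt"))}


def connected(joined, new):
    """True if `new` shares a join condition with a relation already joined."""
    return any(frozenset((rel, new)) in join_edges for rel in joined)


def left_deep_orders(relations):
    """All join orders in which no step is a cross product."""
    orders = []
    for perm in permutations(relations):
        if all(connected(perm[:i], perm[i]) for i in range(1, len(perm))):
            orders.append(perm)
    return orders


for order in left_deep_orders(["emp", "dept", "acnt"]):
    print(order)
```

The first two relations in each printed order form the initial join, and the last one is the "third relation".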

Relations
- emp (name, age, sal, dno)
  - Pages: 20,000
  - Number of tuples: 100,000
  - Indexes: dense clustered B+-tree on sal (3 levels deep)
- dept (dno, dname, floor, budget, mgr, ano)
  - Pages: 10
  - Number of tuples: 100
  - Indexes: dense clustered hash table on dno (avg bucket length = 1.2 pages)
- acnt (ano, type, balance, bno)
  - Pages: 100
  - Number of tuples: 1000
  - Indexes: dense clustered B+-tree on balance (3 levels deep)
- bank (bno, bname, address)
  - Pages: 20
  - Number of tuples: 200
  - Indexes: none

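Histograms
- emp.age (assume uniform distribution)
    [Table: range / frequency, seven ranges starting at 0 < x; the range bounds and frequencies are missing from this transcript.]
- acnt.balance
    [Table: range / frequency, six ranges starting at 0 < x. Each frequency is given as a count divided by the range width; the surviving widths are 100, 400, 4500 and 5000 for the first four ranges, the first quotient is 2, and the last, open-ended range has frequency 0.]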

Plans for accessing relations
- Plans for retrieving tuples from emp, dept and acnt.

EMP
    Method   Cost     Order   Result size                  Comment
    SCAN     20,000   none    12,000 tuples, 2,400 pages   Size reduced by selectivity of age >= 50

DEPT
    Method   Cost     Order   Result size                  Comment
    SCAN     10       none    100 tuples, 10 pages

Plans for accessing relations (cont’d)

ACNT
    Method               Cost          Order   Result size            Comment
    SCAN                 100           none    200 tuples, 20 pages   Size corrected for selectivity of balance >= 10000
    B+-tree on balance   3 + 20 = 23   none    200 tuples, 20 pages
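The result sizes and the index cost in these access plans can be reproduced with a little arithmetic. The sketch below assumes tuples are spread evenly over pages and reads the index cost 3 + 20 as one descent of the 3-level B+-tree plus the 20 pages holding the matching tuples of the clustered file; that reading is an assumption, as are the function names.

```python
def result_pages(total_pages, total_tuples, result_tuples):
    """Pages needed for result_tuples at the relation's tuples-per-page density."""
    # Integer ceiling of result_tuples * total_pages / total_tuples.
    return -(-(result_tuples * total_pages) // total_tuples)


def clustered_btree_cost(levels, matching_pages):
    """Descend the tree once, then read the matching pages sequentially."""
    return levels + matching_pages


# acnt: 100 pages, 1000 tuples, 200 tuples satisfy the balance predicate.
acnt_pages = result_pages(100, 1000, 200)
print(acnt_pages)                           # 20
print(clustered_btree_cost(3, acnt_pages))  # 23

# emp: 20,000 pages, 100,000 tuples, 12,000 tuples satisfy the age predicate.
print(result_pages(20_000, 100_000, 12_000))  # 2400
```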

Plans for joining two relations

EMP ⋈ DEPT
    Method                                Cost            Order   Result size                 Comment
    Nested loop (page oriented)           … * 10 = …      none    12000 tuples, 2400 pages    Assume that tuple size is the same as EMP tuples.
    Nested loop using hash table on dno   … * ( … ) = …   none    12000 tuples, 2400 pages

Plans for joining two relations (cont’d)

DEPT ⋈ EMP
    Method                        Cost        Order   Result size                 Comment
    Nested loop (page oriented)   … * … = …   none    12000 tuples, 2400 pages

Plans for joining two relations (cont’d)

DEPT ⋈ ACNT
    Method                        Cost           Order   Result size            Comment
    Nested loop (page oriented)   … * 100 = …    none    200 tuples, 20 pages

ACNT ⋈ DEPT
    Method                        Cost           Order   Result size            Comment
    Nested loop (page oriented)   … * 10 = 223   none    200 tuples, 20 pages
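One way to arrive at the surviving figure 223 for ACNT ⋈ DEPT is sketched below. It assumes the page-oriented nested loop costs the access to the outer operand plus one scan of the inner relation per outer result page, with ACNT accessed through the B+-tree on balance (cost 23, 20 result pages) and DEPT occupying 10 pages. This formula is an assumption that happens to match the slide's number, not a formula stated in the lecture.

```python
def page_nested_loop_cost(outer_access_cost, outer_result_pages, inner_pages):
    """Assumed cost model: fetch the outer operand, then scan the inner
    relation once for every page of the outer result."""
    return outer_access_cost + outer_result_pages * inner_pages


# ACNT via the B+-tree on balance (cost 23, 20 pages) joined with DEPT (10 pages).
print(page_nested_loop_cost(23, 20, 10))  # 223
```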

Plans for joining the third relation to the other two
- Think on your own …