Spring 2003 ECE569 Lecture 06.1: ECE 569 Database System Engineering, Spring 2003. Topic VIII: Query Execution and Optimization. Yanyong Zhang, www.ece.rutgers.edu/~yyzhang


1 Spring 2003 ECE569 Lecture 06.1
ECE 569 Database System Engineering, Spring 2003
Topic VIII: Query Execution and Optimization
Yanyong Zhang, www.ece.rutgers.edu/~yyzhang
Course URL: www.ece.rutgers.edu/~yyzhang/spring03

2 Select Operation
- Simple condition (C = A θ V)
  - Selectivity of condition C for relation R = |{t ∈ R | C(t)}| / |R|
    - The number of tuples satisfying the condition divided by the total number of tuples in R
    - If there are i distinct values of A with uniform distribution, the average selectivity of an equality condition is 1/i.
  - Linear search
    - Retrieve every tuple in the relation and test the predicate
    - Cost = N_Pages
  - Equality predicate with index on A
    - Locate all the tuples with search key V using the index
    - Cost = ??
  - Inequality with B+-tree on A
    - Locate the first tuple t with search key V using the index
    - Retrieve tuples in ascending order starting with t if θ is > or ≥; otherwise retrieve tuples in descending order
    - Cost = ?? (depends on selectivity)
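The selectivity definition above can be checked on a toy relation. This is a minimal sketch: the relation `R`, its attribute name `a`, and the helper `selectivity` are illustrative assumptions, not part of the lecture.

```python
def selectivity(relation, predicate):
    """Fraction of tuples in `relation` that satisfy `predicate`."""
    if not relation:
        return 0.0
    matching = sum(1 for t in relation if predicate(t))
    return matching / len(relation)


# Hypothetical relation: 12 tuples, attribute "a" uniform over i = 4 distinct values.
R = [{"a": v} for v in (1, 2, 3, 4) for _ in range(3)]

eq_sel = selectivity(R, lambda t: t["a"] == 2)   # equality condition: 1/i
ge_sel = selectivity(R, lambda t: t["a"] >= 3)   # inequality condition
```

With four equally frequent values, the equality condition selects one quarter of the tuples, which matches the 1/i rule.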

3 Select operation (cont'd)
- Conjunctive conditions (C = C_1 ∧ C_2 ∧ ... ∧ C_k)
  - Use one of the access methods above to retrieve tuples satisfying C_i. For each tuple, test the remaining conditions.
    - Choose the C_i with the lowest selectivity to reduce cost
  - If secondary indices containing tuple pointers exist for all or some of the attributes, retrieve all pointers that satisfy the individual conditions. Intersect the pointers and retrieve the tuples.
- Disjunctive conditions (C_1 ∨ C_2 ∨ ... ∨ C_k)
  - If there is an access path for every condition, select the records for each condition and take the union to eliminate duplicates.
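The pointer-intersection idea for conjunctive conditions, and the union for disjunctive ones, might look like the sketch below. It assumes equality conditions only and uses list positions as tuple pointers; the sample `agents` rows and all function names are hypothetical.

```python
from collections import defaultdict


def build_secondary_index(relation, attr):
    """Map each value of `attr` to the set of tuple pointers (list positions here)."""
    index = defaultdict(set)
    for rid, t in enumerate(relation):
        index[t[attr]].add(rid)
    return index


def conjunctive_select(relation, indexes, conditions):
    """AND of equality conditions (at least one): intersect pointer sets, then fetch."""
    pointer_sets = [indexes[attr].get(value, set()) for attr, value in conditions.items()]
    rids = set.intersection(*pointer_sets)
    return [relation[rid] for rid in sorted(rids)]


def disjunctive_select(relation, indexes, conditions):
    """OR of equality conditions: the set union removes duplicate pointers."""
    pointer_sets = [indexes[attr].get(value, set()) for attr, value in conditions.items()]
    rids = set().union(*pointer_sets)
    return [relation[rid] for rid in sorted(rids)]


# Hypothetical sample data using the agents attributes aid, acity, percent.
agents = [
    {"aid": "a01", "acity": "New York", "percent": 6},
    {"aid": "a02", "acity": "Newark", "percent": 6},
    {"aid": "a03", "acity": "New York", "percent": 7},
    {"aid": "a04", "acity": "Duluth", "percent": 5},
]
indexes = {attr: build_secondary_index(agents, attr) for attr in ("acity", "percent")}

both = conjunctive_select(agents, indexes, {"acity": "New York", "percent": 6})
either = disjunctive_select(agents, indexes, {"acity": "New York", "percent": 6})
```

Only the pointers that survive the intersection or union are dereferenced, so tuples that fail the combined condition are never fetched.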

4 Join Operation
- T = R ⋈_{A=B} S
- Nested loop
  - Algorithm
      while records remain in R do
        fetch next record r from R
        while records remain in S do
          fetch next record s from S
          if (r[A] == s[B]) then
            insert t into T where t[R] = r and t[S] = s
        end while
      end while
  - Analysis
    - r_R: # of records in R
    - b_R: # of blocks in R
    - Cost = r_R (b_S + 1)
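As a rough illustration of the nested-loop algorithm and its cost formula, here is a small in-memory sketch. Relations are plain Python lists of dicts, and the function names are assumptions made for this example.

```python
def nested_loop_join(R, S, a, b):
    """Record-at-a-time nested loop: rescan all of S for every record of R."""
    T = []
    for r in R:
        for s in S:
            if r[a] == s[b]:
                T.append({"R": r, "S": s})   # t[R] = r and t[S] = s
    return T


def nested_loop_cost(r_R, b_S):
    """Cost = r_R * (b_S + 1), the formula given on the slide."""
    return r_R * (b_S + 1)


R = [{"A": 1}, {"A": 2}]
S = [{"B": 2}, {"B": 2}, {"B": 3}]
T = nested_loop_join(R, S, "A", "B")
cost = nested_loop_cost(r_R=100, b_S=10)
```

The cost grows with the number of records of R, not its number of blocks, which is why the buffered variant is usually preferred.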

5 Join Operation (cont'd)
- T = R ⋈_{A=B} S
- Nested loop with multiple buffers
  - Use one buffer to sequence through the blocks of S
  - Use n_b − 2 buffers for R
      while records remain in R do
        read the next n_b − 2 blocks of R into the buffers
        while records remain in S do
          fetch next record s from S
          for every record r of R in a buffer do
            if (r[A] == s[B]) then insert t into T
          end for
        end while
      end while
    - Every block of R is read only once
    - Every block of S is read ⌈b_R / (n_b − 2)⌉ times
    - Cost = b_R + ⌈b_R / (n_b − 2)⌉ · b_S
    - The outer loop should scan the smaller relation
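The buffered variant can be simulated by counting block reads. In this sketch each relation is a list of blocks and each block is a list of records. The cost function assumes S is scanned in full once per buffer-load of R, that is ⌈b_R / (n_b − 2)⌉ passes of b_S blocks each. All names are illustrative.

```python
import math


def block_nested_loop_join(R_blocks, S_blocks, a, b, n_b):
    """Join with n_b - 2 buffers for R and one buffer for S; also count S block reads."""
    if n_b < 3:
        raise ValueError("need at least 3 buffers")
    chunk = n_b - 2
    result, s_block_reads = [], 0
    for start in range(0, len(R_blocks), chunk):
        # Load the next n_b - 2 blocks of R into memory.
        buffered = [r for block in R_blocks[start:start + chunk] for r in block]
        for s_block in S_blocks:          # one full pass over S per buffer-load
            s_block_reads += 1
            for s in s_block:
                for r in buffered:
                    if r[a] == s[b]:
                        result.append((r, s))
    return result, s_block_reads


def block_nested_loop_cost(b_R, b_S, n_b):
    """b_R reads of R plus one full scan of S per buffer-load of R."""
    return b_R + math.ceil(b_R / (n_b - 2)) * b_S


R_blocks = [[{"A": i}] for i in range(5)]                     # b_R = 5
S_blocks = [[{"B": 0}, {"B": 1}], [{"B": 4}], [{"B": 9}]]     # b_S = 3
pairs, s_reads = block_nested_loop_join(R_blocks, S_blocks, "A", "B", n_b=4)
```

Because the number of passes over S depends on b_R, putting the smaller relation in the outer loop reduces the total.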

6 Join operation (cont'd)
- Index method
  - Requires an index (or hash key) on one of the join attributes. (Assume there is an index on B of S.)
  - Algorithm
      while records remain in R do
        fetch next record r from R
        use the index to retrieve all records s in S with s[B] == r[A]
        for each record s retrieved do
          insert t into T
        end for
      end while
  - Analysis
    - x_B: average # of block accesses to retrieve a record using the access path for attribute B
    - Cost = b_R + r_R · x_B + b_T
    - These disk accesses are random index lookups, so each one may be slower than the sequential block reads of a nested-loop join.
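A sketch of the index method follows, with an in-memory dictionary standing in for the index on B of S. A real system would use a B+-tree or hash file on disk; the names here are hypothetical.

```python
from collections import defaultdict


def index_join(R, S, a, b):
    """Probe an index on S.b once for every record of R."""
    index = defaultdict(list)          # stand-in for the index on attribute b of S
    for s in S:
        index[s[b]].append(s)
    T = []
    for r in R:
        for s in index.get(r[a], []):  # all records of S whose b equals r[a]
            T.append((r, s))
    return T


def index_join_cost(b_R, r_R, x_B, b_T):
    """Cost = b_R + r_R * x_B + b_T, as in the analysis on the slide."""
    return b_R + r_R * x_B + b_T


R = [{"A": 1}, {"A": 2}, {"A": 5}]
S = [{"B": 2}, {"B": 2}, {"B": 5}]
T = index_join(R, S, "A", "B")
cost = index_join_cost(b_R=100, r_R=1000, x_B=2, b_T=50)
```

The r_R · x_B term dominates when R has many records, which is when a nested-loop or sort-merge plan can be cheaper.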

7 Join operation (cont'd)
- Sort-merge join
  - Requires that the records in R be ordered by their values in A and that S be ordered according to B.
  - The algorithm below assumes A and B are candidate keys.
      fetch next record r from R
      fetch next record s from S
      while (r ≠ NULL and s ≠ NULL) do
        if (r[A] > s[B]) then
          fetch next record s from S
        else if (r[A] < s[B]) then
          fetch next record r from R
        else /* r[A] == s[B] */
          insert t into T
          fetch next record r from R
          fetch next record s from S
      end while
  - Analysis
    - The records of each file are scanned only once
    - Cost = b_R + b_S + b_T
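The merge phase can be written directly from the pseudocode. This sketch keeps the same assumptions: both inputs are already sorted, and the join attributes are candidate keys, so no value repeats within a relation.

```python
def sort_merge_join(R, S, a, b):
    """Merge two relations sorted on a and b; assumes a and b are candidate keys."""
    T = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][a] > S[j][b]:
            j += 1                      # advance S
        elif R[i][a] < S[j][b]:
            i += 1                      # advance R
        else:
            T.append((R[i], S[j]))      # keys match: emit and advance both
            i += 1
            j += 1
    return T


R = [{"A": 1}, {"A": 3}, {"A": 5}]
S = [{"B": 2}, {"B": 3}, {"B": 5}, {"B": 7}]
T = sort_merge_join(R, S, "A", "B")
```

If the join attributes were not keys, the equal case would need to pair every record of one run with every record of the matching run instead of advancing both cursors.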

8 Projection operations
- Projection: π_P(R)
  - P includes a candidate key for R
    - No need to check for duplicates
  - Otherwise, one of the following can be used to eliminate duplicates
    - If the result is hashed, check for duplicates as tuples are inserted
    - Sort the resulting relation and eliminate duplicates, which are now adjacent.
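Both duplicate-elimination strategies are short to sketch. A Python set plays the role of the hash file, and the sample `customers` rows are made up for illustration.

```python
def project_hash(relation, attrs):
    """Project onto attrs, rejecting duplicates as each tuple is inserted."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[x] for x in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def project_sort(relation, attrs):
    """Project onto attrs, sort, then drop duplicates that are now adjacent."""
    rows = sorted(tuple(t[x] for x in attrs) for t in relation)
    out = []
    for row in rows:
        if not out or out[-1] != row:
            out.append(row)
    return out


customers = [
    {"cid": "c1", "ccity": "Duluth"},
    {"cid": "c2", "ccity": "Dallas"},
    {"cid": "c3", "ccity": "Duluth"},
]
by_hash = project_hash(customers, ["ccity"])
by_sort = project_sort(customers, ["ccity"])
with_key = project_hash(customers, ["cid", "ccity"])
```

When the projection list contains a candidate key such as `cid`, no two projected rows can be equal, so the check could be skipped.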

9 Set operations
- R ∪ S
  - Hash the records of R and S to the same file. Do not insert duplicates.
  - Alternatively, concatenate the files, sort, and remove adjacent duplicates.
- R ∩ S
  - Scan the smaller file and attempt to locate each record in the larger file. (If found, add the tuple to the result.)
- R − S
  - Copy the records from R to the result
  - Hash the records of S to the result. If the tuple is found, delete it.
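The three set operations can be sketched with Python sets standing in for hash files. Records are represented as tuples, and using a hash lookup to locate records in the larger file is an assumption, since the slide does not say how they are located.

```python
def union(R, S):
    """Hash R and S into the same structure; duplicates are not inserted."""
    result = set()
    for t in R:
        result.add(t)
    for t in S:
        result.add(t)
    return result


def intersection(R, S):
    """Scan the smaller input and look each record up in the larger one."""
    smaller, larger = (R, S) if len(R) <= len(S) else (S, R)
    lookup = set(larger)    # assumed access path into the larger file
    return {t for t in smaller if t in lookup}


def difference(R, S):
    """Copy R to the result, then delete every record of S that is found there."""
    result = set(R)
    for t in S:
        result.discard(t)
    return result


R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "b"), (4, "d")]
u, n, d = union(R, S), intersection(R, S), difference(R, S)
```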

10 Query Optimization rules of thumb
- R1: Selections and projections are processed on the fly and almost never generate intermediate relations. Selections are processed as relations are accessed for the first time. Projections are processed as the results of other operators are generated.
- R2: Cross products are never formed unless the query itself asks for them. Relations are always combined through the joins in the query.
- R3: The inner operand of each join is a database relation, never an intermediate result.

11 Heuristic Optimization of Query Trees
- Consider the following schema
    customers (cid, cname, ccity, discnt)
    products (pid, pname, pcity, pquantity, price)
    agents (aid, aname, acity, percent)
    orders (ordno, month, ocid, oaid, opid, quantity, oprice)
- Query R:
    (select pid from products)
    except
    (select opid
     from customers, orders, agents
     where ccity = "Duluth" and acity = "New York"
       and cid = ocid and aid = oaid)

12 Heuristic (cont'd)
- Translate the query into algebra
    R: π_pid(products) − π_opid(σ_{ccity="Duluth" ∧ acity="New York" ∧ cid=ocid ∧ aid=oaid}((customers × orders) × agents))
- Query tree
    −
    ├─ π_pid
    │   └─ products
    └─ π_opid
        └─ σ_{ccity="Duluth" ∧ acity="New York" ∧ cid=ocid ∧ aid=oaid}
            └─ ×
                ├─ ×
                │   ├─ customers
                │   └─ orders
                └─ agents

13 Heuristic (cont'd)
1. Replace σ_{F ∧ F'}(R) with σ_F(σ_{F'}(R)) wherever possible (this allows flexibility in scheduling selects).
[Query tree figure: the single selection is split into a cascade of four selections, σ_{ccity="Duluth"}, σ_{acity="New York"}, σ_{cid=ocid} and σ_{aid=oaid}, above (customers × orders) × agents; the other branch of − is π_pid(products).]

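14 Heuristic (cont'd)
2. Move select operations as close to the leaves as possible.
[Query tree figure: σ_{ccity="Duluth"} now sits directly above customers and σ_{acity="New York"} directly above agents; σ_{cid=ocid} sits above the product with orders and σ_{aid=oaid} above the product with agents; π_opid and π_pid(products) are unchanged.]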
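A toy comparison shows why pushing selections toward the leaves pays off. It evaluates only the subquery over customers, orders and agents. The sample rows are invented, and only the attributes the query needs are included.

```python
from itertools import product

# Hypothetical sample rows (attribute names follow the schema of the example).
customers = [{"cid": "c1", "ccity": "Duluth"}, {"cid": "c2", "ccity": "Dallas"}]
agents = [{"aid": "a1", "acity": "New York"}, {"aid": "a2", "acity": "Newark"}]
orders = [
    {"ordno": 1, "ocid": "c1", "oaid": "a1", "opid": "p1"},
    {"ordno": 2, "ocid": "c1", "oaid": "a2", "opid": "p2"},
    {"ordno": 3, "ocid": "c2", "oaid": "a1", "opid": "p3"},
]


def naive_plan():
    """Form the full product first, then apply the whole conjunctive selection."""
    triples = list(product(customers, orders, agents))
    kept = [
        (c, o, a) for c, o, a in triples
        if c["ccity"] == "Duluth" and a["acity"] == "New York"
        and c["cid"] == o["ocid"] and a["aid"] == o["oaid"]
    ]
    return {o["opid"] for _, o, _ in kept}, len(triples)


def pushed_down_plan():
    """Apply each selection as early as its attributes are available."""
    duluth = [c for c in customers if c["ccity"] == "Duluth"]
    new_york = [a for a in agents if a["acity"] == "New York"]
    pairs = list(product(duluth, orders))
    matched = [(c, o) for c, o in pairs if c["cid"] == o["ocid"]]
    triples = list(product(matched, new_york))
    kept = [(c, o, a) for (c, o), a in triples if a["aid"] == o["oaid"]]
    return {o["opid"] for _, o, _ in kept}, max(len(pairs), len(triples))


naive_result, naive_largest = naive_plan()
pushed_result, pushed_largest = pushed_down_plan()
```

Both plans return the same set of `opid` values, but the largest intermediate result shrinks from 12 rows to 3 on this data.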

15 Heuristic (cont'd)
3. Rearrange the tree so that the most restrictive select executes first. The most restrictive select produces the smallest result, that is, it is the one with the smallest selectivity. Assume the most restrictive select is σ_{acity="New York"}.
[Query tree figure: the tree rearranged around σ_{acity="New York"}; nodes −, π_pid, π_opid, σ_{ccity="Duluth"}, σ_{acity="New York"}, σ_{cid=ocid}, σ_{aid=oaid}, ×, ×; leaves products, customers, orders, agents.]

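16 Heuristic (cont'd)
4. Replace each Cartesian product and adjacent select with a join.
[Query tree figure: the two products and the selections σ_{cid=ocid} and σ_{aid=oaid} are replaced by the joins ⋈_{cid=ocid} and ⋈_{aid=oaid}; remaining nodes −, π_pid, π_opid, σ_{ccity="Duluth"}, σ_{acity="New York"}; leaves products, customers, orders, agents.]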

17 Heuristic (cont'd)
5. Project out unnecessary attributes as soon as possible.
[Query tree figure: the tree from step 4 with added projections π_aid (agents branch), π_{oaid,ocid,opid} (orders branch) and π_cid (customers branch); other nodes −, π_pid, π_opid, σ_{ccity="Duluth"}, σ_{acity="New York"}, ⋈_{cid=ocid}, ⋈_{aid=oaid}.]

18 Heuristic (cont'd)
6. Map subtrees to execution methods such as:
  - A single selection or projection
  - A selection followed by a projection
  - A join, union, or set difference with two operands. Each input can be preceded by selections and/or projections. The output can also be followed by a selection and/or projection.

19 Heuristic (cont'd)
[Query tree figure with the same nodes as in step 5: −, π_pid, π_opid, π_aid, π_{oaid,ocid,opid}, π_cid, σ_{ccity="Duluth"}, σ_{acity="New York"}, ⋈_{cid=ocid}, ⋈_{aid=oaid}; leaves products, customers, orders, agents.]

20 Example
- Characteristics of the DBMS
  - Available join methods: (1) nested loop; (2) sort-merge join
- Query
    SELECT emp.name, dept.dname, acnt.type
    FROM emp, dept, acnt
    WHERE emp.dno = dept.dno
      AND dept.ano = acnt.ano
      AND emp.age >= 50
      AND acnt.balance >= 10000

21 Example (cont'd)
- π_{name,dname,type}(σ_{emp.dno=dept.dno ∧ dept.ano=acnt.ano ∧ emp.age ≥ 50 ∧ acnt.balance ≥ 10000}((emp × dept) × acnt))
- Now decide the order of the joins
  - acnt is the third relation
    - emp ⋈ dept, dept ⋈ emp
  - emp is the third relation
    - dept ⋈ acnt, acnt ⋈ dept
[Query tree figure: σ_{emp.age >= 50}(emp) and σ_{acnt.balance >= 10000}(acnt) joined (⋈) with dept.]

22 Relations
- emp (name, age, sal, dno)
  - Pages: 20,000
  - Number of tuples: 100,000
  - Indexes: dense clustered B+-tree on sal (3 levels deep)
- dept (dno, dname, floor, budget, mgr, ano)
  - Pages: 10
  - Number of tuples: 100
  - Indexes: dense clustered hash table on dno (avg bucket length = 1.2 pages)
- acnt (ano, type, balance, bno)
  - Pages: 100
  - Number of tuples: 1000
  - Indexes: dense clustered B+-tree on balance (3 levels deep)
- bank (bno, bname, address)
  - Pages: 20
  - Number of tuples: 200
  - Indexes: none

23 Histograms
- emp.age (assume uniform distribution within each range)
    range        | frequency
    0 < x ≤ 10   | 0
    10 < x ≤ 20  | 500
    20 < x ≤ 30  | 2500
    30 < x ≤ 40  | 4000
    40 < x ≤ 50  | 2000
    50 < x ≤ 60  | 800
    60 < x ≤ 70  | 200
- acnt.balance
    range              | frequency | frequency / range width
    0 < x ≤ 100        | 200       | 200 / 100 = 2
    100 < x ≤ 500      | 200       | 200 / 400 = 0.5
    500 < x ≤ 5000     | 200       | 200 / 4500 = 0.044
    5000 < x ≤ 10000   | 200       | 200 / 5000 = 0.04
    10000 < x ≤ 50000  | 200       | 200 / 40000 = 0.005
    50000 < x < ∞      | 0         |
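One way to turn the age histogram into a selectivity estimate for age >= 50 is sketched below. It assumes integer ages spread uniformly inside each range, so the boundary value 50 accounts for one tenth of the 40 < x ≤ 50 range. That reading reproduces the 12,000-tuple and 2,400-page figures used in the access plans for emp, but the estimator itself is an assumption, not something stated on the slides.

```python
# (low, high, frequency) means low < x <= high, taken from the emp.age histogram.
AGE_HISTOGRAM = [
    (0, 10, 0), (10, 20, 500), (20, 30, 2500), (30, 40, 4000),
    (40, 50, 2000), (50, 60, 800), (60, 70, 200),
]


def selectivity_ge(histogram, threshold):
    """Estimate the fraction of tuples with value >= threshold (integer values)."""
    total = sum(count for _, _, count in histogram)
    matching = 0.0
    for low, high, count in histogram:
        width = high - low                      # integer values low+1 .. high
        first = max(threshold, low + 1)         # smallest qualifying value in range
        qualifying = max(0, high - first + 1)   # how many values qualify
        matching += count * qualifying / width
    return matching / total


sel = selectivity_ge(AGE_HISTOGRAM, 50)
est_tuples = round(sel * 100_000)   # emp has 100,000 tuples
est_pages = round(sel * 20_000)     # emp occupies 20,000 pages
```

The same function applied to other thresholds gives the partial-range estimates an optimizer would need for different predicates.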

24 Plans for accessing relation
- Plans for retrieving tuples from emp, dept and acnt.
  EMP
    Method | Cost   | Order | Result size                | Comment
    SCAN   | 20,000 | none  | 12,000 tuples, 2,400 pages | Size reduced by selectivity of age >= 50
  DEPT
    Method | Cost | Order | Result size          | Comment
    SCAN   | 10   | none  | 100 tuples, 10 pages |

25 Plans for accessing relation (cont'd)
  ACNT
    Method             | Cost        | Order | Result size          | Comment
    SCAN               | 100         | none  | 200 tuples, 20 pages | Size corrected for selectivity of balance >= 10000
    B+-tree on balance | 3 + 20 = 23 | none  | 200 tuples, 20 pages |

26 Plans for joining two relations
  EMP ⋈ DEPT
    Method                              | Cost                                 | Order | Result size                | Comment
    Nested loop (page oriented)         | 20000 + 2400 * 10 = 44000            | none  | 12,000 tuples, 2,400 pages | Assume that result tuples are the same size as EMP tuples.
    Nested loop using hash table on dno | 20000 + 12000 * (1 + 1.2 + 1) = 58400 | none  | 12,000 tuples, 2,400 pages |

27 Plans for joining two relations (cont'd)
  DEPT ⋈ EMP
    Method                      | Cost                     | Order | Result size                | Comment
    Nested loop (page oriented) | 10 + 10 * 20000 = 200010 | none  | 12,000 tuples, 2,400 pages |

28 Plans for joining two relations (cont'd)
  DEPT ⋈ ACNT
    Method                      | Cost                 | Order | Result size          | Comment
    Nested loop (page oriented) | 10 + 10 * 100 = 1010 | none  | 200 tuples, 20 pages |
  ACNT ⋈ DEPT
    Method                      | Cost               | Order | Result size          | Comment
    Nested loop (page oriented) | 23 + 20 * 10 = 223 | none  | 200 tuples, 20 pages |
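The page-oriented nested-loop costs in these plan tables all follow one pattern: the cost of producing the outer input, plus one scan of the inner relation per outer page. The helper below is a hypothetical restatement of that arithmetic using the page counts from the example. For DEPT as the outer relation and ACNT as the inner one, 10 + 10 × 100 evaluates to 1,010.

```python
def page_nested_loop_cost(outer_access_cost, outer_pages, inner_pages):
    """Cost of reading the outer input plus one inner scan per outer page."""
    return outer_access_cost + outer_pages * inner_pages


plans = {
    # EMP scan costs 20,000 and leaves 2,400 pages after the age selection.
    "EMP join DEPT": page_nested_loop_cost(20_000, 2_400, 10),
    "DEPT join EMP": page_nested_loop_cost(10, 10, 20_000),
    "DEPT join ACNT": page_nested_loop_cost(10, 10, 100),
    # ACNT is read through the B+-tree on balance (cost 23), leaving 20 pages.
    "ACNT join DEPT": page_nested_loop_cost(23, 20, 10),
}

# Index nested loop for EMP join DEPT: each of the 12,000 qualifying EMP tuples
# probes the hash table on dno at 1 + 1.2 + 1 page accesses per probe.
hash_probe_plan = 20_000 + 12_000 * (1 + 1.2 + 1)

cheapest = min(plans, key=plans.get)
```

On these numbers the cheapest two-relation plan starts from ACNT, because the selection on balance shrinks the outer input before DEPT is scanned.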

29 Plans for joining the third relation to the other two
- Think on your own …

