Presentation is loading. Please wait.

Presentation is loading. Please wait.

COMP 451/651 Optimizing Performance

Similar presentations


Presentation on theme: "COMP 451/651 Optimizing Performance"— Presentation transcript:

1 COMP 451/651 Optimizing Performance
Query Execution Optimizing Performance Chapter 1 1

2 COMP 451/651 Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional processing to supply all of the missing details. In practice, an SQL query is translated internally into a relational algebra expression. One advantage of using relational algebra is that it makes alternative forms of a query easier to explore. The different algebraic expressions for a query are called logical query plans. We will focus first on the methods for executing the operations of the relational algebra. Chapter 1

3 Preview Parsing: read SQL, output relational algebra tree
COMP 451/651 Preview Parsing: read SQL, output relational algebra tree Query rewrite: Transform tree to a form, which is more efficient to evaluate Physical plan generation: select implementation for each operator in tree, and for passing results up the tree. In this chapter we will focus on the implementation for each operator. Chapter 1

4 σyear=1996 AND gender = “F” AND starName = name
COMP 451/651 A graphical picture A query plan is a decomposition of the original SQL into an expression tree. At the right, we have a typical relational query. This is a starting point for a query plan. Is it the best way? πtitle, birthdate σyear=1996 AND gender = “F” AND starName = name × SELECT title, birthdate FROM MoviewStar, StarsIN WHERE year=1996 AND gender=“F” AND starName=name MovieStar StarsIn 4 Chapter 1

5 COMP 451/651 A graphical picture A better plan would focus on reducing the cost of the expensive cross product operation So, a much better solution would use a true join. But we would want the join to be as small as possible We could make the joined relations smaller by doing the Select before the join, rather than after it. πtitle, birthdate starName = name MovieStar StarsIn σgender = “F” σyear=1996 5 Chapter 1

6 Relational Algebra recap
COMP 451/651 COMP 451/651 Relational Algebra recap RA union, intersection and difference correspond to UNION, INTERSECT, and EXCEPT in SQL Selection  corresponds to the WHERE-clause in SQL Projection  corresponds to SELECT-clause Product  corresponds to FROM-clause Join’s corresponds to JOIN, NATURAL JOIN, and OUTER JOIN in the SQL2 standard Duplicate elimination  corresponds to DISTINCT in SELECT-clause Grouping  corresponds to GROUP BY Sorting  corresponds to ORDER BY Chapter 1 Chapter 1 6

7 Relational algebra for real SQL
COMP 451/651 COMP 451/651 Relational algebra for real SQL Basic SELECT-FROM-WHERE queries correspond to (( ..  ..  ..)) in relational algebra However, as we discuss query plans, keep in mind the following fact: A relation in algebra is a set, while a relation in SQL is probably a bag In short, a bag allows duplicates. Not surprisingly, this can effect the cost of related operations. Chapter 1 Chapter 1 7

8 Bag union, intersection, and difference
COMP 451/651 Bag union, intersection, and difference Card(t,R) means the number of occurrences of tuple t in relation R Card(t, RS) = Card(t,R) + Card(t,S) Card(t,RS) = min{Card(t,R), Card(t,S)} Card(t,R–S) = max{Card(t,R)–Card(t,S), 0} Example: R= {A,B,B}, S = {C,A,B,C} R  S = {A,A,B,B,B,C,C} R  S = {A,B} R – S = {B} Chapter 1

9 COMP 451/651 Physical operators Physical query plans are built from physical operators. In most cases, the physical operators are direct implementations of the relational algebra operators. However, there are several other physical operators for various supporting tasks. Table-scan (the most basic operation we want to perform in a physical query plan) Index-scan (E.g. if we have an index on some relation R we can retrieve the blocks of R by using the index) Sort-scan (takes a relation R and a specification of the attributes on which the sort is to be made, and produces R in sorted order) Chapter 1

10 COMP 451/651 Computational Model When comparing algorithms for the same operations we will make the following assumption: We assume that the arguments of any operator are found on disk, but the result of the operator is left in main memory. This is because the cost of writing the output on the disk depends on the size of the result, not on the way the result was computed. In other words, it is the same for any computational alternative. Also, we can often pipeline the result to other operators when the result is constructed in main memory. So a final output phase may not even be required. Chapter 1

11 Cost parameters Recall that final output is not counted
COMP 451/651 COMP 451/651 Cost parameters In order to evaluate or estimate the cost of query resolution, we need to identify the relevant query parameters. Typical cost parameters include: R = the relation on disk M = number of main memory buffers available (1buffer = 1block) B(R) = number of blocks of R T(R) = number of tuples of R V(R, a) = number of distinct values in column a of R V(R, L) = number of different tuples in R (where L is a list of attributes or columns) Simple cost estimate: Basic scan: B(R) TPMMS: 3B(R) Recall that final output is not counted Chapter 1 Chapter 1 11

12 Iterators for Implementation of Physical Operators
COMP 451/651 Iterators for Implementation of Physical Operators This is a group of three functions that allow a consumer of the result of a physical operation to get the result one tuple at a time. An iterator consists of three parts: Open: Initializes data structures. Doesn’t return tuples GetNext: Returns next tuple & adjusts the data structures Close: Cleans up afterwards We assume these to be overloaded names of methods. Chapter 1

13 Iterator for table­scan operator
COMP 451/651 Iterator for table­scan operator Open(R) { b := the first block of R; t := the first first tuple of block b; Found := TRUE; } GetNext(R) { IF (t is past the last tuple on block b) { increment b to the next block; IF (there is no next block) { Found := FALSE; RETURN; ELSE /*b is a new block*/ t := first tuple on block b; oldt := t; /*Now we are ready to return t and increment*/ increment t to the next tuple of b; RETURN oldt; Close(R) {} Chapter 1

14 Iterator for Bag Union of R and S
COMP 451/651 Iterator for Bag Union of R and S Open(R,S) { R.open(); CurRel := R; } GetNext(R,S) { IF (CurRel = R) { t := R.GetNext(); IF(Found) /*R is not exhausted*/ RETURN t; ELSE /*R is exhausted*/ { S.Open(); CurRel := S; /*Here we read from S*/ RETURN S.GetNext(); /*If s is exhausted Found will be set to FALSE by S.GetNext */ Close(R,S) { R.Close(); S.Close() } Chapter 1

15 Iterator for sort-scan
COMP 451/651 Iterator for sort-scan In an iterator for sort-scan Open has to do all of 2PMMS, except the merging GetNext outputs the next tuple from the merging phase Chapter 1

16 Algorithms for implementing RA-operators
COMP 451/651 Algorithms for implementing RA-operators Classification of algorithms Sorting based methods Hash based methods Index based methods Degree of difficultness of algorithms One pass (when one relation can fit into main memory) Two pass (when no relation can fit in main memory, but again the relations are not very extremely large) Multi pass (when the relations are very extremely large) Classification of operators Tuple-at-a-time, unary operations(s, p) Full-relation, unary operations (d, g) Full-relation, binary operations (union, join,…) Chapter 1

17 One pass, tuple-at-a-time
COMP 451/651 One pass, tuple-at-a-time Selection and projection Cost = B(R) or T(R) (if the relation is not clustered) Space requirement: M  1 block Principle: Read one block (or one tuple if the relation is not clustered) at a time Filter in or out the tuples of this block. Chapter 1

18 One pass, unary full-relation operations
COMP 451/651 One pass, unary full-relation operations Duplicate elimination: for each tuple decide: seen before: ignore new: output Principle: It is the first time we have seen this tuple, in which case we copy it to the output. We have seen the tuple before,in which case we must not output this tuple. We need a Main Memory hash-table to be efficient. Requirement:  Chapter 1

19 One pass, unary full-relation operations
COMP 451/651 One pass, unary full-relation operations Grouping: Accumulate the information on groups in main memory. Need to keep one entry for each value of the grouping attributes, through a main memory search structure (hash table). Then, we need for each group to keep an aggregated value (or values if the query asks for more than one aggregation). For MIN/MAX we keep the min or max value seen so far for the group. For COUNT aggregation we keep a counter which is incremented each time we encounter a tuple belonging to the group. For SUM, we add the value if the tuple belongs to the group. For AVG? MM requirement. Typically, a (group) tuple will be smaller than a tuple of the input relation, Typically, the group number will be smaller than the number of tuples in the input relation. This is their number: How you would do an iterator for grouping? Chapter 1

20 One pass, binary operators
COMP 451/651 One pass, binary operators Requirement: min(B(R),B(S)) ≤ M Exception: bag union Cost: B(R) + B(S) Assume R is larger than S. How to perform the operations below: Set union, set intersection, set difference Bag intersection, bag difference Cartesian product, natural join All these operators require reading the smaller of the relations into main memory using there a search scheme (e.g. main memory hash table) for easy search and insertion. Chapter 1

21 Set Union Let R and S be sets.
COMP 451/651 Set Union Let R and S be sets. We read S into M-1 buffers of main memory. All these tuples are also copied to the output. We then read each block of R into the Mth buffer, one at a time. For each tuple t of R we see if t is in S, and if not, we copy t to output. Chapter 1

22 Set Intersection Let R and S be sets or bags. The result will be set.
COMP 451/651 Set Intersection Let R and S be sets or bags. The result will be set. We read S into M-1 buffers of main memory. We then read each block of R into the M-th buffer, one at a time. For each tuple t of R we see if t is in S, and if so, we copy t to output. At the same time we delete t from S in Main Memory. Chapter 1

23 Set Difference Let R and S be sets.
COMP 451/651 Set Difference Let R and S be sets. Since difference is not a commutative operator, we must distinguish between R-S and S-R assuming that S is the smaller relation. Read S into M-1 buffers of main memory. Then read each block of R into the Mth buffer, one at a time. To compute R-S: for each tuple t of R we see if t is not in S, and if so, we copy t to output. To compute S-R: for each tuple t of R we see if t is is in S, we delete t from S in such a case. At the end we output those tuples of S that remain. Chapter 1

24 Bag Intersection Let R and S be bags.
COMP 451/651 Bag Intersection Let R and S be bags. Read S into M-1 buffers of main memory. Also, associate with each tuple a count, which initially measures the number of times the tuple occurs in S. Then read each block of R into the M-th buffer, one at a time. For each tuple t of R we see if t is in S. If not we ignore it. Otherwise, we check to see if it appears in S, and if the counter is more than zero we output t and decrement the counter. Chapter 1

25 Bag Difference We read S into M-1 buffers of main memory.
COMP 451/651 Bag Difference We read S into M-1 buffers of main memory. Also, we associate with each tuple a count, which initially measures the number of times the tuple occur in S. We then read each block of R into the M-th buffer, one at a time. To compute S-R: for each tuple t of R we see if t is is in S, we decrement its counter. At the end we output those tuples of S that remain with counter positive. To compute R-S: we may think of the counter c for tuple t as having c reasons to not output t. Now, when we process a tuple of R we check to see if that tuple appears in S. If not we output t. Otherwise, we check to see the counter c of t. If it is 0 we output t. If not, we don’t output t, and we decrement c. Chapter 1

26 COMP 451/651 Product We read S into M-1 buffers of main memory. No special structure is needed. We then read each block of R into the M-th buffer, one at a time. And combine each tuple with all the tuples of S. Chapter 1

27 COMP 451/651 Natural Join We read S into M-1 buffers of main memory and build a search structure where the search key is the shared attributes of R and S. We then read each block of R into the M-th buffer, one at a time. For each tuple t of R we see if t is in S, and if so, we copy t to output. Chapter 1


Download ppt "COMP 451/651 Optimizing Performance"

Similar presentations


Ads by Google