Download presentation
Presentation is loading. Please wait.
Published byDella Hopkins Modified over 9 years ago
1
Copyright © Curt Hill 2003-2008 Query Evaluation Translating a query into action
2
Copyright © Curt Hill 2003-2008 Why? Relational algebra is the means for accessing the tables –Relational calculus must be translated into algebra before it is executed Several relational operators may be implemented in various ways None of these is always best
3
Copyright © Curt Hill 2003-2008 What factors are involved? The organization of the file –Existing sort order The number and types of indexes The size of the tables –Both the cardinality and the number of pages Memory space available in buffer pool The source of much of this information is the system catalogsystem catalog
4
Copyright © Curt Hill 2003-2008 What is really needed? The Join is the most expensive common operator –At its heart is a cartesian product –This needs M by N accesses This is where most but not all optimizations occur Most of what will be discussed applies more to a join evaluation than most other operators
5
Copyright © Curt Hill 2003-2008 The M by N problem Want Cartesian Product on two tables, A and B –A contains M pages and B N Read each page in A Read N pages of B to connect M N accesses We desperately want to avoid this 1 2 1 3 M...... 2 N...... 3 A B
6
Copyright © Curt Hill 2003-2008 File Techniques Indexing –Use the index to only examine the involved tuples –If the key is the only field of interest the index may suffice without accessing the data Iteration –Examine all the tuples sequentially Partitioning –Group the tuples by sort key to avoid the M by N problems –Sorting and hashing do this
7
Copyright © Curt Hill 2003-2008 Access Paths A way of retrieving one or more tuples from a relation Typically two ways –Scan entire file –Use an index to obtain record directly Every relational operator uses one or two tables so this is important
8
Copyright © Curt Hill 2003-2008 Conjunctive Normal Form A series of comparisons with ANDs connecting them All comparisons are of the form: Attr Op Value –Attr is field name –Op is a comparison such as =, >, etc –Value is a constant –Comparison is called a conjunct CNF has nothing to do with 1NF, 2NF, 3NF, BCNF among others
9
Copyright © Curt Hill 2003-2008 CNF Examples Courses table –Number = 160 AND Dept = ‘CS’ Grades table –Dept = ‘CS’ AND Score > 89 Join –S.Naid = G.Naid –First of these becomes a constant when iterating through S table
10
Copyright © Curt Hill 2003-2008 Using a Hash Index Conjunct must use an equality Whole key must be –Indexed by hash index –Used in equality conjuncts Example from Grades table: –Naid = 2013 AND Dept = ‘CS’ AND Number = 160
11
Copyright © Curt Hill 2003-2008 Using a B+Tree index Any comparison operator The B+Tree may only prefix a key Example –Naid = 2013 AND Dept = ‘CS’ AND Number = 160 –Only Dept and Number are actually indexed
12
Copyright © Curt Hill 2003-2008 Indexing CNF The form may include conjuncts that are not indexed Those conjuncts that are indexed are called the primary conjuncts A form may have two separate sets indexed by different indexes –Either can be retrieved and then the others checked against them
13
Copyright © Curt Hill 2003-2008 Selectivity of Access Paths Number of pages fetched to obtain all records –This includes index and data pages The most selective path accesses the fewest pages –This minimizes the retrieval costs –Not always predictable
14
Copyright © Curt Hill 2003-2008 Reduction Each conjunct reduces the number of tuples that could be included in a query –This is the reduction factor –Each is a probability The expected probability of the conjunction of unrelated probabilities is their product It is often the case that the conjuncts are not unrelated, but this is still a good estimate
15
Copyright © Curt Hill 2003-2008 Selection Use an index –If all the fields in selection criteria are indexed –If some of the selection criteria fields are indexed retrieve these and then reduce them with the other criteria Sequential scan –The criteria is tested on each tuple –Always possible If both are available –Use the cost estimators to determine which is most selective
16
Copyright © Curt Hill 2003-2008 Selection Example Suppose: –File of 400 pages –Criteria X > 300 AND Y = 123 –BTree clustered index on X Use the index and then scan sequentially from this location –This works because table is sorted by the BTree
17
Filter Threshold Suppose that I have a choice of sequential scanning or accessing through a tree Each tree access needs several page accesses If only a few records are to be accessed the index approach will generate the fewest accesses If many then the scan will be best We can calculate a reduction probability and choose the best approach Copyright © Curt Hill 2003-2008
18
Selection Example Suppose: –Page contains 20 records –Three page accesses for a record via index What is the filter threshold? –If the probability of looking at a record is 1 of 20 (5%) then clearly sequential scan is desirable –Since it takes three accesses to get a record, divide the 5% by 3 to get 1.66%
19
Example Continued The filter threshold is 1.66% Estimate the reduction factor using the product of the conjunct probabilities If the reduction probability is greater than 1.66% do a sequential scan If the reduction probability is less than 1.66% use the index Copyright © Curt Hill 2003-2008
20
Projection If the required fields are all in the index, we may not need to access the data at all –Does not matter if index is clustered or not If duplicates are to be eliminated –Sort or hash the results to do the elimination
21
Copyright © Curt Hill 2003-2008 Join Very important Very common Very well studied Many algorithms Most DBMS use more than one algorithm We will consider three
22
Copyright © Curt Hill 2003-2008 Index Nested Loops Join Select … From t1 as A, t2 as B … Where A.x = B.y –B is indexed on Y Scan A sequentially Use B’s index to find if x exists The quality of the index determines how many accesses are needed Desirable if we can restrict one of the files before accessing
23
Copyright © Curt Hill 2003-2008 Sort Merge Join AKA Zipper Join Sort both tables on the join field Does not require an index It is much less expensive if one or both files is already sorted on the field The accesses are M + N once sorted The sort itself is N log N + M log M
24
Copyright © Curt Hill 2003-2008 The Sort Merge Join 1024 a 1092 b 1233 c 1279 d 1092 v 1068 u 1024 t 1024 s 1024 r FacultySchedule 1092 w 1279 x 1279 y 1279 z Only one scan of each is needed.
25
Copyright © Curt Hill 2003-2008 Hash Join Sorting partitions the records so that all the possible candidates for a join can be considered at once Hash join also partitions but using a hash instead of sort Two phases –Partitioning –Probing
26
Copyright © Curt Hill 2003-2008 Process Partition phase –Hash file R into the hash file r –Hash file S into the hash file s Probe phase –For each of k partitions Read partition in r Hash each tuple into an in-memory hash table using a new hash function Read partition in s Find matches with new hash function Make joins and flush to output
27
Copyright © Curt Hill 2003-2008 Which to use? This is where the system catalog information plays into the process Given the right data any of these three will be best So we test the possibilities –Construct the alternative plans –Compute the estimate of accesses –Choose the lowest Considering the rest of the query is also important
28
Copyright © Curt Hill 2003-2008 Query Evaluation Plan A plan is a tree of relational operators Joins, unions, intersections have two descendents (relations) Projection and selection just one These can be augmented with access paths
29
Copyright © Curt Hill 2003-2008 A Query Consider the following query Select s.name, g.dept, g.score From grades g, students s Where s.naid=g.naid AND g.score>79 and g.dept = 'CS'
30
Copyright © Curt Hill 2003-2008 A Plan Join s.naid = g.naid StudentsGrades Select g.score > 79 Select g.dept > ‘CS’ Project s.name, g.dept, g.score
31
Copyright © Curt Hill 2003-2008 Optimization of Plans We can reorganize the tree to produce a better plan In this case push the selection below the join The join has many fewer entries This makes the index loop join or any other much less expensive Selections may be applied upon input –Thus not cause any additional accesses
32
Copyright © Curt Hill 2003-2008 A Revised Plan Join s.naid = g.naid Students Grades Select g.score > 79 Select g.dept > ‘CS’ Project s.name, g.dept, g.score
33
Copyright © Curt Hill 2003-2008 Other optimizations Pushing the selection below (that is before) the join makes the table smaller and the join easier A projection also makes a table smaller by eliminating columns Eliminate any column that does not appear in the rest of the query This will reduce the number of pages and thus the cost of the join Projections may be done as the table is read or written –Thus do not add accesses
34
Copyright © Curt Hill 2003-2008 Another Revised Plan Join s.naid = g.naid Students Grades Select g.score > 79 Select g.dept > ‘CS’ Project s.name,s.naid Project s.name, g.dept, g.score Project g.naid,g.dept, g.score
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.