# Databases and Information Systems 1 Prof. Dr. Stefan Böttcher Fakultät EIM, Institut für Informatik Universität Paderborn WS 2009 / 2010 Contents: selectivity.


Databases and Information Systems 1, Prof. Dr. Stefan Böttcher, Fakultät EIM, Institut für Informatik, Universität Paderborn, WS 2009/2010

Contents:
- selectivity
- query trees
- optimization goals
- query rewrite rules
- common sub-expression identification

Databases and Information Systems I – WS 2009/2010 – Logical Query Optimization

## Query optimization in relational databases

- Java → C: speed-up factor 2-5
- C → Assembler: speed-up factor 2-5
- but SQL → optimized SQL: speed-up factor 1-100 and more

Why is the gain so large? Two kinds of optimization contribute:
- logical query optimization
- physical query optimization (next chapter)

## Operators of the relational algebra

- R1 U R2 = union of relations R1 and R2
- R1 ∩ R2 = intersection of relations R1 and R2
- R1 X R2 = cartesian product of relations R1 and R2
- R1 |X| C R2 = join of relations R1 and R2 with condition C
- R1 - R2 = difference of relations R1 and R2
- P A1,…,An ( R ) = projection of relation R onto the attributes A1, …, An
- S C ( R ) = selection of the subset of tuples of relation R that satisfy condition C

## Selectivity of queries

The selectivity of an operation determines the size of its intermediate result.

- selectivity of the selection with condition C: selectivity ( S C (R) ) = | S C (R) | / | R |
- selectivity of the join with condition C: selectivity ( R1 |X| C R2 ) = | R1 |X| C R2 | / ( | R1 | * | R2 | )

Selectivities are estimated, e.g. based on samples or histograms.
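A minimal sketch of the two selectivity formulas, evaluated with the counts that the running example below assumes (1,000 courses, 2 of them titled 'databases', 50,000 enrollments, 200 enrollments in database courses). The function names are hypothetical; a real DBMS would take these values from catalog statistics rather than from exact counts.

```python
def selection_selectivity(result_size, input_size):
    """selectivity(S C (R)) = |S C (R)| / |R|"""
    return result_size / input_size


def join_selectivity(result_size, left_size, right_size):
    """selectivity(R1 |X| C R2) = |R1 |X| C R2| / (|R1| * |R2|)"""
    return result_size / (left_size * right_size)


def estimated_size(selectivity, *input_sizes):
    """Inverse use: estimated result size = selectivity * product of the input sizes."""
    size = selectivity
    for n in input_sizes:
        size *= n
    return size


# Counts assumed in the running example
sel_title = selection_selectivity(2, 1_000)   # S title='databases' (course)
sel_join = join_selectivity(200, 50_000, 2)   # enroll |X| courseID (that selection)
print(sel_title, sel_join)
print(estimated_size(sel_title, 1_000))       # back to about 2 tuples
```

Read in the other direction, an estimated selectivity times the input sizes gives the expected size of an intermediate result, which is what the optimizer compares between alternative trees.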

## Goal of logical query optimization

SQL queries of the form select A1,...,An from R1,...,Rm where C correspond to the algebra expression P A1,...,An ( S C ( R1 X ... X Rm ) ).

This expression produces very large intermediate results. Task: obtain the same result with smaller intermediate results, e.g. by moving selections and projections as far inside the expression as possible.

## Algebra tree for SQL query – example

Write a letter to each student enrolled in a database course:

select B.firstName, B.lastName from bachelorStudent B, enroll E, course C where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

Algebra tree (root at the top):

- P B.firstName, B.lastName
  - S C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
    - X
      - X
        - B
        - E
      - C
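The translation from an SQL block to the canonical expression P ( S ( R1 X ... X Rm ) ) can be sketched with a few Python classes. The class and function names (Rel, Product, Select, Project, canonical_tree, show) are hypothetical, and the where-condition is kept as an opaque string.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Union


@dataclass(frozen=True)
class Rel:
    name: str


@dataclass(frozen=True)
class Product:
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Select:
    condition: str
    child: "Node"


@dataclass(frozen=True)
class Project:
    attributes: tuple
    child: "Node"


Node = Union[Rel, Product, Select, Project]


def canonical_tree(attributes, relations, condition):
    """select attributes from relations where condition
    -> P attributes ( S condition ( R1 X ... X Rm ) ), products nested to the left."""
    product = reduce(Product, (Rel(r) for r in relations))
    return Project(tuple(attributes), Select(condition, product))


def show(node, indent=0):
    """Print the tree with the root at the top, one node per line."""
    pad = "  " * indent
    if isinstance(node, Rel):
        print(pad + node.name)
    elif isinstance(node, Product):
        print(pad + "X")
        show(node.left, indent + 1)
        show(node.right, indent + 1)
    elif isinstance(node, Select):
        print(pad + "S " + node.condition)
        show(node.child, indent + 1)
    else:
        print(pad + "P " + ", ".join(node.attributes))
        show(node.child, indent + 1)


tree = canonical_tree(
    ["B.firstName", "B.lastName"],
    ["B", "E", "C"],
    "C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID",
)
show(tree)
```

The printed tree has the same shape as the algebra tree of the example: one projection, one selection holding the whole where-condition, and left-deep cartesian products over the from-list.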

## Logical query optimization – example (2)

Assumptions: a university database with the following relations
- bachelorStudent B: 10,000 bachelor students, each taking 5 courses on average
- enroll E: 50,000 enrollments (10,000 × 5)
- course C: 1,000 courses, 2 of which have the title 'databases' and 100 enrolled students each

SQL query: return firstName and lastName of all bachelor students in a database course:

select B.firstName, B.lastName from bachelorStudent B, enroll E, course C where C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID

## Logical query optimization – example (3)

The query of example (2) as an algebra tree, under the same assumptions:

- P B.firstName, B.lastName
  - S C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID
    - X
      - X
        - B
        - E
      - C

## Logical query optimization – example (4)

Sizes of the intermediate results of the unoptimized tree:

- P B.firstName, B.lastName
  - S C.title = 'databases' and C.courseID = E.courseID and B.sID = E.sID (200 tuples)
    - X (500,000,000,000 tuples)
      - X (500,000,000 tuples)
        - B (10,000 tuples)
        - E (50,000 tuples)
      - C (1,000 tuples)
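The sizes in this tree are plain products of the input cardinalities. A short check, using only the numbers stated in the assumptions of example (2):

```python
# Cardinalities from the assumptions of example (2)
B, E, C = 10_000, 50_000, 1_000      # |bachelorStudent|, |enroll|, |course|

b_times_e = B * E                    # B X E
b_times_e_times_c = b_times_e * C    # (B X E) X C
result = 2 * 100                     # 2 'databases' courses, 100 enrolled students each

print(f"{b_times_e:,}")              # 500,000,000
print(f"{b_times_e_times_c:,}")      # 500,000,000,000
print(f"{result:,}")                 # 200
# selectivity of the selection applied to the largest intermediate result
print(result / b_times_e_times_c)
```

Almost all of the 500 billion tuples of the double cartesian product are discarded by the selection, which is exactly the waste that pushing selections down avoids.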

## Logical query optimization – example (5)

A possible optimization:

- P B.firstName, B.lastName
  - |X| C.courseID = E.courseID (200 tuples)
    - |X| B.sID = E.sID
      - B (10,000 tuples)
      - E (50,000 tuples)
    - S C.title = 'databases' (2 tuples)
      - C (1,000 tuples)

## Logical query optimization – example (6)

A better optimization:

- P B.firstName, B.lastName
  - |X| B.sID = E.sID (200 tuples)
    - B (10,000 tuples)
    - |X| C.courseID = E.courseID
      - E (50,000 tuples)
      - S C.title = 'databases' (2 tuples)
        - C (1,000 tuples)
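To see that the rewritten tree of example (6) returns the same result as the canonical tree of example (4) while producing far smaller intermediate results, both can be evaluated on a tiny, made-up instance. The data and helper names are hypothetical; relations are lists of dicts keyed by qualified attribute names, and joins are naive nested loops.

```python
from itertools import product

# Hypothetical toy instance, NOT the 10,000-student database of the assumptions
B = [{"B.sID": 1, "B.firstName": "Ann", "B.lastName": "Lee"},
     {"B.sID": 2, "B.firstName": "Bob", "B.lastName": "Kim"},
     {"B.sID": 3, "B.firstName": "Cid", "B.lastName": "Roe"}]
E = [{"E.sID": 1, "E.courseID": 10}, {"E.sID": 2, "E.courseID": 11},
     {"E.sID": 3, "E.courseID": 10}, {"E.sID": 3, "E.courseID": 11}]
C = [{"C.courseID": 10, "C.title": "databases"},
     {"C.courseID": 11, "C.title": "compilers"}]


def cross(r1, r2):
    """R1 X R2: every pair of tuples, merged into one dict."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2)]


def select(cond, r):
    """S cond (R)"""
    return [t for t in r if cond(t)]


def join(r1, r2, cond):
    """R1 |X| cond R2 as a nested-loop join (the full product is never stored)."""
    out = []
    for t1 in r1:
        for t2 in r2:
            t = {**t1, **t2}
            if cond(t):
                out.append(t)
    return out


def project(attrs, r):
    """P attrs (R) with set semantics."""
    return {tuple(t[a] for a in attrs) for t in r}


ATTRS = ("B.firstName", "B.lastName")
sizes = {}  # number of tuples in each intermediate result

# Example (4): P ( S all conditions ( (B X E) X C ) )
b_x_e = cross(B, E)
b_x_e_x_c = cross(b_x_e, C)
sizes["B X E"], sizes["(B X E) X C"] = len(b_x_e), len(b_x_e_x_c)
canonical = project(ATTRS, select(
    lambda t: t["C.title"] == "databases"
    and t["C.courseID"] == t["E.courseID"] and t["B.sID"] == t["E.sID"],
    b_x_e_x_c))

# Example (6): P ( B |X| ( E |X| S title (C) ) )
s_c = select(lambda t: t["C.title"] == "databases", C)
e_s_c = join(E, s_c, lambda t: t["C.courseID"] == t["E.courseID"])
b_e_s_c = join(B, e_s_c, lambda t: t["B.sID"] == t["E.sID"])
sizes["S(C)"], sizes["E |X| S(C)"] = len(s_c), len(e_s_c)
sizes["B |X| (E |X| S(C))"] = len(b_e_s_c)
optimized = project(ATTRS, b_e_s_c)

print(canonical == optimized, sorted(optimized))
print(sizes)
```

On this instance the canonical plan materializes 24 tuples in (B X E) X C, while the rewritten plan never holds more than 2; with the assumed cardinalities the same rewrite avoids the 500,000,000,000-tuple product.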

## Rules of logical query optimization (1)

Union, intersection, cartesian product and join are commutative and associative:
- R1 U R2 = R2 U R1
- R1 ∩ R2 = R2 ∩ R1
- R1 X R2 = R2 X R1
- R1 |X| C R2 = R2 |X| C R1
- ( R1 U R2 ) U R3 = R1 U ( R2 U R3 )
- ( R1 ∩ R2 ) ∩ R3 = R1 ∩ ( R2 ∩ R3 )
- ( R1 X R2 ) X R3 = R1 X ( R2 X R3 )
- ( R1 |X| C1 R2 ) |X| C2 R3 = R1 |X| C1 ( R2 |X| C2 R3 ), provided C2 only uses attributes of R2 and R3
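These equivalences can be spot-checked on small relations. The sketch below uses hypothetical toy relations, represented as sets of frozensets of (attribute, value) pairs so that attribute order does not matter, and checks join commutativity and associativity for a C2 that only refers to R2 and R3. A passing check on one instance illustrates the rule; it does not prove it.

```python
from itertools import product


def rel(*rows):
    """A relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


def join(r1, r2, cond):
    """R1 |X| cond R2; attribute names are qualified, so merged tuples never clash."""
    return frozenset(t1 | t2 for t1, t2 in product(r1, r2) if cond(dict(t1 | t2)))


# Hypothetical toy relations
R1 = rel({"R1.a": 1}, {"R1.a": 2})
R2 = rel({"R2.a": 1, "R2.b": 5}, {"R2.a": 2, "R2.b": 6}, {"R2.a": 3, "R2.b": 5})
R3 = rel({"R3.b": 5}, {"R3.b": 7})


def c1(t):  # R1.a = R2.a: uses attributes of R1 and R2
    return t["R1.a"] == t["R2.a"]


def c2(t):  # R2.b = R3.b: uses only attributes of R2 and R3
    return t["R2.b"] == t["R3.b"]


commutative = join(R1, R2, c1) == join(R2, R1, c1)
associative = join(join(R1, R2, c1), R3, c2) == join(R1, join(R2, R3, c2), c1)
print(commutative, associative)
```

If C2 referred to an attribute of R1, the right-hand side could not even be evaluated, because R2 |X| C2 R3 has no R1 attributes; that is why the associativity rule needs the condition on C2.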

## Rules of logical query optimization (2)

If the selection condition is a conjunction, the selection can be split into a cascade of selections whose order can be swapped:
- S C1 and C2 (R) = S C1 (S C2 (R)) = S C2 (S C1 (R))

Selections can be pushed inside union, difference and intersection:
- S C ( R1 U R2 ) = S C ( R1 ) U S C ( R2 )
- S C ( R1 - R2 ) = S C ( R1 ) - S C ( R2 )
- S C ( R1 ∩ R2 ) = S C ( R1 ) ∩ S C ( R2 )
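A quick check of the cascade rule and of pushing a selection through union, difference and intersection, on hypothetical sets of (courseID, title, credits) tuples; the credits attribute and both predicates are made up for illustration.

```python
def select(cond, r):
    """S cond (R) on a set of tuples"""
    return {t for t in r if cond(t)}


# Hypothetical tuples: (courseID, title, credits)
R1 = {(1, "databases", 5), (2, "databases", 3), (3, "compilers", 5)}
R2 = {(2, "databases", 3), (4, "databases", 5), (3, "compilers", 5)}


def c1(t):
    return t[1] == "databases"


def c2(t):
    return t[2] >= 5


def c1_and_c2(t):
    return c1(t) and c2(t)


cascade = (select(c1_and_c2, R1) == select(c1, select(c2, R1))
           == select(c2, select(c1, R1)))
union = select(c1, R1 | R2) == select(c1, R1) | select(c1, R2)
difference = select(c1, R1 - R2) == select(c1, R1) - select(c1, R2)
intersection = select(c1, R1 & R2) == select(c1, R1) & select(c1, R2)
print(cascade, union, difference, intersection)
```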

## Rules of logical query optimization (3)

Push a selection inside a join, i.e. into a join argument:
- S C ( R1 |X| C2 R2 ) = S C ( R1 ) |X| C2 R2, if C only uses attributes of R1

Push a selection inside an argument of a cartesian product:
- S C ( R1 X R2 ) = S C ( R1 ) X R2, if C only uses attributes of R1

If this is impossible for both R1 and R2, i.e. C uses attributes of both R1 and R2, replace the selection applied to the cartesian product by a join:
- S C ( R1 X R2 ) = R1 |X| C R2
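The push-down rules and the replacement of a selection over a cartesian product by a join can be checked the same way. The relations below are hypothetical sets of tuples with (sID, lastName) and (sID, courseID) schemas; the product and the join concatenate tuples, and the join is a plain nested loop.

```python
from itertools import product

B = {(1, "Meier"), (2, "Kim"), (3, "Meier")}    # hypothetical (sID, lastName)
E = {(1, 10), (3, 11), (3, 12), (4, 10)}         # hypothetical (sID, courseID)


def select(cond, r):
    return {t for t in r if cond(t)}


def cross(r1, r2):
    """R1 X R2: concatenate every pair of tuples"""
    return {t1 + t2 for t1, t2 in product(r1, r2)}


def join(r1, r2, cond):
    """R1 |X| cond R2, where cond sees the concatenated tuple"""
    return {t1 + t2 for t1, t2 in product(r1, r2) if cond(t1 + t2)}


def is_meier(t):        # uses only attributes of B (position 1 = lastName)
    return t[1] == "Meier"


def same_student(t):    # B.sID = E.sID, uses attributes of B and E
    return t[0] == t[2]


# S C (R1 X R2) = S C (R1) X R2, if C only uses attributes of R1
push_into_product = select(is_meier, cross(B, E)) == cross(select(is_meier, B), E)
# S C (R1 X R2) = R1 |X| C R2, if C uses attributes of both
product_to_join = select(same_student, cross(B, E)) == join(B, E, same_student)
# S C (R1 |X| C2 R2) = S C (R1) |X| C2 R2
push_into_join = (select(is_meier, join(B, E, same_student))
                  == join(select(is_meier, B), E, same_student))
print(push_into_product, product_to_join, push_into_join)
```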

## Rules of logical query optimization (4)

The order of projection and selection can be swapped if the projection keeps all attributes needed for the selection condition:
- S C ( P A1,...,Am ( R1 ) ) = P A1,...,Am ( S C ( R1 ) ), if C only uses attributes among A1,...,Am

Push a projection inside a union:
- P A1,...,Am ( R1 U R2 ) = P A1,...,Am ( R1 ) U P A1,...,Am ( R2 )

Push a projection into a join, i.e. apply it to a join argument, provided the attributes needed for the join condition are kept:
- P A1,...,Am ( R1 |X| C R2 ) = P A1,...,Am ( ( P A1,...,Am,AC1,...,ACn ( R1 ) ) |X| C R2 ), where AC1,...,ACn are the attributes of R1 needed to check the join condition C (the inner projection keeps only those of A1,...,Am that belong to R1)

Cascaded projections can be combined, and additional projections can be inserted:
- P A1,...,Am ( R1 ) = P A1,...,Am ( P A1,...,Am,AC1,...,ACn ( R1 ) )
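For the join rule, the inner projection keeps the requested attributes of R1 plus the attributes AC1,...,ACn that the join condition needs. A small check on hypothetical data (R1 = B, R2 = E), using dict-based tuples with qualified names so that projection works by attribute name:

```python
from itertools import product

# Hypothetical toy data; attribute names are qualified
B = [{"B.sID": 1, "B.firstName": "Ann", "B.lastName": "Meier"},
     {"B.sID": 2, "B.firstName": "Bob", "B.lastName": "Kim"},
     {"B.sID": 3, "B.firstName": "Ann", "B.lastName": "Meier"}]
E = [{"E.sID": 1, "E.courseID": 10}, {"E.sID": 3, "E.courseID": 10},
     {"E.sID": 3, "E.courseID": 11}]


def project(attrs, r):
    """P attrs (R) with set semantics: keep only attrs, drop duplicate tuples."""
    rows = {tuple(t[a] for a in attrs) for t in r}
    return [dict(zip(attrs, row)) for row in rows]


def join(r1, r2, cond):
    """R1 |X| cond R2 as a nested-loop join over merged dicts."""
    return [{**t1, **t2} for t1, t2 in product(r1, r2) if cond({**t1, **t2})]


def as_set(r):
    """Order-independent view of a relation, for comparing results."""
    return {frozenset(t.items()) for t in r}


def same_student(t):            # join condition C: B.sID = E.sID
    return t["B.sID"] == t["E.sID"]


A = ("B.lastName", "E.courseID")   # A1,...,Am
inner = ("B.lastName", "B.sID")    # attributes of A in B, plus AC1 = B.sID

lhs = project(A, join(B, E, same_student))
rhs = project(A, join(project(inner, B), E, same_student))
print(as_set(lhs) == as_set(rhs))
```

Dropping B.sID from the inner projection would make the join condition impossible to evaluate (here it would raise a KeyError), which is why AC1,...,ACn must be kept.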

## Logical query optimization - steps

1. Represent the SQL query as a logical query tree.
2. Apply the following optimizations to this query tree:
   - split off and push down selections
   - combine selections and cartesian products into joins
   - determine the join order with the smallest intermediate results
   - where possible, push down and insert projections
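The first two optimization steps, splitting and pushing down selections and turning selections over cartesian products into joins, can be sketched on a small tree representation. Everything here is a hypothetical illustration: the class names, the Atom type that records which attributes a conjunct uses, and the greedy push function. Choosing the join order and pushing down projections are not implemented. Applied to the canonical tree of the running example, the sketch yields the tree of example (5).

```python
from dataclasses import dataclass, replace


# Hypothetical node types for logical query trees
@dataclass(frozen=True)
class Atom:            # one conjunct of a selection condition
    text: str
    attrs: frozenset   # attributes the conjunct uses

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Product:
    left: object
    right: object

@dataclass(frozen=True)
class Join:
    left: object
    right: object
    conds: tuple

@dataclass(frozen=True)
class Select:
    conds: tuple
    child: object

@dataclass(frozen=True)
class Project:
    attrs: tuple
    child: object


def attributes(n):
    """Attributes available in the result of node n."""
    if isinstance(n, Rel):
        return n.attrs
    if isinstance(n, (Product, Join)):
        return attributes(n.left) | attributes(n.right)
    if isinstance(n, Select):
        return attributes(n.child)
    return frozenset(n.attrs)  # Project


def push(atom, n):
    """Move one conjunct as far down as possible; turn S over X into a join."""
    if isinstance(n, (Product, Join)):
        if atom.attrs <= attributes(n.left):
            return replace(n, left=push(atom, n.left))
        if atom.attrs <= attributes(n.right):
            return replace(n, right=push(atom, n.right))
        if isinstance(n, Product):  # S C (R1 X R2) = R1 |X| C R2
            return Join(n.left, n.right, (atom,))
        return replace(n, conds=n.conds + (atom,))
    if isinstance(n, Select) and isinstance(n.child, Rel):
        return replace(n, conds=n.conds + (atom,))
    return Select((atom,), n)  # otherwise the selection stays directly above n


def optimize(n):
    """Split conjunctive selections and push each conjunct down."""
    if isinstance(n, Project):
        return replace(n, child=optimize(n.child))
    if isinstance(n, Select):
        tree = optimize(n.child)
        for atom in n.conds:
            tree = push(atom, tree)
        return tree
    return n


def show(n, indent=0):
    """Print the tree with the root at the top, one node per line."""
    conds = " and ".join(a.text for a in getattr(n, "conds", ()))
    if isinstance(n, Rel):
        text = n.name
    elif isinstance(n, Project):
        text = "P " + ", ".join(n.attrs)
    elif isinstance(n, Select):
        text = "S " + conds
    else:
        text = "|X| " + conds if isinstance(n, Join) else "X"
    print("  " * indent + text)
    for child in (getattr(n, "left", None), getattr(n, "right", None), getattr(n, "child", None)):
        if child is not None:
            show(child, indent + 1)


B = Rel("B", frozenset({"B.sID", "B.firstName", "B.lastName"}))
E = Rel("E", frozenset({"E.sID", "E.courseID"}))
C = Rel("C", frozenset({"C.courseID", "C.title"}))
where = (Atom("C.title = 'databases'", frozenset({"C.title"})),
         Atom("C.courseID = E.courseID", frozenset({"C.courseID", "E.courseID"})),
         Atom("B.sID = E.sID", frozenset({"B.sID", "E.sID"})))
canonical = Project(("B.firstName", "B.lastName"),
                    Select(where, Product(Product(B, E), C)))
show(optimize(canonical))
```

Reaching the better tree of example (6) additionally requires reordering the joins based on estimated intermediate result sizes, which this sketch leaves out.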

## Logical query optimization - exercises

1. Represent the following SQL query as a logical query tree:
   select E.courseID from bachelorStudent B, enroll E where B.lastName = 'Meier' and B.sID = E.sID
   Optimize it step by step and write down the optimized logical query.
2. Assume 1,000 courses and 10,000 bachelor students, each taking 5 courses on average; 4 of the students have the last name 'Meier'. Compute the selectivity of the selection and of the join in the optimized query.

## Finding common sub-expressions - goal

Consider an SQL query represented as a logical query tree in which an operator op has the sub-trees R1 and R2. Is R1 = R2, or R1 ⊆ R2? If so, the result can be reused; if not, it must be recomputed.

## Finding common sub-expressions – (2)

- R1 = R2? Normalize R1 and R2 by applying the algebra rules, then compare the normalized queries.
- R1 ⊆ R2? Examples of containment:
  - S C1 (R) ⊆ S C2 (R) if C1 implies C2 (i.e. (not C1 or C2) is true for every tuple)
  - R1 |X| C R2 ⊆ R1 |X| R2

If so, reuse the result; otherwise recompute.
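One simple way to detect R1 = R2 is to bring every sub-tree into a normal form and look it up among the sub-trees already seen. The sketch below is a hypothetical illustration: trees are nested tuples (operator, parameter, *children), the only normalization applied is sorting the conjuncts of selections and the arguments of commutative operators (U, ∩, X, |X|), and the attribute C.credits is made up for the example.

```python
# Hypothetical tree format: (operator, parameter, *children)
#   ("rel", "C")                                   base relation
#   ("S", ("C.title = 'databases'", ...), child)   selection, conjuncts as a tuple
#   ("|X|", "C.courseID = E.courseID", left, right), ("U", "", left, right), ...
COMMUTATIVE = {"U", "∩", "X", "|X|"}


def normalize(tree):
    """Canonical form: sorted conjuncts, sorted arguments of commutative operators."""
    op, param, *children = tree
    children = [normalize(c) for c in children]
    if op == "S":
        param = tuple(sorted(param))
    if op in COMMUTATIVE:
        children.sort(key=repr)
    return (op, param, *children)


def common_subexpressions(tree):
    """Normalized sub-trees (other than base relations) that occur more than once."""
    seen, repeated = set(), []

    def visit(t):
        if t[0] == "rel":
            return
        if t in seen:
            if t not in repeated:
                repeated.append(t)
            return  # compute once, reuse for every further occurrence
        seen.add(t)
        for child in t[2:]:
            visit(child)

    visit(normalize(tree))
    return repeated


C, E = ("rel", "C"), ("rel", "E")
# C.credits is a hypothetical attribute used only for this illustration
s1 = ("S", ("C.title = 'databases'", "C.credits = 5"), C)
s2 = ("S", ("C.credits = 5", "C.title = 'databases'"), C)
query = ("U", "",
         ("|X|", "C.courseID = E.courseID", s1, E),
         ("|X|", "C.courseID = E.courseID", E, s2))
print(common_subexpressions(query))
```

The two join arguments of the union differ only in conjunct order and argument order, so after normalization they are recognized as one sub-expression. Conditions written differently (for example E.courseID = C.courseID) would need further normalization rules.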

## Finding common sub-expressions – (3)

R1 ⊆ R2? Use the monotonicity of union, intersection, join, cartesian product, selection and projection: if R1 ⊆ R3 and R2 ⊆ R4, then
- R1 U R2 ⊆ R3 U R4
- R1 ∩ R2 ⊆ R3 ∩ R4
- R1 X R2 ⊆ R3 X R4
- R1 |X| C R2 ⊆ R3 |X| C R4
- R1 - R4 ⊆ R3 - R2 (the difference is monotone in its first argument and anti-monotone in its second)
- P A1,…,An ( R1 ) ⊆ P A1,…,An ( R3 )
- S C (R1) ⊆ S C (R3)
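The monotonicity rules suggest a recursive test for R1 ⊆ R2 that is sufficient but not complete. The hypothetical sketch below uses nested-tuple trees (operator, parameter, *children), stores a selection condition as a frozenset of conjuncts, and approximates "C1 implies C2" by "every conjunct of C2 also occurs in C1"; a False answer therefore only means that containment could not be shown. The attribute C.credits is made up for the example.

```python
# Hypothetical tree format: (operator, parameter, *children); selections carry a
# frozenset of conjuncts, binary operators a parameter ("" or a join condition).
MONOTONE = {"U", "∩", "X", "|X|"}


def implies(c1, c2):
    """Sufficient test: conjunction c1 implies c2 if every conjunct of c2 is in c1."""
    return c2 <= c1


def contained(t1, t2):
    """True only if t1 ⊆ t2 follows from the monotonicity rules."""
    if t1 == t2:
        return True
    op1, op2 = t1[0], t2[0]
    if op1 == op2 == "S":               # S C1 (R1) ⊆ S C2 (R3)
        return implies(t1[1], t2[1]) and contained(t1[2], t2[2])
    if op1 == "S":                      # S C (R1) ⊆ R1 ⊆ R3
        return contained(t1[2], t2)
    if op1 == op2 == "P" and t1[1] == t2[1]:
        return contained(t1[2], t2[2])
    if op1 == op2 and op1 in MONOTONE and t1[1] == t2[1]:
        return contained(t1[2], t2[2]) and contained(t1[3], t2[3])
    if op1 == op2 == "-":               # R1 - R4 ⊆ R3 - R2
        return contained(t1[2], t2[2]) and contained(t2[3], t1[3])
    return False


C, E = ("rel", "course"), ("rel", "enroll")
narrow = ("|X|", "C.courseID = E.courseID",
          ("S", frozenset({"C.title = 'databases'", "C.credits = 5"}), C), E)
wide = ("|X|", "C.courseID = E.courseID",
        ("S", frozenset({"C.title = 'databases'"}), C), E)
print(contained(narrow, wide), contained(wide, narrow))
```

If narrow and wide both occur in one query, wide can be computed once and narrow derived from it by the additional selection, instead of recomputing the join.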

## Summary – logical query optimization

Goal: minimize intermediate results.

SQL query → logical query tree → apply transformation rules → search for common sub-expressions
