Slide 1: Theory, Practice & Methodology of Relational Database Design and Programming
Query & Application Performance
Copyright © Ellis Cohen 2002-2008
These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see http://www.openlineconsult.com/db

Slide 2: Overview of Lecture
– Optimization & Query Planning
– Ordering, Grouping & Sorting
– Simple SQL Tuning
– Implementing Joins as Nested Loops
– Join Implementation and Tuning
– Other Performance Issues
© Ellis Cohen 2001-2008

Slide 3: Optimization & Query Planning

Slide 4: Stages of Statement Execution
1. Statement Preparation: Parse the statement. Check that the tables & columns exist and are consistent with the statement; convert the statement to an internal parsed representation.
2. Query Optimization / Planning: Plan how best to execute the statement (primarily the query portion).
3. Statement Execution: Locate tables & indices, check access, execute the plan, run triggers, check constraints.
All stages involve access to many metadata tables, which generally are cached in memory.

Slide 5: Understanding Query Optimization & Planning
Understanding how query planning & optimization works will
– Further help you understand how to index your tables
– Help you understand the limitations of query optimizers, and help you write queries so they will be executed as quickly as possible

Slide 6: Query Optimization Process
1. Convert the query to a symbolic representation that is easy to manipulate
2. Transform the representation into alternatives that are semantically equivalent
3. Consider various approaches to implementing each alternative (taking ordering requirements and available indexes into account)
4. Evaluate each approach and choose the best one
Use pruning and heuristic techniques to avoid evaluating all approaches.

Slide 7: Rule vs. Cost-based Optimization
Rule-based Optimization
– Uses rules to decide which plan to choose, e.g. include the index with the largest # of matching attributes (no longer used by Oracle)
Cost-based Optimization
– Uses analysis of the tables involved, e.g. consider the selectivity of indices

Slide 8: Plans & Statistics
– Use EXPLAIN PLAN or SET AUTOTRACE ON to describe the plan that Oracle will use to execute a statement.
– Use ANALYZE TABLE to compute statistics that will be used by the query optimizer to decide among alternate plans.
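As a rough, runnable analogue of inspecting a plan, the sketch below uses SQLite (via Python's sqlite3 module) rather than Oracle: `EXPLAIN QUERY PLAN` and `ANALYZE` are SQLite's counterparts, not the Oracle commands named on the slide. The table contents, the index name `emps_sal` and the helper `plan` are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; its EXPLAIN QUERY PLAN plays the
# role of EXPLAIN PLAN, and ANALYZE plays the role of ANALYZE TABLE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [(i, f"E{i}", 1000 + (i % 50) * 100) for i in range(1, 1001)],
)
conn.execute("CREATE INDEX emps_sal ON Emps(sal)")  # hypothetical index name
conn.execute("ANALYZE")  # gather statistics for the optimizer

def plan(sql):
    """Return the textual plan steps SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

indexed_plan = plan("SELECT * FROM Emps WHERE sal = 2600")   # can use emps_sal
scan_plan = plan("SELECT * FROM Emps WHERE ename = 'E7'")    # no index on ename
print(indexed_plan)
print(scan_plan)
```

The exact wording of the plan text differs between SQLite versions, so the example only relies on the index name appearing in the first plan and a scan appearing in the second.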

Slide 9: What to Minimize
Fastest Overall Time
– Minimize the total time of the operation
– Oracle Hint: SELECT /*+ ALL_ROWS */ …
Fastest Initial Result
– Minimize the time to get the first tuples of the result set
– Oracle Hint: SELECT /*+ FIRST_ROWS */ …

Slide 10: Ordering, Grouping & Sorting

Slide 11: Implementing ORDER BY
How could a query engine implement
    SELECT * FROM Emps ORDER BY sal
Can this be done to get the fastest initial result?

Slide 12: ORDER BY Alternatives
    SELECT * FROM Emps ORDER BY sal
1. Sort the employees
– Initial table scan, then n*log(n) to sort
– If the table is too large to fit in memory, sort the largest groups that can fit, then merge these groups (known as merge-sort)
2. If an index by sal exists, go through each group of employees with the same salary
– Gets the fastest initial results
– However, this can be slow if there are a large # of distinct salary values (unless the table is index-organized by sal)
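The first alternative (sort runs that fit in memory, then merge them) can be sketched as follows. This is only an illustration: in-memory lists stand in for the sorted runs a real engine would write to disk, and `external_sort`, `run_size` and the sample rows are hypothetical.

```python
import heapq

def external_sort(rows, key, run_size):
    """Sort rows in runs of at most run_size, then lazily merge the sorted runs.

    Assumption: each run would be spilled to disk by a real engine; here the
    runs are plain lists kept in memory.
    """
    runs = []
    for start in range(0, len(rows), run_size):
        runs.append(sorted(rows[start:start + run_size], key=key))
    # heapq.merge only keeps one pending row per run in its heap.
    return heapq.merge(*runs, key=key)

emps = [("SMITH", 800), ("ALLEN", 1600), ("WARD", 1250), ("JONES", 2975),
        ("BLAKE", 2850), ("CLARK", 2450), ("KING", 5000)]
by_sal = list(external_sort(emps, key=lambda e: e[1], run_size=3))
print(by_sal)
```

Note that no row can be returned until every run has been sorted, which is why this approach does not give fast initial results.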

Slide 13: ORDER BY Tuning
– Don’t use ORDER BY unless you really need to. Sorts of larger result sets are expensive, even in RAM.
– Some DBMS do an implicit ORDER BY so that the result set order doesn’t depend on the storage order. Suppress this sort for optimal performance. In MySQL you do this with ORDER BY NULL.
Copyright © Robert Schudy 2005 (adapted)

Slide 14: Implementing DISTINCT
How could a query engine implement
    SELECT DISTINCT sal FROM Emps
Can this be done to get the fastest initial result?

Slide 15: DISTINCT Alternatives
    SELECT DISTINCT sal FROM Emps
1) If an index by sal exists, extract the distinct values from it.
2) Could sort the employees by sal (on disk if there are 1M employees), then iterate through them and find the unique values (see where sal changes). But this is too expensive.
3) Build a structure (a B+ tree or hash table) of distinct sal values. As each employee is processed, see if its sal is in the structure. If not, include it. When done, use the structure to produce the result set. If fast initial results are required, emit values as they are added to the structure.
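The hash-table alternative, with values emitted as soon as they are first seen, might look like the sketch below. A Python set stands in for the hash table, and `distinct_stream` and the sample salaries are hypothetical names and data.

```python
def distinct_stream(values):
    """Yield each value the first time it is seen (fast initial results)."""
    seen = set()  # stands in for the hash table of distinct sal values
    for v in values:
        if v not in seen:
            seen.add(v)
            yield v  # emitted immediately, before the scan finishes

sals = [800, 1600, 1250, 1600, 800, 2975]
result = list(distinct_stream(sals))
first = next(distinct_stream(sals))  # available after reading a single row
print(result, first)
```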

Slide 16: Implementing GROUP BY
How could a query engine implement
    SELECT sal, count(*) FROM Emps GROUP BY sal
Can this be done to get the fastest initial result?

Slide 17: GROUP BY Alternatives
    SELECT sal, count(*) FROM Emps GROUP BY sal
1) If an index by sal exists, count the # of entries for each distinct sal value. Gives fast initial results.
2) Build a structure (a B+ tree or hash table) of distinct sal values, with an associated count for each. As each employee is processed, see if its sal is in the structure. If so, add 1 to its count. If not, include it & set its count to 1. When done, return the sals & counts.
How about
    SELECT sal, min(hiredate) FROM Emps GROUP BY sal
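A minimal sketch of the second alternative, extended to the closing question: the same structure can carry a running minimum next to the count. A Python dict plays the role of the hash table; `group_count_and_min` and the sample rows are assumptions made for illustration.

```python
def group_count_and_min(rows):
    """Hash aggregation: one pass, one entry per distinct sal.

    Each entry holds (count, earliest hiredate) for that sal.
    """
    groups = {}
    for sal, hiredate in rows:
        if sal in groups:
            count, earliest = groups[sal]
            groups[sal] = (count + 1, min(earliest, hiredate))
        else:
            groups[sal] = (1, hiredate)
    return groups

# ISO-formatted date strings compare correctly as plain strings.
rows = [(800, "1980-12-17"), (1600, "1981-02-20"),
        (800, "1979-05-01"), (1600, "1983-01-12"), (2975, "1981-04-02")]
groups = group_count_and_min(rows)
print(groups)
```

Unlike the DISTINCT case, no group can be emitted early here, since a later row may still change its count or minimum.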

Slide 18: GROUP BY with Materialized Views
– If the GROUP BY operates on large intermediate result sets, then query execution will be slow.
– If appropriate, use a materialized view to store the results of the GROUP BY and query that directly.
– Note: there is a cost to maintain the materialized table.
– Note: Oracle has a Query Rewrite capability which allows the query to be written using the original table, and which automatically converts the query to use materialized views which have been defined.
Copyright © Robert Schudy 2005 (adapted)

Slide 19: Simple SQL Tuning

Slide 20: SELECT Tuning
– Avoid selecting things that you don’t really need.
– Avoid selecting literals, constants or other data that is known to the application.
– Beware of the overhead of DISTINCT and other operations on the SELECT list.
– It can be useful to compute the SQL at runtime (dynamic SQL) to omit unneeded items from the SELECT list.
Copyright © Robert Schudy 2005 (adapted)

Slide 21: WHERE Tuning
– Use equality tests in WHERE clauses when possible. This enables hash and bitmap indexes and speeds B+-tree indexes.
– Test against literals in WHERE clauses when possible. A cost-based optimizer can use the sparsity of the literal to improve optimization.
– It can be useful to compute the SQL at runtime (dynamic SQL) to omit unneeded computation from the WHERE clause (e.g. CASE statements).
– Some DBMS, including MySQL, take the SQL order of the WHERE clauses as a hint for their execution order.
Copyright © Robert Schudy 2005 (adapted)

Slide 22: Avoid Functions in WHERE Clauses
Avoid functions in WHERE clauses, such as
    WHERE UPPER(loc) = 'BOSTON'
    WHERE SQRT(area) > 3.0
    WHERE my_function(sal) = 5
How to overcome this
– Store loc in upper case and eliminate UPPER
– Transform the function: WHERE area > 9.0
– Add a derived column with the function result, index it, and query that column: WHERE my_function_col = 5
– Use a function index: CREATE INDEX... ON Emps( my_function( sal ) )
Copyright © Robert Schudy 2005 (adapted)
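The function-index remedy can be tried out with SQLite's expression indexes, which are used here only as a stand-in for Oracle function-based indexes. The Depts table, its rows, the index name `depts_upper_loc` and the helper `plan` are assumptions for the sake of a runnable sketch.

```python
import sqlite3

# Assumption: SQLite's index-on-expression feature plays the role of an
# Oracle function-based index; table and index names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Depts (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")
conn.executemany("INSERT INTO Depts VALUES (?, ?, ?)",
                 [(10, "ACCOUNTING", "New York"), (20, "RESEARCH", "Dallas"),
                  (30, "SALES", "Chicago"), (40, "OPERATIONS", "Boston")])

def plan(sql):
    """Return the textual plan steps SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT dname FROM Depts WHERE upper(loc) = 'BOSTON'"
before = plan(query)                      # function call forces a scan
conn.execute("CREATE INDEX depts_upper_loc ON Depts(upper(loc))")
after = plan(query)                       # expression index can now be used
rows = conn.execute(query).fetchall()
print(before, after, rows)
```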

Slide 23: LIKE in WHERE Clauses
Avoid LIKE, particularly with % other than at the end, because it can force index or table scans.
– WHERE ename LIKE 'GEO%' can do a range scan using a B-tree index on ename
– WHERE ename LIKE '%RGE' will require a full index or table scan.
LIKE is handy for exploring a database or prototyping, but a LIKE clause in a production system is usually not the most efficient design.
Copyright © Robert Schudy 2005 (adapted)
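To see why a trailing % is cheap and a leading % is not, the sketch below treats a sorted Python list as a stand-in for a B-tree index on ename. The functions `prefix_scan` and `suffix_scan` and the sample names are hypothetical; both return the matches together with the number of index entries visited.

```python
from bisect import bisect_left

# A sorted list stands in for the leaf level of a B-tree index on ename.
names = sorted(["ADAMS", "GEOFF", "GEORGE", "GEORGIA", "JORGE", "KING", "SERGE"])

def prefix_scan(sorted_names, prefix):
    """LIKE 'GEO%': seek to the first candidate, stop at the first non-match."""
    i = bisect_left(sorted_names, prefix)
    matches, visited = [], 0
    while i < len(sorted_names):
        visited += 1
        if not sorted_names[i].startswith(prefix):
            break
        matches.append(sorted_names[i])
        i += 1
    return matches, visited

def suffix_scan(sorted_names, suffix):
    """LIKE '%RGE': sort order does not help, so every entry is examined."""
    matches = [n for n in sorted_names if n.endswith(suffix)]
    return matches, len(sorted_names)

geo, geo_visited = prefix_scan(names, "GEO")
rge, rge_visited = suffix_scan(names, "RGE")
print(geo, geo_visited, rge, rge_visited)
```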

Slide 24: Only Fetch The Data Required
Returning lots of data from a query is expensive.
– DBMS server resources
– Network resources
– Client resources
Only return the data needed by the middle-tier
– If you can’t easily write the necessary SQL, use a stored procedure.
Copyright © Robert Schudy 2005 (adapted)

Slide 25: Implementing Joins as Nested Loops

Slide 26: Implementing Joins Using Nested Loops
– SELECT e.ename, d.dname FROM Emps e, Depts d WHERE e.deptno = d.deptno
– SELECT e.ename, (SELECT d.dname FROM Depts d WHERE e.deptno = d.deptno) FROM Emps e
– SELECT d.dname, TABLE (SELECT e.ename FROM Emps e WHERE e.deptno = d.deptno) FROM Depts d
Table sizes: 1M Emps, 1000 Depts
If the only indices are on the primary keys, which is faster?
If we also have an index on Emps(deptno), which is faster?

Slide 27: Nested Loop Comparisons
    SELECT e.ename, (SELECT d.dname FROM Depts d WHERE e.deptno = d.deptno) FROM Emps e
– Full table scan through Emps. For each employee, use an index scan to get the appropriate Depts row.
– However, the 100 blocks of Depts will be cached eventually. If the optimizer is really smart, it could just cache them immediately.
    SELECT dname, TABLE (SELECT e.ename FROM Emps e WHERE e.deptno = d.deptno) FROM Depts d
– Full table scan through Depts. For each department, use a full table scan to get the appropriate employees. REALLY slow!
– With an index on Emps(deptno), can use an index scan to find all emps with a particular deptno. STILL pretty slow.
– This approach only wins if the Emps table is index-organized by deptno, or has an index on Emps(deptno,empno), so an index scan can be used.
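The three nested-loop variants compared above can be imitated in a few lines. This is a toy model under stated assumptions: dicts stand in for indexes, lists for tables, and the tiny data set and variable names are invented; only the shape of the loops is meant to carry over.

```python
from collections import defaultdict

depts = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]
emps = [(7369, "SMITH", 20), (7499, "ALLEN", 30),
        (7782, "CLARK", 10), (7788, "SCOTT", 20)]

# Emps outer, Depts inner through its primary-key index: one probe per employee.
depts_pk = {deptno: dname for deptno, dname in depts}
emps_outer = [(ename, depts_pk[deptno]) for _, ename, deptno in emps]

# Depts outer, Emps inner with no index: every department rescans all of Emps.
rows_examined = 0
depts_outer_scan = []
for deptno, dname in depts:
    for _, ename, e_deptno in emps:
        rows_examined += 1
        if e_deptno == deptno:
            depts_outer_scan.append((ename, dname))

# Depts outer, Emps inner through an index on Emps(deptno).
emps_by_deptno = defaultdict(list)
for _, ename, deptno in emps:
    emps_by_deptno[deptno].append(ename)
depts_outer_indexed = [(ename, dname)
                       for deptno, dname in depts
                       for ename in emps_by_deptno[deptno]]

print(sorted(emps_outer), rows_examined)
```

With the slide's sizes (1M Emps, 1000 Depts) the unindexed variant would examine on the order of a billion rows, which is the "REALLY slow" case.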

Slide 28: Simple Join Problem
    SELECT DISTINCT empno, ename FROM Emps e NATURAL JOIN Assigns a WHERE a.hrs BETWEEN 30 AND 35
Get the employee numbers and names of employees who work between 30 and 35 hrs per week on some project.
Determine how this could be implemented. Compare:
1) Outer loop is Emps
2) Outer loop is Assigns
Which is better?

Slide 29: Emps Outer Loop
a) Do a full scan through the employees
b) For each employee, find the assignments for that employee, and see if any have hrs between 30 & 35. If so, keep the employee.
OK if index on Assigns(empno)
Better if index on Assigns(empno,hrs)

Slide 30: Assigns Outer Loop
a) Look through Assigns to find the assignments with 30 - 35 hrs
b) Look through them and build a list (or B+ tree) of distinct employees
c) For each employee, look them up in Emps and get their ename
OK if index on Assigns(hrs)
Better if index on Assigns(hrs,empno)
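Both loop orders for the 30 to 35 hour query can be sketched side by side. The data, the function names `emps_outer` and `assigns_outer`, and the use of dicts and sets in place of indexes are all assumptions for illustration; the point is only that the two strategies return the same rows while touching the tables in a different order.

```python
emps = {7369: "SMITH", 7499: "ALLEN", 7782: "CLARK"}   # empno -> ename (PK lookup)
assigns = [(7369, "P1", 32), (7369, "P2", 8), (7499, "P1", 40),
           (7782, "P3", 30), (7782, "P4", 35)]          # (empno, project, hrs)

def emps_outer(emps, assigns):
    """Scan Emps; probe an index on Assigns(empno) for each employee."""
    hrs_by_empno = {}
    for empno, _, hrs in assigns:
        hrs_by_empno.setdefault(empno, []).append(hrs)
    return sorted((empno, ename) for empno, ename in emps.items()
                  if any(30 <= h <= 35 for h in hrs_by_empno.get(empno, [])))

def assigns_outer(emps, assigns):
    """Filter Assigns by hrs, collect distinct empnos, then look each up in Emps."""
    qualifying = {empno for empno, _, hrs in assigns if 30 <= hrs <= 35}
    return sorted((empno, emps[empno]) for empno in qualifying)

via_emps = emps_outer(emps, assigns)
via_assigns = assigns_outer(emps, assigns)
print(via_emps, via_assigns)
```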

Slide 31: More Complex Problem
    SELECT DISTINCT empno, ename FROM Emps e NATURAL JOIN Assigns a WHERE sal > 2500 AND a.hrs BETWEEN 30 AND 35
Get the employee numbers and names of employees whose weekly sal is > 2500 and who work between 30 and 35 hrs per week on some project.
Determine how this could be implemented. Compare:
1) Outer loop is Emps
2) Outer loop is Assigns
Which is better?

Slide 32: Join Implementation and Tuning

Slide 33: Index Join
[Figure: index entries for Emps where sal = 2600, sal = 2700, sal = 2800 are intersected with index entries for Emps with 30 hr, 31 hr, 32 hr asns]
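One way to read the index-join picture: collect the employee identifiers from the qualifying entries of each index, then intersect the two sets without touching the tables. The sketch below assumes dict-of-sets indexes and invented identifiers; `index_range` is a hypothetical helper.

```python
# Assumption: each index maps a key value to the set of employee ids having it.
sal_index = {2400: {4}, 2600: {3, 8}, 2700: {5}, 2800: {1, 9}}
hrs_index = {20: {4, 8}, 30: {5, 7}, 31: {9}, 32: {2, 3}}

def index_range(index, low, high):
    """Union of the id sets for all keys in [low, high] (an index range scan)."""
    ids = set()
    for key in sorted(index):
        if low <= key <= high:
            ids |= index[key]
    return ids

well_paid = index_range(sal_index, 2501, float("inf"))   # sal > 2500
busy = index_range(hrs_index, 30, 35)                     # hrs BETWEEN 30 AND 35
both = well_paid & busy                                   # the "intersect" step
print(sorted(both))
```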

Slide 34: Join Implementations
Nested Loops
– Good when an index can be used on at least one of the tables (used in the inner loop). Allows fast initial results.
Merge Join
– Sort both tables on the join columns, then join by merging them together. Often fastest overall time, especially when there are no usable indices.
Hash Join
– Put rows into a hash table based on the join columns, then join based on matching values. Can be the best way to get fast initial results.
Each of these has a parallel version.
Hints can be used to specify the implementation.
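Minimal in-memory sketches of the merge join and hash join described above are given below. They are illustrations only: real engines work on disk blocks and handle memory limits, and the function names, key extractors and sample rows are assumptions.

```python
def hash_join(left, right, lkey, rkey):
    """Build a hash table on `right`, then probe it once per `left` row."""
    table = {}
    for r in right:
        table.setdefault(rkey(r), []).append(r)
    for l in left:
        for r in table.get(lkey(l), []):
            yield l, r

def merge_join(left, right, lkey, rkey):
    """Sort both inputs on the join key, then advance through them in step."""
    left, right = sorted(left, key=lkey), sorted(right, key=rkey)
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = lkey(left[i]), rkey(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the block of right rows sharing this key, pair it with
            # every left row that has the same key.
            j_end = j
            while j_end < len(right) and rkey(right[j_end]) == lk:
                j_end += 1
            while i < len(left) and lkey(left[i]) == lk:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end

emps = [("SMITH", 20), ("ALLEN", 30), ("CLARK", 10), ("SCOTT", 20), ("NODEPT", 40)]
depts = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]
hashed = sorted(hash_join(emps, depts, lambda e: e[1], lambda d: d[0]))
merged = sorted(merge_join(emps, depts, lambda e: e[1], lambda d: d[0]))
print(hashed)
```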

Slide 35: FROM Tuning
– Be very careful when joining tables, because joining many tables can degrade performance dramatically.
– A single table (or materialized view) in the FROM list is fastest.
– Joining a small table won’t slow SQL very much, because a small table is usually mainly in the cache.
Copyright © Robert Schudy 2005 (adapted)

Slide 36: Multi-Table Joins
Joining two or more large tables takes great care.
– Make sure there are join indexes covering the joined columns of both tables, particularly on DBMS that support index joins (i.e. using intersection) of the indexes instead of the tables.
– Check index sparsity and beware large intermediates.
– Try to use a join of one or more of the small tables first, to produce a small intermediate to join with the other large table(s).
– Check the query plan, time realistically, and help the optimizer if necessary with hints.
The number of tables that can be joined with reasonable efficiency depends on the DBMS, indexing, etc.
– Oracle performance usually falls off around 8 tables
– MySQL performance usually falls off around 3 tables.
Copyright © Robert Schudy 2005 (adapted)

Slide 37: Other Performance Issues

Slide 38: UNION, INTERSECT & EXCEPT
– Set operations (UNION, INTERSECT & EXCEPT) also use SORT/MERGE, so they can’t produce fast initial results.
– UNION ALL doesn't require a SORT. It just concatenates, so it can produce fast initial results.
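The difference can be shown with two small functions: a lazy concatenation for UNION ALL and a sort-then-deduplicate pass for UNION. This is a sketch of the sort-based strategy the slide describes, not of any particular engine; the function names are hypothetical.

```python
from itertools import chain

def union_all(a, b):
    """UNION ALL: lazy concatenation, the first row is available immediately."""
    return chain(a, b)

def union(a, b):
    """UNION via sort/merge: nothing can be returned until all rows are sorted."""
    out = []
    for v in sorted(chain(a, b)):
        if not out or out[-1] != v:   # duplicates are adjacent after sorting
            out.append(v)
    return out

first_row = next(union_all(iter([3, 1]), iter([1, 2])))
all_rows = list(union_all([3, 1], [1, 2]))
distinct_rows = union([3, 1], [1, 2])
print(first_row, all_rows, distinct_rows)
```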

Slide 39: Subqueries
– An optimizer will generally try to convert subqueries to joins, and then optimize the join.
– In theory, every subquery can be rewritten using joins, though this is a hard problem.
– Oracle can't convert every subquery. It has a particularly hard time with correlated subqueries using NOT IN and NOT EXISTS.

Slide 40: Unnecessary Looped UPDATEs
Increase salary of all project managers
    FOR prec IN (SELECT DISTINCT pmgr FROM Projects) LOOP
      UPDATE Emps SET sal = sal + 100 WHERE empno = prec.pmgr;
    END LOOP;
This code is much more efficient
    UPDATE Emps SET sal = sal + 100
    WHERE empno IN (SELECT pmgr FROM Projects)
In general, avoid loops when equivalent code can be written without loops
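The equivalence of the looped and set-based forms can be checked on a toy database. The sketch uses SQLite through Python's sqlite3 module instead of Oracle PL/SQL, and the schema, rows and function names are assumptions; it counts statements issued rather than measuring time.

```python
import sqlite3

def make_db():
    """Create a small in-memory database (illustrative schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Emps (empno INTEGER PRIMARY KEY, sal INTEGER);
        CREATE TABLE Projects (pno INTEGER PRIMARY KEY, pmgr INTEGER);
        INSERT INTO Emps VALUES (1, 1000), (2, 2000), (3, 3000);
        INSERT INTO Projects VALUES (10, 1), (11, 1), (12, 3);
    """)
    return conn

def looped_update(conn):
    """One UPDATE per project manager; returns the number of statements issued."""
    pmgrs = conn.execute("SELECT DISTINCT pmgr FROM Projects").fetchall()
    for (pmgr,) in pmgrs:
        conn.execute("UPDATE Emps SET sal = sal + 100 WHERE empno = ?", (pmgr,))
    return 1 + len(pmgrs)

def set_based_update(conn):
    """A single UPDATE with a subquery; returns the number of statements issued."""
    conn.execute("UPDATE Emps SET sal = sal + 100 "
                 "WHERE empno IN (SELECT pmgr FROM Projects)")
    return 1

def salaries(conn):
    return conn.execute("SELECT empno, sal FROM Emps ORDER BY empno").fetchall()

a, b = make_db(), make_db()
loop_statements = looped_update(a)
set_statements = set_based_update(b)
print(salaries(a), loop_statements, set_statements)
```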

Slide 41: Unnecessary Looped SELECTs
Raise an error if some dept has no employees
    DECLARE
      knt int;
    BEGIN
      FOR drec IN (SELECT deptno FROM Depts) LOOP
        SELECT count(*) INTO knt FROM Emps WHERE deptno = drec.deptno;
        IF knt = 0 THEN
          RAISE_APPLICATION_ERROR( -20023, 'There is a dept with no employees' );
        END IF;
      END LOOP;
    END;
The inner SELECT statement can be avoided by using a more complicated outer SELECT

Slide 42: Identifying the First Error
Raise an error if some dept has no employees
    CREATE VIEW DeptKntsView AS
      SELECT deptno, count(empno) AS eknt
      FROM Depts NATURAL LEFT JOIN Emps
      GROUP BY deptno;
    BEGIN
      FOR drec IN (SELECT deptno FROM DeptKntsView WHERE eknt = 0) LOOP
        RAISE_APPLICATION_ERROR( -20023, 'Dept ' || drec.deptno || ' has no employees' );
      END LOOP;
    END;
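The view-based check can be exercised on a small example. SQLite stands in for Oracle here, a Python exception stands in for RAISE_APPLICATION_ERROR, and the table contents and the function `check_depts` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Depts (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO Depts VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
    INSERT INTO Emps VALUES (7782, 'CLARK', 10), (7369, 'SMITH', 20), (7788, 'SCOTT', 20);

    -- count(empno) ignores the NULL empno produced for a dept with no employees
    CREATE VIEW DeptKntsView AS
      SELECT deptno, count(empno) AS eknt
      FROM Depts NATURAL LEFT JOIN Emps
      GROUP BY deptno;
""")

def check_depts(conn):
    """Raise on the first department without employees (single query, no inner SELECT)."""
    for (deptno,) in conn.execute("SELECT deptno FROM DeptKntsView WHERE eknt = 0"):
        raise ValueError(f"Dept {deptno} has no employees")

counts = conn.execute("SELECT deptno, eknt FROM DeptKntsView ORDER BY deptno").fetchall()
try:
    check_depts(conn)
    message = None
except ValueError as exc:
    message = str(exc)
print(counts, message)
```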

Slide 43: The Scalable Performance Problem
– When there isn’t much data, databases generally perform well even if the design has scalability problems.
– With un-scalable designs, when the database scales, many requests take much longer, often unacceptably long.
– Obtaining scalable performance is often the most difficult task faced by database designers and application developers.
Copyright © Robert Schudy 2005 (adapted)

Slide 44: Size, Throughput and Session Scalability
The three main measures of database scalability are how well the DB maintains performance in various circumstances:
– Size scalability: The DB grows. Software factors affecting size scalability include the application SQL, schema, and indexing. Hardware factors include I/O and storage type and configuration, and CPU speed and number.
– Throughput scalability: The transaction rate increases. Throughput scalability is primarily determined by lock contention, DB server concurrency, and platform performance.
– Session scalability: The number of simultaneous sessions increases. More concurrent sessions means more opportunities for lock contention. Session scalability is driven by schema design and DBMS concurrency support.
Copyright © Robert Schudy 2005 (adapted)

