Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “QUERY OPTIMIZATION” Academic Year 2014 Spring.

1 Academic Year 2014 Spring

2 MODULE CC3005NI: Advanced Database Systems “QUERY OPTIMIZATION” Academic Year 2014 Spring

3 Query Optimization:
- Query optimization is the process of choosing the most efficient way to execute a SQL statement.
- When the cost-based optimizer was first offered, with Oracle 7, Oracle supported only standard relational data.
- Query optimization is an important component of a modern relational database system.
- Relational database systems provide this optimization as a system-managed facility rather than leaving it to the user.

4

5 Query Optimization:
- Description: a query optimizer is essentially a program for the efficient evaluation of relational queries, making use of relevant statistical information.
- Objective: to choose the most efficient strategy for implementing a given relational query, thereby improving the efficiency and performance of a relational database system.

6 Need for Query Optimization:
1. To perform automatic navigation:
- A relational database system (based on the non-navigational relational model) allows users simply to state what data they require and leaves the system to locate and process that data in the database.

7 Need for Query Optimization:
2. To achieve acceptable performance:
- There may be several different plans (called query plans) for executing a single user query; the query optimizer aims to select and execute the most efficient plan based on the information available to the system.

8 Need for Query Optimization:
3. To minimize the effect of the speed difference between CPU and I/O devices:
- Because I/O devices are much slower than the CPU, a query optimizer aims to minimize I/O activity by choosing the ‘cheapest’ query plan for a given query.

9 Effects of Optimization – Example:
- Consider the following Student, Lending and Book tables:
  - Student (student_no, student_name, gender, address)
  - Lending (lending_no, student_no, book_no)
  - Book (book_no, title, author, edition)

10 Effects of Optimization – Example:
- Assume that the database tables contain:
  - 100 students in the Student table
  - 10,000 lendings in the Lending table, of which only 50 are for book ‘B1’
  - 5000 books in the Book table
- Further assume that only results (intermediate relations) of up to 50 tuples can be kept in memory during query processing.

11 Effects of Optimization – Example:
- Query: Retrieve the names of students who have borrowed book ‘B1’.
- SQL:
    SELECT DISTINCT student_name
    FROM student, lending
    WHERE student.student_no = lending.student_no
      AND lending.book_no = 'B1';
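
As a minimal sketch, assuming Python's built-in sqlite3 module and a few hypothetical rows (the names and addresses are invented; the slides give only table sizes), the query can be run against an in-memory database. `EXPLAIN QUERY PLAN` asks SQLite to report the plan its own optimizer chose; its output format differs between SQLite versions, so the sketch only prints it.

```python
import sqlite3

# Hypothetical sample rows: the slides give table sizes, not contents.
STUDENTS = [
    (1, "Alice", "F", "Street A"),
    (2, "Bob", "M", "Street B"),
    (3, "Carol", "F", "Street C"),
]
LENDINGS = [
    (1, 1, "B1"),
    (2, 2, "B2"),
    (3, 3, "B1"),
    (4, 1, "B1"),  # Alice borrows B1 twice; DISTINCT removes the duplicate name
]


def borrowers_of(book_no):
    """Return the sorted names of students who borrowed book_no."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (student_no INTEGER PRIMARY KEY, student_name TEXT,
                              gender TEXT, address TEXT);
        CREATE TABLE lending (lending_no INTEGER PRIMARY KEY, student_no INTEGER,
                              book_no TEXT);
    """)
    conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", STUDENTS)
    conn.executemany("INSERT INTO lending VALUES (?, ?, ?)", LENDINGS)
    query = """
        SELECT DISTINCT student_name
        FROM student, lending
        WHERE student.student_no = lending.student_no
          AND lending.book_no = ?
    """
    # Show the plan SQLite's optimizer picked (format varies by version).
    for row in conn.execute("EXPLAIN QUERY PLAN " + query, (book_no,)):
        print(row)
    # SQL gives no row order without ORDER BY, so sort for a stable result.
    names = sorted(r[0] for r in conn.execute(query, (book_no,)))
    conn.close()
    return names


print(borrowers_of("B1"))  # ['Alice', 'Carol']
```

The query states only what is wanted; the join order and access methods are left to the DBMS.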

12 Query Plan A – No Optimization:
- Operation sequence: Join – Select – Project
- Step 1: Join student and lending over student_no giving T1
- Step 2: Select T1 where book_no = ‘B1’ giving T2
- Step 3: Project T2 over student_name giving result

13 Query Plan A – No Optimization:
- We calculate the number of database accesses (tuple I/O operations) required for each step.
- The number of tuple I/Os is the number of tuples (records) read and written during an operation.

14 Query Plan A – Calculation:
- Step 1 – Join student and lending over student_no giving T1
- Step 2 – Select T1 where book_no = ‘B1’ giving T2
- Step 3 – Project T2 over student_name giving result

  Step | Read (tuples) | Write (tuples) | IR (tuples) | Subtotal
  1    | 100 x 10,000  | 10,000         | 10,000      | 1,010,000
  2    | 10,000        | 0              | 50          | 10,000
  3    | 0             | 0              | <= 50       | 0

  IR: Intermediate Relation
- Total tuple I/O: 1,020,000
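
The Plan A figures can be re-derived with a few lines of arithmetic. This sketch assumes 10,000 lending tuples (the figure used in the calculation table), that each lending tuple matches exactly one student (student_no acting as a foreign key), and, as in the table, that Step 1 counts only the repeated reads of the Student tuples. The constant and function names are hypothetical.

```python
# Figures from the slides; N_LENDING follows the calculation table (10,000).
N_STUDENT = 100
N_LENDING = 10_000
N_B1 = 50           # lending tuples for book 'B1'
MEMORY_LIMIT = 50   # largest intermediate relation that can stay in memory


def write_cost(ir_size: int) -> int:
    # An intermediate relation is written to disk only if it does not fit in memory.
    return ir_size if ir_size > MEMORY_LIMIT else 0


def plan_a_costs() -> list:
    # Step 1: join over student_no; each student tuple is read once per lending tuple.
    ir1 = N_LENDING  # assumed: each lending tuple matches exactly one student
    step1 = N_STUDENT * N_LENDING + write_cost(ir1)
    # Step 2: select book_no = 'B1'; T1 must be read back from disk.
    step2 = ir1 + write_cost(N_B1)
    # Step 3: project T2, which is already in memory, so no tuple I/O.
    step3 = 0
    return [step1, step2, step3]


costs = plan_a_costs()
print(costs, sum(costs))  # [1010000, 10000, 0] 1020000
```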

15 Query Plan B – with Optimization:
- Operation sequence: Select – Join – Project
- Step 1: Select lending where book_no = ‘B1’ giving T1
- Step 2: Join T1 and student over student_no giving T2
- Step 3: Project T2 over student_name giving result
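
To see that reordering the operations does not change the answer, both plans can be written as small list operations. The rows below are hypothetical, and the `join` helper is a simple nested-loop join chosen for illustration, not how any particular DBMS implements joins. Each plan also returns the sizes of its intermediate relations T1 and T2.

```python
# Hypothetical sample rows (not from the slides).
STUDENT = [(1, "Alice"), (2, "Bob"), (3, "Carol")]  # (student_no, student_name)
LENDING = [(1, 1, "B1"), (2, 2, "B2"), (3, 3, "B1"), (4, 1, "B1")]  # (lending_no, student_no, book_no)


def join(students, lendings):
    """Nested-loop join over student_no -> (student_no, student_name, lending_no, book_no)."""
    return [(s_no, name, l_no, book)
            for (l_no, l_s_no, book) in lendings
            for (s_no, name) in students
            if s_no == l_s_no]


def plan_a(book_no):
    t1 = join(STUDENT, LENDING)                # Step 1: join
    t2 = [t for t in t1 if t[3] == book_no]    # Step 2: select
    result = sorted({t[1] for t in t2})        # Step 3: project (set acts like DISTINCT)
    return result, [len(t1), len(t2)]


def plan_b(book_no):
    t1 = [row for row in LENDING if row[2] == book_no]  # Step 1: select
    t2 = join(STUDENT, t1)                              # Step 2: join
    result = sorted({t[1] for t in t2})                 # Step 3: project
    return result, [len(t1), len(t2)]


print(plan_a("B1"))  # (['Alice', 'Carol'], [4, 3])
print(plan_b("B1"))  # (['Alice', 'Carol'], [3, 3])
```

Both plans return the same names; Plan B's first intermediate relation is smaller because the selection runs before the join.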

16 Query Plan B – with Optimization:
- We again calculate the number of tuple I/O operations required for each step.

17 Query Plan B – Calculation:
- Step 1 – Select lending where book_no = ‘B1’ giving T1
- Step 2 – Join T1 and student over student_no giving T2
- Step 3 – Project T2 over student_name giving result

  Step | Read (tuples) | Write (tuples) | IR (tuples) | Subtotal
  1    | 10,000        | 0              | 50          | 10,000
  2    | 100           | 0              | 50          | 100
  3    | 0             | 0              | <= 50       | 0

  IR: Intermediate Relation
- Total tuple I/O: 10,100
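
The Plan B figures follow from the same cost model: a step costs the tuples it reads plus any intermediate relation too large to stay in memory. This sketch assumes 10,000 lending tuples and that each ‘B1’ lending matches exactly one student; the names are hypothetical.

```python
N_STUDENT = 100
N_LENDING = 10_000
N_B1 = 50           # lending tuples for book 'B1'
MEMORY_LIMIT = 50   # largest intermediate relation that can stay in memory


def write_cost(ir_size: int) -> int:
    # Only intermediate relations larger than the memory limit are written out.
    return ir_size if ir_size > MEMORY_LIMIT else 0


def plan_b_costs() -> list:
    # Step 1: select book_no = 'B1'; every lending tuple is read, 50 are kept in memory.
    step1 = N_LENDING + write_cost(N_B1)
    # Step 2: join in-memory T1 with student; each student tuple is read once only.
    step2 = N_STUDENT + write_cost(N_B1)  # assumed: one student per lending
    # Step 3: project in memory, so no tuple I/O.
    step3 = 0
    return [step1, step2, step3]


costs = plan_b_costs()
print(costs, sum(costs))  # [10000, 100, 0] 10100
```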

18 Comparison Plan A vs. Plan B:
- Ratio of tuple I/Os (Plan A to Plan B): 1,020,000 / 10,100 ≈ 101, so Plan A needs about 101 times as many tuple I/Os as Plan B
- Intermediate relations in Plan B are much smaller than those in Plan A
- Tuple I/O can be reduced further by using indexes
- If there is an index on book_no in the Lending table, Step 1 of Plan B needs to read just 50 tuples instead of 10,000
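
The effect of an index can be estimated in the same tuple-I/O model. This sketch assumes that an index on book_no lets Step 1 of Plan B read only the 50 matching lending tuples, and it ignores the cost of reading the index pages themselves; `PLAN_A_TOTAL` is the Plan A total from the calculation table, and the function name is hypothetical.

```python
PLAN_A_TOTAL = 1_020_000  # Plan A total tuple I/O from the calculation table
N_STUDENT = 100
N_LENDING = 10_000
N_B1 = 50


def plan_b_total(index_on_book_no: bool) -> int:
    # Step 1 scans all lending tuples, or reads only the matching ones via the index.
    step1 = N_B1 if index_on_book_no else N_LENDING
    # Step 2 reads each student tuple once; Step 3 costs nothing.
    step2 = N_STUDENT
    return step1 + step2


for idx in (False, True):
    b = plan_b_total(idx)
    print(f"index={idx}: Plan B = {b}, A/B = {PLAN_A_TOTAL / b:.0f}")
# index=False: Plan B = 10100, A/B = 101
# index=True: Plan B = 150, A/B = 6800
```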

19 Four Stages of Optimization:
- The query processing activity acts as an interface between the querying individual/process and the database. It relieves the querying individual/process of the burden of deciding the best execution strategy. So while the querying individual/process specifies what data is wanted, the query processor determines how to retrieve it.

20 Four Stages of Optimization:
- Stage 1: Convert the query into an internal form more suitable for machine manipulation, e.g. a query tree or a relational algebra expression
- Stage 2: Further convert the internal form into an equivalent but more efficient canonical form, making use of well-defined transformation rules

21 Four Stages of Optimization:
- Example of a query tree for Plan A (Join – Select – Project):

    Result
      Project over student_name
        Restrict where book_no = ‘B1’
          Join over student_no
            Student
            Lending
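
Stage 2's transformation rules operate on trees like this one. The sketch below assumes a hypothetical dataclass representation of the tree and a hypothetical column catalogue, and implements one standard rule: a restriction applied to a join can be moved onto the join input that alone owns the restricted column. Applied to the Plan A tree, it produces the Plan B order (Select before Join).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"
    on: str


@dataclass(frozen=True)
class Restrict:
    child: "Node"
    column: str
    value: str


@dataclass(frozen=True)
class Project:
    child: "Node"
    column: str


Node = Union[Table, Join, Restrict, Project]

# Hypothetical catalogue: the columns of each base table.
CATALOGUE = {
    "Student": {"student_no", "student_name", "gender", "address"},
    "Lending": {"lending_no", "student_no", "book_no"},
}


def columns(node: Node) -> set:
    """Columns available in the output of a tree node."""
    if isinstance(node, Table):
        return CATALOGUE[node.name]
    if isinstance(node, Join):
        return columns(node.left) | columns(node.right)
    if isinstance(node, Project):
        return {node.column}
    return columns(node.child)  # Restrict keeps its input's columns


def push_restrict_down(node: Node) -> Node:
    """Recurse through Project nodes; move a Restrict sitting directly on a Join
    onto the single join input that owns the column. Anything else is returned as is."""
    if isinstance(node, Project):
        return Project(push_restrict_down(node.child), node.column)
    if isinstance(node, Restrict) and isinstance(node.child, Join):
        j = node.child
        in_left = node.column in columns(j.left)
        in_right = node.column in columns(j.right)
        if in_left and not in_right:
            return Join(Restrict(j.left, node.column, node.value), j.right, j.on)
        if in_right and not in_left:
            return Join(j.left, Restrict(j.right, node.column, node.value), j.on)
    return node


plan_a = Project(
    Restrict(Join(Table("Student"), Table("Lending"), "student_no"), "book_no", "B1"),
    "student_name",
)
print(push_restrict_down(plan_a))
```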

22 Four Stages of Optimization:
- Stage 3: Choose a set of low-level procedures, using statistics about the database. This involves:
  - Low-level operations (e.g. join, select, project)
  - Implementation procedures (one or more for each low-level operation, each suited to different conditions)
  - Cost formulae (one for each implementation procedure)
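
Stage 3 can be sketched for a single low-level operation (select). The sketch assumes two hypothetical implementation procedures, a full table scan and an index lookup, each with a cost formula in tuple I/Os. The index estimate uses the common uniform-distribution assumption (cardinality divided by the number of distinct values); the figure of 200 distinct book_no values is invented so that the estimate matches the 50 ‘B1’ lendings.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    cardinality: int      # tuples in the table
    distinct_values: int  # distinct values in the selected column
    has_index: bool       # is there an index on that column?


def table_scan_cost(s: ColumnStats) -> int:
    # Read every tuple once.
    return s.cardinality


def index_lookup_cost(s: ColumnStats) -> int:
    # Read only the matching tuples, estimated as cardinality / distinct values
    # (uniform distribution); index page reads are not counted.
    return s.cardinality // s.distinct_values


# Each implementation procedure: (applicability test, cost formula).
SELECT_PROCEDURES = {
    "table scan": (lambda s: True, table_scan_cost),
    "index lookup": (lambda s: s.has_index, index_lookup_cost),
}


def choose_select_procedure(s: ColumnStats) -> tuple:
    # Evaluate the cost formula of every applicable procedure and keep the cheapest.
    costs = {name: cost(s)
             for name, (applicable, cost) in SELECT_PROCEDURES.items()
             if applicable(s)}
    return min(costs.items(), key=lambda item: item[1])


book_no_stats = ColumnStats(cardinality=10_000, distinct_values=200, has_index=True)
print(choose_select_procedure(book_no_stats))  # ('index lookup', 50)
```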

23 Four Stages of Optimization:
- Stage 4: Generate a set of candidate query plans and choose the best of them by evaluating the cost formulae
  - The process of selecting a query plan is also called ‘access path’ selection
  - The ‘cheapest’ query plan is normally considered to be the one that requires the fewest tuple I/O operations and produces the smallest intermediate relations
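
Stage 4 can be sketched as taking the minimum over the candidate plans. This sketch assumes the per-step (tuple I/O, intermediate relation size) figures from the Plan A and Plan B calculations, treats ‘<= 50’ as 50, and ranks plans by total tuple I/O first and by largest intermediate relation second. A real optimizer enumerates far more candidates (join orders, access methods); the names here are hypothetical.

```python
# Per-step (tuple I/O, intermediate relation size) for each candidate plan.
CANDIDATES = {
    "Plan A (join, select, project)": [(1_010_000, 10_000), (10_000, 50), (0, 50)],
    "Plan B (select, join, project)": [(10_000, 50), (100, 50), (0, 50)],
}


def plan_key(steps):
    total_io = sum(io for io, _ in steps)
    largest_ir = max(ir for _, ir in steps)
    return (total_io, largest_ir)  # fewest tuple I/Os first, then smallest IR


best = min(CANDIDATES, key=lambda name: plan_key(CANDIDATES[name]))
print(best, plan_key(CANDIDATES[best]))  # Plan B (select, join, project) (10100, 50)
```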

24 Database Statistics:
- The selection of ‘optimal’ query plans during optimization makes use of database statistics stored in the system catalogue or data dictionary of the database system
- In other words, without this information (metadata), the query optimizer cannot choose the most efficient query plan for a given query

25 Database Statistics:
- Typical database statistics include:
  - For each base table:
    - Cardinality
    - Number of pages for the table
  - For each column of each base table:
    - Number of distinct values
    - Maximum, minimum and average value
    - Actual values and their frequencies
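
The statistics listed above can be computed directly from a table's rows. In this sketch the rows are hypothetical dicts (the table must be non-empty), `page_size` (tuples per page) is an invented parameter used to derive the page count, and averages are computed only for numeric columns. A real DBMS gathers these statistics and stores them in its catalogue rather than recomputing them for every query.

```python
from collections import Counter
from statistics import mean


def table_statistics(rows, page_size):
    """Gather catalogue-style statistics for a non-empty list of dict rows."""
    stats = {
        "cardinality": len(rows),
        "pages": -(-len(rows) // page_size),  # ceiling division
        "columns": {},
    }
    for col in rows[0]:
        values = [r[col] for r in rows]
        col_stats = {
            "distinct": len(set(values)),
            "min": min(values),
            "max": max(values),
            "frequencies": Counter(values),  # actual values and their frequencies
        }
        if all(isinstance(v, (int, float)) for v in values):
            col_stats["avg"] = mean(values)
        stats["columns"][col] = col_stats
    return stats


lending = [
    {"lending_no": 1, "student_no": 1, "book_no": "B1"},
    {"lending_no": 2, "student_no": 2, "book_no": "B2"},
    {"lending_no": 3, "student_no": 1, "book_no": "B1"},
]
stats = table_statistics(lending, page_size=2)
print(stats["cardinality"], stats["pages"], stats["columns"]["book_no"]["distinct"])  # 3 2 2
```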

26 Database Statistics:
- Typical database statistics include (continued):
  - For each index:
    - Number of levels
    - Number of leaf pages

27 Thank you!!! Questions are WELCOME Academic Year 2014 Spring

