Analyzing Your Data with Analytic Functions Carl Dudley University of Wolverhampton, UK UKOUG Council Oracle ACE Director


Introduction
- Working with Oracle since 1986
- Oracle DBA - OCP Oracle7, 8, 9, 10
- Oracle DBA of the Year, 2002
- Oracle ACE Director
- Regular presenter at Oracle conferences
- Consultant and trainer
- Technical editor for a number of Oracle texts
- UK Oracle User Group Council
- Member of IOUC
- Day job: University of Wolverhampton, UK

Analyzing Your Data with Analytic Functions
- Overview of Analytic Functions
- Ranking Functions
- Partitioning
- Aggregate Functions
- Sliding Windows
- Row Comparison Functions
- Analytic Function Performance

Analytic Functions
- New set of functions introduced in Oracle
  - Analytic functions or window functions
- Intended for OLAP (Online Analytical Processing) or data warehouse purposes
- Provide functionality that would otherwise require complex conventional SQL programming or other tools
- Advantages
  - Improved performance: the optimizer "understands" the purpose of the query
  - Reduced dependency on report generators and client tools
  - Simpler coding

Analytic Function Categories
- The analytic functions fall into four categories
  - Ranking functions
  - Aggregate functions
  - Row comparison functions
  - Statistical functions
- The Oracle documentation describes all of the functions
- Processed as the last step before ORDER BY
  - Work on the result set of the query
  - Can operate on an intermediate ordering of the rows
  - Actions can be based on:
    - Partitions of the result set
    - A sliding window of rows in the result set

Processing Sequence
- There may be several intermediate sort steps if required
- Order of processing:
  1. Rows
  2. WHERE evaluation
  3. GROUPING
  4. HAVING evaluation
  5. Analytic process: intermediate ordering, then analytic function
  6. Final ORDER BY
  7. Output

The Analytic Clause
- Syntax: function(arguments) OVER(analytic_clause)
- The enclosing parentheses are required even if there are no arguments
    RANK() OVER (ORDER BY sal DESC)

Sequence of Processing
- Being processed just before the final ORDER BY means:
  - Analytic functions are not allowed in WHERE and HAVING conditions
  - Outside the select list, they are allowed only in the final ORDER BY clause
- Ordering the final result set
  - The OVER clause specifies the sort order of the result set before the analytic function is computed
  - Can have multiple analytic functions with different OVER clauses, requiring multiple intermediate sorts
  - Final ordering does not have to match the ordering in the OVER clause

The emp and dept Tables

emp
    EMPNO ENAME  JOB       MGR  HIREDATE   SAL COMM DEPTNO
     7934 MILLER CLERK     7782 23-JAN-82 1300          10
     7782 CLARK  MANAGER   7839 09-JUN-81 2450          10
     7839 KING   PRESIDENT      17-NOV-81 5000          10
     7369 SMITH  CLERK     7902 17-DEC-80  800          20
     7876 ADAMS  CLERK     7788 12-JAN-83 1100          20
     7566 JONES  MANAGER   7839 02-APR-81 2975          20
     7902 FORD   ANALYST   7566 03-DEC-81 3000          20
     7788 SCOTT  ANALYST   7566 09-DEC-82 3000          20
     7900 JAMES  CLERK     7698 03-DEC-81  950          30
     7521 WARD   SALESMAN  7698 22-FEB-81 1250  500     30
     7654 MARTIN SALESMAN  7698 28-SEP-81 1250 1400     30
     7844 TURNER SALESMAN  7698 08-SEP-81 1500    0     30
     7499 ALLEN  SALESMAN  7698 20-FEB-81 1600  300     30
     7698 BLAKE  MANAGER   7839 01-MAY-81 2850          30

dept
    DEPTNO DNAME      LOC
        10 ACCOUNTING NEW YORK
        20 RESEARCH   DALLAS
        30 SALES      CHICAGO
        40 OPERATIONS BOSTON

Example of Ranking
- Ranking with ROW_NUMBER
  - No handling of ties
- Rows retrieved by the query are intermediately sorted on descending salary for the analysis

    SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) rownumber, sal, ename
    FROM emp
    ORDER BY sal DESC;

    ROWNUMBER  SAL ENAME
            1 5000 KING
            2 3000 SCOTT
            3 3000 FORD
            4 2975 JONES
            5 2850 BLAKE
            6 2450 CLARK
            7 1600 ALLEN
            8 1500 TURNER
            9 1300 MILLER
           10 1250 WARD
           11 1250 MARTIN
           12 1100 ADAMS
           13  950 JAMES
           14  800 SMITH

- If the final ORDER BY specifies the same sort order as the OVER clause, only one sort is required
- ROW_NUMBER is different from ROWNUM

Different Sort Order in Final ORDER BY
- If the OVER clause sort is different from the final ORDER BY
  - An extra sort step is required

    SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) rownumber, sal, ename
    FROM emp
    ORDER BY ename;

- Result columns: ROWNUMBER, SAL, ENAME
- Rows returned, in ename order: ADAMS, ALLEN, BLAKE, CLARK, FORD, JAMES, JONES, KING, MARTIN, MILLER, SCOTT, SMITH, TURNER, WARD

Multiple Functions With Different Sort Order
- Multiple OVER clauses can be used

    SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) sal_n, sal,
           ROW_NUMBER() OVER(ORDER BY comm DESC NULLS LAST) comm_n, comm,
           ename
    FROM emp
    ORDER BY ename;

RANK and DENSE_RANK
- ROW_NUMBER increases even if several rows have identical values
  - Does not handle ties
- RANK and DENSE_RANK handle ties
  - Rows with the same value are given the same rank
  - After the tie value, RANK skips numbers, DENSE_RANK does not
- Ranking with analytic functions performs better than conventional SQL ranking queries, because the table is not read repeatedly
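The numbering rules of the three ranking functions can be sketched in a few lines of Python. This is only an illustration of the semantics, not how Oracle computes them: the function names are hypothetical, and the input is assumed to be already sorted in the order given by the OVER clause. The salaries are the emp values, which contain two ties (3000 and 1250).

```python
# emp salaries sorted as in OVER(ORDER BY sal DESC); 3000 and 1250 are tied.
salaries = [5000, 3000, 3000, 2975, 2850, 2450, 1600,
            1500, 1300, 1250, 1250, 1100, 950, 800]


def row_number(sorted_values):
    """ROW_NUMBER: consecutive numbers, ties get different numbers."""
    return list(range(1, len(sorted_values) + 1))


def rank(sorted_values):
    """RANK: tied rows share a rank, then numbering skips ahead."""
    ranks = []
    for position, value in enumerate(sorted_values, start=1):
        if ranks and value == sorted_values[position - 2]:
            ranks.append(ranks[-1])   # same value as previous row
        else:
            ranks.append(position)    # rank equals the row position
    return ranks


def dense_rank(sorted_values):
    """DENSE_RANK: tied rows share a rank, no numbers are skipped."""
    ranks = []
    for index, value in enumerate(sorted_values):
        if index == 0:
            ranks.append(1)
        elif value == sorted_values[index - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


row_numbers = row_number(salaries)
ranks = rank(salaries)
dense_ranks = dense_rank(salaries)
```

After the two rows tied at 3000, RANK continues at 4 while DENSE_RANK continues at 3.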

RANK and DENSE_RANK (continued)

    SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) rownumber,
           RANK() OVER(ORDER BY sal DESC) rank,
           DENSE_RANK() OVER(ORDER BY sal DESC) denserank,
           sal, ename
    FROM emp
    ORDER BY sal DESC, ename;

- Result columns: ROWNUMBER, RANK, DENSERANK, SAL, ENAME
- Rows returned, in order: KING, FORD, SCOTT, JONES, BLAKE, CLARK, ALLEN, TURNER, MILLER, MARTIN, WARD, ADAMS, JAMES, SMITH
- Multiple OVER clauses may be used, specifying different orderings

Analytic Function in ORDER BY
- Analytic functions are computed before the final ordering
  - Can be referenced in the final ORDER BY clause
  - An alias is used in this case

    SELECT RANK() OVER(ORDER BY sal DESC) sal_rank, sal, ename
    FROM emp
    ORDER BY sal_rank, ename;

    SAL_RANK  SAL ENAME
           1 5000 KING
           2 3000 FORD
           2 3000 SCOTT
           4 2975 JONES
           5 2850 BLAKE
           6 2450 CLARK
           7 1600 ALLEN
           8 1500 TURNER
           9 1300 MILLER
          10 1250 MARTIN
          10 1250 WARD
          12 1100 ADAMS
          13  950 JAMES
          14  800 SMITH

WHERE Conditions
- Analytic (window) functions are computed after the WHERE condition and hence are not available in the WHERE clause

    SELECT RANK() OVER(ORDER BY sal DESC) rank,
           sal,
           ename
    FROM emp
    WHERE RANK() OVER(ORDER BY sal DESC) <= 5
    ORDER BY rank

    WHERE RANK() OVER(ORDER BY sal DESC) <= 5
          *
    ERROR at line 5:
    ORA-30483: window functions are not allowed here

WHERE Conditions (continued)
- Use an inline view to force the early processing of the analytic function

    SELECT *
    FROM (SELECT RANK() OVER(ORDER BY sal DESC) rank, sal, ename
          FROM emp)
    WHERE rank <= 5
    ORDER BY rank, ename;

    RANK  SAL ENAME
       1 5000 KING
       2 3000 FORD
       2 3000 SCOTT
       4 2975 JONES
       5 2850 BLAKE

  - The inline view is processed before the outer WHERE clause
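The restriction and its workaround can be tried without an Oracle instance. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: it needs SQLite 3.25 or later for window functions, and SQLite reports its own error rather than ORA-30483. The table holds only seven of the emp rows, and the alias sal_rank is chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [
    ("KING", 5000), ("FORD", 3000), ("SCOTT", 3000), ("JONES", 2975),
    ("BLAKE", 2850), ("CLARK", 2450), ("ALLEN", 1600),
])

# A window function placed directly in WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT ename FROM emp WHERE RANK() OVER (ORDER BY sal DESC) <= 5")
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# The inline view computes the rank first; the outer WHERE then filters on it.
top5 = conn.execute("""
    SELECT sal_rank, sal, ename
    FROM (SELECT RANK() OVER (ORDER BY sal DESC) AS sal_rank, sal, ename
          FROM emp)
    WHERE sal_rank <= 5
    ORDER BY sal_rank, ename
""").fetchall()
```

The same inline-view pattern applies to HAVING conditions.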

Grouping, Aggregate Functions and Analytics
- Rank the departments by number of employees

    SELECT deptno, COUNT(*) employees,
           RANK() OVER(ORDER BY COUNT(*) DESC) rank
    FROM emp
    GROUP BY deptno
    ORDER BY employees, deptno;

    DEPTNO EMPLOYEES RANK
        10         3    3
        20         5    2
        30         6    1

- Analytic functions are illegal in the HAVING clause
  - The workaround is the same: use an inline view
  - The ordering subclause may not reference a column alias


Partitioning
- Analytic functions can be applied to logical groups within the result set rather than the full result set
  - These groups are called partitions
  - PARTITION BY specifies the grouping
  - ORDER BY specifies the ordering within each group
  - Not connected with database table partitioning

    ... OVER(PARTITION BY mgr ORDER BY sal DESC)

- If partitioning is not specified, the full result set behaves as one partition
- NULL values are grouped together in one partition, as in GROUP BY
- Can have multiple analytic functions with different partitioning subclauses
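As a rough sketch of what PARTITION BY does to a ranking function, the Python below restarts the rank inside each group. It partitions by job rather than mgr, using job and salary values from the emp table; the helper name and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict

# (ename, job, sal) for some of the emp rows.
emp = [
    ("FORD", "ANALYST", 3000), ("SCOTT", "ANALYST", 3000),
    ("MILLER", "CLERK", 1300), ("ADAMS", "CLERK", 1100),
    ("JAMES", "CLERK", 950), ("SMITH", "CLERK", 800),
    ("KING", "PRESIDENT", 5000),
]


def rank_within_partition(rows, key_index, order_index):
    """Mimic RANK() OVER(PARTITION BY key ORDER BY value DESC).

    Returns a dict mapping the first field of each row (ename) to its rank.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)

    result = {}
    for members in partitions.values():
        for row in members:
            # Rank is 1 plus the number of rows in the same partition
            # whose ordering value is strictly higher.
            higher = sum(1 for other in members
                         if other[order_index] > row[order_index])
            result[row[0]] = higher + 1
    return result


job_rank = rank_within_partition(emp, key_index=1, order_index=2)
```

Each job starts again at rank 1, and the two analysts tie for first place within their own partition.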

Partitioning Example
- Rank employees by salary within their manager

    SELECT ename, mgr, sal,
           RANK() OVER(PARTITION BY mgr ORDER BY sal DESC) m_rank
    FROM emp
    ORDER BY mgr, m_rank;

- Result columns: ENAME, MGR, SAL, M_RANK
- Rows returned, in order: SCOTT, FORD, ALLEN, TURNER, WARD, MARTIN, JAMES, MILLER, ADAMS, JONES, BLAKE, CLARK, SMITH, KING

Result Sets With Different Partitioning
- Rank the employees by salary within their manager, within the year they were hired, as well as overall

    SELECT ename, sal, mgr,
           RANK() OVER(PARTITION BY mgr ORDER BY sal DESC) m_rank,
           TRUNC(TO_NUMBER(TO_CHAR(hiredate,'YYYY'))) year_hired,
           RANK() OVER(PARTITION BY TRUNC(TO_NUMBER(TO_CHAR(hiredate,'YYYY')))
                       ORDER BY sal DESC) d_rank,
           RANK() OVER(ORDER BY sal DESC) rank
    FROM emp
    ORDER BY rank, ename;

Result Sets With Different Partitioning (continued)
- Result columns: ENAME, SAL, MGR, M_RANK, YEAR_HIRED, D_RANK, RANK
- Rows returned, in order: KING, FORD, SCOTT, JONES, BLAKE, CLARK, ALLEN, TURNER, MILLER, MARTIN, WARD, ADAMS, JAMES, SMITH

Hypothetical Rank
- Rank a specified hypothetical value (2999) in a group ('what-if' query)

    SELECT RANK(2999) WITHIN GROUP (ORDER BY sal DESC) H_S_rank,
           PERCENT_RANK(2999) WITHIN GROUP (ORDER BY sal DESC) PR,
           CUME_DIST(2999) WITHIN GROUP (ORDER BY sal DESC) CD
    FROM emp;

  - Result columns: H_S_RANK, PR, CD, where PR is 3/14 and CD is 4/15

    SELECT deptno,
           RANK(20,'CLERK') WITHIN GROUP (ORDER BY deptno DESC, job ASC) H_D_J_rank
    FROM emp
    GROUP BY deptno;

    DEPTNO H_D_J_RANK
        10          1
        20          3
        30          7

  - Department 10: a clerk in 20 would be higher than anyone in 10
  - Department 20: a clerk would be third in ascending job order (below analysts)
  - Department 30: a clerk in 20 would be lower than anyone in 30 (6 employees)
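The three hypothetical-value results for 2999 can be checked by hand with a short Python sketch. The function names are hypothetical and the arithmetic follows the usual definitions for a descending order (rank is one more than the number of higher values, PERCENT_RANK divides rank minus one by the number of existing rows, CUME_DIST counts the hypothetical row itself); treat it as an illustration rather than Oracle's implementation.

```python
from fractions import Fraction

# The 14 emp salaries.
salaries = [5000, 3000, 3000, 2975, 2850, 2450, 1600,
            1500, 1300, 1250, 1250, 1100, 950, 800]


def hypothetical_rank_desc(values, candidate):
    """Rank the candidate would get if inserted, ordering descending."""
    return 1 + sum(1 for v in values if v > candidate)


def hypothetical_percent_rank_desc(values, candidate):
    """(rank - 1) divided by the number of existing rows."""
    return Fraction(hypothetical_rank_desc(values, candidate) - 1, len(values))


def hypothetical_cume_dist_desc(values, candidate):
    """Rows at or before the candidate (including it) over rows plus one."""
    at_or_before = sum(1 for v in values if v >= candidate) + 1
    return Fraction(at_or_before, len(values) + 1)


h_s_rank = hypothetical_rank_desc(salaries, 2999)
pr = hypothetical_percent_rank_desc(salaries, 2999)
cd = hypothetical_cume_dist_desc(salaries, 2999)
```

Three salaries (5000, 3000, 3000) exceed 2999, so the hypothetical row ranks fourth.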

Frequent Itemsets (dbms_frequent_itemset)
- Typical question
  - When a customer buys product x, how likely are they to also buy product y?

    SELECT CAST(itemset AS fi_char) itemset, support, length, total_tranx
    FROM TABLE(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL(
           CURSOR(SELECT TO_CHAR(sales.cust_id), TO_CHAR(sales.prod_id)
                  FROM sh.sales, sh.products
                  WHERE products.prod_id = sales.prod_id
                  AND products.prod_subcategory = 'Documentation'),
           0.5, 2, 3, NULL, NULL));

- Arguments after the cursor
  - 0.5: minimum fraction of different 'Documentation' customers having this combination
  - 2: minimum items in set
  - 3: maximum items in set (so 2 or 3 items per set)
  - NULL: include items
  - NULL: exclude items
- Result columns: ITEMSET, SUPPORT (number of instances), LENGTH, TOTAL_TRANX (number of different customers)
- Itemsets returned: FI_CHAR('40', '41'), FI_CHAR('40', '42'), FI_CHAR('40', '45'), FI_CHAR('41', '42'), FI_CHAR('40', '41', '42')

Frequent Itemsets (continued)
- Need to create a type to accommodate the set
  - Ranking functions can be applied to the itemset

    CREATE TYPE fi_char AS TABLE OF VARCHAR2(100);

- Itemsets containing certain items can be included/excluded

    ,CURSOR(SELECT * FROM table(fi_char(40,45)))   -- include any sets involving 40 or 45
    ,CURSOR(SELECT * FROM table(fi_char(42)))      -- exclude any sets involving 42

- The total transactions (TOTAL_TRANX) is the number of different customers involved with any product within the set of products under examination

    SELECT COUNT(DISTINCT cust_id)
    FROM sales
    WHERE prod_id BETWEEN 40 AND 45;   -- prod_ids for 'Documentation'

Plan of Itemset Query
- Only one full table scan of sales

    | Id | Operation                   | Name                       | Rows  |
    |  0 | SELECT STATEMENT            |                            |     8 |
    |  1 | FIC RECURSIVE ITERATION     |                            |       |
    |  2 | FIC LOAD ITEMSETS           |                            |       |
    |  3 | FREQUENT ITEMSET COUNTING   |                            |     8 |
    |  4 | SORT GROUP BY NOSORT        |                            |       |
    |  5 | BITMAP CONVERSION COUNT     |                            |       |
    |  6 | FIC LOAD BITMAPS            |                            |       |
    |  7 | SORT CREATE INDEX           |                            |   500 |
    |  8 | BITMAP CONSTRUCTION         |                            |       |
    |  9 | FIC ENUMERATE FEED          |                            |       |
    | 10 | SORT ORDER BY               |                            | 43755 |
    |*11 | HASH JOIN                   |                            | 43755 |
    | 12 | TABLE ACCESS BY INDEX ROWID | PRODUCTS                   |     3 |
    |*13 | INDEX RANGE SCAN            | PRODUCTS_PROD_SUBCAT_IX    |     3 |
    | 14 | PARTITION RANGE ALL         |                            |  918K |
    | 15 | TABLE ACCESS FULL           | SALES                      |  918K |
    | 16 | TABLE ACCESS FULL           | SYS_TEMP_0FD9D6605_153B1EE |       |

Applying Analytics to Frequent Itemsets

    SELECT itemset, support, length, total_tranx, rnk
    FROM (SELECT itemset, support, length, total_tranx,
                 RANK() OVER (PARTITION BY length ORDER BY support DESC) rnk
          FROM (SELECT CAST(itemset AS fi_char) itemset, support, length, total_tranx
                FROM TABLE(dbms_frequent_itemset.fi_transactional(
                       CURSOR(SELECT TO_CHAR(sales.cust_id), TO_CHAR(sales.prod_id)
                              FROM sh.sales, sh.products
                              WHERE products.prod_id = sales.prod_id
                              AND products.prod_subcategory = 'Documentation'),
                       0.5, 2, 3, NULL, NULL))))
    WHERE rnk < 4;

- Result columns: ITEMSET, SUPPORT, LENGTH, TOTAL_TRANX, RNK
- Itemsets returned: FI_CHAR('40', '42'), FI_CHAR('40', '41'), FI_CHAR('40', '45'), FI_CHAR('40', '41', '42')


Expanding Windows
- The window lies within a partition (or the entire result set when there is no partitioning)

    OVER (ORDER BY col_name ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

- Produces an expanding window; the default window setting is also an expanding window, although it is based on RANGE rather than ROWS

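Sliding Windows
- The window lies within a partition (or the entire result set when there is no partitioning)

    OVER (ORDER BY col_name ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)

- Produces a sliding window
  - The window normally covers 5 rows
  - At the edge of a partition it shrinks, for example to 3 rows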

Aggregate Functions
- Aggregate functions can be used as analytic functions
  - Must be followed by an OVER clause
- Analytic aggregate values can be easily included within row-level reports
  - Analytic functions are applied after computation of the result set
  - The optimizer often produces a better execution plan
- Aggregate level is determined by the partitioning subclause
  - Similar effect to the GROUP BY clause
  - If there is no partitioning subclause, the aggregate is across the complete result set

Aggregate Functions: the OVER Clause
- Conventional aggregation returns one row per group

    SELECT deptno, AVG(sal)
    FROM emp
    GROUP BY deptno;

  - Result columns: DEPTNO, AVG(SAL)
- Analytic aggregates cause no reduction in rows

    SELECT deptno,
           AVG(sal) OVER (PARTITION BY deptno) avg_dept,
           AVG(sal) OVER () avg_all      -- no subclause
    FROM emp;

  - Result columns: DEPTNO, AVG_DEPT, AVG_ALL
  - Could easily include row-level data, e.g. ename and sal
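The contrast between the two queries can be reproduced with Python's built-in sqlite3 module. Using SQLite in place of Oracle is an assumption (SQLite 3.25 or later is needed for the OVER clause), and the table holds only five of the emp rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER, deptno INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("CLARK", 2450, 10), ("KING", 5000, 10), ("MILLER", 1300, 10),
    ("SMITH", 800, 20), ("ADAMS", 1100, 20),
])

# GROUP BY collapses the five rows into one row per department.
grouped = conn.execute("""
    SELECT deptno, AVG(sal)
    FROM emp
    GROUP BY deptno
    ORDER BY deptno
""").fetchall()

# The analytic form keeps all five rows and attaches both averages to each.
analytic = conn.execute("""
    SELECT ename, sal, deptno,
           AVG(sal) OVER (PARTITION BY deptno) AS avg_dept,
           AVG(sal) OVER () AS avg_all
    FROM emp
    ORDER BY deptno, ename
""").fetchall()
```

Every row of the analytic result carries its own department average next to the overall average, so row-level and aggregate data sit side by side.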

Analytic versus Conventional SQL Performance
- The requirement
  - Data at different levels of grouping
- Result columns: ENAME, SAL, DEPTNO, AVG_DEPT (average sal per department), AVG_ALL (overall average sal)
- Rows returned, in order: CLARK, KING, MILLER, JONES, FORD, ADAMS, SMITH, SCOTT, WARD, TURNER, ALLEN, JAMES, BLAKE, MARTIN

Conventional SQL Performance

    SELECT r.ename, r.sal, g.deptno, g.ave_dept, a.ave_all
    FROM emp r,
         (SELECT deptno, AVG(sal) ave_dept FROM emp GROUP BY deptno) g,
         (SELECT AVG(sal) ave_all FROM emp) a
    WHERE g.deptno = r.deptno
    ORDER BY r.deptno;

    | Id  | Operation         | Name | Rows |
    |   0 | SELECT STATEMENT  |      |   15 |
    |   1 | MERGE JOIN        |      |   15 |
    |   2 | SORT JOIN         |      |    3 |
    |   3 | NESTED LOOPS      |      |    3 |
    |   4 | VIEW              |      |    1 |
    |   5 | SORT AGGREGATE    |      |    1 |
    |   6 | TABLE ACCESS FULL | EMP  |   14 |
    |   7 | VIEW              |      |    3 |
    |   8 | SORT GROUP BY     |      |    3 |
    |   9 | TABLE ACCESS FULL | EMP  |   14 |
    |* 10 | SORT JOIN         |      |   14 |
    |  11 | TABLE ACCESS FULL | EMP  |   14 |

M row emp table : seconds consistent gets

Analytic Function Performance

    SELECT ename, sal, deptno,
           AVG(sal) OVER (PARTITION BY deptno) ave_dept,
           AVG(sal) OVER () ave_all
    FROM emp;

    | Id | Operation         | Name | Rows |
    |  0 | SELECT STATEMENT  |      |   14 |
    |  1 | WINDOW SORT       |      |   14 |
    |  2 | TABLE ACCESS FULL | EMP  |   14 |

M row emp table : seconds consistent gets

Aggregating Over an Ordered Set of Rows: Running Totals
- The ORDER BY clause creates an expanding window (running total) of rows

    SELECT empno, ename, sal,
           SUM(sal) OVER(ORDER BY empno) run_total
    FROM emp5
    ORDER BY empno;

- Result columns: EMPNO, ENAME, SAL, RUN_TOTAL
- First rows returned, in order: SMITH, ALLEN, WARD, JONES, MARTIN, BLAKE, CLARK, SCOTT, KING, TURNER, ADAMS, JAMES, FORD, MILLER, ...

    | Id | Operation         | Name |
    |  0 | SELECT STATEMENT  |      |
    |  1 | WINDOW SORT       |      |
    |  2 | TABLE ACCESS FULL | EMP5 |

- emp table of 5000 rows: 0.07 seconds, 33 consistent gets
- No index necessary

Running Total With Conventional SQL (1)
- Self-join solution

    SELECT e1.empno, e1.sal, SUM(e2.sal)
    FROM emp5 e1, emp5 e2
    WHERE e2.empno <= e1.empno
    GROUP BY e1.empno, e1.sal
    ORDER BY e1.empno;

    | Id | Operation                   | Name    |
    |  0 | SELECT STATEMENT            |         |
    |  1 | SORT GROUP BY               |         |
    |  2 | MERGE JOIN                  |         |
    |  3 | SORT JOIN                   |         |
    |  4 | TABLE ACCESS BY INDEX ROWID | EMP5    |
    |  5 | INDEX FULL SCAN             | PK_EMP5 |
    |* 6 | SORT JOIN                   |         |
    |  7 | TABLE ACCESS FULL           | EMP5    |

seconds 66 consistent gets

Running Total With Conventional SQL (2)
- Subquery in SELECT list solution (column expression)

    SELECT empno, ename, sal,
           (SELECT SUM(sal) sumsal
            FROM emp5
            WHERE empno <= b.empno) a
    FROM emp5 b
    ORDER BY empno;

    | Id | Operation                   | Name    |
    |  0 | SELECT STATEMENT            |         |
    |  1 | SORT AGGREGATE              |         |
    |  2 | TABLE ACCESS BY INDEX ROWID | EMP5    |
    |* 3 | INDEX RANGE SCAN            | PK_EMP5 |
    |  4 | TABLE ACCESS BY INDEX ROWID | EMP5    |
    |  5 | INDEX FULL SCAN             | PK_EMP5 |

seconds consistent gets
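Why the analytic running total scales better than the two conventional formulations can be illustrated with a small Python sketch. It is only an analogy for the amount of work involved, not a model of Oracle's execution plans; the function names are hypothetical and the salaries are the 14 emp rows in empno order.

```python
from itertools import accumulate

# emp salaries in empno order (SMITH, ALLEN, WARD, ..., MILLER).
salaries = [800, 1600, 1250, 2975, 1250, 2850, 2450,
            3000, 5000, 1500, 1100, 950, 3000, 1300]


def running_total_single_pass(values):
    """Like SUM(sal) OVER(ORDER BY empno): one pass over the sorted rows."""
    return list(accumulate(values))


def running_total_self_join(values):
    """Like the self-join: each row re-reads every row up to itself."""
    totals = []
    pairs_visited = 0
    for i in range(len(values)):
        total = 0
        for j in range(i + 1):
            total += values[j]
            pairs_visited += 1
        totals.append(total)
    return totals, pairs_visited


single_pass = running_total_single_pass(salaries)
self_join, pairs = running_total_self_join(salaries)
```

Both approaches give the same totals, but the self-join style combines 105 row pairs for 14 rows, and that count grows quadratically with the table size.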

Aggregate Functions With Partitioning
- Find average salary of employees within each manager
  - Use PARTITION BY to specify the grouping

    SELECT ename, mgr, sal,
           ROUND(AVG(sal) OVER(PARTITION BY mgr)) avgsal,
           sal - ROUND(AVG(sal) OVER(PARTITION BY mgr)) diff
    FROM emp;

- Result columns: ENAME, MGR, SAL, AVGSAL, DIFF
- Rows returned, in order: SCOTT, FORD, ALLEN, WARD, JAMES, TURNER, MARTIN, MILLER, ADAMS, JONES, CLARK, BLAKE, SMITH, KING

Analytics on Aggregates
- Analytics are processed last

    SELECT deptno, SUM(sal),
           SUM(SUM(sal)) OVER () Totsal,
           SUM(SUM(sal)) OVER (ORDER BY deptno) Runtot_deptno,
           SUM(SUM(sal)) OVER (ORDER BY SUM(sal)) Runtot_sumsal
    FROM emp
    GROUP BY deptno
    ORDER BY deptno;

    DEPTNO SUM(SAL) TOTSAL RUNTOT_DEPTNO RUNTOT_SUMSAL
        10     8750  29025          8750          8750
        20    10875  29025         19625         29025
        30     9400  29025         29025         18150

- RUNTOT_DEPTNO accumulates the department sums in deptno order: sum(10), + sum(20), + sum(30)

Aggregate Functions and the WHERE clause
- Analytic functions are applied after production of the complete result set
  - Rows excluded by the WHERE clause are not included in the aggregate value
- Include only employees whose name starts with 'S' or 'M'
  - The average is now only for those rows starting with 'S' or 'M'

    SELECT ename, sal,
           ROUND(AVG(sal) OVER()) avgsal,
           sal - ROUND(AVG(sal) OVER()) diff
    FROM emp
    WHERE ename LIKE 'S%' OR ename LIKE 'M%';

- Result columns: ENAME, SAL, AVGSAL, DIFF
- Rows returned, in order: SMITH, MARTIN, SCOTT, MILLER

RATIO_TO_REPORT
- Each row's fraction of total salary can easily be found when the total salary value is available
  - Example: sal/SUM(sal) OVER()
  - The function RATIO_TO_REPORT performs this calculation

    SELECT ename, sal,
           SUM(sal) OVER() sumsal,
           sal/SUM(sal) OVER() ratio,
           RATIO_TO_REPORT(sal) OVER() ratio_rep
    FROM emp;

RATIO_TO_REPORT (continued)
- The query on the previous slide gives this result
  - Result columns: ENAME, SAL, SUMSAL, RATIO, RATIO_REP
  - Rows returned, in order: SMITH, ALLEN, WARD, JONES, MARTIN, BLAKE, CLARK, SCOTT, KING, TURNER, ADAMS, JAMES, FORD, MILLER
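The calculation behind RATIO_TO_REPORT is simply each value divided by the total of its report set. A minimal Python sketch, assuming no NULL salaries and using a hypothetical helper name, with the three department 10 salaries as input:

```python
def ratio_to_report(values):
    """Each value as a fraction of the total, like sal / SUM(sal) OVER()."""
    total = sum(values)
    return [value / total for value in values]


# Department 10 salaries: MILLER, CLARK, KING.
dept10_salaries = [1300, 2450, 5000]
ratios = ratio_to_report(dept10_salaries)
```

The ratios of one report set always add up to 1, which makes the function convenient for percentage-of-total columns.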


Sliding Windows
- The OVER clause can have a sliding window subclause
  - Not permitted without an ORDER BY subclause
  - Specifies the size of the window (set of rows) to be processed by the analytic function
  - The window is defined relative to the current row, and slides through the result set as different rows become current
- Size of window is governed by ROWS or RANGE
  - ROWS: physical offset, a number of rows relative to the current row
  - RANGE: logical offset, a value interval relative to the value in the current row
- Syntax for sliding window:
    {ROWS | RANGE} BETWEEN starting_point AND ending_point

Sliding Windows Example
- For each employee, show the sum of the salaries of the preceding, current, and following employee (row)
  - Window includes the current row as well as the preceding and following ones
  - Must have an order subclause for "preceding" and "following" to be meaningful
  - First row has no preceding row and last row has no following row

    SELECT ename, sal,
           SUM(sal) OVER(ORDER BY sal DESC
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sal_window
    FROM emp
    ORDER BY sal DESC, ename;

Sliding Windows Example (continued)
- Result columns: ENAME, SAL, SAL_WINDOW
- Rows returned, in order: KING, FORD, SCOTT, JONES, BLAKE, CLARK, ALLEN, TURNER, MILLER, MARTIN, WARD, ADAMS, JAMES, SMITH
- Calculation: each SAL_WINDOW value is the sum of the salaries in the preceding, current, and following rows
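The per-row calculation for a physical (ROWS) window can be sketched in Python as a slice around the current position. The helper name is hypothetical, the input is assumed to be sorted in the OVER clause order already, and only the five highest emp salaries are used.

```python
def rows_window_sum(sorted_values, preceding, following):
    """Mimic SUM(...) OVER(... ROWS BETWEEN n PRECEDING AND m FOLLOWING)."""
    totals = []
    for i in range(len(sorted_values)):
        # The window is clipped at the first and last rows.
        start = max(0, i - preceding)
        end = min(len(sorted_values), i + following + 1)
        totals.append(sum(sorted_values[start:end]))
    return totals


# Five highest salaries, sorted descending: KING, FORD, SCOTT, JONES, BLAKE.
top_salaries = [5000, 3000, 3000, 2975, 2850]
sal_window = rows_window_sum(top_salaries, preceding=1, following=1)
```

The first and last values are sums of two salaries only, because those rows have no preceding or following neighbour within this five-row list.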

Partitioned Sliding Windows
- Partitioning can be used with sliding windows
  - A sliding window does not span partitions

    SELECT ename, job, sal,
           SUM(sal) OVER(PARTITION BY job ORDER BY sal DESC
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sal_window
    FROM emp
    ORDER BY job, sal DESC, ename;

Partitioned Sliding Windows (continued)
- Result columns: ENAME, JOB, SAL, SAL_WINDOW
- Rows returned, in order:
  - ANALYST: FORD, SCOTT
  - CLERK: MILLER, ADAMS, JAMES, SMITH
  - MANAGER: JONES, BLAKE, CLARK
  - PRESIDENT: KING (window sum = 5000, the only row in its partition)
  - SALESMAN: ALLEN, TURNER, MARTIN, WARD

Sliding Window With Logical (RANGE) Offset
- Physical offset
  - Specified number of rows
- Logical offset
  - A RANGE of values, numeric or date
  - Values in the ordering column indirectly determine the number of rows in the window

    SELECT ename, sal,
           SUM(sal) OVER(ORDER BY sal DESC
                         RANGE BETWEEN 150 PRECEDING AND 75 FOLLOWING) sal_window
    FROM emp
    ORDER BY sal DESC, ename;

Sliding Window With Logical (RANGE) Offset (continued)
- Result columns: ENAME, SAL, SAL_WINDOW
- Rows returned, in order: KING, FORD, SCOTT, JONES, BLAKE, CLARK, ALLEN, TURNER, MILLER, MARTIN, WARD, ADAMS, JAMES, SMITH
- For the BLAKE row (sal 2850), the range is 3000 to 2775
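With a logical offset the window is chosen by value rather than by position. For a descending sort, 150 PRECEDING reaches up to sal + 150 and 75 FOLLOWING reaches down to sal - 75, which gives the 3000 to 2775 range quoted for the 2850 row. The Python below is a sketch of that rule under those assumptions; the helper name is hypothetical.

```python
# The 14 emp salaries, sorted descending.
salaries = [5000, 3000, 3000, 2975, 2850, 2450, 1600,
            1500, 1300, 1250, 1250, 1100, 950, 800]


def range_window_sum_desc(values, preceding, following):
    """Mimic SUM(v) OVER(ORDER BY v DESC RANGE BETWEEN p PRECEDING AND f FOLLOWING).

    For each value v, the window holds every value w with
    v - following <= w <= v + preceding.
    """
    return [
        sum(w for w in values if v - following <= w <= v + preceding)
        for v in values
    ]


sal_window = range_window_sum_desc(salaries, preceding=150, following=75)
```

For the 2850 row the window holds 3000, 3000, 2975 and 2850, so the number of rows in the window depends on how closely the salaries are clustered.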

UNBOUNDED and CURRENT ROW
- Sliding windows have starting and ending points
  - BETWEEN starting_point AND ending_point
- Ways of specifying starting and ending points
  - UNBOUNDED PRECEDING specifies the first row as starting point
  - UNBOUNDED FOLLOWING specifies the last row as ending point
  - CURRENT ROW specifies the current row
- Create a window that grows with each row in ename order
  - The RANGE clause is not necessary if a running total is required (default)

    SELECT ename, sal,
           SUM(sal) OVER(ORDER BY ename
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) run_total
    FROM emp
    ORDER BY ename;

Keywords UNBOUNDED and CURRENT ROW (continued)
- Running total
  - Produced by the default 'expanding' window when no window is specified

    ENAME   SAL RUN_TOTAL
    ADAMS  1100      1100
    ALLEN  1600      2700
    BLAKE  2850      5550
    CLARK  2450      8000
    FORD   3000     11000
    JAMES   950     11950
    JONES  2975     14925
    KING   5000     19925
    MARTIN 1250     21175
    MILLER 1300     22475
    SCOTT  3000     25475
    SMITH   800     26275
    TURNER 1500     27775
    WARD   1250     29025

- Explanation: each RUN_TOTAL is the previous RUN_TOTAL plus the current SAL, starting at 1100

Keywords UNBOUNDED and CURRENT ROW (continued)
- Be aware of the subtle difference between RANGE and ROWS in this context
  - Apparent only when adjacent rows have equal values

    SELECT ename, sal,
           SUM(sal) OVER(ORDER BY sal DESC
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) row_tot,
           SUM(sal) OVER(ORDER BY sal DESC
                         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) range_tot,
           SUM(sal) OVER(ORDER BY sal DESC) default_tot
    FROM emp
    ORDER BY sal DESC, ename;

Difference between ROWS and RANGE
- Ford and Scott fall within the same range; this also applies to Martin and Ward
  - For example, Scott is included in the range when the value for Ford is calculated
- Result columns: ENAME, SAL, ROW_TOT, RANGE_TOT, DEFAULT_TOT
- Rows returned, in order: KING, FORD, SCOTT, JONES, BLAKE, CLARK, ALLEN, TURNER, MILLER, MARTIN, WARD, ADAMS, JAMES, SMITH
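The effect of ties on the two kinds of running total can be sketched in Python. The ROWS version stops at the current position, while the RANGE version (which is also the default window) includes every row whose ordering value equals the current one. The function names are hypothetical and the input is assumed to be sorted descending.

```python
from itertools import accumulate

# Four highest salaries, sorted descending: KING, FORD, SCOTT, JONES.
salaries = [5000, 3000, 3000, 2975]


def running_total_rows(sorted_desc):
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: stop at this row."""
    return list(accumulate(sorted_desc))


def running_total_range(sorted_desc):
    """RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: include all peers.

    With a descending sort, every value greater than or equal to the
    current value lies at or before the current row's peer group.
    """
    return [sum(w for w in sorted_desc if w >= v) for v in sorted_desc]


row_tot = running_total_rows(salaries)
range_tot = running_total_range(salaries)
```

The two results differ only on the first of the tied 3000 rows: ROWS gives 8000 there, while RANGE already includes its peer and gives 11000.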

Time Intervals
- Sliding windows are often based on time intervals
- Example: compare the salary of each employee to the maximum and minimum salaries of hirings made within three months of their own hiring date

    SELECT ename, hiredate, sal,
           MIN(sal) OVER(ORDER BY hiredate
                         RANGE BETWEEN INTERVAL '3' MONTH PRECEDING
                                   AND INTERVAL '3' MONTH FOLLOWING) min,
           MAX(sal) OVER(ORDER BY hiredate
                         RANGE BETWEEN INTERVAL '3' MONTH PRECEDING
                                   AND INTERVAL '3' MONTH FOLLOWING) max
    FROM emp;

Time Intervals (continued)
- Sliding time window
- Result columns: ENAME, HIREDATE, SAL, MIN, MAX
- Rows returned, in hiredate order: SMITH (17-DEC), ALLEN (20-FEB), WARD (22-FEB), JONES (02-APR), BLAKE (01-MAY), CLARK (09-JUN), TURNER (08-SEP), MARTIN (28-SEP), KING (17-NOV), JAMES (03-DEC), FORD (03-DEC), MILLER (23-JAN), SCOTT (09-DEC), ADAMS (12-JAN)


LAG and LEAD Functions
- Useful for comparing values across rows
  - Need to specify the count of rows which separate the target row from the current row
  - No need for a self-join
  - LAG provides access to a row at a given offset prior to the current position
  - LEAD provides access to a row at a given offset after the current position

    {LAG | LEAD} ( value_expr [, offset] [, default] )
        OVER ( [query_partition_clause] order_by_clause )

  - offset is an optional parameter and defaults to 1
  - default is an optional parameter and is the value returned if offset falls outside the bounds of the table or partition
  - NULL is returned in this case if no default is specified
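The offset and default parameters can be illustrated with two small Python helpers. They are a sketch of the semantics for a single partition that is already sorted, with hypothetical names, a non-negative offset assumed, and None standing in for NULL.

```python
def lag(sorted_values, offset=1, default=None):
    """Value from `offset` rows before each row, or `default` if none exists."""
    return [
        sorted_values[i - offset] if i - offset >= 0 else default
        for i in range(len(sorted_values))
    ]


def lead(sorted_values, offset=1, default=None):
    """Value from `offset` rows after each row, or `default` if none exists."""
    return [
        sorted_values[i + offset] if i + offset < len(sorted_values) else default
        for i in range(len(sorted_values))
    ]


# Salaries of the first four hires in hiredate order: SMITH, ALLEN, WARD, JONES.
salaries = [800, 1600, 1250, 2975]
lag1 = lag(salaries)
lead1 = lead(salaries)
lag2_zero = lag(salaries, offset=2, default=0)
```

The first row has no predecessor and the last row has no successor, so those positions receive the default.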

LAG / LEAD Simple Example
- Comparison of salaries with those for nearest recruits in terms of proximity of hiredates

    SELECT hiredate, sal AS salary,
           LAG(sal,1) OVER (ORDER BY hiredate) AS LAG1,
           LEAD(sal,1) OVER (ORDER BY hiredate) AS LEAD1
    FROM emp;

- Result columns: HIREDATE, SALARY, LAG1, LEAD1
- Rows are returned in hiredate order

62 FIRST_VALUE and LAST_VALUE
- Hold the first or last value in a partition (based on ordering) as a start point
- Work with partitioning and windowing subclauses

SELECT empno, deptno, hiredate,
       FIRST_VALUE(hiredate) OVER (PARTITION BY deptno ORDER BY hiredate) firstdate,
       hiredate - FIRST_VALUE(hiredate)
                  OVER (PARTITION BY deptno ORDER BY hiredate) Day_Gap
FROM   emp
ORDER  BY deptno, Day_Gap;

Output columns: EMPNO, DEPTNO, HIREDATE, FIRSTDATE, DAY_GAP
- DAY_GAP is the number of days after the hiring of the first employee in this department
- FIRSTDATE falls in JUN for the first department listed, DEC for the second and FEB for the third

63 Influence of Window on LAST_VALUE

SELECT deptno, ename, sal,
       LAST_VALUE(ename) OVER (PARTITION BY deptno ORDER BY sal
                               ROWS BETWEEN UNBOUNDED PRECEDING
                                        AND UNBOUNDED FOLLOWING) AS hsal1,
       LAST_VALUE(ename) OVER (PARTITION BY deptno ORDER BY sal) AS hsal2
FROM   emp
ORDER  BY deptno, sal;

DEPTNO ENAME    SAL  HSAL1  HSAL2
    10 MILLER  1300  KING   MILLER
    10 CLARK   2450  KING   CLARK
    10 KING    5000  KING   KING
    20 SMITH    800  SCOTT  SMITH
    20 ADAMS   1100  SCOTT  ADAMS
    20 JONES   2975  SCOTT  JONES
    20 FORD    3000  SCOTT  SCOTT
    20 SCOTT   3000  SCOTT  SCOTT
    30 JAMES    950  BLAKE  JAMES
    30 MARTIN  1250  BLAKE  WARD
    30 WARD    1250  BLAKE  WARD
    30 TURNER  1500  BLAKE  TURNER
    30 ALLEN   1600  BLAKE  ALLEN
    30 BLAKE   2850  BLAKE  BLAKE

- HSAL1: last value over the whole partition
- HSAL2: last value in the expanding window (based on RANGE, so rows with equal sal, such as FORD/SCOTT and MARTIN/WARD, share the same last value)

64 Ignoring Nulls in First and Last Values

SELECT ename,
       FIRST_VALUE(ename) OVER (PARTITION BY deptno ORDER BY ename) fv,
       LAST_VALUE(ename)  OVER (PARTITION BY deptno ORDER BY ename) lv,
       comm,
       FIRST_VALUE(comm)  OVER (PARTITION BY deptno ORDER BY comm) fv_comm,
       LAST_VALUE(comm)   OVER (PARTITION BY deptno ORDER BY comm) lv_comm,
       LAST_VALUE(comm IGNORE NULLS)
                          OVER (PARTITION BY deptno ORDER BY comm) lv_ignore
FROM   emp
WHERE  deptno = 30;

Output columns: ENAME, FV, LV, COMM, FV_COMM, LV_COMM, LV_IGNORE

ENAME   FV     LV
ALLEN   ALLEN  ALLEN
BLAKE   ALLEN  BLAKE
JAMES   ALLEN  JAMES
MARTIN  ALLEN  MARTIN
TURNER  ALLEN  TURNER
WARD    ALLEN  WARD

- With IGNORE NULLS, the highest value (1400) is 'kept' for null values
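A common use of IGNORE NULLS is to carry the most recent non-null value forward. The sketch below is an illustration under assumptions, not taken from the slides: it orders emp by hiredate (with empno as a tie-breaker so the ROWS window is deterministic) and reports the last known commission up to each row. The alias last_known_comm is hypothetical.

```sql
-- Sketch only: carries the latest non-null comm forward in hiredate order.
SELECT ename, hiredate, comm,
       LAST_VALUE(comm IGNORE NULLS)
         OVER (ORDER BY hiredate, empno
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_known_comm
FROM   emp
ORDER  BY hiredate, empno;
```

Rows that precede the first non-null commission still show NULL, because there is nothing yet to carry forward.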

65 NTH_VALUE
- Reports the difference between the first and second member of each partition
- Could use FROM LAST

SELECT deptno, ename, sal,
       FIRST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal DESC)
       - NTH_VALUE(sal,2) FROM FIRST
           OVER (PARTITION BY deptno ORDER BY sal DESC) t2_diff
FROM   emp;

Output columns: DEPTNO, ENAME, SAL, T2_DIFF
Rows: KING, CLARK, MILLER, SCOTT, FORD, JONES, ADAMS, SMITH, BLAKE, ALLEN, TURNER, MARTIN, WARD, JAMES
Slide callout on the output: 0??

The same query using the third member of each partition:

SELECT deptno, ename, sal,
       FIRST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal DESC)
       - NTH_VALUE(sal,3) FROM FIRST
           OVER (PARTITION BY deptno ORDER BY sal DESC) t2_diff
FROM   emp;

Output columns: DEPTNO, ENAME, SAL, T2_DIFF
Rows: KING, CLARK, MILLER, SCOTT, FORD, JONES, ADAMS, SMITH, BLAKE, ALLEN, TURNER, MARTIN, WARD, JAMES
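One point worth illustrating: with the default window (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), the top row of a partition sees only itself and any peers with an equal sal, so NTH_VALUE(sal,2) can be NULL there. The sketch below, offered as an assumption-labelled variation rather than the author's query, widens the window to the whole partition so that every row reports the same difference. The alias top2_diff is hypothetical.

```sql
-- Sketch only: explicit whole-partition window for NTH_VALUE.
SELECT deptno, ename, sal,
       FIRST_VALUE(sal) OVER (PARTITION BY deptno ORDER BY sal DESC)
       - NTH_VALUE(sal, 2) FROM FIRST
           OVER (PARTITION BY deptno ORDER BY sal DESC
                 ROWS BETWEEN UNBOUNDED PRECEDING
                          AND UNBOUNDED FOLLOWING) AS top2_diff
FROM   emp
ORDER  BY deptno, sal DESC;
```

Where the two highest salaries in a department are equal, the difference is 0 for every row of that department.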

66 LISTAGG Function
- Example: show the columns in indexes as an ordered list

SELECT table_name, index_name,
       LISTAGG(column_name, ';')
         WITHIN GROUP (ORDER BY column_position) "Column List"
FROM   user_ind_columns
GROUP  BY table_name, index_name;

TABLE_NAME  INDEX_NAME         Column List
EMP         EMP_PK             EMPNO
PROJ_ASST   SYS_C              PROJNO;EMPNO;START_DATE
DEPT        DEPT$DIVNO_DEPTNO  DIVNO;DEPTNO
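For comparison, here is a hedged sketch of LISTAGG against the emp table, first as a grouped aggregate and then in its analytic form with an OVER clause, which keeps one row per employee. The aliases employees and colleagues are illustrative.

```sql
-- Sketch only: aggregate form, one row per department.
SELECT deptno,
       LISTAGG(ename, ', ') WITHIN GROUP (ORDER BY ename) AS employees
FROM   emp
GROUP  BY deptno
ORDER  BY deptno;

-- Sketch only: analytic form, one row per employee with the full department list.
SELECT ename, deptno,
       LISTAGG(ename, ', ') WITHIN GROUP (ORDER BY ename)
         OVER (PARTITION BY deptno) AS colleagues
FROM   emp
ORDER  BY deptno, ename;
```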

67 FIRST and LAST
- Compare each employee's salary with the average salary of the first year of hirings in their department
  - Must use KEEP
  - Must use DENSE_RANK

SELECT empno, deptno, TO_CHAR(hiredate,'YYYY') Hire_Yr, sal,
       TRUNC(AVG(sal) KEEP (DENSE_RANK FIRST
                            ORDER BY TO_CHAR(hiredate,'YYYY'))
             OVER (PARTITION BY deptno)) Avg_Sal_Yr1_Hire
FROM   emp
ORDER  BY deptno, empno, Hire_Yr;

Output columns: EMPNO, DEPTNO, HIRE_YR, SAL, AVG_SAL_YR1_HIRE
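KEEP (DENSE_RANK FIRST | LAST ...) can also be used as a plain aggregate with GROUP BY. The following sketch is an illustration on the same emp table rather than part of the original deck; the aliases are hypothetical. It returns, per department, the salary of the earliest and latest hires.

```sql
-- Sketch only: aggregate (non-analytic) use of FIRST and LAST.
SELECT deptno,
       MIN(hiredate)                                      AS first_hired,
       MAX(sal) KEEP (DENSE_RANK FIRST ORDER BY hiredate) AS sal_of_first_hire,
       MAX(sal) KEEP (DENSE_RANK LAST  ORDER BY hiredate) AS sal_of_last_hire
FROM   emp
GROUP  BY deptno
ORDER  BY deptno;
```

If several employees share the earliest (or latest) hiredate, MAX picks the highest salary among them; another aggregate such as MIN or AVG could be substituted.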

68 FIRST and LAST (continued)
- Compare salaries to the average of the 'LAST' department
  - Note no ORDER BY inside the OVER clause
  - No support for any windowing clause

SELECT empno, deptno, TO_CHAR(hiredate,'YYYY') Hire_Yr, sal,
       TRUNC(AVG(sal) KEEP (DENSE_RANK LAST ORDER BY deptno)
             OVER ()) AVG_SAL_LAST_DEPT
FROM   emp
ORDER  BY deptno, empno, Hire_Yr;

Output columns: EMPNO, DEPTNO, HIRE_YR, SAL, AVG_SAL_LAST_DEPT

69 Bus Times

SELECT route, stop, bus,
       TO_CHAR(bustime,'DD-MON-YYYY HH24.MI.SS') bustime
FROM   bustimes
ORDER  BY route, stop, bustime;

Output columns: ROUTE, STOP, BUS, BUSTIME (25 rows, all dated in MAR)
- Times for 5 buses stopping at 5 stops on route 1

70 Journey Times of Buses Between Stops

SELECT route, stop, bus,
       TO_CHAR(bustime,'dd/mm/yy hh24:mi:ss') bus_stop_time,
       TO_CHAR(LAG(bustime,1)
                 OVER (PARTITION BY bus ORDER BY route,stop,bustime),
               'dd/mm/yy hh24:mi:ss') prev_bus_stop_time,
       SUBSTR(NUMTODSINTERVAL(bustime - LAG(bustime,1)
                 OVER (PARTITION BY bus ORDER BY route,stop,bustime),
               'DAY'),12,8) time_between_stops,
       SUBSTR(NUMTODSINTERVAL(bustime - FIRST_VALUE(bustime)
                 OVER (PARTITION BY bus ORDER BY route,stop,bustime),
               'DAY'),12,8) jrny_time
FROM   bustimes;

71 Journey Times of Buses Between Stops (cont'd)
Output columns: ROUTE, STOP, BUS, BUS_STOP_TIME, PREV_BUS_STOP_TIM, TIME_BET, JRNY_TIM
One block of five rows per bus (time columns only):

BUS_STOP_TIME      PREV_BUS_STOP_TIM  TIME_BET  JRNY_TIM
01/03/11 12:17:33                               00:00:
01/03/11 12:56:19  01/03/11 12:17:33  00:38:46  00:38:
01/03/11 13:58:53  01/03/11 12:56:19  01:02:34  01:41:
01/03/11 14:17:33  01/03/11 13:58:53  00:18:40  02:00:
01/03/11 16:30:21  01/03/11 14:17:33  02:12:48  04:12:

01/03/11 13:58:41                               00:00:
01/03/11 14:31:04  01/03/11 13:58:41  00:32:23  00:32:
01/03/11 14:58:41  01/03/11 14:31:04  00:27:37  01:00:
01/03/11 15:42:25  01/03/11 14:58:41  00:43:44  01:43:
01/03/11 16:18:09  01/03/11 15:42:25  00:35:44  02:19:

01/03/11 12:58:10                               00:00:
01/03/11 13:00:09  01/03/11 12:58:10  00:01:59  00:01:
01/03/11 15:28:33  01/03/11 13:00:09  02:28:24  02:30:
01/03/11 15:30:30  01/03/11 15:28:33  00:01:57  02:32:
01/03/11 16:47:58  01/03/11 15:30:30  01:17:28  03:49:

01/03/11 14:06:13                               00:00:
01/03/11 14:20:45  01/03/11 14:06:13  00:14:32  00:14:
01/03/11 14:35:58  01/03/11 14:20:45  00:15:13  00:29:
01/03/11 15:11:26  01/03/11 14:35:58  00:35:28  01:05:
01/03/11 15:51:14  01/03/11 15:11:26  00:39:48  01:45:

01/03/11 14:11:45                               00:00:
01/03/11 14:24:01  01/03/11 14:11:45  00:12:16  00:12:
01/03/11 15:18:09  01/03/11 14:24:01  00:54:08  01:06:
01/03/11 15:55:54  01/03/11 15:18:09  00:37:45  01:44:
01/03/11 16:02:19  01/03/11 15:55:54  00:06:25  01:50:34

72 Average Wait Times for a Bus

SELECT v.route, v.stop, v.bus, v.bustime, v.prev_bus_time,
       SUBSTR(NUMTODSINTERVAL(v.numgap,'DAY'),12,8) wait_for_next_bus,
       CASE WHEN bustime = FIRST_VALUE(bustime)
                             OVER (PARTITION BY stop
                                   ORDER BY route,stop,bustime)
            THEN SUBSTR(NUMTODSINTERVAL(AVG(v.numgap)
                             OVER (PARTITION BY stop),'DAY'),12,8)
            ELSE NULL
       END ave_wait
FROM  (SELECT route, stop, bus,
              TO_CHAR(bustime,'dd/mm/yy hh24:mi:ss') bustime,
              TO_CHAR(LAG(bustime,1)
                        OVER (PARTITION BY stop ORDER BY route,stop,bustime),
                      'dd/mm/yy hh24:mi:ss') prev_bus_time,
              bustime - LAG(bustime,1)
                        OVER (PARTITION BY stop ORDER BY route,stop,bustime) numgap
       FROM   bustimes) v;

73 Average Waiting Times for a Bus (continued)
Output columns: ROUTE, STOP, BUS, BUSTIME, PREV_BUS_TIME, WAIT_FOR, AVE_WAIT
One block of five rows per stop (time columns only):

BUSTIME            PREV_BUS_TIME      WAIT_FOR  AVE_WAIT
01/03/11 12:17:33                               00:28:
01/03/11 12:58:10  01/03/11 12:17:33  00:40:
01/03/11 13:58:41  01/03/11 12:58:10  01:00:
01/03/11 14:06:13  01/03/11 13:58:41  00:07:
01/03/11 14:11:45  01/03/11 14:06:13  00:05:

01/03/11 12:56:19                               00:23:
01/03/11 13:00:09  01/03/11 12:56:19  00:03:
01/03/11 14:20:45  01/03/11 13:00:09  01:20:
01/03/11 14:24:01  01/03/11 14:20:45  00:03:
01/03/11 14:31:04  01/03/11 14:24:01  00:07:

01/03/11 13:58:53                               00:22:
01/03/11 14:35:58  01/03/11 13:58:53  00:37:
01/03/11 14:58:41  01/03/11 14:35:58  00:22:
01/03/11 15:18:09  01/03/11 14:58:41  00:19:
01/03/11 15:28:33  01/03/11 15:18:09  00:10:

01/03/11 14:17:33                               00:24:
01/03/11 15:11:26  01/03/11 14:17:33  00:53:
01/03/11 15:30:30  01/03/11 15:11:26  00:19:
01/03/11 15:42:25  01/03/11 15:30:30  00:11:
01/03/11 15:55:54  01/03/11 15:42:25  00:13:

01/03/11 15:51:14                               00:14:
01/03/11 16:02:19  01/03/11 15:51:14  00:11:
01/03/11 16:18:09  01/03/11 16:02:19  00:15:
01/03/11 16:30:21  01/03/11 16:18:09  00:12:
01/03/11 16:47:58  01/03/11 16:30:21  00:17:37

74 Analytic Functions
- Overview of Analytic Functions
- Ranking Functions
- Partitioning
- Aggregate Functions
- Sliding Windows
- Row Comparison Functions
- Analytic Function Performance

75 Finding Holes in 'Sequences'
- Sales table: gap in prod_ids from 48 to 113

SELECT DISTINCT prod_id
FROM   sales
ORDER  BY prod_id;

Output column: PROD_ID

SELECT prod_id, next_prod_id
FROM  (SELECT prod_id,
              LEAD(prod_id) OVER (ORDER BY prod_id) next_prod_id
       FROM   sales)
WHERE  next_prod_id - prod_id > 1;

Output columns: PROD_ID, NEXT_PROD_ID
Elapsed time: 3.17 secs
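The LEAD technique above reports the values on either side of each hole. A hedged sketch of a small variation follows: it de-duplicates prod_id first, so that LEAD runs over the distinct values only, and reports the missing range itself. The aliases gap_start and gap_end are illustrative, and whether de-duplicating first is faster will depend on the data and the available indexes.

```sql
-- Sketch only: report the first and last missing value of each hole.
SELECT prod_id + 1      AS gap_start,
       next_prod_id - 1 AS gap_end
FROM  (SELECT prod_id,
              LEAD(prod_id) OVER (ORDER BY prod_id) AS next_prod_id
       FROM  (SELECT DISTINCT prod_id FROM sales))
WHERE  next_prod_id - prod_id > 1
ORDER  BY gap_start;
```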

76 Eliminating Duplicate Rows
- dup_emp table has rows with unique empno values and no primary key

INSERT INTO dup_emp
SELECT * FROM dup_emp WHERE empno = 1;

  - dup_emp now has one extra duplicate row
- Use conventional SQL to eliminate the duplicate row

DELETE FROM dup_emp y
WHERE  ROWID <> (SELECT MAX(ROWID)
                 FROM   dup_emp
                 WHERE  y.empno = empno);

1 row deleted.
Elapsed: 00:01:38.76

| Id | Operation          | Name    | Rows  |
|  0 | DELETE STATEMENT   |         | 3670K |
|  1 | DELETE             | DUP_EMP |       |
|* 2 | HASH JOIN          |         | 3670K |
|  3 | VIEW               | VW_SQ_1 | 3670K |
|  4 | SORT GROUP BY      |         | 3670K |
|  5 | TABLE ACCESS FULL  | DUP_EMP | 3670K |
|  6 | TABLE ACCESS FULL  | DUP_EMP | 3670K |

77 Eliminating Duplicate Rows (continued)
- Use the ranking function to efficiently eliminate the same duplicate row
  - An ORDER BY clause is necessary, so NULL is used as a dummy
- Similar story with an index on empno

DELETE FROM dup_emp
WHERE  ROWID IN (SELECT rid
                 FROM  (SELECT ROWID rid,
                               ROW_NUMBER() OVER (PARTITION BY empno
                                                  ORDER BY NULL) rnk
                        FROM   dup_emp)
                 WHERE  rnk > 1);

1 row deleted.
Elapsed: 00:00:

| Id | Operation                   | Name     | Rows  |
|  0 | DELETE STATEMENT            |          |     1 |
|  1 | DELETE                      | DUP_EMP  |       |
|  2 | NESTED LOOPS                |          |     1 |
|  3 | VIEW                        | VW_NSO_1 | 3670K |
|  4 | SORT UNIQUE                 |          |     1 |
|* 5 | VIEW                        |          | 3670K |
|  6 | WINDOW SORT                 |          | 3670K |
|  7 | TABLE ACCESS FULL           | DUP_EMP  | 3670K |
|  8 | TABLE ACCESS BY USER ROWID  | DUP_EMP  |     1 |

78 Analytic Function Performance
- Example based on the sales table in the sh schema
  - 72 different prod_ids

Sample rows shown with columns: PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID, QUANTITY_SOLD, AMOUNT_SOLD

79 Analytic Function Performance - Scenario
- Number of times products are on order

SELECT prod_id, COUNT(*)
FROM   sh.sales
GROUP  BY prod_id;

Output columns: PROD_ID, COUNT(*)

80 nth Best Product - "Conventional" SQL Solution
- Find the nth ranked product in terms of the number of orders for each product

SELECT prod_id, ycnt
FROM  (SELECT prod_id, COUNT(*) ycnt
       FROM   sh.sales y
       GROUP  BY prod_id) v
WHERE  &position - 1 = (SELECT COUNT(*)
                        FROM  (SELECT COUNT(*) zcnt
                               FROM   sh.sales z
                               GROUP  BY prod_id) w
                        WHERE  w.zcnt > v.ycnt);

Value entered for &position: 5
Output columns: PROD_ID, YCNT
Elapsed: 00:00:24.09

81 "Conventional" SQL Solution - Trace

| Id | Operation                      | Name           | Rows | Cost |
|  0 | SELECT STATEMENT               |                |   72 |  134 |
|* 1 | FILTER                         |                |      |      |
|  2 | VIEW                           |                |   72 |   67 |
|  3 | HASH GROUP BY                  |                |   72 |   67 |
|  4 | PARTITION RANGE ALL            |                | 918K |   29 |
|  5 | BITMAP CONVERSION COUNT        |                | 918K |   29 |
|  6 | BITMAP INDEX FAST FULL SCAN    | SALES_PROD_BIX |      |      |
|  7 | SORT AGGREGATE                 |                |    1 |      |
|  8 | VIEW                           |                |    4 |   67 |
|* 9 | FILTER                         |                |      |      |
| 10 | SORT GROUP BY                  |                |    4 |   67 |
| 11 | PARTITION RANGE ALL            |                | 918K |   29 |
| 12 | BITMAP CONVERSION TO ROWIDS    |                | 918K |   29 |
| 13 | BITMAP INDEX FAST FULL SCAN    | SALES_PROD_BIX |      |      |

Predicate Information (identified by operation id):
1 - filter( (SELECT COUNT(*) FROM (SELECT COUNT(*) "ZCNT" FROM "SH"."SALES" "Z" GROUP BY "PROD_ID" HAVING COUNT(*)>:B1) "W")=4)
9 - filter(COUNT(*)>:B1)

Statistics: consistent gets, 72 sorts (memory)

82 nth Best Product - "Failed" SQL Solution
- Find the nth ranked product in terms of the number of orders for each product

SELECT prod_id, ycnt
FROM  (SELECT prod_id, COUNT(*) ycnt
       FROM   sh.sales y
       GROUP  BY prod_id) v
WHERE  &position - 1 = (SELECT COUNT(*)
                        FROM  (SELECT ycnt FROM v) w
                        WHERE  w.ycnt > v.ycnt);
                                               *
ERROR at line 8:
ORA-04044: procedure, function, package, or type is not allowed here

83 nth Best Product - Factored Subquery Solution
- Find the nth ranked product in terms of the number of orders for each product

WITH v AS (SELECT prod_id, COUNT(*) ycnt
           FROM   sh.sales y
           GROUP  BY prod_id)
SELECT prod_id, ycnt
FROM   v
WHERE  &position - 1 = (SELECT COUNT(*)
                        FROM  (SELECT ycnt FROM v) w
                        WHERE  w.ycnt > v.ycnt);

Output columns: PROD_ID, YCNT
Elapsed: 00:00:

84 Factored Subquery Solution - Trace

| Id | Operation                      | Name                       | Rows | Cost |
|  0 | SELECT STATEMENT               |                            |    1 |   71 |
|  1 | TEMP TABLE TRANSFORMATION      |                            |      |      |
|  2 | LOAD AS SELECT                 |                            |      |      |
|  3 | HASH GROUP BY                  |                            |   72 |   67 |
|  4 | PARTITION RANGE ALL            |                            | 918K |   29 |
|  5 | BITMAP CONVERSION COUNT        |                            | 918K |   29 |
|  6 | BITMAP INDEX FAST FULL SCAN    | SALES_PROD_BIX             |      |      |
|* 7 | FILTER                         |                            |      |      |
|  8 | VIEW                           |                            |   72 |    2 |
|  9 | TABLE ACCESS FULL              | SYS_TEMP_0FD9D661A_14D8441 |   72 |    2 |
| 10 | SORT AGGREGATE                 |                            |    1 |      |
|* 11 | VIEW                          |                            |   72 |    2 |
| 12 | TABLE ACCESS FULL              | SYS_TEMP_0FD9D661A_14D8441 |   72 |    2 |

Predicate Information (identified by operation id):
7 - filter( (SELECT COUNT(*) FROM (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0" "PROD_ID","C1" "YCNT" FROM "SYS"."SYS_TEMP_0FD9D661A_14D8441" "T1") "V" WHERE "YCNT">:B1)=4)
11 - filter("YCNT">:B1)

Statistics: consistent gets, 0 sorts (memory)

85 nth Best Product - Analytic Function Solution
- Find the nth ranked product in terms of the number of orders for each product

SELECT prod_id, vcnt
FROM  (SELECT prod_id, vcnt,
              RANK() OVER (ORDER BY vcnt DESC) rnk
       FROM  (SELECT prod_id, COUNT(*) vcnt
              FROM   sh.sales z
              GROUP  BY z.prod_id)) qry
WHERE  qry.rnk = &position;

Output columns: PROD_ID, YCNT
Elapsed: 00:00:
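One behaviour to keep in mind with the RANK() solution: RANK leaves gaps after ties, so if two products tie for fourth place there is no product ranked fifth and the query returns no rows for &position = 5. The sketch below is an assumption-labelled alternative, not the author's query, that swaps in DENSE_RANK so every position up to the number of distinct counts is populated. As in the slides, &position is a SQL*Plus substitution variable.

```sql
-- Sketch only: DENSE_RANK variant, which never skips a position after ties.
SELECT prod_id, vcnt
FROM  (SELECT prod_id, vcnt,
              DENSE_RANK() OVER (ORDER BY vcnt DESC) AS drnk
       FROM  (SELECT prod_id, COUNT(*) AS vcnt
              FROM   sh.sales
              GROUP  BY prod_id)) qry
WHERE  qry.drnk = &position;
```

With DENSE_RANK the query can return several products for one position when their counts are equal, which may or may not be the intended meaning of "nth best".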

86 Analytic Function Solution - Trace

| Id | Operation                      | Name           | Rows | Cost |
|  0 | SELECT STATEMENT               |                |   72 |  105 |
|* 1 | VIEW                           |                |   72 |  105 |
|* 2 | WINDOW SORT PUSHED RANK        |                |   72 |  105 |
|  3 | HASH GROUP BY                  |                |   72 |  105 |
|  4 | PARTITION RANGE ALL            |                | 918K |   29 |
|  5 | BITMAP CONVERSION COUNT        |                | 918K |   29 |
|  6 | BITMAP INDEX FAST FULL SCAN    | SALES_PROD_BIX |      |      |

Predicate Information (identified by operation id):
1 - filter("QRY"."RNK"=5)
2 - filter(RANK() OVER ( ORDER BY COUNT(*) DESC )<=5)

Statistics: consistent gets, 1 sorts (memory)

87 Analytic Function Performance
- Defining the PARTITION BY and ORDER BY clauses on indexed columns gives the best performance
  - For example, a composite index on the (deptno, hiredate) columns will prove effective
- Analytic functions still perform acceptably in the absence of indexes, but the rows must then be sorted on the partition and order by columns
  - If a query contains multiple analytic functions, avoid partitioning and sorting on two different columns when neither is indexed

88 Performance
- Hiding analytics in views can prevent the use of indexes
  - SUM(sal) has to be computed across all rows before the analysis

CREATE OR REPLACE VIEW vv AS
SELECT e.*,
       SUM(e.sal) OVER (PARTITION BY e.deptno) Deptno_Sum_Sal
FROM   emp e;

SELECT * FROM vv WHERE empno = 7900;

Output columns: EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO, DEPTNO_SUM_SAL (one row: JAMES, CLERK)

| Id | Operation           | Name | Rows |
|  0 | SELECT STATEMENT    |      |   14 |
|* 1 | VIEW                | VV   |   14 |
|  2 | WINDOW SORT         |      |   14 |
|  3 | TABLE ACCESS FULL   | EMP  |   14 |

SELECT * FROM emp WHERE empno = 7900;

| Id | Operation                    | Name  | Rows |
|  0 | SELECT STATEMENT             |       |    1 |
|  1 | TABLE ACCESS BY INDEX ROWID  | EMP   |    1 |
|* 2 | INDEX UNIQUE SCAN            | SYS_C |    1 |
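The reason the predicate cannot simply be applied before the window function is that doing so changes the answer. The two queries below are a sketch, assuming the standard emp demo data used in these slides, that makes the difference visible: in the first the window sees only the single filtered row, while in the second it sees the whole department.

```sql
-- Sketch only: filter applied BEFORE the analytic function.
-- The partition contains just the one row that survives the WHERE clause,
-- so deptno_sum_sal equals that employee's own salary.
SELECT empno, ename, sal, deptno,
       SUM(sal) OVER (PARTITION BY deptno) AS deptno_sum_sal
FROM   emp
WHERE  empno = 7900;

-- Sketch only: filter applied AFTER the analytic function (what the view does).
-- deptno_sum_sal is the total for the employee's whole department.
SELECT *
FROM  (SELECT empno, ename, sal, deptno,
              SUM(sal) OVER (PARTITION BY deptno) AS deptno_sum_sal
       FROM   emp)
WHERE  empno = 7900;
```

Because the two forms are not equivalent, the optimizer has to evaluate the window over all rows of the view before filtering on empno.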

89 Steamy Windows

SELECT empno, ename, sal, deptno,
       SUM(sal) OVER (PARTITION BY deptno ORDER BY sal) sumsal
FROM   emp
ORDER  BY deptno, sal;

Output columns: EMPNO, ENAME, SAL, DEPTNO, SUMSAL
Rows: MILLER, CLARK, KING, SMITH, ADAMS, JONES, SCOTT, FORD, JAMES, MARTIN, WARD, TURNER, ALLEN, BLAKE
- Default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

90 Steamy Windows (continued)

SELECT empno, ename, sal, deptno,
       SUM(sal) OVER (PARTITION BY deptno ORDER BY sal) sumsal
FROM   emp
WHERE  ename LIKE '%M%'
ORDER  BY deptno, sal;

Output columns: EMPNO, ENAME, SAL, DEPTNO, SUMSAL
Rows: MILLER, SMITH, ADAMS, JAMES, MARTIN

SELECT *
FROM  (SELECT empno, ename, sal, deptno,
              SUM(sal) OVER (PARTITION BY deptno ORDER BY sal) sumsal
       FROM   emp)
WHERE  ename LIKE '%M%'
ORDER  BY deptno, sal;

Output columns: EMPNO, ENAME, SAL, DEPTNO, SUMSAL
Rows: MILLER, SMITH, ADAMS, JAMES, MARTIN
- In the second query the sums are computed before the filter is applied, so the SUMSAL for MARTIN includes WARD, who is in department 30 and has a salary of 1250, which is within the RANGE with MARTIN
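To see the peer effect of the default RANGE window side by side with a ROWS window, the following sketch (an illustration on the same emp table, with hypothetical aliases) computes both running totals for department 30. Adding empno to the ORDER BY of the ROWS version is an assumption made here to give the two equal salaries a deterministic order.

```sql
-- Sketch only: RANGE (default) versus ROWS running totals.
SELECT ename, sal,
       -- default window: RANGE ... CURRENT ROW, so equal salaries are summed together
       SUM(sal) OVER (PARTITION BY deptno ORDER BY sal)            AS range_sum,
       -- explicit ROWS window: strictly one row at a time
       SUM(sal) OVER (PARTITION BY deptno ORDER BY sal, empno
                      ROWS BETWEEN UNBOUNDED PRECEDING
                               AND CURRENT ROW)                    AS rows_sum
FROM   emp
WHERE  deptno = 30
ORDER  BY sal, empno;
```

For the two employees earning 1250, range_sum shows the same total on both rows, while rows_sum increases row by row.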

91 In the Final Analysis
So we have discussed
- The ranking of data using analytic functions
- Partitioning datasets from queries
- Using aggregate functions in analytic scenarios
- How to apply sliding windows to query results
- Comparing values across rows
- Performance characteristics

92 Analytic Functions
Carl Dudley, University of Wolverhampton, UK
UKOUG Council, Oracle ACE Director