1 Copyright

• All material contained herein is owned by Daniel Stober, the author of this presentation. This presentation and the queries, examples, and original sample data may be shared with others for educational purposes only in the following circumstances:
  1. That the person or organization sharing the information is not compensated, or
  2. If shared in a circumstance with compensation, that the author has granted written permission to use this work.
• Using examples from this presentation, in whole or in part, in another work without proper source attribution constitutes plagiarism.
• Use of this work implies acceptance of these terms.

2 SQL Magic! Dan Stober Intermountain Healthcare Wednesday, March 30, 2011

3 Dan Stober

• Data Architect – Intermountain Healthcare
• Attended California State Univ., Fresno
• Working in Oracle databases since 2001
• Frequent presenter at local and national user group conferences
• Oracle Open World twice
• Private Instructor for Trutek
• Teaching PL/SQL
• Oracle Certified SQL Expert
• Board of Trustees – Utah Oracle Users Group
• Edit newsletter
• Write SQL Tip column

4 Intermountain Healthcare

• 23 hospitals in Utah and Idaho
• Non-profit integrated health care system
• 750 employed physicians
• 30,000 employees
• The largest non-government employer in Utah
• One of the largest and most complete clinical data warehouses in the world!

5 Session Norms

• Questions? Clarifications?
• Interrupt me!
• I learn from every session I teach
• Participation is welcome
• Cellphones, pagers – No problem, mon!
• All examples are in the slides
• Most examples use Oracle sample schemas
• Very intense SQL - Sit close enough to see!

6 Today’s Agenda

1. Review of Analytic Functions
2. Single Pass Queries
   • Don’t hit those big tables more than once
3. Spreadsheet Logic
   • Excel can do it, can SQL do it?
4. Trickery
   • Don’t be fooled
   • Things you can do to make your query run slower

7 Aggregate or Analytic?

• What’s the difference? Which one is each of these: COUNT, SUM, MAX, MIN?

    Type                      SYNTAX                                       OUTPUT
    Aggregate (traditional)   Query often includes the keywords GROUP BY   A single row (or one row per group with GROUP BY)
    Analytic                  OVER ( some other stuff )                    Does not change the number of rows

8 Aggregate Examples

    Deptno  Ename     Sal
    ------  ------  -----
        10  Clark    2450
        10  King     5000
        10  Miller   1300
        20  Adams    1100
        20  Ford     3000
        20  Jones    2975
        20  Scott    3000
        20  Smith     800
        30  Allen    1600
        30  Blake    2850
        30  James     950
        30  Martin   1250
        30  Turner   1500
        30  Ward     1250

    SELECT COUNT ( * )
      FROM scott.emp;

      COUNT(*)
    ----------
            14

    1 row selected.

    SELECT SUM ( sal )
      FROM scott.emp;

      SUM(SAL)
    ----------
         29025

    1 row selected.

    SELECT COUNT ( * ), SUM ( sal ), MAX ( sal ), MIN ( ename )
      FROM scott.emp;

      COUNT(*)   SUM(SAL)   MAX(SAL) MIN(ENAME)
    ---------- ---------- ---------- ----------
            14      29025       5000 ADAMS

    1 row selected.

9 Aggregate Examples

    SELECT COUNT ( * ), SUM ( sal )
      FROM scott.emp
     WHERE deptno = 30;

      COUNT(*)   SUM(SAL)
    ---------- ----------
             6       9400

    1 row selected.

    SELECT deptno, COUNT ( * ), SUM ( sal )
      FROM scott.emp;

    *
    ERROR at line 1:
    ORA-00937: not a single-group group function

    SELECT deptno, COUNT ( * ), SUM ( sal )
      FROM scott.emp
     GROUP BY deptno;

        DEPTNO   COUNT(*)   SUM(SAL)
    ---------- ---------- ----------
            10          3       8750
            20          5      10875
            30          6       9400

    3 rows selected.

• One record for each group

10 Aggregate Examples

    Deptno  Ename   Job          Sal
    ------  ------  ---------  -----
        10  Clark   Manager     2450
        10  King    President   5000
        10  Miller  Clerk       1300
        20  Adams   Clerk       1100
        20  Ford    Analyst     3000
        20  Jones   Manager     2975
        20  Scott   Analyst     3000
        20  Smith   Clerk        800
        30  Allen   Salesman    1600
        30  Blake   Manager     2850
        30  James   Clerk        950
        30  Martin  Salesman    1250
        30  Turner  Salesman    1500
        30  Ward    Salesman    1250

    SELECT deptno, job, COUNT ( * ), SUM ( sal )
      FROM scott.emp
     GROUP BY deptno, job;

        DEPTNO JOB         COUNT(*)   SUM(SAL)
    ---------- --------- ---------- ----------
            10 CLERK              1       1300
            10 MANAGER            1       2450
            10 PRESIDENT          1       5000
            20 ANALYST            2       6000
            20 CLERK              2       1900
            20 MANAGER            1       2975
            30 CLERK              1        950
            30 MANAGER            1       2850
            30 SALESMAN           4       5600

    9 rows selected.

    SELECT deptno, COUNT ( * ), SUM ( sal )
      FROM scott.emp
     GROUP BY deptno, job;

        DEPTNO   COUNT(*)   SUM(SAL)
    ---------- ---------- ----------
            10          1       1300
            10          1       2450
            10          1       5000
            20          2       6000
            20          2       1900
            20          1       2975
            30          1        950
            30          1       2850
            30          4       5600

    9 rows selected.

• One record for each combination of deptno and job, whether or not job appears in the SELECT list

11 Analytic Functions

• What makes a function analytic?
• Keyword OVER
• Followed by a set of parentheses

    SELECT deptno, ename, sal
         , COUNT ( * ) OVER ()
         , SUM ( sal ) OVER ()
      FROM scott.emp;

     DEPTNO ENAME          SAL COUNT(*)OVER() SUM(SAL)OVER()
    ------- ---------- ------- -------------- --------------
         10 CLARK         2450             14          29025
         10 KING          5000             14          29025
         10 MILLER        1300             14          29025
         20 ADAMS         1100             14          29025
         20 FORD          3000             14          29025
         20 JONES         2975             14          29025
         20 SCOTT         3000             14          29025
         20 SMITH          800             14          29025
         30 ALLEN         1600             14          29025
         30 BLAKE         2850             14          29025
         30 JAMES          950             14          29025
         30 MARTIN        1250             14          29025
         30 TURNER        1500             14          29025
         30 WARD          1250             14          29025

    14 rows selected.

• Returns one result for each record in the dataset. No grouping
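The contrast between the aggregate and analytic forms can also be checked outside Oracle. The sketch below is only an illustration and is not part of the original slides: it assumes Python's standard sqlite3 module backed by SQLite 3.25 or later (the release that added window functions), and the table name emp is a stand-in for scott.emp.

```python
import sqlite3

# Sample rows copied from the slides; `emp` is a stand-in for scott.emp.
EMP = [
    (10, "CLARK", 2450), (10, "KING", 5000), (10, "MILLER", 1300),
    (20, "ADAMS", 1100), (20, "FORD", 3000), (20, "JONES", 2975),
    (20, "SCOTT", 3000), (20, "SMITH", 800), (30, "ALLEN", 1600),
    (30, "BLAKE", 2850), (30, "JAMES", 950), (30, "MARTIN", 1250),
    (30, "TURNER", 1500), (30, "WARD", 1250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# Aggregate form: the whole table collapses to a single row.
aggregate_rows = conn.execute("SELECT COUNT(*), SUM(sal) FROM emp").fetchall()

# Analytic form: the same totals, repeated on every one of the 14 rows.
analytic_rows = conn.execute(
    "SELECT ename, COUNT(*) OVER (), SUM(sal) OVER () FROM emp"
).fetchall()

print(len(aggregate_rows), "aggregate row;", len(analytic_rows), "analytic rows")
```

The row counts are the point here: the aggregate query returns one row, while the analytic query keeps all fourteen.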

12 Analytic Functions

• With WHERE clause
• Which happens first?

    SELECT deptno, ename, sal
         , COUNT ( * ) OVER ()
         , SUM ( sal ) OVER ()
      FROM scott.emp
     WHERE deptno = 30;

     DEPTNO ENAME          SAL COUNT(*)OVER() SUM(SAL)OVER()
    ------- ---------- ------- -------------- --------------
         30 ALLEN         1600              6           9400
         30 BLAKE         2850              6           9400
         30 JAMES          950              6           9400
         30 MARTIN        1250              6           9400
         30 TURNER        1500              6           9400
         30 WARD          1250              6           9400

    6 rows selected.

• Even with OVER () and empty parentheses, the function operates only on the records which meet the conditions of the WHERE clause

13 The Analytic Clause

• Within the set of parentheses
• Expressions telling the function to calculate differently
• Three possible components
  • Partition
  • Order
  • Windowing
• Some or all are optional, depending upon the function
• Components must be in this order

14 PARTITION BY

    SELECT deptno, ename, sal, job
         , COUNT ( * ) OVER ( PARTITION BY job ) jobcount
         , SUM ( sal ) OVER ( PARTITION BY deptno ) deptsum
      FROM scott.emp;

     DEPTNO ENAME        SAL JOB         JOBCOUNT    DEPTSUM
    ------- -------- ------- --------- ---------- ----------
         10 CLARK       2450 MANAGER            3       8750
         10 KING        5000 PRESIDENT          1       8750
         10 MILLER      1300 CLERK              4       8750
         20 ADAMS       1100 CLERK              4      10875
         20 FORD        3000 ANALYST            2      10875
         20 JONES       2975 MANAGER            3      10875
         20 SCOTT       3000 ANALYST            2      10875
         20 SMITH        800 CLERK              4      10875
         30 ALLEN       1600 SALESMAN           4       9400
         30 BLAKE       2850 MANAGER            3       9400
         30 JAMES        950 CLERK              4       9400
         30 MARTIN      1250 SALESMAN           4       9400
         30 TURNER      1500 SALESMAN           4       9400
         30 WARD        1250 SALESMAN           4       9400

    14 rows selected.

• Analytic function calculated on a subset of the records
• The partition can differ for each function in the same query

15 ORDER BY

    SELECT deptno, ename, sal
         , LAG ( ename ) OVER ( ORDER BY ename ) f1
         , LAG ( ename ) OVER ( PARTITION BY deptno ORDER BY ename ) f2
         , LAG ( ename ) OVER ( PARTITION BY deptno ORDER BY sal DESC ) f3
      FROM scott.emp;

      DEPTNO ENAME           SAL F1         F2         F3
    -------- ---------- -------- ---------- ---------- -------
          10 CLARK          2450 BLAKE                 KING
          10 KING           5000 JONES      CLARK
          10 MILLER         1300 MARTIN     KING       CLARK
          20 ADAMS          1100                       JONES
          20 FORD           3000 CLARK      ADAMS      SCOTT
          20 JONES          2975 JAMES      FORD       FORD
          20 SCOTT          3000 MILLER     JONES
          20 SMITH           800 SCOTT      SCOTT      ADAMS
          30 ALLEN          1600 ADAMS                 BLAKE
          30 BLAKE          2850 ALLEN      ALLEN
          30 JAMES           950 FORD       BLAKE      WARD
          30 MARTIN         1250 KING       JAMES      TURNER
          30 TURNER         1500 SMITH      MARTIN     ALLEN
          30 WARD           1250 TURNER     TURNER     MARTIN

    14 rows selected.

• Required for some functions
• Optional on others

16 Order of Items in Analytic Clause

    SELECT deptno, empno, ename, sal
         , MIN ( sal ) OVER ( ORDER BY ename PARTITION BY deptno ) minsal
      FROM scott.emp;

         , MIN ( sal ) OVER ( ORDER BY ename PARTITION BY deptno ) minsal
    *
    ERROR at line 2:
    ORA-00907: missing right parenthesis

• The error message is somewhat misleading: Oracle does not recognize PARTITION BY after an ORDER BY

17 ORDER BY Caveat #1

• Ensure that sort is deterministic
• If not, results may vary

    SELECT deptno, ename, job, sal, hiredate
         , ROW_NUMBER ( ) OVER ( ORDER BY sal DESC ) r1
         , ROW_NUMBER ( ) OVER ( PARTITION BY job ORDER BY sal ) r2
      FROM scott.emp;

        DEPTNO ENAME      JOB              SAL HIREDATE          R1         R2
    ---------- ---------- --------- ---------- --------- ---------- ----------
            10 CLARK      MANAGER         2450 09-JUN-81          6          1
            10 KING       PRESIDENT       5000 17-NOV-81          1          1
            10 MILLER     CLERK           1300 23-JAN-82          9          4
            20 ADAMS      CLERK           1100 23-MAY-87         12          3
            20 FORD       ANALYST         3000 03-DEC-81          2          1
            20 JONES      MANAGER         2975 02-APR-81          4          3
            20 SCOTT      ANALYST         3000 19-APR-87          3          2
            20 SMITH      CLERK            800 17-DEC-80         14          1
            30 ALLEN      SALESMAN        1600 20-FEB-81          7          4
            30 BLAKE      MANAGER         2850 01-MAY-81          5          2
            30 JAMES      CLERK            950 03-DEC-81         13          2
            30 MARTIN     SALESMAN        1250 28-SEP-81         10          1
            30 TURNER     SALESMAN        1500 08-SEP-81          8          3
            30 WARD       SALESMAN        1250 22-FEB-81         11          2

    14 rows selected.

• There is no assurance that the ROW_NUMBER() assignments for the two $3000 salaries (FORD and SCOTT) will be the same the next time the query is executed
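One common way to make such a sort deterministic is to append a unique column as a final sort key. The sketch below is an illustration under assumptions, not something shown on the slides: it uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle, a four-row extract of the sample data, and empno as the assumed tie-breaker.

```python
import sqlite3

# Four rows from the sample data, including the two tied $3000 salaries.
rows = [(7839, "KING", 5000), (7902, "FORD", 3000),
        (7788, "SCOTT", 3000), (7566, "JONES", 2975)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# The unique empno as a second sort key (an assumed tie-breaker) makes the
# numbering of FORD and SCOTT repeatable from one execution to the next.
numbered = conn.execute(
    """
    SELECT ename, ROW_NUMBER() OVER (ORDER BY sal DESC, empno) AS r1
      FROM emp
     ORDER BY r1
    """
).fetchall()

for ename, r1 in numbered:
    print(r1, ename)
```

Any column or combination of columns that is unique within the partition would serve the same purpose.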

18 ORDER BY Caveat #2

• On many functions, using ORDER BY changes the window
• SUM, COUNT, MAX, MIN, LAST_VALUE

    SELECT deptno, ename, sal
         , SUM ( sal ) OVER ( ORDER BY ename ) s
         , COUNT ( * ) OVER ( ORDER BY ename ) c
         , MIN ( sal ) OVER ( ORDER BY ename ) mn
         , MAX ( sal ) OVER ( ORDER BY ename ) mx
      FROM scott.emp
     WHERE deptno = 10;

        DEPTNO ENAME             SAL          S          C         MN         MX
    ---------- ---------- ---------- ---------- ---------- ---------- ----------
            10 CLARK            2450       2450          1       2450       2450
            10 KING             5000       7450          2       2450       5000
            10 MILLER           1300       8750          3       1300       5000

    3 rows selected.

• On each record, results are from the beginning of the partition to the current record, as defined by the ORDER BY

19 Why?

• This is the default behavior. If you include an ORDER BY where one would not be necessary, Oracle assumes it is there for a reason.
• Order makes no difference to a plain total: 1 + 3 + 5 = 9 and 5 + 1 + 3 = 9
• Very powerful for MTD (month-to-date) calculations:

    Week Number    Sales  Month To Date
    -----------  -------  -------------
              1   11,000         11,000
              2   15,000         26,000
              3   12,000         38,000
              4   16,000         54,000

    SUM ( sales ) OVER ( ORDER BY week_number )
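The month-to-date column can be reproduced with the four weekly figures from the slide. This is a minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) in place of Oracle; the table name weekly_sales is hypothetical, since the slide does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `weekly_sales` is a hypothetical table holding the slide's four weekly figures.
conn.execute("CREATE TABLE weekly_sales (week_number INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO weekly_sales VALUES (?, ?)",
    [(1, 11000), (2, 15000), (3, 12000), (4, 16000)],
)

# ORDER BY inside OVER turns the grand total into a running total.
month_to_date = conn.execute(
    """
    SELECT week_number, sales,
           SUM(sales) OVER (ORDER BY week_number) AS mtd
      FROM weekly_sales
     ORDER BY week_number
    """
).fetchall()

for week, sales, mtd in month_to_date:
    print(week, sales, mtd)
```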

20 Default Windowing

    COUNT ( * ) OVER ( PARTITION BY cust )

    Cust  Order_Date  COUNT
    ----  ----------  -----
    A     12/25/2010      3
    A     1/15/2011       3
    A     2/28/2011       3
    B     6/16/2010       4
    B     9/15/2010       4
    B     1/1/2011        4
    B     2/12/2011       4
    ...

• Calculation on each of the cust A records includes all three of the A records
• Calculation on each of the cust B records includes all four of the B records

21 Default Windowing

    COUNT ( * ) OVER ( PARTITION BY cust ORDER BY order_date ROWS BETWEEN ? )

    Cust  Order_Date  COUNT without ORDER BY  COUNT with ORDER BY
    ----  ----------  ----------------------  -------------------
    A     12/25/2010                       3                    1
    A     1/15/2011                        3                    2
    A     2/28/2011                        3                    3
    B     6/16/2010                        4                    1
    B     9/15/2010                        4                    2
    B     1/1/2011                         4                    3
    B     2/12/2011                        4                    4
    ...

• Calculation on each of these records includes only the current record and the records which preceded it in the partition

22 Windowing

• Demonstration of default windowing
• With and without ORDER BY

    SELECT deptno, ename, sal
         , SUM ( sal ) OVER ( ) sum1
         , SUM ( sal ) OVER ( ORDER BY ename ) sum2
         , SUM ( sal ) OVER ( ORDER BY ename
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND UNBOUNDED FOLLOWING ) sum3
         , SUM ( sal ) OVER ( ORDER BY ename
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW ) sum4
      FROM scott.emp
     WHERE deptno = 10;

        DEPTNO ENAME             SAL       SUM1       SUM2       SUM3       SUM4
    ---------- ---------- ---------- ---------- ---------- ---------- ----------
            10 CLARK            2450       8750       2450       8750       2450
            10 KING             5000       8750       7450       8750       7450
            10 MILLER           1300       8750       8750       8750       8750

    3 rows selected.

• SUM1 is the same as SUM3
• SUM2 is the same as SUM4
• Default windowing saves a lot of typing and eliminates clutter

23 Windowing

• Selects a smaller subset than the partition
• Based on a number of records before/after
• Or a time period before/after

    SELECT deptno, ename, sal
         , SUM ( sal ) OVER ( ORDER BY ename
                              ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) sum1
         , SUM ( sal ) OVER ( PARTITION BY deptno ORDER BY ename
                              ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) sum2
      FROM scott.emp;

        DEPTNO ENAME             SAL       SUM1       SUM2
    ---------- ---------- ---------- ---------- ----------
            10 CLARK            2450       8300       7450
            10 KING             5000       9225       8750
            10 MILLER           1300       5550       6300
            20 ADAMS            1100       2700       4100
            20 FORD             3000       6400       7075
            20 JONES            2975       8925       8975
            20 SCOTT            3000       5100       6775
            20 SMITH             800       5300       3800
            30 ALLEN            1600       5550       4450
            30 BLAKE            2850       6900       5400
            30 JAMES             950       6925       5050
            30 MARTIN           1250       7550       3700
            30 TURNER           1500       3550       4000
            30 WARD             1250       2750       2750

    14 rows selected.

SUM2 is computed within each department:

• Clark: 2450 + 5000
• King: 2450 + 5000 + 1300
• Miller: 5000 + 1300
• Adams: 1100 + 3000
• Ford: 1100 + 3000 + 2975

24 Windowing

Same query and output as the previous slide. SUM1 has no PARTITION BY, so its window slides across all 14 employees in ename order:

    Adams, Allen, Blake, Clark, Ford, James, Jones, King, Martin, Miller, Scott, Smith, Turner, Ward

• Adams: 1100 + 1600
• Allen: 1100 + 1600 + 2850
• Blake: 1600 + 2850 + 2450

25 DISTINCT and GROUP BY

• Is there a difference?

    SELECT deptno
         , COUNT(*) OVER ( PARTITION BY deptno ) AS empcnt
      FROM scott.emp
     GROUP BY deptno;

        DEPTNO     EMPCNT
    ---------- ----------
            10          1
            20          1
            30          1

    3 rows selected.

    SELECT DISTINCT deptno
         , COUNT(*) OVER ( PARTITION BY deptno ) AS empcnt
      FROM scott.emp;

        DEPTNO     EMPCNT
    ---------- ----------
            10          3
            20          5
            30          6

    3 rows selected.

• When analytic functions are involved: YES
• Order of evaluation in a single select query:
  1. Table Joins
  2. WHERE clause filters
  3. GROUP BY
  4. Analytic Functions
  5. Ordering
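The difference follows from the evaluation order: GROUP BY runs before the analytic function, DISTINCT after it. The sketch below reproduces both results under stated assumptions: Python's sqlite3 module (SQLite 3.25 or later) stands in for Oracle, and the emp table holds only the deptno and ename columns of the sample data.

```python
import sqlite3

# deptno/ename pairs from the sample data: 3, 5 and 6 employees per department.
rows = (
    [(10, n) for n in ("CLARK", "KING", "MILLER")]
    + [(20, n) for n in ("ADAMS", "FORD", "JONES", "SCOTT", "SMITH")]
    + [(30, n) for n in ("ALLEN", "BLAKE", "JAMES", "MARTIN", "TURNER", "WARD")]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)

# GROUP BY collapses each department to one row BEFORE the analytic COUNT runs,
# so the function sees a single row per partition.
grouped = conn.execute(
    """
    SELECT deptno, COUNT(*) OVER (PARTITION BY deptno) AS empcnt
      FROM emp
     GROUP BY deptno
     ORDER BY deptno
    """
).fetchall()

# DISTINCT removes duplicates AFTER the analytic COUNT has seen every employee.
distinct = conn.execute(
    """
    SELECT DISTINCT deptno, COUNT(*) OVER (PARTITION BY deptno) AS empcnt
      FROM emp
     ORDER BY deptno
    """
).fetchall()

print("GROUP BY:", grouped)
print("DISTINCT:", distinct)
```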

26 SINGLE PASS QUERIES

27 Which departments have no employees who earn $3000 or more?

28 Question

• Which depts have no emps who earn $3,000 or more?

    SELECT deptno, COUNT(*)
      FROM scott.emp
     WHERE sal >= 3000
     GROUP BY deptno
    HAVING COUNT(*) = 0;

    no rows selected

    SELECT deptno, ename, sal
      FROM scott.emp
     WHERE sal >= 3000;

        DEPTNO ENAME             SAL
    ---------- ---------- ----------
            10 KING             5000
            20 FORD             3000
            20 SCOTT            3000

    3 rows selected.

• Why? The filtering condition in the WHERE clause eliminated the records in dept 30 before they could be grouped

29 Which records do not meet a condition?

• Which depts have no emps who earn $3,000 or more?

ANSI join syntax:

    SELECT d.deptno, COUNT(e.deptno)
      FROM scott.dept d
      LEFT JOIN scott.emp e
        ON ( e.deptno = d.deptno AND e.sal >= 3000 )
     GROUP BY d.deptno
    HAVING COUNT(e.deptno) = 0;

Oracle outer join syntax:

    SELECT d.deptno, COUNT(e.deptno)
      FROM scott.dept d, scott.emp e
     WHERE e.deptno (+) = d.deptno
       AND e.sal (+) >= 3000
     GROUP BY d.deptno
    HAVING COUNT(e.deptno) = 0;

        DEPTNO COUNT(E.DEPTNO)
    ---------- ---------------
            30               0
            40               0

    2 rows selected.

• The outer join returned one department (40) which has NO EMPLOYEES

30 Which records do not meet a condition? (continued)

One approach: test the records returned by the outer join to make sure that emp records do exist.

    SELECT d.deptno, COUNT(e.deptno)
      FROM scott.dept d
      LEFT JOIN scott.emp e
        ON ( e.deptno = d.deptno AND sal >= 3000 )
     WHERE EXISTS ( SELECT 1
                      FROM scott.emp s
                     WHERE d.deptno = s.deptno )
     GROUP BY d.deptno
    HAVING COUNT(e.deptno) = 0;

        DEPTNO COUNT(E.DEPTNO)
    ---------- ---------------
            30               0

    1 row selected.

Another approach: use a scalar subquery in the WHERE clause to count records which meet the condition, by deptno.

    SELECT DISTINCT e.deptno
         , ( SELECT COUNT(*)
               FROM scott.emp
              WHERE deptno = e.deptno
                AND sal >= 3000 ) empcount
      FROM scott.emp e
     WHERE ( SELECT COUNT(*)
               FROM scott.emp
              WHERE deptno = e.deptno
                AND sal >= 3000 ) = 0;

        DEPTNO   EMPCOUNT
    ---------- ----------
            30          0

    1 row selected.

• Both of these require two passes over the emp table
• Instead…

31 Using a CASE Statement in HAVING

    SELECT e.deptno
         , COUNT ( CASE WHEN sal >= 3000 THEN 1 END ) empcount
      FROM scott.emp e
     GROUP BY deptno
    HAVING COUNT ( CASE WHEN sal >= 3000 THEN 1 END ) = 0;

        DEPTNO   EMPCOUNT
    ---------- ----------
            30          0

    1 row selected.

• Two equivalent ways to count conditionally:

    SUM ( CASE WHEN sal >= 3000 THEN 1 ELSE 0 END )
    COUNT ( CASE WHEN sal >= 3000 THEN 1 END )

• There is no ELSE on the CASE inside COUNT. Records that do not meet the test will be NULL, and will not be counted
• Only one pass!
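The single-pass technique can be tried on the sample data as follows. This is a sketch under assumptions rather than the presenter's own script: Python's sqlite3 module stands in for Oracle, and emp stands in for scott.emp.

```python
import sqlite3

# Sample rows copied from the slides; `emp` is a stand-in for scott.emp.
EMP = [
    (10, "CLARK", 2450), (10, "KING", 5000), (10, "MILLER", 1300),
    (20, "ADAMS", 1100), (20, "FORD", 3000), (20, "JONES", 2975),
    (20, "SCOTT", 3000), (20, "SMITH", 800), (30, "ALLEN", 1600),
    (30, "BLAKE", 2850), (30, "JAMES", 950), (30, "MARTIN", 1250),
    (30, "TURNER", 1500), (30, "WARD", 1250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# Without HAVING: how many high earners each department has.
# CASE with no ELSE yields NULL for everyone else, and COUNT skips NULLs.
per_dept = conn.execute(
    """
    SELECT deptno, COUNT(CASE WHEN sal >= 3000 THEN 1 END) AS empcount
      FROM emp
     GROUP BY deptno
     ORDER BY deptno
    """
).fetchall()

# With HAVING: keep only the departments where that count is zero.
no_high_earners = conn.execute(
    """
    SELECT deptno, COUNT(CASE WHEN sal >= 3000 THEN 1 END) AS empcount
      FROM emp
     GROUP BY deptno
    HAVING COUNT(CASE WHEN sal >= 3000 THEN 1 END) = 0
    """
).fetchall()

print(per_dept)
print(no_high_earners)
```

Because no WHERE clause removes rows before grouping, department 30 survives to be counted, and the emp table is read only once.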

32 Counting with CASE

    SELECT e.deptno, e.ename, e.sal
         , CASE WHEN sal >= 3000 THEN 1 ELSE 0 END case1
         , CASE WHEN sal >= 3000 THEN 1 END case2
      FROM scott.emp e;

        DEPTNO ENAME             SAL      CASE1      CASE2
    ---------- ---------- ---------- ---------- ----------
            10 CLARK            2450          0
            10 KING             5000          1          1
            10 MILLER           1300          0
            20 ADAMS            1100          0
            20 FORD             3000          1          1
            20 JONES            2975          0
            20 SCOTT            3000          1          1
            20 SMITH             800          0
            30 ALLEN            1600          0
            30 BLAKE            2850          0
            30 JAMES             950          0
            30 MARTIN           1250          0
            30 TURNER           1500          0
            30 WARD             1250          0

    14 rows selected.

• CASE1 is the form to use with SUM
• CASE2 is the form to use with COUNT

33 Can you find the employee with the lowest salary in each department?

34 Question

• Find the employee with the lowest salary in each department

    SELECT deptno, MIN ( sal )
      FROM scott.emp
     GROUP BY deptno;

        DEPTNO   MIN(SAL)
    ---------- ----------
            10       1300
            20        800
            30        950

    3 rows selected.

35 Traditional techniques

Correlated subquery:

    SELECT e.deptno, e.ename, e.sal
      FROM scott.emp e
     WHERE e.sal = ( SELECT MIN ( s.sal )
                       FROM scott.emp s
                      WHERE s.deptno = e.deptno );

Inline view:

    SELECT e.deptno, e.ename, e.sal
      FROM scott.emp e
      JOIN ( SELECT deptno, MIN ( sal ) sal
               FROM scott.emp
              GROUP BY deptno ) s
        ON e.deptno = s.deptno
       AND e.sal = s.sal;

        DEPTNO ENAME             SAL
    ---------- ---------- ----------
            10 MILLER           1300
            20 SMITH             800
            30 JAMES             950

    3 rows selected.

36 FIRST_VALUE Function

• Returns the value found in one field of a record, chosen as the first record when the results are ordered by a different column

    FIRST_VALUE ( ename ) OVER ( ORDER BY sal )

  ename is the field whose value is shown; sal is the field to order by.

Before ordering:

    Deptno  Ename     Sal
    ------  ------  -----
        30  Allen    1600
        30  Blake    2850
        30  James     950
        30  Martin   1250
        30  Turner   1500
        30  Ward     1250

Ordered by sal:

    Deptno  Ename     Sal
    ------  ------  -----
        30  James     950
        30  Martin   1250
        30  Ward     1250
        30  Turner   1500
        30  Allen    1600
        30  Blake    2850

37 FIRST_VALUE

    SELECT deptno, ename, sal
         , FIRST_VALUE ( ename ) OVER ( PARTITION BY deptno ORDER BY sal ) ename
         , MIN ( sal ) OVER ( PARTITION BY deptno ) sal
      FROM scott.emp;

        DEPTNO ENAME             SAL ENAME             SAL
    ---------- ---------- ---------- ---------- ----------
            10 CLARK            2450 MILLER           1300
            10 KING             5000 MILLER           1300
            10 MILLER           1300 MILLER           1300
            20 ADAMS            1100 SMITH             800
            20 FORD             3000 SMITH             800
            20 JONES            2975 SMITH             800
            20 SCOTT            3000 SMITH             800
            20 SMITH             800 SMITH             800
            30 ALLEN            1600 JAMES             950
            30 BLAKE            2850 JAMES             950
            30 JAMES             950 JAMES             950
            30 MARTIN           1250 JAMES             950
            30 TURNER           1500 JAMES             950
            30 WARD             1250 JAMES             950

    14 rows selected.

• The last two columns show the lowest paid person and the lowest salary in each department

38 FIRST_VALUE in WHERE clause

    SELECT deptno, ename, sal
      FROM scott.emp
     WHERE ename = FIRST_VALUE ( ename )
                   OVER ( PARTITION BY deptno ORDER BY sal );

     WHERE ename = FIRST_VALUE ( ename )
    *
    ERROR at line 3:
    ORA-30483: window functions are not allowed here

• Why? Because analytic functions get evaluated after the WHERE clause
• Order of evaluation in a single select query:
  1. Table Joins
  2. WHERE clause filters
  3. GROUP BY
  4. Analytic Functions
  5. Ordering

39 Solution in one pass

One solution: use DISTINCT.

    SELECT DISTINCT deptno
         , FIRST_VALUE ( ename ) OVER ( PARTITION BY deptno ORDER BY sal ) ename
         , MIN ( sal ) OVER ( PARTITION BY deptno ) sal
      FROM scott.emp;

        DEPTNO ENAME             SAL
    ---------- ---------- ----------
            10 MILLER           1300
            20 SMITH             800
            30 JAMES             950

    3 rows selected.

Another solution: nest the analytic in a subquery (inline view), and select the records where the ename matches the lowest paid emp.

    SELECT deptno, ename, sal
      FROM ( SELECT deptno, ename, sal
                  , FIRST_VALUE ( ename )
                    OVER ( PARTITION BY deptno ORDER BY sal ) lowest_ename
               FROM scott.emp )
     WHERE ename = lowest_ename;
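The inline-view form works because the outer WHERE clause is evaluated after the inner query's analytic function has produced its column. A minimal sketch of that pattern, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle and emp as a stand-in for scott.emp:

```python
import sqlite3

# Sample rows copied from the slides; `emp` is a stand-in for scott.emp.
EMP = [
    (10, "CLARK", 2450), (10, "KING", 5000), (10, "MILLER", 1300),
    (20, "ADAMS", 1100), (20, "FORD", 3000), (20, "JONES", 2975),
    (20, "SCOTT", 3000), (20, "SMITH", 800), (30, "ALLEN", 1600),
    (30, "BLAKE", 2850), (30, "JAMES", 950), (30, "MARTIN", 1250),
    (30, "TURNER", 1500), (30, "WARD", 1250),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)

# The inner query computes the analytic column; the outer query is then free
# to filter on it, because its WHERE runs after the inner SELECT has finished.
lowest_paid = conn.execute(
    """
    SELECT deptno, ename, sal
      FROM ( SELECT deptno, ename, sal,
                    FIRST_VALUE(ename)
                      OVER (PARTITION BY deptno ORDER BY sal) AS lowest_ename
               FROM emp )
     WHERE ename = lowest_ename
     ORDER BY deptno
    """
).fetchall()

print(lowest_paid)
```

In this sample every department has a single lowest salary; with ties, the name returned by FIRST_VALUE would depend on an arbitrary ordering among the tied rows.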

40 Can you find the employee with the highest salary in each department?

41 LAST_VALUE Function

• Returns the value found in one field of a record, chosen as the last record when the results are ordered by a different column

    LAST_VALUE ( ename ) OVER ( ORDER BY sal )

  ename is the field whose value is shown; sal is the field to order by.

Before ordering:

    Deptno  Ename     Sal
    ------  ------  -----
        30  Allen    1600
        30  Blake    2850
        30  James     950
        30  Martin   1250
        30  Turner   1500
        30  Ward     1250

Ordered by sal:

    Deptno  Ename     Sal
    ------  ------  -----
        30  James     950
        30  Ward     1250
        30  Martin   1250
        30  Turner   1500
        30  Allen    1600
        30  Blake    2850

42 LAST_VALUE function

    SELECT deptno, ename, sal
         , LAST_VALUE ( ename ) OVER ( PARTITION BY deptno ORDER BY sal ) lastv
         , MAX ( sal ) OVER ( PARTITION BY deptno ) maxsal
      FROM scott.emp
     WHERE deptno = 30;

        DEPTNO ENAME             SAL LASTV          MAXSAL
    ---------- ---------- ---------- ---------- ----------
            30 JAMES             950 JAMES            2850
            30 MARTIN           1250 WARD             2850
            30 WARD             1250 WARD             2850
            30 TURNER           1500 TURNER           2850
            30 ALLEN            1600 ALLEN            2850
            30 BLAKE            2850 BLAKE            2850

    6 rows selected.

• LAST_VALUE does not appear to be working as advertised

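43 LAST_VALUE Function

• When an ORDER BY is present and no window is given, the default windowing is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
• Actually, this is the default for FIRST_VALUE, too

    Deptno  Ename     Sal  Last Value
    ------  ------  -----  ----------
        30  James     950  James
        30  Ward     1250  Ward
        30  Martin   1250  Ward
        30  Turner   1500  Turner
        30  Allen    1600  Allen
        30  Blake    2850  Blake

• This means: “From the beginning of the partition up to this record.” Because the window is a RANGE, records that tie with the current record on the ORDER BY value are included, which is why Martin and Ward both show Ward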
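The effect of the default window can be seen side by side with an explicit full-partition window. The sketch below rests on assumptions that are not in the slides: Python's sqlite3 module (SQLite 3.25 or later) stands in for Oracle, only the department 30 rows are loaded, and ename is added as a second sort key so that the tied 1250 salaries sort repeatably.

```python
import sqlite3

# Department 30 rows from the sample data.
dept30 = [("ALLEN", 1600), ("BLAKE", 2850), ("JAMES", 950),
          ("MARTIN", 1250), ("TURNER", 1500), ("WARD", 1250)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", dept30)

# default_frame: window ends at the current row, so LAST_VALUE returns the
#                current row's own ename.
# full_frame:    window spans the whole partition, so LAST_VALUE returns the
#                highest-paid employee on every row.
rows = conn.execute(
    """
    SELECT ename,
           LAST_VALUE(ename) OVER (ORDER BY sal, ename) AS default_frame,
           LAST_VALUE(ename) OVER (ORDER BY sal, ename
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND UNBOUNDED FOLLOWING) AS full_frame
      FROM emp
     ORDER BY sal, ename
    """
).fetchall()

for ename, default_frame, full_frame in rows:
    print(ename, default_frame, full_frame)
```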

44 LAST_VALUE with Correct Windowing

    SELECT deptno, ename, sal
         , LAST_VALUE ( ename ) OVER ( PARTITION BY deptno ORDER BY sal
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND UNBOUNDED FOLLOWING ) lastv
         , MAX ( sal ) OVER ( PARTITION BY deptno ) maxsal
      FROM scott.emp
     WHERE deptno = 30;

        DEPTNO ENAME             SAL LASTV          MAXSAL
    ---------- ---------- ---------- ---------- ----------
            30 JAMES             950 BLAKE            2850
            30 WARD             1250 BLAKE            2850
            30 MARTIN           1250 BLAKE            2850
            30 TURNER           1500 BLAKE            2850
            30 ALLEN            1600 BLAKE            2850
            30 BLAKE            2850 BLAKE            2850

    6 rows selected.

45 FIRST_VALUE as alternative

    SELECT deptno, ename, sal
         , LAST_VALUE ( ename ) OVER ( PARTITION BY deptno ORDER BY sal
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND UNBOUNDED FOLLOWING ) lastv
         , FIRST_VALUE ( ename ) OVER ( PARTITION BY deptno
                                        ORDER BY sal DESC ) firstv
         , MAX ( sal ) OVER ( PARTITION BY deptno ) maxsal
      FROM scott.emp
     WHERE deptno = 30;

        DEPTNO ENAME             SAL LASTV      FIRSTV         MAXSAL
    ---------- ---------- ---------- ---------- ---------- ----------
            30 JAMES             950 BLAKE      BLAKE            2850
            30 WARD             1250 BLAKE      BLAKE            2850
            30 MARTIN           1250 BLAKE      BLAKE            2850
            30 TURNER           1500 BLAKE      BLAKE            2850
            30 ALLEN            1600 BLAKE      BLAKE            2850
            30 BLAKE            2850 BLAKE      BLAKE            2850

    6 rows selected.

• Using FIRST_VALUE with a DESCending sort yields the same outcome, without having to worry about the windowing problem

46 LAST_VALUE with Correct Windowing

    SELECT deptno, ename, sal
         , LAST_VALUE ( ename ) OVER ( PARTITION BY deptno ORDER BY sal
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND UNBOUNDED FOLLOWING ) lastv
         , MAX ( sal ) OVER ( PARTITION BY deptno ) maxsal
      FROM scott.emp;

        DEPTNO ENAME             SAL LASTV          MAXSAL
    ---------- ---------- ---------- ---------- ----------
            10 MILLER           1300 KING             5000
            10 CLARK            2450 KING             5000
            10 KING             5000 KING             5000
            20 SMITH             800 FORD             3000
            20 ADAMS            1100 FORD             3000
            20 JONES            2975 FORD             3000
            20 SCOTT            3000 FORD             3000
            20 FORD             3000 FORD             3000
            30 JAMES             950 BLAKE            2850
            30 MARTIN           1250 BLAKE            2850
            30 WARD             1250 BLAKE            2850
            30 TURNER           1500 BLAKE            2850
            30 ALLEN            1600 BLAKE            2850
            30 BLAKE            2850 BLAKE            2850

    14 rows selected.

47 LAST_VALUE with Correct Windowing

    SELECT DISTINCT deptno
         , LAST_VALUE ( ename ) OVER ( PARTITION BY deptno ORDER BY sal
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND UNBOUNDED FOLLOWING ) ename
         , MAX ( sal ) OVER ( PARTITION BY deptno ) sal
      FROM scott.emp;

        DEPTNO ENAME             SAL
    ---------- ---------- ----------
            10 KING             5000
            20 FORD             3000
            30 BLAKE            2850

    3 rows selected.

• Our solution
• Or, is it…? Did you notice that there were two emps in deptno 20 who earned $3000?

    Deptno  Ename     Sal
    ------  ------  -----
        20  Smith     800
        20  Adams    1100
        20  Jones    2975
        20  Ford     3000
        20  Scott    3000

48 RANK function
SELECT e.deptno, e.ename, e.sal,
       RANK () OVER ( PARTITION BY e.deptno ORDER BY e.sal DESC ) ranking
  FROM scott.emp e;
    DEPTNO ENAME             SAL    RANKING
---------- ---------- ---------- ----------
        10 KING             5000          1
        10 CLARK            2450          2
        10 MILLER           1300          3
        20 SCOTT            3000          1
        20 FORD             3000          1
        20 JONES            2975          3
        20 ADAMS            1100          4
        20 SMITH             800          5
        30 BLAKE            2850          1
        30 ALLEN            1600          2
        30 TURNER           1500          3
        30 MARTIN           1250          4
        30 WARD             1250          4
        30 JAMES             950          6
14 rows selected.

49 A solution for the tied records
SELECT deptno, ename, sal
  FROM ( SELECT e.deptno, e.ename, e.sal,
                RANK () OVER ( PARTITION BY e.deptno ORDER BY e.sal DESC ) ranking
           FROM scott.emp e )
 WHERE ranking = 1;
    DEPTNO ENAME             SAL
---------- ---------- ----------
        10 KING             5000
        20 SCOTT            3000
        20 FORD             3000
        30 BLAKE            2850
4 rows selected.
Nest the query from the prior slide as an inline view, and reference the result of the analytic function in the WHERE clause.

50 Ranking Functions
Purpose (all three functions): Order records within partition based on ORDER BY
Function     Ties allowed?  After a tie
RANK         Yes            Numbers skipped
DENSE_RANK   Yes            No skipping
ROW_NUMBER   No             N/A
Dept 30   Sal    RANK  DENSE_RANK  ROW_NUMBER
James      950   1     1           1
Ward      1250   2     2           3 *
Martin    1250   2     2           2 *
Turner    1500   4     3           4
Allen     1600   5     4           5
Blake     2850   6     5           6
* These two could be in either order, when based only on a sort by sal
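The comparison table can be reproduced with a short sketch. As an assumption of this example it runs in Python against SQLite (3.25 or later) rather than Oracle, using the department 30 salaries from the slide. ROW_NUMBER gets ename as a tie-breaker here so that its output is repeatable; that tie-breaker is this sketch's choice, not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("JAMES", 950), ("WARD", 1250), ("MARTIN", 1250),
     ("TURNER", 1500), ("ALLEN", 1600), ("BLAKE", 2850)],
)

# RANK skips a number after the 1250 tie, DENSE_RANK does not,
# and ROW_NUMBER never repeats a value.
rank_rows = conn.execute("""
    SELECT ename, sal,
           RANK()       OVER (ORDER BY sal)        AS rnk,
           DENSE_RANK() OVER (ORDER BY sal)        AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY sal, ename) AS row_num
    FROM emp
    ORDER BY sal, ename
""").fetchall()

for row in rank_rows:
    print(row)
```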

51 New Function!  NTH_VALUE  Works like FIRST_VALUE and LAST_VALUE  Second parameter to specify n  Same caveats apply:  Windowing issues  Deterministic sort

52 Nth Value example
SELECT e.*,
       NTH_VALUE ( job, 2 ) OVER ( ORDER BY ename ) nth_a,
       NTH_VALUE ( ename, 2 ) OVER ( ORDER BY ename ) nth_b
  FROM scott.emp e;
     EMPNO ENAME  JOB              MGR        SAL     DEPTNO NTH_A     NTH_B
---------- ------ --------- ---------- ---------- ---------- --------- ------
      7876 ADAMS  CLERK           7788       1100         20
      7499 ALLEN  SALESMAN        7698       1600         30 SALESMAN  ALLEN
      7698 BLAKE  MANAGER         7839       2850         30 SALESMAN  ALLEN
      7782 CLARK  MANAGER         7839       2450         10 SALESMAN  ALLEN
      7902 FORD   ANALYST         7566       3000         20 SALESMAN  ALLEN
      7900 JAMES  CLERK           7698        950         30 SALESMAN  ALLEN
      7566 JONES  MANAGER         7839       2975         20 SALESMAN  ALLEN
      7839 KING   PRESIDENT                  5000         10 SALESMAN  ALLEN
      7654 MARTIN SALESMAN        7698       1250         30 SALESMAN  ALLEN
      7934 MILLER CLERK           7782       1300         10 SALESMAN  ALLEN
      7788 SCOTT  ANALYST         7566       3000         20 SALESMAN  ALLEN
      7369 SMITH  CLERK           7902        800         20 SALESMAN  ALLEN
      7844 TURNER SALESMAN        7698       1500         30 SALESMAN  ALLEN
      7521 WARD   SALESMAN        7698       1250         30 SALESMAN  ALLEN
14 rows selected.

53 What do you mean by “second highest”?
Dept 30   Sal    RANK  DENSE_RANK  ROW_NUMBER
James      950   1     1           1
Ward      1250   2     2           3 *
Martin    1250   2     2           2 *
Turner    1500   4     3           4
Allen     1600   5     4           5
Blake     2850   6     5           6
Dept 20   Sal    RANK  DENSE_RANK  ROW_NUMBER
Ford      3000   1     1           2 *
Scott     3000   1     1           1 *
Jones     2975   3     2           3
Adams     1100   4     3           4
Smith      800   5     4           5

54 How can you compare a subset of the results to a larger set of results?

55 Slicing and Dicing
 How can you compare a subset of the result set to the overall results?
The goal:
Channel ID   May 2001            2001 – First Five Months
             Number   Percent    Number   Percent
2              5823   30.3%       30809   30.4%
3              9577   49.9%       49679   49.0%
4              3788   19.7%       20975   20.7%
We’ll work on a query for the May numbers first
SELECT channel_id, COUNT(*)
  FROM sh.sales
 WHERE TRUNC ( time_id, 'MM') = DATE '2001-05-01'
 GROUP BY channel_id;
CHANNEL_ID   COUNT(*)
---------- ----------
         2       5823
         3       9577
         4       3788
3 rows selected.

56 Calculating the percentage
 First, we need to know the total number of records
An attempt using analytic functions:
SELECT channel_id, COUNT(*), COUNT(*) OVER ()
  FROM sh.sales
 WHERE TRUNC ( time_id, 'MM') = DATE '2001-05-01'
 GROUP BY channel_id;
CHANNEL_ID   COUNT(*) COUNT(*)OVER()
---------- ---------- --------------
         2       5823              3
         3       9577              3
         4       3788              3
3 rows selected.
The problem here is that the analytic function gets calculated AFTER the GROUP BY, so the analytic count returns the number of grouped items

57 Calculating the percentage – Try #2
SELECT channel_id, COUNT(*),
       SUM( COUNT(*) ) OVER () AS tot_recs,
       COUNT(*) / SUM( COUNT(*) ) OVER () AS pct
  FROM sh.sales
 WHERE TRUNC ( time_id, 'MM') = DATE '2001-05-01'
 GROUP BY channel_id
This time we nest the COUNT(*) inside of an analytic function. The COUNT(*) on each record returns the same number as the field above it. Then, the analytic SUM () OVER () adds those three counts.
CHANNEL_ID   COUNT(*)   TOT_RECS    PCT
---------- ---------- ---------- ------
         2       5823      19188  0.303
         3       9577      19188  0.499
         4       3788      19188  0.197
3 rows selected.
The goal:
Channel ID   May 2001            2001 – First Five Months
             Number   Percent    Number   Percent
2              5823   30.3%       30809   30.4%
3              9577   49.9%       49679   49.0%
4              3788   19.7%       20975   20.7%
How’s that for SQL Magic?!
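The difference between COUNT(*) OVER () and SUM( COUNT(*) ) OVER () after a GROUP BY can be seen side by side in a small sketch. The assumptions: Python with SQLite (3.25 or later) stands in for Oracle, and the sales table holds made-up toy rows (3, 5 and 2 rows for channels 2, 3 and 4), not the sh.sales data. The 1.0 multiplier is only needed because SQLite divides integers as integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (channel_id INTEGER)")
# Hypothetical toy data: 3, 5 and 2 rows for channels 2, 3 and 4.
conn.executemany("INSERT INTO sales VALUES (?)",
                 [(2,)] * 3 + [(3,)] * 5 + [(4,)] * 2)

# grp_cnt counts the grouped rows (3 channels); tot_recs sums the per-group counts.
pct_rows = conn.execute("""
    SELECT channel_id,
           COUNT(*)                               AS rec_cnt,
           COUNT(*) OVER ()                       AS grp_cnt,
           SUM(COUNT(*)) OVER ()                  AS tot_recs,
           1.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct
    FROM sales
    GROUP BY channel_id
    ORDER BY channel_id
""").fetchall()

for row in pct_rows:
    print(row)
```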

58 Back to our problem…
 Summary totals from two different time periods
-- Counts from May (original query)
SELECT channel_id, 'MAY' AS period, COUNT(*) AS rec_cnt,
       SUM( COUNT(*) ) OVER () AS tot_recs,
       COUNT(*) / SUM( COUNT(*) ) OVER () AS pct
  FROM sh.sales
 WHERE TRUNC ( time_id, 'MM') = DATE '2001-05-01'
 GROUP BY channel_id
UNION ALL
-- YTD counts
SELECT channel_id, 'YTD', COUNT(*),
       SUM( COUNT(*) ) OVER (),
       COUNT(*) / SUM( COUNT(*) ) OVER () AS pct
  FROM sh.sales
 WHERE TRUNC ( time_id, 'MM') <= DATE '2001-05-01'
   AND TRUNC ( time_id, 'Y') = DATE '2001-01-01'
 GROUP BY channel_id;
With UNION ALL:
CHANNEL_ID PER   COUNT(*) TOT_RECS    PCT
---------- --- ---------- -------- ------
         2 MAY       5823    19188  0.303
         3 MAY       9577    19188  0.499
         4 MAY       3788    19188  0.197
         2 YTD      30809   101463  0.304
         3 YTD      49679   101463  0.490
         4 YTD      20975   101463  0.207
6 rows selected.

59
SELECT channel_id,
       CASE WHEN period = 'MAY' THEN rec_cnt END may_rec_cnt,
       CASE WHEN period = 'MAY' THEN pct END may_pct,
       CASE WHEN period = 'YTD' THEN rec_cnt END ytd_rec_cnt,
       CASE WHEN period = 'YTD' THEN pct END ytd_pct
  FROM ( SELECT channel_id, 'MAY' AS period, COUNT(*) AS rec_cnt,
                SUM ( COUNT(*) ) OVER () AS tot_recs,
                COUNT(*) / SUM( COUNT(*) ) OVER () AS pct
           FROM sh.sales
          WHERE TRUNC ( time_id, 'MM') = DATE '2001-05-01'
          GROUP BY channel_id
         UNION ALL
         SELECT channel_id, 'YTD', COUNT(*),
                SUM( COUNT(*) ) OVER (),
                COUNT(*) / SUM( COUNT(*) ) OVER ()
           FROM sh.sales
          WHERE TRUNC ( time_id, 'MM') <= DATE '2001-05-01'
            AND TRUNC ( time_id, 'Y') = DATE '2001-01-01'
          GROUP BY channel_id );
The nested query is the query from the previous slide. Use CASE expressions to segregate the counts from the two periods.
CHANNEL_ID MAY_REC_CNT MAY_PCT YTD_REC_CNT YTD_PCT
---------- ----------- ------- ----------- -------
         2        5823   0.303
         3        9577   0.499
         4        3788   0.197
         2                           30809   0.304
         3                           49679   0.490
         4                           20975   0.207
6 rows selected.

60
SELECT channel_id,
       MAX ( CASE WHEN period = 'MAY' THEN rec_cnt END ) may_rec_cnt,
       MAX ( CASE WHEN period = 'MAY' THEN pct END ) may_pct,
       MAX ( CASE WHEN period = 'YTD' THEN rec_cnt END ) ytd_rec_cnt,
       MAX ( CASE WHEN period = 'YTD' THEN pct END ) ytd_pct
  FROM ( SELECT channel_id, 'MAY' AS period, COUNT(*) AS rec_cnt,
                SUM ( COUNT(*) ) OVER () AS tot_recs,
                COUNT(*) / SUM( COUNT(*) ) OVER () AS pct
           FROM sh.sales
          WHERE TRUNC ( time_id, 'MM') = DATE '2001-05-01'
          GROUP BY channel_id
         UNION ALL
         SELECT channel_id, 'YTD', COUNT(*),
                SUM( COUNT(*) ) OVER (),
                COUNT(*) / SUM( COUNT(*) ) OVER ()
           FROM sh.sales
          WHERE TRUNC ( time_id, 'MM') <= DATE '2001-05-01'
            AND TRUNC ( time_id, 'Y') = DATE '2001-01-01'
          GROUP BY channel_id )
 GROUP BY channel_id;
MAX () around each CASE, and GROUP BY at the bottom
CHANNEL_ID MAY_REC_CNT MAY_PCT YTD_REC_CNT YTD_PCT
---------- ----------- ------- ----------- -------
         2        5823   0.303       30809   0.304
         3        9577   0.499       49679   0.490
         4        3788   0.197       20975   0.207
3 rows selected.

61 Is there another way?  Of course! We need the magic…right?! Channel IDMay 20012001 – First Five Months NumberPercentNumberPercent 2582330.3%632530.4% 3957749.9%129249.0% 4378819.7%8620.7% May Jan Feb Mar Apr Execute just one query over the entire date range. For the smaller range, count only those records which are in that period

62 Simultaneous Counts
SELECT channel_id,
       COUNT ( CASE WHEN TRUNC ( time_id, 'MM') = DATE '2001-05-01' THEN 1 END ) AS may_rec_cnt,
       COUNT ( CASE WHEN TRUNC ( time_id, 'MM') = DATE '2001-05-01' THEN 1 END )
         / SUM( COUNT( CASE WHEN TRUNC ( time_id, 'MM') = DATE '2001-05-01' THEN 1 END ) ) OVER () AS may_pct,
       COUNT(*) ytd_rec_cnt,
       COUNT(*) / SUM( COUNT(*) ) OVER () ytd_pct
  FROM sh.sales
 WHERE TRUNC ( time_id, 'MM') <= DATE '2001-05-01'
   AND TRUNC ( time_id, 'Y') = DATE '2001-01-01'
 GROUP BY channel_id;
WHERE clause covers entire period of interest
CASE expressions sort out the records pertaining to May
CHANNEL_ID MAY_REC_CNT MAY_PCT YTD_REC_CNT YTD_PCT
---------- ----------- ------- ----------- -------
         2        5823   0.303       30809   0.304
         3        9577   0.499       49679   0.490
         4        3788   0.197       20975   0.207
3 rows selected.
Oh, oh… it’s Magic…
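The one-pass idea (filter on the wide period, count the narrow one with CASE) can be tried on a few rows. This sketch assumes Python with SQLite in place of Oracle, and a hypothetical sales table with an integer sale_month column instead of the time_id dates in sh.sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (channel_id INTEGER, sale_month INTEGER)")
# Hypothetical toy rows: (channel, month number). Month 6 falls outside the period.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(2, 1), (2, 5), (2, 5), (3, 3), (3, 5), (4, 2), (4, 4), (4, 6)])

# WHERE covers the whole period; COUNT(CASE ...) counts only the May rows,
# because COUNT ignores the NULL produced when the CASE does not match.
count_rows = conn.execute("""
    SELECT channel_id,
           COUNT(CASE WHEN sale_month = 5 THEN 1 END) AS may_rec_cnt,
           COUNT(*)                                   AS ytd_rec_cnt
    FROM sales
    WHERE sale_month <= 5
    GROUP BY channel_id
    ORDER BY channel_id
""").fetchall()

print(count_rows)
```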

63 Find next order placed within One Month of a prior order by the same customer

64 Re-orders within 1 month
SELECT oe1.customer_id,
       TRUNC ( oe1.order_date ) order_dt1,
       TRUNC ( oe2.order_date ) order_dt2
  FROM oe.orders oe1
  JOIN oe.orders oe2
    ON ( oe1.customer_id = oe2.customer_id
         AND oe2.order_date BETWEEN oe1.order_date + (1/(24*60*60))
                                AND ADD_MONTHS ( oe1.order_date, 1 ) );
CUSTOMER_ID ORDER_DT1 ORDER_DT2
----------- --------- ---------
        103 13-SEP-99 02-OCT-99
        105 08-JAN-00 26-JAN-00
        145 28-AUG-99 20-SEP-99
        146 15-MAY-99 13-JUN-99
        148 06-DEC-99 17-DEC-99
        149 13-SEP-99 06-OCT-99
6 rows selected.
Self join, i.e., the same table is included in the FROM list more than once

65 Analytic Function
 Use Range Windowing…
Row windowing: Restricts window by records
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Range windowing: Restricts window by a period of time. References field used in ORDER BY
    RANGE BETWEEN INTERVAL '10' DAY PRECEDING AND INTERVAL '10' DAY FOLLOWING
    Analytic function will include all records within 10 days of the record in question

66 Re-orders Within One Month
SELECT customer_id,
       TRUNC ( order_date ) order_date,
       COUNT ( * ) OVER ( PARTITION BY customer_id
                          ORDER BY order_date
                          RANGE BETWEEN CURRENT ROW AND INTERVAL '1' MONTH FOLLOWING ) cnt
  FROM oe.orders
 WHERE customer_id IN ( 103, 105 );
CUSTOMER_ID ORDER_DAT        CNT
----------- --------- ----------
        103 29-MAR-97          1
        103 01-SEP-98          1
        103 13-SEP-99          2
        103 02-OCT-99          1
        105 20-MAR-99          1
        105 31-AUG-99          1
        105 08-JAN-00          2
        105 26-JAN-00          1
8 rows selected.
Hard-coded two of the customers for demonstration of the concept
RANGE looks 1 month forward, using “order_date” field
The records with a count of 2 are the ones with another order within one month
Why does every record have a COUNT(*) of at least 1? Because the window starts at CURRENT ROW, so each order counts itself.

67 Single Pass Solution!
SELECT customer_id,
       TRUNC ( order_date ) order_dt1,
       TRUNC ( next_order_dt ) order_dt2
  FROM ( SELECT customer_id, order_date,
                MIN ( order_date ) OVER ( PARTITION BY customer_id
                                          ORDER BY order_date
                                          RANGE BETWEEN INTERVAL '1' SECOND FOLLOWING
                                                    AND INTERVAL '1' MONTH FOLLOWING ) AS next_order_dt
           FROM oe.orders )
 WHERE next_order_dt IS NOT NULL
 ORDER BY customer_id, order_date;
CUSTOMER_ID ORDER_DT1 ORDER_DT2
----------- --------- ---------
        103 13-SEP-99 02-OCT-99
        105 08-JAN-00 26-JAN-00
        145 28-AUG-99 20-SEP-99
        146 15-MAY-99 13-JUN-99
        148 06-DEC-99 17-DEC-99
        149 13-SEP-99 06-OCT-99
6 rows selected.
Starting the window one second after the current record excludes the current record.
An additional advantage of this technique is that each order is matched only with its next closest order.

68 Matching only once
 Consider an expansion of the requirement from 1 to 2 months
Order dates for Customer_id 149:
    Mar 11, 1999
    Sept 13, 1999   (three orders within 60 days: Sept 13, Oct 6, Nov 10)
    Oct 6, 1999
    Nov 10, 1999
    Jun 26, 2000
SELECT oe1.customer_id,
       TRUNC ( oe1.order_date ) order_dt1,
       TRUNC ( oe2.order_date ) order_dt2
  FROM oe.orders oe1
  JOIN oe.orders oe2
    ON ( oe1.customer_id = oe2.customer_id
         AND oe2.order_date BETWEEN oe1.order_date + (1/(24*60*60))
                                AND ADD_MONTHS ( oe1.order_date, 2 ) )
 WHERE oe1.customer_id = 149;
CUSTOMER_ID ORDER_DT1 ORDER_DT2
----------- --------- ---------
        149 13-SEP-99 06-OCT-99
        149 13-SEP-99 10-NOV-99
        149 06-OCT-99 10-NOV-99
3 rows selected.
The Sept 13 order appears twice in the results

69 With RANGE WINDOW Technique
 Order appears only once with its nearest follower
SELECT customer_id,
       TRUNC ( order_date ) order_dt1,
       TRUNC ( next_order_dt ) order_dt2
  FROM ( SELECT customer_id, order_date,
                MIN ( order_date ) OVER ( PARTITION BY customer_id
                                          ORDER BY order_date
                                          RANGE BETWEEN INTERVAL '1' SECOND FOLLOWING
                                                    AND INTERVAL '2' MONTH FOLLOWING ) AS next_order_dt
           FROM oe.orders
          WHERE customer_id = 149 )
 WHERE next_order_dt IS NOT NULL;
CUSTOMER_ID ORDER_DT1 ORDER_DT2
----------- --------- ---------
        149 13-SEP-99 06-OCT-99
        149 06-OCT-99 10-NOV-99
2 rows selected.
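The nearest-follower behaviour can be reproduced on the customer 149 dates with a rough sketch. Several assumptions apply: Python with SQLite (3.28 or later, for RANGE with numeric offsets) stands in for Oracle; SQLite has no INTERVAL type, so dates are stored as day numbers; and the two-month window is approximated as 61 days, with "1 FOLLOWING" playing the role of the one-second offset.

```python
import sqlite3
from datetime import date

# Order dates for customer 149, as listed on the "Matching only once" slide.
order_dates = [date(1999, 3, 11), date(1999, 9, 13), date(1999, 10, 6),
               date(1999, 11, 10), date(2000, 6, 26)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_day INTEGER)")
conn.executemany("INSERT INTO orders VALUES (149, ?)",
                 [(d.toordinal(),) for d in order_dates])

# Assumption: 61 days approximates the two-month window.
# MIN over an empty window is NULL, so orders with no follower drop out.
matches = conn.execute("""
    SELECT order_day, next_day
    FROM (SELECT order_day,
                 MIN(order_day) OVER (
                     PARTITION BY customer_id
                     ORDER BY order_day
                     RANGE BETWEEN 1 FOLLOWING AND 61 FOLLOWING) AS next_day
          FROM orders)
    WHERE next_day IS NOT NULL
    ORDER BY order_day
""").fetchall()

reorders = [(date.fromordinal(a).isoformat(), date.fromordinal(b).isoformat())
            for a, b in matches]
print(reorders)
```

Although the Nov 10 order also falls inside the Sept 13 window, MIN keeps only the closest follower, so each order appears at most once on the left.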

70 Aside  Designating an Interval  An Interval is a period of time  Between two dates or timestamps 1. Why is the number enclosed in single quotes? 2. Why is the unit singular (“DAY” instead of “DAYS”)? INTERVAL '10' DAY

71 Designating an Interval
INTERVAL '3' DAY
    INTERVAL = keyword, '3' = number of units (varchar), DAY = unit type(s)
 Number of Units is a varchar string
 (enclosed in single quotes)
 Number of units can include values for more than one unit type
 Multiple units: specify first and last, separated by keyword “TO”
    INTERVAL '7' HOUR
    INTERVAL '7:45' HOUR TO MINUTE
    INTERVAL '7:45' MINUTE TO SECOND
    INTERVAL '7:45:00' HOUR TO SECOND
    INTERVAL '3 7:45:00' DAY TO SECOND
    INTERVAL '3 7:45' DAY TO MINUTE
Think of these unit designations as akin to a format mask used with TO_DATE. You are specifying the significance of the numbers. Note that you include only the first and last units.

72 How does it work?
WITH dta AS
  ( SELECT TO_DATE ( '02/12/2011 06:30:00', 'mm/dd/yyyy hh24:mi:ss' ) dt FROM dual )
SELECT dt,
       dt - ( 7.75 / 24 ) new_dt1,
       dt - INTERVAL '07:45' HOUR TO MINUTE new_dt2,
       TO_CHAR ( dt - ( 7.75 / 24 ), 'mm/dd/yyyy hh24:mi:ss' ) new_dt1_char,
       TO_CHAR ( dt - INTERVAL '07:45' HOUR TO MINUTE, 'mm/dd/yyyy hh24:mi:ss' ) new_dt2_char
  FROM dta;
DT        NEW_DT1   NEW_DT2   NEW_DT1_CHAR        NEW_DT2_CHAR
--------- --------- --------- ------------------- -------------------
12-FEB-11 11-FEB-11 11-FEB-11 02/11/2011 22:45:00 02/11/2011 22:45:00
1 row selected.
Both expressions subtract 7 hours and 45 minutes

73 Designating an Interval
 These Interval fields are equivalent:
SELECT INTERVAL '3' DAY AS interv_1,
       INTERVAL '3 00:00:00' DAY TO SECOND AS interv_2,
       INTERVAL '72' HOUR AS interv_3,
       INTERVAL '4320' MINUTE AS interv_4
  FROM dual;
INTERV_1          INTERV_2              INTERV_3          INTERV_4
----------------- --------------------- ----------------- -------------------
+03 00:00:00      +03 00:00:00.000000   +03 00:00:00      +03 00:00:00
1 row selected.
All of these express the interval three days
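As a cross-check outside the database, the same arithmetic can be reproduced with Python's standard datetime module. This is only an analogy to the Oracle expressions above, not Oracle syntax: timedelta here plays the role of the INTERVAL literal.

```python
from datetime import datetime, timedelta

dt = datetime(2011, 2, 12, 6, 30, 0)

# Fractional hours (like 7.75 / 24 of a day) versus explicit hours and minutes.
new_dt1 = dt - timedelta(hours=7.75)
new_dt2 = dt - timedelta(hours=7, minutes=45)

# Three ways of writing "three days", mirroring DAY, HOUR and MINUTE intervals.
three_days = [timedelta(days=3), timedelta(hours=72), timedelta(minutes=4320)]

print(new_dt1.strftime("%m/%d/%Y %H:%M:%S"))
print(new_dt2.strftime("%m/%d/%Y %H:%M:%S"))
```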

74 Interval Error
ORA-00923: FROM keyword not found where expected
 This is a generic error, raised in many situations
 But, one possibility with Intervals is…
    INTERVAL 3 DAY       Results in ORA-00923
    INTERVAL '3' DAY     Solution

75 Interval Error
ORA-30089: missing or invalid <datetime field>
    INTERVAL '03-04-05' YEAR TO DAY     Results in ORA-30089
Solution: You cannot specify an interval that spans both months and days. The two valid ranges for interval units are:
    YEAR >> MONTH
    DAY >> SECOND

76 Interval Error
 Don’t you love unhelpful error messages?
ORA-01867: the interval is invalid
    INTERVAL '03:04:05' HOUR TO MINUTE     Results in ORA-01867
    INTERVAL '03:04:05' HOUR TO SECOND     Solution
The unit specification does not match the literal

77 Interval Error
ORA-01873: the leading precision of the interval is too small
 Meaning: value specified exceeds the default precision specification for the interval component
 Solution: specify a higher precision
    INTERVAL '300' DAY        Results in ORA-01873
    INTERVAL '300' DAY(3)     Solution
Unit Component   Default Precision
DAY              2
HOUR             3
MINUTE           5
SECOND           7

78 Can you find customers with “online” order but no “direct” orders?

79 Find records with A but not B
 Using a subquery
SELECT customer_id
  FROM oe.orders
 WHERE order_mode = 'online'
   AND customer_id NOT IN ( SELECT customer_id
                              FROM oe.orders
                             WHERE order_mode = 'direct' );
CUSTOMER_ID
-----------
        119
        120
        121
        122
        123
        141
        142
        143
        150
        151
        152
11 rows selected.
This is the traditional way that questions like this have been solved in SQL

80 Find records with A but not B
SELECT customer_id,
       COUNT ( CASE WHEN order_mode = 'direct' THEN 1 END ) direct_cnt,
       COUNT ( CASE WHEN order_mode = 'online' THEN 1 END ) online_cnt
  FROM oe.orders
 WHERE order_mode = 'online'
   AND customer_id NOT IN ( SELECT customer_id
                              FROM oe.orders
                             WHERE order_mode = 'direct' )
    OR customer_id IN ( 149, 170 )
 GROUP BY customer_id;
CUSTOMER_ID DIRECT_CNT ONLINE_CNT
----------- ---------- ----------
        119          0          1
        120          0          1
        121          0          1
        122          0          1
        123          0          1
        141          0          1
        142          0          1
        143          0          1
        150          0          1
        151          0          1
        152          0          1
        149          3          2
        170          1          0
13 rows selected.
For purposes of this demonstration, I’ve hard-coded a couple of customers who DO have direct orders
Can this be used to reduce this SQL to a single-pass query?

81 Find records with A but not B – one pass
 Place the statements in HAVING clause
SELECT customer_id
  FROM oe.orders
 GROUP BY customer_id
HAVING COUNT ( CASE WHEN order_mode = 'direct' THEN 1 END ) = 0    -- No “direct” orders
   AND COUNT ( CASE WHEN order_mode = 'online' THEN 1 END ) > 0;   -- And, at least one “online” order
CUSTOMER_ID
-----------
        119
        120
        121
        122
        123
        141
        142
        143
        150
        151
        152
11 rows selected.
The same 11 customers we found with the subquery
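The HAVING technique can be tried on a handful of rows. This sketch assumes Python with SQLite in place of Oracle and a made-up orders table: the customer ids echo the slides, but the rows themselves are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_mode TEXT)")
# Hypothetical toy rows: 119 and 150 order online only, 149 uses both modes,
# and 170 orders direct only.
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(119, "online"), (149, "direct"), (149, "online"),
                  (150, "online"), (150, "online"), (170, "direct")])

# One pass over the table: both conditional counts are evaluated per customer.
online_only = [row[0] for row in conn.execute("""
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(CASE WHEN order_mode = 'direct' THEN 1 END) = 0
       AND COUNT(CASE WHEN order_mode = 'online' THEN 1 END) > 0
    ORDER BY customer_id
""")]

print(online_only)
```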

82 Find records with A but not B – one pass
 But, what if we want to see details about the purchase?
SELECT order_id, customer_id, order_status, order_total
  FROM ( SELECT o.*,
                COUNT ( DISTINCT order_mode ) OVER ( PARTITION BY customer_id ) cnt
           FROM oe.orders o
          WHERE order_mode = ANY ( 'direct', 'online') )
 WHERE cnt = 1
   AND order_mode = 'online';
  ORDER_ID CUSTOMER_ID ORDER_STATUS ORDER_TOTAL
---------- ----------- ------------ -----------
      2372         119            9     16447.2
      2373         120            4         416
      2374         121            0        4797
      2375         122            2    103834.4
      2376         123            6     11006.2
      2377         141            5     38017.8
      2378         142            5     25691.3
      2380         143            3     27132.6
      2388         150            4    282694.3
      2389         151            4       17620
      2390         152            9      7616.8
11 rows selected.
This analytic function will return 1 or 2 for each customer
The WHERE clause in the outer query limits the results to “online” orders from customers who have only one type of order.

83 How do I learn all of this stuff?  One source: Oracle’s OTN SQL & PLSQL forum  http://forums.oracle.com/forums/forum.jspa?forumID=75

84 Here’s one straight from the forums http://forums.oracle.com/forums/thread.jspa?threadID=2192356 Can you find the most common character in a string?

85 Most common character (Method 1): CONNECT BY LEVEL < x
SELECT the_char
  FROM ( SELECT the_char, COUNT (*) the_count
           FROM ( SELECT SUBSTR (:the_string, LEVEL, 1) the_char
                    FROM dual
                 CONNECT BY LEVEL <= LENGTH (:the_string) )
          GROUP BY the_char
          ORDER BY 2 DESC )
 WHERE ROWNUM = 1;
The innermost query returns one row per character position:
SELECT SUBSTR (:the_string, LEVEL, 1) the_char
  FROM dual
CONNECT BY LEVEL <= LENGTH (:the_string)
LEVEL  Expression                      the_char
1      SUBSTR ( 'ABACADABRA', 1, 1)    A
2      SUBSTR ( 'ABACADABRA', 2, 1)    B
3      SUBSTR ( 'ABACADABRA', 3, 1)    A
4      SUBSTR ( 'ABACADABRA', 4, 1)    C
...
10     SUBSTR ( 'ABACADABRA', 10, 1)   A
COUNT / GROUP BY:
the_char  COUNT
A         5
B         2
C         1
D         1
R         1
Rownum 1: keeps only the first row of the sorted counts

86 Most common character (Method 1): MISSISSIPPI
variable the_string VARCHAR2(100);
exec :the_string := 'MISSISSIPPI';
PL/SQL procedure successfully completed.
SELECT the_char
  FROM ( SELECT the_char, COUNT (*) the_count
           FROM ( SELECT SUBSTR (:the_string, LEVEL, 1) the_char
                    FROM dual
                 CONNECT BY LEVEL <= LENGTH (:the_string) )
          GROUP BY the_char
          ORDER BY 2 DESC )
 WHERE ROWNUM = 1;
T
-
I
1 row selected.
“I” is the most common character. But wait! What about “S”? Both occur four times.
ROWNUM = 1 returns only one row. It cannot handle the possibility of a tie

87 Most common character (Improved)
 Uses analytic function: DENSE_RANK
 Eliminates a level of subquery nesting
 Clever use of aggregate inside the ORDER BY for analytic
WITH got_rnk AS
  ( SELECT SUBSTR ( :txt, LEVEL, 1 ) AS letter,
           DENSE_RANK () OVER ( ORDER BY COUNT (*) DESC ) AS rnk
      FROM dual
   CONNECT BY LEVEL <= LENGTH ( :txt )
     GROUP BY SUBSTR ( :txt, LEVEL, 1 ) )
SELECT letter
  FROM got_rnk
 WHERE rnk = 1;
MISSISSIPPI
letter  COUNT  DENSE_RANK
I       4      1
S       4      1
P       2      2
M       1      3
The COUNT column is used for the ORDER BY (DESC). Both rank-1 values get selected

88 Most common character
variable txt varchar2(100);
exec :txt := 'SAN FRANCISCO GIANTS';
PL/SQL procedure successfully completed.
WITH got_rnk AS
  ( SELECT SUBSTR ( :txt, LEVEL, 1 ) AS letter,
           DENSE_RANK () OVER ( ORDER BY COUNT (*) DESC ) AS rnk
      FROM dual
   CONNECT BY LEVEL <= LENGTH ( :txt )
     GROUP BY SUBSTR ( :txt, LEVEL, 1 ) )
SELECT letter
  FROM got_rnk
 WHERE rnk = 1;
L
-
A
N
S
3 rows selected.
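The DENSE_RANK approach can be tried outside Oracle with a rough equivalent. The assumptions: Python with SQLite (3.25 or later) stands in for Oracle; SQLite has no CONNECT BY LEVEL, so a recursive CTE (named pos here, a hypothetical name) generates the character positions instead; and most_common_chars is a helper invented for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def most_common_chars(txt):
    """Return every character tied for the highest count, sorted."""
    rows = conn.execute("""
        WITH RECURSIVE pos(i) AS (
            -- Stand-in for CONNECT BY LEVEL <= LENGTH(:txt)
            SELECT 1
            UNION ALL
            SELECT i + 1 FROM pos WHERE i < length(:txt)
        ),
        got_rnk AS (
            SELECT substr(:txt, i, 1) AS letter,
                   DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
            FROM pos
            GROUP BY substr(:txt, i, 1)
        )
        SELECT letter FROM got_rnk WHERE rnk = 1 ORDER BY letter
    """, {"txt": txt}).fetchall()
    return [r[0] for r in rows]

print(most_common_chars("MISSISSIPPI"))
print(most_common_chars("SAN FRANCISCO GIANTS"))
```

Because the rank is computed over the grouped counts, every letter tied for first place comes back, which is exactly what ROWNUM = 1 could not do.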

89 SPREADSHEET LOGIC

90 Can you calculate inventory usage?

91 Inventory Usage
Dataset:
WITH inventories AS
  ( SELECT 0 wk, 350 end_inv FROM DUAL UNION ALL
    SELECT 1 wk, 750 end_inv FROM DUAL UNION ALL
    SELECT 2 wk, 150 end_inv FROM DUAL UNION ALL
    SELECT 3 wk, 450 end_inv FROM DUAL UNION ALL
    SELECT 4 wk, 125 end_inv FROM DUAL UNION ALL
    SELECT 5 wk,  50 end_inv FROM DUAL ),
purchases AS
  ( SELECT 1 wk, 800 purch FROM DUAL UNION ALL
    SELECT 3 wk, 600 purch FROM DUAL ),
net_transfers AS
  ( SELECT 2 wk, -250 trans FROM DUAL UNION ALL
    SELECT 5 wk,  300 trans FROM DUAL )
SELECT i.wk, p.purch, t.trans, i.end_inv
  FROM inventories i
  LEFT JOIN purchases p ON i.wk = p.wk
  LEFT JOIN net_transfers t ON i.wk = t.wk;
        WK      PURCH      TRANS    END_INV
---------- ---------- ---------- ----------
         0                              350
         1        800                   750
         2                  -250        150
         3        600                   450
         4                              125
         5                   300         50
6 rows selected.

92 An Inventory Usage Problem
Week  Beginning Inventory  Purchases  Transfers In/Out  Ending Inventory  Usage
0                                                       350
1     350                  800                          750               400
2     750                             -250              150               350
3     150                  600                          450               300
4     450                                               125               325
5     125                             300               50                375
Given: Purchases, Transfers In/Out, Ending Inventory
Beginning Inventory is the Ending Inventory from the prior week
Beginning Inv + Purchases +/- Transfers - Ending Inv = USAGE

93 Inventory Usage
SELECT i.wk,
       LAG ( end_inv ) OVER ( ORDER BY i.wk ) AS beg_inv,
       p.purch, t.trans, i.end_inv,
       LAG ( end_inv ) OVER ( ORDER BY i.wk )
         + NVL ( p.purch, 0 )
         + NVL ( t.trans, 0 )
         - i.end_inv AS usage
  FROM inventories i
  LEFT JOIN purchases p ON i.wk = p.wk
  LEFT JOIN net_transfers t ON i.wk = t.wk
        WK    BEG_INV      PURCH      TRANS    END_INV      USAGE
---------- ---------- ---------- ---------- ---------- ----------
         0                                         350
         1        350        800                   750        400
         2        750                  -250        150        350
         3        150        600                   450        300
         4        450                              125        325
         5        125                   300         50        375
6 rows selected.
LAG ( end_inv ) OVER ( ORDER BY wk ) supplies the prior week's ending inventory as this week's beginning inventory
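The usage calculation can be run end to end on the slide's dataset. As an assumption of this sketch, Python with SQLite (3.25 or later) stands in for Oracle, the three WITH-clause datasets become small tables, and COALESCE replaces Oracle's NVL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventories (wk INTEGER, end_inv INTEGER);
    CREATE TABLE purchases (wk INTEGER, purch INTEGER);
    CREATE TABLE net_transfers (wk INTEGER, trans INTEGER);
    INSERT INTO inventories VALUES (0,350),(1,750),(2,150),(3,450),(4,125),(5,50);
    INSERT INTO purchases VALUES (1,800),(3,600);
    INSERT INTO net_transfers VALUES (2,-250),(5,300);
""")

# LAG pulls the prior week's ending inventory onto the current row.
# Week 0 has no prior week, so its beg_inv and usage stay NULL.
usage_rows = conn.execute("""
    SELECT i.wk,
           LAG(i.end_inv) OVER (ORDER BY i.wk) AS beg_inv,
           LAG(i.end_inv) OVER (ORDER BY i.wk)
             + COALESCE(p.purch, 0)
             + COALESCE(t.trans, 0)
             - i.end_inv AS usage
    FROM inventories i
    LEFT JOIN purchases p ON i.wk = p.wk
    LEFT JOIN net_transfers t ON i.wk = t.wk
    ORDER BY i.wk
""").fetchall()

for row in usage_rows:
    print(row)
```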

94 Can you compute a running balance?

95 Account Balance
CustID  Order Date   Order Total
101     8/16/1999       78279.60
101     10/02/1999      29669.90
101     3/29/2000       48552.00
101     7/27/2000       33893.60
CustID  Order Date   Order Total    Balance
101     8/16/1999       78279.60
101     10/02/1999      29669.90  107949.50
101     3/29/2000       48552.00  156501.50
101     7/27/2000       33893.60  190395.10
Balance is a derived column: the sum of the current order and the preceding balance
Spreadsheet formulas (columns A to D, data in rows 2 to 5):
    D2  = C2
    D3  = C3 + D2
    D4  = C4 + D3
    D5  = C5 + D4
SELECT TRUNC ( order_date ) AS order_date
     , order_total
     , order_total + LAG ( balance ) OVER ( ORDER BY order_date ) AS balance
  FROM oe.orders
 WHERE customer_id = 101;
     , order_total + LAG ( balance ) OVER ( ORDER BY order_date ) AS balance
                           *
ERROR at line 3:
ORA-00904: "BALANCE": invalid identifier
An analytic function cannot reference the very column that it is being used to calculate

96 Account Balance
CustID  Order Date   Order Total    Balance
101     8/16/1999       78279.60
101     10/02/1999      29669.90  107949.50
101     3/29/2000       48552.00  156501.50
101     7/27/2000       33893.60  190395.10
Balance: the sum of the current order and all of the previous orders
Spreadsheet formulas (columns A to D, data in rows 2 to 5):
    D2  = C2
    D3  = C3 + C2
    D4  = C4 + C3 + C2
    D5  = C5 + C4 + C3 + C2

97 Account Balance
SELECT x.*,
       CASE rec_cnt
         WHEN 1 THEN LAG ( order_total, 0 ) OVER ( ORDER BY order_date )
         WHEN 2 THEN LAG ( order_total, 0 ) OVER ( ORDER BY order_date )
                   + CASE WHEN rn = 2 THEN LAG ( order_total, 1 ) OVER ( ORDER BY order_date ) ELSE 0 END
         WHEN 3 THEN LAG ( order_total, 0 ) OVER ( ORDER BY order_date )
                   + CASE WHEN rn > 1 THEN LAG ( order_total, 1 ) OVER ( ORDER BY order_date ) ELSE 0 END
                   + CASE WHEN rn > 2 THEN LAG ( order_total, 2 ) OVER ( ORDER BY order_date ) ELSE 0 END
         WHEN 4 THEN LAG ( order_total, 0 ) OVER ( ORDER BY order_date )
                   + CASE WHEN rn > 1 THEN LAG ( order_total, 1 ) OVER ( ORDER BY order_date ) ELSE 0 END
                   + CASE WHEN rn > 2 THEN LAG ( order_total, 2 ) OVER ( ORDER BY order_date ) ELSE 0 END
                   + CASE WHEN rn > 3 THEN LAG ( order_total, 3 ) OVER ( ORDER BY order_date ) ELSE 0 END
       END AS prior_total
  FROM ( SELECT TRUNC ( order_date ) AS order_date,
                order_total,
                ROW_NUMBER () OVER ( ORDER BY order_date ) AS rn,
                COUNT (*) OVER ( ) AS rec_cnt
           FROM oe.orders
          WHERE customer_id = 101 ) x
ORDER_DAT ORDER_TOTAL         RN    REC_CNT PRIOR_TOTAL
--------- ----------- ---------- ---------- -----------
16-AUG-99     78279.6          1          4     78279.6
02-OCT-99     29669.9          2          4    107949.5
29-MAR-00       48552          3          4    156501.5
27-JUL-00     33893.6          4          4    190395.1
4 rows selected.
It works, but… it’s not really scalable, and it would be a bear to debug!

98 MODEL Clause
SELECT order_date, order_total, running_total
  FROM ( SELECT TRUNC ( order_date ) AS order_date,
                order_total,
                ROW_NUMBER () OVER ( ORDER BY order_date ) rn
           FROM oe.orders
          WHERE customer_id = 101 )
 MODEL
   DIMENSION BY ( rn )
   MEASURES ( order_date, order_total, 0 running_total )
   RULES AUTOMATIC ORDER
     ( running_total [ 1 ] = order_total [ 1 ],
       running_total [ rn > 1 ] = order_total [ cv() ] + running_total [ cv() - 1 ] );
ORDER_DAT ORDER_TOTAL RUNNING_TOTAL
--------- ----------- -------------
16-AUG-99     78279.6       78279.6
02-OCT-99     29669.9      107949.5
29-MAR-00       48552      156501.5
27-JUL-00     33893.6      190395.1
4 rows selected.

99 Account Balance - MODEL
SELECT order_date, order_total, running_total
  FROM ( SELECT TRUNC ( order_date ) AS order_date,
                order_total,
                ROW_NUMBER () OVER ( ORDER BY order_date ) rn
           FROM oe.orders
          WHERE customer_id = 101 )
 MODEL
   DIMENSION BY ( rn )
   MEASURES ( order_date, order_total, 0 running_total )
   RULES AUTOMATIC ORDER
     ( running_total [ 1 ] = order_total [ 1 ],
       running_total [ rn > 1 ] = order_total [ cv() ] + running_total [ cv() - 1 ] );
ORDER_DAT ORDER_TOTAL RUNNING_TOTAL
--------- ----------- -------------
16-AUG-99     78279.6       78279.6
02-OCT-99     29669.9      107949.5
29-MAR-00       48552      156501.5
27-JUL-00     33893.6      190395.1
4 rows selected.
The field “running_total” is not a field in the dataset.
    DIMENSION BY ( rn )
    MEASURES ( order_date, order_total, 0 running_total )
“running_total” is a field created and defined within the MODEL clause
But, once it is defined in the MODEL clause, it can be referenced in the SELECT statement

100 Account Balance - MODEL
SELECT order_date, order_total, running_total
FROM (SELECT TRUNC ( order_date ) AS order_date,
             order_total,
             ROW_NUMBER () OVER ( ORDER BY order_date ) rn
      FROM oe.orders
      WHERE customer_id = 101)
MODEL
  DIMENSION BY ( rn )
  MEASURES ( order_date, order_total, 0 running_total )
  RULES AUTOMATIC ORDER (
    running_total [ 1 ] = order_total [ 1 ],
    running_total [ rn > 1 ] = order_total [ cv() ] + running_total [ cv() - 1 ] );
The analytic function is nested in a subquery because MODEL happens before analytic functions.
The rules are formulas indicating how “running_total” should be calculated, in row #1 and for the rows greater than one:
  running_total [ 1 ] = order_total [ 1 ]
  running_total [ rn > 1 ] = order_total [ cv() ] + running_total [ cv() - 1 ]
cv(), inside the square brackets [ ], is a function used inside the MODEL clause to indicate the “current value” of the DIMENSION BY field.
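The same two rules can be carried over to several customers. The following is a minimal sketch, not taken from the slides, which assumes that adding PARTITION BY ( customer_id ) to the MODEL clause (and to ROW_NUMBER) is all that is needed so that rn restarts at 1 for each customer and the rules are applied within each partition.

```sql
-- Sketch: running total per customer with MODEL.
-- PARTITION BY makes each customer its own "worksheet"; rn restarts at 1 in each.
SELECT customer_id, order_date, order_total, running_total
FROM (SELECT customer_id,
             TRUNC ( order_date ) AS order_date,
             order_total,
             ROW_NUMBER () OVER ( PARTITION BY customer_id ORDER BY order_date ) rn
      FROM oe.orders
      WHERE customer_id <= 103)
MODEL
  PARTITION BY ( customer_id )
  DIMENSION BY ( rn )
  MEASURES ( order_date, order_total, 0 running_total )
  RULES AUTOMATIC ORDER (
    -- first row of each customer: the balance is just the first order
    running_total [ 1 ] = order_total [ 1 ],
    -- every later row: this order plus the balance of the previous row
    running_total [ rn > 1 ] = order_total [ cv() ] + running_total [ cv() - 1 ] )
ORDER BY customer_id, rn;
```

Within each partition, cv() still refers to the current value of rn, so running_total [ cv() - 1 ] never reaches into another customer's rows.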

101 Account Balance
     A       B           C          D          Formula in D
1    CustID  Order Date  Order Amt  Balance
2    101     8/16/1999    78279.60   78279.60  =SUM(C$2:C2)
3    101     10/02/1999   29669.90  107949.50  =SUM(C$2:C3)
4    101     3/29/2000    48552.00  156501.50  =SUM(C$2:C4)
5    101     7/27/2000    33893.60  190395.10  =SUM(C$2:C5)
• There is another way to do this in a spreadsheet
• Use the SUM function
• Absolute references make it easy to copy down

102 More than one customer
     A       B           C          D          Formula in D
1    CustID  Order Date  Order Amt  Balance
2    101     8/16/1999    78279.60   78279.60  =SUM(C$2:C2)
3    101     10/02/1999   29669.90  107949.50  =SUM(C$2:C3)
4    101     3/29/2000    48552.00  156501.50  =SUM(C$2:C4)
5    101     7/27/2000    33893.60  190395.10  =SUM(C$2:C5)
6    102     9/14/1998     5610.60    5610.60  =SUM(C$6:C6)
7    102     3/29/1999    10794.60   16405.20  =SUM(C$6:C7)
8    102     9/14/1999    10523.00   26928.20  =SUM(C$6:C8)
9    102     11/19/1999   42283.20   69211.40  =SUM(C$6:C9)
10   103     9/01/1998      310.00     310.00  =SUM(C$10:C10)
Even absolute references must be reset for a new customer, when working in a spreadsheet…

103 Multiple Customers - Another solution
     A       B           C          D          Formula in D
1    CustID  Order Date  Order Amt  Balance
2    101     8/16/1999    78279.60   78279.60  =IF(A2=A1, D1 + C2, C2)
3    101     10/02/1999   29669.90  107949.50  =IF(A3=A2, D2 + C3, C3)
4    101     3/29/2000    48552.00  156501.50  =IF(A4=A3, D3 + C4, C4)
5    101     7/27/2000    33893.60  190395.10  =IF(A5=A4, D4 + C5, C5)
6    102     9/14/1998     5610.60    5610.60  =IF(A6=A5, D5 + C6, C6)
7    102     3/29/1999    10794.60   16405.20  =IF(A7=A6, D6 + C7, C7)
8    102     9/14/1999    10523.00   26928.20  =IF(A8=A7, D7 + C8, C8)
9    102     11/19/1999   42283.20   69211.40  =IF(A9=A8, D8 + C9, C9)
10   103     9/01/1998      310.00     310.00  =IF(A10=A9, D9 + C10, C10)

104 Using SUM and Windowing
SELECT customer_id,
       TRUNC ( order_date ) AS order_date,
       order_total,
       SUM ( order_total ) OVER ( PARTITION BY customer_id
                                  ORDER BY order_date ) AS running_total
FROM oe.orders
WHERE customer_id <= 103;
CUSTOMER_ID ORDER_DAT ORDER_TOTAL RUNNING_TOTAL
----------- --------- ----------- -------------
        101 16-AUG-99     78279.6       78279.6
        101 02-OCT-99     29669.9      107949.5
        101 29-MAR-00       48552      156501.5
        101 27-JUL-00     33893.6      190395.1
        102 14-SEP-98      5610.6        5610.6
        102 29-MAR-99     10794.6       16405.2
        102 14-SEP-99       10523       26928.2
        102 19-NOV-99     42283.2       69211.4
        103 29-MAR-97         310           310
        103 01-SEP-98       13550         13860
        103 13-SEP-99          78         13938
        103 02-OCT-99      6653.4       20591.4
12 rows selected.
Remember: the default windowing when ORDER BY is used with analytic SUM is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. So, this is a running total: everything up to this row (plus any rows that tie with it on the ORDER BY value).
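When ORDER BY appears in an analytic function without a windowing clause, Oracle's default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so two orders with exactly the same order_date would share one running total. The sketch below spells out a ROWS window instead; using order_id as a tie-breaker is an assumption added here, not something the slide does.

```sql
-- Sketch: explicit ROWS window so that each row gets its own running total,
-- even if two orders of one customer have identical order_date values.
SELECT customer_id,
       TRUNC ( order_date ) AS order_date,
       order_total,
       SUM ( order_total ) OVER ( PARTITION BY customer_id
                                  ORDER BY order_date, order_id   -- order_id breaks ties (assumption)
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND CURRENT ROW ) AS running_total
FROM oe.orders
WHERE customer_id <= 103
ORDER BY customer_id, order_date, order_id;
```

For the twelve sample rows shown on the slide there are no ties, so both forms should give the same balances.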

105 Find department tenure records

106 Employee returns to same dept
Dimension Table
empno  deptno  start date  end_date
101    200     1/1/2006    3/31/2006
101    200     4/1/2006    10/31/2006
101    300     11/1/2006   12/31/2006
101    300     1/1/2007    6/30/2007
101    300     7/1/2007    11/30/2007
101    300     12/1/2007   12/31/2007
101    300     1/1/2008    2/29/2008
101    200     3/1/2008    6/30/2009
101    200     7/1/2009
Three different department tenures
empno  deptno  start date  end_date
101    200     1/1/2006    10/31/2006
101    300     11/1/2006   2/29/2008
101    200     3/1/2008
Each tenure period is found in multiple records.
How can you find the start and end date of each dept tenure?

107 Here is the dataset
WITH emp_dim AS (
  SELECT 101 empno, 200 deptno, DATE '2006-01-01' start_date, DATE '2006-03-31' end_date FROM dual UNION ALL
  SELECT 101, 200, DATE '2006-04-01', DATE '2006-10-31' FROM dual UNION ALL
  SELECT 101, 300, DATE '2006-11-01', DATE '2006-12-31' FROM dual UNION ALL
  SELECT 101, 300, DATE '2007-01-01', DATE '2007-06-30' FROM dual UNION ALL
  SELECT 101, 300, DATE '2007-07-01', DATE '2007-11-30' FROM dual UNION ALL
  SELECT 101, 300, DATE '2007-12-01', DATE '2007-12-31' FROM dual UNION ALL
  SELECT 101, 300, DATE '2008-01-01', DATE '2008-02-29' FROM dual UNION ALL
  SELECT 101, 200, DATE '2008-03-01', DATE '2009-06-30' FROM dual UNION ALL
  SELECT 101, 200, DATE '2009-07-01', NULL FROM dual
)
SELECT * FROM emp_dim;
     EMPNO     DEPTNO START_DAT END_DATE
---------- ---------- --------- ---------
       101        200 01-JAN-06 31-MAR-06
       101        200 01-APR-06 31-OCT-06
       101        300 01-NOV-06 31-DEC-06
       101        300 01-JAN-07 30-JUN-07
       101        300 01-JUL-07 30-NOV-07
       101        300 01-DEC-07 31-DEC-07
       101        300 01-JAN-08 29-FEB-08
       101        200 01-MAR-08 30-JUN-09
       101        200 01-JUL-09
9 rows selected.

108 Using MIN and MAX
SELECT empno, deptno,
       MIN ( start_date ) AS dept_start,
       MAX ( end_date ) AS dept_end
FROM emp_dim
GROUP BY empno, deptno;
     EMPNO     DEPTNO DEPT_STAR DEPT_END
---------- ---------- --------- ---------
       101        200 01-JAN-06 30-JUN-09
       101        300 01-NOV-06 29-FEB-08
2 rows selected.
The last record is not included because it is active: its end date is NULL, and MAX ignores NULLs.
The two result records overlap: the deptno 200 range (01-JAN-06 to 30-JUN-09) swallows the deptno 300 range.

109 Solving the first issue
SELECT empno, deptno,
       MIN ( start_date ) AS dept_start,
       MAX ( NVL ( end_date, DATE '4712-12-31' ) ) AS dept_end
FROM emp_dim
GROUP BY empno, deptno;
     EMPNO     DEPTNO DEPT_STAR DEPT_END
---------- ---------- --------- ---------
       101        200 01-JAN-06 31-DEC-12
       101        300 01-NOV-06 29-FEB-08
2 rows selected.
Using NVL to supply an end_date value for the active record (31-DEC-12 here is the year 4712 displayed in DD-MON-YY format).
But how do we overcome the problem of the overlapping records?
We have to find a way to group by each department tenure period.

110 Conceptual Solution
empno  deptno  grouping  start date  end_date
101    200     1         1/1/2006    3/31/2006
101    200     1         4/1/2006    10/31/2006
101    300     2         11/1/2006   12/31/2006
101    300     2         1/1/2007    6/30/2007
101    300     2         7/1/2007    11/30/2007
101    300     2         12/1/2007   12/31/2007
101    300     2         1/1/2008    2/29/2008
101    200     3         3/1/2008    6/30/2009
101    200     3         7/1/2009
If each tenure period can be assigned a unique identifier, then that identifier can be used in the GROUP BY with MIN and MAX.

111 Creating a “marker”
• A derived field to indicate every time the department number is different from the record before it
SELECT empno, deptno, start_date,
       NVL ( end_date, DATE '4712-12-31' ) AS end_date,
       CASE WHEN deptno != LAG ( deptno, 1, 0 )
                            OVER ( PARTITION BY empno ORDER BY start_date )
            THEN 1
       END AS marker
FROM emp_dim;
     EMPNO     DEPTNO START_DAT END_DATE      MARKER
---------- ---------- --------- --------- ----------
       101        200 01-JAN-06 31-MAR-06          1
       101        200 01-APR-06 31-OCT-06
       101        300 01-NOV-06 31-DEC-06          1
       101        300 01-JAN-07 30-JUN-07
       101        300 01-JUL-07 30-NOV-07
       101        300 01-DEC-07 31-DEC-07
       101        300 01-JAN-08 29-FEB-08
       101        200 01-MAR-08 30-JUN-09          1
       101        200 01-JUL-09 31-DEC-12
9 rows selected.
Now, we have the basis of a field that can be used for grouping.

112 …and now, a “grouper”
• This derived field will group together all of the records from the same tenure period
SELECT empno, deptno, start_date, end_date, marker,
       COUNT ( marker ) OVER ( PARTITION BY empno ORDER BY start_date ) AS grouper
FROM ( SELECT empno, deptno, start_date,
              NVL ( end_date, DATE '4712-12-31' ) AS end_date,
              CASE WHEN deptno != LAG ( deptno, 1, 0 )
                                   OVER ( PARTITION BY empno ORDER BY start_date )
                   THEN 1
              END AS marker
       FROM emp_dim );
     EMPNO     DEPTNO START_DAT END_DATE      MARKER    GROUPER
---------- ---------- --------- --------- ---------- ----------
       101        200 01-JAN-06 31-MAR-06          1          1
       101        200 01-APR-06 31-OCT-06                     1
       101        300 01-NOV-06 31-DEC-06          1          2
       101        300 01-JAN-07 30-JUN-07                     2
       101        300 01-JUL-07 30-NOV-07                     2
       101        300 01-DEC-07 31-DEC-07                     2
       101        300 01-JAN-08 29-FEB-08                     2
       101        200 01-MAR-08 30-JUN-09          1          3
       101        200 01-JUL-09 31-DEC-12                     3
9 rows selected.
Finally, we have separate identifiers for the two different tenure periods in deptno 200.

113 The solution!
• Include the “grouper” among the GROUP BY fields
SELECT empno, deptno,
       MIN ( start_date ) AS start_date,
       MAX ( end_date ) AS end_date
FROM ( SELECT empno, deptno, start_date, end_date, marker,
              COUNT ( marker ) OVER ( PARTITION BY empno ORDER BY start_date ) AS grouper
       FROM ( SELECT empno, deptno, start_date,
                     NVL ( end_date, DATE '4712-12-31' ) AS end_date,
                     CASE WHEN deptno != LAG ( deptno, 1, 0 )
                                          OVER ( PARTITION BY empno ORDER BY start_date )
                          THEN 1
                     END AS marker
              FROM emp_dim ))
GROUP BY empno, deptno, grouper;
     EMPNO     DEPTNO START_DAT END_DATE
---------- ---------- --------- ---------
       101        200 01-JAN-06 31-OCT-06
       101        300 01-NOV-06 29-FEB-08
       101        200 01-MAR-08 31-DEC-12
3 rows selected.
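The marker/grouper steps can also be written as a chain of WITH subqueries, which some readers find easier to follow than nested inline views. The sketch below is a variation, not the slide's query: it uses a shortened four-row emp_dim so that it runs on its own, the names marked, grouped, tenure_start and tenure_end are made up for illustration, and it keeps the open tenure's end date as NULL (via KEEP ... DENSE_RANK LAST) instead of substituting DATE '4712-12-31'.

```sql
WITH emp_dim AS (
  -- shortened sample data (assumption: same shape as the slide's emp_dim)
  SELECT 101 empno, 200 deptno, DATE '2006-01-01' start_date, DATE '2006-10-31' end_date FROM dual UNION ALL
  SELECT 101, 300, DATE '2006-11-01', DATE '2007-06-30' FROM dual UNION ALL
  SELECT 101, 300, DATE '2007-07-01', DATE '2008-02-29' FROM dual UNION ALL
  SELECT 101, 200, DATE '2008-03-01', NULL FROM dual
),
marked AS (
  -- marker = 1 on every row whose deptno differs from the previous row's
  SELECT empno, deptno, start_date, end_date,
         CASE WHEN deptno != LAG ( deptno, 1, 0 )
                              OVER ( PARTITION BY empno ORDER BY start_date )
              THEN 1
         END AS marker
  FROM emp_dim
),
grouped AS (
  -- running count of markers = one identifier per tenure period
  SELECT empno, deptno, start_date, end_date,
         COUNT ( marker ) OVER ( PARTITION BY empno ORDER BY start_date ) AS grouper
  FROM marked
)
SELECT empno, deptno,
       MIN ( start_date ) AS tenure_start,
       -- end_date of the latest row in the tenure; stays NULL if that row is still active
       MAX ( end_date ) KEEP ( DENSE_RANK LAST ORDER BY start_date ) AS tenure_end
FROM grouped
GROUP BY empno, deptno, grouper
ORDER BY empno, tenure_start;
```

Whether to report an open tenure as NULL or as a far-future date is a design choice; the slide's NVL approach is simpler when downstream code compares dates with BETWEEN.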

114 TRICKY QUERIES

115 A function you’ve probably never heard of

116 Cleaning up imported data
SELECT * FROM dan_test;
Script to re-create this data is in the notes to this slide
F1
----------------------------------------
Se¿or Ju¿rez
¿When in the course of human events¿
2 rows selected.
Screenshots: Using TOAD / Using SQL Developer
Both of these records contain unknown or unprintable characters.

117 Deciphering the mystery characters
SELECT f1,
       ASCII ( SUBSTR ( f1, 3, 1 )) char1,
       ASCII ( SUBSTR ( f1, 9, 1 )) char2
FROM dan_test
WHERE f1 LIKE 'Se%'
UNION ALL
SELECT f1,
       ASCII ( SUBSTR ( f1, 1, 1 )) char1,
       ASCII ( SUBSTR ( f1, 36, 1 )) char2
FROM dan_test
WHERE f1 LIKE '%course%';
F1                                            CHAR1      CHAR2
---------------------------------------- ---------- ----------
Se¿or Ju¿rez                                    241        225
¿When in the course of human events¿            147        148
2 rows selected.
Record            Char Position  Char Number  Character
Se¿or Ju¿rez      3              241          ñ
Se¿or Ju¿rez      9              225          á
¿When in the...   1              147          “
¿When in the...   36             148          ”

118 Unlocking the mystery characters
SELECT f1, DUMP ( f1 ) FROM dan_test;
F1
  DUMP(F1)
----------------------------------------
Se¿or Ju¿rez
  Typ=1 Len=12: 83,101,241,111,114,32,74,117,225,114,101,122
¿When in the course of human events¿
  Typ=1 Len=36: 147,87,104,101,110,32,105,110,32,116,104,101,32,99,111,117,114,115,101,32,111,102,32,104,117,109,97,110,32,101,118,101,110,116,115,148
2 rows selected.
Record            Char Position  Char Number  Character
Se¿or Ju¿rez      3              241          ñ
Se¿or Ju¿rez      9              225          á
¿When in the...   1              147          “
¿When in the...   36             148          ”

119 Even easier!
• DUMP has a second parameter
SELECT DUMP ( f1, 17 ) FROM dan_test;
DUMP(F1,17)
---------------------------------------------------------------------------------------
Typ=1 Len=12: S,e,f1,o,r, ,J,u,e1,r,e,z
Typ=1 Len=36: 93,W,h,e,n, ,i,n, ,t,h,e, ,c,o,u,r,s,e, ,o,f, ,h,u,m,a,n, ,e,v,e,n,t,s,94
2 rows selected.
Values for the second parameter:
  10 = Decimal (Default)
  16 = Hexadecimal
  17 = Printed Characters
But what about those codes we see? “f1”, “e1”, “93”, “94”
Those are the hexadecimal representations of the character codes.
For a long time, I thought the easiest way to do this conversion was using Excel:
  =HEX2DEC ( "f1" )   241
  =HEX2DEC ( "e1" )   225
  =HEX2DEC ( "93" )   147
  =HEX2DEC ( "94" )   148

120 DUMP function
• Converting Hex to Decimal using Oracle
SELECT TO_NUMBER ( 'f1', 'XX' ),
       TO_NUMBER ( 'e1', 'XX' ),
       TO_NUMBER ( '93', 'XX' ),
       TO_NUMBER ( '94', 'XX' )
FROM DUAL;
TO_NUMBER('F1','XX') TO_NUMBER('E1','XX') TO_NUMBER('93','XX') TO_NUMBER('94','XX')
-------------------- -------------------- -------------------- --------------------
                 241                  225                  147                  148
1 row selected.
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:7041297073738
First answered: Dec 2002. Last updated: Oct 2004.
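DUMP also accepts optional start-position and length arguments, and adding 1000 to the format code includes the character set name in the result. A short sketch of both against the same dan_test table follows; the column aliases are made up for illustration, and the output is not shown because it depends on the database character set.

```sql
SELECT f1,
       -- 1016 = hexadecimal (16) plus the character set name (1000)
       DUMP ( f1, 1016 )     AS hex_with_charset,
       -- decimal code of the third byte only: start at position 3, length 1
       DUMP ( f1, 10, 3, 1 ) AS third_byte,
       -- the reverse of the slide's TO_NUMBER conversion: decimal 241 back to hex
       TO_CHAR ( 241, 'XX' ) AS hex_of_241
FROM dan_test;
```

Seeing the character set next to the byte values helps when the same data looks different in two client tools, as in the TOAD and SQL Developer screenshots.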

121 A Real-World Example
SELECT job_comments_txt,
       LENGTH ( job_comments_txt ),
       COUNT(*)
FROM hrp.assignment_dim_prod
WHERE job_comments_txt LIKE 'CO only%'
GROUP BY job_comments_txt;
Notice that there are two groups of records that start with “CO only”

122 A Real-World Example
SELECT job_comments_txt,
       LENGTH ( job_comments_txt ) len,
       ASCII ( SUBSTR ( job_comments_txt, 1, 1 )) pos1,
       ASCII ( SUBSTR ( job_comments_txt, 2, 1 )) pos2,
       ASCII ( SUBSTR ( job_comments_txt, 3, 1 )) pos3,
       ASCII ( SUBSTR ( job_comments_txt, 4, 1 )) pos4,
       ASCII ( SUBSTR ( job_comments_txt, 5, 1 )) pos5,
       ASCII ( SUBSTR ( job_comments_txt, 6, 1 )) pos6,
       ASCII ( SUBSTR ( job_comments_txt, 7, 1 )) pos7,
       ASCII ( SUBSTR ( job_comments_txt, 8, 1 )) pos8,
       COUNT(*)
FROM hrp.assignment_dim_prod
WHERE job_comments_txt LIKE 'CO only%'
GROUP BY job_comments_txt;
Eight substring expressions to get the ASCII value for each character in each of the strings.
The extra character is a linefeed character at the end of the string.

123 Using the DUMP Function
SELECT job_comments_txt,
       LENGTH ( job_comments_txt ) len,
       DUMP ( job_comments_txt ),
       COUNT(*)
FROM hrp.assignment_dim_prod
WHERE job_comments_txt LIKE 'CO only%'
GROUP BY job_comments_txt;
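Once the extra character is known to be a trailing linefeed, the affected rows can be isolated and the two groups merged. This is a minimal sketch, assuming the linefeed is CHR ( 10 ); also stripping CHR ( 13 ) is an extra assumption for data that may have passed through Windows tools.

```sql
-- Rows whose comment ends with a linefeed
SELECT job_comments_txt,
       LENGTH ( job_comments_txt ) len
FROM hrp.assignment_dim_prod
WHERE job_comments_txt LIKE 'CO only%'
AND   SUBSTR ( job_comments_txt, -1 ) = CHR ( 10 );

-- Group on the cleaned-up value so both variants fall into one group
SELECT RTRIM ( job_comments_txt, CHR ( 10 ) || CHR ( 13 ) ) AS clean_txt,
       COUNT(*)
FROM hrp.assignment_dim_prod
WHERE job_comments_txt LIKE 'CO only%'
GROUP BY RTRIM ( job_comments_txt, CHR ( 10 ) || CHR ( 13 ) );
```

RTRIM with a character set removes any trailing run of those characters, so it handles one linefeed or several.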

124 MAKE YOUR QUERY RUN MORE SLOWLY

125 Different SELECT lists
• These two queries will return the same records, but one of them results in a full scan
SELECT cust_id
FROM sh.customers
WHERE cust_id NOT IN ( SELECT cust_id FROM sh.sales )
SELECT *
FROM sh.customers
WHERE cust_id NOT IN ( SELECT cust_id FROM sh.sales )
For the top query, everything needed can be satisfied by the index. There is no need to go to the table at all.

126 Ignore datatypes

127 Are these two queries equal?
SELECT * FROM oe.orders_dan WHERE order_status = 1;
SELECT * FROM oe.orders_dan WHERE order_status = '1';
One of them accesses the table by means of an index; the other does a full table scan. Why?
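One way to investigate the “Why?” is to compare the two execution plans. The sketch below assumes that order_status in oe.orders_dan is a VARCHAR2 column with an ordinary index; the slide does not state the datatype, so treat that as a guess. Under that assumption, comparing the column with a number makes Oracle apply an implicit TO_NUMBER to the column, and the plain index on order_status can no longer be used for that predicate.

```sql
-- Assumption: order_status is VARCHAR2 and indexed (not stated on the slide).

-- Numeric literal: look for TO_NUMBER("ORDER_STATUS")=1 in the
-- "Predicate Information" section of the plan output.
EXPLAIN PLAN FOR
SELECT * FROM oe.orders_dan WHERE order_status = 1;

SELECT * FROM TABLE ( DBMS_XPLAN.DISPLAY );

-- String literal: the predicate is applied to the column as stored,
-- so the optimizer is free to consider the index.
EXPLAIN PLAN FOR
SELECT * FROM oe.orders_dan WHERE order_status = '1';

SELECT * FROM TABLE ( DBMS_XPLAN.DISPLAY );
```

If the column were a NUMBER instead, the conversion would be applied to the literal '1' rather than to the column, and both queries could use the index.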

128 Index Organization
• For the index to be effective, we have to use it as it exists
• This includes not calling functions against the indexed column
What if we want to find entries LIKE 'Ulster%'?
What if we want to find entries LIKE '%School%'?

129 Will Oracle Always Use the Index?
• Steve Catmull: The Index Tipping Point
Yeats
LIKE 'T%'

130 The Index Tipping Point
TABLE oe.customers: 319 records
account_mgr_id  records
145             111
147              76
148              58
149              74
SELECT * FROM oe.customers WHERE account_mgr_id = xxx;
...WHERE account_mgr_id = 149;   74 records: uses index
...WHERE account_mgr_id = 147;   76 records: Full table scan
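To see where a given value sits relative to the tipping point, it helps to look at each value's share of the table. The query below is a small sketch against the same oe.customers table; the alias pct_of_table is made up for illustration, and where the optimizer actually switches from the index to a full scan depends on statistics and settings, so no threshold is claimed here.

```sql
-- Row count and share of the table for each account manager
SELECT account_mgr_id,
       COUNT(*) AS records,
       ROUND ( 100 * RATIO_TO_REPORT ( COUNT(*) ) OVER (), 1 ) AS pct_of_table
FROM oe.customers
GROUP BY account_mgr_id
ORDER BY account_mgr_id;
```

On the slide's numbers, 74 and 76 records out of 319 are both a little under a quarter of the table, which shows how narrow the gap between the two plans can be.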

131 Real-Life Example
• Sometimes you inherit queries, or are asked to tune queries
• Knowing what to look for can help
SELECT DISTINCT ef.job_cd
FROM element_fact ef
WHERE run_effective_dt > ( SELECT TO_CHAR ( TRUNC ( SYSDATE ) - 1200, 'DD-MON-YYYY' )
                           FROM dual );
What do you see?
• Use of subquery, where none is needed
• Conversion of date to VARCHAR
• Why are they looking back 1,200 days?
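A possible rewrite that addresses the first two observations is sketched below. It assumes run_effective_dt is a DATE column, which the slide implies but does not state; the 1,200-day look-back is kept as is, since only the query's owner can say whether it is intended.

```sql
-- No subquery and no date-to-string conversion: compare DATE with DATE
SELECT DISTINCT ef.job_cd
FROM element_fact ef
WHERE ef.run_effective_dt > TRUNC ( SYSDATE ) - 1200;
```

Comparing dates directly also avoids depending on the session's NLS date settings, which the original relied on when the string was implicitly converted back to a date.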

132 Ignore correlated columns

133 Correlated Columns
SELECT *
FROM hrp.date_dim_prod edd
WHERE edd.yr_no BETWEEN 2001 AND EXTRACT ( YEAR FROM SYSDATE );
SELECT *
FROM hrp.date_dim_prod edd
WHERE edd.date_dt >= DATE '2001-01-01'
AND   edd.date_dt < ADD_MONTHS ( TRUNC ( SYSDATE, 'Y' ), 12 )
( 11 / 200 ) × 73049 = 4017
This calculation is based on a histogram.

134 Correlated Columns
• When two fields from the same table are referenced, the selectivity of each is multiplied together to calculate the combined selectivity
SELECT *
FROM hrp.date_dim_prod edd
WHERE edd.yr_no BETWEEN 2001 AND EXTRACT ( YEAR FROM SYSDATE )
AND   edd.date_dt >= DATE '2001-01-01'
AND   edd.date_dt < ADD_MONTHS ( TRUNC ( SYSDATE, 'Y' ), 12 );
yr_no:   3962 / 73049
date_dt: 4018 / 73049
Cardinality calculation with both columns:
( 3962 / 73049 ) × ( 4018 / 73049 ) = selectivity ( 0.002983… )
Selectivity × 73049 = 217.926…
The solution is to create “extended stats” on the correlated columns.
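Creating extended statistics on a column group can be sketched as follows, using DBMS_STATS as available from Oracle 11g. The owner and table name are taken from the slide's query; everything else is an assumption. One caveat worth checking against the documentation for your version: column-group statistics are, as far as I know, used by the optimizer for equality predicates, so the plan for range predicates like the ones above should be verified rather than assumed to improve.

```sql
-- Create a column group on the two correlated columns.
-- The function returns the system-generated extension name.
SELECT DBMS_STATS.CREATE_EXTENDED_STATS ( 'HRP', 'DATE_DIM_PROD', '(YR_NO, DATE_DT)' ) AS extension_name
FROM dual;

-- Re-gather table statistics so the new column group gets populated.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS ( ownname => 'HRP', tabname => 'DATE_DIM_PROD' );
END;
/

-- Confirm that the extension exists.
SELECT extension_name, extension
FROM all_stat_extensions
WHERE owner = 'HRP'
AND   table_name = 'DATE_DIM_PROD';
```

After gathering, compare the estimated rows in the execution plan with the 217 shown above to see whether the estimate has moved closer to the roughly 4,000 rows each predicate selects on its own.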

135 Review
1. Review of Analytic Functions
2. Single Pass Queries
   • Don’t hit those big tables more than once
3. Spreadsheet Logic
   • Excel can do it, can SQL do it?
4. Trickery
   • Don’t be fooled
   • Things you can do to make your query run slower

136 SQL Magic!
• Whenever you get in a fix
• Reach into your bag of SQL tricks!
Dan Stober (801) 442-3470 dan.stober@utoug.org


