Oracle8i Analytical SQL Features

Analytical SQL Features Overview
Available in Oracle 8.1.6 and above
Analytical features/enhancements:
- GROUP BY extensions: the SQL GROUP BY clause has been augmented to make querying and reporting easier
- Analytical SQL functions: analytical functions enabling rankings, moving window calculations, and lead/lag analysis
- CASE expressions: increased and more efficient ‘if-then-else’ capabilities
Willie Albino 11/28/2018

GROUP BY Extensions

GROUP BY Extensions
ROLLUP
- Calculates subtotals at increasing levels of aggregation, from the most detailed level up to a grand total
- Used to generate simple cross-tabular reports
CUBE
- Calculates all possible combinations of subtotals
- Used to generate full cross-tabular reports

GROUP BY Extensions Syntax
ROLLUP:
    SELECT <column list>
    FROM <table…>
    GROUP BY ROLLUP(column_list);
CUBE:
    SELECT <column list>
    FROM <table…>
    GROUP BY CUBE(column_list);

Standard GROUP BY Example
    select region, product, SUM(amount)
    from region, product, product_sales
    where region_id = reg_id
      and region = 'East'
      and product_id = prod_id
    group by region, product
Output columns: REGION, PRODUCT, SUM(AMOUNT)
Output rows: one per East product (Hats, Jackets, Pants, Shirts, Shoes, Suits, Sweaters, T-Shirts, Ties)

GROUP BY with ROLLUP
    select region, product, SUM(amount)
    from region, product, product_sales
    where region_id = reg_id
      and region = 'East'
      and product_id = prod_id
    group by ROLLUP(region, product)
Output columns: REGION, PRODUCT, SUM(AMOUNT)
Output rows: the nine East product rows (Hats through Ties), then an East subtotal row (PRODUCT is NULL) and a grand total row (REGION and PRODUCT are both NULL)

GROUP BY with CUBE
    select region, product, SUM(amount)
    from region, product, product_sales
    where region_id = reg_id
      and region = 'East'
      and product_id = prod_id
    group by CUBE(region, product)
Output columns: REGION, PRODUCT, SUM(AMOUNT)
Output rows: the East product rows (ending with East/Ties), an East subtotal row (PRODUCT is NULL), one subtotal row per product with REGION NULL (Hats, Jackets, Pants, Shirts, Shoes, Suits, Sweaters, T-Shirts, Ties), and a grand total row

Comments about ROLLUP/CUBE
- ROLLUP creates subtotals at n+1 levels, where n is the number of grouping columns
- CUBE creates 2^n combinations of subtotals, where n is the number of grouping columns
- Subtotal generation is more efficient than the equivalent SQL code: a 4-column CUBE grouping has a 93.75% reduction in table access and a 4-column ROLLUP has 80% (consistent with one pass over the table instead of 16 and 5 separate queries, respectively)
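
The level counts can be checked on a very small data set. The following is a minimal sketch, not taken from the slides: the four-row inline view named sales is hypothetical and is built from DUAL only so that the query is self-contained.

```sql
-- Hypothetical four-row data set, built inline from DUAL.
SELECT region, product, SUM(amount) AS sum_amt
FROM (
       SELECT 'East' AS region, 'Hats' AS product, 10 AS amount FROM dual
       UNION ALL SELECT 'East', 'Ties', 20 FROM dual
       UNION ALL SELECT 'West', 'Hats', 30 FROM dual
       UNION ALL SELECT 'West', 'Ties', 40 FROM dual
     ) sales
GROUP BY ROLLUP(region, product);
-- n = 2 grouping columns, so ROLLUP produces n + 1 = 3 levels (7 rows):
--   (region, product)  4 detail rows
--   (region)           2 subtotal rows: East = 30, West = 70
--   ()                 1 grand total row: 100
-- Replacing ROLLUP with CUBE adds the (product) level (Hats = 40, Ties = 60),
-- giving 2^n = 4 grouping combinations (9 rows).
```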

Comments about ROLLUP/CUBE
- Partial rollups/cubes can be specified: GROUP BY exp1, CUBE(exp2, exp3, ....)
- ROLLUP/CUBE can be used with all aggregating functions (MAX, MIN, AVG, etc.)
- The HAVING clause applies to all the rows returned, including the subtotal rows
- NULLs are generated for dimensions at subtotal levels
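
A partial rollup and a HAVING filter on subtotal rows might look like the sketch below. The inline view sales is the same hypothetical four-row data set built from DUAL; it is an assumption made for illustration, not part of the slides.

```sql
-- Partial rollup: region is always grouped, only product is rolled up.
SELECT region, product, SUM(amount) AS sum_amt
FROM (
       SELECT 'East' AS region, 'Hats' AS product, 10 AS amount FROM dual
       UNION ALL SELECT 'East', 'Ties', 20 FROM dual
       UNION ALL SELECT 'West', 'Hats', 30 FROM dual
       UNION ALL SELECT 'West', 'Ties', 40 FROM dual
     ) sales
GROUP BY region, ROLLUP(product);
-- Levels: (region, product) and (region). There is no grand total row,
-- so 4 detail rows + 2 region subtotals = 6 rows.

-- HAVING sees the subtotal rows too, so it can keep only those.
SELECT region, SUM(amount) AS sum_amt
FROM (
       SELECT 'East' AS region, 'Hats' AS product, 10 AS amount FROM dual
       UNION ALL SELECT 'East', 'Ties', 20 FROM dual
       UNION ALL SELECT 'West', 'Hats', 30 FROM dual
       UNION ALL SELECT 'West', 'Ties', 40 FROM dual
     ) sales
GROUP BY region, ROLLUP(product)
HAVING GROUPING(product) = 1;
-- Returns the two region subtotals: East = 30, West = 70.
```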

GROUPING() Function
- Used to distinguish between NULLs in the data and NULLs generated by the ROLLUP/CUBE extensions
- GROUPING() return values: 1 for extension-generated NULLs, 0 otherwise (including NULL data values)
- Can be passed to DECODE for custom interpretation
SYNTAX:
    SELECT .. GROUPING(column name) .. GROUP BY ..
    SELECT .. DECODE(GROUPING(col), 1, 'Sub', col) …

GROUPING() Function Example
    select DECODE(GROUPING(region), 1, 'All Regions', 0, region) region,
           DECODE(GROUPING(product), 1, 'All Products', 0, product) product,
           SUM(amount)
    from region, product, product_sales
    where region_id = reg_id
      and region = 'East'
      and product_id = prod_id
    group by CUBE(region, product)
Output columns: REGION, PRODUCT, SUM(AMOUNT)
Output rows: the East product rows (ending with East/Ties), then East/All Products, then All Regions paired with each product (Hats, Jackets, Pants, Shirts, Shoes, Suits, Sweaters, T-Shirts, Ties), and finally All Regions/All Products
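
The example above has no NULLs in the underlying data, so it does not show why GROUPING() is needed. The sketch below assumes a hypothetical two-row inline view in which one sale has no region; the label 'Unknown' is also just an illustrative choice.

```sql
-- One real region and one row whose region is NULL in the data.
SELECT region,
       GROUPING(region) AS grp_flag,
       DECODE(GROUPING(region), 1, 'All Regions',
              NVL(region, 'Unknown')) AS region_label,
       SUM(amount) AS sum_amt
FROM (
       SELECT 'East' AS region, 10 AS amount FROM dual
       UNION ALL SELECT TO_CHAR(NULL), 5 FROM dual
     ) sales
GROUP BY ROLLUP(region);
-- Three rows come back (in no guaranteed order):
--   region  grp_flag  region_label  sum_amt
--   East    0         East          10
--   NULL    0         Unknown        5    <- NULL that was in the data
--   NULL    1         All Regions   15    <- NULL generated by ROLLUP
```

Without grp_flag the last two rows would be indistinguishable by their REGION value alone.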

Functions

Analytical SQL Functions
Analytical function categories:
- Ranking functions
- Windowing functions
- Reporting functions
- Lag/lead functions
- Statistics functions
Functions are applied after all joins and the WHERE, GROUP BY and HAVING clauses are performed, but before the ORDER BY clause is applied

Basic Analytical Function Syntax
    <function_name>() OVER (
      [PARTITION BY <exp1> [, …]]
      ORDER BY <exp2> [ASC|DESC] [NULLS FIRST|NULLS LAST]
    )
Example:
    SELECT RANK() OVER (PARTITION BY region ORDER BY amount)
    FROM REG_SALES;

Function Syntax Comments
PARTITION BY <exp> [, …]
- This clause divides the query result into groups within which the analytical function operates
- If the PARTITION BY clause is missing, the function operates over the entire dataset
- <exp> can be any valid expression involving column references

ORDER BY <exp> [ASC|DESC] [NULLS FIRST|NULLS LAST]
- Specifies how the data is ordered within a group (partition)
- ASC|DESC specifies the sorting order within the group; the default is ASC
- The presence of ORDER BY affects the outcome of analytical functions:
  with ORDER BY, the set of rows used is the current row and all preceding rows in the partition (a growing window);
  without ORDER BY, all the rows in the partition are used
- Additional ORDER BY expressions can be used to resolve ties between repeated values in a set

ORDER BY <exp> [ASC|DESC] [NULLS FIRST|NULLS LAST]
- The NULLS FIRST|NULLS LAST clause determines the position of NULLs in the ordered sequence
- If it is omitted, the position depends on ASC or DESC: NULLs are considered larger than any other value, so they sort last with ASC and first with DESC
- The ORDER BY inside OVER() does not guarantee that the query result is returned sorted on the measures; use the query's own ORDER BY clause to specify the output sequence
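
The effect of NULLS LAST on a descending ranking can be sketched as follows. The three-row inline view t is hypothetical and built from DUAL only to keep the query self-contained.

```sql
-- NULL is treated as the largest value, so ORDER BY ... DESC ranks it first
-- unless NULLS LAST is specified.
SELECT amount,
       RANK() OVER (ORDER BY amount DESC)            AS rnk_default,
       RANK() OVER (ORDER BY amount DESC NULLS LAST) AS rnk_nulls_last
FROM (
       SELECT 10 AS amount FROM dual
       UNION ALL SELECT 20 FROM dual
       UNION ALL SELECT TO_NUMBER(NULL) FROM dual
     ) t;
--   amount  rnk_default  rnk_nulls_last
--   NULL    1            3
--   20      2            1
--   10      3            2
-- (The order in which the three rows are returned is not guaranteed,
--  because the query has no outer ORDER BY.)
```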

Ranking Functions
- Compute the rank of a record with respect to other records in the dataset, based on the values of a set of measures
Ranking functions:
- RANK() and DENSE_RANK()
- CUME_DIST() and PERCENT_RANK()
- NTILE()
- ROW_NUMBER()

RANK() and DENSE_RANK() Functions
- The RANK() and DENSE_RANK() functions allow you to rank items in a dataset or sub-group
- RANK() leaves gaps in the ranking sequence when there are ties in the rankings
- DENSE_RANK() does not leave gaps in the ranking sequence when there are ties in the rankings

Example of RANK()/DENSE_RANK()
    select amount,
           RANK() OVER (ORDER BY amount) AS rank_asc,
           DENSE_RANK() OVER (ORDER BY amount) AS dense_rank
    from product_sales, product, region
    where prod_id = product_id
      and region_id = reg_id
      and region = 'East'
    order by amount;
Output columns: AMOUNT, RANK_ASC, DENSE_RANK
NOTES:
 - The ranking value repeats when the same data value occurs more than once in the dataset; RANK() then skips the following rank values, leaving gaps
 - The order of the rows with repeated values is non-deterministic
 - DENSE_RANK() does not leave gaps in the rank values for repeated data values (RANK() does)
 - The largest rank value produced by DENSE_RANK() equals the number of distinct values in the dataset
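
The gap behaviour in the notes above can be reproduced on four values with one tie. This is a minimal sketch; the inline view t is hypothetical and built from DUAL.

```sql
-- Four amounts, with 20 appearing twice.
SELECT amount,
       RANK()       OVER (ORDER BY amount) AS rank_asc,
       DENSE_RANK() OVER (ORDER BY amount) AS dense_rank_asc
FROM (
       SELECT 10 AS amount FROM dual
       UNION ALL SELECT 20 FROM dual
       UNION ALL SELECT 20 FROM dual
       UNION ALL SELECT 30 FROM dual
     ) t
ORDER BY amount;
--   amount  rank_asc  dense_rank_asc
--   10      1         1
--   20      2         2
--   20      2         2
--   30      4         3     <- RANK() skips 3, DENSE_RANK() does not
-- The largest DENSE_RANK() value (3) equals the number of distinct amounts.
```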

Using RANK() For Top-N Values List
    select *
    from (select region, product, SUM(amount) amt, SUM(profit) profit,
                 RANK() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) AS rank_sum_amt
          from product_sales, product, region
          where prod_id = product_id
            and region_id = reg_id
          GROUP BY region, product)
    where rank_sum_amt < 4
Output columns: Region, Product, Amt, Profit, Rank
Output rows, in the order shown on the slide:
    Central: Sweaters, Shirts, Pants
    East:    Shoes, Jackets, Pants
    West:    Shoes, Jackets, T-Shirts
NOTES:
 - Using RANK() or DENSE_RANK(), you can get the top N ranks within a dataset
 - DENSE_RANK() could yield different results than RANK(), depending on repeated values within the dataset. With ties, either function can return more than N rows; ROW_NUMBER() returns exactly N rows per partition
 - The bottom N rankings can be generated by changing the ordering sequence within the rank expression (e.g., ORDER BY SUM(amount) ASC)

CUME_DIST() and PERCENT_RANK()
- CUME_DIST() computes the position of a specified value relative to the set of values (also known as the inverse of percentile in statistics books)
    CD = (# of values that precede or equal x in the specified order) / (total # of values)
- Return values are greater than 0 and at most 1
- PERCENT_RANK() returns the percent rank of a value relative to a group of values
    PR = (rank of row in partition - 1) / (# of rows in the partition - 1)

Example of CUME_DIST() & PERCENT_RANK()
    select region, product, SUM(amount),
           CUME_DIST() OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS cume_dist,
           PERCENT_RANK() OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS pct_rnk
    from product_sales, product, region
    where prod_id = product_id
      and region_id = reg_id
    GROUP BY region, product
Output columns: Region, Product, Amt, Cume_Dist, Pct_Rnk
Output rows, in the order shown on the slide:
    Central: Belts, Suits, Ties, Hats, Pants, Shirts, Sweaters
    East:    T-Shirts, Ties, Hats, Shirts, Sweaters, Suits, Jackets, Pants, Shoes
    :
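
Both formulas can be checked by hand on a small set. The sketch below assumes a hypothetical four-row inline view t built from DUAL, with the value 20 repeated.

```sql
SELECT amount,
       CUME_DIST()    OVER (ORDER BY amount) AS cume_dist,
       PERCENT_RANK() OVER (ORDER BY amount) AS pct_rnk
FROM (
       SELECT 10 AS amount FROM dual
       UNION ALL SELECT 20 FROM dual
       UNION ALL SELECT 20 FROM dual
       UNION ALL SELECT 30 FROM dual
     ) t
ORDER BY amount;
--   amount  cume_dist     pct_rnk
--   10      1/4 = 0.25    (1 - 1)/(4 - 1) = 0
--   20      3/4 = 0.75    (2 - 1)/(4 - 1) = 1/3 (about 0.333)
--   20      3/4 = 0.75    (2 - 1)/(4 - 1) = 1/3 (about 0.333)
--   30      4/4 = 1       (4 - 1)/(4 - 1) = 1
```

CUME_DIST() never returns 0 because the current value always counts itself, while PERCENT_RANK() returns 0 for the first rank.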

NTILE(n) and ROW_NUMBER()
NTILE(n)
- Divides the dataset into a specified number of buckets
- Takes the number of buckets as an argument
ROW_NUMBER()
- Assigns a unique number to each row within a partition
- Row numbers start with 1 and increase sequentially within each partition
- Better than RANK() or DENSE_RANK() for top-N queries
Rows with tied rankings will not necessarily be assigned to the same bucket (if the ties span buckets) or to the same row number in subsequent runs of the query on the same dataset

Example of NTILE() & ROW_NUMBER()
    select region, product, SUM(amount),
           NTILE(3) OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS bucket,
           ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS row_num
    from product_sales, product, region
    where prod_id = product_id
      and region_id = reg_id
    GROUP BY region, product
Output columns: Region, Product, Amt, Bucket, Row_Num (ROW is a reserved word in Oracle, so the alias is spelled row_num)
Output rows, in the order shown on the slide:
    Central: Belts, Suits, Ties, Hats, Pants, Shirts, Sweaters
    East:    T-Shirts, Ties, Hats, Shirts, Sweaters, Suits, Jackets, Pants, Shoes
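
How NTILE() distributes rows when the row count is not a multiple of the bucket count can be sketched on five distinct values. The inline view t is hypothetical and built from DUAL.

```sql
-- Five rows into three buckets: the extra rows go to the first buckets (2, 2, 1).
SELECT amount,
       NTILE(3)     OVER (ORDER BY amount) AS bucket,
       ROW_NUMBER() OVER (ORDER BY amount) AS row_num
FROM (
       SELECT 10 AS amount FROM dual
       UNION ALL SELECT 20 FROM dual
       UNION ALL SELECT 30 FROM dual
       UNION ALL SELECT 40 FROM dual
       UNION ALL SELECT 50 FROM dual
     ) t
ORDER BY amount;
--   amount  bucket  row_num
--   10      1       1
--   20      1       2
--   30      2       3
--   40      2       4
--   50      3       5
```

The amounts are deliberately distinct; with ties, the bucket and row number given to each tied row could change between runs, as noted above.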

Windowing Functions

Windowing Functions
- Used to compute cumulative, moving or centered aggregates
- Return a value for each row in a dataset which depends on other rows in the corresponding window
- Include moving sum, moving average, moving min/max, cumulative sum and statistical functions, and first and last value in the window

Windowing Function Syntax
    {SUM|AVG|MAX|MIN|COUNT|FIRST_VALUE|LAST_VALUE}(<exp>) OVER (
      [PARTITION BY <exp1> [, …]]
      ORDER BY <exp2> [ASC|DESC] [NULLS FIRST|NULLS LAST]
      {ROWS | RANGE}
      { {UNBOUNDED PRECEDING | <exp3> PRECEDING}
      | BETWEEN {UNBOUNDED PRECEDING | <exp4> PRECEDING}
            AND {CURRENT ROW | <exp5> FOLLOWING} }
    )

Windowing Function Syntax Comments
<exp>
- Must be a constant or an expression which evaluates to a positive value
- If ROWS was specified, it is a physical offset: the number of rows in the window
- If RANGE was specified, it is a logical offset (a value or an interval literal)
- An interval offset is specified as RANGE INTERVAL 'n' DAY|MONTH|YEAR PRECEDING|FOLLOWING
- A numeric offset is specified as RANGE x PRECEDING|FOLLOWING

ROWS | RANGE
- ROWS specifies the window in physical units (rows)
- RANGE specifies the window as a logical offset
BETWEEN … AND …
- Specifies the start and end points of the window
- If BETWEEN is omitted and a single point is specified, that point is treated as the start point and the current row is used as the end point

UNBOUNDED PRECEDING
- Specifies that the window starts at the first row of the partition (or the first row of the dataset, if the PARTITION BY clause is omitted)
UNBOUNDED FOLLOWING
- Specifies that the window ends at the last row of the partition (or the last row of the dataset, if the PARTITION BY clause is omitted)

CURRENT ROW
- As a start point: if ROWS was specified, the current row is the start of the window; if RANGE was specified, the current value is the start of the window
- As an end point: if ROWS was specified, the current row is the end of the window; if RANGE was specified, the current value is the end of the window

<exp> FOLLOWING
- If this is the start point, then the end point must be <exp> FOLLOWING or UNBOUNDED FOLLOWING
<exp> PRECEDING
- If this is the end point, then the start point must be <exp> PRECEDING or UNBOUNDED PRECEDING
These rules apply whether ROWS or RANGE was specified

Example of a Partition (Sub-Grouping) Based Moving Window
QUERY: Calculate a running total of the amount of sales by region
    select region, amount,
           SUM(amount) OVER (PARTITION BY region
                             ORDER BY amount
                             ROWS UNBOUNDED PRECEDING) as mov_amt_sum
    from product_sales, product, region
    where prod_id = product_id
      and region_id = reg_id
Output columns: REGION, AMOUNT, MOV_AMT_SUM
Output rows: seven rows for Central followed by nine rows for East
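
ROWS and RANGE give different results only when the ordering value has ties. The sketch below assumes a hypothetical inline view t built from DUAL; the id column is an added tie-breaker so that the ROWS results are deterministic.

```sql
SELECT id, amount,
       -- physical window: one more row each time
       SUM(amount) OVER (ORDER BY amount, id
                         ROWS UNBOUNDED PRECEDING)                 AS rows_running,
       -- logical window: all rows whose amount is <= the current amount
       SUM(amount) OVER (ORDER BY amount
                         RANGE UNBOUNDED PRECEDING)                AS range_running,
       -- centered moving sum over previous, current and next row
       SUM(amount) OVER (ORDER BY amount, id
                         ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS centered_sum
FROM (
       SELECT 1 AS id, 10 AS amount FROM dual
       UNION ALL SELECT 2, 20 FROM dual
       UNION ALL SELECT 3, 20 FROM dual
       UNION ALL SELECT 4, 30 FROM dual
     ) t
ORDER BY amount, id;
--   id  amount  rows_running  range_running  centered_sum
--   1   10      10            10             30
--   2   20      30            50             50
--   3   20      50            50             70
--   4   30      80            80             50
```

With RANGE, both rows with amount 20 see the same window, so they share the running total 50.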

Example of Date Based Moving Window Summaries
    select cust_id, trans_dt, amt,
           sum(amt) over (partition by cust_id order by trans_dt
                          range interval '1' month preceding) sum_1_mnth,
           sum(amt) over (partition by cust_id order by trans_dt
                          range between interval '1' month preceding
                                    and interval '1' month following) sum_2_mnth,
           sum(amt) over (partition by cust_id order by trans_dt
                          range between interval '7' DAY preceding
                                    and interval '7' DAY following) sum_wk
    from cust_daily_summary
    order by cust_id, trans_dt
Output columns: cust_id, Trans_dt, Amt, Sum_1_mnth, Sum_2_mnth, Sum_wk

Reporting Functions

Reporting Functions
- Allow for the calculation of aggregate values within a data partition
- Return the same aggregate value for every row in a partition
Syntax:
    {SUM | AVG | MAX | MIN | COUNT | STDDEV | VARIANCE}
      ([ALL | DISTINCT] {<value expression1> | *})
      OVER ([PARTITION BY <value expression2>[,...]])
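
A reporting aggregate repeats the partition-level value on every detail row, which the following sketch illustrates. The four-row inline view sales is hypothetical and built from DUAL.

```sql
SELECT region, product, amount,
       SUM(amount) OVER (PARTITION BY region) AS region_total,
       MAX(amount) OVER (PARTITION BY region) AS region_max
FROM (
       SELECT 'East' AS region, 'Hats' AS product, 10 AS amount FROM dual
       UNION ALL SELECT 'East', 'Ties', 20 FROM dual
       UNION ALL SELECT 'West', 'Hats', 30 FROM dual
       UNION ALL SELECT 'West', 'Ties', 40 FROM dual
     ) sales
ORDER BY region, product;
--   region  product  amount  region_total  region_max
--   East    Hats     10      30            20
--   East    Ties     20      30            20
--   West    Hats     30      70            40
--   West    Ties     40      70            40
```

Unlike GROUP BY, the detail rows are kept, so each row can be compared with its partition's aggregate in the same query.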

Example of Using Reporting Functions
Query: Find the region where each product was the best seller
    SELECT product, region, sum_amt
    FROM (SELECT product, region, SUM(amount) AS sum_amt,
                 MAX(SUM(amount)) OVER (PARTITION BY product) AS max_sum_amt
          FROM product_sales, region, product
          WHERE region_id = reg_id
            AND product_id = prod_id
          GROUP BY product, region)
    WHERE sum_amt = max_sum_amt
Output columns: PRODUCT, REGION, SUM_AMT
Product and region pairs returned:
    BELTS     Central
    HATS      Central
    JACKETS   East
    JEANS     West
    PANTS     Central
    SHIRTS    Central
    SHOES     East
    SOCKS     West
    SUITS     Central
    SWEATERS  Central
    T-SHIRTS  West
    TIES      Central

New Reporting Functions
RATIO_TO_REPORT(exp)
- Computes the ratio of a value to the sum of a set of values
LEAD() and LAG()
- Useful for comparing values in different time periods
- Allow access to more than one row of a table without a self-join
- LAG() provides access to a prior row (at a given offset)
- LEAD() provides access to a row after the current position
- These functions are position based, not value based

New Reporting Function Syntax
    RATIO_TO_REPORT(<exp1>) OVER ([PARTITION BY <exp2> [,…]])
    {LEAD | LAG}(<exp1> [,<offset> [,<default>]]) OVER (
      [PARTITION BY <exp2> [,…]]
      ORDER BY <exp3> [ASC|DESC] [NULLS FIRST | NULLS LAST] [,…])
- <offset> is optional and defaults to 1
- <default> is optional and is the value returned if the <offset> falls outside the bounds of the dataset; if it is omitted, NULL is returned

Example of RATIO_TO_REPORT()
Query: Find the ratio of each product's total sales to total sales
    SELECT product, SUM(amount) AS sum_amt,
           SUM(SUM(amount)) OVER () AS total_amt,
           RATIO_TO_REPORT(SUM(amount)) OVER () AS ratio
    FROM product_sales, product, region
    WHERE prod_id = product_id
      AND region_id = reg_id
      AND region = 'East'
    GROUP BY product
Output columns: PRODUCT, SUM_AMT, TOTAL_AMT, RATIO
Output rows: one per product (Belts, Hats, Jackets, Jeans, Pants, Shirts, Shoes, Socks, Suits, Sweaters, T-Shirts, Ties)

Example of LAG Function
Query: Compare each customer's present amount to the amount 2 days ago
    SELECT cust_id, acct_date, sum(amt_usd) amount,
           LAG(SUM(amt_usd), 2, -999) OVER (PARTITION BY cust_id
                                            ORDER BY acct_date) AS old_amt
    FROM acct
    WHERE acct_date > '01-NOV-00'
    GROUP BY cust_id, acct_date
Output columns: cust_ID, ACCT_DATE, AMOUNT, OLD_AMT
Because LAG() is position based, "2 days ago" holds only when each customer has one row per consecutive day; the default of -999 appears on a customer's first two rows.
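
The offset and default arguments, together with RATIO_TO_REPORT(), can be seen side by side on three rows. This is a minimal sketch; the inline view t and its day_no column are hypothetical and built from DUAL.

```sql
SELECT day_no, amount,
       LAG(amount, 1, 0) OVER (ORDER BY day_no) AS prev_amt,  -- default 0 when no prior row
       LEAD(amount)      OVER (ORDER BY day_no) AS next_amt,  -- offset 1, NULL when no next row
       RATIO_TO_REPORT(amount) OVER ()          AS share      -- amount / sum of all amounts
FROM (
       SELECT 1 AS day_no, 10 AS amount FROM dual
       UNION ALL SELECT 2, 30 FROM dual
       UNION ALL SELECT 3, 60 FROM dual
     ) t
ORDER BY day_no;
--   day_no  amount  prev_amt  next_amt  share
--   1       10      0         30        0.1
--   2       30      10        60        0.3
--   3       60      30        NULL      0.6
```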

CASE Expressions

CASE Expressions
- Used for bucketing data; allows for differently sized buckets
- Very similar to the DECODE function
- Provides more flexibility and logical power
- Offers better performance and is easier to read
Syntax:
    CASE WHEN <cond1> THEN <v1>
         WHEN <cond2> THEN <v2>
         …
         [ELSE <vn>]
    END
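
One way to see the extra flexibility: DECODE only tests for equality, so a range test needs a workaround such as SIGN(), while CASE states the condition directly. The sketch below uses a hypothetical three-row inline view t built from DUAL; the threshold of 50 and the labels are illustrative choices.

```sql
SELECT amount,
       -- DECODE can only compare for equality, so the range is encoded via SIGN()
       DECODE(SIGN(amount - 50), 1, 'High', 'Low')       AS decode_bucket,
       -- CASE expresses the same rule as a plain condition
       CASE WHEN amount > 50 THEN 'High' ELSE 'Low' END  AS case_bucket
FROM (
       SELECT 25 AS amount FROM dual
       UNION ALL SELECT 50 FROM dual
       UNION ALL SELECT 75 FROM dual
     ) t
ORDER BY amount;
--   amount  decode_bucket  case_bucket
--   25      Low            Low
--   50      Low            Low
--   75      High           High
```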

Example #1 of Using CASE Expressions
    SELECT SUM(CASE WHEN SUM(amount) BETWEEN 0 AND 50 THEN 1 ELSE 0 END) AS "0-50",
           SUM(CASE WHEN SUM(amount) BETWEEN 51 AND 150 THEN 1 ELSE 0 END) AS "51-150",
           SUM(CASE WHEN SUM(amount) BETWEEN 151 AND 250 THEN 1 ELSE 0 END) AS "151-250",
           SUM(CASE WHEN SUM(amount) > 250 THEN 1 ELSE 0 END) AS "251+"
    FROM product_sales, product, region
    WHERE prod_id = product_id
      AND region_id = reg_id
      AND region = 'East'
    GROUP BY product

Example #2 of Using CASE Expressions
    SELECT CASE WHEN amount BETWEEN 0 AND 50 THEN '  0-50'
                WHEN amount BETWEEN 51 AND 150 THEN ' 51-150'
                WHEN amount BETWEEN 151 AND 250 THEN '151-250'
                WHEN amount > 250 THEN '251+'
           END bucket,
           COUNT(*) cnt,
           SUM(amount) amt
    FROM product_sales, product, region
    WHERE prod_id = product_id
      AND region_id = reg_id
      AND region = 'East'
    GROUP BY CASE WHEN amount BETWEEN 0 AND 50 THEN '  0-50'
                  WHEN amount BETWEEN 51 AND 150 THEN ' 51-150'
                  WHEN amount BETWEEN 151 AND 250 THEN '151-250'
                  WHEN amount > 250 THEN '251+'
             END
    ORDER BY bucket
Output columns: BUCKET, CNT, AMT

Summary
New analytical functionality in Oracle 8.1.6 (+)
- Makes it easier to code certain types of SQL
- Allows for more efficient SQL code when compared to the equivalent pure SQL implementation
Enhancements include:
- The SQL GROUP BY clause has been augmented to make querying and reporting easier
- Analytical functions enabling rankings, moving window calculations, and lead/lag analysis
- Better ‘if-then-else’ capabilities provided through CASE
