Published by Eric Stone. Modified over 7 years ago.
1
Analyze This! Leveraging Oracle 12c Analytic Functions
Jim Czuprynski, Strategic Solutions Architect
December 7, 2016
2
My Credentials
- 35+ years of database-centric IT experience
- Oracle DBA since 2001
- Oracle 10g, 11g, 12c OCP
- Oracle ACE Director
- 100+ articles on databasejournal.com and ioug.org
- Regular speaker at Oracle OpenWorld, IOUG COLLABORATE, and OTN ACE Tours
- Oracle-centric blog (Generally, It Depends)
3
Who We Are
- Solution Integrator focused on Technology Solutions
- Comprehensive Service Offerings
- Extensive reach: U.S., Canada, and the U.K.
- $720M annual revenue with over $260M in Services
- Industry Certifications across a broad selection of best-in-class IT manufacturers and technologies
- Significant investment in technical staff and facilities to support our clients' projects
4
Limiting Results: OFFSET and FETCH
5
Limiting Rows with OFFSET and FETCH
- Before 12cR1, Top-N queries required in-line views to isolate the result set
  - Example: Show the first five people earning the highest salary
- Even more difficult: finding the second set of values!
  - Example: Show the next five (i.e. sixth through tenth) people earning the highest salary
- Starting with 12cR1, the new FETCH syntax makes short work of these types of requests:

    [OFFSET n ROWS] FETCH {FIRST | NEXT} n [PERCENT] ROWS {ONLY | WITH TIES}
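To make the row-limiting semantics concrete, here is a minimal Python sketch that emulates ORDER BY ... DESC with OFFSET and FETCH on an in-memory list. The function name `fetch_rows`, its parameters, and the sample invoices are hypothetical and exist only for illustration. The sketch assumes that PERCENT is applied to the total number of rows and rounded up, and that WITH TIES appends any further rows whose sort key equals that of the last row fetched.

```python
import math


def fetch_rows(rows, key, offset=0, count=None, percent=None, with_ties=False):
    """Emulate ORDER BY key DESC OFFSET .. FETCH .. on a list of dicts (illustrative only)."""
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=lambda r: r[key], reverse=True)
    if percent is not None:
        # Assumption: PERCENT is taken of the total row count and rounded up.
        count = math.ceil(len(ordered) * percent / 100)
    if count is None:
        return ordered[offset:]          # OFFSET without FETCH
    end = min(offset + count, len(ordered))
    page = ordered[offset:end]
    if with_ties and page:
        # WITH TIES: keep rows that tie with the last row on the sort key.
        last = page[-1][key]
        while end < len(ordered) and ordered[end][key] == last:
            page.append(ordered[end])
            end += 1
    return page


# Hypothetical sample data: eight invoices with some tied balances.
invoices = [
    {"invoice_id": i, "balance_due": b}
    for i, b in enumerate([900, 800, 800, 700, 600, 600, 600, 500], start=1)
]

top3 = fetch_rows(invoices, "balance_due", count=3)                 # FETCH FIRST 3 ROWS ONLY
next2 = fetch_rows(invoices, "balance_due", offset=3, count=2)      # OFFSET 3 ROWS FETCH NEXT 2 ROWS ONLY
ties = fetch_rows(invoices, "balance_due", offset=3, count=2, with_ties=True)
quarter = fetch_rows(invoices, "balance_due", percent=25)           # FETCH FIRST 25 PERCENT ROWS ONLY
```

With WITH TIES, the second page grows from two rows to four because invoices 6 and 7 share the balance of invoice 5.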
6
Top N Query: Before 12c
Show just the first ten invoices with largest balances:

    SQL> SELECT cust_last_name, invoice_id, balance_due
           FROM (SELECT C.cust_last_name
                       ,I.invoice_id
                       ,I.balance_due
                   FROM sh.customers C
                       ,ap.invoices I
                  WHERE C.cust_id = I.customer_id
                    AND I.balance_due > 0
                  ORDER BY balance_due DESC)
          WHERE ROWNUM <= 10;

[Output table: Top 10 Invoices With a Pending Balance. Columns: Customer Name, Invoice #, Balance Due. Customers in order: Grey, Tien, Stengard, Stengard, Stengard, Valentino, Bartlett, Chin, Chin, Stengard. Invoice numbers and balances not preserved.]
7
Leveraging FETCH for Top N Queries
Show just the first ten invoices with largest balances:

    SQL> SELECT C.cust_last_name
               ,I.invoice_id
               ,I.balance_due
           FROM sh.customers C
               ,ap.invoices I
          WHERE C.cust_id = I.customer_id
            AND I.balance_due > 0
          ORDER BY balance_due DESC
          FETCH FIRST 10 ROWS ONLY;

[Output table: Top 10 Invoices With a Pending Balance. Columns: Customer Name, Invoice #, Balance Due. Customers in order: Grey, Tien, Stengard, Stengard, Stengard, Valentino, Bartlett, Chin, Chin, Stengard. Invoice numbers and balances not preserved.]
8
Leveraging OFFSET and FETCH for Top N Queries
Show the next 1.25% of all entries … but exclude the first 5!

    SQL> SELECT C.cust_last_name
               ,I.invoice_id
               ,I.balance_due
           FROM sh.customers C
               ,ap.invoices I
          WHERE C.cust_id = I.customer_id
            AND I.balance_due > 0
          ORDER BY balance_due DESC
          OFFSET 5 ROWS
          FETCH NEXT 1.25 PERCENT ROWS ONLY;

[Output table: Second Group of Invoices With Largest Balance. Columns: Customer Name, Invoice #, Balance Due. Customers in order: Valentino, Bartlett, Chin, Chin, Stengard, . . . Rice, Chin, Chin, Chin, Chin. Invoice numbers and balances not preserved.]

250 rows selected.

- The first five invoices are not shown …
- … while the next 250 rows are included! Oracle computes PERCENT against the total number of qualifying rows (here 20,000: the 5 skipped rows plus the remaining 19,995) and rounds up.
9
When Close Enough Is Good Enough: APPROX_COUNT_DISTINCT
10
When Close Enough Is Good Enough
- Starting in 12cR1, estimated results can be produced with reasonable accuracy
- Useful for returning results whenever less than 100% of data is needed
  - Graphic distributions (histograms or pie charts)
  - Dashboards (KPIs)
    - Once trend is identified, application user may decide to drill deeper
  - Business Intelligence reporting
  - Chi-square Relevance reporting
11
APPROX_COUNT_DISTINCT: A Simple Example
Comparison of actual vs. approximate counts of DISTINCT values:

    SELECT prod_id
          ,COUNT(DISTINCT cust_id) true_count
          ,APPROX_COUNT_DISTINCT(cust_id) appx_count
          ,(COUNT(DISTINCT cust_id) - APPROX_COUNT_DISTINCT(cust_id))
             / COUNT(DISTINCT cust_id) var_pct
      FROM sh.sales
     GROUP BY prod_id
     ORDER BY prod_id;

[Output table: Demonstration of APPROX_COUNT_DISTINCT, DISTINCT Customer Counts w/in Product (From SH.SALES). Columns: PROD_ID, Actual Count, Approx Count Distinct, Variance. Values not preserved.]

    Plan Hash Value :

    | Id | Operation             | Name  | Rows | Bytes | Cost | Time     |
    |  0 | SELECT STATEMENT      |       |      |       |  536 | 00:00:01 |
    |  1 |  SORT GROUP BY APPROX |       |      |       |  536 | 00:00:01 |
    |  2 |   PARTITION RANGE ALL |       |      |       |  514 | 00:00:01 |
    |  3 |    TABLE ACCESS FULL  | SALES |      |       |  514 | 00:00:01 |
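For intuition about how an approximate distinct count can get close to the exact answer without tracking every value, here is a minimal Python sketch of a HyperLogLog-style estimator. It is an illustration under stated assumptions, not Oracle's implementation: the function name `approx_count_distinct`, the register count, the SHA-1 hash, and the sample customer IDs are all choices made for this example.

```python
import hashlib
import math


def approx_count_distinct(values, p=12):
    """HyperLogLog-style estimate of the number of distinct values (illustrative sketch)."""
    m = 1 << p                      # number of registers (4096 when p=12)
    registers = [0] * m
    for v in values:
        # 64-bit hash of the value; SHA-1 is used only because it is deterministic.
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                       # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)          # remaining 64-p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        estimate = m * math.log(m / zeros)
    return round(estimate)


# Hypothetical data: 20,000 sales rows referencing 5,000 distinct customers.
cust_ids = [i % 5000 for i in range(20000)]
true_count = len(set(cust_ids))
appx_count = approx_count_distinct(cust_ids)
var_pct = (true_count - appx_count) / true_count   # same ratio as var_pct in the query above
```

The estimator needs only a fixed array of small registers regardless of how many rows it scans, which is why this style of function suits dashboards and trend reports where a small error is acceptable.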
12
PIVOTing for Fun and Profit
13
PIVOT and UNPIVOT Functions
- PIVOT works much like the PivotTable feature in Microsoft Excel spreadsheets
- PIVOT should really be called what it is: CROSSTAB
- Leverages standard Oracle aggregate functions
  - Efficient way to obtain totals, counts, averages, minimums, and maximums
  - Row and column totalling also available
- Also possible to UNPIVOT an already-pivoted row source
  - Useful for turning a denormalized table back into row and column format
14
PIVOT Function: Setup
Create view SH.VW_SALES from 4 dimensions and fact table:

    SQL> CREATE OR REPLACE VIEW sh.vw_sales AS
         SELECT P.prod_name product
               ,N.country_name country
               ,S.channel_id channel
               ,SUBSTR(T.calendar_quarter_desc, 6, 2) quarter
               ,SUM(S.amount_sold) amount_sold
               ,SUM(S.quantity_sold) quantity_sold
           FROM sh.sales S, sh.times T
               ,sh.customers C, sh.countries N
               ,sh.products P
          WHERE S.time_id = T.time_id
            AND S.prod_id = P.prod_id
            AND S.cust_id = C.cust_id
            AND C.country_id = N.country_id
          GROUP BY P.prod_name
                  ,N.country_name
                  ,S.channel_id
                  ,SUBSTR(T.calendar_quarter_desc, 6, 2);
15
PIVOT Function: An Example
Create PIVOTed output for Channels within Products:

    SQL> SELECT *
           FROM (SELECT product
                       ,channel
                       ,amount_sold
                   FROM sh.vw_sales) S
          PIVOT (SUM(amount_sold)
                 FOR channel IN (3 AS DIRECT_SALES
                                ,4 AS INTERNET_SALES
                                ,9 AS TELESALES))
          ORDER BY product;

[Output table: Pivot Table Example (from SH.VW_SALES). Columns: Product Name, Direct Sales, Internet Sales, Tele Sales. Products shown: 1.44MB External 3.5" Diskette; 128MB Memory Card; 17" LCD w/built-in HDTV Tuner; 18" Flat Panel Graphics Monitor; 256MB Memory Card; 3 1/2" Bulk diskettes, Box of; . . . SIMM- 16MB PCMCIAII card; SIMM- 8MB PCMCIAII card; Smash up Boxing; Standard Mouse; Unix/Windows 1-user pack; Xtend Memory; Y Box. Sales amounts not preserved.]

71 rows selected.
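The crosstab idea behind the query above can be sketched in a few lines of Python. This is only an illustration of the semantics: the function name `pivot_sum` and the sample rows are hypothetical, and the sketch assumes that values not listed in the IN clause are ignored while their product still appears in the output with empty cells.

```python
from collections import defaultdict


def pivot_sum(rows, row_key, col_key, value_key, columns):
    """Crosstab with SUM: `columns` maps a col_key value to an output column name,
    mirroring IN (3 AS DIRECT_SALES, ...). Illustrative sketch only."""
    # Every output row starts with all pivot columns set to None (SQL NULL).
    out = defaultdict(lambda: dict.fromkeys(columns.values()))
    for r in rows:
        cell = out[r[row_key]]            # make sure the product appears in the result
        name = columns.get(r[col_key])
        if name is None:
            continue                      # channel not listed in IN (...): skip its amount
        cell[name] = (cell[name] or 0) + r[value_key]
    return {k: out[k] for k in sorted(out)}   # ORDER BY product


# Hypothetical sample rows shaped like SH.VW_SALES.
sales = [
    {"product": "Y Box", "channel": 3, "amount_sold": 100.0},
    {"product": "Y Box", "channel": 4, "amount_sold": 40.0},
    {"product": "Y Box", "channel": 3, "amount_sold": 25.0},
    {"product": "Xtend Memory", "channel": 9, "amount_sold": 10.0},
    {"product": "Xtend Memory", "channel": 2, "amount_sold": 5.0},
]
channels = {3: "DIRECT_SALES", 4: "INTERNET_SALES", 9: "TELESALES"}
result = pivot_sum(sales, "product", "channel", "amount_sold", channels)
```

Each distinct product becomes one output row, and each listed channel becomes one column holding the summed amount for that product and channel.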
16
UNPIVOT: Setup
Create PIVOTed table SH.PT_SALES to demonstrate UNPIVOT:

    SQL> CREATE TABLE sh.pt_sales AS
         SELECT *
           FROM (SELECT product
                       ,quarter
                       ,quantity_sold
                       ,amount_sold
                   FROM sh.vw_sales)
          PIVOT (SUM(quantity_sold) AS sumq
                ,SUM(amount_sold) AS suma
                 FOR quarter IN ('01' AS Q1, '02' AS Q2, '03' AS Q3, '04' AS Q4));

[Output table: columns Product Name, Q1_SUMQ, Q1_SUMA, . . . Q4_SUMQ, Q4_SUMA. Products shown: 1.44MB External 3.5" Diskette; 128MB Memory Card; 17" LCD w/built-in HDTV Tuner; 18" Flat Panel Graphics Monitor; 256MB Memory Card; 3 1/2" Bulk diskettes, Box of; . . . SIMM- 8MB PCMCIAII card; Smash up Boxing; Standard Mouse; Unix/Windows 1-user pack; Xtend Memory; Y Box. Values not preserved.]
17
UNPIVOT Function: An Example
UNPIVOTed output “renormalizes” a denormalized row source:

    SQL> SELECT product
               ,DECODE(quarter, 'Q1_SUMQ', 'Q1'
                               ,'Q2_SUMQ', 'Q2'
                               ,'Q3_SUMQ', 'Q3'
                               ,'Q4_SUMQ', 'Q4') AS quarter
               ,quantity_sold
           FROM sh.pt_sales
        UNPIVOT INCLUDE NULLS
                (quantity_sold FOR quarter IN (Q1_SUMQ, Q2_SUMQ
                                              ,Q3_SUMQ, Q4_SUMQ))
          ORDER BY product, quarter;

[Output table: UNPIVOTing Demonstration (Source: SH.PT_SALES). Columns: Product Name, Qtr, Quantity. Four rows (one per quarter) for each product, from 1.44MB External 3.5" Diskette and 128MB Memory Card through Xtend Memory and Y Box. Quarter numbers and quantities only partially preserved.]
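The reverse operation can be sketched the same way. The Python below turns each listed column of a pivoted row back into its own output row; the function name `unpivot`, its parameters, and the single sample row are hypothetical. The `include_nulls` flag stands in for the choice between INCLUDE NULLS and EXCLUDE NULLS, and the `columns` mapping plays the role of the DECODE that shortens 'Q1_SUMQ' to 'Q1'.

```python
def unpivot(rows, id_key, value_name, label_name, columns, include_nulls=True):
    """Turn each listed column of a pivoted row into its own output row.
    `columns` maps source column name -> label to emit. Illustrative sketch only."""
    out = []
    for r in rows:
        for col, label in columns.items():
            if r[col] is None and not include_nulls:
                continue                  # EXCLUDE NULLS behaviour
            out.append({id_key: r[id_key], label_name: label, value_name: r[col]})
    return out


# Hypothetical pivoted row shaped like SH.PT_SALES (quantity columns only).
pt_sales = [
    {"product": "Y Box", "Q1_SUMQ": 10, "Q2_SUMQ": None, "Q3_SUMQ": 7, "Q4_SUMQ": 12},
]
quarters = {"Q1_SUMQ": "Q1", "Q2_SUMQ": "Q2", "Q3_SUMQ": "Q3", "Q4_SUMQ": "Q4"}

with_nulls = unpivot(pt_sales, "product", "quantity_sold", "quarter", quarters)
without_nulls = unpivot(pt_sales, "product", "quantity_sold", "quarter", quarters,
                        include_nulls=False)
```

Including nulls keeps one row per product and quarter even when nothing was sold, which preserves the regular four-rows-per-product shape seen in the slide's output.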
18
Searching Intelligently for Patterns: MATCH_RECOGNIZE
19
MATCH_RECOGNIZE: Intelligent Pattern Matching
- Leverages powerful pattern matching techniques
  - Patterns are described with regular expressions
  - As simple or as complex as analyses require
- Useful for:
  - Identifying events based on established algorithms
    - “V” or “W” stock price fluctuations
  - “Sessionization” of multiple user events
    - Often-redialed telephone numbers
    - Dropped connections within online transactions
  - Detecting fraudulent behavior
    - Suspicious account access patterns
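As a rough illustration of how a regular expression over row classifications finds a “V” shape, the Python sketch below labels each price relative to the previous one and then runs an ordinary regex over the labels. The function name `find_v_shapes` and the price series are hypothetical. Labelling rows up front only works here because the down/up conditions are mutually exclusive, and `re.finditer` returns non-overlapping matches, which corresponds to resuming after the last row of each match rather than at the next row.

```python
import re


def find_v_shapes(prices):
    """Return (first_index, last_index) of each start/down+/up+ run. Illustrative sketch only."""
    # Classify each row against its predecessor: S=start, D=down, U=up, F=flat.
    labels = "".join(
        "S" if i == 0
        else "D" if p < prices[i - 1]
        else "U" if p > prices[i - 1]
        else "F"
        for i, p in enumerate(prices)
    )
    # Pattern: any starting row, one or more down rows, one or more up rows.
    return [(m.start(), m.end() - 1) for m in re.finditer(r".D+U+", labels)]


# Hypothetical closing prices.
prices = [10, 9, 8, 9, 11, 10, 7, 8, 8]
v_shapes = find_v_shapes(prices)
```

For this series the labels are "SDDUUDDUF", so the sketch reports two V shapes: rows 0 through 4 and rows 5 through 7.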
20
Scenario: Identifying Fraudulent Activity
A major consulting company has reported suspected fraudulent activity among its recently-hired consultants:
- Candidates presented valid certification credentials at interviews
- Certifications were used to screen and ultimately hire Oracle DBA FTE consultants
- However, something is desperately wrong with the hiring process!
  - New DBAs are making serious errors at client sites, causing unnecessary application downtime … and even loss of production data
  - Deeper technical interviews reveal that candidates may have deliberately exaggerated experience levels
- Fraud is strongly suspected for specific certification tests:
  - Oracle Certified Associate (OCA)
  - Oracle Certified Professional (OCP)
  - Exadata Certified Installers
21
Identifying Suspicious Behavior: Raw Data
Tables HR.CERT_* capture student information and historical exam test scores for valuable Oracle certifications proctored at various sites around the world:

    SQL> SELECT CC.stu_id stu_id
               ,TR.exam_id exam_id
               ,CC.stu_name stu_name
               ,CC.stu_home_tz stu_tzn
               ,CC.stu_home_country_id stu_cty
               ,TL.loc_country test_location
               ,TO_CHAR(TR.test_dtm, 'yyyy-mm-dd hh24:mi') test_dtc
               ,TO_DATE(TO_CHAR(TR.test_dtm, 'yyyy-mm-dd hh24:mi'),'yyyy-mm-dd hh24:mi') test_dte
               ,TO_CHAR(TR.test_dtm, 'TZH:TZM') test_tzr
               ,TR.test_score test_score
           FROM hr.cert_candidates CC
               ,hr.cert_test_results TR
               ,hr.cert_exam_locations TL
          WHERE TR.stu_id = CC.stu_id
            AND TR.loc_id = TL.loc_id
          ORDER BY 1,2,7;

[Output table: Raw Test Data (from HR.CERT_*). Columns: Stu ID #, Stu Home Cty, Exam Cty, Home TZR, Exam TZR, Exam Date, Exam Score. Exam time zones, dates, and scores only partially preserved; the surviving columns summarize as follows.]

    Stu ID #  Home Cty  Home TZR  Exam Cty (rows)
    8046      US        -05:00    IN (3)
    8374      US        -06:00    UK (3), SA (2)
    8619      US        -08:00    IN (4)
    8632      US        -05:00    IN (2), ML (3)
    9910      US        -05:00    IN (4)

- How could someone take these tests half a continent away … in just one day?
- Is it reasonable that someone could have taken these particularly difficult tests within just three days?
22
MATCH_RECOGNIZE: Syntax
MATCH_RECOGNIZE analyzes data within specified groups of data to look for patterns and, if patterns are detected, returns desired results:

    SQL> SELECT *
           FROM <row source>                            -- Row Source
          MATCH_RECOGNIZE (
            PARTITION BY <grouping column>
            ORDER BY <ordering_column>
            MEASURES <measure1, measure2 … ,measureN>   -- Capture Measurements
                    ,MATCH_NUMBER() AS mtc#
                    ,CLASSIFIER() AS cls$
            ONE ROW PER MATCH                           -- Rules for Handling Matches
            AFTER MATCH SKIP TO NEXT ROW
            PATTERN ( A+ B+ )                           -- Patterns and Evaluation Order
            DEFINE A AS (Boolean condition)
                  ,B AS (Boolean condition)
                  , . . .
          );
23
Identifying Suspicious Behavior: MATCH_RECOGNIZE
Applying the appropriate MATCH_RECOGNIZE logic against source row sets makes short work of finding suspicious data:

    SQL> SELECT *
           FROM (SELECT TR.stu_id stu_id
                       ,CC.stu_name stu_name
                       ,CC.stu_home_tz home_tzr
                       ,CC.stu_home_country_id home_cty
                       ,TR.exam_id exam_id
                       ,TL.loc_country test_location
                       ,TO_DATE(TO_CHAR(TR.test_dtm, 'yyyy-mm-dd hh24:mi')
                               ,'yyyy-mm-dd hh24:mi') test_dte
                       ,TO_CHAR(TR.test_dtm, 'TZH:TZM') test_tzr
                       ,TR.test_score test_score
                   FROM hr.cert_candidates CC
                       ,hr.cert_test_results TR
                       ,hr.cert_exam_locations TL
                  WHERE TR.stu_id = CC.stu_id
                    AND TR.loc_id = TL.loc_id
                  ORDER BY 1,2,7)
         ...
         . . .
         MATCH_RECOGNIZE (
           PARTITION BY stu_id
           ORDER BY test_dte
           MEASURES DTL.exam_id AS exam_id
                   ,DTL.home_cty AS home_cty
                   ,DTL.test_location AS exam_cty
                   ,DTL.home_tzr AS home_tzr
                   ,DTL.test_tzr AS exam_tzr
                   ,DTL.test_dte AS test_dte
                   ,DTL.test_score AS test_score
           ONE ROW PER MATCH
           AFTER MATCH SKIP TO NEXT ROW
           PATTERN (DTL LOCVAR+ DTEVAR*)
           DEFINE DTL AS (1=1)
                 ,LOCVAR AS (LOCVAR.home_cty <> LOCVAR.test_location)
                 ,DTEVAR AS ((DTEVAR.test_dte - PREV(DTEVAR.test_dte)) <= 1)
         );

[Output table: columns Stdnt ID #, Exam ID #, Home Cty, Exam Cty, Home Time Zone, Exam Time Zone, Exam Date, Exam Score. Exam IDs, exam time zones, dates, and scores only partially preserved; the surviving columns summarize as follows.]

    Stdnt ID #  Home Cty  Home Time Zone  Exam Cty (rows)
    8046        US        -05:00          IN (2)
    8374        US        -06:00          UK (3), SA (1)
    8619        US        -08:00          IN (3)
    8632        US        -05:00          IN (2), ML (2)
    9910        US        -05:00          IN (3)
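To show what the pattern (DTL LOCVAR+ DTEVAR*) is looking for, here is a small Python sketch that walks each student's exams in date order and applies the same three conditions. It is a simplified model, not a reimplementation of MATCH_RECOGNIZE: the function name `suspicious_matches`, the exam IDs, and the sample rows are hypothetical, and the sketch assumes greedy matching with a new attempt starting at every row (as with AFTER MATCH SKIP TO NEXT ROW).

```python
from datetime import datetime, timedelta
from itertools import groupby


def suspicious_matches(rows):
    """One output row per match of: any row, 1+ away-from-home exams, 0+ exams within a day."""
    matches = []
    rows = sorted(rows, key=lambda r: (r["stu_id"], r["test_dte"]))
    for stu_id, grp in groupby(rows, key=lambda r: r["stu_id"]):    # PARTITION BY stu_id
        part = list(grp)                                            # ORDER BY test_dte
        for start in range(len(part)):                              # DTL: always true
            j = start + 1
            # LOCVAR+: exams taken outside the student's home country.
            while j < len(part) and part[j]["home_cty"] != part[j]["test_location"]:
                j += 1
            if j == start + 1:
                continue                                            # LOCVAR+ needs at least one row
            # DTEVAR*: exams taken within one day of the previous exam.
            while (j < len(part)
                   and part[j]["test_dte"] - part[j - 1]["test_dte"] <= timedelta(days=1)):
                j += 1
            matches.append({"stu_id": stu_id,
                            "first_exam": part[start]["exam_id"],
                            "rows_matched": j - start})
    return matches


def exam(stu_id, exam_id, loc, when):
    # Helper for the hypothetical sample data; every student's home country is US.
    return {"stu_id": stu_id, "exam_id": exam_id, "home_cty": "US",
            "test_location": loc, "test_dte": datetime.strptime(when, "%Y-%m-%d %H:%M")}


exams = [
    exam(8374, "E1", "UK", "2016-03-01 09:00"),
    exam(8374, "E2", "UK", "2016-03-01 15:00"),
    exam(8374, "E3", "SA", "2016-03-02 10:00"),
    exam(8374, "E4", "US", "2016-03-03 09:00"),
    exam(1001, "E1", "US", "2016-01-10 09:00"),
    exam(1001, "E2", "US", "2016-02-10 09:00"),
]
found = suspicious_matches(exams)
```

In this sample, student 1001 only ever tests at home and produces no matches, while student 8374 produces two overlapping matches because a new attempt starts at every row.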
24
Q & A
25
Thank You For Your Attention!
If you have any questions or comments, feel free to:
- E-mail me
- Follow my blog (Generally, It Depends)
- Follow me on Twitter
- Connect with me on LinkedIn (Jim Czuprynski)