Database 12c Row Pattern Matching Beating the Best Pre-12c Solutions [CON3450] Stew ASHTON Oracle OpenWorld 2014.

1 Database 12c Row Pattern Matching Beating the Best Pre-12c Solutions [CON3450] Stew ASHTON Oracle OpenWorld 2014

2 Photo Opportunity
Presentation available on http://www.slideshare.net/stewashton/row-patternmatching12coow14
For exact link:
– See @StewAshton on Twitter
– Or see http://stewashton.wordpress.com

3 Agenda
Who am I?
Pre-12c solutions compared to row pattern matching with MATCH_RECOGNIZE
– For all sizes of data
– Thinking in patterns
Watch out for “catastrophic backtracking”
Other things to keep in mind (time permitting)
OOW CON3450, Stew Ashton

4 Who am I?
33 years in IT
– Developer, Technical Sales Engineer, Technical Architect
– Aeronautics, IBM, Finance
– Mainframe, client-server, Web apps
25 years as an American in Paris
9 years using Oracle database
– Performance analysis
– Replace Java with SQL
2 years as internal “Oracle Development Expert”

5 1) “Fixed Difference”
PAGE: 1, 2, 3, 5, 6, 7, 10, 11, 12, 42

6 1) Pre-12c
PAGE  [RN]  GRP_ID
   1     1       0
   2     2       0
   3     3       0
   5     4       1
   6     5       1
   7     6       1
  10     7       3
  11     8       3
  12     9       3
  42    10      32

FIRSTPAGE  LASTPAGE  CNT
        1         3    3
        5         7    3
       10        12    3
       42              1
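The table above suggests the pre-12c trick: subtracting a row number from the page gives a value (GRP_ID) that stays constant within a run of consecutive pages. As a rough illustration only, here is a minimal Python sketch of that idea; the function name is hypothetical and the sketch assumes the pages are distinct.

```python
from itertools import groupby


def fixed_difference_groups(pages):
    """Group consecutive pages using the 'page - row number' trick.

    Assumes distinct page numbers (an assumption of this sketch).
    """
    ordered = sorted(pages)
    # rn plays the role of ROW_NUMBER() OVER (ORDER BY page)
    keyed = [(page - rn, page) for rn, page in enumerate(ordered, start=1)]
    result = []
    # Rows sharing the same difference form one group (the GROUP BY step)
    for _, rows in groupby(keyed, key=lambda pair: pair[0]):
        members = [page for _, page in rows]
        result.append((members[0], members[-1], len(members)))
    return result


print(fixed_difference_groups([1, 2, 3, 5, 6, 7, 10, 11, 12, 42]))
```

Each tuple is (first page, last page, count); a single-page group repeats the page as both first and last.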

7 Think “match a row pattern”
PATTERN
– Uninterrupted series of input rows
– Described as a list of conditions (“regular expressions”)
PATTERN (A B*)
– "A" : 1 row, "B" : 0 or more rows, as many as possible
DEFINE each row condition
– [A undefined = TRUE]
– B AS page = PREV(page)+1
Each series that matches the pattern is a “match”
– "A" and "B" identify the rows that meet their conditions

8 Input, Processing, Output
1. Define input: SELECT * FROM t MATCH_RECOGNIZE ( … )
2. Order input: ORDER BY page
3. Process pattern: PATTERN (A B*)
4. using defined conditions: DEFINE B AS page = PREV(page)+1
5. Output: rows per match: ONE ROW PER MATCH
6. Output: columns per row: MEASURES A.page firstpage, LAST(page) lastpage, COUNT(*) cnt
7. Go where after match? AFTER MATCH SKIP PAST LAST ROW

The same clauses, in the order the syntax requires:

SELECT * FROM t
MATCH_RECOGNIZE (
  ORDER BY page
  MEASURES A.page firstpage,
           LAST(page) lastpage,
           COUNT(*) cnt
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A B*)
  DEFINE B AS page = PREV(page)+1
);
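To make the processing steps concrete, the following Python sketch mimics what PATTERN (A B*) with B AS page = PREV(page)+1 produces for ONE ROW PER MATCH. It is only an approximation of the semantics, not how the database implements it, and the function name is invented for this example.

```python
def match_a_bstar(pages):
    """Emulate PATTERN (A B*) where B means 'page = PREV(page) + 1'.

    Returns one (firstpage, lastpage, cnt) tuple per match.
    """
    matches = []
    for page in sorted(pages):  # ORDER BY page
        if matches and page == matches[-1][1] + 1:
            # Row satisfies B: extend the current match
            first, _, cnt = matches[-1]
            matches[-1] = (first, page, cnt + 1)
        else:
            # Row fails B: it becomes the A row of a new match
            matches.append((page, page, 1))
    return matches


print(match_a_bstar([1, 2, 3, 5, 6, 7, 10, 11, 12, 42]))
```

Because A has no condition, every row that cannot extend the previous match simply starts a new one, which is why no row is left unmatched here.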

9 1) Run_Stats comparison
For one million rows:
“Latches” are serialization devices: fewer means more scalable

Stat                      Pre 12c  Match_R  Pct
Latches                      4090     4079  100%
Elapsed Time                 5.51     5.56  101%
CPU used by this session     5.5      5.55  101%

10 1) Execution Plans

Pre-12c:
Id Operation Name Starts E-Rows A-Rows A-Time Buffers OMem 1Mem Used-Mem
0 SELECT STATEMENT 1 400K 00:00:01.83 1594
1 HASH GROUP BY 1 1000K 400K 00:00:01.83 1594 41M 5035K 40M (0)
2 VIEW 1 1000K 00:00:12.69 1594
3 WINDOW SORT 1 1000K 00:00:03.46 1594 22M 1749K 20M (0)
4 TABLE ACCESS FULL T 1 1000K 00:00:02.53 1594

MATCH_RECOGNIZE:
Id Operation Name Starts E-Rows A-Rows A-Time Buffers OMem 1Mem Used-Mem
0 SELECT STATEMENT 1 400K 00:00:03.45 1594
1 VIEW 1 1000K 400K 00:00:03.45 1594
2 MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO 1 1000K 400K 00:00:01.87 1594 22M 1749K 20M (0)
3 TABLE ACCESS FULL T 1 1000K 00:00:02.09 1594

Summary, pre-12c:
Operation            Used-Mem
SELECT STATEMENT
HASH GROUP BY        40M (0)
VIEW
WINDOW SORT          20M (0)
TABLE ACCESS FULL

Summary, MATCH_RECOGNIZE:
Operation                                        Used-Mem
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO   20M (0)
TABLE ACCESS FULL

11 2) “Start of Group”
Identify group boundaries, often using LAG()
3 steps instead of 2:
1. For each row: if start of group, assign 1, else assign 0
2. Running total of 1s and 0s produces a group identifier
3. Group by the group identifier

12 2) Requirement
Merge contiguous date ranges in same group

GROUP_NAME  EFF_DATE          TERM_DATE
X           2014-01-01 00:00  2014-02-01 00:00
X           2014-03-01 00:00  2014-04-01 00:00
X                             2014-05-01 00:00
X           2014-06-01 00:00  2014-06-01 01:00
X                             2014-06-01 02:00
X                             2014-06-01 03:00
Y                             2014-06-01 04:00
Y                             2014-06-01 05:00
Y           2014-07-03 08:00  2014-09-29 17:00

13
Input with grp_start and grp_id:
GROUP_NAME  START_TS     END_TS       GRP_START  GRP_ID
X           01-01 00:00  02-01 00:00          1       1
X           03-01 00:00  04-01 00:00          1       2
X                        05-01 00:00          0       2
X           06-01 00:00  06-01 01:00          1       3
X                        06-01 02:00          0       3
X                        06-01 03:00          0       3
Y                        06-01 04:00          1       1
Y                        06-01 05:00          0       1
Y           07-03 08:00  09-29 17:00          1       2

Output:
GROUP_NAME  START_TS     END_TS
X           01-01 00:00  02-01 00:00
X           03-01 00:00  05-01 00:00
X           06-01 00:00  06-01 03:00
Y                        06-01 05:00
Y           07-03 08:00  09-29 17:00

with grp_starts as (
  select a.*,
    case when start_ts = lag(end_ts) over(
      partition by group_name order by start_ts
    ) then 0 else 1 end grp_start
  from t a
), grps as (
  select b.*,
    sum(grp_start) over(
      partition by group_name order by start_ts
    ) grp_id
  from grp_starts b
)
select group_name,
  min(start_ts) start_ts,
  max(end_ts) end_ts
from grps
group by group_name, grp_id;
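For readers who find the three steps easier to follow outside SQL, here is a hedged Python sketch of the same "start of group" logic. The function name and the sample rows (integer timestamps) are invented for illustration; they are not the data from the slides.

```python
def merge_contiguous(rows):
    """rows: iterable of (group_name, start_ts, end_ts) tuples."""
    # Sorting by (group_name, start_ts) stands in for PARTITION BY / ORDER BY
    ordered = sorted(rows)

    # Step 1: flag each row, 0 if it continues the previous range, else 1
    flags = []
    for i, (name, start, _end) in enumerate(ordered):
        prev = ordered[i - 1] if i > 0 else None
        contiguous = prev is not None and prev[0] == name and prev[2] == start
        flags.append(0 if contiguous else 1)

    # Step 2: running total of the flags per group_name gives grp_id
    running = {}
    grp_ids = []
    for (name, _start, _end), flag in zip(ordered, flags):
        running[name] = running.get(name, 0) + flag
        grp_ids.append(running[name])

    # Step 3: group by (group_name, grp_id), keeping min start and max end
    merged = {}
    for (name, start, end), grp_id in zip(ordered, grp_ids):
        key = (name, grp_id)
        if key in merged:
            old_start, old_end = merged[key]
            merged[key] = (min(old_start, start), max(old_end, end))
        else:
            merged[key] = (start, end)
    return [(name, start, end) for (name, _), (start, end) in merged.items()]


# Hypothetical sample: timestamps are plain integers for brevity
sample = [("X", 0, 1), ("X", 1, 2), ("X", 5, 6), ("Y", 6, 7), ("Y", 7, 9)]
print(merge_contiguous(sample))
```

Note that the first Y row starts exactly where the last X row ends, yet it still opens a new group because the comparison never crosses group names.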

14 2) Match_Recognize
SELECT * FROM t
MATCH_RECOGNIZE(
  PARTITION BY group_name
  ORDER BY start_ts
  MEASURES A.start_ts start_ts,
           end_ts end_ts,
           next(start_ts) - end_ts gap
  PATTERN(A B*)
  DEFINE B AS start_ts = prev(end_ts)
);
New this time:
– Added PARTITION BY
– MEASURES added gap using row outside the match!
– ONE ROW PER MATCH and SKIP PAST LAST ROW are the defaults
One solution replaces two methods: simple!

15 Which row do we mean?

Expression          DEFINE                       MEASURES, ALL ROWS…          MEASURES, ONE ROW…
start_ts            current row                  current row                  last row of match
FIRST(start_ts)     first row of match           first row of match           first row of match
LAST(end_ts)        current row                  current row                  last row of match
FINAL LAST(end_ts)  ORA-62509                    last row of match            last row of match
B.start_ts          most recent B row            most recent B row            last B row
PREV(), NEXT()      physical offset from referenced row (all three cases)
COUNT(*)            from first to current row    from first to current row    all rows in match
COUNT(B.*)          B rows including current row B rows including current row all B rows

16 2) Run_Stats comparison
For 500,000 rows:

Stat                      Pre 12c  Match_R  Pct
Latches                     10165     8066  79%
Elapsed Time                32.16    20.58  64%
CPU used by this session    31.94    19.67  62%

17 2) Execution Plans

Pre-12c:
Operation            Used-Mem
SELECT STATEMENT
HASH GROUP BY        20M (0)
VIEW
WINDOW BUFFER        32M (0)
VIEW
WINDOW SORT          27M (0)
TABLE ACCESS FULL

MATCH_RECOGNIZE:
Operation                                        Used-Mem
SELECT STATEMENT
VIEW
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO   27M (0)
TABLE ACCESS FULL

18 2) Predicate pushing
Select * from where group_name = 'X'

Operation                                        Name  A-Rows  Buffers
SELECT STATEMENT                                            3        4
VIEW                                                        3        4
MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO              3        4
TABLE ACCESS BY INDEX ROWID BATCHED              T          6        4
INDEX RANGE SCAN                                 TI         6        3

19 3) “Bin fitting”: fixed size
Requirement
– Order by study_site
– Put in “bins” with size = 65,000 max

STUDY_SITE    CNT
1001         3407
1002         4323
1004         1623
1008         1991
1011          885
1012        11597
1014         1989
1015         5282
1017         2841
1018         5183
1020         6176
1022         2784
1023        25865
1024         3734
1026          137
1028         6005
1029           76
1031         4599
1032         1989
1034         3427
1036          879
1038         6485
1039            3
1040         1105
1041         6460
1042          968
1044          471
1045         3360

FIRST_SITE  LAST_SITE  SUM_CNT
1001        1022         48081
1023        1044         62203
1045                      3360

20
SELECT s first_site, MAX(e) last_site, MAX(sm) sum_cnt
FROM (
  SELECT s, e, cnt, sm FROM t
  MODEL
    DIMENSION BY (row_number() over(order by study_site) rn)
    MEASURES (study_site s, study_site e, cnt, cnt sm)
    RULES (
      sm[rn > 1] = CASE
        WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000
        THEN cnt[cv()]
        ELSE sm[cv() - 1] + cnt[cv()]
      END,
      s[rn > 1] = CASE
        WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000
        THEN s[cv()]
        ELSE s[cv() - 1]
      END
    )
)
GROUP BY s;
– DIMENSION with row_number orders data and processing
– rn can be used like a subscript
– cv() means current row
– cv()-1 means previous row

21
SELECT * FROM t
MATCH_RECOGNIZE (
  ORDER BY study_site
  MEASURES FIRST(study_site) first_site,
           LAST(study_site) last_site,
           SUM(cnt) sum_cnt
  PATTERN (A+)
  DEFINE A AS SUM(cnt) <= 65000
);
New this time:
– PATTERN (A+) replaces (A B*): “+” means 1 or more rows
– Why? In previous examples I used PREV(), which returns NULL on the first row.
One solution replaces 3 methods: simpler!
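The running-sum condition can be checked by hand with a small Python sketch, shown below using the study_site data from slide 19. The function name is hypothetical, and the sketch assumes no single cnt exceeds the limit; under MATCH_RECOGNIZE such a row would fail condition A and stay unmatched, whereas this sketch would simply give it a bin of its own.

```python
def fill_bins(rows, max_size=65000):
    """Greedy fixed-size bins over rows of (study_site, cnt), in site order.

    Assumes every cnt is <= max_size (an assumption of this sketch).
    """
    bins = []
    for site, cnt in sorted(rows):  # ORDER BY study_site
        if bins and bins[-1][2] + cnt <= max_size:
            # Running SUM(cnt) stays within the limit: extend the match
            first, _, total = bins[-1]
            bins[-1] = (first, site, total + cnt)
        else:
            # Limit would be exceeded: this row starts a new bin
            bins.append((site, site, cnt))
    return bins


SITES = [
    (1001, 3407), (1002, 4323), (1004, 1623), (1008, 1991), (1011, 885),
    (1012, 11597), (1014, 1989), (1015, 5282), (1017, 2841), (1018, 5183),
    (1020, 6176), (1022, 2784), (1023, 25865), (1024, 3734), (1026, 137),
    (1028, 6005), (1029, 76), (1031, 4599), (1032, 1989), (1034, 3427),
    (1036, 879), (1038, 6485), (1039, 3), (1040, 1105), (1041, 6460),
    (1042, 968), (1044, 471), (1045, 3360),
]
for first_site, last_site, sum_cnt in fill_bins(SITES):
    print(first_site, last_site, sum_cnt)
```

The three bins end at sites 1022, 1044 and 1045, with totals 48081, 62203 and 3360, matching the result table on slide 19.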

22 3) Run_Stats comparison
For one million rows:

Stat                      Pre 12c  Match_R  Pct
Latches                    357448     4622   1%
Elapsed Time                32.85     2.9    9%
CPU used by this session    31.31     2.88   9%

23 3) Execution Plans

Pre-12c:
Id  Operation            Used-Mem
 0  SELECT STATEMENT
 1  HASH GROUP BY        7534K (0)
 2  VIEW
 3  SQL MODEL ORDERED    105M (0)
 4  WINDOW SORT          27M (0)
 5  TABLE ACCESS FULL

MATCH_RECOGNIZE:
Id  Operation                                        Used-Mem
 0  SELECT STATEMENT
 1  VIEW
 2  MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO   27M (0)
 3  TABLE ACCESS FULL

24 4) “Bin fitting”: fixed number
Requirement
– Distribute values in 3 “bins” as equally as possible
“Best fit decreasing”
– Sort values in decreasing order
– Put each value in least full bin

Name Val BIN1 BIN2 BIN3
1110 229 9 338 98 447 915 5561015 665 7741915 883191815 992191817 10 11918

25 4) Brilliant pre 12c solution
SELECT bin, Max (bin_value) bin_value
FROM (
  SELECT * FROM items
  MODEL
    DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn)
    MEASURES (
      item_name,
      item_value,
      Row_Number() OVER (ORDER BY item_value DESC) bin,
      item_value bin_value,
      Row_Number() OVER (ORDER BY item_value DESC) rn_m,
      0 min_bin,
      Count(*) OVER () - 3 - 1 n_iters
    )
    RULES ITERATE(100000) UNTIL (ITERATION_NUMBER >= n_iters[1]) (
      min_bin[1] = Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn <= 3],
      bin[ITERATION_NUMBER + 3 + 1] = min_bin[1],
      bin_value[min_bin[1]] = bin_value[CV()] + Nvl(item_value[ITERATION_NUMBER+4], 0)
    )
)
WHERE item_name IS NOT NULL
group by bin;

26
SELECT * from items
MATCH_RECOGNIZE (
  ORDER BY item_value desc
  MEASURES sum(bin1.item_value) bin1,
           sum(bin2.item_value) bin2,
           sum(bin3.item_value) bin3
  PATTERN ((bin1|bin2|bin3)+)
  DEFINE
    bin1 AS count(bin1.*) = 1
      OR sum(bin1.item_value)-bin1.item_value
         <= least( sum(bin2.item_value), sum(bin3.item_value) ),
    bin2 AS count(bin2.*) = 1
      OR sum(bin2.item_value)-bin2.item_value
         <= sum(bin3.item_value)
);
– ()+ = 1 or more of whatever is inside
– '|' = alternatives, “preferred in the order specified”
– Bin1 condition: no rows here yet, or this bin least full
– Bin2 condition: no rows here yet, or this bin less full than 3
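The "best fit decreasing" rule that both queries implement is short enough to sketch in Python. This is an illustration under assumptions: the function name is made up, the input (the integers 1 to 10) is a hypothetical example, and ties go to the lowest-numbered bin, which mirrors the "preferred in the order specified" behaviour of the alternation.

```python
def best_fit_decreasing(values, n_bins=3):
    """Distribute values into n_bins, always adding to the least full bin."""
    totals = [0] * n_bins
    for value in sorted(values, reverse=True):  # ORDER BY item_value DESC
        # index() returns the first minimum, so ties favour bin1, then bin2
        target = totals.index(min(totals))
        totals[target] += value
    return totals


# Hypothetical input: the integers 1 to 10
print(best_fit_decreasing(range(1, 11)))
```

With this input the bin totals end up as 19, 18 and 18.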

27 4) Run_Stats comparison
For 10,000 rows:

Stat                      Pre 12c  Match_R  Pct
Latches                      3124       47   2%
Elapsed Time                28        0.02   0%
CPU used by this session    26.39     0.03   0%

28 4) Execution Plans

Pre-12c:
Id  Operation            Used-Mem
 0  SELECT STATEMENT
 1  HASH GROUP BY        817K (0)
 2  VIEW
 3  SQL MODEL ORDERED    1846K (0)
 4  WINDOW SORT          424K (0)
 5  TABLE ACCESS FULL

MATCH_RECOGNIZE:
Id  Operation                Used-Mem
 0  SELECT STATEMENT
 1  VIEW
 2  MATCH RECOGNIZE SORT     330K (0)
 3  TABLE ACCESS FULL

29 Backtracking
What happens when there is no match???
“Greedy” quantifiers - * + {2,} – are not that greedy
– Take all the rows they can, BUT give rows back if necessary – one at a time
Regular expression engines will test all possible combinations to find a match

30 Repeating conditions
select 'match' from (
  select level n from dual
  connect by level <= 100
)
match_recognize(
  pattern(a b* c)
  define b as n > prev(n),
         c as n = 0
);
Runs in 0.005 secs

select 'match' from (
  select level n from dual
  connect by level <= 100
)
match_recognize(
  pattern(a b* b* b* c)
  define b as n > prev(n),
         c as n = 0
);
Runs in 5.4 secs
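Why the repeated b* hurts can be seen with a toy backtracking matcher. The Python sketch below is not Oracle's algorithm; it is a deliberately naive greedy matcher, written for this illustration, that only counts how many times a row condition is evaluated when trying to match from the first row. All names in it are hypothetical.

```python
def count_tests(pattern, rows, conditions):
    """Try to match `pattern` starting at row 0; return (matched, tests).

    pattern: list of (variable, is_star); is_star means '0 or more rows'.
    """
    tests = 0

    def match(p, r):
        nonlocal tests
        if p == len(pattern):
            return True
        name, is_star = pattern[p]
        ok = False
        if r < len(rows):
            tests += 1  # one evaluation of a DEFINE condition
            ok = conditions[name](rows, r)
        if is_star:
            # Greedy: take the row and stay on this element if possible,
            # otherwise give the row back and try the next element
            return (ok and match(p, r + 1)) or match(p + 1, r)
        return ok and match(p + 1, r + 1)

    matched = match(0, 0)
    return matched, tests


CONDITIONS = {
    "a": lambda rows, r: True,                            # undefined = TRUE
    "b": lambda rows, r: r > 0 and rows[r] > rows[r - 1],  # n > prev(n)
    "c": lambda rows, r: rows[r] == 0,                     # n = 0, never true
}
SIMPLE = [("a", False), ("b", True), ("c", False)]
TRIPLE = [("a", False), ("b", True), ("b", True), ("b", True), ("c", False)]

rows = list(range(1, 101))
print(count_tests(SIMPLE, rows, CONDITIONS))
print(count_tests(TRIPLE, rows, CONDITIONS))
```

On 100 rows this toy matcher evaluates 199 conditions for a b* c but 338,350 for a b* b* b* c, since every way of splitting the rows among the three b* elements is tried before giving up. The real engine also retries from later starting rows, so the absolute numbers differ, but the growth pattern is the point.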

31 Imprecise Conditions
CREATE TABLE Ticker (
  SYMBOL VARCHAR2(10),
  tstamp DATE,
  price NUMBER
);
insert into ticker
select 'ACME', sysdate + level/24/60/60, 10000-level
from dual
connect by level <= 5000;

SELECT * FROM Ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES FIRST(tstamp) AS start_tstamp,
           LAST(tstamp) AS end_tstamp
  AFTER MATCH SKIP TO LAST UP
  PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
  DEFINE DOWN AS price < PREV(price),
         UP AS price > PREV(price),
         STRT AS price >= nvl(PREV(PRICE),0)
);
Runs in 0.02 seconds

SELECT * FROM Ticker
MATCH_RECOGNIZE (
  PARTITION BY symbol
  ORDER BY tstamp
  MEASURES FIRST(tstamp) AS start_tstamp,
           LAST(tstamp) AS end_tstamp
  AFTER MATCH SKIP TO LAST UP
  PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
  DEFINE DOWN AS price < PREV(price),
         UP AS price > PREV(price)
);
Runs in 24 seconds
INMEMORY: 13 seconds

32 Keep in Mind
Backtracking
– Precise conditions
– Test data with no matches
To debug:
– Measures classifier() cl, match_number() mn
– All rows per match with unmatched rows
No DISTINCT, no LISTAGG
MEASURES columns must have aliases
“Reluctant quantifier” = ? = JDBC bind variable
“Pattern variables” are range variables, not bind variables

33 Output Row “shape”

Per Match  PARTITION BY  ORDER BY  MEASURES  Other input
ONE ROW    X             Omitted   X         Omitted
ALL ROWS   X             X         X         X

ORA-00918, anyone?

34 Questions?
More details at: stewashton.wordpress.com

