ISO/IEC JTC1/SC32 WG3:URC-nnn ANSI NCITS H nnn


Pattern Matching in Sequences of Rows
March 2, 2007
Change Proposal (for SQL standards)
ISO/IEC JTC1/SC32 WG3:URC-nnn, ANSI NCITS H2-2006-nnn
Authors: Fred Zemke (Oracle), Andrew Witkowski (Oracle), Mitch Cherniak (Streambase), Latha Colby (IBM)
CS240B Notes by: Carlo Zaniolo, Computer Science Department, UCLA

Match_Recognize
Inspired by SQL-TS, but more verbose and with more options. For instance:
  *         0 or more matches
  +         1 or more matches
  ?         0 or 1 match
  { n }     exactly n matches
  { n, m }  between n and m (inclusive) matches
  |         alternation, indicated by a vertical bar
More ...
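The quantifiers above behave like the familiar regular-expression quantifiers. As a rough illustration only, the following Python sketch labels each row with a single letter and then applies the standard `re` module to the label string. The labelling function `classify` and the letters D, U and S are hypothetical names introduced for this sketch; in MATCH_RECOGNIZE itself, the variable a row matches is decided by the DEFINE conditions during matching, not by a fixed pre-labelling.

```python
import re

def classify(prices):
    """Label each price after the first: D = down, U = up, S = same (hypothetical labels)."""
    labels = []
    for prev, cur in zip(prices, prices[1:]):
        labels.append("D" if cur < prev else "U" if cur > prev else "S")
    return "".join(labels)

labels = classify([10, 9, 8, 8, 9, 11])

# D+ : one or more falls, S? : an optional flat step, U{2} : exactly two rises
whole = re.fullmatch(r"D+S?U{2}", labels)

# Alternation with |: the series starts with a run of falls or a run of rises
first_run = re.match(r"D+|U+", labels).group(0)
```

Here `labels` is the string of per-row labels, `whole` tells whether the entire series fits the pattern, and `first_run` is the leading run picked by the alternation.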

Example
Let Ticker (Symbol, Tstamp, Price) be a table with three columns representing historical stock prices. Symbol is a character column, Tstamp is a timestamp column (for simplicity shown as increasing integers) and Price is a numeric column. We want to partition the data by Symbol, sort it into increasing Tstamp order, and then detect the following pattern in Price: a falling price, followed by a rise in price that goes higher than the price was when the fall began. For each such pattern, we want to report the starting time, starting price, inflection time (the last time during the decline phase), low price, end time, and end price.

Example
SELECT a_symbol,
       a_tstamp,       /* start time */
       a_price,        /* start price */
       max_c_tstamp,   /* inflection time */
       last_c_price,   /* low price */
       max_f_tstamp,   /* end time */
       last_f_price,   /* end price */
       matchno
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY Symbol
  ORDER BY Tstamp
  MEASURES A.Symbol AS a_symbol,
           A.Tstamp AS a_tstamp,
           A.Price AS a_price,
           MAX (C.Tstamp) AS max_c_tstamp,
           LAST (C.Price) AS last_c_price,
           MAX (F.Tstamp) AS max_f_tstamp,
           LAST (F.Price) AS last_f_price,
           MATCH_NUMBER AS matchno
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
  PATTERN (A B C* D E* F+)
  DEFINE /* A defaults to True, matches any row */
    B AS (B.Price < PREV(B.Price)),
    C AS (C.Price <= PREV(C.Price)),
    D AS (D.Price > PREV(D.Price)),
    E AS (E.Price >= PREV(E.Price)),
    F AS (F.Price >= PREV(F.Price) AND F.Price > A.Price)
)
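To make the intended result of this example concrete, here is a minimal procedural sketch in Python of the same search for a single symbol: a strict fall, further non-increasing rows, a strict rise, and a non-decreasing run that ends above the starting price. The function name `find_v_shapes`, the row layout and the output keys are assumptions made for this sketch. It mimics ONE ROW PER MATCH with AFTER MATCH SKIP PAST LAST ROW and is not how a DBMS would evaluate the clause.

```python
def find_v_shapes(rows):
    """rows: list of (tstamp, price) for one symbol, sorted by tstamp."""
    matches, i, n = [], 0, len(rows)
    while i < n:
        j = i + 1
        # B C*: one strict fall, then any number of non-increasing rows
        if j < n and rows[j][1] < rows[j - 1][1]:
            j += 1
            while j < n and rows[j][1] <= rows[j - 1][1]:
                j += 1
            low = j - 1  # last row of the decline phase
            # D E* F+: row j (if present) is a strict rise; extend over non-decreasing rows
            k = j
            while k + 1 < n and rows[k + 1][1] >= rows[k][1]:
                k += 1
            # F+ needs at least one row after D, and the run must end above the start price
            if j < n and k > j and rows[k][1] > rows[i][1]:
                matches.append({
                    "start_time": rows[i][0], "start_price": rows[i][1],
                    "inflection_time": rows[low][0], "low_price": rows[low][1],
                    "end_time": rows[k][0], "end_price": rows[k][1],
                })
                i = k + 1  # AFTER MATCH SKIP PAST LAST ROW
                continue
        i += 1
    return matches

ticker = [(1, 10), (2, 9), (3, 8), (4, 9), (5, 11), (6, 12), (7, 7)]
v_shapes = find_v_shapes(ticker)
```

Because the rising run is non-decreasing, checking only its last row against the starting price is enough: if that row does not exceed the starting price, no earlier row of the run can.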

Measures: Naming and renaming
Each item in the MEASURES clause names an expression computed over a match, and the outer SELECT refers to the result by that name. For example, A.Tstamp AS a_tstamp and MAX (C.Tstamp) AS max_c_tstamp are selected as a_tstamp (start time) and max_c_tstamp (inflection time).

Define the pattern and the conditions that must be satisfied in each state of the pattern. There is no condition on A.
  PATTERN (A B C* D E* F+)
  DEFINE /* A defaults to True, matches any row */
    B AS (B.Price < PREV(B.Price)),
    C AS (C.Price <= PREV(C.Price)),
    D AS (D.Price > PREV(D.Price)),
    E AS (E.Price >= PREV(E.Price)),
    F AS (F.Price >= PREV(F.Price) AND F.Price > A.Price)

The output and matching options used in the example, with their alternatives:
  { ONE ROW | ALL ROWS } PER MATCH
  { MAXIMAL | INCREMENTAL } MATCH
  AFTER MATCH SKIP { TO NEXT ROW | PAST LAST ROW | TO LAST <variable> | TO FIRST <variable> }
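The effect of the AFTER MATCH SKIP choice is easiest to see on a toy input. The Python sketch below is an illustration under stated assumptions: rows are pre-labelled with one letter each (U for a rise, D for a fall, a hypothetical labelling), the pattern is an ordinary regular expression, and the helper `find_matches` is a name introduced here. It compares resuming after the last row of a match with resuming at the row after the first row of a match.

```python
import re

def find_matches(pattern, labels, skip):
    """Return (start, end) row ranges, end exclusive, for the given skip option."""
    regex = re.compile(pattern)
    found, pos = [], 0
    while pos < len(labels):
        m = regex.match(labels, pos)
        if m is None or m.end() == m.start():
            pos += 1  # no (non-empty) match starts at this row
            continue
        found.append((m.start(), m.end()))
        # PAST LAST ROW: resume after the match; TO NEXT ROW: resume one row after its start
        pos = m.end() if skip == "PAST LAST ROW" else m.start() + 1
    return found

labels = "UUUDUU"
past_last = find_matches(r"U+", labels, "PAST LAST ROW")
to_next = find_matches(r"U+", labels, "TO NEXT ROW")
```

With SKIP PAST LAST ROW the matches never overlap, while SKIP TO NEXT ROW also reports matches that start inside an earlier match.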

ALL ROWS PER MATCH: one row for each row in the pattern.
SELECT T.Symbol,         /* row's symbol */
       T.Tstamp,         /* row's time */
       T.Price,          /* row's price */
       T.classy,         /* row's classifier */
       T.a_tstamp,       /* start time */
       T.a_price,        /* start price */
       T.max_c_tstamp,   /* inflection time */
       T.last_c_price,   /* low price */
       T.max_f_tstamp,   /* end time */
       T.last_f_price    /* end price */
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY Symbol
  ORDER BY Tstamp
  MEASURES A.Symbol AS a_symbol,
           A.Tstamp AS a_tstamp,
           A.Price AS a_price,
           MAX (C.Tstamp) AS max_c_tstamp,
           LAST (C.Price) AS last_c_price,
           MAX (F.Tstamp) AS max_f_tstamp,
           LAST (F.Price) AS last_f_price,
           MATCH_NUMBER AS matchno,
           CLASSIFIER AS classy
  ALL ROWS PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
  PATTERN (A B C* D E* F+)
  DEFINE /* A defaults to True, matches any row */
    B AS (B.Price < PREV(B.Price)),
    C AS (C.Price <= PREV(C.Price)),
    D AS (D.Price > PREV(D.Price)),
    E AS (E.Price >= PREV(E.Price)),
    F AS (F.Price >= PREV(F.Price) AND F.Price > A.Price)
) T
In addition to the partitioning, ordering and measure columns, we can reference other columns (via T).
CLASSIFIER is a component that may be used to declare a character result column whose content on each row is the name of the variable that the row matched.

Syntactic Sugar
  Variables can be repeated in the pattern clause.
  SUBSET: to rename a set of variables.
  A portion of the pattern can be excluded (when returning all rows).
  A special construct defines alternations obtained as permutations of variables.

Singletons and group variables
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY Symbol
  ORDER BY Tstamp
  MEASURES FIRST(A.Tstamp) a_firsttime,
           LAST(D.Tstamp) d_lasttime,
           AVG(B.Price) b_avgprice,
           AVG(D.Price) d_avgprice
  PATTERN ( A B+ C+ D )
  DEFINE A AS A.Price > 100,
         B AS B.Price > A.Price,
         C AS C.Price < AVG (B.Price),
         D AS D.Price > PREV(D.Price)
)
If a variable is a singleton, then only individual columns may be referenced, not aggregates.
If the variable is used in an aggregate, then the aggregate is performed over all rows that have matched the variable so far.
If desired, we can construe this as providing running aggregates with no special syntax when a variable is referenced in an aggregate in its own definition, or we can continue to require special syntax to highlight that a running aggregate is meant.

More
With ALL ROWS PER MATCH only: CLASSIFIER is used to specify the name of a character string column, called the classifier column. In each row of output, the classifier column is set to the name of the PATTERN variable that the row matched.
MATCH_NUMBER: matches within a partition are numbered sequentially, starting with 1, in the order they are chosen in the previous section. The MATCH_NUMBER component is used to specify a column name for an extra column of output from the MATCH_RECOGNIZE construct. The extra column is an exact numeric with scale 0, and provides the match number within a partition: 1 for the first match, 2 for the second, etc.
FIRST and LAST: special aggregates for group variables.

Windows
SELECT sum_yprice OVER W, x_time OVER W, AVG(Y.Price)
FROM T
WINDOW W AS (
  PARTITION BY ..
  ORDER BY ..
  MEASURES SUM(Y.price) AS sum_yprice,
           X.time AS x_time
  PATTERN (X Y+ Z)
  ...
)

Some Queries
Task 1.1. Assume that you have the following temporal table:
  emp(Eno, Project, Tstart, Tend)
denoting periods of time during which an employee has worked on a project. The closed intervals denoting these periods could overlap, and thus you need to coalesce them into maximal periods. Suggestion: sort all events in a sequence, then use SQL-MR to do the actual coalescing and reconstruct the original table with the intervals coalesced.
Task 1.2. Sensors have detected the locations of objects at certain times:
  items(itemNo, SensorNo, Time)
Write an SQL-MR query to detect objects that are going around in a cycle, i.e., they have returned to the same location within one day. Many objects do not move fast, so a sensor might produce consecutive readings of the same object even if the object is not in a cycle.
Task 1.3. Same as 1.2, but in SQL-TS.

Coalescing
emp(Eno, Project, Tstart, Tend): several overlapping intervals for each employee and project.
SELECT c_Eno, c_Project, first_Tstart, max_Tend
FROM emp MATCH_RECOGNIZE (
  PARTITION BY Eno, Project
  ORDER BY Tstart
  MEASURES Z.Eno AS c_Eno,
           Z.Project AS c_Project,
           Z.Tstart AS c_Tstart,
           FIRST (Z.Tstart) AS first_Tstart,
           MAX (Z.Tend) AS max_Tend
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
  PATTERN (Z+)
  DEFINE Z AS (c_Tstart <= max_Tend)
)
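The intent of the coalescing query can be checked against a small procedural sketch. The Python below assumes the reading that a row joins the current match (Z+) when its Tstart is no later than the largest Tend seen so far in that match; the function name `coalesce` and the tuple layout are assumptions made for this illustration.

```python
from itertools import groupby

def coalesce(emp):
    """emp: iterable of (eno, project, tstart, tend) rows with closed intervals."""
    out = []
    rows = sorted(emp)  # orders by eno, project, tstart (then tend)
    for (eno, project), group in groupby(rows, key=lambda r: (r[0], r[1])):
        first_start = max_end = None
        for _, _, tstart, tend in group:
            if max_end is not None and tstart <= max_end:
                max_end = max(max_end, tend)  # row extends the current period
            else:
                if max_end is not None:
                    out.append((eno, project, first_start, max_end))
                first_start, max_end = tstart, tend  # start a new period
        out.append((eno, project, first_start, max_end))
    return out

emp = [
    (1, "p", 1, 5), (1, "p", 3, 8), (1, "p", 10, 12),
    (1, "q", 2, 4),
    (2, "p", 1, 2), (2, "p", 2, 6),
]
coalesced = coalesce(emp)
```

Since the intervals are closed, two intervals that merely touch (one ends at 2, the next starts at 2) are merged, while intervals separated by a gap are kept apart.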

Cycles (Task 1.2)
Sensors: items(itemNo, SensorNo, Time)
SELECT T.itemNo, T.SensorNo, T.Time
FROM items MATCH_RECOGNIZE (
  PARTITION BY ItemNo
  ORDER BY Time
  MEASURES A.SensorNo AS A_SensorNo,
           Z.SensorNo AS Z_SensorNo,
           B.SensorNo AS B_SensorNo
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (A+ Z+ B)
  DEFINE Z AS (Z_SensorNo <> A_SensorNo),
         B AS (B_SensorNo = A_SensorNo)
) AS T
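As a cross-check on what the cycle query is meant to find, here is a hedged Python sketch of Task 1.2: an item is read at a sensor, then at one or more other sensors, then again at the first sensor within one day. The function name `find_cycles`, the choice of seconds as the time unit and the output tuple layout are assumptions for this illustration. Unlike the query as written on the slide, the sketch also applies the one-day limit and reports every return rather than skipping past each match.

```python
from collections import defaultdict

ONE_DAY = 24 * 3600  # assumes Time is measured in seconds

def find_cycles(readings, max_gap=ONE_DAY):
    """readings: iterable of (item_no, sensor_no, time). Returns (item, sensor, left_at, back_at)."""
    by_item = defaultdict(list)
    for item, sensor, time in sorted(readings, key=lambda r: (r[0], r[2])):
        by_item[item].append((sensor, time))
    cycles = []
    for item, seq in by_item.items():
        last_seen = {}      # sensor -> last time the item was read there
        prev_sensor = None
        for sensor, time in seq:
            # consecutive readings at the same sensor do not count as a return
            moved = prev_sensor is not None and sensor != prev_sensor
            if moved and sensor in last_seen and time - last_seen[sensor] <= max_gap:
                cycles.append((item, sensor, last_seen[sensor], time))
            last_seen[sensor] = time
            prev_sensor = sensor
    return cycles

readings = [
    ("x", 1, 0), ("x", 1, 100), ("x", 2, 200), ("x", 1, 300),  # returns to sensor 1
    ("y", 1, 0), ("y", 1, 50),                                 # never moves
    ("z", 1, 0), ("z", 2, 10), ("z", 1, 100000),               # returns too late
]
cycles = find_cycles(readings)
```

Tracking the previous sensor is what filters out slow-moving objects that produce repeated readings at one location.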

Task 1.3. SQL-TS
SELECT A.Symbol,
       A.Tstamp,        /* start time */
       A.Price,         /* start price */
       MAX (C.Tstamp),  /* inflection time */
       LAST (C.Price),  /* low price */
       MAX (F.Tstamp),  /* end time */
       LAST (F.Price)   /* end price */
FROM Ticker AS (A B C* D E* F+)
  CLUSTER BY Symbol
  SEQUENCE BY Tstamp
  % ONE ROW PER MATCH
  % AFTER MATCH SKIP PAST LAST ROW
  % MAXIMAL MATCH
WHERE B.Price < PREV(B.Price)
  AND C.Price <= PREV(C.Price)
  AND D.Price > PREV(D.Price)
  AND E.Price >= PREV(E.Price)
  AND F.Price <= A.Price
The SQL-MR condition on F, F.Price >= PREV(F.Price) AND F.Price > A.Price (green on the slide), must now be replaced with F.Price <= A.Price (blue on the slide).

Blocking and Non-Blocking Queries
Blocking (fully): no result is returned until the end of the input is detected, e.g., sum and count.
Blocking (partially): some results are only returned at the end, while others can be returned early, e.g., coalescing.
Non-Blocking (NB): all results are returned before the end is detected.
Claims:
  Maximal Match patterns ending with a plus or a star are not NB in general (i.e., some are but others are not).
  Patterns with a different ending are NB.
  All patterns that are not Maximal Match are also NB.
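The three categories can be illustrated with Python functions over an input stream. This is a sketch under assumptions: the function names are hypothetical, the streams are plain Python iterables, and the intervals are assumed to arrive sorted by start time.

```python
def total(stream):
    """Fully blocking: the sum is only known once the input has ended."""
    return sum(stream)

def coalesce_stream(intervals):
    """Partially blocking: a period is emitted as soon as a later interval starts
    past it, but the last period has to wait for the end of the input."""
    current = None
    for start, end in intervals:  # assumed sorted by start
        if current is not None and start <= current[1]:
            current = (current[0], max(current[1], end))
        else:
            if current is not None:
                yield current
            current = (start, end)
    if current is not None:
        yield current

def rises(prices):
    """Non-blocking: each result is emitted as soon as its last row arrives."""
    prev = None
    for p in prices:
        if prev is not None and p > prev:
            yield (prev, p)
        prev = p

periods = list(coalesce_stream([(1, 3), (2, 5), (7, 8)]))
steps = list(rises([1, 2, 2, 3]))
```

In `coalesce_stream`, the period ending at 5 can be emitted when the interval starting at 7 arrives, whereas the final period is only released at end of input. This mirrors why a Maximal Match pattern ending in a plus or a star may have to wait for more rows.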

Conclusions
Specs proposed by 2 DBMS vendors (Oracle & IBM) and 2 DSMS startups (Coral8 and Streambase).
Very powerful: capabilities of SQL-TS plus several new constructs of convenience, particularly in controlling output.
Optimization techniques developed for SQL-TS could also be critical here.