CS240B: Assignment1 Winter 2016.

1 CS240B: Assignment1 Winter 2016

2 SQL-MR vs SQL-TS
SQL-TS is a minimal extension of SQL designed to demonstrate the power, usability, and optimizability of REGEX in SQL. SQL-MR adds many useful constructs, e.g.:
SQL-TS only supports +; SQL-MR also supports *
Maximal and incremental matches
Permute
Window
Skip to first, last, next, <var>, past last
Subset
Define
One row, all rows per match
Classifier
Match_number

3 OLAP Functions
The new OLAP functions support order-dependent computations, such as: ROWNUMBER, RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE, RATIO_TO_REPORT, … These functions produce one value for each tuple.
Traditional aggregates, such as sum, max, and count, and their rollups and cubes, operate on sets (order independent).
But an input order must be defined to compute the continuous (a.k.a. cumulative) versions of COUNT, SUM, MIN, MAX, AVG, and of other set aggregates such as VAR(X), STDDEV(X), COV(X,Y), CORR(X,Y), REGR_SLOPE, REGR_INTERCEPT, on windows: logical, physical, and UNBOUNDED PRECEDING, which expresses their cumulative version.
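To make the cumulative (UNBOUNDED PRECEDING) case concrete, here is a minimal Python sketch. The function name `cumulative_aggregates` and the sample values are illustrative assumptions, not part of the slides. It emits one result per input tuple, in input order, which is the sense in which these aggregates depend on order.

```python
from typing import Iterable, Iterator, Tuple


def cumulative_aggregates(
    values: Iterable[float],
) -> Iterator[Tuple[int, float, float, float, float]]:
    """Yield (count, sum, min, max, avg) over all tuples seen so far.

    Hypothetical helper: mimics COUNT/SUM/MIN/MAX/AVG evaluated
    over a window with UNBOUNDED PRECEDING up to the current row.
    """
    count = 0
    total = 0
    lowest = None
    highest = None
    for v in values:
        count += 1
        total += v
        lowest = v if lowest is None else min(lowest, v)
        highest = v if highest is None else max(highest, v)
        # One output value per input tuple; only constant-size state is kept.
        yield count, total, lowest, highest, total / count


if __name__ == "__main__":
    for row in cumulative_aggregates([3, 1, 4]):
        print(row)
```

Only running state is kept, so none of the earlier tuples have to be stored.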

4 Why Not Windows on Rollups and Cubes?
Because they would return several tuples for each new input tuple. For the same reason, most (all?) vendors do not support TOP-K with windows. Also very costly.

5 Which OLAP functions are blocking?
If the input is not ordered as specified by the OVER (ORDER BY ...) clause, they are all blocking. When the input data is already in the right order, some functions, such as PERCENT_RANK and NTILE, still require that we see the whole set, so they are still blocking. LAST is also blocking. The implementations of PERCENT_RANK and NTILE must see and store all tuples to compute the result in TERMINATE. The above is also true for the basic COUNT, SUM, MAX, etc. But for the window versions of these aggregates we do not need to see all the tuples: we only need to see those before the current tuple, not those that follow (with unlimited preceding we do not even store the previous tuples). Conclusion: A and B are blocking and cannot be used on data streams. Only C can.
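The contrast can be sketched in Python as follows. The names `windowed_sum` and `percent_rank_blocking` are hypothetical, and the standard definition PERCENT_RANK = (rank - 1) / (n - 1) is assumed. The windowed aggregate emits a result as each tuple arrives and stores only the tuples in its window, while PERCENT_RANK cannot emit anything until the whole input has been buffered.

```python
import itertools
from collections import deque


def windowed_sum(stream, preceding):
    """Non-blocking: SUM over the current row and `preceding` earlier rows."""
    window = deque()
    total = 0
    for v in stream:
        window.append(v)
        total += v
        if len(window) > preceding + 1:
            total -= window.popleft()  # forget the tuple that left the window
        yield total


def percent_rank_blocking(ordered):
    """Blocking: needs n, the size of the whole set, before the first output."""
    buffered = list(ordered)  # must see and store every tuple
    n = len(buffered)
    out = []
    rank = 0
    for i, v in enumerate(buffered):
        if i == 0 or v != buffered[i - 1]:
            rank = i + 1  # ties share the rank of their first occurrence
        out.append((v, 0.0 if n == 1 else (rank - 1) / (n - 1)))
    return out


if __name__ == "__main__":
    # Works on an unbounded stream: we just take the first few results.
    print(list(itertools.islice(windowed_sum(itertools.count(1), 2), 5)))
    # Only works on a finite input.
    print(percent_rank_blocking([10, 20, 20, 40]))
```

Calling `percent_rank_blocking` on `itertools.count(1)` would never return, which is exactly why such a function is unusable on a data stream.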

6 Which OLAP on Data Streams?
All those that support windows (group 2). But also ROWNUMBER, RANK, and DENSE_RANK, which are non-blocking if we assume that the input is properly ordered.
Name Row# Marks Rank DenseRank
Tom
Jeff
Mary
Alex
With unlimited preceding, same as basic. With a finite window, a bit more complex...
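A minimal Python sketch of why ROWNUMBER, RANK, and DENSE_RANK are non-blocking on properly ordered input. The function name `rank_stream` and the marks used in the example are made up for illustration and are not the values from the slide's table. Each output depends only on the current tuple and on constant-size state carried over from the previous one.

```python
def rank_stream(ordered_marks):
    """Yield (row_number, marks, rank, dense_rank) for input already in rank order."""
    row = 0
    rank = 0
    dense = 0
    first = True
    prev = None
    for marks in ordered_marks:
        row += 1
        if first or marks != prev:
            rank = row   # RANK jumps to the current row number after ties
            dense += 1   # DENSE_RANK advances by one per distinct value
            prev = marks
            first = False
        yield row, marks, rank, dense


if __name__ == "__main__":
    # Hypothetical marks, sorted in descending order before arriving.
    for out in rank_stream([90, 85, 85, 70]):
        print(out)
```

If the input were not already ordered, a sort would be needed first, and the sort is the blocking step.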

7 Using Windows with PERCENT_RANK and NTILE
There, order is used to define the aggregate itself, thus it cannot be used to define a window. But in real life we might want, e.g., to follow the rising popularity (pop) of a movie or a song over time. Thus we need a function such as PERCENT_RANK(Pop) OVER ( … time window)
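One possible reading of PERCENT_RANK(Pop) over a time window, sketched in Python. The semantics here are an assumption, not something the slide defines: tuples arrive in time order, the window holds the tuples of the last `width` time units, and each new tuple is ranked by pop against the current window contents. The name `windowed_percent_rank` is hypothetical.

```python
from collections import deque


def windowed_percent_rank(stream, width):
    """Yield (time, pop, percent_rank of pop within the last `width` time units).

    Assumes (time, pop) pairs arrive with non-decreasing time.
    """
    window = deque()
    for t, pop in stream:
        window.append((t, pop))
        # Expire tuples that fell out of the time window (t - width, t].
        while window[0][0] <= t - width:
            window.popleft()
        n = len(window)
        # rank - 1 equals the number of window tuples with strictly smaller pop.
        smaller = sum(1 for _, p in window if p < pop)
        yield t, pop, (0.0 if n == 1 else smaller / (n - 1))


if __name__ == "__main__":
    readings = [(1, 10), (2, 30), (3, 20), (5, 40)]  # made-up (time, pop) pairs
    for out in windowed_percent_rank(readings, width=3):
        print(out)
```

Here time defines the window and pop defines the rank, so the two orderings no longer compete. The sketch rescans the window for every tuple; an order-statistics structure would reduce that cost.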

