Arvind Arasu, Brian Babcock


1 Characterizing Memory Requirements for Queries over Continuous Data Streams
Arvind Arasu, Brian Babcock*, Shivnath Babu, Jon McAlister, Jennifer Widom
Stanford University
*Speaker

2 Continuous Data Streams
Network traffic data
Transaction logs: call records, Web logs, ...
Financial data
Sensor networks
Scientific data: astronomy, biology, ...

3 A DBMS for Data Streams?
Lots of existing work in data streams
Mostly special-purpose applications
We’re building a general-purpose “data stream management system” (DSMS)

4 RDBMS | DSMS
Relations stored on disk | Tuples read once & discarded
Data access patterns controlled by RDBMS | Data must be processed as it arrives
Query answers are relations | Query answers are streams

5 Query Execution Model
1. Client registers query with the DSMS
2. Tuples arrive on streams S and T, are read and discarded, and answers are returned to the client
Limited-size “scratch space” memory available in the DSMS

6 Our Problem: Given a data stream query, determine how much memory is required to evaluate it.

7 Queries We Consider
SPJ Queries: π_L(σ_P(S1 × S2 × … × Sn))
Projection is either duplicate-preserving or duplicate-eliminating
Selection predicates are conjunctions of: Si.A Op Sj.B -or- Si.A Op k, where Op ∈ {>, ≥, =, ≤, <}
All attributes are integers

8 An example with no joins
SELECT cust_id FROM orders WHERE amt > 5
Requires no “scratch” memory: each tuple is independent, and tuples in the answer are streamed away
SELECT DISTINCT cust_id FROM orders WHERE amt > 5
Requires unbounded memory: all cust_ids must be remembered
SELECT DISTINCT cust_id FROM orders WHERE amt > 5 AND cust_id > 1000 AND cust_id < 9999
Requires bounded memory: remember cust_ids from the range the predicate allows
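The three variants of the slide-8 query (plain SELECT, SELECT DISTINCT, and SELECT DISTINCT with a range on cust_id) can be mimicked with a minimal Python sketch. The function names and the sample stream are illustrative assumptions, not part of the talk; the point is only how much state each variant keeps.

```python
def select_all(orders):
    """SELECT cust_id FROM orders WHERE amt > 5 (no scratch memory)."""
    for cust_id, amt in orders:
        if amt > 5:
            yield cust_id  # streamed away immediately, nothing is stored


def select_distinct(orders, seen):
    """SELECT DISTINCT cust_id ... WHERE amt > 5 (state grows with the input)."""
    for cust_id, amt in orders:
        if amt > 5 and cust_id not in seen:
            seen.add(cust_id)  # every cust_id ever output must be remembered
            yield cust_id


def select_distinct_ranged(orders, seen):
    """Same, plus cust_id > 1000 AND cust_id < 9999 (state is capped)."""
    for cust_id, amt in orders:
        if amt > 5 and 1000 < cust_id < 9999 and cust_id not in seen:
            seen.add(cust_id)  # only integer ids 1001..9998 can ever be stored
            yield cust_id


# Hypothetical stream of (cust_id, amt) tuples; every id appears twice.
stream = [(cust_id, 10) for cust_id in range(20000)] * 2

unbounded_state, bounded_state = set(), set()
all_rows = list(select_all(stream))
distinct_rows = list(select_distinct(stream, unbounded_state))
ranged_rows = list(select_distinct_ranged(stream, bounded_state))
```

With this input the unranged DISTINCT state ends up holding one entry per distinct id in the stream, while the ranged state can never exceed the 8998 integers between 1001 and 9998, however long the stream runs.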

9 An example with an equijoin
SELECT R.prod_id FROM orders O, returns R WHERE O.order_num = R.order_num AND R.prod_id >= 100 AND R.prod_id < 199 AND O.order_num > 1000 AND O.order_num < 1103
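One plausible way to evaluate the slide-9 query with a fixed number of state entries is to keep counters instead of tuples. This is a sketch under assumptions: the class name, the counter layout, and the sample calls are invented for illustration, and the talk itself does not spell out this strategy. The counter values can grow, but the number of counters is fixed by the constant ranges in the predicate (102 order numbers, 99 product ids).

```python
from collections import Counter


class EquijoinSynopsis:
    """Counter-based state for:
    SELECT R.prod_id FROM orders O, returns R
    WHERE O.order_num = R.order_num
      AND R.prod_id >= 100 AND R.prod_id < 199
      AND O.order_num > 1000 AND O.order_num < 1103
    """

    def __init__(self):
        self.order_counts = Counter()   # order_num -> number of O tuples seen
        self.return_counts = Counter()  # (order_num, prod_id) -> number of R tuples seen

    def on_order(self, order_num):
        """Process an O tuple; return the prod_ids it adds to the answer."""
        if not 1000 < order_num < 1103:
            return []  # fails the selection, so it is discarded unseen
        self.order_counts[order_num] += 1
        out = []
        for (num, prod_id), n in self.return_counts.items():
            if num == order_num:
                out.extend([prod_id] * n)  # one answer tuple per matching R tuple
        return out

    def on_return(self, order_num, prod_id):
        """Process an R tuple; return the prod_ids it adds to the answer."""
        if not (1000 < order_num < 1103 and 100 <= prod_id < 199):
            return []
        self.return_counts[(order_num, prod_id)] += 1
        # One answer tuple per O tuple already seen with the same order_num.
        return [prod_id] * self.order_counts[order_num]


syn = EquijoinSynopsis()
answer = []
answer += syn.on_return(1001, 150)  # no matching order yet
answer += syn.on_order(1001)
answer += syn.on_order(1001)
answer += syn.on_return(1001, 150)
answer += syn.on_order(2000)        # outside the order_num range
answer += syn.on_return(1001, 500)  # outside the prod_id range
```

Two qualifying O tuples and two qualifying R tuples share order_num 1001, so the duplicate-preserving answer contains prod_id 150 four times.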

10 An example with an inequality
SELECT [DISTINCT] O.prod_id FROM orders O, inventory I WHERE O.amt > I.qty AND O.prod_id >= 100 AND O.prod_id < 300
[Figure: table with one row per O.prod_id (100, 101, ..., 299) and columns prod_id, MAX(amt), MIN(amt); sample values 45, 2, 17, 12, 36, 21]
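For the DISTINCT form of the slide-10 query (SELECT DISTINCT O.prod_id ... WHERE O.amt > I.qty AND O.prod_id >= 100 AND O.prod_id < 300), a small per-prod_id table is enough. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, and tracking a single global minimum of I.qty is this sketch's own choice rather than something the slide states. A prod_id belongs in the answer exactly when its largest amt exceeds the smallest qty seen.

```python
class InequalitySynopsis:
    """State for SELECT DISTINCT O.prod_id FROM orders O, inventory I
    WHERE O.amt > I.qty AND O.prod_id >= 100 AND O.prod_id < 300."""

    def __init__(self):
        self.max_amt = {}     # prod_id -> largest amt seen (at most 200 keys)
        self.min_qty = None   # smallest I.qty seen so far
        self.emitted = set()  # prod_ids already output (at most 200 entries)

    def _emit_ready(self, candidates):
        out = []
        for prod_id in candidates:
            if (prod_id not in self.emitted and self.min_qty is not None
                    and self.max_amt[prod_id] > self.min_qty):
                self.emitted.add(prod_id)  # DISTINCT: output each id once
                out.append(prod_id)
        return out

    def on_order(self, prod_id, amt):
        if not 100 <= prod_id < 300:
            return []
        self.max_amt[prod_id] = max(amt, self.max_amt.get(prod_id, amt))
        return self._emit_ready([prod_id])

    def on_inventory(self, qty):
        if self.min_qty is None or qty < self.min_qty:
            self.min_qty = qty
        return self._emit_ready(sorted(self.max_amt))


syn = InequalitySynopsis()
r1 = syn.on_order(100, 45)   # no inventory tuple yet
r2 = syn.on_inventory(50)    # 45 > 50 is false
r3 = syn.on_order(101, 60)   # 60 > 50
r4 = syn.on_inventory(2)     # now 45 > 2
r5 = syn.on_order(100, 99)   # prod_id 100 was already output
r6 = syn.on_order(400, 99)   # outside the prod_id range
```

The state never holds more than 200 prod_id rows plus one number, regardless of stream length.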

11 “Locally Totally Ordered” Queries
LTO Queries: SPJ queries with additional predicates applied
For each stream, stipulate a total order for all attributes in the stream & all constants
Only allow tuples whose attribute values follow that ordering
All SPJ queries can be written as a union of LTO queries

12 Example of an LTO query
Stream S: (A, B) Stream T: (C, D)
SPJ query: SELECT S.A, T.C FROM S, T WHERE S.B > 12
One LTO query obtained from it: SELECT S.A, T.C FROM S, T WHERE S.B > 12 AND S.A = S.B AND T.D < T.C AND T.C < 12
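To see where the extra predicates of an LTO query come from, it can help to enumerate the possible orderings of one stream's attributes together with the constant 12. The sketch below is only an illustration: the function name and the list-of-groups representation are assumptions, and it handles a single constant (with several constants, orderings that contradict their numeric values would have to be filtered out). Items in the same group are equal; groups are listed in strictly increasing order.

```python
def weak_orderings(items):
    """Yield every arrangement of items into ordered groups.

    Items inside one group are equal; each group is strictly smaller
    than the groups after it.
    """
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for ordering in weak_orderings(rest):
        # Make `first` equal to the members of an existing group ...
        for i in range(len(ordering)):
            yield ordering[:i] + [ordering[i] + [first]] + ordering[i + 1:]
        # ... or give it a new group of its own at any position.
        for i in range(len(ordering) + 1):
            yield ordering[:i] + [[first]] + ordering[i:]


s_orderings = list(weak_orderings(["S.A", "S.B", 12]))
t_orderings = list(weak_orderings(["T.C", "T.D", 12]))
```

Among the orderings for S is [[12], ["S.B", "S.A"]], which reads 12 < S.A = S.B, and among those for T is [["T.D"], ["T.C"], [12]], which reads T.D < T.C < 12. Together these are the extra predicates in the LTO query on the slide.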

13 MinRef and MaxRef
For each stream S in the query:
MinRef(S) = { S.A : S.A < T.B is a necessary inequality in the predicate }
Example: SELECT T.C FROM S, T WHERE S.A < 5 AND 5 < T.C AND S.A < T.C
{ S.A < 5, 5 < T.C } => S.A < T.C, so S.A < T.C is not a necessary inequality.
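The "necessary inequality" test in the example can be sketched as a reachability check: an inequality is redundant if the remaining ones already imply it by transitivity. The function name and the (lo, hi) pair encoding are assumptions made for this sketch, and it only detects implication through chains of "<"; it does not attempt any integer-specific reasoning.

```python
def is_necessary(candidate, inequalities):
    """Return True if `candidate` (lo, hi), meaning lo < hi, is NOT implied
    by transitivity from the other inequalities.

    Terms are attribute names (str) or integer constants (int).
    """
    others = [ineq for ineq in inequalities if ineq != candidate]
    terms = {t for ineq in others for t in ineq} | set(candidate)
    constants = sorted(t for t in terms if isinstance(t, int))
    # Constants are ordered by value: c1 < c2 whenever c1 is numerically smaller.
    edges = others + [(a, b) for i, a in enumerate(constants)
                      for b in constants[i + 1:]]

    lo, hi = candidate
    frontier, reached = [lo], {lo}
    while frontier:
        node = frontier.pop()
        for a, b in edges:
            if a == node and b not in reached:
                reached.add(b)
                frontier.append(b)
    return hi not in reached


# Predicate of the slide's example: S.A < 5 AND 5 < T.C AND S.A < T.C
predicate = [("S.A", 5), (5, "T.C"), ("S.A", "T.C")]
join_needed = is_necessary(("S.A", "T.C"), predicate)
filter_needed = is_necessary(("S.A", 5), predicate)
```

Here `join_needed` is False, matching the slide's conclusion that S.A < T.C is not a necessary inequality, while `filter_needed` is True because nothing else implies S.A < 5.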

14 Bounded-Memory Conditions
1. All attributes in the projection list must be bounded.
2. All attributes participating in equijoins must be bounded.
3. In each stream S, |MinRef(S)| + |MaxRef(S)| must be:
= 0, for SELECT
≤ 1, for SELECT DISTINCT
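The three conditions translate directly into a small check. This is a sketch, not the authors' procedure: the function name and argument shapes are assumptions, the caller is expected to supply which attributes are bounded and the MinRef/MaxRef sizes, and the sample sizes below assume MaxRef is defined symmetrically to MinRef (the slides only spell out MinRef).

```python
def bounded_memory(projection, equijoin_attrs, bounded_attrs, ref_sizes, distinct):
    """Apply the three bounded-memory conditions.

    projection     -- attributes in the SELECT list
    equijoin_attrs -- attributes that participate in equijoins
    bounded_attrs  -- attributes already known to be bounded
    ref_sizes      -- stream name -> |MinRef(S)| + |MaxRef(S)|
    distinct       -- True for SELECT DISTINCT, False for SELECT
    """
    limit = 1 if distinct else 0
    return (all(a in bounded_attrs for a in projection)              # condition 1
            and all(a in bounded_attrs for a in equijoin_attrs)      # condition 2
            and all(size <= limit for size in ref_sizes.values()))   # condition 3


# Inequality example: O.amt > I.qty with O.prod_id restricted to [100, 300).
# Assumed sizes: MaxRef(O) = {O.amt}, MinRef(I) = {I.qty}.
ineq_distinct = bounded_memory(["O.prod_id"], [], {"O.prod_id"},
                               {"O": 1, "I": 1}, distinct=True)
ineq_plain = bounded_memory(["O.prod_id"], [], {"O.prod_id"},
                            {"O": 1, "I": 1}, distinct=False)

# Query with S.A < T.C AND S.B < T.D and T.E = 10:
# MinRef(S) = {S.A, S.B}; MaxRef(T) = {T.C, T.D} is assumed.
two_inequalities = bounded_memory(["T.E"], [], {"T.E"},
                                  {"S": 2, "T": 2}, distinct=True)
```

Under these assumed inputs the single-inequality query passes only in its DISTINCT form, and the query with two inequalities per stream fails condition 3 even with DISTINCT.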

15 An unbounded example
SELECT DISTINCT T.E FROM S, T WHERE T.E = 10 AND S.A < T.C AND S.B < T.D
S: (A,B) receives (0, 0), (-1, 1), (-2, 2), ...
T: (C,D,E) then receives (1-c, 1+c, 10), which joins only with the S tuple (-c, c), so every S tuple must be remembered
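The "bad" input for this query can be checked by brute force. The sketch below assumes a hypothetical prefix of n tuples (-i, i) on stream S and confirms that a T tuple (1-c, 1+c, 10) satisfies the join predicate with exactly one of them; the helper name and n are illustrative choices.

```python
def matches(s_tuple, t_tuple):
    """Join predicate: T.E = 10 AND S.A < T.C AND S.B < T.D."""
    a, b = s_tuple
    c, d, e = t_tuple
    return e == 10 and a < c and b < d


n = 1000  # hypothetical number of S tuples seen so far
s_stream = [(-i, i) for i in range(n)]

# For each c, collect the S tuples that join with the T tuple (1-c, 1+c, 10).
partners = {c: [s for s in s_stream if matches(s, (1 - c, 1 + c, 10))]
            for c in range(n)}
```

Each `partners[c]` contains only (-c, c), so whether 10 appears in the answer depends on one specific earlier S tuple. Since c is not known in advance, dropping any S tuple could produce a wrong answer for some later T tuple.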

16 Conclusion
We consider SPJ queries over data streams
We identify which queries can and cannot be evaluated using bounded memory
For queries that can, we provide an execution strategy based on synopses
For queries that cannot, we provide examples of “bad” input streams
Full paper at

