1 Continuous Queries over Data Streams Vitaly Kroivets, Lyan Marina Presentation for The Seminar on Database and Internet The Hebrew University of Jerusalem,

2 1 Continuous Queries over Data Streams Vitaly Kroivets, Lyan Marina Presentation for The Seminar on Database and Internet The Hebrew University of Jerusalem, Fall 2002

3 2 Contents of the lecture Introduction Proposed Architecture of Data Stream Management System Research problems Query Optimization Bibliography

4 3 Data Streams vs. Data Sets Data Sets:Data Streams:  Updates infrequent  Data changed constantly (sometimes additions only)  Old data required many times  Mostly only freshest data used  Example: employees personal data table  Examples: financial tickers, data feeds from sensors, network monitoring, etc

5 4 Using Traditional Database
[Diagram labels: User/Application, Loader, Query, Result]

6 5 Data Streams Paradigm
[Diagram labels: User/Application, Register Query, Stream Query Processor, Result]

7 6 Data Streams Paradigm
[Diagram labels: User/Application, Register Query, Stream Query Processor, Result, Scratch Space (Memory and/or Disk), Data Stream Management System (DSMS)]

8 7 What Is a Continuous Query? A query that is issued once and logically runs continuously.

9 8 What Is a Continuous Query? A query that is issued once and runs continuously. Example: detecting abnormalities in network traffic behavior in real time, together with their cause, such as link congestion due to a hardware failure.

10 9 What Is a Continuous Query? A query that is issued once and runs continuously. More examples: continuous queries are used to support load balancing and online automatic trading at a stock exchange.

11 10 Special Challenges
- Timely online answers, even for rapid data streams
- Fast access to large portions of data
- Processing of multiple streams simultaneously

12 11 Making Things Concrete
Outgoing (call_ID, caller, time, event)
Incoming (call_ID, callee, time, event)
event = start or end
[Diagram labels: BOB, ALICE, Central Office (two), DSMS]

13 12 Making Things Concrete
- Database = two streams of mobile call records:
    Outgoing(connectionID, caller, start, end)
    Incoming(connectionID, callee, start, end)
- Query language = SQL; FROM clauses can refer to streams and/or relations

14 13 Query 1 (self-join)
Find all outgoing calls longer than 2 minutes:
    SELECT O1.call_ID, O1.caller
    FROM Outgoing O1, Outgoing O2
    WHERE (O2.time - O1.time > 2
           AND O1.call_ID = O2.call_ID
           AND O1.event = 'start'
           AND O2.event = 'end')
- Result requires unbounded storage
- Can provide result as data stream
- Can output after 2 min, without seeing end
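
The last point of Query 1 can be illustrated with a minimal Python sketch. It is not taken from the slides: the function name `long_calls`, the tuple layout, and the assumptions that `time` is measured in minutes and that tuples arrive in time order are all chosen here for illustration only.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed record layout: (call_ID, caller, time in minutes, event)
Record = Tuple[str, str, float, str]


def long_calls(outgoing: Iterable[Record],
               threshold: float = 2.0) -> Iterator[Tuple[str, str]]:
    """Emit (call_ID, caller) for every call that lasts longer than `threshold`.

    Assumes tuples arrive in non-decreasing time order.
    """
    open_calls: Dict[str, Tuple[str, float]] = {}  # call_ID -> (caller, start time)
    for call_id, caller, time, event in outgoing:
        # Any call still open for more than `threshold` already qualifies,
        # so it can be emitted before its 'end' tuple is seen.
        for cid, (c, start) in list(open_calls.items()):
            if time - start > threshold:
                yield cid, c
                del open_calls[cid]
        if event == "start":
            open_calls[call_id] = (caller, time)
        elif event == "end":
            # A short call (or one already emitted) needs no further state.
            open_calls.pop(call_id, None)
```

In this sketch a call is reported as soon as any later tuple shows that more than two minutes have passed, and the per-call state is dropped once the call is emitted or ends.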

15 14 Query 2 (join)
Pair up callers and callees:
    SELECT O.caller, I.callee
    FROM Outgoing O, Incoming I
    WHERE O.call_ID = I.call_ID
- Can still provide result as data stream
- Requires unbounded temporary storage …
- … unless streams are near-synchronized
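
A rough sketch of how Query 2 might be evaluated over interleaved streams is given below. It is an illustration only: the function name `pair_up`, the merged input format, and the simplifying assumption that each `call_ID` appears exactly once per stream are not part of the original slides.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumed input: (stream name, call_ID, caller or callee), in arrival order.
Event = Tuple[str, str, str]


def pair_up(events: Iterable[Event]) -> Iterator[Tuple[str, str]]:
    """Symmetric hash join of Outgoing and Incoming on call_ID.

    Simplification: each call_ID appears once per stream, so a
    buffered tuple can be dropped as soon as its partner arrives.
    """
    pending_out: Dict[str, str] = {}  # call_ID -> caller, waiting for Incoming
    pending_in: Dict[str, str] = {}   # call_ID -> callee, waiting for Outgoing
    for stream, call_id, party in events:
        if stream == "Outgoing":
            if call_id in pending_in:
                yield party, pending_in.pop(call_id)
            else:
                pending_out[call_id] = party
        else:  # treated as "Incoming"
            if call_id in pending_out:
                yield pending_out.pop(call_id), party
            else:
                pending_in[call_id] = party
```

The two dictionaries are the temporary storage mentioned on the slide. If matching tuples arrive close together, as with near-synchronized streams, they stay small; if one stream lags arbitrarily far behind, they can grow without bound.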

16 15 Query 3 (group-by aggregation)
Total connection time for each caller:
    SELECT O1.caller, sum(O2.time - O1.time)
    FROM Outgoing O1, Outgoing O2
    WHERE (O1.call_ID = O2.call_ID
           AND O1.event = 'start'
           AND O2.event = 'end')
    GROUP BY O1.caller
- Cannot provide result in (append-only) stream.
Alternatives:
- Output stream with updates
- Provide current value on demand
- Keep answer in memory
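
The alternatives listed for Query 3 could look roughly like the following Python sketch. The class name `ConnectionTime` and its methods are hypothetical, and the sketch assumes that a call's `start` tuple always arrives before its `end` tuple.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class ConnectionTime:
    """Keeps the per-caller total in memory and reports updates."""

    def __init__(self) -> None:
        self.starts: Dict[str, float] = {}                    # call_ID -> start time
        self.totals: Dict[str, float] = defaultdict(float)    # caller -> total time

    def on_tuple(self, call_id: str, caller: str, time: float,
                 event: str) -> Optional[Tuple[str, float]]:
        """Process one Outgoing tuple.

        Returns an update (caller, new total) when a call ends,
        which is the "output stream with updates" alternative.
        """
        if event == "start":
            self.starts[call_id] = time
            return None
        if event == "end":
            start = self.starts.pop(call_id, None)
            if start is None:
                return None  # 'end' without a known 'start': ignored here
            self.totals[caller] += time - start
            return caller, self.totals[caller]
        return None

    def current(self, caller: str) -> float:
        """The "current value on demand" alternative."""
        return self.totals.get(caller, 0.0)
```

Each returned pair replaces the previous total for that caller, which is why the output cannot be an append-only stream.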

17 16 Conclusions
- Conventional DBMS technology is inadequate
- We need to reconsider all aspects of data management and processing in the presence of data streams

18 17 DBMS versus DSMS
DBMS | DSMS
Persistent relations | Transient streams (and persistent relations)

19 18 DBMS versus DSMS
DBMS | DSMS
Persistent relations | Transient streams (and persistent relations)
One-time queries | Continuous queries

20 19 DBMS versus DSMS
DBMS | DSMS
Persistent relations | Transient streams (and persistent relations)
One-time queries | Continuous queries
Random access | Sequential access

21 20 DBMS versus DSMS
DBMS | DSMS
Persistent relations | Transient streams (and persistent relations)
One-time queries | Continuous queries
Random access | Sequential access
Access plan determined by query processor and physical DB design | Unpredictable data arrival and characteristics

22 21 DBMS versus DSMS
DBMS | DSMS
Persistent relations | Transient streams (and persistent relations)
One-time queries | Continuous queries
Random access | Sequential access
Access plan determined by query processor and physical DB design | Unpredictable data arrival and characteristics
"Unbounded" disk store | Bounded main memory

23 22 Related work
- Tapestry system: content-based filtering of email messages; restricted subset of SQL; append-only query results.
- Chronicle data model: append-only ordered sequences of tuples; restricted view-definition language; does not store any chronicles.
- Alert system: event-condition-action triggers in a conventional SQL DB; continuous queries over append-only "active tables".

24 23 Related work: Materialized Views
- Materialized views are queries that need to be reevaluated whenever the database changes.
- Materialized views vs. continuous queries. Continuous queries:
    - may stream rather than store the result
    - may deal with append-only relations
    - may provide approximate answers
    - may use a processing strategy that adapts to the characteristics of the data stream

25 24 Architecture for continuous queries
- A single stream of tuples D, a single continuous query Q, and the answer to the query A
- Q is issued once and operates continuously
[Diagram labels: Data Stream, Q (Continuous Query), A? (Answer)]

26 25 Architecture for continuous queries
We consider data streams that adhere to the relational model (i.e., streams of tuples), although many of the ideas and techniques are independent of the data model being considered.
[Diagram labels: Data Stream, Q (Continuous Query), A? (Answer)]

27 26 Architecture for continuous queries
Scenario 1 (simplest): data stream D is append-only, with no updates or deletions. How to handle Q?
1) Always store the current answer A to Q.
   D is of unbounded size => A may be too.
2) Do not store A, but make new tuples in A available as another continuous stream.
   No need for unbounded storage for A, but may need unbounded storage to determine new tuples in A.
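
Option 2 can be illustrated with a small Python sketch. The query used here, the distinct callers seen on the Outgoing stream, is a hypothetical example that does not appear in the slides, as is the function name `distinct_callers`.

```python
from typing import Iterable, Iterator, Set, Tuple

# Assumed record layout: (call_ID, caller, time, event)
Record = Tuple[str, str, float, str]


def distinct_callers(outgoing: Iterable[Record]) -> Iterator[str]:
    """Stream each caller the first time it appears.

    The answer A is never stored as a whole: new answer tuples are
    passed on as soon as they are found. The `seen` set is the extra
    state needed to decide whether a tuple is new, and it grows with
    the number of distinct callers.
    """
    seen: Set[str] = set()
    for _call_id, caller, _time, _event in outgoing:
        if caller not in seen:
            seen.add(caller)
            yield caller
```

Here the output is an append-only stream, yet the operator still has to remember every caller it has already reported.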

28 27 Architecture for continuous queries
Scenario 2: the input stream is append-only, but may cause updates and deletions in answer A.
   => May need to update/delete tuples in the output data stream.
Scenario 3 (most general): input stream D includes updates and deletions.
   => Much of the stream's data must be stored to determine the answer.

29 28 Architecture for continuous queries
How to solve?
1) Restrict the expressiveness of Q.
2) Impose constraints on the data stream to guarantee that the answer to Q, and the amount of data needed to compute Q, are bounded.
3) Provide an approximate answer.

30 29 Architecture for processing continuous queries
[Diagram labels: Stream 1, Stream 2, ..., Stream N, Stream Query Processor, Stream, Store, Scratch, Throw]

31 30 Architecture for continuous queries
- STREAM is a data stream containing the tuples appended to A. It is an append-only stream (it should not include updates or deletions).
- STREAM and STORE together define the current answer A.

32 31 Architecture for continuous queries
When query Q is notified of a new tuple t in a relevant data stream, it can perform a number of actions, which are not mutually exclusive:
1) t causes new tuples in A: if a tuple a will remain in A forever, send a to STREAM.
2) If a should be in A but may be removed at some moment, add a to STORE.

33 32 Architecture for continuous queries
When query Q is notified of a new tuple t in a relevant data stream, it can perform a number of actions, which are not mutually exclusive:
3) t may cause an update or deletion of answer tuples in STORE. Answer tuples may be moved from STORE to STREAM.
4) t, or data derived from it, may need to be saved so that the query result can be computed in the future: send t to SCRATCH.

34 33 Architecture for continuous queries
When query Q is notified of a new tuple t in a relevant data stream, it can perform a number of actions, which are not mutually exclusive:
5) t is not needed and will not be needed: send it to THROW (unless we want to archive it).
6) As a result of t, we may move data from STORE or SCRATCH to THROW.
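
The interplay of STREAM, STORE, SCRATCH and THROW can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `QueryProcessor`, its method names, and the example query (list every call with its duration, showing calls still in progress with an unknown duration) are hypothetical and not taken from the slides.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, float, str]             # (call_ID, caller, time, event)
Answer = Tuple[str, str, Optional[float]]        # (call_ID, caller, duration or None)


class QueryProcessor:
    def __init__(self) -> None:
        self.stream: List[Answer] = []           # STREAM: final, append-only answers
        self.store: Dict[str, Answer] = {}       # STORE: answers that may still change
        self.scratch: Dict[str, Record] = {}     # SCRATCH: tuples kept for later use
        self.throw: List[Record] = []            # THROW: discarded (kept for inspection)

    def on_tuple(self, call_id: str, caller: str, time: float, event: str) -> None:
        t: Record = (call_id, caller, time, event)
        if event == "start":
            self.scratch[call_id] = t                       # action 4: save t
            self.store[call_id] = (call_id, caller, None)   # action 2: provisional answer
        elif event == "end" and call_id in self.scratch:
            start = self.scratch.pop(call_id)
            self.store.pop(call_id)
            # action 3: the answer tuple becomes final and moves to STREAM
            self.stream.append((call_id, caller, time - start[2]))
            # actions 5 and 6: neither tuple is needed any more
            self.throw.extend([start, t])
        else:
            self.throw.append(t)                            # action 5: not needed

    def answer(self) -> List[Answer]:
        """Current answer A, defined by STREAM and STORE together."""
        return self.stream + list(self.store.values())
```

A real system would discard the THROW tuples (or archive them) rather than keep them in a list; the list is kept here only so that the routing can be inspected.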

35 34 Architecture for continuous queries
Scenario 1: data stream D is append-only (no updates or deletions), and the current answer A to Q is always stored.
- STREAM is empty
- STORE always contains A
- SCRATCH contains whatever is needed to keep the answer in STORE up to date

