1 Continuous Queries over Data Streams Vitaly Kroivets, Lyan Marina Presentation for The Seminar on Database and Internet The Hebrew University of Jerusalem,

Presentation transcript:

1 Continuous Queries over Data Streams Vitaly Kroivets, Lyan Marina Presentation for The Seminar on Database and Internet The Hebrew University of Jerusalem, Fall 2002

2 Contents of the lecture
- Introduction
- Proposed Architecture of Data Stream Management System
- Research problems
- Query Optimization
- Bibliography

3 Data Streams vs. Data Sets
Data Sets:
- Updates are infrequent
- Old data is required many times
- Example: employees' personal data table
Data Streams:
- Data changes constantly (sometimes additions only)
- Mostly only the freshest data is used
- Examples: financial tickers, data feeds from sensors, network monitoring, etc.

4 Using Traditional Database
[Diagram: User/Application, Loader, Query, Result]

5 Data Streams Paradigm
[Diagram: User/Application, Register Query, Stream Query Processor, Result]

6 Data Streams Paradigm
[Diagram: User/Application, Register Query, Stream Query Processor, Result, Scratch Space (Memory and/or Disk); together labeled Data Stream Management System (DSMS)]

7 What Is a Continuous Query? A query that is issued once and logically runs continuously.

8 What Is a Continuous Query? A query that is issued once and runs continuously. Example: detect abnormalities in network traffic behavior in real time, together with their cause, such as link congestion due to hardware failure.

9 What Is a Continuous Query? A query that is issued once and runs continuously. More examples: continuous queries are used to support load balancing and online automatic trading at a stock exchange.

10 Special Challenges
- Timely online answers even for rapid data streams
- Ability of fast access to large portions of data
- Processing of multiple streams simultaneously

11 Making Things Concrete
Outgoing (call_ID, caller, time, event)
Incoming (call_ID, callee, time, event)
event = start or end
[Diagram: Bob, Alice, two Central Offices, DSMS]

12 Making Things Concrete
- Database = two streams of mobile call records
    Outgoing(connectionID, caller, start, end)
    Incoming(connectionID, callee, start, end)
- Query language = SQL
- FROM clauses can refer to streams and/or relations

13 Query 1 (self-join)
Find all outgoing calls longer than 2 minutes

    SELECT O1.call_ID, O1.caller
    FROM Outgoing O1, Outgoing O2
    WHERE O2.time - O1.time > 2
      AND O1.call_ID = O2.call_ID
      AND O1.event = 'start'
      AND O2.event = 'end'

- Result requires unbounded storage
- Can provide result as data stream
- Can output after 2 min, without seeing end
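
The last point of Query 1 (a call can be reported as soon as it has lasted more than 2 minutes, before its end tuple arrives) could be sketched roughly as follows. This is only an illustration, not the evaluation strategy of any particular DSMS. The names `CallEvent` and `long_calls` are hypothetical, and the sketch assumes that tuples arrive in non-decreasing `time` order, measured in minutes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class CallEvent:
    """One tuple of the Outgoing stream (hypothetical Python representation)."""
    call_id: int
    caller: str
    time: float  # minutes; assumed non-decreasing across the stream
    event: str   # "start" or "end"


def long_calls(stream: Iterable[CallEvent],
               threshold: float = 2.0) -> Iterator[Tuple[int, str]]:
    """Yield (call_id, caller) for every call lasting longer than `threshold`."""
    open_calls: Dict[int, Tuple[str, float]] = {}  # call_id -> (caller, start time)
    for ev in stream:
        # The arriving tuple advances the clock: every open call that started
        # more than `threshold` minutes ago is already known to be long.
        overdue = [cid for cid, (_, started) in open_calls.items()
                   if ev.time - started > threshold]
        for cid in overdue:
            caller, _ = open_calls.pop(cid)
            yield cid, caller
        if ev.event == "start":
            open_calls[ev.call_id] = (ev.caller, ev.time)
        elif ev.event == "end":
            # A short call ends here; a long one was already emitted and removed.
            open_calls.pop(ev.call_id, None)
```

In this sketch the working state holds only the calls that are currently open and not yet reported; the result itself is emitted as a stream instead of being stored.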

14 Query 2 (join)
Pair up callers and callees

    SELECT O.caller, I.callee
    FROM Outgoing O, Incoming I
    WHERE O.call_ID = I.call_ID

- Can still provide result as data stream
- Requires unbounded temporary storage ...
- ... unless streams are near-synchronized
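
One way to picture why Query 2 needs temporary storage is a symmetric hash join over the two streams. The sketch below is an assumption-laden illustration, not the method proposed in the talk: `pair_calls` is a hypothetical name, the two streams are modeled as one interleaved sequence of `(stream_name, call_id, party)` tuples, and each call_ID is assumed to appear at most once per stream, so a matched tuple can be discarded immediately.

```python
from typing import Dict, Iterable, Iterator, Tuple


def pair_calls(events: Iterable[Tuple[str, int, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (caller, callee) pairs as soon as both sides of a call are seen.

    Each input tuple is (stream_name, call_id, party), where stream_name is
    "out" for the Outgoing stream and "in" for the Incoming stream.
    """
    pending_out: Dict[int, str] = {}  # call_id -> caller, waiting for the callee
    pending_in: Dict[int, str] = {}   # call_id -> callee, waiting for the caller
    for stream_name, call_id, party in events:
        if stream_name == "out":
            if call_id in pending_in:
                yield party, pending_in.pop(call_id)
            else:
                pending_out[call_id] = party
        else:
            if call_id in pending_out:
                yield pending_out.pop(call_id), party
            else:
                pending_in[call_id] = party
```

If the matching tuples of the two streams arrive close together, the two `pending` tables stay small; if one stream lags arbitrarily far behind, they can grow without bound.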

15 Query 3 (group-by aggregation)
Total connection time for each caller

    SELECT O1.caller, sum(O2.time - O1.time)
    FROM Outgoing O1, Outgoing O2
    WHERE O1.call_ID = O2.call_ID
      AND O1.event = 'start'
      AND O2.event = 'end'
    GROUP BY O1.caller

- Cannot provide result in (append-only) stream. Alternatives:
    - Output stream with updates
    - Provide current value on demand
    - Keep answer in memory
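
The first alternative for Query 3, an output stream with updates, might look like the following sketch. The function name `connection_time_updates` and the tuple layout `(call_id, caller, time, event)` are assumptions made for illustration. Each output tuple `(caller, total)` supersedes the previous total reported for that caller.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, float, str]  # (call_id, caller, time, event)


def connection_time_updates(stream: Iterable[Event]) -> Iterator[Tuple[str, float]]:
    """Yield an updated (caller, total connection time) whenever a call ends."""
    open_calls: Dict[int, Tuple[str, float]] = {}      # call_id -> (caller, start time)
    totals: Dict[str, float] = defaultdict(float)      # caller -> running sum
    for call_id, caller, time, event in stream:
        if event == "start":
            open_calls[call_id] = (caller, time)
        elif event == "end" and call_id in open_calls:
            start_caller, start_time = open_calls.pop(call_id)
            totals[start_caller] += time - start_time
            # This is an update, not an append: it replaces the earlier total.
            yield start_caller, totals[start_caller]
```

A consumer that only wants the current value on demand could instead read the running totals directly and ignore the update stream.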

16 Conclusions
- Conventional DBMS technology is inadequate
- We need to reconsider all aspects of data management and processing in the presence of data streams

17 DBMS versus DSMS
- DBMS: persistent relations. DSMS: transient streams (and persistent relations).

18 DBMS versus DSMS
- DBMS: persistent relations. DSMS: transient streams (and persistent relations).
- DBMS: one-time queries. DSMS: continuous queries.

19 DBMS versus DSMS
- DBMS: persistent relations. DSMS: transient streams (and persistent relations).
- DBMS: one-time queries. DSMS: continuous queries.
- DBMS: random access. DSMS: sequential access.

20 DBMS versus DSMS
- DBMS: persistent relations. DSMS: transient streams (and persistent relations).
- DBMS: one-time queries. DSMS: continuous queries.
- DBMS: random access. DSMS: sequential access.
- DBMS: access plan determined by query processor and physical DB design. DSMS: unpredictable data arrival and characteristics.

21 DBMS versus DSMS
- DBMS: persistent relations. DSMS: transient streams (and persistent relations).
- DBMS: one-time queries. DSMS: continuous queries.
- DBMS: random access. DSMS: sequential access.
- DBMS: access plan determined by query processor and physical DB design. DSMS: unpredictable data arrival and characteristics.
- DBMS: "unbounded" disk store. DSMS: bounded main memory.

22 Related work
Tapestry system
- Content-based filtering of messages
- Restricted subset of SQL; append-only query results
Chronicle data model
- Append-only ordered sequences of tuples
- Restricted view-definition language
- Does not store any chronicles
Alert system
- Event-condition-action triggers in a conventional SQL DB
- Continuous queries over append-only "active tables"

23 Related work
Materialized Views
- Materialized views are queries which need to be reevaluated whenever the database changes.
- Materialized views vs. continuous queries. Continuous queries:
    - May stream rather than store the result
    - May deal with append-only relations
    - May provide approximate answers
    - Processing strategy may adapt to the characteristics of the data stream

24 Architecture for continuous queries
A single stream of tuples D, a single continuous query Q, and the answer A to the query. Q is issued once and operates continuously.
[Diagram: Data Stream, Continuous Query Q, Answer A?]

25 Architecture for continuous queries
We consider data streams that adhere to the relational model (i.e., streams of tuples), although many of the ideas and techniques are independent of the data model being considered.
[Diagram: Data Stream, Continuous Query Q, Answer A?]

26 Architecture for continuous queries
Scenario 1 (simplest): data stream D is append-only, with no updates or deletions. How to handle Q?
1) Always store the current answer A to Q.
   D is of unbounded size => A may be too.
2) Do not store A, but make new tuples in A available as another continuous stream.
   No need for unbounded storage for A, but unbounded storage may be needed to determine the new tuples in A.

27 Architecture for continuous queries
Scenario 2: the input stream is append-only, but may cause updates and deletions in answer A.
=> May need to update/delete tuples in the output data stream.
Scenario 3 (most general): input stream D includes updates and deletions.
=> Much of the stream's data must be stored to determine the answer.

28 Architecture for continuous queries
How to solve?
1) Restrict the expressiveness of Q.
2) Impose constraints on the data stream to guarantee that the answer to Q, and the amount of data needed to compute Q, are bounded.
3) Provide an approximate answer.

29 Architecture for processing continuous queries
[Diagram: Stream 1, Stream 2, ..., Stream N feed the Stream Query Processor, which is connected to Stream, Store, Scratch and Throw]

30 Architecture for continuous queries
- STREAM is a data stream containing the tuples appended to A. It is an append-only stream (it should not include updates/deletions).
- STREAM and STORE together define the current answer A.

31 Architecture for continuous queries
When query Q is notified of a new tuple t in a relevant data stream, it can perform a number of actions, which are not mutually exclusive:
1) t causes new tuples in A. If a new tuple a will remain in A forever, send a to STREAM.
2) If a should be in A, but may be removed at some moment, add a to STORE.
[Diagram: Stream Query Processor with Stream, Store, Scratch and Throw]

32 Architecture for continuous queries
When query Q is notified of a new tuple t in a relevant data stream, it can perform a number of actions, which are not mutually exclusive:
3) t may cause an update or deletion of answer tuples in STORE. Answer tuples may be moved from STORE to STREAM.
4) We may need to save t or derived data to ensure that the query result can be computed in the future: send t to SCRATCH.
[Diagram: Stream Query Processor with Stream, Store, Scratch and Throw]

33 Architecture for continuous queries
When query Q is notified of a new tuple t in a relevant data stream, it can perform a number of actions, which are not mutually exclusive:
5) t is not needed and will not be needed: send it to THROW (unless we would like to archive it).
6) As a result of t, we may move data from STORE or SCRATCH to THROW.
[Diagram: Stream Query Processor with Stream, Store, Scratch and Throw]
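
The six actions on slides 31 to 33 can be made concrete with a small sketch. Everything here is hypothetical: the class `FourAreaProcessor`, the tuple layout, and especially the query, which was chosen only because it exercises all four areas: "calls that are in progress or that lasted longer than 2 minutes". An in-progress call is a tentative answer (STORE); once it ends it either becomes a permanent answer (STREAM) or is deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Event = Tuple[int, str, float, str]  # (call_id, caller, time, event)


@dataclass
class FourAreaProcessor:
    """Toy processor for one hypothetical query over the Outgoing stream."""
    stream: List[Tuple[int, str]] = field(default_factory=list)  # STREAM: permanent answers
    store: Dict[int, str] = field(default_factory=dict)          # STORE: tentative answers
    scratch: Dict[int, Event] = field(default_factory=dict)      # SCRATCH: saved start tuples
    throw: List[Event] = field(default_factory=list)             # THROW: discarded tuples
    threshold: float = 2.0                                       # minutes

    def on_tuple(self, t: Event) -> None:
        call_id, caller, time, event = t
        if event == "start":
            self.store[call_id] = caller   # action 2: answer that may be removed later
            self.scratch[call_id] = t      # action 4: keep t to compute the duration
            return
        if event == "end" and call_id in self.scratch:
            start_tuple = self.scratch.pop(call_id)
            self.throw.append(start_tuple)          # action 6: SCRATCH -> THROW
            answer_caller = self.store.pop(call_id) # action 3: answer leaves STORE
            if time - start_tuple[2] > self.threshold:
                # Actions 1 and 3: the answer is now permanent, STORE -> STREAM.
                self.stream.append((call_id, answer_caller))
            # Otherwise the tentative answer is simply deleted.
        self.throw.append(t)                        # action 5: t itself is not needed
```

At any moment the current answer A is the union of `stream` and `store`, which mirrors the statement that STREAM and STORE together define A.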

34 Architecture for continuous queries
Scenario 1: data stream D is append-only, with no updates or deletions. Always store the current answer A to Q.
- STREAM is empty
- STORE always contains A
- SCRATCH contains whatever is needed to keep the answer in STORE up to date