NiagaraCQ : A Scalable Continuous Query System for Internet Databases Jianjun Chen et al Computer Sciences Dept. University of Wisconsin-Madison SIGMOD.

Slides:



Advertisements
Similar presentations
Implementation of Relational Operations (Part 2) R&G - Chapters 12 and 14.
Advertisements

CS 540 Database Management Systems
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
CMU SCS /615Faloutsos/Pavlo1 Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture #13: Query.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Greedy Algo. for Selecting a Join Order The "greediness" is based on the idea that we want to keep the intermediate relations as small as possible at each.
Data Stream Management Yanlei Diao University of Massachusetts Amherst Some slide content courtesy of Michael Franklin and Jianjun Chen.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
NiagaraCQ A Scalable Continuous Query System for Internet Databases.
1 NiagaraCQ: A Scalable Continuous Query System for Internet Databases CS561 Presentation Xiaoning Wang.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Sorting and Query Processing Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 29, 2005.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison.
NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences.
Event-Condition-Action Rule Languages over Semistructured Data George Papamarkos.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Database Management 9. course. Execution of queries.
Data Streams and Continuous Query Systems
Chapter 13 Query Processing Melissa Jamili CS 157B November 11, 2004.
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapter 13: Query Processing.
Retrospective computation makes past states available inline with current state in a live system What is the language for retrospective computation? What.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Hash-Based Indexes Chapter 11 Modified by Donghui Zhang Jan 30, 2006.
CS6321 Query Optimization Over Web Services Utkarsh Kamesh Jennifer Rajeev Shrivastava Munagala Wisdom Motwani Presented By Ajay Kumar Sarda.
16.7 Completing the Physical- Query-Plan By Aniket Mulye CS257 Prof: Dr. T. Y. Lin.
Lecture 24 Query Execution Monday, November 28, 2005.
CS4432: Database Systems II Query Processing- Part 2.
Radix Sort and Hash-Join for Vector Computers Ripal Nathuji 6.893: Advanced VLSI Computer Architecture 10/12/00.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty
Query Processing CS 405G Introduction to Database Systems.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Chapter 12 Query Processing (2) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,
CS 540 Database Management Systems
Chapter 13: Query Processing
1 CS122A: Introduction to Data Management Lecture #15: Physical DB Design Instructor: Chen Li.
Chiu Luk CS257 Database Systems Principles Spring 2009
CS 540 Database Management Systems
CS 440 Database Management Systems
Parallel Databases.
Database Management System
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
NiagaraCQ : A Scalable Continuous Query System for Internet Databases
CPSC-608 Database Systems
Chapter 12: Query Processing
Chapter 15 QUERY EXECUTION.
Database Management Systems (CS 564)
Evaluation of Relational Operations: Other Operations
Join Processing in Database Systems with Large Main Memories (part 2)
Database Applications (15-415) DBMS Internals- Part VII Lecture 19, March 27, 2018 Mohammad Hammoud.
Hash-Based Indexes Chapter 10
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
Evaluation of Relational Operations: Other Techniques
CPSC-608 Database Systems
Yan Huang - CSCI5330 Database Implementation – Query Processing
Chapter 11 Instructor: Xin Zhang
Query Processing.
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

NiagaraCQ : A Scalable Continuous Query System for Internet Databases Jianjun Chen et al Computer Sciences Dept. University of Wisconsin-Madison SIGMOD 2000 Talk by S. Sudarshan

Continuous Queries Example Inform me when there is a new publication related to multi-query optimization A broad classification  Change based  Timer based

NiagaraCQ A CQ system for the Internet Continuous Queries on XML data sets Scalable CQ processing Incremental group optimization Handles both change based and timer based queries in a uniform way

NiagaraCQ command language Creating a CQ Create CQ_name XML-QL query Do action { START start_time} { EVERY time_interval} { EXPIRE expiration_time} Delete CQ_name

Expression Signature Query examples Where INTC element_as $g in “ construct $g Where MSFT element_as $g in “ construct $g Expression signatures Quotes.Quote.Symbol in quotes.xml constant =

Query plans Trigger Action ITrigger Action J File Scan Select Symbol = “MSFT” Select Symbol = “INTC” File Scan quotes.xml

Group Group Signature  Common signature of all queries in the group Group constant table Constant_value Dest_buffer INTC Dest. I MSFT Dest. J

The group plan

Incremental Grouping Algo When a new query is submitted If the expression signature of the new query matches that of existing groups Break the query plan into two parts Remove the lower part Add the upper part onto the group plan else create a new group

Query split with materialized intermediate files Why not use a pipeline scheme ?  Split operator may block simple queries  Gives a single complicated execution plan  A large portion of query plan may not need to be executed at each invocation  Does not work for grouping timer based queries Using intermediate files  Cut query plan into 2 parts at split operator  Add a file scan operator to upper part to read intermediate file  Intermediate files are monitored just like other data sources

The query split scheme

Trade-offs Other advantages of materialized intermediate files  Only the necessary queries are executed  Uniform handling of intermediate files and original data source files Disadvantages  Split operator becomes a blocking operator  Extra disk I/Os

Range Predicates E.g. R.a < val or val1 < R.a < val2 Multiple such ranges Problem  Intermediate files may contain duplicate tuples Idea: Virtual intermediate files Use an index to implement this

Incremental grouping of selection predicates Multiple selection predicates in a query CNF for predicates on same data source Incremental grouping  Choose the most selective conjunct and implement virtual file on this conjunct Evaluation of other predicates  Upper levels of continuous query Example query Where ”INTC” $p element_as $g in “quotes.xml”, $p < 100 Construct $g

Incremental grouping of join operators A join query Quotes.Quote.Change_Ratio constant in “quotes.xml” Where $s element_as $g in “quotes.xml”, $s element_as $t in “companies.xml” construct $g, $t

Queries that contain both join and selection Example query : Where $s ”Computer Service” element_as $g in “quotes.xml”, $s element_as $t in “companies.xml” construct $g, $t Where to place the selection operator ?  Below the join Eliminates irrelevant tuples  Above the join Allows sharing Pick based on cost model

Grouping timer-based queries Challenge  Sharing common computation Event List  Stores time events sorted in time order

Incremental evaluation Invoke queries only on changed data For each source file, NiagaraCQ keeps a delta file Also for the intermediate files Time stamp store the each tuple Incremental evaluation of join operators requires complete data files

Memory Caching Thousands of continuous queries can’t fit in memory What should we cache ?  Grouped query plans What about non-grouped queries ?  Favor small delta files  Front part of the event list

System Architecture

CQ processing

Experimental Results Example query : Where ”INTC” element_as $g in “quotes.xml”, construct $g N = number of installed queries F= number of fired queries C = number of tuples modified

Performance Results Case 1: F=N, C=1000 Case 2: F=100, C=1000

Performance Results F=N=2000, vary data size

Thank You

Outline General strategy of incremental group optimization Query split with materialized intermediate files Incremental grouping of selection and join operators System architecture Experimental results

Incremental group optimization General Strategy Why can’t we regroup all queries when a new query is added ? Use of expression signatures for grouping  Same syntax structure  Different constant values