
Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt, Feng Tian, Yuan Wang

NiagaraCQ: A Scalable Continuous Query System for Internet Databases
Problem
  –Problem statement
  –Why is this problem important?
  –Why is this problem hard?
Approaches
  –Approach description, key concepts
  –Contributions (novelty, improved)
  –Assumptions

Why is this problem important?
Transform the Internet environment
  –From passive pages, e.g. Google search
      Pull by users
  –To active contents, e.g. triggers
      Push new events to users
Example: Notify me whenever the price of Dell or Micron stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months.
Continuous queries
  –Deliver large amounts of frequently changing information.
  –Question: Is this query change-based or timer-based?

Problem Statement
Given
  –Frequently changing information
  –A large group of continuous queries
Find
  –Method to answer all queries
Objectives
  –Scalability to a very large set of continuous queries
Constraints
  –Many queries are similar (especially on the Internet).
  –Information required by the continuous queries and intermediate results do not fit in memory.
  –XML data set, XML-QL query language
  –Queries may be added/removed asynchronously

Why is this problem hard?
Scale of the Internet
  –Support millions of continuous queries
  –Potentially large number of web users
Complex queries
  –Support a large number of triggers,
  –expressed as complex queries,
  –against web-resident data sets.
Example: Notify me whenever the price of Dell or Micron stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months.

Novelty of Contribution
Related work
  –Simple approach: optimize queries independently
  –Grouping approach: group similar queries
Limitations of previous work
  –Focused on an optimal plan for a small number of similar queries.
  –Too expensive for a large number of continuous queries.
  –Not designed for the web.
Contributions
  –A novel grouping approach for scalability
      Incremental group optimization strategy with dynamic re-grouping
      A new query usually does not require re-grouping
  –Query-split scheme requires minimal changes to a query engine.
  –Supports change-based and timer-based queries in a uniform way.

Key Ideas - Expression Signature
Expression signature
  –Mechanism to identify queries sharing monitored data
  –Same syntax structure, but different constant values across queries
Example - Two queries sharing events on stock quotes
  –Replaces the constants in the predicates with a placeholder
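The placeholder idea can be sketched in a few lines. This is only an illustration under stated assumptions, not NiagaraCQ's implementation: predicates are treated as plain strings, the function name `expression_signature` is hypothetical, and the stock symbols are made-up sample values.

```python
import re

# Assumption: a "constant" is a double-quoted string or a numeric literal.
_CONSTANT = re.compile(r'"[^"]*"|\b\d+(?:\.\d+)?\b')


def expression_signature(predicate):
    """Return (signature, constants).

    The signature is the predicate with every constant replaced by '?';
    the constants are returned in the order they appear.
    """
    constants = _CONSTANT.findall(predicate)
    signature = _CONSTANT.sub("?", predicate)
    return signature, constants


# Two hypothetical queries that differ only in the constant they test.
sig1, consts1 = expression_signature('Symbol = "INTC"')
sig2, consts2 = expression_signature('Symbol = "MSFT"')
print(sig1 == sig2, consts1, consts2)
```

Both predicates reduce to the same signature, which is what lets them share one group.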

Key Ideas - Query Groups
Basic ideas
  –Group queries by expression signatures
  –Query group signature = union of the signatures of all queries in a group
  –Group constant table
      Keeps signature constants for the group
  –Group execution plans
      Shared by all queries in a group
Split operation in group plans
  –Distributes result tuples to destinations
  –Uses the destination buffer name in the tuple of the constant table (Fig. 3.4).
  –Pros: Reduces the number of output buffers
  –Cons: Split may become a bottleneck
      High variance of update rate in a group
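A minimal sketch of a group constant table and the split step, assuming tuples are Python dicts and the constant table is a list of (constant, destination buffer name) rows. The class `QueryGroup`, its methods, and the buffer names are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class QueryGroup:
    """One group: a shared signature plus its group constant table."""

    def __init__(self, signature):
        self.signature = signature
        # Constant table rows: (constant value, destination buffer name).
        self.constant_table = []

    def add_query(self, constant, destination):
        self.constant_table.append((constant, destination))

    def split(self, tuples, attribute):
        """Join incoming tuples with the constant table on `attribute`
        and route each match to the destination buffer named in the row."""
        by_constant = defaultdict(list)
        for constant, destination in self.constant_table:
            by_constant[constant].append(destination)
        buffers = defaultdict(list)
        for t in tuples:
            for destination in by_constant.get(t[attribute], ()):
                buffers[destination].append(t)
        return dict(buffers)


group = QueryGroup("Symbol = ?")
group.add_query("INTC", "buf_q1")
group.add_query("MSFT", "buf_q2")
updates = [{"Symbol": "INTC", "Price": 30}, {"Symbol": "DELL", "Price": 20}]
routed = group.split(updates, "Symbol")
```

The shared plan runs once per batch of updates; only the final routing step depends on how many queries are in the group, which is also why split can become the bottleneck.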

Key Ideas – Incremental Group Assignment
Incremental group optimization
  –Assign group(s) for a new query
  –Match the query signature to group signatures in a bottom-up fashion.
  –Ex. Consider the query in Fig. 3.6
      Its plan is in Fig. 3.7
      Add lower part to group plan
      Add upper part to group constant table
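The assignment step can be approximated as a lookup by signature. The sketch below is deliberately simplified: each query is assumed to be a single selection predicate given as a string, so the bottom-up matching over a multi-operator plan is not modeled. `GroupRegistry` and `signature_of` are hypothetical names.

```python
import re

# Assumption: constants are double-quoted strings or numeric literals.
_CONST = re.compile(r'"[^"]*"|\b\d+(?:\.\d+)?\b')


def signature_of(predicate):
    """Return (signature, constants) for a single predicate string."""
    return _CONST.sub("?", predicate), _CONST.findall(predicate)


class GroupRegistry:
    """Maps each signature to the constant-table rows of its group."""

    def __init__(self):
        self.groups = {}  # signature -> list of (constants, destination)

    def register(self, predicate, destination):
        """Add a query without touching existing groups.

        Returns (signature, created), where `created` is True only when
        no existing group matched and a new group had to be started.
        """
        signature, constants = signature_of(predicate)
        created = signature not in self.groups
        self.groups.setdefault(signature, []).append(
            (tuple(constants), destination)
        )
        return signature, created


registry = GroupRegistry()
registry.register('Symbol = "INTC"', "buf_q1")   # starts a new group
registry.register('Symbol = "MSFT"', "buf_q2")   # joins the same group
registry.register('Price > 100', "buf_q3")       # different signature
```

Existing groups are only appended to, never re-optimized, which is the sense in which the assignment is incremental.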

Key Ideas - Decomposition, Materialized Intermediate Files
Limitation of the split operation
  –Potential bottleneck
  –If update rates vary across queries
Solution
  –Write outputs to intermediate files
  –Add a file scan operator to the upper query
  –Decompose into several sub-queries
  –Challenge
      Impact on the query engine
      To monitor sub-query inputs
Question: How many queries at most can result, as a function of
  –Number of query groups (G)
  –Number of original user queries (U)
Removing the split bottleneck using intermediate files
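The decomposition can be sketched as two halves that communicate only through files. This is an illustration under assumptions that are not taken from the paper: one JSON-lines file per destination buffer, and the hypothetical helper names `write_intermediate` and `file_scan`.

```python
import json
import tempfile
from pathlib import Path


def write_intermediate(dir_path, routed):
    """Lower (group) query: append each destination's tuples to its own file.

    `routed` maps a destination buffer name to a list of dict tuples.
    Returns a mapping from destination name to the file written.
    """
    paths = {}
    for destination, tuples in routed.items():
        path = Path(dir_path) / f"{destination}.jsonl"
        with path.open("a", encoding="utf-8") as f:
            for t in tuples:
                f.write(json.dumps(t) + "\n")
        paths[destination] = path
    return paths


def file_scan(path):
    """Upper query's leaf operator: read tuples back from an intermediate file."""
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


with tempfile.TemporaryDirectory() as tmp:
    files = write_intermediate(tmp, {"buf_q1": [{"Symbol": "INTC", "Price": 30}]})
    upper_input = list(file_scan(files["buf_q1"]))
```

Because the upper query reads an ordinary file, it can be scheduled on its own, for example only when that file changes.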

Support for Change-based & Timer-based Queries
Change-based queries are fired
  –as soon as new data becomes available.
Timer-based queries
  –are executed only periodically.
  –This reduces computation, which makes the system more scalable.
Timer-based queries pose two challenges:
  –It is hard to monitor the timer events of queries.
  –Sharing the common computation becomes difficult due to various time intervals.
NiagaraCQ handles both types of queries uniformly.
Implementing destination buffers
  –Pipeline or materialization
  –Question: Which is better for timer-based queries?
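One common way to monitor many timers cheaply is a priority queue ordered by next firing time. The sketch below shows that general technique, not NiagaraCQ's event detector; `TimerQueue`, its methods, and the query ids are hypothetical, and time is a plain number supplied by the caller.

```python
import heapq


class TimerQueue:
    """Keeps timer-based queries ordered by their next firing time."""

    def __init__(self):
        self._heap = []  # entries: (fire_time, query_id, interval)

    def add(self, query_id, interval, now=0):
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._heap, (now + interval, query_id, interval))

    def due(self, now):
        """Return the ids of all queries whose timer expired by `now`,
        and reschedule each one `interval` time units after `now`."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, query_id, interval = heapq.heappop(self._heap)
            fired.append((query_id, interval))
        for query_id, interval in fired:
            heapq.heappush(self._heap, (now + interval, query_id, interval))
        return [query_id for query_id, _ in fired]


timers = TimerQueue()
timers.add("daily_report", interval=24)
timers.add("hourly_check", interval=1)
ready = timers.due(now=1)
```

Only the head of the heap has to be inspected on each tick, so the cost of checking does not grow with the number of queries that are not yet due.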

Other Techniques
General selection predicates
  –(e.g. range queries) may create intermediate files
  –containing numerous duplicate tuples
  –Solution: a 'virtual intermediate file' stores a value range.
Memory caching is required
  –to handle intermediate files that do not fit in memory.
Example of range query and its expression signature
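The duplicate problem and the range-based fix can be illustrated as follows. This is a sketch under assumptions: the shared store is an in-memory list kept sorted on one numeric attribute, and the class names `SharedIntermediateFile` and `VirtualIntermediateFile` are hypothetical.

```python
import bisect


class SharedIntermediateFile:
    """Single physical store, kept sorted on one attribute."""

    def __init__(self, key):
        self.key = key
        self._keys = []
        self._rows = []

    def insert(self, row):
        i = bisect.bisect_right(self._keys, row[self.key])
        self._keys.insert(i, row[self.key])
        self._rows.insert(i, row)

    def scan(self, low, high):
        """Return rows with low < row[key] <= high."""
        start = bisect.bisect_right(self._keys, low)
        stop = bisect.bisect_right(self._keys, high)
        return self._rows[start:stop]


class VirtualIntermediateFile:
    """Stores only a value range; rows are read from the shared store."""

    def __init__(self, shared, low, high):
        self.shared, self.low, self.high = shared, low, high

    def scan(self):
        return self.shared.scan(self.low, self.high)


shared = SharedIntermediateFile("Price")
for price in (40, 80, 120):
    shared.insert({"Price": price})

over_100 = VirtualIntermediateFile(shared, 100, float("inf"))  # Price > 100
over_50 = VirtualIntermediateFile(shared, 50, float("inf"))    # Price > 50
```

The tuple with price 120 satisfies both range predicates but is stored once; materializing a separate file per query would have stored it twice.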

Prototyping NiagaraCQ System Architecture Validation Methodology - 1

Validation Methodology - 2 Experimental Evaluation

Summary
Paper's focus
  –A scalable continuous query system
Ideas
  –Incremental group optimization
  –Query split for easy implementation
  –Support for change-based and timer-based queries
Contributions
  –Achieve scalability / easy implementation / grouping timer-based queries.
  –Allow a very large number of users to register continuous queries in a high-level query language
Analytical Validation
  –Experiments

Assumptions, Rewrite today
Assumptions
  –Many queries tend to be similar in a web environment.
  –Information and intermediate results may not fit in memory.
Rewrite today
  –Include the results of dynamic re-grouping for system deterioration.
  –More extensive experiments on optimization efficiency.