Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,


1 Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt, Feng Tian, Yuan Wang

2 NiagaraCQ: A Scalable Continuous Query System for Internet Databases
Problem
–Problem statement
–Why is this problem important?
–Why is this problem hard?
Approaches
–Approach description, key concepts
–Contributions (novelty, improvements)
–Assumptions

3 Why is this problem important?
Transform the Internet environment
–From passive pages, e.g. Google search: pull by users
–To active contents, e.g. triggers: push new events to users
Example: Notify me whenever the price of Dell or Micron stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months.
Continuous queries
–Deliver large amounts of frequently changing information.
–Question: Is this query change-based or timer-based?

4 Problem Statement
Given
–Frequently changing information
–A large group of continuous queries
Find
–A method to answer all queries
Objectives
–Scalability to a very large set of continuous queries
Constraints
–Many queries are similar (especially on the Internet).
–Information required by the continuous queries and intermediate results do not fit in memory.
–XML dataset, XML-QL query language
–Queries may be added/removed asynchronously

5 Why is this problem hard?
Scale of the Internet
–Support millions of continuous queries
–Potentially large number of web users
Complex queries
–Support a large number of triggers,
–expressed as complex queries,
–against web-resident data sets.
Example: Notify me whenever the price of Dell or Micron stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months.

6 Novelty of Contribution
Related work
–Simple approach: optimize queries independently
–Grouping approach: group similar queries
Limitations of previous work
–Focused on the optimal plan for a small # of similar queries.
–Too expensive for a large # of continuous queries.
–Not designed for the web.
Contributions
–A novel grouping approach for scalability: an incremental group optimization strategy w/ dynamic re-grouping. A new query usually does not require re-grouping.
–Query-split scheme requires minimal changes to a query engine.
–Supports change-based & timer-based queries in a uniform way.

7 Key Ideas - Expression Signature
Expression signature
–Mechanism to identify queries sharing monitored data
–Same syntax structure, but different constant values across queries
–Replaces the constants in the predicates with a placeholder
Example: two queries sharing events on stock quotes
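
The placeholder idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the flat `attribute op constant` predicate format, the function name `expression_signature`, and the ticker constants are all hypothetical, and NiagaraCQ itself operates on XML-QL query plans rather than strings.

```python
import re

# Hypothetical, simplified predicate format: "<attribute> <op> <constant>".
_PREDICATE = re.compile(r'^\s*(\w+)\s*(=|<=|>=|<|>)\s*(.+?)\s*$')

def expression_signature(predicate):
    """Split a predicate into (signature, constant).

    The signature keeps the attribute and operator but replaces the
    constant with the placeholder "?".
    """
    match = _PREDICATE.match(predicate)
    if match is None:
        raise ValueError(f"unsupported predicate: {predicate!r}")
    attribute, operator, constant = match.groups()
    return f"{attribute} {operator} ?", constant.strip('"')

# Two queries that differ only in the constant share one signature.
sig_a, const_a = expression_signature('Symbol = "INTC"')
sig_b, const_b = expression_signature('Symbol = "MSFT"')
print(sig_a == sig_b, const_a, const_b)
```

Queries whose predicates reduce to the same signature string are candidates for the same group; only their constants need to be stored separately.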

8 Key Ideas - Query Groups
Basic ideas
–Group queries by expression signatures
–Query group signature = union of the signatures of all queries in a group
–Group constant table: keeps the signature constants for the group
–Group execution plan: shared by all queries in a group
Split operation in group plans
–Distributes result tuples to destinations
–Uses the destination buffer name in the tuple of the constant table (Fig. 3.4).
–Pros: reduces the number of output buffers
–Cons: split may become a bottleneck when the update rate varies widely within a group
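
A minimal sketch of a group constant table and a split step, assuming tuples are plain dictionaries and the group's signature is an equality predicate on a single attribute. The class name `QueryGroup`, the buffer names, and the sample data are hypothetical and not taken from the paper.

```python
from collections import defaultdict

class QueryGroup:
    """One group: a shared signature plus a constant table (illustrative only)."""

    def __init__(self, attribute):
        self.attribute = attribute
        # Constant table: constant value -> destination buffer names.
        self.constant_table = defaultdict(list)

    def add_query(self, constant, destination):
        self.constant_table[constant].append(destination)

    def split(self, tuples):
        """Match each tuple against the constant table and route it
        to the destination buffer(s) named there."""
        buffers = defaultdict(list)
        for row in tuples:
            for destination in self.constant_table.get(row[self.attribute], []):
                buffers[destination].append(row)
        return dict(buffers)

group = QueryGroup("Symbol")
group.add_query("INTC", "dest_1")
group.add_query("MSFT", "dest_2")
routed = group.split([
    {"Symbol": "INTC", "Price": 30},
    {"Symbol": "DELL", "Price": 50},  # no registered query, so it is dropped
])
print(routed)
```

Because every result tuple passes through the single `split` call, the sketch also shows why split can become a bottleneck when one destination is updated far more often than the others.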

9 Key Ideas – Incremental Group Assignment
Incremental group optimization
–Assign group(s) for a new query
–Match the query signature to group signatures in a bottom-up fashion.
–Ex. Consider the query in Fig. 3.6; its plan is in Fig. 3.7
–Add lower part to group plan
–Add upper part to group constant table
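
The assignment step can be sketched as a lookup keyed by signature. This is a deliberately reduced illustration: it handles one selection signature per query and does not model the bottom-up traversal of a full query plan. The function name `assign_to_group` and the dictionary layout are assumptions made for the example.

```python
def assign_to_group(groups, signature, constant, destination):
    """Register a new query incrementally.

    groups maps signature -> constant table, where a constant table maps
    constant -> list of destination buffer names.
    Returns True if an existing group was reused, False if a new group
    had to be created. Existing groups are never re-optimized here.
    """
    reused = signature in groups
    constant_table = groups.setdefault(signature, {})
    constant_table.setdefault(constant, []).append(destination)
    return reused

groups = {}
first = assign_to_group(groups, "Symbol = ?", "INTC", "dest_1")   # creates the group
second = assign_to_group(groups, "Symbol = ?", "MSFT", "dest_2")  # reuses it
print(first, second)
```

Adding the second query only inserts one row into an existing constant table, which is the sense in which a new query usually does not require re-grouping.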

10 Key Ideas - Decomposition, Materialized Intermediate Files
Limitation of the split operation
–Potential bottleneck
–If update rates vary across queries
Solution
–Write outputs to intermediate files
–Add a file scan operator to the upper query
–Decompose into several sub-queries
–Challenge: impact on the query engine, which must monitor the sub-queries' inputs
Q: How many queries at most can result, as a function of
–Number of query groups (G)
–Number of original user queries (U)
Figure: Removing the split bottleneck using intermediate files
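
The decomposition can be sketched as two independent steps that communicate only through a file. The JSON-lines file layout, the `dest` column, and both function names are assumptions chosen for illustration, not the storage format used by NiagaraCQ.

```python
import json
import os
import tempfile

def run_lower_query(tuples, constant_table, attribute, path):
    """Lower sub-query: append each matching tuple, tagged with its
    destination name, to the intermediate file."""
    with open(path, "a", encoding="utf-8") as handle:
        for row in tuples:
            for destination in constant_table.get(row[attribute], []):
                handle.write(json.dumps({"dest": destination, "row": row}) + "\n")

def scan_for(path, destination):
    """File scan for one upper sub-query: read back only its own tuples."""
    with open(path, encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle]
    return [record["row"] for record in records if record["dest"] == destination]

fd, path = tempfile.mkstemp(suffix=".jsonl")
os.close(fd)
try:
    run_lower_query(
        [{"Symbol": "INTC", "Price": 30}, {"Symbol": "MSFT", "Price": 90}],
        {"INTC": ["dest_1"], "MSFT": ["dest_2"]},
        "Symbol",
        path,
    )
    intc_rows = scan_for(path, "dest_1")
    msft_rows = scan_for(path, "dest_2")
finally:
    os.remove(path)
```

Each upper sub-query can now run on its own schedule, since it depends only on the file and not on a shared in-memory operator.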

11 Support for Change-based & Timer-based Queries
Change-based queries
–Fired as soon as new data becomes available.
Timer-based queries
–Executed only periodically.
–Reduces computation, making the system more scalable
Timer-based queries pose two challenges:
–Hard to monitor the timer events of queries.
–Sharing the common computation becomes difficult due to various time intervals.
NiagaraCQ handles both types of queries uniformly.
Implementing destination buffers
–Pipeline or materialization
–Q: Which is better for timer-based queries?
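
One way to picture uniform handling is a single event detector that receives both data-change and timer events. The sketch below is an assumption-laden toy: the class `EventDetector`, the integer clock, and the rule that a timer-based query fires at its due time only if its source changed since the last run are illustrative choices, not a description of the NiagaraCQ implementation.

```python
class EventDetector:
    """Toy detector handling change-based and timer-based queries together."""

    def __init__(self):
        self.queries = {}

    def register(self, name, source, interval=None, now=0):
        # interval=None marks a change-based query.
        self.queries[name] = {
            "source": source,
            "interval": interval,
            "next_due": None if interval is None else now + interval,
            "dirty": False,
        }

    def on_change(self, source):
        """Data-change event: fire change-based queries immediately and
        remember the change for timer-based ones."""
        fired = []
        for name, query in self.queries.items():
            if query["source"] != source:
                continue
            if query["interval"] is None:
                fired.append(name)
            else:
                query["dirty"] = True
        return fired

    def on_tick(self, now):
        """Timer event: fire due timer-based queries whose input changed."""
        fired = []
        for name, query in self.queries.items():
            if query["interval"] is None or now < query["next_due"]:
                continue
            query["next_due"] = now + query["interval"]
            if query["dirty"]:
                fired.append(name)
                query["dirty"] = False
        return fired

detector = EventDetector()
detector.register("q_change", "quotes.xml")
detector.register("q_timer", "quotes.xml", interval=10)
fired_on_change = detector.on_change("quotes.xml")
fired_early = detector.on_tick(5)
fired_due = detector.on_tick(10)
fired_idle = detector.on_tick(20)
```

In this toy, the timer-based query accumulates changes between ticks, which hints at why materialized destination buffers are the natural fit for it.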

12 Other Techniques
General selection predicates (e.g. range queries)
–May create intermediate files containing numerous duplicate tuples
–Solution: each ‘virtual intermediate file’ stores a value range.
Memory caching is required
–to handle intermediate files that do not fit in memory.
Figure: Example of a range query and its expression signature
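
A virtual intermediate file can be pictured as a stored value range that is resolved against one shared, sorted set of tuples at scan time, so overlapping range queries do not each keep their own copy. The class name, the inclusive bounds, and the use of `bisect` over an in-memory sorted list are assumptions for this sketch.

```python
from bisect import bisect_left, bisect_right

class VirtualIntermediateFile:
    """Stores only an inclusive value range, not the matching tuples."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def scan(self, sorted_keys, rows):
        """Return the slice of the shared rows whose key lies in [low, high].

        sorted_keys must be ascending and aligned index-by-index with rows.
        """
        start = bisect_left(sorted_keys, self.low)
        stop = bisect_right(sorted_keys, self.high)
        return rows[start:stop]

# One shared, sorted result set serves several overlapping range queries.
rows = [{"Price": 10}, {"Price": 20}, {"Price": 30}, {"Price": 40}]
keys = [row["Price"] for row in rows]
mid_range = VirtualIntermediateFile(15, 30).scan(keys, rows)
high_range = VirtualIntermediateFile(30, 100).scan(keys, rows)
```

The tuple with price 30 satisfies both ranges but is stored once, which is the duplication the slide's "virtual" files are meant to avoid.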

13 Prototyping NiagaraCQ System Architecture Validation Methodology - 1

14 Validation Methodology - 2 Experimental Evaluation

15 Summary
Paper’s focus
–A scalable continuous query system
Ideas
–Incremental group optimization
–Query split for easy implementation
–Support for change-based & timer-based queries
Contributions
–Achieves scalability, easy implementation, and grouping of timer-based queries.
–Allows a very large # of users to register continuous queries in a high-level query language
Analytical validation
–Experiments

16 Assumptions, Rewrite today
Assumptions
–Many queries tend to be similar in a web environment.
–Information and intermediate results may not fit in memory.
Rewrite today
–Include the results of dynamic re-grouping for system deterioration.
–More extensive experiments on optimization efficiency.

