Adaptive Query Processing (Background)


Luping Ding, Brad Pielech
Advisor: Elke A. Rundensteiner
5/21/2019 DSRG TALK

Contents
- Motivation
- Issues to consider when building an adaptive query system
- Categories of adaptivity and related issues
- Related work
- Our initial ideas thus far (to be continued…)

Motivation
- New environment and applications
  - Internet and web-based query systems
  - Sample applications
    - Network monitoring systems
    - Financial applications: stock trading, …
- Characteristics
  - Distributed, heterogeneous, autonomous data sources
  - Unpredictable, variable data volume and transfer rate

Adaptive Query Processor
[Figure: a user query is posed against an XML view over data sources DS1, DS2, …, DSn and handled by the adaptive query processor; the plan shows operators labeled N, S, J, and T.]

Motivation II
- Requirements
  - Ability to process streaming data using non-blocking operators
  - Dynamic inter- and intra-operator scheduling to adapt to the data transfer rate
  - Sharing and re-use of sub-plans across multiple queries
  - Ability to output partial/approximate results according to user preferences (discussed later)

Traditional vs. Adaptive

Traditional                            Adaptive
Ready data                             Streaming data
One-time query                         May be a continuous query
Blocking operators                     Non-blocking operators
Query optimization before execution    Query optimization before and during execution
Exact answer                           Partial/approximate answer

Challenges and Possible Solutions
- The data arrive at a very high speed
  - Sample the data and compute an approximate answer
- Unpredictable changes in the data transfer rate, caused by sources drying up or network congestion
  - Interleave query execution and optimization, reworking the query plan to minimize execution downtime
- Blocking operators appear in the query plan, caused by GroupBy, OrderBy, and Join clauses
  - Implement non-blocking alternatives for blocking operators
- Unbounded or huge data streams need unbounded or huge intermediate storage
  - Compute an approximate answer
  - Switch between memory and disk
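A minimal Python sketch of one well-known non-blocking alternative to a blocking join, the symmetric hash join. The slides do not say which implementation the authors have in mind; the class name, key names, and sample rows are illustrative assumptions. Each arriving tuple is stored in its own side's hash table and immediately probed against the other side, so matches are emitted without waiting for either input to finish.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Equi-join that emits matches as soon as either input delivers a tuple."""

    def __init__(self, left_key, right_key):
        # Assumption: rows are dicts and the join is an equality on one key per side.
        self.keys = {"left": left_key, "right": right_key}
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, row):
        """Store `row` on its side, probe the other side, return new (left, right) pairs."""
        other = "right" if side == "left" else "left"
        key = row[self.keys[side]]
        self.tables[side][key].append(row)
        matches = self.tables[other].get(key, [])
        if side == "left":
            return [(row, m) for m in matches]
        return [(m, row) for m in matches]


join = SymmetricHashJoin("id", "book_id")
out = []
out += join.insert("left", {"id": 1, "title": "Data on the Web"})
out += join.insert("right", {"book_id": 1, "author": "Dan Suciu"})    # match emitted now
out += join.insert("right", {"book_id": 2, "author": "W. Stevens"})   # no partner yet
out += join.insert("left", {"id": 2, "title": "TCP/IP Illustrated"})  # match emitted now
```

Both hash tables grow with the streams, so this sketch runs straight into the unbounded-storage challenge listed above; a real operator would need to spill to disk or discard state.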

Contents
- Motivation
- Issues to consider when building an adaptive system
- Categories of adaptivity and related issues
- Related work

General Issues I
- Decide the granularity of stream data
  - Each token
  - Individual element
  - Decided by the XPath specified in the query

Example query:

    for $b in document("bib.xml")/bib/book
    return
      <result>
        { $b/title }
        { $b/author }
      </result>

Example data:

    <bib>
      <book year="1994">
        <title>TCP/IP Illustrated</title>
        <author>W. Stevens</author>
        <price> 65.95</price>
      </book>
      <book year="2000">
        <title>Data on the Web</title>
        <author>Serge Abiteboul</author>
        <author>Peter Buneman</author>
        <author>Dan Suciu</author>
        <price> 39.95</price>

General Issues II
- Give order-sensitive results
  - Assign a unique ID to each data unit (sequence number or timestamp)
  - Either each algebra node preserves the order of the data,
  - or no algebra node preserves order and the top node does the sorting
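A small Python sketch of the second option, where the top node restores order from sequence numbers. It is one possible design, not the authors' stated one. It assumes sequence numbers are contiguous integers starting at `first_seq` and that no data unit is dropped upstream; the class and method names are hypothetical. Items are released as soon as the next expected sequence number arrives, so the node does not block until end of stream.

```python
import heapq


class OrderRestorer:
    """Top-of-plan buffer that releases items in sequence-number order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next sequence number allowed to leave
        self.heap = []             # out-of-order arrivals, smallest sequence number first

    def push(self, seq, item):
        """Accept one (seq, item) pair and return every item that is now in order."""
        heapq.heappush(self.heap, (seq, item))
        ready = []
        while self.heap and self.heap[0][0] == self.next_seq:
            ready.append(heapq.heappop(self.heap)[1])
            self.next_seq += 1
        return ready


top = OrderRestorer()
released = [top.push(1, "b"), top.push(0, "a"), top.push(3, "d"), top.push(2, "c")]
```

If operators below can filter data units out, the gaps would have to be signaled somehow (for example with placeholder messages), otherwise the buffer waits forever for a sequence number that never arrives.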

General Issues III
- Generate approximate results
  - Answers to aggregate queries may change as new tuples arrive, so the results are approximate
- Generate partial results
  - New tuples do not change the validity of results already emitted
- Both require non-blocking operator implementations to provide the answer so far
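The distinction can be illustrated with a short Python sketch. The names are hypothetical and the prices are taken from the sample bibliography earlier in the talk. The running average is an approximate result, because its value can change with every new tuple. The selection is a partial result, because every row it has emitted stays valid no matter what arrives later.

```python
class RunningAverage:
    """Non-blocking aggregate: can report the answer so far at any time."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def push(self, value):
        self.count += 1
        self.total += value

    def so_far(self):
        # Approximate result: later tuples may change this value.
        return self.total / self.count if self.count else None


def cheap_books(stream, limit):
    """Non-blocking selection: each emitted row is final (a partial result)."""
    for book in stream:
        if book["price"] < limit:
            yield book


books = [
    {"title": "TCP/IP Illustrated", "price": 65.95},
    {"title": "Data on the Web", "price": 39.95},
]

avg = RunningAverage()
snapshots = []
for book in books:
    avg.push(book["price"])
    snapshots.append(avg.so_far())  # answer so far after each tuple

partial = list(cheap_books(books, 50))
```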

General Issues IV
[Figure: data streams drawn as runs of * tokens with interspersed P markers]
- Compute statistics
  - Data arrival rate
  - Selectivity of each operator
  - Execution cost of each operator
- Introduce control messages for synchronization
  - Within an algebra node
  - Along with the data stream
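A hedged Python sketch of how the three statistics might be tracked at run time. The class, its fields, and the definitions used (selectivity as output tuples over input tuples, cost as busy time per input tuple, arrival rate as arrivals per unit time over an observation window) are common conventions assumed here, not definitions given in the talk.

```python
class OperatorStats:
    """Running counters for one operator in the plan."""

    def __init__(self):
        self.tuples_in = 0
        self.tuples_out = 0
        self.busy_seconds = 0.0

    def record(self, n_in, n_out, seconds):
        """Account for one batch of work done by the operator."""
        self.tuples_in += n_in
        self.tuples_out += n_out
        self.busy_seconds += seconds

    @property
    def selectivity(self):
        # Fraction of input tuples that produce output.
        return self.tuples_out / self.tuples_in if self.tuples_in else None

    @property
    def cost_per_tuple(self):
        # Average processing time per input tuple, in seconds.
        return self.busy_seconds / self.tuples_in if self.tuples_in else None


def arrival_rate(timestamps):
    """Tuples per second over the window spanned by sorted arrival times."""
    if len(timestamps) < 2 or timestamps[-1] == timestamps[0]:
        return None
    return (len(timestamps) - 1) / (timestamps[-1] - timestamps[0])


stats = OperatorStats()
stats.record(n_in=100, n_out=25, seconds=0.5)
rate = arrival_rate([0.0, 0.5, 1.0, 2.0])
```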

General Issues V
- Design mechanisms for query plan re-optimization
  - When to re-optimize
    - Event-condition-action rules (Tukwila)
    - Signals in the stream (Niagara)
  - How to re-optimize
    - Reorder joins based on statistics
    - Possibly find alternative sources to replace slow ones
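The "when" and "how" can be sketched together in a few lines of Python. This is an illustration only and does not reproduce Tukwila's or Niagara's actual rule languages. The condition (observed selectivity off from the estimate by more than a threshold factor), the threshold value, and the reordering heuristic (most selective join first, assuming equal per-tuple costs) are all assumptions made for the example, as are the join names.

```python
def should_reoptimize(estimated, observed, threshold=2.0):
    """Condition part of a rule: some selectivity estimate is off by more than `threshold`x."""
    for name, est in estimated.items():
        obs = observed[name]
        ratio = max(obs, est) / max(min(obs, est), 1e-9)  # guard against division by zero
        if ratio > threshold:
            return True
    return False


def reorder(observed):
    """Action part: run the most selective join (lowest selectivity) first."""
    return sorted(observed, key=observed.get)


estimated = {"J1": 0.5, "J2": 0.1, "J3": 0.3}   # optimizer's guesses before execution
observed = {"J1": 0.05, "J2": 0.12, "J3": 0.3}  # statistics gathered during execution

plan = ["J2", "J3", "J1"]  # order chosen from the estimates
if should_reoptimize(estimated, observed):  # event: a statistics update arrives
    plan = reorder(observed)
```

Here J1 turned out ten times more selective than estimated, so the rule fires and J1 moves to the front of the plan.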

Contents
- Motivation
- Issues to consider when building an adaptive system
- Categories of adaptivity and related issues
- Related work
- Our initial ideas thus far (to be continued…)

Categories of Adaptivity
An adaptive system can adapt at many different levels, including:
- Batch: adapt query plans after every X units of time
- Per query: adapt after every query
- Inter-operator: adapt after several operators
- Intra-operator: adapt within an operator
- Per tuple: adapt after one or more tuples

Per Query Adaptivity Illustration
[Figure: query plan with operators labeled N, S, J, and T over an XML view of the data sources]
- Adapt after every query has been executed
- Share execution of common sub-expressions between similar queries
- Reuse optimized sub-plans

Inter-Operator Adaptivity Illustration
[Figure: query plan with operators labeled N, S, J, and T over an XML view of the data sources]
- Adapt after one or more operators have been executed
- Modify query execution plans on the fly when delays are encountered at runtime
- Operator scheduling for CPU and memory allocation
- Alternative source selection

Intra-Operator Adaptivity Illustration
[Figure: operators labeled J, N, and S over an XML view of the data sources]
- Adapt during the execution of one operator
- Change the execution of one operator to another semantically correct implementation
- Input stream scheduling

Per Tuple Adaptivity Illustration
[Figure: a tuple router feeding operators labeled J and T, with inputs N and S, over an XML view of the data sources]
- Adapt an operator's execution on a tuple-by-tuple basis
- Each tuple can be routed to a different join in the query plan, so that every join is busy at all times
- Uses timestamps to keep track of which tuples have run through which joins
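A simplified Python sketch of per-tuple routing. It departs from the slide in two labeled ways. It tracks the operators a tuple has already visited with an explicit list rather than timestamps. It also picks the next operator by lowest observed pass rate, which is a different policy from the keep-every-join-busy goal stated above. Operators are modeled as plain predicates rather than joins to keep the example short, and all names are hypothetical.

```python
class TupleRouter:
    """Chooses the operator order separately for every tuple."""

    def __init__(self, operators):
        self.operators = operators                   # name -> predicate(row) -> bool
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}

    def _pass_rate(self, name):
        # Operators with no history are treated as letting everything through.
        return self.passed[name] / self.seen[name] if self.seen[name] else 1.0

    def route(self, row):
        """Send `row` through all operators; return (row or None, visiting order)."""
        remaining = set(self.operators)
        visited = []
        while remaining:
            # Try the operator most likely to drop the tuple first (ties: by name).
            name = min(sorted(remaining), key=self._pass_rate)
            remaining.discard(name)
            visited.append(name)
            self.seen[name] += 1
            if not self.operators[name](row):
                return None, visited
            self.passed[name] += 1
        return row, visited


router = TupleRouter({
    "cheap": lambda r: r["price"] < 50,
    "recent": lambda r: r["year"] >= 2000,
})
rows = [
    {"price": 65.95, "year": 1994},
    {"price": 39.95, "year": 2000},
    {"price": 10.00, "year": 1990},
    {"price": 20.00, "year": 2010},
]
results = [router.route(r) for r in rows]
```

By the fourth tuple the router has learned that "recent" drops more tuples than "cheap", so it visits "recent" first. The order changed mid-stream without any plan being rebuilt.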

Contents
- Motivation
- Issues to consider when building an adaptive system
- Categories of adaptivity and related issues
- Related work

Related Work
- Tukwila project at U. of Washington
  - Pure XML AQP through the integration of query planning and execution
  - Optimizes for time-to-first-tuple first, then for the whole result later
  - Dynamic scheduling of operators to adjust to I/O delays and flow rates
  - Breaks the query into execution groups or fragments and can re-optimize the plan after each group has been executed
  - Uses event-condition-action rules to determine whether re-optimization should take place

Related Work II
- Havasu project at Arizona State U.
  - User-preference-driven query optimization
- Niagara project at U. of Wisconsin
  - User doesn’t have to specify the sources for a query
  - Lets the user ask “give me results so far” even in the presence of aggregation operators
- MIX system at UC San Diego
  - Information integration system using XML as the intermediate data model
  - Lazy navigation into the result, controlled by the user
  - Doesn’t adapt the query plan during execution

Related Work III
- Aurora project at Brown/MIT/Brandeis
- Telegraph project at UC Berkeley
- STREAM project at Stanford Univ.

To be continued…