TelegraphCQ. 49221052 施賀傑, 69521041 何承恩.

Presentation transcript:

施賀傑 何承恩 TelegraphCQ

Outline
- Introduction
- Data Movement Implies Adaptivity
- Telegraph - an Ancestor of TelegraphCQ
- Adaptive Building Blocks
- Initial CQ Approaches
- TelegraphCQ
- Conclusion and Future Work

Introduction
- TelegraphCQ is an extension of the Telegraph project
- It handles large streams of continuous queries over high-volume, highly variable data streams
- Traditional data processing environments are not suitable for data in motion
  - Large scale
  - Unpredictability of the environment
  - Need for close interaction with users

Data Movement Implies Adaptivity
- Traditional databases are inappropriate for dataflow processing
- Streaming data
  - Data is pushed rather than pulled
  - It has to be processed on the fly
- Continuous Queries (CQ)
  - Queries are continuously active
  - Data initiates access to queries

Data Movement Implies Adaptivity
- Shared processing
  - Avoid blocking or interrupting the dataflow
  - Processing each query individually can be slow and wasteful of resources
  - Queries should have some commonality that can be exploited
- Other sources of unpredictability
  - Deeply networked environment
  - Users may need to adjust a query on the fly based on previous results

Telegraph - an Ancestor of TelegraphCQ
- Designed to provide adaptivity to individual dataflow graphs
- Two new prototypes extend Telegraph to support shared processing over streams
  - CACQ
  - PSoup

Adaptive Building Blocks
- Telegraph consists of a set of modules
- Module types
  - Ingress and Caching
  - Query Processing
  - Adaptive Routing

Adaptive Building Blocks
- Ingress and Caching
  - Interface with external data sources
  - HTML/XML screen scraper (TeSS)
  - Proxy for fetching data from peer-to-peer networks (TeleNap)
- Query Processing
  - Route tuples through query modules on a tuple-by-tuple basis
  - A special type of module known as a State Module (SteM)

Adaptive Building Blocks
- Adaptive Routing
  - Construct a query plan that contains adaptive routing modules
  - The plan can be re-optimized while a query is running
  - Eddy: routes data to other query operators
  - Juggle: performs online reordering
  - Flux: routes tuples to support parallelism with load balancing and fault tolerance

Eddy
- Continuously routes tuples among a set of other modules according to a routing policy

Eddy: routing policy
- Naive Eddy
  - Handles only operators with different costs but equal selectivity
  - Delivers tuples to the two selections equally
- Fast Eddy
  - Improves on the Naive Eddy with lottery scheduling
  - Giving a tuple to an operator credits that operator one "ticket"
  - When the operator returns a tuple, one "ticket" is debited
  - Benefit: nearly optimal performance with less effort
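The ticket bookkeeping can be sketched in a few lines of Python. This is a minimal illustration, not the Telegraph implementation: the function name `run_eddy`, the two predicates, and the `tickets + 1` weighting (used so that an operator with zero tickets can still be chosen) are assumptions made for the example. An operator gains a ticket when it receives a tuple and loses one when it hands the tuple back, so operators that drop many tuples accumulate tickets and tend to be visited first.

```python
import random


def run_eddy(tuples, operators, seed=0):
    """Route each tuple through all operators, picking the next one by lottery.

    `operators` maps a (hypothetical) operator name to a selection predicate.
    Returns the tuples that passed every predicate and a count of how often
    each operator was chosen first.
    """
    rng = random.Random(seed)
    tickets = {name: 0 for name in operators}
    first_choice = {name: 0 for name in operators}
    results = []
    for t in tuples:
        pending = list(operators)  # operators this tuple has not visited yet
        alive = True
        is_first = True
        while pending and alive:
            # Lottery: more tickets means a higher chance of being picked.
            weights = [tickets[name] + 1 for name in pending]
            name = rng.choices(pending, weights=weights)[0]
            if is_first:
                first_choice[name] += 1
                is_first = False
            pending.remove(name)
            tickets[name] += 1          # credit: operator received a tuple
            if operators[name](t):
                tickets[name] -= 1      # debit: operator returned the tuple
            else:
                alive = False           # tuple dropped; the ticket is kept
        if alive:
            results.append(t)
    return results, first_choice


ops = {
    "s1": lambda x: x % 2 == 0,   # passes about 50% of tuples
    "s2": lambda x: x % 10 == 0,  # passes about 10% of tuples
}
results, first = run_eddy(range(2000), ops)
```

In this run the more selective operator `s2` keeps most of its tickets, so the lottery sends tuples to it first far more often than to `s1`, which is the ordering a static optimizer would also prefer.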

Eddy Our query

Eddy
- Suppose s1 and s2 have the same selectivity
- Set the cost of s2 to 5 delay units

Eddy
- Suppose s1 and s2 have the same cost
- Fix the selectivity of s2 at 50%

Eddy

SteMs
- A temporary repository of tuples
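One way to picture a SteM is as one half of a join: a keyed store that other modules build into and probe. The sketch below is illustrative only. The class name `SteM`, its `build`/`probe`/`evict` methods, and the `symmetric_join` helper are assumptions made for this example and are not taken from the Telegraph code.

```python
from collections import defaultdict


class SteM:
    """Hypothetical state module: a keyed, temporary store of tuples."""

    def __init__(self, key):
        self.key = key
        self.index = defaultdict(list)

    def build(self, tup):
        """Insert a tuple under its join-key value."""
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        """Return the stored tuples whose key equals `value`."""
        return list(self.index.get(value, []))

    def evict(self, value):
        """Drop all tuples stored under `value` (e.g. when a window expires)."""
        self.index.pop(value, None)


def symmetric_join(stream, key):
    """Join interleaved ("R" | "S", tuple) arrivals using one SteM per source."""
    stems = {"R": SteM(key), "S": SteM(key)}
    out = []
    for source, tup in stream:
        other = "S" if source == "R" else "R"
        stems[source].build(tup)                  # remember the new tuple
        for match in stems[other].probe(tup[key]):
            # Emit pairs in a fixed (R, S) order regardless of arrival order.
            out.append((tup, match) if source == "R" else (match, tup))
    return out


stream = [
    ("R", {"k": 1, "a": "r1"}),
    ("S", {"k": 1, "b": "s1"}),
    ("S", {"k": 2, "b": "s2"}),
    ("R", {"k": 2, "a": "r2"}),
]
joined = symmetric_join(stream, "k")
```

Because each arrival builds into its own SteM and probes the other, matches are produced as soon as both sides have arrived, whichever side came first.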

Fjords
- An inter-module communication API
- Allows query plans to use a mixture of push and pull connections between modules
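The idea of mixing push and pull connections can be illustrated with two queue types that share one `get()` call. This is a rough sketch under stated assumptions: `PullQueue`, `PushQueue`, and `drain` are hypothetical names chosen for the example, not the Fjords API. The point is that a consuming module sees the same interface either way, and a push connection never blocks when no data has arrived.

```python
from collections import deque


class PullQueue:
    """Consumer-driven connection: get() asks the upstream source for an item."""

    def __init__(self, producer):
        self.producer = iter(producer)

    def get(self):
        # Returns None once the source is exhausted.
        return next(self.producer, None)


class PushQueue:
    """Producer-driven connection: the source calls put(); get() never blocks."""

    def __init__(self):
        self.buffer = deque()

    def put(self, item):
        self.buffer.append(item)

    def get(self):
        # Returns None immediately when nothing has arrived yet.
        return self.buffer.popleft() if self.buffer else None


def drain(connection, polls):
    """Poll a connection up to `polls` times, skipping empty polls."""
    out = []
    for _ in range(polls):
        item = connection.get()
        if item is not None:
            out.append(item)
    return out


stored = PullQueue([1, 2, 3])   # e.g. a stored table that is scanned on demand
sensor = PushQueue()            # e.g. a sensor that delivers readings on its own
sensor.put("reading-a")
from_table = drain(stored, 5)
from_sensor = drain(sensor, 2)
```

A module written against `get()` can therefore sit in a plan that reads from both a stored source and a live stream without changing its code.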

Initial CQ Approaches
- CACQ
- PSoup
- Limitations of CACQ and PSoup
  - Restricted their processing to data that could fit in memory
  - Did not investigate scheduling and resource management issues for queries with little or no overlap
  - Did not explicitly deal with the notion of QoS for adapting to resource limitations
  - Did not explore opportunities for varying the degree of adaptivity to trade off flexibility against overhead

CACQ
- First continuous query engine to exploit the adaptive query processing framework of Telegraph
- Modifies Eddies to execute multiple queries simultaneously
- Uses grouped filters to optimize selections in the shared execution of the individual queries
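A grouped filter can be thought of as an index over many queries' predicates on the same attribute, so that one lookup per tuple replaces one comparison per query. The sketch below assumes every predicate has the form `attr > constant`; the class name `GroupedFilter` and its methods are hypothetical and only illustrate the idea.

```python
import bisect


class GroupedFilter:
    """Hypothetical index over predicates of the form `attr > constant`."""

    def __init__(self):
        self.constants = []   # thresholds, kept sorted
        self.query_ids = []   # query_ids[i] owns constants[i]

    def add(self, query_id, constant):
        """Register one query's threshold, keeping both lists aligned."""
        i = bisect.bisect_left(self.constants, constant)
        self.constants.insert(i, constant)
        self.query_ids.insert(i, query_id)

    def matching(self, value):
        """Return the ids of all queries for which `value > constant` holds."""
        # Every threshold strictly below `value` sits left of this index.
        i = bisect.bisect_left(self.constants, value)
        return set(self.query_ids[:i])


gf = GroupedFilter()
gf.add("q1", 10)
gf.add("q2", 5)
gf.add("q3", 20)
matches = gf.matching(12)
```

Here a single binary search finds that a tuple with value 12 satisfies `q1` and `q2` but not `q3`, without evaluating the three predicates one by one.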

CACQ

PSoup
- Extends the mechanisms developed in CACQ in two main ways
  - Allows queries to access historical data
  - Supports disconnected operation
- New queries can be applied to old data
- New data can be applied to old queries
- Accomplished by creating a query SteM
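The symmetry between queries and data can be sketched as two stores that probe each other. This is a simplified illustration under assumptions: the class name `PSoup` and its `add_query`/`add_data`/`invoke` methods are made up for the example, the stores are plain Python containers rather than real SteMs, and the two predicates are hypothetical.

```python
class PSoup:
    """Hypothetical sketch: a data store and a query store probed symmetrically."""

    def __init__(self):
        self.data = []      # stands in for the data SteM: tuples seen so far
        self.queries = {}   # stands in for the query SteM: id -> predicate
        self.results = {}   # id -> matching tuples, kept up to date

    def add_query(self, qid, predicate):
        """New query applied to old data: store it, then probe the data."""
        self.queries[qid] = predicate
        self.results[qid] = [t for t in self.data if predicate(t)]

    def add_data(self, tup):
        """New data applied to old queries: store it, then probe the queries."""
        self.data.append(tup)
        for qid, predicate in self.queries.items():
            if predicate(tup):
                self.results[qid].append(tup)

    def invoke(self, qid):
        """Return the current answer; a client may call this after reconnecting."""
        return list(self.results[qid])


ps = PSoup()
ps.add_data({"a": 1, "b": 2})               # old data
ps.add_query("q1", lambda t: t["a"] > 2)    # hypothetical predicate
ps.add_query("q2", lambda t: t["b"] < 5)    # hypothetical predicate
ps.add_data({"a": 3, "b": 6})               # new tuple with R.a = 3, R.b = 6
q1_answer = ps.invoke("q1")
q2_answer = ps.invoke("q2")
```

Keeping the results up to date as queries and data arrive is what lets a disconnected client later call `invoke` and receive the current answer immediately.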

PSoup

For example: add a new query

PSoup

Exercise: add a new data tuple (R.a = 3, R.b = 6) using PSoup, working through the example step by step

PSoup