MONITORING STREAMS: A NEW CLASS OF DATA MANAGEMENT APPLICATIONS
DON CARNEY, UĞUR ÇETINTEMEL, MITCH CHERNIACK, CHRISTIAN CONVEY, SANGDON LEE, GREG SEIDMAN, MICHAEL STONEBRAKER, NESIME TATBUL, STAN ZDONIK
VLDB 2002: Advanced Database Management System
Includes some slides by: the original authors (web.cs.wpi.edu/~cs561/s03/talks/aurora.ppt), Joydip Datta, Debarghya Majumdar, YongChul Kwon, Jong-Won Roh
Presented by: Vinit Deodhar, Ajay Gupta
Under the guidance of: Prof. S. Sudarshan

Examples of monitoring systems
Monitor continuous data streams, detect abnormal activity, and alert users to those situations:
- Patient monitoring
- Aircraft safety monitoring
- Stock monitoring
- Intrusion detection systems

Introduction
Characteristics of monitoring applications:
1. They operate on streams of data.
2. Real-time operation is expected.

Monitoring Applications
- Concept: monitor continuous data streams, detect abnormal activity, and alert users to those situations
- Data stream:
  - Continuous
  - Unbounded
  - Rapid
  - May contain missing or out-of-order values
- Occurs in a variety of modern applications

Comparison with traditional database systems

Parameter   | Traditional DBMS                    | Monitoring systems
DBMS model  | Human-active, DBMS-passive          | DBMS-active, human-passive
Data state  | Current data                        | Time-series data
Triggers    | Not scalable                        | Must be scalable
Results     | Exact                               | Approximate
Operation   | Does not require real-time service  | Requires real-time service

Aurora
- This paper describes a new prototype system, Aurora, designed to better support monitoring applications:
  - Stream data
  - Continuous queries
  - Historical data requirements
  - Imprecise data
  - Real-time requirements

Outline of talk
1. Aurora system model
2. Query model
3. Optimizations

Aurora Overall System Model
[Figure: external data sources feed streams through a network of operator boxes inside the Aurora system to user applications; the application administrator supplies the query specs and QoS specs; historical storage holds past data.]

Representation of Stream
- Aurora stream tuple: (TS = ts, A1 = v1, A2 = v2, ..., An = vn)
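A minimal sketch of this representation, assuming a simple Python dataclass; the names StreamTuple, ts, and attrs are illustrative, not Aurora's actual types:

from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class StreamTuple:
    ts: float                                             # TS: timestamp assigned on arrival
    attrs: Dict[str, Any] = field(default_factory=dict)   # A1..An attribute values

# Example: a stock-quote tuple
quote = StreamTuple(ts=100.0, attrs={"symbol": "IBM", "price": 1020.5})
print(quote.ts, quote.attrs["price"])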

Types of operators
1. Continuous: operate on single tuples
2. Windowed: operate on a window of consecutive tuples
   - Slide
   - Tumble
   - Latch

Operators: Filter (and Drop)
- Algebraic notation: Filter(P1, ..., Pm)(S)
- Applies predicates P1..Pm to each input tuple and routes it to the corresponding output stream; tuples matching no predicate can be discarded (the Drop behaviour). The figure showed a "Price > 1000" predicate between an input stream and an output stream.
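A hedged sketch of the Filter box as a Python generator over a stream of plain dicts; filter_box and its route-to-first-matching-predicate behaviour are an assumed simplification of the operator above, not Aurora's implementation:

def filter_box(predicates, stream):
    # Route each tuple to the first output whose predicate it satisfies;
    # tuples satisfying no predicate are dropped (the "Drop" behaviour).
    for t in stream:
        for i, p in enumerate(predicates):
            if p(t):
                yield i, t
                break

# Example: the slide's "Price > 1000" predicate on a toy stream of dicts
stream = [{"ts": 1, "price": 500}, {"ts": 2, "price": 1500}]
print(list(filter_box([lambda t: t["price"] > 1000], stream)))
# -> [(0, {'ts': 2, 'price': 1500})]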

Operators: Map
- Applies a function to every tuple in the stream
- Map(TS = t.TS, B1 = F1(t), ..., Bm = Fm(t))(S)
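A minimal sketch of Map in the same dict-based style; the name map_box and the dictionary of output functions are illustrative assumptions:

def map_box(fs, stream):
    # Apply output functions F1..Fm to each tuple, preserving its timestamp:
    # Map(TS = t.TS, B1 = F1(t), ..., Bm = Fm(t))(S)
    for t in stream:
        out = {"ts": t["ts"]}
        out.update({name: f(t) for name, f in fs.items()})
        yield out

stream = [{"ts": 1, "price": 500}]
print(list(map_box({"price_eur": lambda t: t["price"] * 0.9}, stream)))
# -> [{'ts': 1, 'price_eur': 450.0}]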

Operators: Union and Resample
1. Union: combines tuples from all input streams; Union(S1, ..., Sn)
2. Resample: an interpolation operator

Operators: Group By
- Partitions the input stream into multiple output streams according to a partitioning condition

Operators: Aggregate and BSort
- Aggregate: applies function F over windows of Size s that advance by i; Aggregate(F, Assuming O, Size s, Advance i)(S)
- BSort: an approximate sorting operator
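A rough sketch of a tuple-count-based sliding-window Aggregate (Size s tuples, Advance i tuples), assuming the input is already ordered as the "Assuming O" clause requires; aggregate_box and its output format are illustrative, not Aurora's implementation:

def aggregate_box(f, stream, size, advance):
    # Emit f(window) for each window of `size` tuples, sliding by `advance` tuples.
    buf, start = [], 0
    for t in stream:
        buf.append(t)
        while len(buf) - start >= size:
            window = buf[start:start + size]
            yield {"ts": window[0]["ts"], "value": f(window)}
            start += advance
    # (A fuller sketch would also trim buf and flush partial windows on stream end/timeout.)

stream = [{"ts": i, "price": p} for i, p in enumerate([10, 20, 30, 40, 50])]
avg = lambda w: sum(t["price"] for t in w) / len(w)
print(list(aggregate_box(avg, stream, size=3, advance=2)))
# -> averages of windows [10, 20, 30] and [30, 40, 50]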

Operators: Join
- Join is a binary operator that takes the form Join(P, Size s, Left Assuming O1, Right Assuming O2)(S1, S2)
- Pairs tuples from S1 and S2 whose timestamps are within s of each other and that satisfy predicate P
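A hedged, non-streaming sketch of the window join's semantics, assuming tuples join when their timestamps differ by at most s and the predicate p holds; join_box and the l_/r_ output prefixes are illustrative assumptions:

def join_box(p, s, left, right):
    # Nested-loop version for clarity: the real operator keeps only a window of
    # width s from each input in memory as the streams advance.
    out = []
    for l in left:
        for r in right:
            if abs(l["ts"] - r["ts"]) <= s and p(l, r):
                out.append({**{"l_" + k: v for k, v in l.items()},
                            **{"r_" + k: v for k, v in r.items()}})
    return out

doctors  = [{"ts": 10, "loc": 5}]
patients = [{"ts": 12, "loc": 6}, {"ts": 30, "loc": 7}]
near = lambda d, q: abs(d["loc"] - q["loc"]) <= 2
print(join_box(near, s=5, left=doctors, right=patients))
# -> only the first patient joins; the second is outside the time window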

Aurora Query Model
Queries can be classified as:
1. Continuous queries
2. Views
3. Ad-hoc queries

Aurora Query Model (contd.)
- Continuous queries: continuously monitor the input stream
[Figure: a chain of boxes b1, b2, b3 from the data input (connection point) to an application with a QoS spec. Picture courtesy: Reference [2]]

Aurora Query Model: Views
[Figure: a view, a sub-network of boxes (b4, b5, b6) attached at a connection point alongside the continuous query (b1, b2, b3). Picture courtesy: Reference [2]]

Aurora Query Model: Ad-hoc Queries
[Figure: an ad-hoc query (b7, b8, b9) attached at a connection point that retains, e.g., 3 days of history, alongside the continuous query and the view. Picture courtesy: Reference [2]]

Aurora GUI
[Figure: the Aurora GUI showing a network of operator boxes.]

QoS Specifications
- Provided by the application administrator
- A multi-dimensional function, specified as a set of two-dimensional graphs (e.g., latency-based, drop-based, and value-based)
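One way to picture such a two-dimensional QoS graph is as a piecewise-linear function from, say, latency to utility; this sketch (piecewise_qos) is an illustrative assumption about the graph's shape, not Aurora's internal representation:

import bisect

def piecewise_qos(points):
    # points: sorted (x, utility) pairs defining a 2D QoS graph,
    # e.g. latency in seconds -> utility in [0, 1]; linear interpolation in between.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    def utility(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, x)
        frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + frac * (ys[i] - ys[i - 1])
    return utility

# A latency-based graph: full utility up to 2 s, dropping to zero at 10 s
latency_qos = piecewise_qos([(0, 1.0), (2, 1.0), (10, 0.0)])
print(latency_qos(5))  # -> 0.625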

System Architecture
[Figure: Aurora run-time architecture (figure only in the original slide).]

Optimization: Issues
- Large number of small box operations
- Dependencies among box operations
- High data rates
- Architecture changes to the system
(Extra: not taught during class lectures)

Optimization: Overview
- Dynamic continuous query optimization
  - Inserting projections
  - Combining boxes
  - Reordering boxes
- Ad-hoc query optimization
- Scheduling-related optimizations

Dynamic Continuous Query Optimizations
Optimize local sub-networks:
1. Inserting projections: drop attributes that are never used downstream
2. Combining boxes: reduces box execution overhead (e.g., merging two filters into one)
3. Re-ordering boxes, using per-tuple cost and selectivity:
   - C(B): time taken by box B to process one tuple
   - S(B): selectivity of box B
   - Total cost of B1 then B2: TC1 = C(B1) + S(B1) * C(B2)
   - Total cost of B2 then B1: TC2 = C(B2) + S(B2) * C(B1)
   - Swap the (commuting) boxes if TC2 < TC1
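The reordering rule can be checked in a couple of lines; the function name should_swap is illustrative, and the comparison only applies when the two boxes commute:

def should_swap(c1, s1, c2, s2):
    # Cost of running B1 then B2 on one input tuple vs. the swapped order,
    # using per-tuple cost C(B) and selectivity S(B) from the slide.
    cost_b1_b2 = c1 + s1 * c2
    cost_b2_b1 = c2 + s2 * c1
    return cost_b2_b1 < cost_b1_b2

# Example: a cheap, highly selective filter (B2) should run before an expensive map (B1)
print(should_swap(c1=10.0, s1=1.0, c2=1.0, s2=0.1))  # -> True: run B2 first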

Optimizing Ad-hoc Queries
- Two separate copies of the ad-hoc query's sub-network are created:
  - COPY#1 works on historical data
  - COPY#2 works on current data
- COPY#1 is run first:
  - Historical storage is organized as a B+-tree, so the B+-tree index is used where possible (index look-ups for filters, appropriate join algorithms)
  - Subsequent boxes are optimized as before
- COPY#2 is optimized as before

Real-Time Scheduling (RTS)
- The scheduler selects which box to execute next
- Scheduling decisions depend on QoS information
- End-to-end processing cost should also be considered (e.g., secondary storage accesses)
- Aurora's scheduler considers both
(Extra: not taught during class lectures)

Real-Time Scheduling (RTS): Non-linearity
- Non-linearity: the output tuple rate is not always proportional to the input tuple rate
- Inter-box non-linearity (addressed by superbox scheduling):
  - Too little buffer space leads to thrashing
  - Scheduling whole sequences of boxes lets intermediate tuples bypass the storage manager
- Intra-box non-linearity (addressed by train scheduling):
  - Per-tuple processing cost can decrease when more tuples are available for processing (similar to batch processing)
  - Fewer context switches (see the cost sketch below)
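A toy cost model (not Aurora's scheduler) showing why trains help: each box invocation pays a fixed overhead, so batching tuples into trains amortizes it:

def total_cost(num_tuples, train_size, per_tuple_cost=1.0, per_call_overhead=5.0):
    # Each box invocation pays a fixed overhead (context switch, queue handling)
    # plus a per-tuple processing cost; trains reduce the number of invocations.
    calls = -(-num_tuples // train_size)  # ceiling division
    return calls * per_call_overhead + num_tuples * per_tuple_cost

print(total_cost(1000, train_size=1))    # 6000.0 : one invocation per tuple
print(total_cost(1000, train_size=100))  # 1050.0 : overhead amortized over trains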

RTS by Optimizing Overall Processing Cost (contd.)
[Figure only. Picture courtesy: Reference [2]. Extra: taken from the journal version of the paper.]

RTS by Optimizing QoS: Priority Assignment
- Latency = processing delay + waiting delay
- Train scheduling addresses the processing delay; the waiting delay is a function of scheduling
- Give tuples priorities during scheduling to improve QoS
- Two approaches to assigning priority:
  - A state-based approach
  - A feedback-based approach
(Extra: not taught during class lectures)

Priority Assignment Approaches
- High priority implies less waiting delay
- State-based approach (importance-based):
  - Assigns priorities to outputs based on their expected utility
  - Utility: how much QoS is sacrificed if execution is deferred
  - Selects the output with maximum utility (see the sketch below)
- Feedback-based approach (performance-based):
  - Monitors performance
  - Increases the priority of applications that are not doing well
  - Decreases the priority of applications in the good zone
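A speculative sketch of the state-based idea: pick the output whose QoS would suffer most from further waiting. The structure of the outputs list and the loss estimate are assumptions for illustration only:

def pick_next_output(outputs, delay):
    # For each output, estimate the utility lost if its oldest pending tuple waits
    # another `delay` seconds, and schedule the output with the largest expected loss.
    def expected_loss(o):
        return o["qos"](o["age"]) - o["qos"](o["age"] + delay)
    return max(outputs, key=expected_loss)

# Two outputs with different latency-QoS curves and the same current tuple age
outputs = [
    {"name": "alerts",  "age": 1.5, "qos": lambda x: max(0.0, 1.0 - x / 2.0)},   # strict
    {"name": "reports", "age": 1.5, "qos": lambda x: max(0.0, 1.0 - x / 20.0)},  # lenient
]
print(pick_next_output(outputs, delay=0.5)["name"])  # -> "alerts"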

Load Shedding
- Any system has a limit on how much data it can process and how fast
- Load shedding discards some data so that the system can keep up
- Drop boxes are used to discard data
- Different from load shedding in networking:
  - Data has semantic value in a DSMS
  - QoS can be used to find the best stream to drop
  - Application code can handle missing tuples

Load Shedding
- Static analysis: detects insufficient hardware resources
- Dynamic analysis: detects long spikes in input rates

Detecting Load Shedding: Static Analysis
- Detects hardware insufficiencies
- Condition for overload: C * H < min_cap, where
  - C = capacity of the Aurora system
  - H = headroom factor, the fraction of system resources that can be used at steady state
  - min_cap = minimum aggregate computational capacity required
- min_cap is calculated from the input data rate r(B), the per-tuple processing cost c(B), and the selectivity s(B) of each operator
(Extra: not explained during lecture)
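A back-of-the-envelope sketch of how min_cap might be computed for one chain of boxes from input rate, per-tuple cost, and selectivity; the function chain_load and the example numbers are assumptions, not the paper's exact formula:

def chain_load(input_rate, boxes):
    # boxes: list of (cost_per_tuple, selectivity) along one query path.
    # Each box sees the input rate reduced by the selectivities of the boxes before it.
    load, rate = 0.0, input_rate
    for cost, selectivity in boxes:
        load += rate * cost      # processing demand of this box (busy fraction)
        rate *= selectivity      # tuple rate flowing into the next box
    return load

# Overload test from the slide: shed load when C * H < min_cap
C, H = 1.0, 0.8                  # capacity (one CPU) and headroom factor
min_cap = chain_load(input_rate=300.0, boxes=[(0.002, 0.5), (0.004, 1.0)])
print(min_cap, "overloaded:", C * H < min_cap)  # -> about 1.2, overloaded: True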

Detecting Load Shedding: Dynamic Analysis
- The system can have sufficient resources but still deliver low QoS (e.g., due to long spikes in input rates)
- Uses delay-based QoS information
- Timestamp tuples at the input and check performance at the output
- If enough of the output falls outside the good zone, this indicates overload
(Extra: not explained during lecture)

Static Load Shedding by Dropping Tuples
- Prefer to drop tuples that compromise QoS the least
- Find the output whose QoS graph yields the minimum QoS loss
- Drop tuples there by inserting a drop box
- Recalculate resource usage; if still insufficient, repeat the process
- Get rid of tuples as early as possible: move the drop box as close as possible to the data source or connection point

Static Load Shedding by Dropping Tuples (contd.)
- Considers the drop-based QoS graph
- Step 1: Find the output, and the amount of tuples to drop, that results in the minimum overall QoS loss
- Step 2: Insert a drop box in the appropriate place and drop tuples randomly
- Step 3: Re-calculate system resource usage; if resources are still insufficient, repeat the process
(Extra: not explained during lecture)

Dynamic Load Shedding by Dropping Tuples
- The delay-based QoS graph is considered; drop tuples whose delay exceeds the threshold
- Select an output whose QoS is not in the good zone, insert a drop box, and repeat if required
- Issues:
  - No consideration of tuple importance
  - Must also consider the load already queued when placing drop boxes

Semantic Load Shedding by Filtering Tuples
- The previous methods drop tuples randomly at strategic points
- Some tuples may be more important than others
- Consult value-based QoS information before dropping a tuple
- Drop tuples based on their QoS value and the frequency of that value

Semantic Load Shedding Example
- Hospital network:
  - Stream of free doctors' locations
  - Stream of untreated patients' locations and their condition (dying, critical, injured, barely injured)
- Output: match each patient with doctors within a certain distance
[Figure: Doctors and Patients streams feed a Join box whose output is the doctors who can work on a patient.]

Semantic Load Shedding Example (contd.)
- Shed load based on the patient's condition, so the most critical patients get treated first
- A filter is added before the Join: under too much load, drop the "barely injured" tuples (see the sketch below)
[Figure: a drop filter on the Patients stream before the Join box.]
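A minimal sketch of such a semantic drop filter on the Patients stream; the severity ranking and the function semantic_drop are illustrative assumptions:

def semantic_drop(stream, min_severity):
    # Value-based shedding: keep only patient tuples whose condition is important
    # enough, instead of dropping tuples at random.
    severity = {"barely injured": 0, "injured": 1, "critical": 2, "dying": 3}
    for t in stream:
        if severity[t["condition"]] >= min_severity:
            yield t

patients = [{"id": 1, "condition": "dying"}, {"id": 2, "condition": "barely injured"}]
print(list(semantic_drop(patients, min_severity=1)))
# -> patient 1 is kept, patient 2 is shed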

Conclusion
- Aurora is a data stream management system for monitoring applications. It provides:
  - Continuous and ad-hoc queries on data streams
  - Storage of historical data for a predefined duration
  - Box-and-arrow style query specification
  - Support for imprecise data through slack specifications
  - Support for the real-time requirement through dynamic load shedding
- Aurora runs on a single computer; Borealis [3] is a distributed data stream management system

References
[1] D. Carney et al., "Monitoring streams: a new class of data management applications," in Proceedings of the 28th International Conference on Very Large Data Bases (VLDB 2002), pp. 215–226, 2002.
[2] D. J. Abadi et al., "Aurora: a new model and architecture for data stream management," The VLDB Journal, vol. 12, no. 2, 2003.
[3] D. J. Abadi et al., "The design of the Borealis stream processing engine," in Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, CA, 2005.

Storage Manager: Queue Management
- Queues live partly in main memory and partly on disk
- The storage manager implements a replacement policy to select which tuples to keep in memory
(Extra: taken from the journal version of the paper)

Storage Manager: Connection Point Management
- A connection point stores historical data
- Data is organized as a B-tree, with timestamp as the default key
- Insertions into the B-tree are done in batches
- Periodic passes delete tuples older than the history requirement
(Extra: taken from the journal version of the paper)