Maintaining Sliding Window Skylines on Data Streams.

By Snigdha Rao Parvatneni
Presentation transcript:

Maintaining Sliding Window Skylines on Data Streams

Contents
- Introduction
- Related work
- Frameworks for tracking skylines on streams
- The Lazy method
- The Eager method
- Conclusions

Abstract
- Existing skyline algorithms are inapplicable to stream applications because:
  1) they assume static data;
  2) they focus on "one-time" execution that returns a single skyline;
  3) they aim at reducing the I/O overhead (as opposed to minimizing the CPU cost and main-memory consumption).
- This paper studies skyline computation in stream environments.

Introduction
- Skyline queries: retrieval with respect to user preferences.
- Techniques developed for traditional databases are inefficient on streams because they do not consider the special characteristics of streams, such as fast data arrivals and strict limits on response time.

Introduction
Fig. 1. Skyline examples. (a) Conventional skyline. (b) Stream skyline (W = 5).

Introduction
- This paper studies skyline computation in stream systems.
- Only the tuples that arrived in a sliding window covering the W most recent timestamps are considered.
  - W: a system parameter called the window length.
- A tuple r is alive during its lifespan [r.t_arr, r.t_exp).
  - r.t_arr: arrival time.
  - r.t_exp: expiry time (= r.t_arr + W).
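The window semantics can be illustrated with a minimal Python sketch. The class name `StreamTuple` and the field and method names are assumptions made for illustration; the only thing taken from the slides is that a tuple expires W timestamps after it arrives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamTuple:
    """Hypothetical record for one stream tuple (names are illustrative)."""
    name: str
    coords: tuple
    t_arr: int  # arrival timestamp

    def t_exp(self, W):
        # Expiry time: the tuple leaves the window W timestamps after arrival.
        return self.t_arr + W

    def alive(self, now, W):
        # Alive from arrival (inclusive) until expiry (exclusive).
        return self.t_arr <= now < self.t_exp(W)
```

With W = 5, a tuple arriving at time 1 is alive at times 1 through 5 and is gone at time 6, which agrees with the (+a, 1) and (-a, 6) events listed for Fig. 1b.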

Introduction
- Skyline stream for Fig. 1b: (+a, 1), (+b, 3), (-a, 6), (+c, 6), (+d, 7), (-b, 8), (+e, 9), (-c, 9), (-d, 9), (+f, 11)
- A pair (+r, t) implies that point r starts belonging to the skyline at time t.
- A pair (-r, t) indicates the removal of r from the skyline at time t.
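To make the (+r, t) / (-r, t) notation concrete, here is a brute-force Python sketch that recomputes the skyline at every timestamp and reports the differences. It is not one of the algorithms in the slides, only a baseline. The coordinates are hypothetical (the real ones are in the figure), and it assumes that smaller values are preferred on every dimension.

```python
def dominates(p, q):
    # Assumption: smaller is better on every dimension.
    return all(a <= b for a, b in zip(p, q)) and p != q


def skyline(points):
    """points: dict name -> coords. Returns the set of non-dominated names."""
    return {n for n, c in points.items()
            if not any(dominates(o, c) for o in points.values())}


def skyline_stream(arrivals, W, t_end):
    """arrivals: dict name -> (coords, t_arr). Returns a list of (change, t)."""
    events, prev = [], set()
    for t in range(1, t_end + 1):
        # Keep only tuples whose lifespan covers t.
        alive = {n: c for n, (c, ta) in arrivals.items() if ta <= t < ta + W}
        cur = skyline(alive)
        events += [(f"-{n}", t) for n in sorted(prev - cur)]
        events += [(f"+{n}", t) for n in sorted(cur - prev)]
        prev = cur
    return events


# Hypothetical data: a dominates c, so c only surfaces once a has expired.
arrivals = {"a": ((1, 5), 1), "b": ((3, 3), 3), "c": ((2, 6), 4)}
print(skyline_stream(arrivals, W=5, t_end=9))
```

Recomputing from scratch at each timestamp is exactly the cost the Lazy and Eager methods try to avoid.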

Proposed Approach
- We propose algorithms that exploit the special properties of "stream skylines" to improve space and time efficiency by expunging tuples from the system as early as possible.
- Properties:
  - All points dominated by an incoming tuple r can be safely discarded, since r outlives them and they are therefore guaranteed never to appear in the skyline again.
  - An arriving tuple r cannot be discarded immediately even if it is dominated by some existing tuple r', because r may enter the skyline after r' expires.
  - A tuple r can appear in the skyline for at most a single continuous time interval.
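The first two properties can be sketched as a small pruning routine in Python. The function name `insert_with_pruning` and the list-of-pairs layout are assumptions for illustration, as is the convention that smaller values are preferred.

```python
def dominates(p, q):
    # Assumption: smaller is better on every dimension.
    return all(a <= b for a, b in zip(p, q)) and p != q


def insert_with_pruning(db, r):
    """db: list of (coords, t_arr) kept so far; r: newly arrived (coords, t_arr).

    Older tuples dominated by r are dropped (property 1): they expire no later
    than r, so they can never re-enter the skyline. r itself is always kept,
    even if something older dominates it (property 2).
    """
    survivors = [s for s in db if not dominates(r[0], s[0])]
    survivors.append(r)
    return survivors


db = [((4, 4), 1), ((1, 6), 2)]
db = insert_with_pruning(db, ((2, 2), 3))  # (4, 4) is dominated and dropped
db = insert_with_pruning(db, ((5, 5), 4))  # dominated by (2, 2) but still kept
```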

Related Work
- Divide-and-conquer
- Block-nested-loop
- Sort-first-skyline
  - Sorts the database according to a preference function.
  - The skyline can then be found in another pass over the sorted list.
- Bitmap algorithm
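The sort-first-skyline idea can be sketched in a few lines of Python. The sketch assumes that smaller values are preferred and uses the sum of coordinates as the preference function; that choice is an assumption here, and any monotone scoring function would serve.

```python
def dominates(p, q):
    # Assumption: smaller is better on every dimension.
    return all(a <= b for a, b in zip(p, q)) and p != q


def sfs_skyline(points):
    """Sort by a monotone score, then filter in a single pass.

    A dominating point always has a strictly smaller coordinate sum, so once
    the list is sorted a point only needs to be checked against the skyline
    points found so far.
    """
    result = []
    for p in sorted(points, key=sum):
        if not any(dominates(s, p) for s in result):
            result.append(p)
    return result


print(sfs_skyline([(1, 5), (3, 3), (2, 6), (4, 4), (5, 1)]))
```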

Frameworks for Tracking Skylines on Streams
Fig. 2. The architecture of our system.

Fig. 2
1. Arriving tuples are placed in an input buffer (BF).
2. They are processed by the preprocessing module (PM) in ascending order of their arrival times.
3. They are then added to the database (DB).
4. DB consists of DB_sky and DB_rest, which store the points that are and are not in the current skyline, respectively.
5. Whenever a skyline point expires, some points in DB_rest may appear in the new skyline. They are identified by the maintenance module (MM).
6. Obsolete data are expunged from DB.
7. The skyline stream is output.

Frameworks for Tracking Skylines on Streams
- "Lazy" strategy: delays most computational work until the expiration of a skyline point.
- "Eager" approach: takes advantage of precomputation to minimize memory consumption.

Table 1. Frequently Used Symbols

The Lazy Method
- The skyline can change when 1) a new tuple arrives, or 2) some skyline point expires.
- Lazy handles these two situations in its preprocessing module (L-PM) and maintenance module (L-MM), respectively.
- Given an arriving tuple r, L-PM checks whether it is dominated by any point in DB_sky.
  - If r is not dominated by any skyline point, it is added to DB_sky.
  - r may also dominate some skyline points, which are then expunged from the system.
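The arrival handling described above can be sketched as follows. This is a simplified illustration, not the pseudocode of Fig. 4. The dictionaries `db_sky` and `db_rest`, the function name, and the smaller-is-better convention are assumptions. Placing a dominated arrival into DB_rest is inferred from the architecture slide.

```python
def dominates(p, q):
    # Assumption: smaller is better on every dimension.
    return all(a <= b for a, b in zip(p, q)) and p != q


def lazy_preprocess(r, db_sky, db_rest):
    """r: (name, coords); db_sky, db_rest: dict name -> coords.

    Returns the skyline changes caused by the arrival of r.
    """
    name, coords = r
    if any(dominates(c, coords) for c in db_sky.values()):
        # Dominated on arrival: park it, it may surface after an expiry.
        db_rest[name] = coords
        return []
    # r enters the skyline and evicts every skyline point it dominates.
    removed = [n for n, c in db_sky.items() if dominates(coords, c)]
    for n in removed:
        del db_sky[n]
    db_sky[name] = coords
    return [f"-{n}" for n in removed] + [f"+{name}"]


db_sky, db_rest = {"a": (1, 5), "b": (4, 4)}, {}
print(lazy_preprocess(("c", (2, 6)), db_sky, db_rest))  # dominated by a
print(lazy_preprocess(("d", (3, 3)), db_sky, db_rest))  # evicts b
```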

The Lazy Method
- Definitions:
  - Dominance region r.DR: the area of the data space dominated by r.
  - "Max corner": the point having the maximum coordinates on all dimensions.
  - Antidominance region r.ADR: the area where a point dominating r could fall.
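Under the assumption that smaller values are preferred and that the data space starts at the origin, both regions are axis-aligned boxes: r.DR stretches from r to the max corner, and r.ADR from the origin to r. The helper names below are hypothetical, and the boxes are treated as closed for simplicity.

```python
def dominance_region(r, max_corner):
    # Box from r up to the max corner: every point here is no better than r.
    return (tuple(r), tuple(max_corner))


def antidominance_region(r, origin=None):
    # Box from the origin up to r: a point dominating r must fall in here.
    origin = origin or (0,) * len(r)
    return (tuple(origin), tuple(r))


def in_box(p, box):
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


h = (4, 3)
print(in_box((6, 5), dominance_region(h, (10, 10))))  # inside h.DR
print(in_box((2, 1), antidominance_region(h)))        # inside h.ADR
```

Expressing the regions as boxes is what makes them usable as range queries on a spatial index.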

The Lazy Method
Fig. 3. The dominance and antidominance regions of h.

The Lazy Method
Fig. 4. Preprocessing module of Lazy.

The Lazy Method
Fig. 5. Maintenance module of Lazy.
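The pseudocode of the figure is not reproduced in this transcript. As a rough stand-in, the following brute-force Python sketch shows what has to happen when a skyline point expires: expired tuples are purged from DB_rest, and any remaining tuple that is no longer dominated is promoted to DB_sky. All names and the smaller-is-better convention are assumptions, and the actual maintenance module may work quite differently.

```python
def dominates(p, q):
    # Assumption: smaller is better on every dimension.
    return all(a <= b for a, b in zip(p, q)) and p != q


def lazy_maintain(expired_name, now, W, db_sky, db_rest):
    """db_sky, db_rest: dict name -> (coords, t_arr). Returns skyline changes."""
    del db_sky[expired_name]
    # Purge tuples in DB_rest whose lifespan has ended.
    for n in [n for n, (_, ta) in db_rest.items() if ta + W <= now]:
        del db_rest[n]
    # Promote tuples that no alive tuple dominates any more.
    alive = list(db_sky.values()) + list(db_rest.values())
    promoted = [n for n, (c, _) in db_rest.items()
                if not any(dominates(o, c) for o, _ in alive)]
    for n in promoted:
        db_sky[n] = db_rest.pop(n)
    return [f"-{expired_name}"] + [f"+{n}" for n in sorted(promoted)]


db_sky = {"a": ((1, 5), 1), "b": ((3, 3), 3)}
db_rest = {"c": ((2, 6), 4)}
print(lazy_maintain("a", now=6, W=5, db_sky=db_sky, db_rest=db_rest))
```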

The Lazy Method
- Disadvantage of Lazy: DB_rest needs to store obsolete data and tuples that will never appear in the skyline.
- This problem motivates the Eager method.

The Eager Method
- Eager aims at:
  1) minimizing the memory consumption, by keeping only those tuples that are or may become part of the skyline in the future;
  2) reducing the cost of the maintenance module (E-MM).

The Eager Method
Fig. 6. Execution example of Eager (W = 15). (a) Skyline from time 13 to 15. (b) Skyline at time 18. (c) Skyline at time 20. (d) Skyline at time

The Eager Method
- h.t_sky: the skyline influence time of h (= 26).
- Example (Fig. 6a): h can appear in the skyline only after timestamp 26, when all the tuples in {a, c, d, e, f} have expired.
- Fig. 6b,
- Fig. 6c,
- Fig. 6d,
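Following the description of h.t_sky, the skyline influence time of a tuple can be sketched as the latest expiry time among the tuples that dominate it. The function name, the data layout, and the coordinates below are hypothetical; the figure's actual points are not part of this transcript. Smaller values are assumed to be preferred.

```python
def dominates(p, q):
    # Assumption: smaller is better on every dimension.
    return all(a <= b for a, b in zip(p, q)) and p != q


def skyline_influence_time(coords, kept, W, now):
    """kept: iterable of (coords, t_arr) for the tuples currently stored.

    A tuple can join the skyline only once every tuple dominating it has
    expired. If nothing dominates it, it can join right away (time `now`).
    """
    expiries = [t_arr + W for c, t_arr in kept if dominates(c, coords)]
    return max(expiries, default=now)


kept = [((2, 3), 5), ((4, 5), 11), ((7, 1), 12)]
print(skyline_influence_time((6, 6), kept, W=15, now=13))
```

In this example the two dominating tuples expire at times 20 and 26, so the result is 26.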

The Eager Method
- Eager maintains an event list EL that contains entries of the form e = <ptr, t, tag>.
  - e.ptr: a pointer to the tuple involved in the event.
  - e.t: the event time (i.e., when the event will happen).
  - e.tag: the event type.
- If the tuple r referenced by e.ptr currently belongs to the skyline, then e.tag = 'EX'.
  - 'EX': a keyword indicating the expiry of a point (e.t = r.t_exp).
  - 'SK': a keyword indicating the future inclusion of r in the skyline (e.t = r.t_sky).
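One way to picture the event list is as a priority queue ordered by event time. The sketch below uses Python's `heapq`; the use of a heap and the class names `Event` and `EventList` are assumptions, since the slides do not say how EL is implemented.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    t: int                                   # event time, the only sort key
    tag: str = field(compare=False)          # 'EX' (expiry) or 'SK' (joins skyline)
    ptr: object = field(compare=False)       # reference to the tuple involved


class EventList:
    def __init__(self):
        self._heap = []

    def push(self, ptr, t, tag):
        heapq.heappush(self._heap, Event(t, tag, ptr))

    def pop_due(self, now):
        """Remove and return all events with e.t <= now, earliest first."""
        due = []
        while self._heap and self._heap[0].t <= now:
            due.append(heapq.heappop(self._heap))
        return due


el = EventList()
el.push("a", 16, "EX")
el.push("h", 26, "SK")
el.push("b", 18, "EX")
print([(e.ptr, e.t, e.tag) for e in el.pop_due(18)])
```

Keeping the earliest event at the front means the maintenance step only has to inspect the head of the list at each timestamp.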

The Eager Method
Fig. 7. Preprocessing module of Eager.

The Eager Method
Fig. 8. Maintenance module of Eager.

Practical Implementations with R-trees
Fig. 10. A main-memory R-tree on DB.

Conclusions
- Future work: a "top-k skyline" extracts only the k skyline tuples maximizing a user's preference function.
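As a rough illustration of what a top-k skyline could look like on static data, the following Python sketch first computes the skyline and then keeps the k points with the highest preference score. The function names and the example preference function are hypothetical, and nothing here reflects how the authors would realize it on streams. Smaller coordinate values are assumed to be preferred for dominance.

```python
import heapq


def dominates(p, q):
    # Assumption: smaller is better on every dimension.
    return all(a <= b for a, b in zip(p, q)) and p != q


def top_k_skyline(points, k, preference):
    """Return the k skyline points with the largest preference score."""
    sky = [p for p in points if not any(dominates(o, p) for o in points)]
    return heapq.nlargest(k, sky, key=preference)


# Hypothetical preference: a weighted cost, negated so that larger is better.
pref = lambda p: -(2 * p[0] + p[1])
print(top_k_skyline([(1, 5), (3, 3), (5, 1), (4, 4)], k=2, preference=pref))
```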