Maintaining Sliding Window Skylines on Data Streams.


1 Maintaining Sliding Window Skylines on Data Streams

2 Contents
- Introduction
- Related Work
- Frameworks for Tracking Skylines on Streams
- The Lazy Method
- The Eager Method
- Conclusions

3 Abstract
- Existing algorithms are inapplicable to stream applications because
  1) they assume static data;
  2) they focus on “one-time” execution that returns a single skyline;
  3) they aim at reducing the I/O overhead (as opposed to minimizing the CPU cost and main-memory consumption).
- This paper studies skyline computation in stream environments.

4 Introduction
- Skyline queries: retrieval with respect to user preferences.
- Techniques developed in traditional databases are inefficient on streams because they do not consider the special characteristics of streams, such as fast data arrivals and strict limits on response time.

5 Introduction
Fig. 1. Skyline examples. (a) Conventional skyline. (b) Stream skyline (W = 5).

6 Introduction
- This paper studies skyline computation in stream systems.
- Only the tuples that arrived in a sliding window covering the W most recent timestamps are considered.
  - W: a system parameter called the window length.
- A tuple r is alive during its lifespan [r.t_arr, r.t_exp).
  - r.t_arr: arrival time.
  - r.t_exp: expiry time (= r.t_arr + W).

7 Introduction
- Fig. 1b: (+a, 1), (+b, 3), (-a, 6), (+c, 6), (+d, 7), (-b, 8), (+e, 9), (-c, 9), (-d, 9), (+f, 11)
- A pair (+r, t) means that point r starts belonging to the skyline at time t.
- A pair (-r, t) indicates the removal of r from the skyline at time t.
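The event notation above can be reproduced with a brute-force sketch. This is not the paper's algorithm: it simply recomputes the skyline of the live tuples at every timestamp and reports the difference between consecutive skylines. The function names, the smaller-is-better dominance convention, and the lifespan [t_arr, t_arr + W) are assumptions made for illustration; the coordinates of the points in Fig. 1b are not available in this transcript, so any input data must be supplied by the reader.

```python
def dominates(p, q):
    """True if p dominates q: p is no worse on every dimension and differs from q.

    Assumption: smaller coordinates are preferred.
    """
    return all(a <= b for a, b in zip(p, q)) and p != q


def skyline(points):
    """Brute-force skyline of a dict mapping name -> coordinates."""
    return {
        name for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    }


def skyline_stream(arrivals, W, t_end):
    """Return the list of (+name, t) / (-name, t) events up to time t_end.

    arrivals maps name -> (arrival time, coordinates). A tuple is assumed
    to be alive during [t_arr, t_arr + W).
    """
    events, prev = [], set()
    for t in range(t_end + 1):
        alive = {
            name: p for name, (t_arr, p) in arrivals.items()
            if t_arr <= t < t_arr + W
        }
        cur = skyline(alive)
        # Removals are reported before insertions at the same timestamp.
        events += [("-" + name, t) for name in sorted(prev - cur)]
        events += [("+" + name, t) for name in sorted(cur - prev)]
        prev = cur
    return events
```

For example, with the hypothetical input `{"a": (1, (1, 1)), "b": (3, (2, 2))}` and W = 5, point b stays hidden while a is alive and enters the skyline at time 6, when a expires.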

8 Proposal
- We propose algorithms that utilize the special properties of “stream skylines” to improve space and time efficiency by expunging tuples from the system as early as possible.
- Properties
  - All points dominated by an incoming tuple r can be safely discarded, since they are guaranteed not to appear in the skyline in the future (they expire before r does).
  - An arriving tuple r cannot be directly discarded even if it is dominated by some existing tuple r’ (r may enter the skyline after r’ expires).
  - A tuple r can appear in the skyline for at most a single continuous time interval.

9 Related Work
- Divide-and-conquer
- Block-nested-loop
- Sort-first-skyline
  - Sorts the database according to a preference function.
  - The skyline can then be found in another pass over the sorted list.
- Bitmap algorithm
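The sort-first-skyline idea listed above can be sketched in a few lines. This is a minimal in-memory illustration, not the published implementation: the choice of the coordinate sum as the preference function and the smaller-is-better convention are assumptions. The sum works because a point that dominates another always has a strictly smaller sum, so every dominator is seen before the points it dominates.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller coordinates are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and p != q


def sort_first_skyline(points):
    """Sort by a monotone preference function, then make one pass.

    Each point only needs to be compared with the skyline found so far,
    because no later point in the sorted order can dominate it.
    """
    result = []
    for p in sorted(points, key=sum):  # assumed preference function: sum of coordinates
        if not any(dominates(s, p) for s in result):
            result.append(p)
    return result
```

Called on `[(1, 4), (2, 2), (3, 3), (4, 1), (2, 5)]`, the sketch keeps (2, 2), (1, 4) and (4, 1) and drops the two points dominated by (2, 2).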

10 Frameworks for Tracking Skylines on Streams
Fig. 2. The architecture of our system.

11 Fig. 2
1. Arriving tuples are placed in an input buffer (BF).
2. They are processed by the preprocessing module (PM) in ascending order of their arrival times.
3. They are then included in the database (DB).
4. DB is divided into DB_sky and DB_rest, which store the points that are and are not in the current skyline, respectively.
5. Whenever a skyline point expires, some points in DB_rest may appear in the new skyline; they are identified by the maintenance module (MM).
6. Obsolete data are expunged from DB.
7. The skyline stream is output.

12 Frameworks for Tracking Skylines on Streams
- “Lazy” strategy: delays most computational work until the expiration of a skyline point.
- “Eager” approach: takes advantage of precomputation to minimize memory consumption.

13 Table 1. Frequently Used Symbols

14 The Lazy Method
- Skyline changes can occur when
  1) a new tuple arrives;
  2) some skyline point expires.
- Lazy handles these two situations in its preprocessing module (L-PM) and maintenance module (L-MM), respectively.
- Given an arriving tuple r, L-PM checks whether it is dominated by any point in DB_sky.
  - If r is not dominated by any skyline point, it is added to DB_sky.
  - In that case r may dominate some skyline points, which are expunged from the system.
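The preprocessing step described above might look roughly like the following sketch. It follows only what the slide states; the pseudocode of Fig. 4 is not part of this transcript, so the class name `LazyStore`, the dictionary layout, the returned event format, and the decision to park a dominated arrival in DB_rest are assumptions. Smaller coordinates are assumed to be preferred.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller coordinates are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and p != q


class LazyStore:
    """Hypothetical container for DB_sky and DB_rest (name -> coordinates)."""

    def __init__(self):
        self.db_sky = {}
        self.db_rest = {}

    def preprocess(self, name, point):
        """Handle one arriving tuple and return the resulting skyline events."""
        if any(dominates(s, point) for s in self.db_sky.values()):
            # Dominated arrivals cannot be discarded: they may join the
            # skyline once their dominators expire.
            self.db_rest[name] = point
            return []
        # The new tuple enters the skyline and expunges the points it dominates.
        expunged = [n for n, s in self.db_sky.items() if dominates(point, s)]
        for n in expunged:
            del self.db_sky[n]
        self.db_sky[name] = point
        return [("-", n) for n in expunged] + [("+", name)]
```

Expiry handling (the L-MM side) is deliberately left out, since the slides give no detail beyond the caption of Fig. 5.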

15 The Lazy Method
- Definitions
  - Dominance region r.DR: the area of the data space dominated by r.
  - “Max corner”: the point of the data space having the maximum coordinates on all dimensions.
  - Antidominance region r.ADR: the area where a point dominating r could fall.
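One way to picture these two regions is as axis-aligned boxes. The sketch below assumes that smaller coordinates are preferred, that r.DR spans from r to the max corner, and that r.ADR spans from the minimum corner of the data space to r. The helper names and the explicit `min_corner` argument are illustrative and do not come from the slides.

```python
def dominance_region(r, max_corner):
    """Box (low, high) containing the points that r dominates."""
    return tuple(r), tuple(max_corner)


def antidominance_region(r, min_corner):
    """Box (low, high) where a point dominating r could fall."""
    return tuple(min_corner), tuple(r)


def in_box(p, box):
    """True if p lies inside the closed box.

    Note: r itself lies on the boundary of both regions, although it
    neither dominates nor is dominated by itself.
    """
    low, high = box
    return all(l <= x <= h for l, x, h in zip(low, p, high))
```

With h = (4, 3) in the space [0, 10] x [0, 10], the point (5, 7) falls in h.DR, the point (2, 1) falls in h.ADR, and (2, 7) falls in neither because it is incomparable with h.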

16 The Lazy Method
Fig. 3. The dominance and antidominance regions of h.

17 The Lazy Method
Fig. 4. Preprocessing module of Lazy.

18 The Lazy Method
Fig. 5. Maintenance module of Lazy.

19 The Lazy Method
- Disadvantage of Lazy: DB_rest needs to store obsolete data and tuples that will never appear in the skyline.
- This problem motivates the Eager method.

20 The Eager Method
- Eager aims at
  1) minimizing memory consumption, by keeping only those tuples that are or may become part of the skyline in the future;
  2) reducing the cost of the maintenance module (E-MM).

21 The Eager Method
Fig. 6. Execution example of Eager (W = 15). (a) Skyline from time 13 to 15. (b) Skyline at time 18. (c) Skyline at time 20. (d) Skyline at time

22 The Eager Method
- h.t_sky: the skyline influence time of h (= 26).
- Example (Fig. 6a): h can appear in the skyline only after timestamp 26, when all the tuples in {a, c, d, e, f} have expired.
- Fig. 6b, Fig. 6c, Fig. 6d: later snapshots of the same execution example.
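The skyline influence time can be read as the latest expiry time among the tuples that currently dominate h. The sketch below computes it under that reading. The function name, the `(t_arr, coordinates)` tuple layout, the expiry rule t_arr + W, and the fallback to the arrival time when nothing dominates the tuple are assumptions, and the sample values used with it are hypothetical rather than taken from Fig. 6.

```python
def dominates(p, q):
    """True if p dominates q (assumption: smaller coordinates are preferred)."""
    return all(a <= b for a, b in zip(p, q)) and p != q


def skyline_influence_time(r, alive, W):
    """Earliest time at which r can enter the skyline.

    r is (t_arr, coordinates); alive lists the tuples currently in the
    window in the same layout. Each tuple is assumed to expire at t_arr + W.
    """
    t_arr_r, p = r
    expiries = [t_arr + W for t_arr, q in alive if dominates(q, p)]
    # With no dominator, r can join the skyline as soon as it arrives.
    return max(expiries, default=t_arr_r)
```

With W = 15, a tuple whose latest dominator arrived at time 11 gets an influence time of 26. A tuple whose influence time is not earlier than its own expiry time could never reach the skyline.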

23 The Eager Method
- Eager maintains an event list EL that contains entries of the form e = <ptr, t, tag>.
  - e.ptr: a pointer to the tuple involved in the event.
  - e.t: the event time (i.e., when the event will happen).
  - e.tag: the event type.
- If the tuple r referenced by e.ptr currently belongs to the skyline, then e.tag = ‘EX’; otherwise e.tag = ‘SK’.
  - ‘EX’: a keyword indicating the expiry of a point (e.t = r.t_exp).
  - ‘SK’: a keyword indicating the future inclusion of r in the skyline (e.t = r.t_sky).
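An event list of this shape can be sketched as a priority queue ordered by event time. The slides do not say which data structure the authors use for EL, so the binary heap, the class name `EventList`, and the method names below are assumptions chosen for illustration.

```python
import heapq
from itertools import count


class EventList:
    """Hypothetical event list holding entries e = <ptr, t, tag>, ordered by e.t."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker so that ptr values are never compared

    def add(self, ptr, t, tag):
        """Schedule an 'EX' (expiry) or 'SK' (skyline inclusion) event at time t."""
        if tag not in ("EX", "SK"):
            raise ValueError("tag must be 'EX' or 'SK'")
        heapq.heappush(self._heap, (t, next(self._seq), tag, ptr))

    def pop_due(self, now):
        """Remove and return, in time order, all events with e.t <= now."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            t, _, tag, ptr = heapq.heappop(self._heap)
            due.append((ptr, t, tag))
        return due
```

A maintenance step would call `pop_due` with the current timestamp, drop the tuples behind ‘EX’ events from the skyline, and promote the tuples behind ‘SK’ events into it, rescheduling each promoted tuple with an ‘EX’ event at its expiry time.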

24 The Eager Method
Fig. 7. Preprocessing module of Eager.

25 The Eager Method
Fig. 8. Maintenance module of Eager.

26 Practical Implementations with R-trees
Fig. 10. A main-memory R-tree on DB.

27 Conclusions
- Future work: “top-k skyline” queries, which extract only the k skyline tuples maximizing a user’s preference function.

