Published by Preston Kelly. Modified over 2 years ago.

1
Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Jun Xu
Georgia Tech
VLDB 2009

2
In one sentence …

5
We develop a streaming algorithm for the skyline problem with a near-optimal worst-case guarantee.

7
Hotel | Price | Distance
Athena | $97 | 2.9 km
Park & Suites | $ | km
Hotel du Helder | $76 | 3.8 km
de la Cité Concorde | $ | km
Mercure Carlton Lyon | $ | km
"I want a cheap hotel nearby"

8
[Figure: the hotel table (Athena, Park & Suites, Hotel du Helder, de la Cité Concorde, Mercure Carlton Lyon; columns Price and Distance), annotated with a "dominates" relation between hotels]
"I want a cheap hotel nearby"

10
[Figure: the hotels de la Cité, Park & Suites, du Helder, Athena and Mercure plotted against the axes Price and Distance]

12
Problem definition
Given distinct d-dimensional points
(a_1, …, a_d) dominates (b_1, …, b_d) if a_i ≤ b_i for all i and a_i < b_i for some i
Skyline = set of undominated points
Example: (1, 3), (5, 2), (3, 2)
(3, 2) dominates (5, 2), so Skyline = { (1, 3), (3, 2) }
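The definition above can be checked directly in code. The following is a minimal Python sketch, not the algorithm from the talk: the names `dominates` and `naive_skyline` are illustrative, and the quadratic scan assumes all points fit in memory.

```python
def dominates(a, b):
    """True if a dominates b: a_i <= b_i for all i and a_i < b_i for some i."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def naive_skyline(points):
    """Quadratic-time baseline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


points = [(1, 3), (5, 2), (3, 2)]
print(naive_skyline(points))  # [(1, 3), (3, 2)]
```

This reproduces the example on the slide: (5, 2) is dropped because (3, 2) dominates it.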

13
Skyline algorithms
Categories: RAM vs. Disk (External); Preprocessing vs. Non-preprocessing
- BBS: Papadias et al., SIGMOD 03
- NN: Kossmann et al., VLDB 02
- DD&C: Kung et al., FOCS 75
- LD&C: Bentley et al., JACM 78
- FLET: Bentley et al., SODA 90
- SD&C: Börzsönyi et al., ICDE 01
- BNL: Börzsönyi et al., ICDE 01
- SFS: Chomicki et al., ICDE 03
- LESS: Godfrey et al., VLDB 05

14
Our Goal
Non-preprocessing external algorithm with worst-case guarantee
What is the model of external algorithms?

15
Multi-pass Streaming Model
Cost components: CPU processing and I/O, where I/O is either sequential or random
# of random I/Os = # of passes
The streaming model naturally forces us to minimize the number of random I/Os
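To make the cost measure concrete, here is a small Python sketch of a stream that can only be read front to back and that counts how many passes were made. The class name `PassCountingStream` and the in-memory list standing in for the disk are assumptions for illustration only.

```python
class PassCountingStream:
    """Read-only data that can only be scanned sequentially; counts full scans."""

    def __init__(self, items):
        self._items = list(items)  # stands in for the data on disk
        self.passes = 0

    def __iter__(self):
        # Each new scan starts with one seek to the beginning: one pass.
        self.passes += 1
        return iter(self._items)


stream = PassCountingStream([(1, 2), (3, 7), (5, 3), (2, 5), (4, 1), (9, 9)])
smallest_x = min(p[0] for p in stream)                  # 1st pass
count = sum(1 for p in stream if p[0] == smallest_x)    # 2nd pass
print(stream.passes)  # 2
```

An algorithm written against such an interface cannot jump around in the data, so its pass count is a direct proxy for the number of random I/Os.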

17
[Figure: the points (1, 2), (3, 7), (5, 3), (2, 5), (4, 1), (9, 9) stored on a huge hard disk and streamed through a small RAM]

21
[Figure: 2nd pass over the stream (1, 2), (3, 7), (5, 3), (2, 5), (4, 1), (9, 9) from the huge hard disk through the small RAM]

22
[Figure: 3rd pass over the stream (1, 2), (3, 7), (5, 3), (2, 5), (4, 1), (9, 9) from the huge hard disk through the small RAM]

23
Our Goal
Non-preprocessing external (streaming) algorithm with worst-case guarantee

24
Theory
RAND: Almost optimal multi-pass streaming algorithm for skyline
n = # of points and m = skyline size
- RAND uses O(log n) passes & O(m) space
- Every algorithm that uses 1 pass needs Ω(n) space
Next: RAND algorithm
Later: Experimental result

25
RAND algorithm

26
Algorithms: Main Idea
Suppose m is known.
Theorem: In 3 passes and m space, we can find skyline points that dominate at least n/2 points, with high probability.

27
Eliminate-Points algorithm
1. Sample x = 2m ln(mn log n) points p_1, p_2, …, p_x
2. Go through the stream; replace each p_i by a point dominating it
3. For each p_i, delete p_i and all points it dominates
Output p_1, p_2, …, p_x and repeat
Example stream: (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)
Sampled point: (4, 4)

30
Eliminate-Points algorithm (example, step 2)
Stream: (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)
The sampled point (4, 4) is replaced by (3, 4), which dominates it

33
Eliminate-Points algorithm (example, step 2 continued)
Stream: (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)
The current point (3, 4) is replaced by (3, 3), which dominates it
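The replacement and deletion steps of Eliminate-Points can be sketched in Python as follows. This is a simplified illustration rather than the authors' implementation: the function name `eliminate_points` is hypothetical, the sampled points are passed in explicitly so the run is reproducible, and the stream is held in a list instead of being read from disk.

```python
def dominates(a, b):
    """True if a dominates b: a_i <= b_i for all i and a_i < b_i for some i."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def eliminate_points(stream, samples):
    """One round of Eliminate-Points, given the points sampled in the first pass."""
    samples = list(samples)
    # Second pass: replace each sample by any stream point that dominates it.
    for q in stream:
        for i, p in enumerate(samples):
            if dominates(q, p):
                samples[i] = q
    # Third pass: delete the surviving samples and every point they dominate.
    found = set(samples)
    survivors = [q for q in stream
                 if q not in found and not any(dominates(s, q) for s in found)]
    return sorted(found), survivors


stream = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
found, survivors = eliminate_points(stream, [(4, 4)])
print(found, survivors)  # [(3, 3)] [(1, 5)]
```

With the sample (4, 4) from the slides, the scan replaces it first by (3, 4) and then by (3, 3); deleting (3, 3) and everything it dominates leaves only (1, 5) for the next round.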

41
Analysis
Theorem: The Eliminate-Points algorithm deletes at least n/2 points with high probability

42
Analysis
Draw trees: Each point points to its first dominating point
Stream: (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)
[Figure: forest over the nodes (1, 5), (3, 3), (3, 4), (4, 3), (4, 4), (4, 5)]

44
Analysis
Draw trees: Each point points to its first dominating point
Stream: (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)
[Figure: forest over the nodes (1, 5), (3, 3), (3, 4), (4, 3), (4, 4), (4, 5)]
Note: There will be m trees, each rooted at a skyline point

45
Analysis
Draw trees: Each point points to its first dominating point
[Figure: the same forest, with the sampled point (4, 4) marked]

46
Analysis
Draw trees: Each point points to its first dominating point
[Figure: the same forest, with the point (3, 3) marked]

47
Analysis
Claim: Every tree in which some element is sampled will be deleted
[Figure: forest for the stream (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4), with the sampled point (4, 4) and the point (3, 3) marked]

48
Analysis
There are m trees, each rooted at a skyline point
[Figure: trees labeled 1, 2, …, m-1, m]

50
Analysis
A big tree has a bigger chance of being sampled … and deleted
[Figure: trees labeled 1, 2, …, m-1, m]

51
Analysis
If enough points are sampled, every tree that is big enough will be deleted
[Figure: trees labeled 1, 2, …, m-1, m]

52
Analysis
Lemma: With high probability, all trees of size at least n/(2m) are deleted
Hence we delete at least n/2 points in total
[Figure: trees labeled 1, 2, …, m-1, m]
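The slides state the lemma without a calculation. One way to see why the sample size x = 2m ln(mn log n) is enough is the following sketch, which is a reconstruction and not taken from the talk. It assumes that the x samples are drawn uniformly and independently, and that a tree is deleted entirely once any of its points is sampled (the claim above).

```latex
% Assumption: x = 2m \ln(mn \log n) uniform, independent samples.
% T is any tree with |T| \ge n/(2m), so each sample hits T with probability \ge 1/(2m).
\begin{align*}
\Pr[\text{no sample lands in } T]
  &\le \left(1 - \frac{1}{2m}\right)^{x}
   \le e^{-x/(2m)}
   = e^{-\ln(mn \log n)}
   = \frac{1}{mn \log n}, \\
\Pr[\text{some tree of size} \ge n/(2m) \text{ is missed}]
  &\le m \cdot \frac{1}{mn \log n}
   = \frac{1}{n \log n}, \\
\#\{\text{points in trees of size} < n/(2m)\}
  &< m \cdot \frac{n}{2m}
   = \frac{n}{2}.
\end{align*}
```

So with probability at least 1 - 1/(n log n) only small trees survive, and since they hold fewer than n/2 points in total, at least n/2 points are deleted.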

53
Extending to RAND
Recall: If we know m, then we can delete n/2 points in 3 passes
If m is known, we can find the skyline in O(log n) passes with high probability
– We delete n/2 points every 3 passes
m is not known
– Guess m by doubling trick
– Additional O(log m) passes
Fixed-window case
– Memory space is limited
Random I/Os, sequential I/Os and number of comparisons have to be analyzed separately
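The doubling trick described above can be sketched as a loop around Eliminate-Points. The Python below is an illustrative reading of the slides, not the authors' code. The names `sample_size`, `eliminate_round` and `rand_skyline` are hypothetical, and several details are assumptions: base-2 logarithm inside the sample size, sampling with replacement, the rule "double the guess when a round removes fewer than half of the remaining points", and an in-memory list instead of a disk stream.

```python
import math
import random


def dominates(a, b):
    """True if a dominates b: a_i <= b_i for all i and a_i < b_i for some i."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def sample_size(m, n):
    """x = 2m ln(mn log n); log base 2 and the floor of 1 are assumptions."""
    inner = m * n * max(1.0, math.log2(n))
    return max(1, math.ceil(2 * m * math.log(inner)))


def eliminate_round(points, m, rng):
    """One three-pass round of Eliminate-Points with guessed skyline size m."""
    samples = [rng.choice(points) for _ in range(sample_size(m, len(points)))]
    for q in points:  # replace each sample by a point dominating it
        for i, p in enumerate(samples):
            if dominates(q, p):
                samples[i] = q
    found = set(samples)
    rest = [q for q in points
            if q not in found and not any(dominates(s, q) for s in found)]
    return found, rest


def rand_skyline(points, seed=0):
    """Repeat rounds until no points remain, doubling the guess for m as needed."""
    rng = random.Random(seed)
    remaining, skyline, guess = list(points), set(), 1
    while remaining:
        found, rest = eliminate_round(remaining, guess, rng)
        skyline |= found
        if len(rest) > len(remaining) / 2:
            guess *= 2  # too few deletions: the guess for m was probably too small
        remaining = rest
    return sorted(skyline)


pts = [(1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4)]
print(rand_skyline(pts))  # [(1, 5), (3, 3)]
```

In this sketch the randomness only affects how many rounds are needed: every point a round outputs is undominated among the remaining points, so the final answer is always the exact skyline.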

55
Theory
RAND: Almost optimal multi-pass streaming algorithm for skyline
- O(log n) passes & O(m) space, where n = # of points and m = skyline size
- 1 pass needs Ω(n) space
Algorithms comparison (w = window (memory) size)
Algorithm | Random I/Os | Sequential I/Os | Comparisons
BNL(w) | (min{w, n/w}) | (min{w, n^2/w}) | (d·min{wmn, n^2})
LESS(w) | (n log_w(n/w)) | (mn/w) | (dmn + n log n)
RAND(w) | O(m log(n/w)) | O(mn/w) | O(dmn)

56
Experiment
RAND vs. BNL & LESS, in the average case and in the worst case
We try several datasets from the literature: Correlated, Anti-correlated, Independent, Island, House, NBA, Color

57
Experimental Results (RAND vs. BNL & LESS)
Average case
- No clear winner between BNL and LESS
- RAND is always close to the winner

58
Experimental Results (RAND vs. BNL & LESS)
Worse: After sorting by decreasing first coordinate
- RAND is the most robust and usually fastest

59
Experimental Results (RAND vs. BNL & LESS)
Even worse: After sorting by entropy

60
Summary
[Figure: recap of the talk: the disk read as a stream, random sampling over the trees 1, 2, …, m-1, m, the RAND algorithm, and the experiments comparing RAND with BNL & LESS in the average case and the worst case]

61
Summary
Extensions
- Distributed skyline algorithm
- Derandomize the algorithm for the 2D case
- Skyline for partially ordered sets (posets)
Open problems
- Develop an algorithm on the Parallel Disk Model (PDM) and the Cache Oblivious model
- Extend the techniques to pre-processing algorithms
- Is O(log n) passes the best possible?

62
Thank you

63
Appendix

64
Charts for average case

66
The lower bound
Theorem: Any randomized one-pass algorithm with space at most n/2 succeeds with probability at most 1/2
Proof
- Random unique survivor
- 2 points come at the end
- If space ≤ n/2, the algorithm fails whenever it did not store the survivor in memory

67
Proof of Claim

68
Proof of Claim
Claim: Every tree in which some element is sampled will be deleted
[Figure: forest for the stream (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4), with the sampled point (4, 4) and the point (3, 3) marked]

69
Analysis
Draw trees: Each point points to its first dominating point
[Figure: forest for the stream (1, 5), (3, 4), (4, 5), (4, 3), (3, 3), (4, 4), with the sampled point (4, 4) marked]

70
Analysis
Draw trees: Each point points to its first dominating point
[Figure: the same forest; the sampled point (4, 4) is replaced by (3, 4)]

71
Analysis
Draw trees: Each point points to its first dominating point
[Figure: the same forest; the current point (3, 4) is replaced by (3, 3)]
