Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor.


1 Optimal Aggregation Algorithms for Middleware By Ronald Fagin, Amnon Lotem, and Moni Naor

2 Overview
• Databases and data types
• Fagin’s Algorithm
• Threshold Algorithm
• Advantages

3 Multimedia vs String
• Early databases
• Modern and Middleware databases
• “fuzzy” attributes
• Querying a database (x, g)
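As a rough illustration of the (x, g) idea, the sketch below reads each pair as an object x with a grade g between 0 and 1, and stores one graded list per fuzzy attribute. The names `GRADES`, `sorted_list`, and `random_access` are hypothetical, and the grade values are borrowed from the R1, R2, R3 example used later in the deck.

```python
# Hypothetical representation: one dict of object -> grade per attribute.
# Grade values are taken from the deck's R1/R2/R3 example.
GRADES = {
    "R1": {"X1": 1.0, "X2": 0.8, "X3": 0.5, "X4": 0.3, "X5": 0.1},
    "R2": {"X2": 0.8, "X3": 0.7, "X1": 0.3, "X4": 0.2, "X5": 0.1},
    "R3": {"X4": 0.8, "X3": 0.6, "X1": 0.2, "X5": 0.1, "X2": 0.0},
}


def sorted_list(attribute):
    """Sequential (sorted) access: (x, g) pairs, highest grade first."""
    return sorted(GRADES[attribute].items(), key=lambda pair: pair[1], reverse=True)


def random_access(attribute, obj):
    """Random access: look up the grade of one object in one list."""
    return GRADES[attribute][obj]


print(sorted_list("R3")[:2])      # [('X4', 0.8), ('X3', 0.6)]
print(random_access("R2", "X1"))  # 0.3
```

The two access modes shown here (reading a list in grade order, and looking up one object directly) are the ones the later slides refer to as sequential access and random access.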


7 Naïve Algorithm
• Find the top 2 objects

R1         R2         R3
X1  1      X2  0.8    X4  0.8
X2  0.8    X3  0.7    X3  0.6
X3  0.5    X1  0.3    X1  0.2
X4  0.3    X4  0.2    X5  0.1
X5  0.1    X5  0.1    X2  0

8-13 Naïve Algorithm (lists R1, R2, R3 as on slide 7)
• Compute the score of every object, one object at a time; each score is the sum of the object's grades in R1, R2, and R3

X1  1.5  (1 + 0.3 + 0.2)
X2  1.6  (0.8 + 0.8 + 0)
X3  1.8  (0.5 + 0.7 + 0.6)
X4  1.3  (0.3 + 0.2 + 0.8)
X5  0.3  (0.1 + 0.1 + 0.1)

14 Naïve Algorithm
• Sort the objects by score

X3  1.8
X2  1.6
X1  1.5
X4  1.3
X5  0.3

Top-2 objects: X3, X2
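The naïve walkthrough above can be sketched in a few lines of Python. This is only a minimal sketch: the names `LISTS` and `naive_top_k` are hypothetical, and the aggregation function is assumed to be a plain sum, which is what the scores on the slides imply.

```python
from collections import defaultdict

# The three sorted lists from slide 7, as (object, grade) pairs.
LISTS = {
    "R1": [("X1", 1.0), ("X2", 0.8), ("X3", 0.5), ("X4", 0.3), ("X5", 0.1)],
    "R2": [("X2", 0.8), ("X3", 0.7), ("X1", 0.3), ("X4", 0.2), ("X5", 0.1)],
    "R3": [("X4", 0.8), ("X3", 0.6), ("X1", 0.2), ("X5", 0.1), ("X2", 0.0)],
}


def naive_top_k(lists, k):
    """Read every entry of every list, sum the grades, sort, keep the top k."""
    scores = defaultdict(float)
    for entries in lists.values():
        for obj, grade in entries:
            scores[obj] += grade  # assumption: aggregation is a sum
    # Round only to hide floating-point noise such as 1.7999999999999998.
    ranked = sorted(
        ((obj, round(score, 6)) for obj, score in scores.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]


print(naive_top_k(LISTS, 2))  # [('X3', 1.8), ('X2', 1.6)]
```

Note that the buffer `scores` holds one entry per object in the database, and every list is read to the end.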

15 Fagin’s Algorithm (lists R1, R2, R3 as on slide 7)
• Sequential access to all lists in parallel until k objects have been seen in every list
• Perform random access to fetch the missing grades of every object seen
• Compute the overall grade of each seen object and return the top k

16-18 Fagin’s Algorithm (lists R1, R2, R3 as on slide 7)
• Sequential access, depth 1: X1 in R1, X2 in R2, X4 in R3
• Sequential access, depth 2: X2 in R1, X3 in R2, X3 in R3
• Sequential access, depth 3: X3 in R1, X1 in R2, X1 in R3
• Stop here: k = 2, and X1 and X3 have now been seen in all lists

19 Fagin’s Algorithm
• Perform random accesses to obtain the missing grades of all seen objects (X1, X2, X3, X4): X2 in R3, and X4 in R1 and R2

20 Fagin’s Algorithm
• Compute the score of every seen object and return the top k

X3  1.8
X2  1.6
X1  1.5
X4  1.3

Top-2 objects: X3, X2
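The three phases of Fagin’s Algorithm shown above can be sketched as follows. This is a simplified illustration, not the authors’ code: `LISTS` and `fagin_top_k` are hypothetical names, the aggregation is assumed to be a sum, every object is assumed to appear in every list, and random access is modelled as a dictionary lookup that fetches all of an object's grades rather than only the missing ones.

```python
# The three sorted lists from slide 7, as (object, grade) pairs.
LISTS = {
    "R1": [("X1", 1.0), ("X2", 0.8), ("X3", 0.5), ("X4", 0.3), ("X5", 0.1)],
    "R2": [("X2", 0.8), ("X3", 0.7), ("X1", 0.3), ("X4", 0.2), ("X5", 0.1)],
    "R3": [("X4", 0.8), ("X3", 0.6), ("X1", 0.2), ("X5", 0.1), ("X2", 0.0)],
}


def fagin_top_k(lists, k):
    names = list(lists)
    # Index used to model random access (object -> grade, per list).
    lookup = {name: dict(entries) for name, entries in lists.items()}
    seen_in = {}  # object -> set of lists in which it was seen sequentially
    depth = 0
    max_depth = min(len(entries) for entries in lists.values())

    # Phase 1: sequential access in parallel until k objects
    # have been seen in every list.
    while depth < max_depth:
        for name in names:
            obj, _ = lists[name][depth]
            seen_in.setdefault(obj, set()).add(name)
        depth += 1
        fully_seen = sum(1 for found in seen_in.values() if len(found) == len(names))
        if fully_seen >= k:
            break

    # Phases 2 and 3: random access for every seen object, then score and rank.
    scores = {
        obj: round(sum(lookup[name][obj] for name in names), 6)
        for obj in seen_in
    }
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k], depth


print(fagin_top_k(LISTS, 2))  # ([('X3', 1.8), ('X2', 1.6)], 3)
```

The second value returned is the depth at which sequential access stopped, which is 3 for this example, matching the walkthrough. The buffer `seen_in` grows with every new object encountered, which is the buffer cost the comparison slides mention.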

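21 Threshold Algorithm
• Sequential access to all lists in parallel
• Define the threshold value τ as the aggregate of the grades last seen in each list
• For every object seen, use random access to find its other grades and compute its score
• Maintain a list of the top k objects seen so far
• Continue until the top k objects all have a score >= τ
• Output the graded set of the top k objects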

22-24 Threshold Algorithm (depth 1; lists R1, R2, R3 as on slide 7)
• Sequential access: X1 (1) in R1, X2 (0.8) in R2, X4 (0.8) in R3
• Set τ to be the aggregate of the grades seen in this access: τ = 1 + 0.8 + 0.8 = 2.6
• Random access and compute scores: X1 1.5, X2 1.6, X4 1.3
• Top-k so far (k = 2): X2 1.6, X1 1.5; both are below τ, so continue
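To make the threshold step concrete, the short sketch below computes τ at every depth of the example lists. It assumes the aggregation is a sum (as the value 2.6 above implies); `LISTS` and `thresholds_by_depth` are hypothetical names.

```python
# The three sorted lists from slide 7, as (object, grade) pairs.
LISTS = {
    "R1": [("X1", 1.0), ("X2", 0.8), ("X3", 0.5), ("X4", 0.3), ("X5", 0.1)],
    "R2": [("X2", 0.8), ("X3", 0.7), ("X1", 0.3), ("X4", 0.2), ("X5", 0.1)],
    "R3": [("X4", 0.8), ("X3", 0.6), ("X1", 0.2), ("X5", 0.1), ("X2", 0.0)],
}


def thresholds_by_depth(lists, aggregate=sum):
    """Aggregate the grades found at the same depth of every list."""
    rows = zip(*lists.values())  # one row per depth, one entry per list
    # Round only to hide floating-point noise in the sums.
    return [round(aggregate(grade for _, grade in row), 6) for row in rows]


print(thresholds_by_depth(LISTS))  # [2.6, 2.1, 1.0, 0.6, 0.2]
```

Because each list is sorted by grade, τ can only decrease as the depth grows, so it is an upper bound on the score of any object not yet seen.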

25-27 Threshold Algorithm (depth 2; lists R1, R2, R3 as on slide 7)
• Sequential access: X2 (0.8) in R1, X3 (0.7) in R2, X3 (0.6) in R3
• Set τ to be the aggregate of the grades seen in this access: τ = 0.8 + 0.7 + 0.6 = 2.1
• Random access and compute scores: X3 1.8 (already computed: X1 1.5, X2 1.6, X4 1.3)
• Top-k so far (k = 2): X3 1.8, X2 1.6; both are below τ, so continue

28-30 Threshold Algorithm (depth 3; lists R1, R2, R3 as on slide 7)
• Sequential access: X3 (0.5) in R1, X1 (0.3) in R2, X1 (0.2) in R3
• Set τ to be the aggregate of the grades seen in this access: τ = 0.5 + 0.3 + 0.2 = 1
• Scores: X1 1.5, X2 1.6, X4 1.3, X3 1.8
• Stop when the top k scores are all >= τ: X3 1.8 and X2 1.6 are both >= 1, so the top-2 objects are X3 and X2
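The three rounds above can be sketched in Python as follows. As with the earlier examples, this is an illustrative sketch under stated assumptions rather than the authors’ implementation: `LISTS` and `threshold_top_k` are hypothetical names, the aggregation is a sum, every object appears in every list, and random access is modelled as a dictionary lookup.

```python
import heapq

# The three sorted lists from slide 7, as (object, grade) pairs.
LISTS = {
    "R1": [("X1", 1.0), ("X2", 0.8), ("X3", 0.5), ("X4", 0.3), ("X5", 0.1)],
    "R2": [("X2", 0.8), ("X3", 0.7), ("X1", 0.3), ("X4", 0.2), ("X5", 0.1)],
    "R3": [("X4", 0.8), ("X3", 0.6), ("X1", 0.2), ("X5", 0.1), ("X2", 0.0)],
}


def threshold_top_k(lists, k):
    names = list(lists)
    lookup = {name: dict(entries) for name, entries in lists.items()}
    top = []        # min-heap of (score, object); never holds more than k items
    in_top = set()  # objects currently in the heap
    tau = None
    depth = 0
    max_depth = min(len(entries) for entries in lists.values())

    while depth < max_depth:
        row = [lists[name][depth] for name in names]  # sequential access
        for obj, _ in row:
            if obj in in_top:
                continue
            # Random access to every list, then aggregate (assumed: sum).
            score = round(sum(lookup[name][obj] for name in names), 6)
            if len(top) < k:
                heapq.heappush(top, (score, obj))
                in_top.add(obj)
            elif score > top[0][0]:
                _, evicted = heapq.heapreplace(top, (score, obj))
                in_top.discard(evicted)
                in_top.add(obj)
        # Threshold: aggregate of the grades seen at this depth.
        tau = round(sum(grade for _, grade in row), 6)
        depth += 1
        if len(top) == k and top[0][0] >= tau:
            break  # the k-th best score is at least tau, so nothing unseen can win

    ranked = sorted(((obj, score) for score, obj in top),
                    key=lambda pair: pair[1], reverse=True)
    return ranked, depth, tau


print(threshold_top_k(LISTS, 2))  # ([('X3', 1.8), ('X2', 1.6)], 3, 1.0)
```

Only the k best objects are buffered here; an object that falls out of the heap is simply re-scored if it shows up again, which trades extra random accesses for the bounded buffer.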

31 Comparison
• Naïve Algorithm
  - Buffer space required = number of objects
  - The cost is linear in the size of the database, since every entry of every list is read
  - Not efficient for large databases

32 Comparison
• Fagin’s Algorithm
  - Large buffer space required, since every object seen is buffered
  - Random access is done at the end
  - Optimal with high probability in the worst case, but only for some monotone aggregation functions

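33 Comparison
• The Threshold Algorithm
  - Buffer space bounded by k
  - No unseen object can have a score above τ
  - Requires no more sequential accesses than Fagin’s Algorithm
  - Instance optimal for every monotone aggregation function, over every database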

34 Sources
• http://alumni.cs.ucr.edu/~skulhari/Top-k-Query.pdf
• http://researcher.watson.ibm.com/researcher/files/us-fagin/jcss03.pdf

