1 Presented by Suresh Barukula (2011csz8090)

2
• Top-k query processing means finding the k objects that have the highest overall grades.
• A query in a multimedia database combines different graded attributes through an aggregation function.
• The overall grade of each object is calculated with the aggregation function, and the k objects with the highest overall grades are returned.

3
In general, multimedia databases contain fuzzy data.
For example, suppose we want to retrieve all red objects. What can we say about the object shown on the slide: is it red or not? We cannot give a yes-or-no answer, but we can grade the object by its amount of redness.
Attribute values are typically graded in the interval [0, 1].

4
• FA: Fagin's Algorithm
• TA: Threshold Algorithm
• TA_Z algorithm
• NRA: No Random Access
• CA: Combined Algorithm

5
• $N$: number of objects
• $m$: number of attributes
• $x_i \in [0, 1]$: the grade of an object on attribute $i$
• The database consists of $m$ sorted lists $L_1, \dots, L_m$, each of length $N$. We refer to $L_i$ as list $i$.
• Each entry of $L_i$ has the form $(R, x_i)$, where $x_i$ is the $i$-th field of object $R$.
• Each list $L_i$ is sorted in descending order of the $x_i$ value.

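6
• Sorted access: the middleware reads a list sequentially from the top, obtaining one (object, grade) entry at a time.
• Random access: the middleware requests the grade of a given object in a given list directly.
• The middleware cost is $s C_S + r C_R$, where $s$ is the number of sorted accesses, $r$ is the number of random accesses, $C_S$ is the cost of one sorted access, and $C_R$ is the cost of one random access.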
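The cost formula can be sketched as a one-line helper. The function name `middleware_cost` and the example figures below are illustrative assumptions, not values taken from the slides.

```python
def middleware_cost(sorted_accesses: int, random_accesses: int,
                    cost_sorted: float = 1.0, cost_random: float = 1.0) -> float:
    """Return s * C_S + r * C_R for the given access counts and unit costs."""
    return sorted_accesses * cost_sorted + random_accesses * cost_random


# Illustrative figures: 6 sorted accesses, 2 random accesses,
# with a random access assumed to cost 5 times a sorted access.
example_cost = middleware_cost(6, 2, cost_sorted=1.0, cost_random=5.0)
print(example_cost)
```

Keeping the two unit costs as parameters makes it easy to see how the ranking of algorithms can change when random access becomes expensive.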

7 Example: Simple Database Model
Database (N objects, M attributes):
Object ID   Attribute 1   Attribute 2
a           0.9           0.85
b           0.8           0.7
c           0.72          0.2
d           0.6           0.9
...         ...           ...
Sorted L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) ...
Sorted L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) ...
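A minimal sketch of this database model in Python: the grades of the four example objects are stored in a dictionary, and one descending list is derived per attribute. The names `grades` and `build_sorted_lists` are assumptions made for illustration.

```python
# Grades of the four example objects: object -> (Attribute 1, Attribute 2).
grades = {
    "a": (0.9, 0.85),
    "b": (0.8, 0.7),
    "c": (0.72, 0.2),
    "d": (0.6, 0.9),
}


def build_sorted_lists(grades):
    """Build one list per attribute, sorted by grade in descending order."""
    m = len(next(iter(grades.values())))  # number of attributes
    return [
        sorted(((obj, vals[i]) for obj, vals in grades.items()),
               key=lambda entry: entry[1], reverse=True)
        for i in range(m)
    ]


L1, L2 = build_sorted_lists(grades)
print(L1)
print(L2)
```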

8 Example: Simple Query
Find the top 2 (k = 2) objects for the following query executed on the middleware: A1 & A2 (e.g. color = red & shape = round).
Submitting A1 & A2 as a query makes the middleware combine the grades of A1 and A2 by min(A1, A2).
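For reference, the answer to this query can be computed naively by grading every object and keeping the best k. This is only a baseline sketch that reads the whole database (the algorithms in the presentation aim to avoid that); the name `naive_top_k` is an assumption.

```python
import heapq

# Grades of the four example objects: object -> (A1, A2).
grades = {
    "a": (0.9, 0.85),
    "b": (0.8, 0.7),
    "c": (0.72, 0.2),
    "d": (0.6, 0.9),
}


def naive_top_k(grades, k, aggregate=min):
    """Grade every object with the aggregation function and keep the k best."""
    scored = ((obj, aggregate(vals)) for obj, vals in grades.items())
    return heapq.nlargest(k, scored, key=lambda entry: entry[1])


top2 = naive_top_k(grades, k=2)
print(top2)
```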

9 Example: Fagin's Algorithm
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) ...
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) ...
STEP 1: Read attributes from every sorted list. Stop when k objects have been seen in common in all lists. Here the scan stops after three entries per list, when a and b have each been seen in both lists.
ID   A1     A2     min(A1, A2)
a    0.9    0.85
b    0.8    0.7
c    0.72
d           0.9

10 Example: Fagin's Algorithm
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) ...
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) ...
STEP 2: Use random access to find the missing grades (A2 of c and A1 of d).
ID   A1     A2     min(A1, A2)
a    0.9    0.85
b    0.8    0.7
c    0.72   0.2
d    0.6    0.9

11 Example: Fagin's Algorithm
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) ...
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) ...
STEP 3: Compute the overall grades of the seen objects. Return the k highest graded objects.
ID   A1     A2     min(A1, A2)
a    0.9    0.85   0.85
b    0.8    0.7    0.7
c    0.72   0.2    0.2
d    0.6    0.9    0.6
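The three steps above can be sketched as follows. This is an illustrative sketch, not a reference implementation: random access is simulated by a dictionary lookup, and the function name `fagin_top_k` and its return shape (answers plus access counts) are assumptions.

```python
grades = {"a": (0.9, 0.85), "b": (0.8, 0.7), "c": (0.72, 0.2), "d": (0.6, 0.9)}
lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)],   # L1
    [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)],   # L2
]


def fagin_top_k(lists, grades, k, aggregate=min):
    """Return (top-k answers, sorted access count, random access count)."""
    m = len(lists)
    seen_in = {}          # object -> indices of lists where it was seen
    sorted_accesses = 0
    # Step 1: parallel sorted access until k objects are seen in all lists.
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):
            obj, _ = lst[depth]
            seen_in.setdefault(obj, set()).add(i)
            sorted_accesses += 1
        if sum(1 for s in seen_in.values() if len(s) == m) >= k:
            break
    # Step 2: one random access for every grade still missing.
    random_accesses = sum(m - len(s) for s in seen_in.values())
    # Step 3: grade all seen objects and return the k best.
    scored = [(obj, aggregate(grades[obj])) for obj in seen_in]
    scored.sort(key=lambda entry: entry[1], reverse=True)
    return scored[:k], sorted_accesses, random_accesses


answers, n_sorted, n_random = fagin_top_k(lists, grades, k=2)
print(answers, n_sorted, n_random)
```

On the example data the scan reads three entries from each list and needs two random accesses (A2 of c and A1 of d), matching the tables in the slides.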

12 Threshold Algorithm (TA)
• Read all grades of an object as soon as it is seen under sorted access.
• There is no need to wait until the lists yield k common objects.
• Do sorted access (and the corresponding random accesses) until the top k answers have been seen.
• How do we know that the grades of the seen objects are higher than the grades of the unseen objects? Predict the maximum possible grade of any unseen object: the threshold value.
L1: a: 0.9, b: 0.8, c: 0.72, ...
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2, ...
[Figure: the lists L1 and L2 split into "Seen" and "Possibly unseen" regions at the threshold value; further entries shown: f: 0.65, d: 0.6, f: 0.6]
T = min(0.72, 0.7) = 0.7

13 Example: Threshold Algorithm
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) ...
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) ...
Step 1:
- Parallel sorted access to each list.
- For each object seen: get all its grades by random access, determine min(A1, A2), and keep the object in the buffer if it is among the 2 highest seen.
ID   A1    A2     min(A1, A2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6

14 Example: Threshold Algorithm
L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6, ...
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2, ...
Step 2:
- Determine the threshold value from the grades last seen under sorted access in each list: T = min(last grade in L1, last grade in L2).
- T = min(0.9, 0.9) = 0.9
- Are there 2 objects with overall grade ≥ T? If so, stop; otherwise move to the next entry position in the sorted lists and repeat step 1. Here the buffered grades 0.85 and 0.6 are both below 0.9, so the algorithm continues.
ID   A1    A2     min(A1, A2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6

15 Example: Threshold Algorithm
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) ...
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) ...
Step 1 (again):
- Parallel sorted access to each list.
- For each object seen: get all its grades by random access, determine min(A1, A2), and keep the object in the buffer if it is among the 2 highest seen.
ID   A1    A2     min(A1, A2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6
b    0.8   0.7    0.7

16 Example: Threshold Algorithm
L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6, ...
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2, ...
Step 2 (again):
- Determine the threshold value from the grades last seen under sorted access in each list: T = min(last grade in L1, last grade in L2).
- T = min(0.8, 0.85) = 0.8
- Are there 2 objects with overall grade ≥ T? If so, stop; otherwise move to the next entry position in the sorted lists and repeat step 1. Here a (0.85) passes the threshold but b (0.7) does not, so the algorithm continues.
ID   A1    A2     min(A1, A2)
a    0.9   0.85   0.85
b    0.8   0.7    0.7

17 Example: Threshold Algorithm
L1: a: 0.9, b: 0.8, c: 0.72, d: 0.6, ...
L2: d: 0.9, a: 0.85, b: 0.7, c: 0.2, ...
Situation at the stopping condition:
T = min(0.72, 0.7) = 0.7
Both buffered objects now have an overall grade ≥ T, so TA stops and returns a and b.
ID   A1    A2     min(A1, A2)
a    0.9   0.85   0.85
b    0.8   0.7    0.7
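The walkthrough above can be sketched in a few lines of Python. This is a hedged sketch under simplifying assumptions: random access is simulated by a dictionary lookup, the buffer holds at most k objects, and the name `threshold_top_k` and its return shape are chosen for illustration only.

```python
grades = {"a": (0.9, 0.85), "b": (0.8, 0.7), "c": (0.72, 0.2), "d": (0.6, 0.9)}
lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)],   # L1
    [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)],   # L2
]


def threshold_top_k(lists, grades, k, aggregate=min):
    """Return (top-k answers, rounds of sorted access, final threshold)."""
    buffer = {}           # at most k objects: object -> overall grade
    threshold = None
    for depth in range(len(lists[0])):
        last_seen = []
        # Step 1: parallel sorted access, then random access for all grades.
        for lst in lists:
            obj, grade = lst[depth]
            last_seen.append(grade)
            buffer[obj] = aggregate(grades[obj])
            if len(buffer) > k:
                # Keep only the k highest graded objects seen so far.
                del buffer[min(buffer, key=buffer.get)]
        # Step 2: threshold from the grades last seen in each list.
        threshold = aggregate(last_seen)
        if len(buffer) == k and min(buffer.values()) >= threshold:
            break
    ranked = sorted(buffer.items(), key=lambda entry: entry[1], reverse=True)
    return ranked, depth + 1, threshold


answers, rounds, final_threshold = threshold_top_k(lists, grades, k=2)
print(answers, rounds, final_threshold)
```

On the example data the thresholds are 0.9, 0.8 and 0.7 in successive rounds, and the loop halts in the third round with a and b in the buffer.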

18
• The middleware cost of FA is the same no matter what the aggregation function is.
• TA stops at least as early as FA.
• TA may perform more random accesses than FA.
• TA requires only bounded buffers.
• TA can be stopped early (θ-approximation).
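The slide does not spell out the early-stopping rule. One common formulation, assumed here, relaxes the TA stopping test by an approximation factor θ > 1: halt as soon as k buffered objects have an overall grade of at least T/θ. The helper name `can_stop` and the example values are illustrative.

```python
def can_stop(buffer_grades, k, threshold, theta=1.0):
    """Relaxed TA stopping test: k grades must reach threshold / theta.

    theta == 1.0 gives the exact stopping rule; theta > 1.0 allows an
    earlier stop at the price of an approximate answer.
    """
    if theta < 1.0:
        raise ValueError("theta must be at least 1")
    if len(buffer_grades) < k:
        return False
    kth_best = sorted(buffer_grades, reverse=True)[k - 1]
    return kth_best >= threshold / theta


# Buffer grades 0.85 and 0.7 against a threshold of 0.8 (k = 2).
exact = can_stop([0.85, 0.7], k=2, threshold=0.8)
relaxed = can_stop([0.85, 0.7], k=2, threshold=0.8, theta=1.2)
print(exact, relaxed)
```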

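19 Concept of instance optimality
• $\mathbf{A}$ = a class of algorithms; $A \in \mathbf{A}$ is an algorithm.
• $\mathbf{D}$ = a class of legal inputs to the algorithms (databases); $D \in \mathbf{D}$ is a database.
• $\mathrm{Cost}(A, D)$ = the middleware cost of running algorithm $A$ over database $D$.
Algorithm $B$ is instance optimal over $\mathbf{A}$ and $\mathbf{D}$ if $B \in \mathbf{A}$ and
$$\mathrm{Cost}(B, D) = O(\mathrm{Cost}(A, D)) \quad \forall A \in \mathbf{A},\ \forall D \in \mathbf{D},$$
which means that there are constants $c$ and $c'$ such that
$$\mathrm{Cost}(B, D) \le c \cdot \mathrm{Cost}(A, D) + c' \quad \forall A \in \mathbf{A},\ \forall D \in \mathbf{D}.$$
The constant $c$ is called the optimality ratio.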

20
• Theorem: If the aggregation function t is monotone, TA correctly finds the top k answers.
• Theorem: TA is instance optimal for every monotone aggregation function, over every database, provided that algorithms making wild guesses are excluded.

21
L1: (a, 0.9) (b, 0.8) (c, 0.72) (d, 0.6) ...
L2: (d, 0.9) (a, 0.85) (b, 0.7) (c, 0.2) ...
L3: (b, 0.6) (a, 0.83) (d, 0.61) (c, 0.9) ... (not in sorted order; contributes the value 1 to the threshold)
T = min(0.72, 0.7, 1) = 0.7
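The threshold on this slide uses the value 1 for the third list. Assuming this is because that list cannot be read by sorted access, so no "last seen" grade exists for it, the computation can be sketched as below. The helper name `threshold_with_missing` and the use of `None` to mark such a list are assumptions for illustration.

```python
def threshold_with_missing(last_seen, aggregate=min, max_grade=1.0):
    """Compute the threshold, substituting max_grade for lists that have
    no last-seen grade (marked with None)."""
    return aggregate([max_grade if g is None else g for g in last_seen])


# Last grades seen in L1 and L2; L3 has none, so it contributes 1.
T = threshold_with_missing([0.72, 0.7, None])
print(T)
```

Substituting the maximum grade is the safe choice: it never underestimates what an unseen object could still score in that list.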

22
• Can we determine the rank of an object without seeing all of its grades?
• The essence of this algorithm (NRA) is to estimate the rank using the best and worst possible values of each object's overall grade.
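A hedged sketch of how such bounds can be computed, assuming the usual formulation: the worst possible overall grade replaces every unseen grade by 0, and the best possible overall grade replaces it by the last grade seen under sorted access in that list (an unseen grade cannot exceed it). The name `grade_bounds` and the example state are illustrative.

```python
def grade_bounds(known, bottoms, aggregate=min):
    """Return (worst, best) possible overall grade of a partially seen object.

    known   -- dict: list index -> grade already seen for this object
    bottoms -- last grade seen under sorted access in each list
    """
    m = len(bottoms)
    worst = aggregate([known.get(i, 0.0) for i in range(m)])
    best = aggregate([known.get(i, bottoms[i]) for i in range(m)])
    return worst, best


# State after two rounds of sorted access on the example lists:
# L1 has shown a (0.9), b (0.8); L2 has shown d (0.9), a (0.85).
bottoms = (0.8, 0.85)
bounds_a = grade_bounds({0: 0.9, 1: 0.85}, bottoms)   # fully seen
bounds_b = grade_bounds({0: 0.8}, bottoms)            # A2 unknown
bounds_d = grade_bounds({1: 0.9}, bottoms)            # A1 unknown
print(bounds_a, bounds_b, bounds_d)
```

An object can be reported as a top-k answer once its worst possible grade is at least the best possible grade of every object outside the current top k, without ever resolving its exact grade.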

23
• CA is a merge of TA and NRA.
• The idea of CA is to run NRA, but to perform a random access step after every h steps.
• Both NRA and CA are instance optimal over all databases when the aggregation function is monotone.

24
• In this paper, the authors study a simple and elegant algorithm called TA.
• They also study variants of TA for restricted settings, such as when sorted access or random access is not available.
• They emphasize instance optimality and prove that their algorithms are instance optimal over all algorithms and all databases under normal assumptions.
• However, they do not consider the computational costs or the data structures required to implement the algorithms.
