# Comparison of parallel and random approach to a candidate list in the multifeature querying Peter Gurský Institute of Computer Science UPJŠ, Košice, Slovakia.


Peter Gurský, Institute of Computer Science, UPJŠ, Košice, Slovakia (http://klud.ics.upjs.sk/~gursky)

## 2. Multifeature querying

We want to find the top-k objects in a possibly huge set of objects with a minimal number of accesses to the sources. The ordering of the objects depends on their features and on the user's requirements over those features.

Example: The conference will be held at "address". I want to find a hotel (or cottage) that is near, cheap and new. Show me the 5 best according to my aggregation function:

$$F(\text{near}_i, \text{cheap}_i, \text{new}_i) = 2 \cdot \text{near}_i + 3 \cdot \text{cheap}_i + \text{new}_i$$
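A minimal Python sketch of the aggregation step, assuming each hotel's features are already fuzzy degrees in [0, 1]. The hotel names, their values and the helper names `aggregate` and `top_k` are hypothetical. It scores every object, which is the naive baseline that the algorithms on the following slides try to beat by reading only part of each list.

```python
import heapq


def aggregate(near: float, cheap: float, new: float) -> float:
    """Weighted sum from the slide: F = 2*near + 3*cheap + new."""
    return 2 * near + 3 * cheap + new


def top_k(objects: dict[str, tuple[float, float, float]], k: int) -> list[tuple[str, float]]:
    """Naive top-k: score every object and keep the k best.

    This reads every value of every object, which is exactly the cost
    the threshold-style algorithms try to avoid.
    """
    scored = ((aggregate(*features), name) for name, features in objects.items())
    best = heapq.nlargest(k, scored)
    return [(name, score) for score, name in best]


# Hypothetical hotels with degrees in [0, 1] for (near, cheap, new).
hotels = {
    "hotel_a": (0.9, 0.2, 0.5),
    "hotel_b": (0.4, 0.9, 0.1),
    "hotel_c": (0.7, 0.6, 0.8),
    "cottage_d": (0.1, 1.0, 0.3),
}
print(top_k(hotels, 2))
```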

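## 3. Specifying a query

What does the term "near" mean?

[Figure: user-defined function for "near"; horizontal axis: m (ticks at 100 and 1000); vertical axis: truth value tv (from 0 to 1).]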
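The slide only shows the axes, so the shape below is an assumption: a piecewise-linear membership function that is 1 up to 100 m, 0 from 1000 m on, and linear in between. The function name `near` and its parameters are hypothetical.

```python
def near(distance_m: float, full_until: float = 100.0, zero_from: float = 1000.0) -> float:
    """Hypothetical fuzzy membership for "near".

    Assumption: truth value 1 up to `full_until` metres, 0 from `zero_from`
    metres on, and linear in between (the slide only shows the axis ticks).
    """
    if distance_m <= full_until:
        return 1.0
    if distance_m >= zero_from:
        return 0.0
    return (zero_from - distance_m) / (zero_from - full_until)


for d in (50, 100, 550, 1000, 2000):
    print(d, near(d))
```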

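## 4. Model

There are m lists $L_1, \dots, L_m$ and n objects; each list holds one value for every object.

| | $L_1$ | $L_2$ | $L_3$ | $L_4$ | … | $L_m$ |
|---|---|---|---|---|---|---|
| $O_1$ | 0.85 | 0.92 | 0.11 | 1 | … | 0.3 |
| $O_2$ | 0.69 | 0.5 | 0.12 | 0.51 | … | 0.92 |
| $O_3$ | 0.7 | 0 | 0.5 | 0.6 | … | 0.1 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | | ⋮ |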
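A sketch of the model as a data structure, using the values from the first three rows of the table (the elided lists and objects are left out). The helper `build_lists` is hypothetical. Each list is sorted by value in descending order, which is the order in which sorted access delivers it.

```python
# Object-by-list matrix: columns L1, L2, L3, L4, Lm from the table above.
table = {
    "O1": [0.85, 0.92, 0.11, 1.0, 0.3],
    "O2": [0.69, 0.5, 0.12, 0.51, 0.92],
    "O3": [0.7, 0.0, 0.5, 0.6, 0.1],
}


def build_lists(table: dict[str, list[float]]) -> list[list[tuple[str, float]]]:
    """Turn the matrix into m lists of (object, value) pairs,
    each sorted by value in descending order."""
    m = len(next(iter(table.values())))
    return [
        sorted(
            ((obj, values[i]) for obj, values in table.items()),
            key=lambda entry: entry[1],
            reverse=True,
        )
        for i in range(m)
    ]


lists = build_lists(table)
print(lists[0])  # L1 in sorted-access order
```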

## 5. Two types of accesses

- Sorted (sequential) access: returns the next greatest value from the i-th list together with the name of the object.
- Random (direct) access: returns the value of a given object in the i-th list.
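A small in-memory sketch of one list offering both access types, with counters, since the number of accesses is the cost being minimised. The class `Source` and its method names are illustrative only, not an existing API.

```python
from typing import Optional, Tuple


class Source:
    """One list L_i supporting sorted and random access, with access counters."""

    def __init__(self, values: dict):
        # Pre-sort once so that sorted access can walk the list top-down.
        self._sorted = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
        self._values = dict(values)
        self._pos = 0
        self.sorted_accesses = 0
        self.random_accesses = 0

    def sorted_access(self) -> Optional[Tuple[str, float]]:
        """Return the next (object, value) pair, or None when the list is exhausted."""
        if self._pos >= len(self._sorted):
            return None
        self.sorted_accesses += 1
        entry = self._sorted[self._pos]
        self._pos += 1
        return entry

    def random_access(self, obj: str) -> float:
        """Return the value of `obj` in this list."""
        self.random_accesses += 1
        return self._values[obj]


l1 = Source({"O1": 0.85, "O2": 0.69, "O3": 0.7})
print(l1.sorted_access())
print(l1.random_access("O2"))
```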

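## 6. Threshold algorithm (Fagin 1999)

Aggregation function: $F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$.

The first round of sorted access returns 0.95 from $L_1$ (object $o_1$), 0.94 from $L_2$ ($o_2$) and 0.85 from $L_3$ ($o_3$). Random access supplies the remaining values of each object:

$$F(o_1) = 2 \cdot 0.95 + 3 \cdot 0.78 + 0.62 = 4.86$$

$$F(o_2) = 2 \cdot 0.11 + 3 \cdot 0.94 + 0.44 = 3.48$$

$$F(o_3) = 2 \cdot 0.92 + 3 \cdot 0.34 + 0.85 = 3.71$$

The threshold $\tau$ is F applied to the last values seen under sorted access in each list:

$$\tau = 2 \cdot 0.95 + 3 \cdot 0.94 + 0.85 = 5.57$$

After a further round of sorted access, which returns 0.91, 0.79 and 0.65:

$$\tau = 2 \cdot 0.91 + 3 \cdot 0.79 + 0.65 = 4.84$$

The algorithm stops as soon as the k best objects seen so far all have a grade of at least $\tau$, because no unseen object can exceed $\tau$.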
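A compact Python sketch of the Threshold Algorithm for a weighted-sum F with non-negative weights. Random access is simulated with dictionaries, all lists are assumed to hold the same objects, and one sorted access is made to every list per round. The function name and the return format are our own choices.

```python
import heapq


def threshold_algorithm(lists, weights, k):
    """Threshold Algorithm (TA) for F(x) = sum(w_i * x_i), w_i >= 0.

    lists   -- m lists of (object, value) pairs, each sorted by value descending
    weights -- the coefficients w_i
    Returns (top-k list of (score, object), total number of accesses).
    """
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # plays the role of random access
    scores = {}
    accesses = 0
    for depth in range(len(lists[0])):
        last_seen = []
        for i in range(m):  # one sorted access to every list
            obj, value = lists[i][depth]
            accesses += 1
            last_seen.append(value)
            if obj not in scores:
                # Random access to the other m - 1 lists for the full grade.
                xs = [value if j == i else lookup[j][obj] for j in range(m)]
                accesses += m - 1
                scores[obj] = sum(w * x for w, x in zip(weights, xs))
        tau = sum(w * x for w, x in zip(weights, last_seen))
        top = heapq.nlargest(k, ((s, o) for o, s in scores.items()))
        if len(top) == k and top[-1][0] >= tau:
            return top, accesses
    return heapq.nlargest(k, ((s, o) for o, s in scores.items())), accesses


# Only the three objects whose values appear on the slide.
lists = [
    [("o1", 0.95), ("o3", 0.92), ("o2", 0.11)],
    [("o2", 0.94), ("o1", 0.78), ("o3", 0.34)],
    [("o3", 0.85), ("o1", 0.62), ("o2", 0.44)],
]
print(threshold_algorithm(lists, [2, 3, 1], 1))
```

Because this toy input contains only three objects, its second round reads 0.92, 0.78 and 0.62 rather than the slide's 0.91, 0.79 and 0.65, so $\tau$ drops to about 4.80. Since $F(o_1) \approx 4.86 \ge \tau$, the sketch returns $o_1$ after two rounds and 12 accesses.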

## 7. Which list should be accessed next under sorted access?

Requirement for correctness: every object in the top-k list must satisfy

$$F(x_1, \dots, x_m) \ge \tau.$$

We therefore have two ways to obtain the top-k list faster: increase the left side (find objects with higher grades) or decrease the right side (lower the threshold $\tau$).

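## 8. ∂F/∂x·x algorithm

$F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$. The last values seen under sorted access are $x_1 = 0.81$ in $L_1$, $x_2 = 0.70$ in $L_2$ and $x_3 = 0.62$ in $L_3$. The indicator of each list is $\chi_i = \frac{\partial F}{\partial x_i} \cdot x_i$:

$$\chi_1 = 2 \cdot 0.81 = 1.62, \qquad \chi_2 = 3 \cdot 0.70 = 2.1, \qquad \chi_3 = 1 \cdot 0.62 = 0.62$$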
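A sketch of the ∂F/∂x·x indicator for a weighted sum, where $\partial F / \partial x_i$ is simply the weight $w_i$. The slide does not spell out the selection rule, so reading the list with the largest $\chi_i$ next is our assumption; the function names are hypothetical.

```python
def chi_indicators(weights, last_seen):
    """chi_i = dF/dx_i * x_i for F = sum(w_i * x_i), where dF/dx_i = w_i
    and x_i is the last value read from list L_i under sorted access."""
    return [w * x for w, x in zip(weights, last_seen)]


def choose_list_chi(weights, last_seen):
    """Assumption: access the list with the largest chi_i next
    (max() keeps the lowest index on ties)."""
    chi = chi_indicators(weights, last_seen)
    return max(range(len(chi)), key=chi.__getitem__)


chi = chi_indicators([2, 3, 1], [0.81, 0.70, 0.62])
print(chi, "-> L%d" % (choose_list_chi([2, 3, 1], [0.81, 0.70, 0.62]) + 1))
```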

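## 9. The quick-combine algorithm: ∂F/∂x(∆x) (Güntzer, Balke, Kiessling 2000)

$F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$. The indicator of list $L_i$ multiplies $\frac{\partial F}{\partial x_i}$ by the drop in value over the last p sorted accesses: in $L_1$ the value fell from 0.82 to 0.81, in $L_2$ from 0.74 to 0.70 and in $L_3$ from 0.75 to 0.62.

$$\Delta_1 = 2 \cdot (0.82 - 0.81) = 0.02, \qquad \Delta_2 = 3 \cdot (0.74 - 0.70) = 0.12, \qquad \Delta_3 = 1 \cdot (0.75 - 0.62) = 0.13$$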
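A sketch of the quick-combine indicator for a weighted sum: each list keeps the history of values read under sorted access, and the list with the largest $\Delta_i$ is read next. The function names and the history representation are our own. Since the slide shows only the two endpoint values per list, the example stores just those two values, which corresponds to p = 1 in this encoding.

```python
def quick_combine_indicators(weights, histories, p):
    """Delta_i = dF/dx_i * (x_i p accesses ago - x_i now) for F = sum(w_i * x_i).

    histories[i] holds the values read so far from L_i under sorted access,
    in reading order; it needs at least p + 1 entries.
    """
    return [w * (h[-1 - p] - h[-1]) for w, h in zip(weights, histories)]


def choose_list_quick_combine(weights, histories, p):
    """Access the list with the largest indicator next
    (max() keeps the lowest index on ties)."""
    deltas = quick_combine_indicators(weights, histories, p)
    return max(range(len(deltas)), key=deltas.__getitem__)


histories = [[0.82, 0.81], [0.74, 0.70], [0.75, 0.62]]
print(quick_combine_indicators([2, 3, 1], histories, 1))
print("-> L%d" % (choose_list_quick_combine([2, 3, 1], histories, 1) + 1))
```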

## 10. x/∆x-switch algorithm

During the evaluation, the algorithm switches at each step between the quick-combine and the ∂F/∂x·x strategy for choosing the next list, so both strategies are used. It performed best in our experiments.
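A sketch of the switching rule for a weighted sum, combining both indicators. Starting with quick-combine, alternating strictly on every step, and picking the largest indicator are assumptions; the slide only says that the algorithm switches between the two strategies in each step. The function names are hypothetical.

```python
def chi_indicators(weights, histories):
    """dF/dx_i * x_i, using the last value read from each list."""
    return [w * h[-1] for w, h in zip(weights, histories)]


def delta_indicators(weights, histories, p):
    """dF/dx_i * (x_i p sorted accesses ago - x_i now)."""
    return [w * (h[-1 - p] - h[-1]) for w, h in zip(weights, histories)]


def switch_choice(step, weights, histories, p):
    """Even steps: quick-combine indicator; odd steps: dF/dx*x indicator."""
    if step % 2 == 0:
        scores = delta_indicators(weights, histories, p)
    else:
        scores = chi_indicators(weights, histories)
    return max(range(len(scores)), key=scores.__getitem__)


histories = [[0.82, 0.81], [0.74, 0.70], [0.75, 0.62]]
for step in range(2):
    print(step, "-> L%d" % (switch_choice(step, [2, 3, 1], histories, 1) + 1))
```

On the values from the two previous slides, the two strategies disagree: quick-combine favours $L_3$ and ∂F/∂x·x favours $L_2$, so under these assumptions the switch reads both lists over two consecutive steps.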

## 11. Types of values in lists

- Discrete data, "human rated": number of stars of hotels, ratings of companies, marks in school, …
- Finer discretisation, naturally discrete: estimated length of a trip, temperature in a weather forecast, ratings of terms in IR, number of rooms, price, …
- Continuous data: physical experiments, precise measurements, multimedia data, …

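## 12. Discrete data

| | $L_1$: has rooms with toilets | $L_2$: luxury (from the number of stars) |
|---|---|---|
| $O_1$ | 1 | 0.6 |
| $O_2$ | 0 | 0.2 |
| $O_3$ | 1 | 0 |

$L_1$ takes only two values: yes (4200 objects) and no (4800 objects). $L_2$ is derived from the star rating: 5, 4, 3, 2 or 1 stars, or no stars.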
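A small sketch of why discrete lists are a problem for indicator-based strategies: thousands of objects share the same value, so the value read under sorted access stays constant for many steps. The toilet counts come from the slide; the stars/5 mapping for luxury is an assumption that happens to reproduce the sample values 0.6, 0.2 and 0, and `luxury` is a hypothetical name.

```python
from collections import Counter


def luxury(stars: int) -> float:
    """Hypothetical mapping of a star rating (0 to 5) to [0, 1]: stars / 5."""
    return stars / 5


# L1 as on the slide: 4200 objects with toilets (1) and 4800 without (0).
toilets = [1.0] * 4200 + [0.0] * 4800
ties = Counter(toilets)
print(ties)  # each distinct value is shared by thousands of objects

# Under sorted access, the first 4200 reads from L1 all return 1.0.
print(luxury(3), luxury(1), luxury(0))
```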

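## 13. ∂F/∂x(∆x) on discrete data

$F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$. Over the last p sorted accesses the value stayed at 0.8 in $L_1$, at 0.7 in $L_2$ and at 0.7 in $L_3$, so every indicator is zero and the lists cannot be told apart:

$$\Delta_1 = 2 \cdot (0.8 - 0.8) = 0, \qquad \Delta_2 = 3 \cdot (0.7 - 0.7) = 0, \qquad \Delta_3 = 1 \cdot (0.7 - 0.7) = 0$$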

## 14. Again: which list should be accessed next under sorted access?

- When several lists share the highest indicator value, all original algorithms choose one of them randomly (random approach).
- New: access all the candidate lists with the highest value (parallel approach).
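A sketch of the two ways to handle the candidate list, that is, the lists tied for the highest indicator value. The function names and the small tolerance `eps` used to compare floating-point indicators are hypothetical choices.

```python
import random


def candidates(indicators, eps=1e-12):
    """Indices of all lists whose indicator equals the maximum (within eps)."""
    best = max(indicators)
    return [i for i, v in enumerate(indicators) if best - v <= eps]


def random_approach(indicators, rng=random):
    """Pick one candidate at random; only that list gets a sorted access."""
    return [rng.choice(candidates(indicators))]


def parallel_approach(indicators):
    """Give a sorted access to every candidate list in this step."""
    return candidates(indicators)


deltas = [0.0, 0.0, 0.0]  # quick-combine on discrete data: all indicators zero
print(parallel_approach(deltas))
print(random_approach(deltas, random.Random(0)))
```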

## 15. Data for experiments

Artificial data: 2 exponential and 2 logarithmic distributions with 10000 objects, and 6 types of aggregation functions. The attribute values were rounded to 10 discrete values. In total, there were 16 different combinations of inputs.
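A sketch of how such artificial data might be produced. The slide gives neither the generator nor the rounding rule, so both the max-scaled exponential samples and rounding to the 10 equally spaced values 0, 1/9, …, 1 are assumptions; `discretise` and `exponential_list` are hypothetical names.

```python
import random


def discretise(x: float, levels: int = 10) -> float:
    """Round x in [0, 1] to one of `levels` equally spaced values 0, ..., 1.
    (Assumption: the slide only says the values were rounded to 10 discrete values.)"""
    return round(x * (levels - 1)) / (levels - 1)


def exponential_list(n: int, rate: float, rng: random.Random) -> list[float]:
    """Hypothetical generator: exponential samples scaled into [0, 1] by the maximum."""
    raw = [rng.expovariate(rate) for _ in range(n)]
    top = max(raw)
    return [v / top for v in raw]


rng = random.Random(42)
values = [discretise(v) for v in exponential_list(10000, 1.0, rng)]
print(len(set(values)), "distinct values among", len(values), "objects")
```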

## 16. Data for experiments

Benchmark data: 6 sets of real data obtained from a collection of 25535 documents, using 50 different query terms in IR with different combinations of local and global weights. These data contain a small number of objects with high values and many objects with low values. The values have a fine discretisation.

## 17. Comparison of random and parallel approach (random = 100%): artificial data

## 18. Comparison of random and parallel approach (random = 100%): benchmark data

## 19. Artificial data

## 20. Benchmark data

## 21. Conclusions

- The parallel approach helps to improve the quick-combine algorithm on discretised values.
- The switch algorithm keeps its first place, with the lowest number of accesses.

## 22. Thank you for your attention.

http://klud.ics.upjs.sk/~gursky, gursky@upjs.sk

