Published by Phillip Weston. Modified over 3 years ago.

1
Comparison of parallel and random approach to a candidate list in the multifeature querying
Peter Gurský
Institute of Computer Science, UPJŠ, Košice, Slovakia
http://klud.ics.upjs.sk/~gursky

2
Multifeature querying
We want to find the top k objects in a possibly huge set of objects, with a minimal number of accesses to the sources.
The ordering of objects depends on the features of the objects and on the user's requirements over the features.
Example: The conference will be at "address". I want to find a hotel (cottage) that is near, cheap and new. Show me the 5 best according to my aggregation function:
F(near, cheap, new) = 2*near_i + 3*cheap_i + new_i
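A minimal Python sketch of the hotel example may help fix the idea of an aggregation function. The hotel names and feature scores are hypothetical illustration values, and `top_k` is only a naive baseline that reads every value; the algorithms in this talk aim to avoid exactly that.

```python
import heapq

def aggregate(near, cheap, new):
    # Aggregation function from the slide: F = 2*near + 3*cheap + new
    return 2 * near + 3 * cheap + new

# Hypothetical hotels with feature scores in [0, 1] (illustrative values only).
hotels = {
    "A": (0.9, 0.2, 0.5),
    "B": (0.4, 0.9, 0.1),
    "C": (0.7, 0.7, 0.7),
    "D": (0.1, 0.3, 1.0),
}

def top_k(objects, k):
    # Naive baseline: score every object, then keep the k best.
    return heapq.nlargest(k, objects, key=lambda name: aggregate(*objects[name]))

print(top_k(hotels, 2))
```

With these made-up scores, hotel C has the highest aggregate value and B the second highest.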

3
Specifying a query
What does the term "near" mean?
[Figure: a curve defining "near"; the axes are labelled m (ticks at 100 and 1000) and tv (from 0 to 1).]

4
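Model
[Figure: m lists L1, L2, L3, L4, …, Lm, each holding the n objects together with their values in that list. Values shown for O1: 0,85; 0,92; 0,11; 1; 0,3. For O2: 0,69; 0,5; 0,12; 0,51; 0,92. For O3: 0,7; 0; 0,5; 0,6; 0,1.]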

5
Two types of accesses
Sorted (sequential) access:
– returns the next greatest value from the i-th list, together with the name of the object
Random (direct) access:
– returns the value of a given object from the i-th list
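The two access types can be sketched as a small Python class. The class name `FeatureList`, its method names and the access counter are assumptions made for illustration; they do not come from the talk.

```python
class FeatureList:
    """One list L_i: objects with their values, readable in two ways."""

    def __init__(self, values):
        self._values = dict(values)
        # Objects ordered from the greatest value to the smallest.
        self._order = sorted(self._values, key=self._values.get, reverse=True)
        self._pos = 0
        self.accesses = 0  # counts both kinds of access

    def sorted_access(self):
        # Next greatest value together with the object's name,
        # or None once the list is exhausted.
        if self._pos >= len(self._order):
            return None
        obj = self._order[self._pos]
        self._pos += 1
        self.accesses += 1
        return obj, self._values[obj]

    def random_access(self, obj):
        # Value of a named object, regardless of its position in the list.
        self.accesses += 1
        return self._values[obj]

l1 = FeatureList({"o1": 0.95, "o2": 0.11, "o3": 0.92})
print(l1.sorted_access())
print(l1.random_access("o2"))
```

Counting accesses inside the list object makes it easy to compare strategies by the number of accesses they need.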

6
Threshold algorithm (Fagin 1999)
F(x_1, x_2, x_3) = 2*x_1 + 3*x_2 + x_3
First values under sorted access in L1, L2, L3: 0,95; 0,94; 0,85
F(o_1) = 2*0,95 + 3*0,78 + 0,62 = 4,86
F(o_2) = 2*0,11 + 3*0,94 + 0,44 = 3,48
F(o_3) = 2*0,92 + 3*0,34 + 0,85 = 3,71
Threshold = 2*0,95 + 3*0,94 + 0,85 = 5,57
Second values in L1, L2, L3: 0,91; 0,79; 0,65
Threshold = 2*0,91 + 3*0,79 + 0,65 = 4,84
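The threshold algorithm can be sketched in Python as follows. This is a simplified illustration, not the author's implementation: it reads the lists row by row, completes each newly seen object by random access, and stops once the k-th best score reaches the threshold. Objects o1 to o3 use the values from the slide; o4 is a hypothetical extra object added so that the example lists are complete.

```python
import heapq

def threshold_algorithm(lists, weights, k):
    def F(xs):
        return sum(w * x for w, x in zip(weights, xs))

    # Each list sorted from greatest to smallest value (sorted access order).
    ordered = [sorted(l.items(), key=lambda kv: kv[1], reverse=True) for l in lists]
    scores = {}
    top = []
    rows = 0
    for row in zip(*ordered):
        rows += 1
        for obj, _ in row:
            if obj not in scores:
                # Random access to the other lists completes the object.
                scores[obj] = F([l[obj] for l in lists])
        # Threshold: F applied to the values last seen under sorted access.
        threshold = F([value for _, value in row])
        top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            break
    return top, rows

# o1..o3 follow the slide; o4 is a hypothetical filler object.
L1 = {"o1": 0.95, "o2": 0.11, "o3": 0.92, "o4": 0.50}
L2 = {"o1": 0.78, "o2": 0.94, "o3": 0.34, "o4": 0.60}
L3 = {"o1": 0.62, "o2": 0.44, "o3": 0.85, "o4": 0.30}

top, rows = threshold_algorithm([L1, L2, L3], [2, 3, 1], k=1)
print(top, rows)
```

On this data the search stops after two rows with o1 as the top object, and o4 is never read.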

7
Which list should be accessed next under sorted access?
Requirement for correctness: for each object in the top-k list, the following must hold:
F(x_1, …, x_m) ≥ [right-hand side not preserved]
We have two ways to obtain the top-k list faster:
– Increase the left side
– Decrease the right side

8
∂F/∂x*x algorithm
F(x_1, x_2, x_3) = 2*x_1 + 3*x_2 + x_3
Current values in L1, L2, L3: 0,81; 0,70; 0,62
χ_1 = 2*0,81 = 1,62
χ_2 = 3*0,7 = 2,1
χ_3 = 1*0,62 = 0,62
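The indicator on this slide is easy to compute for a weighted sum, where ∂F/∂x_i is simply the weight of x_i. The Python sketch below reproduces the three χ values; picking the list with the largest χ is an assumption about how the indicator is used, since the slide shows only the values.

```python
def derivative_times_value(weights, current):
    # For a weighted sum, dF/dx_i equals the weight w_i, so chi_i = w_i * x_i.
    return [w * x for w, x in zip(weights, current)]

weights = [2, 3, 1]
current = [0.81, 0.70, 0.62]   # current values in L1, L2, L3 (from the slide)

chi = derivative_times_value(weights, current)
# Assumption: the list with the largest indicator is accessed next.
best = max(range(len(chi)), key=chi.__getitem__)
print([round(c, 2) for c in chi], "next list: L%d" % (best + 1))
```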

9
The quick-combine algorithm: ∂F/∂x(∆x) (Güntzer, Balke, Kiessling 2000)
F(x_1, x_2, x_3) = 2*x_1 + 3*x_2 + x_3
Values p steps back and current values:
L1: 0,82 and 0,81
L2: 0,74 and 0,70
L3: 0,75 and 0,62
∆_1 = 2*(0,82-0,81) = 0,02
∆_2 = 3*(0,74-0,7) = 0,12
∆_3 = 1*(0,75-0,62) = 0,13
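The quick-combine indicator weighs how fast the values in each list have dropped over the last p steps. The Python sketch below recomputes the three ∆ values from the slide for the weighted-sum F; the function and variable names are illustrative.

```python
def quick_combine_indicator(weights, p_steps_back, current):
    # Delta_i = dF/dx_i * (value p steps back - current value);
    # for a weighted sum, dF/dx_i is the weight w_i.
    return [w * (old - new) for w, old, new in zip(weights, p_steps_back, current)]

weights = [2, 3, 1]
p_steps_back = [0.82, 0.74, 0.75]
current = [0.81, 0.70, 0.62]

delta = quick_combine_indicator(weights, p_steps_back, current)
# The list whose weighted values fall fastest is accessed next.
best = max(range(len(delta)), key=delta.__getitem__)
print([round(d, 2) for d in delta], "next list: L%d" % (best + 1))
```

Here L3 has the largest indicator, so a sorted access to L3 is expected to lower the threshold the most.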

10
x/∆x-switch algorithm
– During the evaluation, each step switches between the quick-combine and the ∂F/∂x*x algorithm
– Both strategies are used for choosing the next list
– Best in our experiments
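The switching idea can be sketched in Python by alternating two indicator functions from step to step. This is a simplification under stated assumptions: F is the weighted sum used in the earlier examples, the first step uses the quick-combine indicator (the slide does not say which strategy starts), and the same snapshot of values is reused for both steps, whereas in a real run the values change after every access.

```python
from itertools import cycle

def by_quick_combine(weights, p_steps_back, current):
    return [w * (old - new) for w, old, new in zip(weights, p_steps_back, current)]

def by_derivative_times_value(weights, p_steps_back, current):
    return [w * x for w, x in zip(weights, current)]

def choose_lists(weights, snapshots):
    # Alternate between the two strategies, one strategy per step.
    strategies = cycle([by_quick_combine, by_derivative_times_value])
    picks = []
    for (p_steps_back, current), strategy in zip(snapshots, strategies):
        indicator = strategy(weights, p_steps_back, current)
        picks.append(max(range(len(indicator)), key=indicator.__getitem__))
    return picks

snapshot = ([0.82, 0.74, 0.75], [0.81, 0.70, 0.62])
picks = choose_lists([2, 3, 1], [snapshot, snapshot])
print(["L%d" % (i + 1) for i in picks])
```

On this snapshot the two strategies disagree: quick-combine prefers L3, while ∂F/∂x*x prefers L2.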

11
Types of values in lists
Discrete data – "human rated"
– number of stars of hotels, ratings of companies, marks in a school, …
Finer discretisation – naturally discrete
– estimate of the length of a trip, temperature in the weather forecast, ratings of terms in IR, number of rooms, price, …
Continuous data
– physical experiments, precise measurements, multimedia data, …

12
Discrete data
L1: Has rooms with toilets: Yes (4200 objects), No (4800 objects)
L2: Luxury (from number of stars): *****, ****, ***, **, *, no stars

Object   L1   L2
O1       1    0,6
O2       0    0,2
O3       1    0

13
∂F/∂x(∆x)
F(x_1, x_2, x_3) = 2*x_1 + 3*x_2 + x_3
Values in L1, L2, L3 (unchanged over the last p steps): 0,8; 0,7; 0,7
∆_1 = 2*(0,8-0,8) = 0
∆_2 = 3*(0,7-0,7) = 0
∆_3 = 1*(0,7-0,7) = 0

14
Again: which list should be accessed now under sorted access?
– All original algorithms choose one list randomly (random approach)
– New: access all candidates with the highest values (parallel approach)
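The difference between the two approaches shows up only when several lists tie on the indicator, which is common with discretised values. The Python sketch below is an illustration under assumptions: the function names are made up here, and ties are detected by exact equality, which works when the tied indicators come out identical (as with the all-zero ∆ values).

```python
import random

def candidates(indicator):
    # All lists that share the highest indicator value.
    best = max(indicator)
    return [i for i, value in enumerate(indicator) if value == best]

def random_approach(indicator, rng=random):
    # Pick a single list at random among the tied candidates.
    return [rng.choice(candidates(indicator))]

def parallel_approach(indicator):
    # Access every tied candidate in the same step.
    return candidates(indicator)

indicator = [0.0, 0.0, 0.0]          # all Delta values are zero
print(random_approach(indicator))    # one of the three lists
print(parallel_approach(indicator))  # all three lists
```

When the indicator has a unique maximum, both functions return the same single list.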

15
Data for experiments
Artificial data:
– 2 exponential and 2 logarithmic distributions with 10000 objects, and 6 types of aggregation functions
– Values of the attributes were rounded to 10 discrete values
– 16 different combinations of inputs

16
Data for experiments
Benchmark data:
– 6 sets of real data obtained from a collection of 25535 documents, using 50 different terms in IR queries with different combinations of local and global weights
– These data have a low number of objects with high values and a lot of objects with low values
– Fine discretisation

17
Comparison of random and parallel approach (random = 100%)
Artificial data

18
Comparison of random and parallel approach (random = 100%)
Benchmark data

19
Artificial data

20
Benchmark data

21
Conclusions
– The parallel approach helps to improve the quick-combine algorithm over discretised values
– The switch algorithm keeps its first place with the lowest number of accesses

22
Thank you for your attention.
http://klud.ics.upjs.sk/~gursky
gursky@upjs.sk
