Comparison of parallel and random approach to a candidate list in multifeature querying
Peter Gurský, Institute of Computer Science, UPJŠ, Košice, Slovakia

2 Multifeature querying
We want to find the top k objects in a possibly huge set of objects, using a minimal number of accesses to the sources. The ordering of objects depends on the features of the objects and on the user's requirements over those features.
Example: The conference will be at “address”. I want to find a hotel (or cottage) that is near, cheap and new. Show me the 5 best according to my aggregation function, which for hotel i is $F(\mathit{near}_i, \mathit{cheap}_i, \mathit{new}_i) = 2\cdot\mathit{near}_i + 3\cdot\mathit{cheap}_i + \mathit{new}_i$.
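A hedged baseline in Python makes the cost of such a query concrete. It reads every grade of every object (n·m accesses) and keeps the k best under the slide's F. The hotel names and grades are hypothetical, and the grades are assumed to be already normalised to [0, 1].

```python
import heapq

# Hypothetical grades in [0, 1] for each feature; names and values are illustrative only.
hotels = {
    "hotel_a":   {"near": 0.9, "cheap": 0.4, "new": 0.7},
    "hotel_b":   {"near": 0.2, "cheap": 0.9, "new": 0.5},
    "hotel_c":   {"near": 0.6, "cheap": 0.6, "new": 0.1},
    "cottage_d": {"near": 0.5, "cheap": 0.8, "new": 0.9},
    "cottage_e": {"near": 0.1, "cheap": 0.2, "new": 1.0},
    "hotel_f":   {"near": 0.8, "cheap": 0.7, "new": 0.3},
}


def aggregate(near: float, cheap: float, new: float) -> float:
    """The example aggregation function F = 2*near + 3*cheap + new."""
    return 2 * near + 3 * cheap + new


def naive_top_k(objects: dict, k: int) -> list:
    """Baseline: touch every grade of every object, then keep the k highest F values."""
    scored = ((aggregate(**grades), name) for name, grades in objects.items())
    return heapq.nlargest(k, scored)  # list of (score, name), best first


for score, name in naive_top_k(hotels, 5):
    print(f"{name}: {score:.2f}")
```

With k = 5, the function returns the five best of the six hypothetical entries. The top-k algorithms on the following slides aim to reach the same answer while reading far fewer grades.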

3 Specifying a query
What does the term “near” mean?
[Figure: graph with axes labelled “m” and “tv”]

4 Model
[Figure: m lists L1, L2, L3, L4, …, Lm, each holding the grades (values between 0 and 1) of n objects O1, O2, O3, …]

5 Two types of accesses
Sorted (sequential) access: returns the next greatest value from the i-th list, together with the name of its object.
Random (direct) access: returns the value of a given object from the i-th list.
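The two access types can be sketched as a small in-memory class. `GradedList` is a hypothetical stand-in for one source list L_i, not an API from the presentation. It counts both kinds of access, because the number of accesses is the cost that the algorithms try to minimise.

```python
class GradedList:
    """Hypothetical in-memory stand-in for one source list L_i."""

    def __init__(self, grades: dict):
        # Random access needs a lookup of a grade by object name.
        self._grades = dict(grades)
        # Sorted access walks the objects in descending order of grade.
        self._order = sorted(self._grades.items(), key=lambda kv: kv[1], reverse=True)
        self._next = 0
        self.sorted_accesses = 0
        self.random_accesses = 0

    def sorted_access(self):
        """Return the next (object, grade) pair with the greatest remaining grade, or None."""
        if self._next == len(self._order):
            return None
        self.sorted_accesses += 1
        pair = self._order[self._next]
        self._next += 1
        return pair

    def random_access(self, obj: str) -> float:
        """Return the grade of `obj` in this list."""
        self.random_accesses += 1
        return self._grades[obj]


near = GradedList({"o1": 0.3, "o2": 0.9, "o3": 0.5})
print(near.sorted_access())       # the object with the greatest grade comes first
print(near.random_access("o1"))   # direct lookup by object name
```

Keeping a dictionary for random access next to a pre-sorted list for sorted access is a simplification. Real sources would answer these requests remotely.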

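6 Threshold algorithm (Fagin 1999)
$F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$
First round of sorted access: L1 returns o1 with 0.95, L2 returns o2 with 0.94 and L3 returns o3 with 0.85. Random access supplies their remaining grades:
$F(o_1) = 2\cdot 0.95 + 3\cdot 0.78 + 0.62 = 4.86$
$F(o_2) = 2\cdot 0.11 + 3\cdot 0.94 + 0.44 = 3.48$
$F(o_3) = 2\cdot 0.92 + 3\cdot 0.34 + 0.85 = 3.71$
Threshold after the first round: $\tau = 2\cdot 0.95 + 3\cdot 0.94 + 0.85 = 5.57$
The second round of sorted access reads 0.91 in L1, 0.79 in L2 and 0.65 in L3:
$\tau = 2\cdot 0.91 + 3\cdot 0.79 + 0.65 = 4.84$
Now $F(o_1) = 4.86 \ge \tau = 4.84$, so no object that has not been seen yet can have a higher overall grade than o1.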
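The threshold algorithm walk-through can be sketched as a small Python function. This is a hedged illustration, not the author's implementation. It assumes that F is a weighted sum with non-negative weights (and is therefore monotone), that every list grades every object, and that sorted access proceeds round-robin over all lists. The function counts sorted and random accesses separately.

```python
import heapq


def threshold_algorithm(lists, weights, k):
    """Sketch of Fagin's threshold algorithm (TA) for a weighted-sum F.

    lists   -- m dicts, each mapping object -> grade in [0, 1] (one dict per list L_i);
               every dict is assumed to grade every object
    weights -- non-negative weights, so F(x) = sum(w_i * x_i) is monotone
    Returns (top-k as (score, object) pairs, #sorted accesses, #random accesses).
    """
    m = len(lists)
    # Pre-sort each list in descending grade order to simulate sorted access.
    sorted_lists = [sorted(l.items(), key=lambda kv: kv[1], reverse=True) for l in lists]

    def F(xs):
        return sum(w * x for w, x in zip(weights, xs))

    scores = {}  # object -> exact overall grade F(object)
    sorted_acc = random_acc = 0
    top = []
    for depth in range(len(sorted_lists[0])):
        last = []  # grade reached in each list at this depth
        for i in range(m):  # round-robin: one sorted access per list
            obj, grade = sorted_lists[i][depth]
            sorted_acc += 1
            last.append(grade)
            if obj not in scores:
                # Random access to the other lists completes the object's grades.
                xs = []
                for j in range(m):
                    if j == i:
                        xs.append(grade)
                    else:
                        xs.append(lists[j][obj])
                        random_acc += 1
                scores[obj] = F(xs)
        tau = F(last)  # best overall grade any unseen object could still have
        top = heapq.nlargest(k, ((s, o) for o, s in scores.items()))
        if len(top) == k and top[-1][0] >= tau:
            break  # every top-k object satisfies F >= tau
    return top, sorted_acc, random_acc


hypothetical_lists = [
    {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1},
    {"a": 0.2, "b": 0.7, "c": 0.9, "d": 0.4},
    {"a": 0.5, "b": 0.6, "c": 0.1, "d": 0.8},
]
print(threshold_algorithm(hypothetical_lists, (2, 3, 1), 1))
```

Round-robin is the simplest scheduling choice. The later slides replace it with heuristics that pick which list to read next.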

7 Which list should be accessed next under sorted access?
Requirement for correctness: for each object in the top-k list, $F(x_1, \dots, x_m) \ge \tau$ must hold.
We have two ways to obtain the top-k list faster: increase the left side, or decrease the right side.
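A short derivation shows why this condition is sufficient. It assumes that F is monotone, as the weighted sums on these slides are. Write $\ell_i$ (a symbol introduced here) for the last grade read from $L_i$ under sorted access. An object that has not yet been seen in any list cannot exceed $\tau$. Each sorted access on $L_i$ can only lower $\ell_i$, and therefore $\tau$.

```latex
\begin{aligned}
\tau &= F(\ell_1, \dots, \ell_m), \\
o \text{ not yet seen} &\;\Rightarrow\; x_i(o) \le \ell_i \quad (i = 1, \dots, m) \\
&\;\Rightarrow\; F\bigl(x_1(o), \dots, x_m(o)\bigr) \le F(\ell_1, \dots, \ell_m) = \tau .
\end{aligned}
```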

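8 ∂F/∂x*x algorithm
$F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$; the current grades are 0.81 in L1, 0.70 in L2 and 0.62 in L3:
$\chi_1 = 2\cdot 0.81 = 1.62, \quad \chi_2 = 3\cdot 0.70 = 2.10, \quad \chi_3 = 1\cdot 0.62 = 0.62$
L2 has the highest indicator, so it is accessed next.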
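The ∂F/∂x*x rule can be written as a small Python helper, under the assumption that F is a weighted sum. In that case ∂F/∂x_i is simply the i-th weight. `choose_by_dfdx_x` is a hypothetical name for this illustration.

```python
def choose_by_dfdx_x(weights, last_grades):
    """∂F/∂x*x indicator, assuming F(x) = sum(w_i * x_i), so ∂F/∂x_i = weights[i].

    chi_i = weights[i] * x_i, where x_i is the last grade read from L_i
    under sorted access. Returns the index of the list with the largest chi_i
    (the first one on ties) together with all chi values.
    """
    chi = [w * x for w, x in zip(weights, last_grades)]
    best = max(range(len(chi)), key=chi.__getitem__)
    return best, chi


# The numbers from the slide: F = 2*x1 + 3*x2 + x3, current grades 0.81, 0.70, 0.62.
best, chi = choose_by_dfdx_x([2, 3, 1], [0.81, 0.70, 0.62])
print(best, [round(c, 2) for c in chi])
```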

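9 The quick-combine algorithm: ∂F/∂x(∆x) (Güntzer, Balke, Kiessling 2000)
$F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$; in each list, the grade read p sorted accesses ago is compared with the current grade: 0.82 and 0.81 in L1, 0.74 and 0.70 in L2, 0.75 and 0.62 in L3:
$\Delta_1 = 2\cdot(0.82 - 0.81) = 0.02, \quad \Delta_2 = 3\cdot(0.74 - 0.70) = 0.12, \quad \Delta_3 = 1\cdot(0.75 - 0.62) = 0.13$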
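The quick-combine indicator can be sketched in Python as well. The sketch makes three assumptions: F is a weighted sum, each list's history of grades read under sorted access is available (newest last), and the helper falls back to the first grade when fewer than p + 1 grades have been read. The fallback is a simplification chosen for this illustration. `choose_by_quick_combine` is a hypothetical name.

```python
def choose_by_quick_combine(weights, history, p):
    """∂F/∂x(∆x) indicator of quick-combine, assuming F(x) = sum(w_i * x_i).

    history[i] -- grades read so far from L_i under sorted access, newest last
                  (assumed non-empty)
    p          -- how many sorted accesses back to look
    Delta_i = weights[i] * (grade read p accesses ago - last grade read).
    """
    deltas = []
    for w, seen in zip(weights, history):
        # Fall back to the first grade if fewer than p + 1 grades were read (assumption).
        earlier = seen[max(0, len(seen) - 1 - p)]
        deltas.append(w * (earlier - seen[-1]))
    best = max(range(len(deltas)), key=deltas.__getitem__)  # first index on ties
    return best, deltas


# The numbers from the slide, with p = 1.
best, deltas = choose_by_quick_combine([2, 3, 1], [[0.82, 0.81], [0.74, 0.70], [0.75, 0.62]], 1)
print(best, [round(d, 2) for d in deltas])
```

On discrete data, where the grade p accesses ago often equals the current one, every ∆ is 0. `max` then simply returns the first list, which is the tie discussed on the following slides.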

10 x/∆x-switch algorithm
During the evaluation, the algorithm switches in each step between the quick-combine and the ∂F/∂x*x algorithm, so both strategies are used to choose the next list.
It performed best in our experiments.
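A self-contained sketch of the switching idea follows. It is not the author's code. Three assumptions are made explicit here: even steps use ∂F/∂x*x and odd steps use ∂F/∂x(∆x), F is a weighted sum, and each list's history of grades is non-empty. `switch_choice` is a hypothetical name.

```python
def switch_choice(step, weights, history, p):
    """x/∆x-switch sketch: alternate the indicator used to pick the next list.

    Assumptions: even steps use ∂F/∂x*x, odd steps use quick-combine's ∂F/∂x(∆x);
    F(x) = sum(w_i * x_i), so ∂F/∂x_i = weights[i].
    history[i] holds the grades read from L_i, newest last (assumed non-empty).
    """
    if step % 2 == 0:
        # ∂F/∂x*x: weight times the current grade.
        scores = [w * seen[-1] for w, seen in zip(weights, history)]
    else:
        # ∂F/∂x(∆x): weight times the drop over the last p sorted accesses.
        scores = [w * (seen[max(0, len(seen) - 1 - p)] - seen[-1])
                  for w, seen in zip(weights, history)]
    return max(range(len(scores)), key=scores.__getitem__)


history = [[0.82, 0.81], [0.74, 0.70], [0.75, 0.62]]  # slide values, p = 1
print([switch_choice(step, [2, 3, 1], history, 1) for step in (0, 1)])
```

With the slide's numbers, the two strategies disagree: ∂F/∂x*x favours L2, while ∆x favours L3. Alternating therefore reads from both.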

11 Types of values in lists
Discrete data (“human rated”): number of stars of hotels, ratings of companies, marks at school, …
Finer discretisation (naturally discrete): estimated length of a trip, temperature in a weather forecast, ratings of terms in IR, number of rooms, price, …
Continuous data: physical experiments, precise measurements, multimedia data, …

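12 Discrete data
L1, has rooms with toilets: only two values, Yes (4200 objects) and No (4800 objects).
L2, luxury (from the number of stars): *****, ****, ***, **, *, no stars.
Example grades: O1 has 1 in L1 and 0.6 in L2; O2 has 0 in L1 and 0.2 in L2; O3 has 1 in L1 and 0 in L2.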

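13 ∂F/∂x(∆x)
$F(x_1, x_2, x_3) = 2x_1 + 3x_2 + x_3$; with discrete values, the grade read p accesses ago equals the current grade in every list: 0.8 in L1, 0.7 in L2 and 0.7 in L3:
$\Delta_1 = 2\cdot(0.8 - 0.8) = 0, \quad \Delta_2 = 3\cdot(0.7 - 0.7) = 0, \quad \Delta_3 = 1\cdot(0.7 - 0.7) = 0$
All indicators are equal, so they give no preference for the next list.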

14 Again: which list should be accessed now under sorted access?
When several lists share the highest value, all original algorithms choose one of them at random (random approach).
New: access all candidate lists with the highest value (parallel approach).
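The difference between the two approaches can be sketched as follows. The candidate list is taken to be every list whose indicator equals the maximum, up to a small floating-point tolerance that is an assumption of this sketch. The random approach reads one candidate, while the parallel approach reads all of them in the same step. `candidate_list` and `lists_to_access` are hypothetical helpers.

```python
import random


def candidate_list(indicators, eps=1e-12):
    """Indices of all lists whose indicator equals the maximum (up to eps)."""
    best = max(indicators)
    return [i for i, v in enumerate(indicators) if best - v <= eps]


def lists_to_access(indicators, approach, rng=None):
    """Lists to read next under sorted access.

    'random'   -- one list picked at random from the candidate list
    'parallel' -- every list in the candidate list, in the same step
    """
    candidates = candidate_list(indicators)
    if approach == "random":
        rng = rng or random.Random()
        return [rng.choice(candidates)]
    if approach == "parallel":
        return candidates
    raise ValueError(f"unknown approach: {approach!r}")


tied = [0.0, 0.0, 0.0]  # e.g. all ∆ indicators are 0 on discrete data
print(lists_to_access(tied, "parallel"))                  # every tied list
print(lists_to_access(tied, "random", random.Random(0)))  # one of them
```

When one list has a unique maximum, both approaches return the same single list. They differ only on ties, and ties are frequent over discretised values.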

15 Data for experiments
Artificial data: 2 exponential and 2 logarithmic distributions of objects, and 6 types of aggregation functions.
The attribute values were rounded to 10 discrete values.
16 different combinations of inputs.
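The slide does not say exactly how the values were rounded. The sketch below assumes one plausible scheme, mapping [0, 1] into 10 equal buckets represented by 0.1, 0.2, …, 1.0, purely to show how discretisation creates long runs of tied grades. `discretise` is a hypothetical helper, and the random input is not the presentation's data.

```python
import random


def discretise(grade, levels=10):
    """Map a grade in [0, 1] to one of `levels` equally spaced values (hypothetical scheme).

    With levels=10: roughly [0, 0.1) -> 0.1, [0.1, 0.2) -> 0.2, ..., [0.9, 1.0] -> 1.0.
    """
    bucket = min(int(grade * levels), levels - 1)
    return (bucket + 1) / levels


rng = random.Random(1)
grades = sorted((discretise(rng.random()) for _ in range(1000)), reverse=True)
print(len(set(grades)))  # at most 10 distinct values among 1000 grades, so many ties
```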

16 Data for experiments
Benchmark data: 6 sets of real data obtained from a document collection, using 50 different query terms in IR with different combinations of local and global weights.
These data contain few objects with high values and many objects with low values.
Fine discretisation.

17 Comparison of the random and parallel approaches (random = 100%)
Artificial data

18 Comparison of the random and parallel approaches (random = 100%)
Benchmark data

19 Artificial data

20 Benchmark data

21 Conclusions
The parallel approach helps improve the quick-combine algorithm over discretised values.
The switch algorithm keeps its first place, with the lowest number of accesses.

22 Thank you for your attention.