Rank Aggregation

Rank Aggregation: Settings
Multiple items
  - Web pages, cars, apartments, ...
Multiple scores for each item
  - By different reviewers or users, according to different features, ...
Some aggregation function on the scores
  - Sum, Average, Max, ...
Goal: compute the top-k items

Rank Aggregation Example

Model    Price rank  Comfort rank  Beauty rank  Total rank (min)  Total rank (avg)
Honda    9           7             3            3                 6.333
Volvo    3           10            8            3                 7
Subaru   9           5             4            4                 6

Naïve Algorithm
Compute the aggregated rank for all items
Find the best one, then the second best one, ... up to the k-th best one
Good for small-scale problems
Still not feasible at web scale...
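
The naive approach can be sketched in a few lines of Python, using the car scores from the example above. This is only an illustration: the function name `naive_top_k` and the dictionary layout are assumptions made for the sketch, not part of the slides.

```python
import heapq
from statistics import mean

# Scores from the car example: item -> [price, comfort, beauty].
scores = {
    "Honda": [9, 7, 3],
    "Volvo": [3, 10, 8],
    "Subaru": [9, 5, 4],
}

def naive_top_k(scores, aggregate, k):
    """Aggregate the scores of every item, then keep the k largest totals."""
    totals = {item: aggregate(values) for item, values in scores.items()}
    return heapq.nlargest(k, totals.items(), key=lambda pair: pair[1])

top_avg = naive_top_k(scores, mean, 2)  # two best items by average
top_min = naive_top_k(scores, min, 1)   # best item by minimum
print(top_avg, top_min)
```

Every score of every item is read before anything is returned, which is what makes this infeasible at web scale.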

Can we do any better?
An assumption to help us: each individual list comes sorted
  - Reasonable for search engines, user rankings, ...
Another assumption: monotonicity of the aggregation function
Now can we do any better?

Fagin's algorithm (FA)
Do sorted access on all lists in parallel
For every item seen, do random access to the other lists to fetch all of its values
Stop when at least k items were seen (in the sorted access) in all lists
Sort the seen items by their aggregated score and return the top k
Why is this enough?

Example (top-3)

Input lists, each sorted by score:

Beauty         Comfort
Item  Score    Item  Score
A     9        B     10
B     9        C     5
C     3        A     4
D     1        D     3

Average table, one row per slide step:

Step  Average computed so far
1     A 6.5
2     A 6.5, B 9.5
3     A 6.5, B 9.5
4     A 6.5, B 9.5, C 4
5     A 6.5, B 9.5, C 4
6     A 6.5, B 9.5, C 4

How do we know not to look further?
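
A minimal Python sketch of FA on the example lists above may help. It assumes that every item appears in every list and that random access is a dictionary lookup. The names `fagin_top_k`, `lookup` and `average` are illustrative, not taken from the slides.

```python
def average(values):
    return sum(values) / len(values)

def fagin_top_k(lists, aggregate, k):
    """lists: one list per criterion of (item, score), sorted by score descending.

    Returns (top-k items with aggregated scores, sorted-access depth reached).
    Assumes every item appears in every list.
    """
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # random access: lookup[i][item]
    seen_in = {}                           # item -> indices of lists where sorted access saw it
    depth = 0
    while depth < len(lists[0]):
        # One round of sorted access: read the next row of every list.
        for i, lst in enumerate(lists):
            item, _ = lst[depth]
            seen_in.setdefault(item, set()).add(i)
        depth += 1
        # Stop once k items have been seen under sorted access in all lists.
        if sum(1 for s in seen_in.values() if len(s) == m) >= k:
            break
    # Random access fetches the missing scores of every seen item.
    totals = {item: aggregate([lookup[i][item] for i in range(m)]) for item in seen_in}
    ranked = sorted(totals.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k], depth

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 1)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 3)]
fa_top, fa_depth = fagin_top_k([beauty, comfort], average, 3)
print(fa_top, fa_depth)
```

On these lists the sketch stops after three rows, because A, B and C have then each appeared in both lists. By monotonicity, an item not yet seen sits below all three in every list, so it cannot score higher than any of them.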

Complexity
A probabilistic analysis of the order of the items can be used to show better bounds (with good probability)
Can we do even better?

Cost model
This is a very simple setting, so we can define a finer cost model than worst-case complexity
In a web context it is important to do so
  - Since the scale is huge
We associate some cost C_s with every sorted access, and some cost C_r with every random access
Denote the cost of algorithm A on input instance I by cost(A,I)

Instance-optimality
An algorithm A is instance-optimal if, for every input instance I and every algorithm A', cost(A,I) = O(cost(A',I))
A very strong notion
But we can realize it here!
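
Written out, the cost model and the definition could read as follows. The symbols s and r (the numbers of sorted and random accesses that A performs on I) and the constants c and c' are notation assumed here to spell out the O(·), not notation from the slides.

```latex
\[
  \mathrm{cost}(A, I) \;=\; s \cdot C_s \;+\; r \cdot C_r
\]
\[
  A \text{ is instance-optimal} \iff
  \exists\, c, c' > 0 \;\; \forall A' \;\; \forall I :\quad
  \mathrm{cost}(A, I) \;\le\; c \cdot \mathrm{cost}(A', I) + c'
\]
```

The constants must not depend on the instance I. That is what separates this notion from worst-case optimality, where the comparison is made only on the hardest inputs.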

Threshold Algorithm (TA)
Idea: sometimes we can stop before seeing k objects in every list
Use a threshold on how good the score of an unseen object can be
  - Based on aggregating the lowest score seen so far (under sorted access) in each list

Example

Input lists, each sorted by score:

Beauty         Comfort
Item  Score    Item  Score
A     9        B     10
B     9        C     5
C     3        A     4
D     1        D     3

Average table and threshold T, one row per slide step:

Step  Average computed so far    Threshold
1     A 6.5
2     A 6.5                      T = 9.5
3     A 6.5, B 9.5               T = 9.5
4     A 6.5, B 9.5, C 4          T = 7
5     A 6.5, B 9.5, C 4          T = 4

One step less!
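
The following Python sketch of TA runs on the same two lists. It makes one sorted access at a time and re-checks the threshold after each, which is one possible reading of the example. The function name and the use of infinity for lists not yet read are assumptions of this sketch.

```python
import heapq
import math

def average(values):
    return sum(values) / len(values)

def threshold_top_k(lists, aggregate, k):
    """lists: one list per criterion of (item, score), sorted by score descending.

    Returns (top-k items with aggregated scores, number of sorted accesses).
    Assumes every item appears in every list and aggregate is monotone.
    """
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # random access: lookup[i][item]
    last_seen = [math.inf] * m             # lowest score read so far in each list
    totals = {}                            # item -> aggregated score
    accesses = 0
    for depth in range(len(lists[0])):
        for i, lst in enumerate(lists):
            item, score = lst[depth]       # one sorted access
            accesses += 1
            last_seen[i] = score
            if item not in totals:         # random access to the other lists
                totals[item] = aggregate([lookup[j][item] for j in range(m)])
            # No unseen item can score above the aggregate of the last seen scores.
            threshold = aggregate(last_seen)
            top = heapq.nlargest(k, totals.items(), key=lambda pair: pair[1])
            if len(top) == k and top[-1][1] >= threshold:
                return top, accesses
    return heapq.nlargest(k, totals.items(), key=lambda pair: pair[1]), accesses

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 1)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 3)]
ta_top, ta_accesses = threshold_top_k([beauty, comfort], average, 3)
print(ta_top, ta_accesses)
```

For k = 3 the sketch halts after five sorted accesses, when the threshold drops to 4 and the third-best average is also 4. Reading three full rows of both lists, as FA does, takes six.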

Instance-optimality
Theorem: If the aggregation function is strictly monotone and every two items in a list have distinct grades, then TA is instance-optimal
  - Intuition: if an algorithm stops on input I before reaching the threshold, then we can design an input I' on which it is wrong, by changing values it did not see
  - TA sees at most k items more than any algorithm on any input
Strict monotonicity is needed to avoid "lucky guesses" in breaking ties
  - Theorem: in general, no instance-optimal algorithm exists
Theorem: TA is instance-optimal against all algorithms that do not "guess"
  - i.e., that do not do random access to an item they did not see in sorted access

Restricted Sorted Access
Some rankings are not available in sorted order
  - E.g. distances from a map site
Then we can revise TA to do sorted access only on the lists where it is possible
And it is still instance-optimal! (Against algorithms that work under the same restrictions, of course)

No Random Access
Maintain lower and upper bounds for every item (worst and best grades)
  - Best is the aggregation of the scores we have seen for the item and, for every list where we have not seen it, the lowest score seen so far in that list
  - Worst is the aggregation of the scores we have seen for the item and zeros for the rest
Keep in the list the items with the top-k "worst" grades
  - Break ties by "best" grades
Halt if we have k items in the list, and the best grade of every item outside the list is less than the k-th "worst" grade in the list

Example

Input lists, each sorted by score:

Beauty         Comfort
Item  Score    Item  Score
A     9        B     10
B     9        C     5
C     3        A     4
D     1        D     3

Average (exact): A 6.5, B 9.5, C 4

Bounds on the average score S, one row per slide step:

Step  Bounds maintained so far
1     A 4.5<S<9
2     A 4.5<S<9, B 5<S<10
3     A 4.5<S<9, B 9.5
4     A 4.5<S<9, B 9.5, C 2.5<S<5
5     A 4.5<S<9, B 9.5, C 4
6     A 6.5, B 9.5, C 4
7     A 6.5, B 9.5, C 4, Score(D)<3
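
The bound-keeping idea can be sketched in Python as below. The sketch assumes non-negative scores, so that zero is a valid worst case, and reads one full row of every list before re-checking the halting condition. It computes the best grade from the lowest score seen so far in each list, so its intermediate bounds need not match the ones shown in the example. The names `nra_top_k`, `worst` and `best` are illustrative.

```python
def average(values):
    return sum(values) / len(values)

def nra_top_k(lists, aggregate, k):
    """Top-k with sorted access only (no random access).

    lists: one list per criterion of (item, score), sorted by score descending.
    Returns (list of (item, worst, best) for the top k, rows read per list).
    Assumes non-negative scores and a monotone aggregate.
    """
    m = len(lists)
    known = {}                # item -> {list index: score seen under sorted access}
    last_seen = [None] * m    # lowest score read so far in each list

    def worst(item):          # unknown scores replaced by zero
        return aggregate([known[item].get(i, 0) for i in range(m)])

    def best(item):           # unknown scores replaced by the last seen score
        return aggregate([known[item].get(i, last_seen[i]) for i in range(m)])

    def ranking():            # by worst grade, ties broken by best grade
        return sorted(known, key=lambda it: (worst(it), best(it)), reverse=True)

    depth = 0
    for depth in range(1, len(lists[0]) + 1):
        for i, lst in enumerate(lists):
            item, score = lst[depth - 1]
            known.setdefault(item, {})[i] = score
            last_seen[i] = score
        ranked = ranking()
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            kth_worst = worst(top[-1])
            unseen_best = aggregate(last_seen)  # bound for items never seen
            if unseen_best < kth_worst and all(best(it) < kth_worst for it in rest):
                break
    return [(it, worst(it), best(it)) for it in ranking()[:k]], depth

beauty = [("A", 9), ("B", 9), ("C", 3), ("D", 1)]
comfort = [("B", 10), ("C", 5), ("A", 4), ("D", 3)]
nra_top, nra_depth = nra_top_k([beauty, comfort], average, 3)
print(nra_top, nra_depth)
```

When the sketch halts, the worst and best grades of a returned item may still differ, so the result is the correct top-k set with bounds rather than exact scores for every item.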