Rank Aggregation.

Rank Aggregation: Setting
Multiple items: web pages, cars, apartments, ...
Multiple scores for each item: from different reviewers or users, or according to different features.
An aggregation function on the scores: sum, average, max, ...
Goal: compute the top-k items.

Rank Aggregation Example

Model   PriceRank  ComfortRank  BeautyRank  TotalRank(min)  TotalRank(avg)
Honda   9          7            3           3               6.333
Volvo   3          10           8           3               7
Subaru  9          5            4           4               6

Naïve Algorithm
Compute the aggregated rank for all items.
Find the best one, then the second best one, ..., up to the k-th best one.
Good for small-scale problems.
Still not feasible at web scale.
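The naïve approach above can be sketched in a few lines of Python. This is only an illustration: the function name `naive_top_k`, the helper `mean`, and the dictionary layout are assumptions made here, and the Subaru price rank of 9 is inferred from the listed average of 6 rather than read from the slide.

```python
import heapq

def mean(values):
    return sum(values) / len(values)

def naive_top_k(scores, k, agg):
    """Aggregate every item's scores, then pick the k best (hypothetical helper)."""
    totals = {item: agg(values) for item, values in scores.items()}
    # nlargest returns the k entries with the highest aggregated score, best first.
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

# Price, comfort and beauty ranks from the car example.
# Subaru's price rank (9) is an assumption inferred from its listed average.
cars = {
    "Honda": [9, 7, 3],
    "Volvo": [3, 10, 8],
    "Subaru": [9, 5, 4],
}

top_two_by_avg = naive_top_k(cars, 2, mean)
best_by_min = naive_top_k(cars, 1, min)
```

Every item is aggregated before anything is returned, which is why the cost grows with the full number of items regardless of k.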

Can we do any better?
An assumption to help us: each individual list comes sorted. This is reasonable for search engines, user rankings, ...
Another assumption: the aggregation function is monotone.
Now can we do any better?
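Monotonicity is usually stated as follows. The symbols are introduced here for illustration: t is the aggregation function, m the number of lists, and x_i, y_i the scores of two items in list i.

```latex
t(x_1, \dots, x_m) \le t(y_1, \dots, y_m)
\quad \text{whenever } x_i \le y_i \text{ for every } i = 1, \dots, m .
```

Sum, average, max and min all satisfy this condition: raising any single score can never lower the aggregated score.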

Fagin's algorithm (FA)
Do sorted access on all lists in parallel.
For every item seen, do random access to the other lists to fetch all of its values.
Stop when at least k items have been seen (under sorted access) in all lists.
Sort the seen items by aggregated score and output the top k.
Why is this enough?

Example (top-3, aggregation: average)

Beauty          Comfort
Item  Score     Item  Score
A     9         B     10
B     9         C     5
C     3         A     4
D     1         D     3

Average, as filled in while the lists are read:
Step 1: A 6.5
Step 2: A 6.5, B 9.5
Step 3: A 6.5, B 9.5, C 4

How do we know not to look further?
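A minimal sketch of FA on the Beauty/Comfort example, assuming every item appears in every list. The function name `fagin_algorithm` and the data layout are choices made here, not taken from the slides, and B's Beauty score of 9 is inferred from its listed average of 9.5.

```python
def mean(values):
    return sum(values) / len(values)

def fagin_algorithm(lists, k, agg):
    """Sketch of FA: returns the top-k (item, score) pairs and the sorted-access count."""
    names = list(lists)
    # Random access is modelled as a dictionary lookup per list.
    lookup = {name: dict(rows) for name, rows in lists.items()}
    seen = {name: set() for name in names}
    sorted_accesses = 0
    depth = 0
    max_depth = max(len(rows) for rows in lists.values())
    while depth < max_depth:
        # One round of parallel sorted access: read the next row of every list.
        for name in names:
            if depth < len(lists[name]):
                seen[name].add(lists[name][depth][0])
                sorted_accesses += 1
        depth += 1
        # Stop once at least k items have been seen in all lists.
        if len(set.intersection(*seen.values())) >= k:
            break
    # Random access: fetch the missing scores of every item seen anywhere.
    candidates = set.union(*seen.values())
    scores = {item: agg([lookup[name][item] for name in names]) for item in candidates}
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, sorted_accesses

# Lists sorted by descending score. B's Beauty score is an assumption (see lead-in).
lists = {
    "beauty": [("A", 9), ("B", 9), ("C", 3), ("D", 1)],
    "comfort": [("B", 10), ("C", 5), ("A", 4), ("D", 3)],
}

fa_top3, fa_accesses = fagin_algorithm(lists, 3, mean)
```

On this data the sketch stops after three rounds, that is six sorted accesses, once A, B and C have each appeared in both lists.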

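Complexity
In the worst case FA may have to read the lists almost to the end.
Probabilistic analysis on the order of items can be used to show better bounds: for N items and m independently ordered lists, the cost is O(N^((m-1)/m) * k^(1/m)) with high probability.
Can we do even better?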

Cost model
This is a very simple setting, so we can define a finer cost model than worst-case complexity.
In a web context it is important to do so, since the scale is huge.
We associate a cost Cs with every sorted access and a cost Cr with every random access.
Denote the cost of algorithm A on input instance I by cost(A,I).
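Written out, the cost of a run is the weighted number of accesses. The counters s and r are notation assumed here: s is the number of sorted accesses and r the number of random accesses that A performs on I.

```latex
\mathrm{cost}(A, I) = s \cdot C_s + r \cdot C_r
```

For instance, a run with 6 sorted accesses and 3 random accesses costs 6 C_s + 3 C_r.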

Instance-optimality
An algorithm A is instance-optimal if, for every input instance I and every algorithm A', cost(A,I) = O(cost(A',I)).
A very strong notion.
But we can realize it here!
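Spelling out the O-notation, instance optimality over a class of algorithms and a class of inputs is commonly formalised as below. The bold symbols and the constants c and c' are notation assumed here.

```latex
\exists\, c, c' > 0 \;\; \forall A' \in \mathbf{A} \;\; \forall I \in \mathbf{D} :
\quad \mathrm{cost}(A, I) \le c \cdot \mathrm{cost}(A', I) + c'
```

The constants must not depend on the instance: A is within a fixed factor of every competitor on every single input, not just on the worst one.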

Threshold Algorithm (TA)
Idea: sometimes we can stop before seeing k objects in every list.
Use a threshold that bounds how good the score of an unseen object can be.
The threshold is the aggregation of the lowest scores seen so far under sorted access, one per list.

Example (top-3, aggregation: average)

Beauty          Comfort
Item  Score     Item  Score
A     9         B     10
B     9         C     5
C     3         A     4
D     1         D     3

Average and threshold T, as the lists are read:
Step 1: A 6.5
Step 2: A 6.5; T=9.5
Step 3: A 6.5, B 9.5; T=9.5
Step 4: A 6.5, B 9.5, C 4; T=7
Step 5: A 6.5, B 9.5, C 4; T=4. All three scores are at least T, so TA halts.

One step less than FA!
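The run above can be reproduced with a short sketch. It assumes sorted accesses are interleaved one list at a time and that the halting test is evaluated after every access, which is what the threshold values in the example suggest. The name `threshold_algorithm` and the data layout are hypothetical, and B's Beauty score of 9 is inferred from its average of 9.5.

```python
def mean(values):
    return sum(values) / len(values)

def threshold_algorithm(lists, k, agg):
    """Sketch of TA: returns the top-k (item, score) pairs and the sorted-access count."""
    names = list(lists)
    lookup = {name: dict(rows) for name, rows in lists.items()}  # random access
    last = {}     # lowest score seen so far under sorted access, per list
    scores = {}   # aggregated score of every item seen so far
    sorted_accesses = 0
    max_depth = max(len(rows) for rows in lists.values())
    for depth in range(max_depth):
        for name in names:
            if depth >= len(lists[name]):
                continue
            item, score = lists[name][depth]
            sorted_accesses += 1
            last[name] = score
            if item not in scores:
                # Fetch the item's scores in all lists and aggregate them.
                scores[item] = agg([lookup[n][item] for n in names])
            if len(last) < len(names):
                continue  # no threshold until every list has been read once
            threshold = agg([last[n] for n in names])
            top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
            # Halt when k items score at least as high as any unseen item could.
            if len(top) == k and top[-1][1] >= threshold:
                return top, sorted_accesses
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, sorted_accesses

# Lists sorted by descending score. B's Beauty score is an assumption (see lead-in).
lists = {
    "beauty": [("A", 9), ("B", 9), ("C", 3), ("D", 1)],
    "comfort": [("B", 10), ("C", 5), ("A", 4), ("D", 3)],
}

ta_top3, ta_accesses = threshold_algorithm(lists, 3, mean)
```

With these inputs the sketch halts after five sorted accesses, at the point where the threshold drops to 4 and the third-best score is also 4.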

Theorem
Assume that the aggregation function t is monotone.
Let D be the class of all databases.
Let A be the class of all algorithms that correctly find the top k answers for t for every database and that do not make wild guesses (that is, they never do random access on an object they have not yet seen under sorted access).
Then TA is instance-optimal over A and D.

Proof
Assume that algorithm A halts at depth d (that is, if d_i is the number of objects seen under sorted access to list i, then d = max_i d_i).
Let a be the number of distinct objects that A sees (some possibly multiple times).
In particular, a >= d.
Since A makes no wild guesses and sees a distinct objects, it must make at least a sorted accesses.

Claim: TA halts on database D by depth a + k.
Note that for each choice of d', TA sees at least d' objects by depth d'. (With m lists, by depth d' it has made m*d' sorted accesses, and each object is accessed at most m times under sorted access.)
If there are at most k objects that A does not see, then TA halts by depth a + k (after having seen every object), and we are done.

Now assume that there are at least k + 1 objects that A does not see.
Let Y be the output set of A.
Since Y is of size k, there is some object V that A does not see and that is not in Y.
Let τ be the threshold value when algorithm A halts, i.e. the aggregation of the lowest scores A has observed in each list.

Call an object R big if its grade is at least τ, otherwise small.
Claim: every R in Y is big.
Proof: modify the database so that V has, in each list i, the lowest score that A observed in list i; the grade of V is then τ. A never sees V, so it behaves exactly as before and does not output V. If some R in Y were small, V would have a better grade than R, contradicting the correctness of A.
Now TA sees all elements of Y by depth d, and its threshold at depth d is at most τ, so it halts by depth d. Since d <= a, we are done.

Restricted Sorted Access
Some rankings are not available in sorted order, e.g. distances from a map site.
Then we can revise TA to do sorted access only on the lists where it is possible.
The revised algorithm is still instance-optimal (against algorithms that work under the same restrictions, of course).
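One way to sketch the revised TA: lists without sorted access contribute the maximum possible score to the threshold, since nothing better is known about unseen items there. The function name `ta_restricted`, the parameter `max_score`, and the choice of Comfort as the list without sorted access are assumptions made for illustration. B's Beauty score of 9 is inferred from its average in the earlier example.

```python
def mean(values):
    return sum(values) / len(values)

def ta_restricted(lists, sortable, max_score, k, agg):
    """Sketch of TA with sorted access limited to the lists named in `sortable`."""
    names = list(lists)
    lookup = {name: dict(rows) for name, rows in lists.items()}  # random access
    # Upper bound per list on the score of an unseen item. It starts at the
    # maximum possible score and only tightens for lists read in sorted order.
    bound = {name: max_score for name in names}
    scores = {}
    sorted_accesses = 0
    max_depth = max(len(lists[name]) for name in sortable)
    for depth in range(max_depth):
        for name in sortable:
            if depth >= len(lists[name]):
                continue
            item, score = lists[name][depth]
            sorted_accesses += 1
            bound[name] = score
            if item not in scores:
                scores[item] = agg([lookup[n][item] for n in names])
        threshold = agg([bound[n] for n in names])
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
        if len(top) == k and top[-1][1] >= threshold:
            return top, sorted_accesses
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return top, sorted_accesses

# Only "beauty" supports sorted access here; "comfort" is used for lookups only.
lists = {
    "beauty": [("A", 9), ("B", 9), ("C", 3), ("D", 1)],
    "comfort": [("A", 4), ("B", 10), ("C", 5), ("D", 3)],
}

restricted_top1, restricted_accesses = ta_restricted(lists, ["beauty"], 10, 1, mean)
```

Because the bound for the unsorted list never drops, the threshold stays high, and early stopping depends on finding items whose actual scores are close to the best possible.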

No Random Access
Maintain lower and upper bounds for every item (its worst and best possible grades).
Best: the aggregation of the scores seen for the item, with each missing score replaced by the lowest score seen so far in that list.
Worst: the aggregation of the scores seen for the item, with each missing score replaced by zero.
Keep in the list the k items with the top "worst" grades; break ties by "best" grades.
Halt when the list holds k items and the best grade of every item outside the list is less than the worst grade of the k-th item in the list.

Example (aggregation: average)

Beauty          Comfort
Item  Score     Item  Score
A     9         B     10
B     9         C     5
C     3         A     4
D     1         D     3

Bounds on the average score S, as the lists are read:
Step 1: A 4.5<S<9.5
Step 2: A 4.5<S<9.5, B 5<S<9.5
Step 3: A 4.5<S<9.5, B 9.5
Step 4: A 4.5<S<9.5, B 9.5, C 2.5<S<7
Step 5: A 4.5<S<9.5, B 9.5, C 4
Step 6: A 6.5, B 9.5, C 4
Step 7: A 6.5, B 9.5, C 4; Score(D)<3
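A minimal sketch of the no-random-access variant, tracking a worst and a best grade per item. Several choices here are assumptions rather than part of the slides: the name `no_random_access`, checking the halting condition once per round of parallel sorted access, using 0 as the lowest possible score, and halting on "less than or equal" instead of strictly "less than". B's Beauty score of 9 is inferred from its final average.

```python
def mean(values):
    return sum(values) / len(values)

def no_random_access(lists, k, agg, floor=0.0):
    """Sketch: top-k via sorted access only; returns (item, worst, best) triples."""
    names = list(lists)
    known = {}   # item -> {list name: score seen under sorted access}
    last = {}    # list name -> lowest score seen so far in that list
    sorted_accesses = 0

    def worst(item):
        # Missing scores are replaced by the lowest possible score.
        return agg([known[item].get(n, floor) for n in names])

    def best(item):
        # Missing scores are replaced by the lowest score seen in that list.
        return agg([known[item].get(n, last[n]) for n in names])

    def ranked():
        # Order by worst grade, breaking ties by best grade.
        return sorted(known, key=lambda i: (worst(i), best(i)), reverse=True)

    max_depth = max(len(rows) for rows in lists.values())
    for depth in range(max_depth):
        for name in names:
            if depth < len(lists[name]):
                item, score = lists[name][depth]
                sorted_accesses += 1
                last[name] = score
                known.setdefault(item, {})[name] = score
        order = ranked()
        top, rest = order[:k], order[k:]
        if len(top) == k:
            kth_worst = worst(top[-1])
            unseen_best = agg([last[n] for n in names])  # bound for unseen items
            if unseen_best <= kth_worst and all(best(i) <= kth_worst for i in rest):
                break
    return [(i, worst(i), best(i)) for i in ranked()[:k]], sorted_accesses

# Lists sorted by descending score. B's Beauty score is an assumption (see lead-in).
lists = {
    "beauty": [("A", 9), ("B", 9), ("C", 3), ("D", 1)],
    "comfort": [("B", 10), ("C", 5), ("A", 4), ("D", 3)],
}

nra_top3, nra_accesses = no_random_access(lists, 3, mean)
nra_top1, nra_top1_accesses = no_random_access(lists, 1, mean)
```

For the top-1 query the sketch stops after two rounds: B's score is fully known at 9.5, while no other item, seen or unseen, can exceed 7.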