Ranking: Compare, Don’t Score. Ammar Ammar, Devavrat Shah (LIDS – MIT). Poster (no preprint), WIDS 2011.

Similar presentations
Vote Elicitation with Probabilistic Preference Models: Empirical Estimation and Cost Tradeoffs Tyler Lu and Craig Boutilier University of Toronto.

Sep 16, 2013 Lirong Xia Computational social choice The easy-to-compute axiom.
Sep 15, 2014 Lirong Xia Computational social choice The easy-to-compute axiom.
Diversity Maximization Under Matroid Constraints Date : 2013/11/06 Source : KDD’13 Authors : Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur Advisor :
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Active Learning and Collaborative Filtering
Robust Network Compressive Sensing Lili Qiu UT Austin NSF Workshop Nov. 12, 2014.
First introduced in 1977 Lots of mathematical derivation Problem : given a set of data (data is incomplete or having missing values). Goal : assume the.
Analysis of Variance. Experimental Design u Investigator controls one or more independent variables –Called treatment variables or factors –Contain two.
1 Learning Entity Specific Models Stefan Niculescu Carnegie Mellon University November, 2003.
Project  Now it is time to think about the project  It is a team work Each team will consist of 2 people  It is better to consider a project of your.
Computing Trust in Social Networks
Introduction to Management Science
1 © 1998 HRL Laboratories, LLC. All Rights Reserved Development of Bayesian Diagnostic Models Using Troubleshooting Flow Diagrams K. Wojtek Przytula: HRL.
Using ranking and DCE data to value health states on the QALY scale using conventional and Bayesian methods Theresa Cain.
7-1 Introduction The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These.
Maximum Entropy Model & Generalized Iterative Scaling Arindam Bose CS 621 – Artificial Intelligence 27 th August, 2007.
Presented by Johanna Lind and Anna Schurba Facility Location Planning using the Analytic Hierarchy Process Specialisation Seminar „Facility Location Planning“
Estimating Entropy for Data Streams Khanh Do Ba, Dartmouth College Advisor: S. Muthu Muthukrishnan.
Adaptive CSMA under the SINR Model: Fast convergence using the Bethe Approximation Krishna Jagannathan IIT Madras (Joint work with) Peruru Subrahmanya.
Bayesian Sets Zoubin Ghahramani and Kathertine A. Heller NIPS 2005 Presented by Qi An Mar. 17 th, 2006.
Training and Testing of Recommender Systems on Data Missing Not at Random Harald Steck at KDD, July 2010 Bell Labs, Murray Hill.
Sahand Negahban Sewoong Oh Devavrat Shah Yale + UIUC + MIT.
Quantification of the non- parametric continuous BBNs with expert judgment Iwona Jagielska Msc. Applied Mathematics.
7-1 Introduction The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These.
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
Heavy-Tailed Phenomena in Satisfiability and Constraint Satisfaction Problems by Carla P. Gomes, Bart Selman, Nuno Crato and henry Kautz Presented by Yunho.
SOFIANE ABBAR, HABIBUR RAHMAN, SARAVANA N THIRUMURUGANATHAN, CARLOS CASTILLO, G AUTAM DAS QATAR COMPUTING RESEARCH INSTITUTE UNIVERSITY OF TEXAS AT ARLINGTON.
Bayesian inference for Plackett-Luce ranking models
Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.
We report an empirical study of buying and selling prices for three kinds of gambles: Risky (with known probabilities), Ambiguous (with lower and upper.
Recommender Systems Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Credits to Bing Liu (UIC) and Angshul Majumdar.
Computer Vision Lecture 6. Probabilistic Methods in Segmentation.
Estimation. The Model Probability The Model for N Items — 1 The vector probability takes this form if we assume independence.
5. Maximum Likelihood –II Prof. Yuille. Stat 231. Fall 2004.
Maximum Entropy Discrimination Tommi Jaakkola Marina Meila Tony Jebara MIT CMU MIT.
Intro to NLP - J. Eisner1 A MAXENT viewpoint.
Machine Learning 5. Parametric Methods.
A Brief Maximum Entropy Tutorial Presenter: Davidson Date: 2009/02/04 Original Author: Adam Berger, 1996/07/05
Machine Learning: A Brief Introduction Fu Chang Institute of Information Science Academia Sinica ext. 1819
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
Innovation Team of Recommender System(ITRS) Collaborative Competitive Filtering : Learning Recommender Using Context of User Choice Keynote: Zhi-qiang.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
. The EM algorithm Lecture #11 Acknowledgement: Some slides of this lecture are due to Nir Friedman.
MathematicalMarketing Slide 3c.1 Mathematical Tools Chapter 3: Part c – Parameter Estimation We will be discussing  Nonlinear Parameter Estimation  Maximum.
Markov Chain Monte Carlo in R
Fast search for Dirichlet process mixture models
Estimating standard error using bootstrap
Statistical Estimation
Algorithms for Large Data Sets
A Bayesian approach to recommender systems
7-1 Introduction The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These.
Slides by JOHN LOUCKS St. Edward’s University.
Clustering Using Pairwise Comparisons
Bayesian Models in Machine Learning
Probabilistic Models with Latent Variables
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Lecture 5 Unsupervised Learning in fully Observed Directed and Undirected Graphical Models.
Ensembles.
Range-Efficient Computation of F0 over Massive Data Streams
Recommender Systems: Movie Recommendations
Econometrics Chengyuan Yin School of Mathematics.
Probabilistic Latent Preference Analysis
Introduction to Stream Computing and Reservoir Sampling
Computational social choice
Chapter 7 Sampling and Sampling Distributions
Linear Discrimination
Markov Networks.
IED Product Management Day #2
Maximum Likelihood We have studied the OLS estimator. It only applies under certain assumptions In particular,  ~ N(0, 2 ) But what if the sampling distribution.
Presentation transcript:

Ranking: Compare, Don’t Score. Ammar Ammar, Devavrat Shah (LIDS – MIT). Poster (no preprint), WIDS 2011

Introduction  The need to rank items based on user input – election, betting, recommendation systems. E.g. Netflix, the movie streaming service company (accounts for 30% of U.S. web traffic) : the problem of recommending movies to users based on partial historical information about their preferences.  Two main approaches  Scores : ask users to provide a score/rating for each product, and use the scores to rank the products. (A popular approach)  Comparisons : ask users to compare two, or more, products at a time. Use comparisons to rank products. (A natural alternative) 2

Introduction  Scores  Advantage : Easy aggregation.  Disadvantage : Scores are arbitrary/relative (e.g. scale).  Comparisons  Advantage : Absolute information.  Disadvantage : Hard aggregation

Mathematical Model
- n products, N = {1, ..., n}.
- Each customer is associated with a permutation σ of the elements of N; σ(i) < σ(j) means the customer prefers product i to product j.
- E.g., for N = {1, 2, 3, 4, 5}, a customer with σ(3)=1, σ(1)=2, σ(4)=3, σ(2)=4, σ(5)=5 has the preference ranking 3 > 1 > 4 > 2 > 5.
- The model of customer choice is a distribution μ: S_n → [0, 1] over the set of possible permutations S_n.
- Observed data is limited to the pairwise-comparison marginals of μ: w_ij = P[σ(i) < σ(j)], the fraction of users who prefer item i to item j.
- Goal: find an estimate μ̂ that is consistent with the data.
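
To make the observation model concrete, here is a minimal Python sketch (not part of the poster; the function name and NumPy representation are illustrative) that computes the empirical pairwise marginals w_ij from a sample of fully observed preference orders.

import numpy as np

def pairwise_marginals(rankings, n):
    # Empirical w[i, j] = fraction of users who prefer item i to item j,
    # i.e. the fraction of observed permutations with sigma(i) < sigma(j).
    # `rankings` is a list of preference orders, each a list of item indices
    # from most to least preferred.
    w = np.zeros((n, n))
    for order in rankings:
        pos = {item: rank for rank, item in enumerate(order)}  # pos[i] = sigma(i)
        for i in range(n):
            for j in range(n):
                if i != j and pos[i] < pos[j]:
                    w[i, j] += 1
    return w / len(rankings)

# Example: the customer from the slide, preference 3 > 1 > 4 > 2 > 5
# (items are 0-indexed in code: 2 > 0 > 3 > 1 > 4).
print(pairwise_marginals([[2, 0, 3, 1, 4]], n=5))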

Maximum Entropy
- Multiple distributions are consistent with the data constraints; the principle of maximum entropy guides the choice of distribution.
- Subject to known constraints (called “testable information”), the probability distribution that best represents the current state of knowledge is the one with the largest entropy:
  max_μ  −∑_{σ ∈ S_n} μ(σ) log μ(σ)   subject to   ∑_σ μ(σ) I{σ(i) < σ(j)} = w_ij for all i ≠ j.
- The solution has the parametric (exponential) form μ_λ(σ) ∝ exp( ∑_{i ≠ j} λ_ij I{σ(i) < σ(j)} ).
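
As an illustration only (not the authors' code), the sketch below evaluates the unnormalized log-weight of a permutation under this exponential form, assuming one parameter λ_ij per ordered pair of items.

import numpy as np

def log_weight(sigma, lam):
    # Unnormalized log-probability under the max-ent exponential form:
    #   mu_lambda(sigma) proportional to exp( sum_{i != j} lam[i, j] * I{ sigma[i] < sigma[j] } ),
    # where sigma[i] is the rank of item i (smaller rank = more preferred).
    n = len(sigma)
    return sum(lam[i, j] for i in range(n) for j in range(n)
               if i != j and sigma[i] < sigma[j])

# Example: ranking 3 > 1 > 4 > 2 > 5 (0-indexed ranks below) with all lambda_ij = 1
# gives log-weight 10.0, one unit per satisfied ordered comparison.
lam = np.ones((5, 5))
sigma = np.array([1, 3, 0, 2, 4])
print(log_weight(sigma, lam))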

Contributions
- Developed a consistent algorithm for estimating the parameters of the maximum entropy distribution; the algorithm is distributed and iterative.
- Provided a randomized 2-approximation scheme for the mode of the distribution.
- Developed two ranking schemes that use the maximum entropy distribution to obtain a ranking that puts emphasis on the top elements:
  - Top-k ranking: uses the likelihood of an item appearing in the top k.
  - θ-ranking: uses a tilted average of an item's possible positions.

Algorithm Sketch
- The maximum entropy distribution is fully characterized by the parameters λ_ij; the goal is to estimate these parameters from the data w_ij.
- Initialize the parameters to λ_ij = 1.
- For t = 1, 2, ..., T: set λ_ij^{t+1} = λ_ij^t + (1/t) ( w_ij − E_{λ^t}[ I{σ(i) < σ(j)} ] ).
- Exact computation of E_λ[ I{σ(i) < σ(j)} ] is hard; use MCMC or belief propagation (BP) to obtain an approximation.
- Parameters can be estimated "separately" in a distributed manner.
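
Below is a minimal sketch of this update loop. The expectation E_λ[I{σ(i) < σ(j)}] is approximated here by crude self-normalized importance sampling over uniformly random permutations, which is only a stand-in for the MCMC/BP step mentioned on the slide; the function names and sample sizes are illustrative assumptions.

import numpy as np

def estimate_pairwise_probs(lam, n, num_samples=2000, rng=None):
    # Self-normalized importance-sampling estimate of E_lambda[ I{ sigma(i) < sigma(j) } ]
    # for all pairs, using uniformly random permutations as proposals.
    rng = np.random.default_rng() if rng is None else rng
    perms = [rng.permutation(n) for _ in range(num_samples)]  # p[i] = rank of item i
    log_w = np.array([sum(lam[i, j] for i in range(n) for j in range(n)
                          if i != j and p[i] < p[j]) for p in perms])
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    probs = np.zeros((n, n))
    for wgt, p in zip(weights, perms):
        for i in range(n):
            for j in range(n):
                if i != j and p[i] < p[j]:
                    probs[i, j] += wgt
    return probs

def fit_maxent(w, num_iters=50, **sampler_kwargs):
    # Iterative update from the slide:
    #   lambda_ij^(t+1) = lambda_ij^t + (1/t) * ( w_ij - E_lambda[ I{ sigma(i) < sigma(j) } ] )
    n = w.shape[0]
    lam = np.ones((n, n))                       # initialize lambda_ij = 1
    for t in range(1, num_iters + 1):
        probs = estimate_pairwise_probs(lam, n, **sampler_kwargs)
        lam += (1.0 / t) * (w - probs)
    return lam

# Hypothetical usage, with w a matrix of observed pairwise fractions:
#   lam_hat = fit_maxent(w, num_iters=30, num_samples=500)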

Mode  Mode : σ* = argmax ( )  Exact computation of the mode is hard.  A randomized 2-approximation: 1. Generate k permutations, σ 1,..., σ k, u.a.r. 2. Select the permutation σ with the largest weight. 8 σ ˆ

Top-k Ranking
A robust ranking of the top k items using one of the following two schemes:
1. Top-k ranking
   - Compute S_k(i) = P_λ[σ(i) ≤ k].
   - Rank products using S_k and choose the top k.
2. θ-ranking
   - Compute S_θ(i) = ∑_j e^{−θj} · P_λ[σ(i) = j].
   - Rank products using S_θ and choose the top k.
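
Both scores can be computed from an estimate of the position marginals P_λ[σ(i) = j], for example obtained from the same sampling used during parameter fitting. The sketch below assumes such a matrix pos_probs is available; the 0-indexed positions and function names are my own conventions, not the poster's.

import numpy as np

def topk_scores(pos_probs, k):
    # S_k(i) = P[ sigma(i) <= k ]: probability that item i lands in the top k.
    # pos_probs[i, j] approximates P_lambda[ sigma(i) = j ], positions 0-indexed,
    # so the top k positions are columns 0, ..., k-1.
    return pos_probs[:, :k].sum(axis=1)

def theta_scores(pos_probs, theta):
    # S_theta(i) = sum_j exp(-theta * j) * P_lambda[ sigma(i) = j ]:
    # an exponentially tilted average that rewards positions near the top.
    n_positions = pos_probs.shape[1]
    tilt = np.exp(-theta * np.arange(n_positions))
    return pos_probs @ tilt

def rank_items(scores):
    # Order items by decreasing score; take the first k entries for a top-k list.
    return list(np.argsort(-scores))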