Predicting Consensus Ranking in Crowdsourced Setting
Xi Chen
Mentors: Paul Bennett and Eric Horvitz
Collaborator: Kevyn Collins-Thompson
Machine Learning Department, Carnegie Mellon University

Judgment
- Judgment is important in many domains, e.g., search, ads, and games.
- Judgment types: absolute vs. relative.
  - Absolute judgments are noisy and show less inter-annotator agreement.
  - Relative (pairwise) judgments have much higher agreement and are faster per judgment [Carterette et al., ECIR 2008].
- Judgment in crowdsourced settings: judgments are provided by multiple annotators.
- Example of pairwise comparisons: absolute scores A: 5, B: 3, C: 1 correspond to the relative judgments A > B, A > C, B > C.
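As a tiny illustration of the conversion implied above (a hypothetical helper, not part of the talk), absolute scores induce pairwise preferences:

```python
# Turn the absolute scores on this slide (A: 5, B: 3, C: 1) into the implied
# relative judgments A > B, A > C, B > C. Illustrative helper only.
from itertools import combinations

def to_pairwise(scores):
    """Return (winner, loser) pairs implied by absolute scores; ties are skipped."""
    return [(a, b) if scores[a] > scores[b] else (b, a)
            for a, b in combinations(scores, 2)
            if scores[a] != scores[b]]

print(to_pairwise({"A": 5, "B": 3, "C": 1}))   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```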

Challenges for Consensus from the Crowd
- Not all annotators are equally reliable.
  - This is true in many settings but is exacerbated in crowdsourced settings.
  - Approach: model annotators' reliability and incorporate it into a well-studied probabilistic ranking aggregation framework.
- Reliability can depend on how well the annotator fits the task.
  - An annotator's reliability may depend on both characteristics of the annotator and characteristics of the task.
  - Approach: introduce pooling across tasks and incorporate a representation of task features into the model.
- We want maximum quality for minimum cost, so we must choose not only which objects to label but also who should label them.
  - Active learning with no oracle.
  - Formalize as an explore-exploit objective tradeoff.
  - The selection must be computed online over all annotator-object pairs, so we derive a constant-time Bayesian online update.

Ranking Aggregation
- Permutation-based methods
  - Mallows model (Mallows, 1957)
  - CPS (Qin et al., 2010)
  - Computationally very expensive; require approximation heuristics.
- Score-based methods: learn a real-valued score for each object
  - Bradley-Terry model (Bradley and Terry, 1952)
  - Plackett-Luce (Luce, 1959; Plackett, 1975)
  - Thurstone (Thurstone, 1927)
  - Low-rank approximation (Jiang et al., 2009; Gleich et al., 2011)

Bradley-Terry Model
Example: three sets of pairwise comparisons (e.g., from different annotators) are aggregated into a single consensus ranking.
- Set 1: A > B, A > C, C > D
- Set 2: B > D, C > A, E > D
- Set 3: A > B, B > C, C > A
- Consensus ranking: A > B > C > E > D
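The Bradley-Terry likelihood itself is not preserved in the transcript. As a rough illustration (my own sketch, not the talk's implementation), under BT the probability that i beats j is exp(s_i) / (exp(s_i) + exp(s_j)), and the scores can be fit by maximum likelihood with gradient ascent:

```python
# Minimal sketch of Bradley-Terry maximum-likelihood estimation by gradient
# ascent on latent object scores, then reading off a consensus ranking.
import numpy as np

def fit_bradley_terry(pairs, n_objects, n_iters=500, lr=0.1):
    """pairs: list of (winner, loser) index tuples."""
    s = np.zeros(n_objects)                              # latent scores
    for _ in range(n_iters):
        grad = np.zeros(n_objects)
        for w, l in pairs:
            p_wl = 1.0 / (1.0 + np.exp(s[l] - s[w]))     # P(w beats l)
            grad[w] += 1.0 - p_wl                        # d log-lik / d s_w
            grad[l] -= 1.0 - p_wl                        # d log-lik / d s_l
        s += lr * grad
        s -= s.mean()                                    # remove the shift ambiguity
    return s

# The comparisons on this slide, with A=0, B=1, C=2, D=3, E=4
pairs = [(0, 1), (0, 2), (2, 3), (1, 3), (2, 0), (4, 3), (0, 1), (1, 2), (2, 0)]
scores = fit_bradley_terry(pairs, n_objects=5)
ranking = np.argsort(-scores)                            # consensus ranking by score
```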

Roadmap
- Modeling Annotators' Reliability for Probabilistic Ranking
- Active Learning with an Exploitation-Exploration Tradeoff
- Ranking in a Multitask Setting

Model the Reliability of Annotators
- Types of annotators:
  1. Perfect annotator
  2. Random annotator
  3. Malicious annotator
- Most annotators are good but imperfect, so we need to quantify their reliability.
- Our approach: explicitly model the reliability of each annotator.

Interpretation and CrowdBT
- CrowdBT: extend the Bradley-Terry model with a per-annotator reliability parameter (see the sketch below).
- The reliability parameter distinguishes perfect, random, and malicious annotators.
- Parameters are estimated by maximizing the log-likelihood.
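The slide's formulas are not preserved in the transcript. As a hedged sketch, one common way to formalize the annotator types above (and, to my understanding, the form used by CrowdBT) mixes the Bradley-Terry probability with its reverse through a reliability parameter eta_k, where eta_k = 1 is a perfect annotator, 0.5 a random one, and 0 a malicious one:

```python
# Hedged sketch of a CrowdBT-style likelihood; variable names are illustrative.
import numpy as np

def p_bt(s_i, s_j):
    """Bradley-Terry probability that object i beats object j."""
    return 1.0 / (1.0 + np.exp(s_j - s_i))

def p_crowd(s_i, s_j, eta_k):
    """Probability that annotator k reports i > j, given reliability eta_k.
    eta_k = 1: perfect; eta_k = 0.5: random; eta_k = 0: malicious."""
    return eta_k * p_bt(s_i, s_j) + (1.0 - eta_k) * p_bt(s_j, s_i)

def neg_log_likelihood(scores, etas, labels):
    """labels: list of (annotator k, winner i, loser j) as reported by k."""
    nll = 0.0
    for k, i, j in labels:
        nll -= np.log(p_crowd(scores[i], scores[j], etas[k]) + 1e-12)
    return nll   # minimize jointly over scores and etas (e.g., alternating gradient steps)
```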

Effect of Annotators' Average Reliability
Simulation setup:
- Number of objects: 100 (true scores 1 to 100)
- Each pair labeled by 10 annotators [W. Yih, 09]
- Reliability initialization: based on performance on 5 golden pairs
Conclusions:
1. When average reliability > 0.5, CrowdBT works well with an all-one initialization of the reliabilities.
2. When average reliability <= 0.5, CrowdBT works well given a rough estimate of reliability.
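A rough sketch of the kind of simulation this slide describes; the specific distributions and constants below are my assumptions, not the talk's:

```python
# Simulate 100 objects with true scores 1..100; each sampled pair is labeled by
# 10 annotators whose reliabilities average to `avg_reliability`.
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_annotators, n_pairs = 100, 10, 2000
true_scores = np.arange(1, n_objects + 1, dtype=float)

def simulate(avg_reliability):
    etas = np.clip(rng.normal(avg_reliability, 0.2, n_annotators), 0.0, 1.0)
    labels = []                                           # (annotator, winner, loser)
    for _ in range(n_pairs):
        i, j = rng.choice(n_objects, size=2, replace=False)
        p_true = 1 / (1 + np.exp((true_scores[j] - true_scores[i]) / 10))  # P(i > j)
        for k in range(n_annotators):
            p_report = etas[k] * p_true + (1 - etas[k]) * (1 - p_true)
            labels.append((k, i, j) if rng.random() < p_report else (k, j, i))
    return etas, labels
```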

Roadmap
- Modeling Annotators' Reliability for Probabilistic Ranking
  1. Extend the Bradley-Terry model to incorporate annotators' reliability.
  2. Distinguish different types of annotators and automatically recover from the errors introduced by malicious ones.
  3. Average reliability vs. initialization strategy.
- Active Learning with an Exploitation-Exploration Tradeoff
- Ranking in a Multitask Setting

Consensus Ranking for Different Tasks

Picture This Dataset [Bennett et al., 2009]
Statistics of the dataset:
- 35 queries (features extracted using an ODP classifier)
- 44,419 players
- 483,477 labeled pairs of images
- For each query and player, there are only 2.79 labeled pairs on average.
Motivation: explore the commonality among tasks.

Roadmap
- Modeling Annotators' Reliability for Probabilistic Ranking
  1. Extend the Bradley-Terry model to incorporate annotators' reliability.
  2. Distinguish different types of annotators and automatically recover from the errors introduced by malicious ones.
  3. Average reliability vs. initialization strategy.
- Active Learning with an Exploitation-Exploration Tradeoff
- Ranking in a Multitask Setting
  1. Explore the commonality of annotators across tasks.
  2. Utilize task features in modeling annotators' reliability.

Active Learning  Active Learning  Oracle provides correct answer  Optimally select next sample  Active Learning in the Crowd  Optimally select next sample  Assign the sample to whom ?  Assign uncertain samples to good annotators  Assign the certain samples to test annotators’ reliability Computational Challenges: online learning algorithms 14 Pictures from [Settles 2009] Exploitation-Exploration Tradeoff (a)Exploit labels of uncertain samples from annotators with known reliability (b) Explore annotators’ reliability, discover good annotators Exploitation-Exploration Tradeoff (a)Exploit labels of uncertain samples from annotators with known reliability (b) Explore annotators’ reliability, discover good annotators

Online Learning: Bayesian Modeling
- Assign priors to the model parameters.
- Bayesian inference: posterior ∝ likelihood × prior.
- The exact posterior is intractable, so it is approximated.
- Two approximations: (1) the parameters are treated as independent; (2) the approximate posterior uses Gaussian and Beta distributions (a sketch follows).
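The formulas on this slide are lost in the transcript. Below is a minimal, hedged sketch in my own notation, assuming Gaussian priors on object scores and Beta priors on annotator reliability; it uses a brute-force importance-sampling moment match for one observation, whereas the talk derives a closed-form constant-time update:

```python
# After annotator k reports i > j, refresh the Gaussian posterior on s_i, s_j and
# the Beta posterior on eta_k by matching the first and second moments of each
# marginal of the (unnormalized) posterior, estimated here by importance sampling.
import numpy as np

rng = np.random.default_rng(0)
mu, var = np.zeros(5), np.ones(5)            # score posteriors N(mu_i, var_i)
alpha, beta = np.ones(3), np.ones(3)         # reliability posteriors Beta(alpha_k, beta_k)

def observe(i, j, k, n=50_000):
    s_i = rng.normal(mu[i], np.sqrt(var[i]), n)
    s_j = rng.normal(mu[j], np.sqrt(var[j]), n)
    eta = rng.beta(alpha[k], beta[k], n)
    p_bt = 1 / (1 + np.exp(s_j - s_i))                   # P(i > j | scores)
    w = eta * p_bt + (1 - eta) * (1 - p_bt)              # likelihood weights
    w /= w.sum()
    # Moment-match each marginal to keep the factorized Gaussian/Beta form.
    mu[i], var[i] = np.sum(w * s_i), np.sum(w * (s_i - np.sum(w * s_i)) ** 2)
    mu[j], var[j] = np.sum(w * s_j), np.sum(w * (s_j - np.sum(w * s_j)) ** 2)
    m, v = np.sum(w * eta), np.sum(w * (eta - np.sum(w * eta)) ** 2)
    common = m * (1 - m) / v - 1                         # Beta moment matching
    alpha[k], beta[k] = m * common, (1 - m) * common

observe(0, 1, 2)   # annotator 2 reports object 0 > object 1
```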

Active Learning

Active Learning: Simulated Study
(Figure omitted: performance curves showing that both too much exploration and too little exploration hurt.)

Area under the active learning curve

Bayesian Inference
- Many choices: MCMC, variational inference, expectation propagation.
- How do we get constant-time inference for each pair? Moment matching!
- How do we estimate the required first- and second-order moments? With the technique from [Weng et al., 2011] (Stein's lemma).
- Result: a constant-time update per pairwise label.

Stein's Lemma [Woodroofe, 1989; Weng et al., 2011]
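The lemma's statement did not survive the transcript; for reference, the standard form used for Gaussian moment estimation (a textbook statement, not copied from the slide) is:

```latex
% Stein's lemma for a Gaussian random variable, and how it yields posterior
% moments in terms of expectations of the likelihood f and its derivative.
\[
  X \sim \mathcal{N}(\mu, \sigma^2) \quad\Longrightarrow\quad
  \mathbb{E}\!\left[(X - \mu)\, h(X)\right] \;=\; \sigma^2\, \mathbb{E}\!\left[h'(X)\right],
\]
% so, with likelihood f(x) for the observed pairwise label,
\[
  \mathbb{E}[X \mid \text{data}]
  \;=\; \mu \;+\; \sigma^2\,
  \frac{\mathbb{E}\!\left[f'(X)\right]}{\mathbb{E}\!\left[f(X)\right]},
\]
% which is what makes the constant-time moment-matching update possible.
```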

Bayesian Inference
(Figure omitted; panel labels: Perfect Annotator, Random Annotator, Malicious Annotator, Random Annotator, Perfect or Malicious Annotator.)

Reading Difficulty Dataset
Dataset (provided by Kevyn Collins-Thompson):
- 491 articles with graded difficulty levels
- 12,728 pairwise comparisons
(Table omitted: "Ratio to Best (0.6843)" comparison against a random baseline.)

Reading Difficulty Dataset (continued)
Same dataset: 491 articles with graded difficulty levels, 12,728 pairwise comparisons (provided by Kevyn Collins-Thompson).

Roadmap
- Modeling Annotators' Reliability for Probabilistic Ranking
  1. Extend the Bradley-Terry model to incorporate annotators' reliability.
  2. Distinguish different types of annotators and automatically recover from the errors introduced by malicious ones.
  3. Average reliability vs. initialization strategy.
- Active Learning with an Exploitation-Exploration Tradeoff
  1. Active learning in the crowd: selection of both sample and annotator.
  2. Exploitation-exploration tradeoff.
  3. Efficient online Bayesian update.
- Ranking in a Multitask Setting
  1. Explore the commonality of annotators across tasks.
  2. Utilize task features in modeling annotators' reliability.

Conclusions and Future Work
- Probabilistic ranking in a crowdsourced setting:
  - Extended the classical Bradley-Terry model to capture annotator reliability; the same technique can be applied to other models (e.g., Thurstone).
  - Saving cost: active learning in the crowd that explicitly models the exploitation-exploration tradeoff, with efficient online learning.
- Future work:
  - Active learning in batch mode.
  - Value of information: how much effort should be spent testing annotators' performance?