1 Predicting Consensus Ranking in Crowdsourced Setting Xi Chen Mentors: Paul Bennett and Eric Horvitz Collaborator: Kevyn Collins-Thompson Machine Learning Department Carnegie Mellon University

2 Judgment • Judgment is important in many domains, e.g., search, ads, and games. • Judgment types: absolute vs. relative. • Absolute judgments are noisy and show less agreement. • Relative judgments have much higher agreement. • Relative judgments are faster per judgment. • Judgment in crowdsourced settings: judgments are provided by multiple annotators. Example: absolute ratings A: 5, B: 3, C: 1 correspond to the pair-wise comparisons A > B, A > C, B > C. [Carterette et al., ECIR 2008]
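To make the contrast concrete, here is a minimal Python sketch (illustrative names only) that derives the relative judgments shown on the slide from the absolute ratings A: 5, B: 3, C: 1:

    from itertools import combinations

    # Absolute judgments from the slide: item -> rating.
    absolute = {"A": 5, "B": 3, "C": 1}

    # Each pair of items yields one relative judgment: the higher-rated item wins.
    pairs = [(i, j) if absolute[i] > absolute[j] else (j, i)
             for i, j in combinations(absolute, 2)]

    print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')], i.e. A > B, A > C, B > C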

3 Challenges for Consensus from the Crowd • Not all annotators are equally reliable. This is true in many settings but is exacerbated in the crowd setting. Approach: model annotators' reliability and incorporate it into a well-studied probabilistic ranking-aggregation framework. • Reliability can depend on how well the annotator fits the task: an annotator's reliability may depend on characteristics of both the annotator and the task. Approach: introduce pooling across tasks and incorporate a representation of task features into the model. • Maximum quality for minimum cost: we must choose not only which objects to label but also who should label them. This is active learning with no oracle. Approach: formalize it as an exploitation-exploration tradeoff, compute online over all annotator-object pairs, and derive a constant-time Bayesian online update.

4 Ranking Aggregation • Permutation-based methods: Mallows model (Mallows, 1957); CPS (Qin et al., 2010). Computationally very expensive; rely on approximation heuristics. • Score-based methods: learn a real-valued score for each object: Bradley-Terry model (Bradley and Terry, 1952); Plackett-Luce (Luce, 1959; Plackett, 1975); Thurstone (Thurstone, 1927); low-rank approximation (Jiang et al., 2009; Gleich et al., 2011).
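For reference, the score-based models above attach a latent score s_i to each object i. In standard form (not specific to this talk), the Bradley-Terry pairwise probability and the Plackett-Luce probability of a full ranking \pi over n objects are

    P_{\mathrm{BT}}(i \succ j) = \frac{e^{s_i}}{e^{s_i} + e^{s_j}},
    \qquad
    P_{\mathrm{PL}}(\pi) = \prod_{r=1}^{n} \frac{e^{s_{\pi(r)}}}{\sum_{r'=r}^{n} e^{s_{\pi(r')}}}.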

5 Bradley-Terry Model • Pairwise judgments from three annotators: {A > B, A > C, C > D}; {B > D, C > A, E > D}; {A > B, B > C, C > A}. • Aggregated consensus ranking: A > B > C > E > D.
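As background, a plain maximum-likelihood Bradley-Terry fit pools all comparisons equally, with no reliability weighting, so it need not reproduce the slide's consensus exactly. The sketch below (a minimal illustration, not the talk's aggregation code; a small Gaussian prior is added so the estimate stays finite) fits scores by gradient ascent and prints the induced ranking:

    import math

    # Pooled pairwise judgments from the slide, as (winner, loser).
    pairs = [("A", "B"), ("A", "C"), ("C", "D"),
             ("B", "D"), ("C", "A"), ("E", "D"),
             ("A", "B"), ("B", "C"), ("C", "A")]

    items = sorted({x for p in pairs for x in p})
    scores = {x: 0.0 for x in items}

    def p_win(si, sj):
        # Bradley-Terry: P(i beats j) = e^si / (e^si + e^sj).
        return 1.0 / (1.0 + math.exp(sj - si))

    lr, reg = 0.1, 0.1   # step size and L2 regularization (keeps the estimate bounded)
    for _ in range(2000):
        grad = {x: -reg * scores[x] for x in items}
        for w, l in pairs:
            p = p_win(scores[w], scores[l])
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        for x in items:
            scores[x] += lr * grad[x]

    print(sorted(items, key=lambda x: -scores[x]))  # items ordered by fitted score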

6 Roadmap • Modeling Annotators' Reliability for Probabilistic Ranking • Active Learning with an Exploitation-Exploration Tradeoff • Ranking in a Multitask Setting

7 Model the Reliability of Annotators • Types of annotators: (1) perfect annotator, (2) random annotator, (3) malicious annotator. • Most annotators are good but imperfect; we need to quantify their reliability. • Model the reliability of each annotator.

8 Interpretation and CrowdBT • Perfect annotator • Random annotator • Malicious annotator • Maximize the log-likelihood.
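Written out (notation mine, following the standard CrowdBT formulation), annotator k's probability of reporting i > j mixes the Bradley-Terry probability with its reverse according to a per-annotator reliability \eta_k:

    P_k(i \succ j) \;=\; \eta_k \frac{e^{s_i}}{e^{s_i}+e^{s_j}} \;+\; (1-\eta_k)\frac{e^{s_j}}{e^{s_i}+e^{s_j}},
    \qquad
    \max_{s,\,\eta} \;\sum_k \sum_{(i \succ j) \in D_k} \log P_k(i \succ j),

so a perfect annotator corresponds to \eta_k = 1, a random annotator to \eta_k = 0.5, and a malicious annotator to \eta_k = 0.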

9 Effect of Annotators' Average Reliability • Setup: 100 objects (scores 1 to 100), 100 annotators, 400 pairs, each pair labeled by 10 annotators; reliability initialized from performance on 5 golden pairs [W. Yih, 09]. • Conclusions: (1) when average reliability > 0.5, CrowdBT works well with all-one initialization; (2) when average reliability <= 0.5, CrowdBT works well with a rough estimate of reliability.
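The simulation setup on this slide can be mocked up in a few lines. The sketch below is only one plausible reading of the protocol; the reliability distribution and the way labels are flipped are assumptions, not stated on the slide:

    import random

    random.seed(0)
    n_objects, n_annotators, n_pairs, labels_per_pair = 100, 100, 400, 10

    true_score = {i: i + 1 for i in range(n_objects)}                      # scores 1..100
    reliability = [random.uniform(0.3, 1.0) for _ in range(n_annotators)]  # assumed range

    pairs = [tuple(random.sample(range(n_objects), 2)) for _ in range(n_pairs)]

    data = []  # (annotator, winner, loser)
    for i, j in pairs:
        for k in random.sample(range(n_annotators), labels_per_pair):
            truth_i_wins = true_score[i] > true_score[j]
            follows_truth = random.random() < reliability[k]  # correct answer w.p. reliability
            winner, loser = (i, j) if truth_i_wins == follows_truth else (j, i)
            data.append((k, winner, loser))

    print(len(data))  # 400 pairs x 10 annotators = 4000 labeled comparisons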

10 Roadmap • Modeling Annotators' Reliability for Probabilistic Ranking: (1) extend the Bradley-Terry model to incorporate annotators' reliability; (2) distinguish different types of annotators and automatically recover from errors introduced by malicious ones; (3) average reliability vs. initialization strategy. • Active Learning with an Exploitation-Exploration Tradeoff • Ranking in a Multitask Setting

11 Consensus Ranking for Different Tasks

12 Picture This Dataset • Statistics: 35 queries (features extracted using an ODP classifier), 44,419 players, 483,477 pairs of images [Bennett et al., 2009]. • For each query and player, there are only 2.79 labeled pairs on average. • Goal: explore the commonality among tasks.

13 Roadmap • Modeling Annotators' Reliability for Probabilistic Ranking: (1) extend the Bradley-Terry model to incorporate annotators' reliability; (2) distinguish different types of annotators and automatically recover from errors introduced by malicious ones; (3) average reliability vs. initialization strategy. • Active Learning with an Exploitation-Exploration Tradeoff • Ranking in a Multitask Setting: (1) explore the commonality of annotators across tasks; (2) utilize task features in modeling annotators' reliability.

14 Active Learning • Classical active learning: an oracle provides the correct answer; optimally select the next sample (pictures from [Settles 2009]). • Active learning in the crowd: optimally select the next sample and decide whom to assign it to; assign uncertain samples to good annotators, and assign certain samples to test annotators' reliability. • Exploitation-exploration tradeoff: (a) exploit labels for uncertain samples from annotators with known reliability; (b) explore annotators' reliability to discover good annotators (a sketch of one possible selection score follows below). • Computational challenge: online learning algorithms.
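One possible instantiation of such a selection score is sketched below. It is illustrative only (the talk derives its own criterion): it combines the model's uncertainty about a pair with the remaining uncertainty about an annotator's reliability.

    import math

    def selection_score(mu_i, var_i, mu_j, var_j, alpha_k, beta_k, c=1.0):
        # Illustrative explore-exploit score for assigning pair (i, j) to annotator k.
        # mu/var are Gaussian posteriors over object scores; (alpha_k, beta_k) is the
        # Beta posterior over the annotator's reliability; c weights exploration.

        # Exploitation: entropy of the predicted comparison outcome
        # (a logistic approximation inflated by the score uncertainty).
        p = 1.0 / (1.0 + math.exp(-(mu_i - mu_j) / math.sqrt(1.0 + var_i + var_j)))
        pair_uncertainty = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

        # Exploration: variance of the Beta posterior over reliability.
        s = alpha_k + beta_k
        annotator_uncertainty = (alpha_k * beta_k) / (s * s * (s + 1.0))

        return pair_uncertainty + c * annotator_uncertainty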

15 Online Learning: Bayesian Modeling • Assign priors. • Bayesian inference: likelihood, prior, posterior, approximation. • Two approximations: (1) independence across factors; (2) Gaussian + Beta distributions.
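Spelled out (a sketch of the modeling choices named on the slide; the exact parameterization is an assumption), the priors and the per-comparison update are

    s_i \sim \mathcal{N}(\mu_i, \sigma_i^2), \qquad \eta_k \sim \mathrm{Beta}(\alpha_k, \beta_k),

    p(s_i, s_j, \eta_k \mid i \succ_k j) \;\propto\; P_k(i \succ j)\,
    \mathcal{N}(s_i; \mu_i, \sigma_i^2)\, \mathcal{N}(s_j; \mu_j, \sigma_j^2)\,
    \mathrm{Beta}(\eta_k; \alpha_k, \beta_k),

and the updated posterior is again approximated by independent Gaussian and Beta factors whose parameters are set by matching first and second moments.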

16 Active Learning

17 Active Learning: Simulated Study (plot contrasting too much exploration with too little exploration)

18 Area under the active learning curve

19 Bayesian Inference • Many choices: MCMC, variational inference, expectation propagation. • How do we get constant-time inference for each pair? Moment matching! • How do we estimate the required first- and second-order moments? Using the technique from [Weng et al., 2011] (Stein's Lemma). • Result: a constant-time update!
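The closed-form updates in the talk come from Stein's lemma; as a stand-in that shows the same idea, the sketch below does the moment matching numerically with a fixed sample budget, so each update still has constant cost. It is an illustration, not the talk's derivation:

    import math, random

    random.seed(1)

    def crowd_bt_likelihood(si, sj, eta):
        # P(annotator reports i > j) under the reliability-weighted Bradley-Terry model.
        p = 1.0 / (1.0 + math.exp(sj - si))
        return eta * p + (1.0 - eta) * (1.0 - p)

    def moment_match_update(mu_i, var_i, mu_j, var_j, alpha, beta, n=20000):
        # Observe "annotator k says i > j"; refit independent Gaussian/Beta factors
        # by matching the first two posterior moments (Monte Carlo).
        w_sum = m1_i = m2_i = m1_j = m2_j = m1_e = m2_e = 0.0
        for _ in range(n):
            si = random.gauss(mu_i, math.sqrt(var_i))
            sj = random.gauss(mu_j, math.sqrt(var_j))
            eta = random.betavariate(alpha, beta)
            w = crowd_bt_likelihood(si, sj, eta)   # prior samples weighted by the likelihood
            w_sum += w
            m1_i += w * si; m2_i += w * si * si
            m1_j += w * sj; m2_j += w * sj * sj
            m1_e += w * eta; m2_e += w * eta * eta
        mu_i_new, mu_j_new, e_mean = m1_i / w_sum, m1_j / w_sum, m1_e / w_sum
        var_i_new = m2_i / w_sum - mu_i_new ** 2
        var_j_new = m2_j / w_sum - mu_j_new ** 2
        e_var = m2_e / w_sum - e_mean ** 2
        common = e_mean * (1.0 - e_mean) / e_var - 1.0   # Beta moment matching
        return mu_i_new, var_i_new, mu_j_new, var_j_new, e_mean * common, (1.0 - e_mean) * common

    print(moment_match_update(0.0, 1.0, 0.0, 1.0, 4.0, 1.0))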

20 Stein's Lemma [Woodroofe, 1989; Weng et al., 2011]
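For X \sim \mathcal{N}(\mu, \sigma^2) and a differentiable h with \mathbb{E}\,|h'(X)| < \infty, Stein's lemma states

    \mathbb{E}\!\left[ h(X)\,(X - \mu) \right] \;=\; \sigma^2\, \mathbb{E}\!\left[ h'(X) \right],

which is what lets the first and second posterior moments in the preceding update be written in closed form.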

21 Bayesian Inference • Update behavior by annotator type: perfect annotator, random annotator, malicious annotator; random vs. perfect-or-malicious cases.

22 Reading Difficulty Dataset • Dataset: 491 articles with difficulty levels 1 to 12; 624 annotators; 12,728 comparisons (provided by Kevyn Collins-Thompson). • Comparisons needed to reach a given ratio of the best performance (0.6843), across four selection strategies (one of them random): 98%: 1850 / 1400 / 3650 / 7250; 95%: 700 / 850 / 2450 / 5350; 90%: 400 / 450 / 850 / 2150.

23 Reading Difficulty Dataset • Dataset: 491 articles with difficulty levels 1 to 12; 624 annotators; 12,728 comparisons (provided by Kevyn Collins-Thompson).

24 Roadmap • Modeling Annotators' Reliability for Probabilistic Ranking: (1) extend the Bradley-Terry model to incorporate annotators' reliability; (2) distinguish different types of annotators and automatically recover from errors introduced by malicious ones; (3) average reliability vs. initialization strategy. • Active Learning with an Exploitation-Exploration Tradeoff: (1) active learning in the crowd requires selecting both the sample and the annotator; (2) exploitation-exploration tradeoff; (3) efficient online Bayesian updates. • Ranking in a Multitask Setting: (1) explore the commonality of annotators across tasks; (2) utilize task features in modeling annotators' reliability.

25 Conclusions and Future Work • Probabilistic ranking in a crowdsourced setting: extended the classical Bradley-Terry model to model the reliability of annotators; the same techniques can be applied to other models (e.g., Thurstone). • Saving cost: active learning in the crowd with an explicitly modeled exploitation-exploration tradeoff, supported by efficient online learning. • Future work: active learning in batch mode; value of information: how much effort should be spent testing the performance of annotators?

