CROWD CENTRALITY
David Karger, Sewoong Oh, Devavrat Shah (MIT and UIUC)

Presentation transcript:

CROWD CENTRALITY
David Karger, Sewoong Oh, Devavrat Shah (MIT and UIUC)

CROWDSOURCING

$30 million to land on the moon vs. $0.05 for image labeling. Data entry, transcription.

MICRO-TASK CROWDSOURCING

MICRO-TASK CROWDSOURCING
Which door is the women's restroom? Right / Left

Find cancerous tumor cells
Undergrad intern: 200 images/hr, cost $15/hr, reliability 90%
MTurk (single label): 4000 images/hr, cost $15/hr, reliability 65%
MTurk (multiple labels): 500 images/hr, cost $15/hr, reliability 90%

THE PROBLEM
Goal: reliably estimate the task answers with minimal cost.
Operational questions: task assignment; inferring the "answers".

TASK ASSIGNMENT
Random (l, r)-regular bipartite graphs between tasks and batches of workers.
Locally tree-like → sharp analysis.
Good expander → high signal-to-noise ratio.
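A minimal sketch of one way to generate such an assignment via a configuration-model pairing of half-edges; the function name and parameters are illustrative assumptions, and the pairing can occasionally produce repeated (task, worker) pairs, which a real deployment would re-draw:

```python
import random

def random_regular_assignment(n_tasks, n_workers, l, r, seed=0):
    """Pair task 'half-edges' with worker 'half-edges' uniformly at random.

    Each task gets degree l and each worker degree r; this requires
    n_tasks * l == n_workers * r. The result approximates the random
    (l, r)-regular bipartite assignment graph described on the slide.
    """
    assert n_tasks * l == n_workers * r, "degrees must balance"
    rng = random.Random(seed)
    task_stubs = [i for i in range(n_tasks) for _ in range(l)]
    worker_stubs = [j for j in range(n_workers) for _ in range(r)]
    rng.shuffle(worker_stubs)
    # Edge (i, j) means worker j is asked to answer task i.
    return list(zip(task_stubs, worker_stubs))

edges = random_regular_assignment(n_tasks=1000, n_workers=1000, l=5, r=5)
```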

MODELING THE CROWD
Binary tasks: t_i ∈ {+1, −1}.
Worker reliability: p_j = P(worker j answers correctly); observed answers A_ij ∈ {+1, −1}.
Necessary assumption: we know the crowd is, on average, better than random (E[2p_j − 1] > 0).

INFERENCE PROBLEM
Majority voting vs. an oracle that knows the worker reliabilities. (Figure: a task t_i answered by workers with reliabilities p_1, …, p_5.)

INFERENCE PROBLEM
Majority voting, the oracle, and our approach. (Figure: task t_i, worker reliabilities p_1, …, p_5.)

PREVIEW OF RESULTS
Distribution of {p_j}: observed to follow a Beta distribution (Holmes '10; Raykar et al. '10).
EM algorithm: Dawid & Skene '79; Sheng, Provost & Ipeirotis '10.

PREVIEW OF RESULTS

ITERATIVE INFERENCE
Iteratively learn worker reliabilities and task answers.
Message passing: O(# edges) operations per iteration.
Approximate MAP.
(Figure: worker reliabilities p_1, …, p_5.)
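A minimal sketch of the kind of message-passing iteration described here, following the standard iterative algorithm of Karger, Oh, and Shah; the variable names and the random initialization are illustrative assumptions of this sketch:

```python
import random
from collections import defaultdict

def iterative_inference(edges, answers, n_iters=20, seed=0):
    """Estimate binary task answers from worker responses by message passing.

    edges   : list of (task i, worker j) pairs from the assignment graph
    answers : dict mapping (i, j) -> A_ij in {+1, -1}
    Returns a dict mapping each task i to its estimate in {+1, -1}.
    """
    rng = random.Random(seed)
    # Worker-to-task messages y[(j, i)], initialized with random noise.
    y = {(j, i): rng.gauss(1.0, 1.0) for (i, j) in edges}
    tasks = defaultdict(list)    # task i -> its workers
    workers = defaultdict(list)  # worker j -> its tasks
    for i, j in edges:
        tasks[i].append(j)
        workers[j].append(i)
    for _ in range(n_iters):
        # Task-to-worker messages: how strongly task i's *other* workers
        # agree, weighted by their current reliability estimates.
        x = {(i, j): sum(answers[(i, k)] * y[(k, i)] for k in tasks[i] if k != j)
             for (i, j) in edges}
        # Worker-to-task messages: how consistent worker j is with the
        # current estimates of its *other* tasks.
        y = {(j, i): sum(answers[(m, j)] * x[(m, j)] for m in workers[j] if m != i)
             for (i, j) in edges}
    # Final decision: sign of the reliability-weighted vote.
    return {i: 1 if sum(answers[(i, j)] * y[(j, i)] for j in tasks[i]) >= 0 else -1
            for i in tasks}
```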

EXPERIMENTS: AMAZON MTURK
Learning similarities, recommendations, searching, …

EXPERIMENTS: AMAZON MTURK

TASK ASSIGNMENT: WHY A RANDOM GRAPH?

KEY METRIC: QUALITY OF CROWD
Crowd quality parameter: q = E[(2p_j − 1)²].
Theorem (Karger-Oh-Shah). Let n tasks be assigned to n workers as per an (l, l) random regular graph, and let lq > √2. Then, for all n large enough (i.e., n = Ω(l^O(log(1/q)) e^(lq))), after O(log(1/q)) iterations of the algorithm, P_error ≤ exp(−lq/16).
If p_j = 1 for all j, then q = 1; if p_j = 0.5 for all j, then q = 0.
Note that q differs from μ² = (E[2p − 1])²; in general q ≤ μ ≤ √q.
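A quick worked comparison, using illustrative numbers of my own rather than figures from the talk, shows why q and μ² capture different things:

```latex
% Crowd A: every worker has p_j = 0.75.
\mu_A = \mathbb{E}[2p-1] = 0.5, \qquad q_A = \mathbb{E}[(2p-1)^2] = 0.25 .
% Crowd B ("spammer--hammer"): half the workers have p_j = 1, half have p_j = 0.5.
\mu_B = \tfrac{1}{2}(1) + \tfrac{1}{2}(0) = 0.5, \qquad
q_B = \tfrac{1}{2}(1)^2 + \tfrac{1}{2}(0)^2 = 0.5 .
```

Both crowds have the same μ (hence the same μ²), but crowd B has twice the quality q, so under the per-task budget scaling l = Θ((1/q) log(1/ε)) it needs roughly half as many workers per task for the same target error.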

HOW GOOD IS THIS?
To achieve a target P_error ≤ ε, we need per-task budget l = Θ((1/q) log(1/ε)), and this is minimax optimal.
Under majority voting (with any graph choice), the per-task budget required is l = Ω((1/q²) log(1/ε)).
No significant gain from knowing side information (golden questions, reputation, …)!
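As a rough numeric illustration of this gap (the constant c and the function names below are assumptions purely for illustration; the theorem only pins down the orders of growth):

```python
import math

def budget_iterative(q, eps, c=1.0):
    """Workers per task for the iterative algorithm: Theta((1/q) * log(1/eps))."""
    return c / q * math.log(1 / eps)

def budget_majority(q, eps, c=1.0):
    """Workers per task for majority voting: Omega((1/q**2) * log(1/eps))."""
    return c / q ** 2 * math.log(1 / eps)

# With crowd quality q = 0.3 and target error 5%, majority voting needs
# about 1/q (roughly 3.3x) more workers per task than the iterative algorithm.
print(budget_iterative(0.3, 0.05), budget_majority(0.3, 0.05))
```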

ADAPTIVE TASK ASSIGNMENT: DOES IT HELP?
Theorem (Karger-Oh-Shah). Given any adaptive algorithm, let Δ be the average number of workers required per task to achieve the desired P_error ≤ ε. Then there exists {p_j} with quality q for which the gain through adaptivity is limited.

WHICH CROWD TO EMPLOY?

BEYOND BINARY TASKS
Tasks: t_i ∈ {1, …, K}. Workers: answers A_ij ∈ {1, …, K}.
Assume p_j ≥ 0.5 for all j, and let q be the quality of {p_j}.
Results for binary tasks extend to this setting: to achieve P_error ≤ ε, the number of workers required per task scales as O((1/q) log(1/ε) + (1/q) log K).

BEYOND BINARY TASKS
Convert to K−1 binary problems, each with quality ≥ q.
For each x, 1 < x ≤ K:
  A_ij(x) = +1 if A_ij ≥ x, and −1 otherwise
  t_i(x) = +1 if t_i ≥ x, and −1 otherwise
Then the corresponding quality satisfies q(x) ≥ q.
Using the result for the binary problem, P_error(x) ≤ exp(−lq/16).
Therefore P_error ≤ P_error(2) + … + P_error(K) ≤ K exp(−lq/16).
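A minimal sketch of this reduction in code, reusing the iterative_inference sketch above; the final combine step (count how many thresholds a task clears) is my own assumption about how the K−1 binary estimates are merged, not something spelled out on the slide:

```python
def kary_inference(edges, answers, K, n_iters=20):
    """Reduce a K-ary labeling problem to K-1 binary ones and combine.

    edges   : list of (task i, worker j) assignment pairs
    answers : dict mapping (i, j) -> A_ij in {1, ..., K}
    Returns a dict mapping each task i to an estimated label in {1, ..., K}.
    """
    binary_estimates = []
    for x in range(2, K + 1):
        # Binary sub-problem x: "is the true label at least x?"
        answers_x = {e: (+1 if a >= x else -1) for e, a in answers.items()}
        binary_estimates.append(iterative_inference(edges, answers_x, n_iters))
    tasks = {i for i, _ in edges}
    # Combine: a task's label is 1 plus the number of thresholds it clears.
    return {i: 1 + sum(1 for est in binary_estimates if est[i] == +1) for i in tasks}
```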

WHY DOES THE ALGORITHM WORK?
MAP estimation: put a prior on the worker reliabilities {p_j}, with density f(p) over [0, 1], and observe the answers A = [A_ij]. Then run the belief propagation (max-product) algorithm for MAP.
With the Haldane prior, p_j is 0 or 1 with equal probability.
Iteration k+1, for all task-worker pairs (i, j): X_i / Y_j represent the log-likelihood ratio for t_i / p_j being +1 vs −1.
This is exactly the same as our algorithm! And our random task-assignment graph is locally tree-like.
That is, our algorithm is effectively MAP under the Haldane prior.
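A hedged reconstruction of the posterior this slide refers to (the formula itself is not visible in this transcript; the expression below is the standard posterior for the model above, with E denoting the set of assigned task-worker pairs):

```latex
% Joint posterior over task answers t and worker reliabilities p, given answers A:
\Pr(t, p \mid A) \;\propto\; \prod_{j} f(p_j)\;
  \prod_{(i,j) \in E} p_j^{\,\mathbb{1}\{A_{ij} = t_i\}}\, (1 - p_j)^{\,\mathbb{1}\{A_{ij} \neq t_i\}},
\qquad
\hat{t} \;=\; \arg\max_{t}\, \max_{p}\; \Pr(t, p \mid A).
```

Max-product belief propagation on the bipartite assignment graph approximates this MAP estimate, and under the Haldane prior its messages simplify to the linear updates used in the iterative algorithm.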

WHY DOES THE ALGORITHM WORK? (continued)
A minor variation of this algorithm drops the dependence of each message on its recipient:
  T_i^next = T_ij^next = Σ_j' W_ij' A_ij' = Σ_j' W_j' A_ij'
  W_j^next = W_ij^next = Σ_i' T_i'j A_i'j = Σ_i' T_i' A_i'j
Then T^next = A Aᵀ T, so (subject to this modification) our algorithm is computing the left singular vector of A corresponding to the largest singular value.
So why compute a rank-1 approximation of A?

Random graph + probabilistic model:
  E[A_ij] = (t_i p_j − (1 − p_j) t_i) l/n = t_i (2p_j − 1) l/n
  E[A] = t (2p − 1)ᵀ l/n
That is, E[A] is a rank-1 matrix, and t is the left singular vector of E[A].
If A ≈ E[A], then computing the left singular vector of A makes sense.
Building upon Friedman-Kahn-Szemerédi '89, the singular vector of A provides a reasonable approximation: P_error = O(1/(lq)) (Ghosh, Kale, McAfee '12).
For a sharper result, we use belief propagation.
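A minimal sketch of this spectral alternative, assuming the answers are assembled into an n_tasks × n_workers numpy array with zeros in unassigned entries; the function name and iteration count are illustrative:

```python
import numpy as np

def spectral_estimate(A, n_iters=50, seed=0):
    """Estimate task answers from the top left singular vector of A.

    A : (n_tasks x n_workers) array with entries in {+1, -1} on assigned
        pairs and 0 elsewhere.
    Implements the T^next = A A^T T power iteration from the slide.
    """
    rng = np.random.default_rng(seed)
    t = rng.normal(size=A.shape[0])
    for _ in range(n_iters):
        t = A @ (A.T @ t)           # T^next = A A^T T
        t /= np.linalg.norm(t)      # renormalize to avoid overflow
    # The sign pattern of the top left singular vector estimates the tasks
    # (up to a global sign flip, resolved by the overall crowd bias).
    return np.sign(t)
```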

CONCLUDING REMARKS
Budget-optimal micro-task crowdsourcing via a random regular task-allocation graph and belief propagation.
Key messages:
  All that matters is the quality of the crowd.
  Worker reputation is not useful for non-adaptive tasks.
  Adaptation does not help, due to the fleeting nature of workers.
  Reputation + worker identity are needed for adaptation to be effective.
  The inference algorithm can be useful for assigning reputation.
  The model of binary tasks is equivalent to K-ary tasks.

ON THAT NOTE …