David Karger, Sewoong Oh, Devavrat Shah (MIT + UIUC)


o A patient is asked: rate your pain on a scale of 1-10
  o Medical student gets answer: 5
  o Intern gets answer: 8
  o Fellow gets answer: 4.5
  o Doctor gets answer: 6
o So what is the “right” amount of pain?
o Crowd-sourcing
  o Pain of patient = task
  o Answer of patient = completion of task by a worker

o Goal: reliably estimate the tasks at minimal cost
o Key operational questions:
  o Task assignment
  o Inferring the “answers”

o N tasks
  o Denote by t_1, t_2, …, t_N the “true” values, each in {1, …, K}
o M workers
  o Denote by w_1, w_2, …, w_M the workers, each with a “confusion” matrix
  o Worker j: confusion matrix P^j = [P^j_kl]
  o Worker j’s answer is l for a task with value k with probability P^j_kl
o Binary symmetric case
  o K = 2: each task takes value +1 or -1
  o Worker j gives the correct answer with probability p_j
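The binary symmetric case above can be simulated in a few lines. This is a minimal sketch, not taken from the slides: the function name `simulate_answers`, the 0-based indexing, and the example reliabilities are all illustrative assumptions.

```python
import random

def simulate_answers(true_values, reliabilities, edges, rng):
    """Return {(i, j): answer}; worker j reports t_i with probability p_j, else -t_i."""
    answers = {}
    for i, j in edges:
        correct = rng.random() < reliabilities[j]
        answers[(i, j)] = true_values[i] if correct else -true_values[i]
    return answers

rng = random.Random(0)
true_values = [rng.choice([-1, 1]) for _ in range(5)]   # t_1..t_N (0-indexed here)
reliabilities = [0.9, 0.6, 0.5]                          # p_1..p_M, made-up values
edges = [(i, j) for i in range(5) for j in range(3)]     # every worker sees every task
answers = simulate_answers(true_values, reliabilities, edges, rng)
print(answers)
```

A worker with p_j = 0.5 carries no information about the task, while p_j close to 1 (or close to 0) is highly informative.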

[Figure: bipartite assignment graph with task nodes t_1, t_2, …, t_N, worker nodes w_1, w_2, …, w_M, and answers A_ij on the edges]
o Binary tasks:
o Worker reliability:
o Necessary assumption: we know

o Goal: given N tasks
  o To obtain the answers correctly with probability at least 1-ε
o What is the minimal number of questions (edges) needed?
o How to assign them, and how to infer the task values?

o Task assignment graph
  o Random regular graph
  o Or, regular graph with large girth
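One common way to draw a random regular bipartite assignment is the configuration model: give each task l half-edges and each worker r half-edges, then match them at random. The sketch below assumes that construction; the slides do not say how the graph is sampled, and the function name and parameters are hypothetical.

```python
import random

def random_regular_bipartite(n_tasks, n_workers, l, r, rng):
    """Configuration model: l half-edges per task, r per worker, matched at random.

    The result may contain repeated (task, worker) pairs; a caller can
    resample or deduplicate if a simple graph is required.
    """
    if n_tasks * l != n_workers * r:
        raise ValueError("need n_tasks * l == n_workers * r")
    task_stubs = [i for i in range(n_tasks) for _ in range(l)]
    worker_stubs = [j for j in range(n_workers) for _ in range(r)]
    rng.shuffle(worker_stubs)
    return list(zip(task_stubs, worker_stubs))

edges = random_regular_bipartite(6, 4, 2, 3, random.Random(1))
print(edges)
```

Every task appears in exactly l pairs and every worker in exactly r pairs, so the per-task budget is fixed at l.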

o Majority:
o Oracle:
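The formulas for these two baselines did not survive in the transcript. As a hedged illustration, the sketch below assumes the usual forms: majority voting takes the sign of the sum of answers, and the oracle, which knows each p_j, takes a vote weighted by the log-odds log(p_j / (1 - p_j)). Function names are hypothetical.

```python
import math

def majority_vote(answers, n_tasks):
    """Estimate each task as the sign of the sum of its answers (ties go to +1)."""
    totals = [0.0] * n_tasks
    for (i, _j), a in answers.items():
        totals[i] += a
    return [1 if s >= 0 else -1 for s in totals]

def oracle_vote(answers, n_tasks, reliabilities):
    """Weighted vote with log-odds weights; assumes 0 < p_j < 1 is known exactly."""
    totals = [0.0] * n_tasks
    for (i, j), a in answers.items():
        p = reliabilities[j]
        totals[i] += a * math.log(p / (1.0 - p))
    return [1 if s >= 0 else -1 for s in totals]

# One task: a reliable worker says +1, two near-random workers say -1.
answers = {(0, 0): 1, (0, 1): -1, (0, 2): -1}
reliabilities = [0.95, 0.55, 0.55]
print(majority_vote(answers, 1))               # [-1]
print(oracle_vote(answers, 1, reliabilities))  # [1]
```

The example shows why reliability matters: the majority follows the two weak workers, while the oracle trusts the single strong one.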

o Majority:
o Oracle:
o Our Approach:

o Iteratively learn
o Message-passing
  o O(# edges) operations
o Approximation of Maximum Likelihood
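The update rules are not shown in the transcript, so the following is only a sketch of one plausible message-passing scheme of this kind. It assumes task-to-worker messages x and worker-to-task messages y, each computed as a sum over the node's other edges, with y initialised from a Gaussian with mean 1. All names are hypothetical. Each iteration touches every edge a bounded number of times per neighbour, which is in line with the O(# edges) claim for bounded degrees.

```python
import random

def iterative_inference(answers, n_tasks, n_iters, rng):
    """answers: {(task i, worker j): +1 or -1}. Returns a list of +1/-1 estimates."""
    by_task, by_worker = {}, {}
    for (i, j) in answers:
        by_task.setdefault(i, []).append(j)
        by_worker.setdefault(j, []).append(i)
    # Worker-to-task messages, started near 1 ("workers are probably reliable").
    y = {e: rng.gauss(1.0, 1.0) for e in answers}
    for _ in range(n_iters):
        # Task-to-worker: combine what the task's OTHER workers say.
        x = {(i, j): sum(answers[(i, k)] * y[(i, k)] for k in by_task[i] if k != j)
             for (i, j) in answers}
        # Worker-to-task: score the worker by agreement on its OTHER tasks.
        y = {(i, j): sum(answers[(k, j)] * x[(k, j)] for k in by_worker[j] if k != i)
             for (i, j) in answers}
    estimates = []
    for i in range(n_tasks):
        s = sum(answers[(i, j)] * y[(i, j)] for j in by_task.get(i, []))
        estimates.append(1 if s >= 0 else -1)
    return estimates

truth = [1, -1, 1, -1]
# Workers 0 and 1 always answer correctly; worker 2 always answers incorrectly.
answers = {(i, j): (truth[i] if j < 2 else -truth[i]) for i in range(4) for j in range(3)}
print(iterative_inference(answers, 4, 2, random.Random(0)))
```

Messages are left unnormalised here, so their magnitude grows with the number of iterations; only their signs are used for the final estimate.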

o Theorem (Karger-Oh-Shah).
  o Let n tasks be assigned to n workers as per an (l,l) random regular graph
  o Let ql > √2
  o Then, for all n large enough (i.e. n = Ω(l^{O(log(1/q))} e^{lq})), after O(log(1/q)) iterations of the algorithm
o q: crowd quality

o To achieve target P_error ≤ ε, we need
  o Per-task budget l = Θ((1/q) log(1/ε))
  o And this is minimax optimal
o Under majority voting (with any graph choice)
  o Per-task budget required is l = Ω((1/q^2) log(1/ε))
o No significant gain from knowing side information (golden questions, reputation, …)!
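To get a feel for the gap between the two scalings, the sketch below evaluates both expressions with the hidden constants set to 1. That choice is an assumption made purely for illustration, since Θ and Ω do not specify constants; the function names are hypothetical.

```python
import math

def budget_iterative(q, eps, c=1.0):
    """Per-task budget under the (1/q) log(1/eps) scaling; c is an unknown constant."""
    return c * math.log(1.0 / eps) / q

def budget_majority(q, eps, c=1.0):
    """Per-task budget under the (1/q^2) log(1/eps) scaling; c is an unknown constant."""
    return c * math.log(1.0 / eps) / q ** 2

q, eps = 0.1, 0.01
print(budget_iterative(q, eps))   # about 46
print(budget_majority(q, eps))    # about 460
print(budget_majority(q, eps) / budget_iterative(q, eps))  # ratio 1/q, about 10
```

With equal constants the ratio is 1/q, so the gap widens as the crowd quality q gets smaller.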

Theorem (Karger-Oh-Shah). Given any adaptive algorithm, let Δ be the average number of workers required per task to achieve the desired P_error ≤ ε. Then there exists {p_j} with quality q so that
o Gain through adaptivity is limited

Theorem (Karger-Oh-Shah). To achieve reliability 1-ε, the per-task redundancy scales as (K/q)(log(1/ε) + log K).
o Through reducing the K-ary problem to K binary problems (and dealing with a few asymmetries)

o Learning similarities
o Recommendations
o Searching, …

o Crowd-sourcing
  o Regular graph + message passing
  o Useful for designing surveys/taking polls
o Algorithmically
  o Iterative algorithm is like power-iteration
o Beyond stand-alone tasks
  o Learning global structure, e.g. ranking
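To illustrate the power-iteration remark, here is a minimal sketch of a related spectral estimator: run power iteration on A Aᵀ, where A is the task-by-worker answer matrix with 0 for unassigned pairs, and read task estimates off the signs of the resulting vector. This is an analogy under stated assumptions, not the algorithm from the talk; the function name and the all-ones starting vector are illustrative choices.

```python
def spectral_estimate(A, n_iters=50):
    """A: N x M list of lists with entries +1, -1, or 0 (no answer)."""
    n, m = len(A), len(A[0])
    u = [1.0] * n   # starting vector; it also fixes the global sign of the result
    for _ in range(n_iters):
        v = [sum(A[i][j] * u[i] for i in range(n)) for j in range(m)]  # v = A^T u
        u = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]  # u = A v
        norm = sum(x * x for x in u) ** 0.5
        if norm == 0.0:   # start vector was orthogonal to the signal; give up
            break
        u = [x / norm for x in u]
    return [1 if x >= 0 else -1 for x in u]

truth = [1, 1, -1]
A = [[t, t, t] for t in truth]   # three workers who always answer correctly
print(spectral_estimate(A))      # [1, 1, -1]
```

A singular vector is only defined up to sign, so in general such an estimator recovers the task values up to a global flip unless extra information (for example, that most workers are better than random) fixes the sign.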