Crowd Algorithms Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom Stanford and UC.

Slide 1: Crowd Algorithms
Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom
Stanford and UC Santa Cruz
Scoop: The Stanford-Santa Cruz Project for Cooperative Computing with Algorithms, Data, and People

Slide 2: The Goal
Design fundamental algorithms for human computation.
Factors: latency, cost, uncertainty.
- Which questions do I ask?
- When do I ask the questions?
- When do I stop?
- How do I combine the answers?

Slide 3: The Problems
Crowd-: Sort / Max, GraphSearch, Categorize, Filter
Latency, cost, uncertainty: difficult!
[Diagram labels: "Progress! [VLDB 2011]", "The focus of this talk", "Summaries of the rest"]

Slide 4: Filters
[Diagram: Dataset of Items -> Predicate 1 -> Predicate 2 -> ... -> Predicate k -> Filtered Dataset]
Example predicates: Is this image that of Bytes Café? Is the image blurry? Does it show people's faces?
Given:
- Error probability (FP/FN) and selectivity for each predicate
- Desired overall error probability
To: compose a filtering strategy
- Minimize overall cost (number of questions)
Which questions do I ask? When do I ask the questions? When do I stop? How do I combine the answers?

Slide 5: Single Filter
[Diagram: Dataset of Items -> Predicate 1 -> Filtered Dataset]
- Surprisingly difficult!
- Need to meet an overall error threshold
  - Say, up to 10% of my images may be wrongly filtered
- Minimize the overall expected number of questions
- Boils down to the following:
  - Take one item
  - Ask some questions, which results in a certain number of (Y, N) answers for that item
  - Do I stop (and if so, what do I return), or do I continue asking?
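One way to make the "do I stop?" question concrete is to compute how likely it is that the item truly satisfies the predicate after seeing some YES and NO answers. The sketch below is an illustration, not the method from the talk: it assumes independent answers, and the names `posterior_pass`, `selectivity`, `fp`, and `fn` are hypothetical.

```python
def posterior_pass(yes, no, selectivity, fp, fn):
    """P(item truly satisfies the predicate | yes YES answers, no NO answers).

    Assumptions (not from the slides): answers are independent;
    selectivity = prior probability that an item satisfies the predicate;
    fp = P(worker says YES | item does not satisfy it);
    fn = P(worker says NO  | item satisfies it).
    """
    # Joint probability of the observed answers and "item satisfies predicate".
    like_true = selectivity * (1 - fn) ** yes * fn ** no
    # Joint probability of the observed answers and "item does not satisfy it".
    like_false = (1 - selectivity) * fp ** yes * (1 - fp) ** no
    return like_true / (like_true + like_false)


if __name__ == "__main__":
    # Three YES and one NO, with 10% error rates and a 50% prior.
    print(posterior_pass(3, 1, selectivity=0.5, fp=0.1, fn=0.1))
```

A stopping rule could compare this posterior against a threshold, but a fixed per-item threshold is exactly the "same error per item" guarantee that the talk contrasts with an aggregate error target.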

Slide 6: Hasn't this been done before?
- Solutions from statistics guarantee the same error per item
  - Important in contexts like automobile testing and diagnosis
- We're worried about aggregate error over all items: a uniquely data-oriented problem
  - I don't care if every image is perfect as long as the overall error threshold is met.
  - As we will see, this results in $$$ savings

Slide 7: Strategies
[Grid diagram: axes count YES answers and NO answers; start at the origin, with no questions asked]
Example points:
- YES = 5, NO = 6: return "Passed"
- YES = 3, NO = 7: return "Failed"
- YES = 3, NO = 5: continue
Reformulated task: for each point in the grid, return Pass/Fail/Continue. Equivalently, find the best shape and color it!
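The grid view suggests a simple way to score any deterministic strategy: walk the grid level by level, track the probability of reaching each point, and add up cost and error at the points where the strategy stops. The sketch below is a hedged illustration under assumed conventions (independent answers, hypothetical names `evaluate`, `selectivity`, `fp`, `fn`); it is not the optimal algorithm from the talk.

```python
PASS, FAIL, CONT = "Pass", "Fail", "Continue"


def evaluate(strategy, max_questions, selectivity, fp, fn):
    """Return (expected number of questions, expected error) for one item.

    strategy(yes, no) -> PASS | FAIL | CONT, and must stop within max_questions.
    Assumed model: fp = P(YES | item is false), fn = P(NO | item is true).
    """
    # reach[(yes, no)] = (P(reach point, item true), P(reach point, item false))
    reach = {(0, 0): (selectivity, 1.0 - selectivity)}
    cost = error = 0.0
    for n in range(max_questions + 1):
        for yes in range(n + 1):
            no = n - yes
            p_true, p_false = reach.get((yes, no), (0.0, 0.0))
            if p_true + p_false == 0.0:
                continue  # point is never reached under this strategy
            decision = strategy(yes, no)
            if decision == CONT:
                if n == max_questions:
                    raise ValueError("strategy does not stop within max_questions")
                # One more answer moves to (yes + 1, no) or (yes, no + 1).
                for point, q_true, q_false in (
                    ((yes + 1, no), 1 - fn, fp),
                    ((yes, no + 1), fn, 1 - fp),
                ):
                    a, b = reach.get(point, (0.0, 0.0))
                    reach[point] = (a + p_true * q_true, b + p_false * q_false)
            else:
                cost += n * (p_true + p_false)
                # Passing a false item or failing a true item is an error.
                error += p_false if decision == PASS else p_true
    return cost, error


def majority_of_3(yes, no):
    """Example strategy: ask exactly three questions, return the majority."""
    if yes + no < 3:
        return CONT
    return PASS if yes > no else FAIL


if __name__ == "__main__":
    print(evaluate(majority_of_3, 3, selectivity=0.5, fp=0.1, fn=0.1))
```

Because cost and error are sums over stopping points, two different "shapes" can be compared directly by calling `evaluate` on each with the same parameters.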

Slide 8: Common Strategies
- Always ask X questions, return the most likely answer
  - The triangle shape
- If you get X YES, return "Pass"; if you get Y NO, return "Fail"; else keep asking
  - Rectangular shape
- Ask until |#YES - #NO| > X, or at most Y questions
  - Chopped-off rectangle
  - Anhai's work on MOBS
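The three shapes above can be written as small decision functions over a grid point (yes, no). This is a minimal sketch with hypothetical function names; it uses the majority answer as a stand-in for "most likely answer", which only coincides with it under symmetric error rates and an even prior, and it breaks ties toward "Fail" as an arbitrary choice.

```python
def fixed_count(x):
    """Triangle: always ask x questions, then return the majority answer."""
    def strategy(yes, no):
        if yes + no < x:
            return "Continue"
        return "Pass" if yes > no else "Fail"
    return strategy


def first_to(x, y):
    """Rectangle: Pass on x YES answers, Fail on y NO answers, else keep asking."""
    def strategy(yes, no):
        if yes >= x:
            return "Pass"
        if no >= y:
            return "Fail"
        return "Continue"
    return strategy


def lead_or_cap(x, y):
    """Chopped-off rectangle: stop once |yes - no| > x, or after y questions."""
    def strategy(yes, no):
        if abs(yes - no) > x or yes + no >= y:
            return "Pass" if yes > no else "Fail"
        return "Continue"
    return strategy


if __name__ == "__main__":
    for name, s in [("fixed_count(3)", fixed_count(3)),
                    ("first_to(2, 3)", first_to(2, 3)),
                    ("lead_or_cap(1, 5)", lead_or_cap(1, 5))]:
        print(name, s(2, 1))
```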

Slide 9: Summary of Results
- A characterization of which "shapes" are optimal
- An optimal PTIME "probabilistic" approach
  - LP leveraging the inherent DP structure
  - Optimal: the strategy with minimum overall cost for the given parameters and requirements
  - Probabilistic: a probability of "Pass", "Fail", or "Continue" at each point

Slide 10: Empirical Results
- Evaluation on 10000 synthetic scenarios
- Tested: Optimal, Brute Force, Statistical, and 5 heuristic algorithms
- Optimal Probabilistic issues fewer questions overall
  - 15% savings on average compared to brute force; 32% savings when optimal wins
  - 22% savings on average compared to the statistics approach; 49% savings when optimal wins
- Translates to $$$ for many items!
[Diagram: Generate Parameters -> Optimal Probabilistic, Brute Force Deterministic, Other Algorithms -> compare costs (COST1, COST2, COST3)]

Slide 11: Crowd-Max/Sort
- The problem(s):
  - Find a strategy for sorting n items
    - Given: probability of error for a comparison
    - Given: desired threshold on error, number of questions, number of rounds
- Sorting automatically given evidence
  - NP-hard even for a simple probability-of-error model
  - Related work in the areas of voting theory and economics
- Which r questions do we ask next?
[Diagram: strategies ordered along "Decreasing Parallelism" and "More Accuracy"]
- Ask all pairs a total of 2k/n times
- Tournament, with k repetitions at each level
- One question in each round
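The "tournament, with k repetitions at each level" strategy can be simulated in a few lines. The sketch below is an assumption-laden illustration rather than the talk's algorithm: workers are modeled by a hypothetical `noisy_compare` that errs independently with a fixed probability, each match is decided by a majority of k answers (ties go to the second item), and items are assumed distinct and non-empty.

```python
import random


def noisy_compare(a, b, error, rng):
    """Simulated worker: returns the larger item, but errs with probability `error`."""
    correct, wrong = (a, b) if a > b else (b, a)
    return wrong if rng.random() < error else correct


def tournament_max(items, k, error, rng):
    """Single-elimination tournament; each match uses k noisy comparisons.

    Returns (winner, total number of questions asked).
    """
    questions = 0
    round_items = list(items)
    while len(round_items) > 1:
        winners = []
        # All matches in one level are independent, so they could run in parallel.
        for i in range(0, len(round_items) - 1, 2):
            a, b = round_items[i], round_items[i + 1]
            votes_a = sum(noisy_compare(a, b, error, rng) == a for _ in range(k))
            questions += k
            winners.append(a if 2 * votes_a > k else b)
        if len(round_items) % 2:  # an unpaired item advances without a match
            winners.append(round_items[-1])
        round_items = winners
    return round_items[0], questions


if __name__ == "__main__":
    rng = random.Random(0)
    print(tournament_max(range(8), k=3, error=0.2, rng=rng))
```

Raising k spends more questions per match in exchange for fewer wrong eliminations, which is one way to see the accuracy versus cost trade-off the slide points at.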

Slide 12: Crowd-GraphSearch
Image categorization example
[Taxonomy diagram: vehicle > car > {nissan, honda, toyota}; nissan > {maxima, sentra}]
To attach: image of a honda car
- Is image one of vehicle? YES!
- Is image one of toyota? NO!
- Is image one of honda? YES!
Target node = intended category.
"Is the image one of X?" = "Is the target node reachable from X?"
Find the target node by asking the minimum number of search questions.
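The reachability framing can be illustrated with a naive top-down search over the example taxonomy. This sketch is not the minimum-question algorithm the slide asks for: it simply descends into the first child that answers YES, assumes error-free answers, and assumes the hierarchy is a tree. The `TAXONOMY` layout (in particular maxima and sentra under nissan) and the names `reachable` and `top_down_search` are assumptions made for the example.

```python
# Assumed reading of the slide's taxonomy diagram: parent -> children.
TAXONOMY = {
    "vehicle": ["car"],
    "car": ["nissan", "honda", "toyota"],
    "nissan": ["maxima", "sentra"],
    "honda": [],
    "toyota": [],
    "maxima": [],
    "sentra": [],
}


def reachable(node, target):
    """True if target is node itself or lies somewhere below node."""
    return node == target or any(reachable(c, target) for c in TAXONOMY[node])


def top_down_search(root, ask):
    """Locate the target by asking ask(node) = 'is the target reachable from node?'.

    Returns (target node or None, list of nodes asked about, in order).
    """
    asked = [root]
    if not ask(root):
        return None, asked
    current = root
    while True:
        for child in TAXONOMY[current]:
            asked.append(child)
            if ask(child):
                current = child  # the target lies in this subtree
                break
        else:
            # No child contains the target, so the current node is the target.
            return current, asked


if __name__ == "__main__":
    # Simulate a perfect crowd for an image of a honda car.
    found, asked = top_down_search("vehicle", lambda x: reachable(x, "honda"))
    print(found, asked)
```

A smarter strategy would pick questions that split the remaining candidates more evenly instead of scanning children in order, which is where the optimization problem on the slide comes in.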

Slide 13: Crowd-Categorize
[Diagram: Dataset of Items]
- k buckets, n items
- Categorize every item, overall error < threshold
- For k = 1, same as the filters problem
- Two versions:
  - Discrete
    - Independent (like in the filters case)
    - Dependent buckets (e.g., colors, GraphSearch)
  - Continuous (e.g., age)

Slide 14: Questions?
