CrowdFlow: Integrating Machine Learning with Mechanical Turk for Speed-Cost-Quality Flexibility. Alex Quinn, Ben Bederson, Tom Yeh, Jimmy Lin.



Human Computation
Things HUMANS can do: Translation, Photo tagging, Face recognition, Human detection
Things COMPUTERS can do: Speech recognition, Text analysis, Planning

Human Computation
Things HUMANS can do: Translation, Photo tagging, Face recognition
Things COMPUTERS can do: Speech recognition, Human detection, Text analysis, Planning

Example: Human detection

Trade-off space [Chart: quality against speed and affordability, with points for computers, traditional human workers, and human computation]

Man-Computer Symbiosis [Diagram: two existing approaches, automation with human post-correction and supervised machine learning, each dividing speed, cost, and quality between humans and the computer]

Man-Computer Symbiosis [Diagram: CrowdFlow shown alongside automation with human post-correction and supervised machine learning; each approach divides speed, cost, and quality differently between humans and the computer]

Mechanical Turk

Human Detection – Starting point

Human Detection – Task

Human Detection – Results [Chart: quality from 60% to 90% against speed and affordability] 119 images took 3 hrs 50 mins and cost $2.38.

Human Detection – Scenarios [Chart: quality from 60% to 90% against speed and affordability] 1000 photos at 72% accuracy would take 12 hrs 20 mins and cost $ . 119 images took 3 hrs 50 mins and cost $2.38.

Vision: Richer model [Diagram: input with computer results goes to a Validator; work judged incorrect goes to an Appraiser, who either sends it to a Fixer or starts over with a Worker; correct and repaired results flow to the output]

Lessons Learned
Design for overall needs/constraints
Practical advice:
Pay consistently and reasonably
Reject only work that is definitely cheating
Build in fair cheating deterrence from the start
Keep instructions short, but always clear
Contact: Alex Quinn

Cheating  Earlier naïve experiment: 2000 reviews classified by 3 Turkers each 91% of work was cheated by 9 bad Turkers

Cheating Deterrence
Mix in task instances with known answers
Keep track of each worker's accuracy
Warning after 10 HITs of <70% accuracy
Block after 20 HITs of <70% accuracy
Thresholds are problem-specific
Other mechanisms:
Approve payment only after inspection
Filter workers based on approval record
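
The gold-standard checking described on this slide can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the class and method names are hypothetical, while the thresholds (70% accuracy, warn after 10 checked HITs, block after 20) come from the slide.

```python
# Hypothetical sketch of per-worker accuracy tracking on gold-standard
# (known-answer) task instances. Names are illustrative; thresholds are
# the ones reported on the slide.

class WorkerTracker:
    WARN_AFTER = 10      # warn once 10 gold HITs are below the accuracy floor
    BLOCK_AFTER = 20     # block once 20 gold HITs are below the accuracy floor
    MIN_ACCURACY = 0.70

    def __init__(self):
        self.hits = {}      # worker id -> number of gold-checked HITs
        self.correct = {}   # worker id -> number answered correctly

    def record(self, worker, was_correct):
        """Record one gold-standard answer; return "warn", "block", or None."""
        self.hits[worker] = self.hits.get(worker, 0) + 1
        self.correct[worker] = self.correct.get(worker, 0) + int(was_correct)
        accuracy = self.correct[worker] / self.hits[worker]
        if accuracy >= self.MIN_ACCURACY:
            return None
        if self.hits[worker] >= self.BLOCK_AFTER:
            return "block"
        if self.hits[worker] >= self.WARN_AFTER:
            return "warn"
        return None
```

As the slide notes, the thresholds are problem-specific; the point is only that accuracy is measured continuously against seeded known answers rather than by inspecting every submission.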

Ideal Pricing
Pay proportional to Turker effort
Choose a reasonable hourly rate
Example:
Confirming a correct answer: 10 seconds
Fixing an incorrect answer: 60 seconds
Answering from scratch: 50 seconds
If machine accuracy < 80%, bypass machine results
Need to adjust for human accuracy!
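
A minimal sketch of effort-proportional pricing, under stated assumptions: the per-action times (10 s confirm, 60 s fix, 50 s from scratch) and the 80% bypass rule are taken from the slide, while the hourly rate and function name are illustrative.

```python
# Hedged sketch: expected payment per task, proportional to expected
# worker time. Times and the bypass threshold are from the slide; the
# hourly rate is an assumed example value.

HOURLY_RATE = 6.00            # dollars per hour (assumption)
CONFIRM_S, FIX_S, SCRATCH_S = 10, 60, 50
BYPASS_BELOW = 0.80           # below this machine accuracy, skip machine output

def expected_pay(machine_accuracy):
    """Expected dollars per task for a given machine accuracy."""
    if machine_accuracy < BYPASS_BELOW:
        seconds = SCRATCH_S   # worker answers from scratch
    else:
        seconds = (machine_accuracy * CONFIRM_S
                   + (1 - machine_accuracy) * FIX_S)
    return round(HOURLY_RATE * seconds / 3600, 4)
```

At 90% machine accuracy the expected effort is 0.9 × 10 s + 0.1 × 60 s = 15 s per task; below the bypass threshold every task is priced at the 50 s from-scratch rate. The slide's final caveat still applies: human accuracy would also need to be factored in.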

Sentiment Polarity – Example 1 “Skim each movie review and decide whether it is positive or negative....” ○ positive ○ negative

Sentiment Polarity – Results
1083 movie reviews grouped into 361 HITs
Cost: $18.05 total, about 1.7¢ per movie review (5¢ per HIT)
Time: 8 hours 7 minutes, about 27 seconds per movie review
Human accuracy: 90%
Machine accuracy: 83.5%
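
The per-review figures follow from the totals reported above (361 HITs at 5¢ each, 1083 reviews, 8 hours 7 minutes of elapsed time):

```python
# Cross-checking the reported sentiment-polarity results.
hits, cents_per_hit, reviews = 361, 5, 1083
total_cost = hits * cents_per_hit / 100      # dollars: 361 * $0.05 = $18.05
total_seconds = 8 * 3600 + 7 * 60            # 8 hours 7 minutes = 29,220 s
per_review_s = total_seconds / reviews       # about 27 s per movie review
```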

Sentiment Polarity – Scenarios
Given: 100,000 movie reviews; cost constraint: $1000
Expect: humans do 66,714, machines do the rest; 78% combined accuracy; 18 days, 17 hours, 40 minutes
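
The kind of budget-constrained planning in this scenario can be sketched as below. This is a simplified illustration, not CrowdFlow's actual model: the per-item cost is an assumed value, and combined accuracy is estimated here as a plain mixture of human and machine accuracy, so the outputs will not reproduce the slide's exact figures (66,714 human items, 78% combined accuracy).

```python
# Hedged sketch: given a budget, decide how many items humans label and
# estimate combined accuracy as a simple weighted mixture. The mixture
# model and cost parameter are assumptions for illustration only.

def plan(total_items, budget, cost_per_item, human_acc, machine_acc):
    """Return (human_items, machine_items, estimated combined accuracy)."""
    human_items = min(total_items, int(budget / cost_per_item))
    machine_items = total_items - human_items
    combined = (human_items * human_acc
                + machine_items * machine_acc) / total_items
    return human_items, machine_items, round(combined, 3)
```

For example, with 100,000 reviews, a $1000 budget, an assumed 1.5¢ per human-labeled review, and the accuracies above (90% human, 83.5% machine), humans would cover roughly two-thirds of the items.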

Review: Monotrans [Chart: quality vs. affordability for machine translation, professional bilingual human participation, amateur bilingual human participation, and monolingual human participation]