CS 2750: Machine Learning Active Learning and Crowdsourcing

1 CS 2750: Machine Learning Active Learning and Crowdsourcing
Prof. Adriana Kovashka, University of Pittsburgh, April 18, 2016

2 Collecting data on Amazon Mechanical Turk
[Diagram: a broker (Mechanical Turk) sends a task ("Is this a dog? o Yes o No", pay: $0.01) to workers and collects the answer ("Yes").]
Alex Sorokin

3 Annotation protocols
Type keywords
Select relevant images
Click on landmarks
Outline something
... anything else ...
Alex Sorokin

4 Type keywords $0.01 Alex Sorokin

5 Select examples $0.02 Alex Sorokin

6 Outline something $0.01 Alex Sorokin

7 Motivation
[One task] × 100,000 = $5,000
Custom annotations, large scale, low price
Alex Sorokin


9 Issues
Quality: how good is it? How can we be sure?
Price: how should we price it?
Alex Sorokin

10 Ensuring Annotation Quality
Consensus / multiple annotations / "wisdom of the crowd"
Qualification exam
Gold-standard questions
Grading tasks: a second tier of workers who grade others
Adapted from Alex Sorokin
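A minimal sketch (not from the slides) of two of these quality checks, majority-vote consensus and gold-standard scoring; the function names and data layout are illustrative assumptions:

from collections import Counter

def aggregate_labels(annotations):
    # annotations: dict mapping item_id -> list of labels from different workers.
    # Majority vote ("wisdom of the crowd") picks the most common label per item.
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

def worker_accuracy(worker_labels, gold):
    # worker_labels: dict item_id -> label given by one worker.
    # gold: dict item_id -> known correct label (gold-standard questions).
    checked = [item for item in worker_labels if item in gold]
    if not checked:
        return None  # this worker saw no gold-standard items
    return sum(worker_labels[i] == gold[i] for i in checked) / len(checked)

Workers whose gold-standard accuracy falls below a threshold can be filtered out before the majority vote is taken.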

11 Pricing
Trade-off between throughput and cost
Higher pay can actually attract scammers
Some studies find that the most accurate results are achieved if Turkers do tasks for free
Adapted from Alex Sorokin

12 Games with a purpose: Luis von Ahn
Associate professor at CMU
One of the "fathers" of crowdsourcing
Created the ESP Game, Peekaboom, and several other "games with a purpose"

13 THE ESP GAME
TWO-PLAYER ONLINE GAME
PARTNERS DON'T KNOW EACH OTHER AND CAN'T COMMUNICATE
OBJECT OF THE GAME: TYPE THE SAME WORD
THE ONLY THING IN COMMON IS AN IMAGE
Luis von Ahn and Laura Dabbish. "Labeling Images with a Computer Game." CHI 2004.

14 THE ESP GAME
[Illustration: Player 1 guesses "car"; Player 2 guesses "boy", "kid", "hat", then "car". Success! Both agree on "car".]
Luis von Ahn

15 THE ESP GAME IS FUN
4.1 MILLION LABELS WITH 23,000 PLAYERS
THERE ARE MANY PEOPLE WHO PLAY OVER 20 HOURS A WEEK
Luis von Ahn

16 WHY DO PEOPLE LIKE THE ESP GAME?
Luis von Ahn

17 THE ESP GAME GIVES ITS PLAYERS A WEIRD AND BEAUTIFUL SENSE OF ANONYMOUS INTIMACY. ON THE ONE HAND, YOU HAVE NO IDEA WHO YOUR PARTNER IS. ON THE OTHER HAND, THE TWO OF YOU ARE BRINGING YOUR MINDS TOGETHER IN A WAY THAT LOVERS WOULD ENVY.
Luis von Ahn

18 Player quotes:
"STRANGELY ADDICTIVE"
"IT'S SO MUCH FUN TRYING TO GUESS WHAT OTHERS THINK. YOU HAVE TO STEP OUTSIDE OF YOURSELF TO MATCH."
"IT'S FAST-PACED"
"HELPS ME LEARN ENGLISH"
Luis von Ahn

19 LOCATING OBJECTS IN IMAGES
THE ESP GAME TELLS US IF AN IMAGE CONTAINS A SPECIFIC OBJECT, BUT DOESN'T SAY WHERE IN THE IMAGE THE OBJECT IS. SUCH INFORMATION WOULD BE EXTREMELY USEFUL FOR COMPUTER VISION RESEARCH.
Luis von Ahn

20 PAINTBALL GAME
PLAYERS SHOOT AT OBJECTS ON THE IMAGE ("SHOOT THE: CAR")
WE GIVE POINTS AND CHECK ACCURACY BY GIVING PLAYERS IMAGES FOR WHICH WE ALREADY KNOW WHERE THE OBJECT IS
Luis von Ahn

21 REVEALING IMAGES
[Illustration: the "revealer" brushes away parts of an image of a car while the "guesser" types guesses until the partner's guess matches "car".]
Luis von Ahn

22 Summary: Collecting annotations from humans
Crowdsourcing allows very cheap data collection
Getting high-quality annotations can be tricky, but there are many ways to ensure quality
One way to obtain high-quality data fast is by phrasing your data collection as a game
What to do when data is expensive to obtain?

23 Crowdsourcing → Active Learning
[Pipeline diagram: unlabeled data → find data near the decision boundary → show that data, collect and filter labels → training labels; training data → features → classifier training → trained classifier, which feeds back into choosing what to label next.]
James Hays

24 Active Learning
Traditional active learning reduces supervision by obtaining labels for the most informative or uncertain examples first. [MacKay 1992, Freund et al., Tong & Koller 2001, Lindenbaum et al., Kapoor et al.]
Sudheendra Vijayanarasimhan
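A minimal sketch (not taken from any of the cited papers) of one common version of this idea, pool-based uncertainty sampling with a margin criterion; the classifier choice and function names are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_labeled, y_labeled, X_pool, n_queries=10):
    # Train on the currently labeled data, then rank unlabeled pool examples
    # by how unsure the model is (smallest gap between the top two class probabilities).
    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    probs = model.predict_proba(X_pool)                    # (n_pool, n_classes)
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]     # top1 - top2
    return np.argsort(margin)[:n_queries]                  # most uncertain first

The returned indices are the pool examples to send to annotators (e.g., crowd workers); once their labels come back, the model is retrained and the loop repeats.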

25 Visual Recognition With Humans in the Loop
ECCV 2010, Crete, Greece
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Serge Belongie, Peter Welinder, Pietro Perona

26 What type of bird is this?
Field guides are difficult for average users
Computer vision doesn't work perfectly (yet)
Research is mostly on basic-level categories
(Speaker notes: Suppose you are taking a hike and come across this bird. You would like to know what kind of bird it is. What would you do? You pull out your birding field guide, fumble through it forever, and never figure it out (show Sibley). You plug it into your leading computer vision algorithm (show bird); if you're lucky, it might also say it's a chair. You learn two things: 1) field guides don't work, and 2) computer vision only does basic categories and doesn't perform that well.)
What type of bird is this?
Steve Branson

27 Visual Recognition With Humans in the Loop
What kind of bird is this? Parakeet Auklet Steve Branson

28 Motivation
Supplement visual recognition with the human capacity for visual feature extraction to tackle difficult (fine-grained) recognition problems
Typically, progress is viewed as increasing data difficulty while maintaining full autonomy
Here, the authors view progress as a reduction in human effort on difficult data
Brian O'Neill

29 Categories of Recognition
Basic-level (Airplane? Chair? Bottle? ...): easy for humans, hard for computers
Subordinate (American Goldfinch? Indigo Bunting? ...): hard for humans, hard for computers
Parts & attributes (Yellow belly? Blue belly? ...): easy for humans, hard for computers
Steve Branson

30 Visual 20 Questions Game
American Goldfinch? yes
Blue belly? no · Cone-shaped beak? yes · Striped wing? yes · American Goldfinch? yes
Hard classification problems can be turned into a sequence of easy ones
(It doesn't mean users necessarily answer them correctly)
Steve Branson
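A minimal sketch (not from the slides) of why a sequence of easy yes/no answers can resolve a hard classification, using a simple Bayesian update over a few classes; the species list and probability table are made-up illustrative numbers:

import numpy as np

classes = ["American Goldfinch", "Indigo Bunting", "Parakeet Auklet"]
p_yes = {  # hypothetical p(answer = yes | question, class), one entry per class
    "Blue belly?":       np.array([0.05, 0.90, 0.10]),
    "Cone-shaped beak?": np.array([0.95, 0.90, 0.20]),
    "Striped wing?":     np.array([0.90, 0.10, 0.30]),
}

posterior = np.ones(len(classes)) / len(classes)            # uniform prior
for question, answer in [("Blue belly?", False),
                         ("Cone-shaped beak?", True),
                         ("Striped wing?", True)]:
    likelihood = p_yes[question] if answer else 1.0 - p_yes[question]
    posterior = posterior * likelihood                      # Bayes update per answer
    posterior /= posterior.sum()

print(classes[int(np.argmax(posterior))])                   # most probable species so far

Each easy answer multiplies into the posterior, so after a handful of questions the hard question ("American Goldfinch?") is effectively answered.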

31 Recognition With Humans in the Loop
[Illustration: computer vision → "Cone-shaped beak? yes" → computer vision → "American Goldfinch? yes"]
Computers: reduce the number of required questions
Humans: drive up the accuracy of vision algorithms
Steve Branson

32 Example Questions Steve Branson

33 Example Questions Steve Branson

34 Example Questions Steve Branson

35 Basic Algorithm
[Flow: input image → computer vision → max expected information gain → Question 1: "Is the belly black?" A: NO → max expected information gain → Question 2: "Is the bill hooked?" A: YES → ...]
Steve Branson

36 Some definitions:
Q = {q_1, ..., q_n}: set of possible questions
A_i: possible answers to question q_i
c_i ∈ {Guessing, Probably, Definitely}: possible confidence in answer a_i
u_i = (a_i, c_i): user response
U^t = {u_1, ..., u_t}: history of user responses at time t
Brian O'Neill

37 Question selection
Seek the question (e.g., "What color is the belly of the bird?") that gives the maximum expected information gain (entropy reduction) given the image x and the set of previous user responses U^t:

  q* = argmax_j I(c; u_j | x, U^t)

where

  I(c; u_j | x, U^t) = Σ_{u_j} p(u_j | x, U^t) [ H(c | x, U^t) − H(c | x, u_j ∪ U^t) ]

p(u_j | x, U^t): probability of obtaining response u_j to the evaluated question, given the image and the response history
H(c | x, u_j ∪ U^t): entropy when the response is added to the history
H(c | x, U^t): entropy at this iteration (before the response to the evaluated question is added to the history)
Brian O'Neill
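A minimal sketch (not the paper's code) of this selection rule, assuming the vision model supplies the current class posterior p(c | x, U^t) and a per-question response model p(u_j | c) as arrays; the names, shapes, and the conditional-independence assumption are illustrative:

import numpy as np

def entropy(p):
    # Shannon entropy (bits) of a distribution that sums to 1.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def select_question(class_posterior, response_models):
    # class_posterior: shape (n_classes,), current p(c | x, U^t)
    # response_models: shape (n_questions, n_classes, n_responses), p(u_j | c),
    #                  assumed independent of the response history given the class
    h_now = entropy(class_posterior)
    best_j, best_gain = None, -np.inf
    for j, p_u_given_c in enumerate(response_models):
        # p(u_j | x, U^t) = sum_c p(u_j | c) * p(c | x, U^t)
        p_u = class_posterior @ p_u_given_c                 # (n_responses,)
        gain = 0.0
        for u, pu in enumerate(p_u):
            if pu == 0:
                continue
            # Bayes update with the hypothetical response u:
            post = class_posterior * p_u_given_c[:, u]
            post /= post.sum()
            gain += pu * (h_now - entropy(post))            # expected entropy reduction
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

In the full loop of slide 35, the chosen question is posed to the user, the actual response is folded into the history, the class posterior is recomputed, and the selection repeats.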

38 Results
Users drive performance: 19% → 68%
Fewer questions are asked if computer vision is used
Just computer vision: 19%
Adapted from Steve Branson

39 Summary: Human-in-the-loop learning
To make intelligent use of the human labeling effort during training, have the computer vision algorithm learn actively by selecting the questions that are most informative
To combine the strengths of humans and imperfect vision algorithms, use a human in the loop at recognition time

