CS 2750: Machine Learning Active Learning and Crowdsourcing

1 CS 2750: Machine Learning Active Learning and Crowdsourcing
Prof. Adriana Kovashka, University of Pittsburgh, April 18, 2016

2 Collecting data on Amazon Mechanical Turk
[Diagram: a broker (Mechanical Turk) sends a task ("Is this a dog? o Yes o No", pay: $0.01) to workers and collects the answer ("Yes").]
Alex Sorokin

3 Annotation protocols
Type keywords
Select relevant images
Click on landmarks
Outline something
... anything else ...
Alex Sorokin

4 Type keywords $0.01 Alex Sorokin

5 Select examples $0.02 Alex Sorokin

6 Outline something $0.01 Alex Sorokin

7 Motivation
[One task] × 100,000 = $5,000
Custom annotations, large scale, low price
Alex Sorokin


9 Issues
Quality: how good is it? How can we be sure?
Price: how should we price it?
Alex Sorokin

10 Ensuring Annotation Quality
Consensus / multiple annotations / "wisdom of the crowd"
Qualification exam
Gold-standard questions
Grading tasks: a second tier of workers who grade others
Adapted from Alex Sorokin
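A minimal sketch (not from the slides) of two of these quality checks, majority-vote consensus and gold-standard scoring; the function names and data layout are illustrative assumptions:

from collections import Counter

def aggregate_labels(annotations):
    # annotations: dict mapping item_id -> list of labels from different workers.
    # Majority vote ("wisdom of the crowd") picks the most common label per item.
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

def worker_accuracy(worker_labels, gold):
    # worker_labels: dict item_id -> label given by one worker.
    # gold: dict item_id -> known correct label (gold-standard questions).
    checked = [item for item in worker_labels if item in gold]
    if not checked:
        return None  # this worker saw no gold-standard items
    return sum(worker_labels[i] == gold[i] for i in checked) / len(checked)

Workers whose gold-standard accuracy falls below a threshold can be filtered out before the majority vote is taken.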

11 Pricing
Trade-off between throughput and cost
Higher pay can actually attract scammers
Some studies find that the most accurate results are achieved if Turkers do tasks for free
Adapted from Alex Sorokin

12 Games with a purpose: Luis von Ahn
Associate professor at CMU
One of the "fathers" of crowdsourcing
Created the ESP Game, Peekaboom, and several other "games with a purpose"

13 THE ESP GAME
TWO-PLAYER ONLINE GAME
PARTNERS DON'T KNOW EACH OTHER AND CAN'T COMMUNICATE
OBJECT OF THE GAME: TYPE THE SAME WORD
THE ONLY THING IN COMMON IS AN IMAGE
Luis von Ahn and Laura Dabbish. "Labeling Images with a Computer Game." CHI 2004.

14 THE ESP GAME
[Illustration: Player 1 guesses "car"; Player 2 guesses "boy", "kid", "hat", then "car". Success! Both agree on "car".]
Luis von Ahn

15 THE ESP GAME IS FUN
4.1 MILLION LABELS WITH 23,000 PLAYERS
THERE ARE MANY PEOPLE WHO PLAY OVER 20 HOURS A WEEK
Luis von Ahn

16 WHY DO PEOPLE LIKE THE ESP GAME?
Luis von Ahn

17 THE ESP GAME GIVES ITS PLAYERS A WEIRD AND BEAUTIFUL SENSE OF ANONYMOUS INTIMACY. ON THE ONE HAND, YOU HAVE NO IDEA WHO YOUR PARTNER IS. ON THE OTHER HAND, THE TWO OF YOU ARE BRINGING YOUR MINDS TOGETHER IN A WAY THAT LOVERS WOULD ENVY.
Luis von Ahn

18 Player quotes:
"STRANGELY ADDICTIVE"
"IT'S SO MUCH FUN TRYING TO GUESS WHAT OTHERS THINK. YOU HAVE TO STEP OUTSIDE OF YOURSELF TO MATCH."
"IT'S FAST-PACED"
"HELPS ME LEARN ENGLISH"
Luis von Ahn

19 LOCATING OBJECTS IN IMAGES
THE ESP GAME TELLS US IF AN IMAGE CONTAINS A SPECIFIC OBJECT, BUT DOESN'T SAY WHERE IN THE IMAGE THE OBJECT IS. SUCH INFORMATION WOULD BE EXTREMELY USEFUL FOR COMPUTER VISION RESEARCH.
Luis von Ahn

20 PAINTBALL GAME
PLAYERS SHOOT AT OBJECTS ON THE IMAGE ("SHOOT THE: CAR")
WE GIVE POINTS AND CHECK ACCURACY BY GIVING PLAYERS IMAGES FOR WHICH WE ALREADY KNOW WHERE THE OBJECT IS
Luis von Ahn

21 REVEALING IMAGES
[Illustration: the "revealer" brushes away parts of an image of a car while the "guesser" types guesses until the partner's guess matches "car".]
Luis von Ahn

22 Summary: Collecting annotations from humans
Crowdsourcing allows very cheap data collection
Getting high-quality annotations can be tricky, but there are many ways to ensure quality
One way to obtain high-quality data fast is by phrasing your data collection as a game
What to do when data is expensive to obtain?

23 Crowdsourcing → Active Learning
[Pipeline diagram: unlabeled data → find data near the decision boundary → show that data, collect and filter labels → training labels; training data → features → classifier training → trained classifier, which feeds back into choosing what to label next.]
James Hays

24 Active Learning
Traditional active learning reduces supervision by obtaining labels for the most informative or uncertain examples first. [MacKay 1992, Freund et al., Tong & Koller 2001, Lindenbaum et al., Kapoor et al.]
Sudheendra Vijayanarasimhan
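A minimal sketch (not taken from any of the cited papers) of one common version of this idea, pool-based uncertainty sampling with a margin criterion; the classifier choice and function names are assumptions for illustration:

import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_labeled, y_labeled, X_pool, n_queries=10):
    # Train on the currently labeled data, then rank unlabeled pool examples
    # by how unsure the model is (smallest gap between the top two class probabilities).
    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
    probs = model.predict_proba(X_pool)                    # (n_pool, n_classes)
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]     # top1 - top2
    return np.argsort(margin)[:n_queries]                  # most uncertain first

The returned indices are the pool examples to send to annotators (e.g., crowd workers); once their labels come back, the model is retrained and the loop repeats.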

25 Visual Recognition With Humans in the Loop
ECCV 2010, Crete, Greece
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Serge Belongie, Peter Welinder, Pietro Perona

26 What type of bird is this?
Field guides are difficult for average users
Computer vision doesn't work perfectly (yet)
Research is mostly on basic-level categories
(Speaker notes: Suppose you are taking a hike and come across this bird. You would like to know what kind of bird it is. What would you do? You pull out your birding field guide, fumble through it forever, and never figure it out (show Sibley). You plug it into your leading computer vision algorithm (show bird); if you're lucky, it might also say it's a chair. You learn two things: 1) field guides don't work, and 2) computer vision only does basic categories and doesn't perform that well.)
What type of bird is this?
Steve Branson

27 Visual Recognition With Humans in the Loop
What kind of bird is this? Parakeet Auklet Steve Branson

28 Motivation
Supplement visual recognition with the human capacity for visual feature extraction to tackle difficult (fine-grained) recognition problems
Typically, progress is viewed as increasing data difficulty while maintaining full autonomy
Here, the authors view progress as a reduction in human effort on difficult data
Brian O'Neill

29 Categories of Recognition
Basic-level (Airplane? Chair? Bottle? ...): easy for humans, hard for computers
Subordinate (American Goldfinch? Indigo Bunting? ...): hard for humans, hard for computers
Parts & attributes (Yellow belly? Blue belly? ...): easy for humans, hard for computers
Steve Branson

30 Visual 20 Questions Game
American Goldfinch? yes
Blue belly? no · Cone-shaped beak? yes · Striped wing? yes · American Goldfinch? yes
Hard classification problems can be turned into a sequence of easy ones
(It doesn't mean users necessarily answer them correctly)
Steve Branson
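A minimal sketch (not from the slides) of why a sequence of easy yes/no answers can resolve a hard classification, using a simple Bayesian update over a few classes; the species list and probability table are made-up illustrative numbers:

import numpy as np

classes = ["American Goldfinch", "Indigo Bunting", "Parakeet Auklet"]
p_yes = {  # hypothetical p(answer = yes | question, class), one entry per class
    "Blue belly?":       np.array([0.05, 0.90, 0.10]),
    "Cone-shaped beak?": np.array([0.95, 0.90, 0.20]),
    "Striped wing?":     np.array([0.90, 0.10, 0.30]),
}

posterior = np.ones(len(classes)) / len(classes)            # uniform prior
for question, answer in [("Blue belly?", False),
                         ("Cone-shaped beak?", True),
                         ("Striped wing?", True)]:
    likelihood = p_yes[question] if answer else 1.0 - p_yes[question]
    posterior = posterior * likelihood                      # Bayes update per answer
    posterior /= posterior.sum()

print(classes[int(np.argmax(posterior))])                   # most probable species so far

Each easy answer multiplies into the posterior, so after a handful of questions the hard question ("American Goldfinch?") is effectively answered.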

31 Recognition With Humans in the Loop
[Illustration: computer vision → "Cone-shaped beak? yes" → computer vision → "American Goldfinch? yes"]
Computers: reduce the number of required questions
Humans: drive up the accuracy of vision algorithms
Steve Branson

32 Example Questions Steve Branson

33 Example Questions Steve Branson

34 Example Questions Steve Branson

35 Basic Algorithm
[Flow: input image → computer vision → max expected information gain → Question 1: "Is the belly black?" A: NO → max expected information gain → Question 2: "Is the bill hooked?" A: YES → ...]
Steve Branson

36 Some definitions:
Q = {q_1, ..., q_n}: set of possible questions
A_i: possible answers to question q_i
c_i ∈ {Guessing, Probably, Definitely}: possible confidence in answer a_i
u_i = (a_i, c_i): user response
U^t = {u_1, ..., u_t}: history of user responses at time t
Brian O'Neill

37 Question selection
Seek the question (e.g., "What color is the belly of the bird?") that gives the maximum expected information gain (entropy reduction) given the image x and the set of previous user responses U^t:

  q* = argmax_j I(c; u_j | x, U^t)

where

  I(c; u_j | x, U^t) = Σ_{u_j} p(u_j | x, U^t) [ H(c | x, U^t) − H(c | x, u_j ∪ U^t) ]

p(u_j | x, U^t): probability of obtaining response u_j to the evaluated question, given the image and the response history
H(c | x, u_j ∪ U^t): entropy when the response is added to the history
H(c | x, U^t): entropy at this iteration (before the response to the evaluated question is added to the history)
Brian O'Neill
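A minimal sketch (not the paper's code) of this selection rule, assuming the vision model supplies the current class posterior p(c | x, U^t) and a per-question response model p(u_j | c) as arrays; the names, shapes, and the conditional-independence assumption are illustrative:

import numpy as np

def entropy(p):
    # Shannon entropy (bits) of a distribution that sums to 1.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def select_question(class_posterior, response_models):
    # class_posterior: shape (n_classes,), current p(c | x, U^t)
    # response_models: shape (n_questions, n_classes, n_responses), p(u_j | c),
    #                  assumed independent of the response history given the class
    h_now = entropy(class_posterior)
    best_j, best_gain = None, -np.inf
    for j, p_u_given_c in enumerate(response_models):
        # p(u_j | x, U^t) = sum_c p(u_j | c) * p(c | x, U^t)
        p_u = class_posterior @ p_u_given_c                 # (n_responses,)
        gain = 0.0
        for u, pu in enumerate(p_u):
            if pu == 0:
                continue
            # Bayes update with the hypothetical response u:
            post = class_posterior * p_u_given_c[:, u]
            post /= post.sum()
            gain += pu * (h_now - entropy(post))            # expected entropy reduction
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

In the full loop of slide 35, the chosen question is posed to the user, the actual response is folded into the history, the class posterior is recomputed, and the selection repeats.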

38 Results
Users drive performance: 19% → 68%
Fewer questions are asked if computer vision is used
Just computer vision: 19%
Adapted from Steve Branson

39 Summary: Human-in-the-loop learning
To make intelligent use of the human labeling effort during training, have the computer vision algorithm learn actively by selecting the questions that are most informative
To combine the strengths of humans and imperfect vision algorithms, use a human in the loop at recognition time

