
1 Beyond datasets: Learning in a fully-labeled real world Thesis proposal Alexander Sorokin

2 Research projects

3

4 Thesis

5

6 Motivation

7 Task: Amazon Mechanical Turk. [Diagram] A requester posts a task to the broker (www.mturk.com) together with the pay, e.g. "Is this a dog? ( ) Yes ( ) No" — Task: Dog?, Pay: $0.01. A worker submits the answer ("Yes") and the $0.01 is paid out.
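As a rough sketch of how such a requester-broker-worker loop is wired up programmatically, here is a hypothetical $0.01 yes/no HIT posted through boto3's MTurk client. The endpoint, question XML, and parameters are illustrative assumptions, not the tooling described in the talk:

```python
# Sketch: post a $0.01 yes/no question to Mechanical Turk with boto3.
# Illustrates the requester -> broker -> worker flow from the slide.
import boto3

client = boto3.client(
    "mturk",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",  # sandbox
)

QUESTION_XML = """<?xml version="1.0" encoding="UTF-8"?>
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>is_dog</QuestionIdentifier>
    <QuestionContent><Text>Is this a dog?</Text></QuestionContent>
    <AnswerSpecification>
      <SelectionAnswer>
        <Selections>
          <Selection><SelectionIdentifier>yes</SelectionIdentifier><Text>Yes</Text></Selection>
          <Selection><SelectionIdentifier>no</SelectionIdentifier><Text>No</Text></Selection>
        </Selections>
      </SelectionAnswer>
    </AnswerSpecification>
  </Question>
</QuestionForm>"""

hit = client.create_hit(
    Title="Is this a dog?",
    Description="Answer a single yes/no question about an image.",
    Reward="0.01",                     # pay per assignment, in dollars
    MaxAssignments=3,                  # ask several workers for redundancy
    LifetimeInSeconds=24 * 3600,
    AssignmentDurationInSeconds=120,
    Question=QUESTION_XML,
)
print(hit["HIT"]["HITId"])
```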

8 Select examples Joint work with Tamara and Alex Berg http://vision.cs.uiuc.edu/annotation/data/simpleevaluation/html/horse.html

9 Click on landmarks $0.01 http://vision-app1.cs.uiuc.edu:8080/mt/results/people14-batch11/p7/

10 Outline something $0.01 http://vision.cs.uiuc.edu/annotation/results/production-3-2/results_page_013.html Data from Ramanan NIPS06

11 Mark object attributes $0.03

12 Teach a robot

13 How do we define the task?

14 Annotation specification

15 Annotation language

16 Ideal task properties

17

18 How good are the annotations?

Submission is | Volume | Action       | Redo
Empty         | 6%     | Reject       | yes
Clearly bad   | 2%     | Reject       | yes
Almost good   | 4%     | Accept (pay) | yes
Good          | 88%    | Accept (pay) | no

Task: label people, box + 14 points; volume: 3078 HITs

19 How do we make it better?

20 1. Average N annotations
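The slide does not fix the aggregation rule, so here is a minimal sketch of one reasonable reading for point landmarks: take a per-landmark median over the N workers, which shrugs off an occasional wild click better than a mean (the data layout is hypothetical):

```python
# Sketch: combine N workers' landmark clicks by a per-landmark median.
import numpy as np

def aggregate_landmarks(annotations):
    """annotations: shape (n_workers, n_landmarks, 2) of (x, y) clicks."""
    a = np.asarray(annotations, dtype=float)
    return np.median(a, axis=0)          # (n_landmarks, 2) consensus points

workers = [
    [[10, 12], [40, 41]],   # worker 1: two landmarks
    [[11, 13], [42, 40]],   # worker 2
    [[95, 90], [41, 39]],   # worker 3: first landmark is an outlier
]
print(aggregate_landmarks(workers))      # the outlier barely moves the median
```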

21 2. Require qualification. Please read the detailed instructions to learn how to perform the task, then confirm that you understand them by answering the following questions: Which of the following checkboxes are correct for this annotation?
No people (there are people in the image)
> 20 people (there are fewer than 20 people of appropriate size)
Small heads (there are unmarked small heads in the image)
Task: put a box around every head
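Mechanically, such a gate can be expressed as an auto-graded qualification test that workers must pass before they see the HITs. A sketch using boto3's MTurk API; the qualification name and the one-question test/answer-key XML are invented for illustration:

```python
# Sketch: require a graded qualification test before workers see the task.
import boto3

TEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
  <Question>
    <QuestionIdentifier>q1</QuestionIdentifier>
    <QuestionContent><Text>Which checkbox is correct for the sample image?</Text></QuestionContent>
    <AnswerSpecification>
      <SelectionAnswer>
        <Selections>
          <Selection><SelectionIdentifier>small_heads</SelectionIdentifier><Text>Small heads</Text></Selection>
          <Selection><SelectionIdentifier>no_people</SelectionIdentifier><Text>No people</Text></Selection>
        </Selections>
      </SelectionAnswer>
    </AnswerSpecification>
  </Question>
</QuestionForm>"""

ANSWER_KEY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<AnswerKey xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/AnswerKey.xsd">
  <Question>
    <QuestionIdentifier>q1</QuestionIdentifier>
    <AnswerOption>
      <SelectionIdentifier>small_heads</SelectionIdentifier>
      <AnswerScore>100</AnswerScore>
    </AnswerOption>
  </Question>
</AnswerKey>"""

client = boto3.client("mturk")
qual = client.create_qualification_type(
    Name="head-boxing-instructions",          # hypothetical name
    Description="Checks that the worker read the head-annotation instructions.",
    QualificationTypeStatus="Active",
    Test=TEST_XML,
    AnswerKey=ANSWER_KEY_XML,                 # graded automatically by MTurk
    TestDurationInSeconds=300,
)
print(qual["QualificationType"]["QualificationTypeId"])
```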

22 2. Require qualification

23 3. Use task pipeline
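A pipeline here means chaining tasks so that one stage's output becomes the next stage's input, e.g. one batch of workers draws boxes and a second batch verifies them. A minimal sketch, with every task-posting function left as a hypothetical callable:

```python
# Sketch: a two-stage annotation pipeline. Both task functions are
# hypothetical stand-ins for posting a HIT and collecting its results.
def run_pipeline(images, post_drawing_hit, post_verification_hit):
    dataset = {}
    for image in images:
        boxes = post_drawing_hit(image)                 # stage 1: draw boxes
        verdicts = post_verification_hit(image, boxes)  # stage 2: yes/no per box
        # only boxes that pass verification enter the final dataset
        dataset[image] = [b for b, ok in zip(boxes, verdicts) if ok]
    return dataset
```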

24 4. Do grading

25 Grade conflicts Total grades: 4410

26 5. Automatic grading

27 Learning to grade

Task     | Bottles | People | Hands | Large objects
Accuracy | 95.0%   | 83.8%  | 45.5% | 29.5%
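The general recipe behind numbers like these: featurize each submission, hand-grade a sample, and fit a binary accept/reject classifier. A sketch with invented placeholder features and labels, not the thesis' actual grader:

```python
# Sketch: learn an accept/reject grader from hand-graded submissions.
# Features per submission are placeholders, e.g. box count, mean box
# area, time spent, and the worker's historical accept rate.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))                    # stand-in feature matrix
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)    # stand-in accept/reject labels

grader = LogisticRegression()
print(cross_val_score(grader, X, y, cv=5).mean())  # held-out grading accuracy
```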

28 Quality control

29 Setting the pay
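One common way to set the pay, sketched under assumed numbers: pick a target hourly wage and divide by the median completion time measured in a pilot batch.

```python
# Sketch: back out a per-HIT price from a target hourly wage and the
# median completion time of a pilot batch (numbers illustrative).
def price_per_hit(target_hourly_wage, median_seconds_per_hit):
    return target_hourly_wage * median_seconds_per_hit / 3600.0

print(round(price_per_hit(2.00, 18), 3))  # 18 s/HIT at $2/h -> ~$0.01/HIT
```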

30 Annotation Method Comparison

Approach      | Cost | Scale | Setup effort | Collaborative | Quality | Directed | Central | Elastic to $
MTurk         | $    | +++   | *            | no            | +/+++   | Yes      | no      | +++++
GWAP          | +    | +++   | ***          | no            | +       | Yes      |         | +
LabelMe       | +    | +     |              | Yes           | ++      | no       | Yes     |
Image Parsing | $$   | ++    | **           | no            | ++++    | Yes      |         | +++
In house      | $$$  | +     | *            | no            | +++     | Yes      | no      | ++

31 Is it useful?

32 Publications

33 Thesis

34

35 Fully labeled world assumption Goal: learn to detect every object

36 Why is it important

37 Computer vision task

38 Challenges

39 Lighting conditions, background clutter: here, lighting and background are known
Within-class variability, viewpoint changes, internal deformations
100 000 categories; how many instances? Tens of billions in total, 10 000 locally
1000 examples per category; 1-10 labels per object
Single image vs. rich sensor data

40 PR2 Sensing capabilities

41 Autonomous data collection

42 Data labeling

43 Learning

44 Preliminary learning results UChicago-VOC2008-person

45 Expected outcome

46 Thesis

47 Detect-Sample-Label

48 Sampling based estimation

49 Standard deviation table

50 Estimating recall
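Slides 48-50 rest on a standard sampling argument: have humans label a random sample of size n and report the sample proportion p, whose standard deviation is sqrt(p(1-p)/n). A sketch under those assumptions; the graded sample here is simulated, and the same estimator covers recall by grading a random sample of labeled objects for whether the detector found them:

```python
# Sketch: estimate a detector's precision by hand-grading a random sample
# of its detections, with the binomial standard error sqrt(p*(1-p)/n).
# Recall works the same way over a sample of known labeled objects.
import math, random

def proportion_with_se(labels):
    n = len(labels)
    p = sum(labels) / n
    return p, math.sqrt(p * (1 - p) / n)

random.seed(0)
# Hypothetical grades: 1 = "correct detection", 0 = "wrong".
graded_sample = [1 if random.random() < 0.8 else 0 for _ in range(200)]
p, se = proportion_with_se(graded_sample)
print(f"precision ~= {p:.2f} +/- {2 * se:.2f} (95% interval, n=200)")
```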

51 Experimental results

52 What are the errors?

53 Timeline

54

55

56

57

58

59 Acknowledgments. Special thanks to: David Forsyth; Nicolas Loeff, Ali Farhadi, Du Tran, Ian Endres; Tamara Berg, Pushmeet Kohli; Dolores Labs (Lukas Biewald); Willow Garage (Gary Bradski, Alex Teichman, Daniel Munoz, …); all workers at Amazon Mechanical Turk. This work was supported in part by the National Science Foundation under IIS-0534837 and in part by the Office of Naval Research under N00014-01-1-0890 as part of the MURI program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Office of Naval Research.

60 Thank you

61 What is an annotation task?

62 PR2 Platform: 2 laser scanners (fixed and tilting); 7 cameras (2 stereo pairs, 1 hi-res 5 Mpx, 2 in the arms); structured light; 16 cores, 48 GB RAM; 2 arms

63 What are datasets good for?
Training – the data is fully labeled
Evaluation:
  Tweaking the parameters – performance is computed automatically
  Comparing algorithms – “they run on the exact same data”

64 Why are datasets bad?
Data sampling and labeling bias
Small changes in performance are insignificant
Parameter tweaking doesn’t generalize
Overfitting to the datasets
Datasets should be discarded after performance is measured

65

