Learning Semantics with Less Supervision


1 Learning Semantics with Less Supervision

2 Agenda Beyond Fixed Keypoints Beyond Keypoints Open discussion

3 Part Discovery from Partial Correspondence
[Subhransu Maji and Gregory Shakhnarovich, CVPR 2013]


5 Keypoints in diverse categories
Where are the keypoints? Can you name them?

6 Does the name of a keypoint matter?
We can mark correspondences without naming parts [Maji and Shakhnarovich, HCOMP’12]

7 Annotation interface on MTurk
Example landmarks are provided:

8 Example annotations Annotators mark 5 landmark pairs on average

9 Are the landmarks consistent across annotators?

10 Semantic part discovery
Given a window in the first image we can find the corresponding window in the second image

11 Propagate correspondence in the “semantic graph”

12 Semantic part discovery
Discover parts using breadth-first traversal (Iter 0, Iter 1, Iter 2)
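The breadth-first propagation can be sketched as follows (a minimal hypothetical helper, not the authors' code; the `graph` structure, `transfer_fn` callbacks, and the box format are all assumptions):

```python
from collections import deque

def propagate_parts(graph, seed_image, seed_box, max_iters=2):
    # graph: {image_id: [(neighbor_id, transfer_fn), ...]}, where
    # transfer_fn maps a window (box) in this image into the
    # neighbor's coordinate frame via the annotated correspondence.
    discovered = {seed_image: seed_box}    # image_id -> window
    frontier = deque([seed_image])
    for _ in range(max_iters):             # Iter 0, Iter 1, ...
        next_frontier = deque()
        while frontier:
            i = frontier.popleft()
            for j, transfer in graph.get(i, []):
                if j not in discovered:    # visit each image once
                    discovered[j] = transfer(discovered[i])
                    next_frontier.append(j)
        frontier = next_frontier
    return discovered
```

Starting from one seed window, two iterations reach every image within two correspondence edges of the seed.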

13 The semantic graph alone is not good enough
Graph only vs. graph + appearance. Trained using latent LDA (latent variables: scale, translation, membership)

14 Semantic part discovery
Graph only Graph + Appearance

15 Examples of learned parts

16 Part-based representation
image other activations on the training set

17 Part-based representation
image other activations on the training set

18 Detecting church buildings: individual parts
(panel labels: graph mining, better seeds)

19 Detecting church buildings: collection of parts
Detection is challenging due to structural variability. Latent LDA parts + voting: AP = 39.9%; DPM: AP = 34.7%

20 Label Transfer
Ask users to label parts where it makes sense (arch, tower, window), then transfer the labels to test images.

21 Agenda Beyond Fixed Keypoints Beyond Keypoints Open Discussion

22 Unsupervised Discovery of Mid-Level Discriminative Patches
Saurabh Singh, Abhinav Gupta and Alexei Efros, ECCV 2012

23 Can we get nice parts without supervision?
Idea 0: K-means clustering in HOG space

24 Still not good enough
The SVM memorizes bad examples and still scores them highly. However, the space of bad examples is much more diverse, so we can avoid overfitting if we train on one subset but look for patches on a validation subset.

25 Why does K-means on HOG fail?
Chicken-and-egg problem: if we knew that a set of patches were visually similar, we could easily learn a distance metric for them; if we knew the distance metric, we could easily find the other members.

26 Idea 1: Discriminative Clustering
Start with K-means. Train a discriminative classifier for each cluster's distance function, using all other clusters as negative examples. Re-assign each patch to the cluster whose classifier gives the highest score. Repeat.
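The alternation can be sketched in numpy (a minimal sketch, assuming LDA-style linear classifiers with a shared covariance stand in for the per-cluster discriminative classifiers, and all other clusters serve as negatives):

```python
import numpy as np

def discriminative_clustering(X, k, iters=5, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: plain k-means initialisation in feature (e.g. HOG) space.
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(10):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    # Step 2: alternate between training a linear classifier per cluster
    # (positives = cluster members, negatives = everything else) and
    # re-assigning each patch to its highest-scoring classifier.
    cov = np.cov(X.T) + 1e-3 * np.eye(X.shape[1])  # shared, regularized
    for _ in range(iters):
        W = np.zeros((k, X.shape[1]))
        b = np.zeros(k)
        for c in range(k):
            pos, neg = X[labels == c], X[labels != c]
            if len(pos) == 0 or len(neg) == 0:
                continue
            mu_p, mu_n = pos.mean(0), neg.mean(0)
            W[c] = np.linalg.solve(cov, mu_p - mu_n)   # LDA direction
            b[c] = -W[c] @ (mu_p + mu_n) / 2           # LDA threshold
        labels = np.argmax(X @ W.T + b, axis=1)
    return labels
```

On well-separated data the discriminative re-assignment simply confirms the k-means partition; the gains appear when clusters overlap in raw feature distance but are separable by a learned classifier.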

27 Idea 2: Discriminative Clustering+
Start with K-means or kNN. Train a discriminative classifier for the distance function. Use detection: detect the patches and assign each to its top-k scoring clusters. Repeat.

28 Can we get good parts without supervision?
What makes a good part? Must occur frequently in one class (representative). Must not occur frequently in all classes (discriminative).

29 Discriminative Clustering+

30 Discriminative Clustering+

31 Idea 3: Discriminative Clustering++
Split the discovery dataset into two equal parts (training and validation). Train on the training subset. Run the trained classifier on the validation set to collect new examples. Exchange the training and validation sets. Repeat.
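The train/validation exchange above can be sketched generically (a hypothetical skeleton; `train_step` and `detect_step` are assumed callbacks standing in for classifier training and detection, not part of the paper's code):

```python
def crossval_discovery(cluster, pool_a, pool_b, train_step, detect_step, rounds=4):
    # Train on one half of the discovery set, harvest new cluster
    # members on the other half, then exchange the roles of the two
    # halves so the detector never scores its own training data.
    train_pool, val_pool = pool_a, pool_b
    for _ in range(rounds):
        detector = train_step(cluster, train_pool)   # fit on training half
        cluster = detect_step(detector, val_pool)    # collect on held-out half
        train_pool, val_pool = val_pool, train_pool  # swap subsets
    return cluster
```

A toy 1-D run: with a "detector" that is just the cluster mean and a fixed firing threshold, a seed near one mode keeps re-collecting that mode's members from alternating halves, while outliers never enter the cluster.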

32 Discriminative Clustering++

33 Doublets: Discover second-order relationships
Start with high-scoring patches. Find spatial correlations to other, weaker patches. Rank the candidate doublets on the validation set.
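One way to sketch the ranking (a hypothetical helper; the input dictionaries and the low-variance criterion are assumptions, standing in for the paper's validation-set scoring):

```python
import numpy as np

def rank_doublets(strong_hits, weak_hits_by_part):
    # strong_hits: {image_id: (x, y)} detections of a high-scoring patch.
    # weak_hits_by_part: {part_id: {image_id: (x, y)}} for weaker patches.
    scores = {}
    for part, weak_hits in weak_hits_by_part.items():
        shared = [img for img in strong_hits if img in weak_hits]
        if len(shared) < 2:
            continue
        disp = np.array([np.subtract(weak_hits[i], strong_hits[i])
                         for i in shared], float)
        # A consistent relative displacement (low variance across images)
        # suggests a real second-order relationship between the patches.
        scores[part] = -disp.var(axis=0).sum()
    return sorted(scores, key=scores.get, reverse=True)
```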

34 Doublets

35 AP on MIT Indoor-67 scene recognition dataset

36 Blocks that shout: Distinctive Parts for Scene Classification
Juneja, Vedaldi, Jawahar and Zisserman, CVPR 2013 (example scene classes: bookstore, buffet, computer room, closet)

37 Three steps
Seeding (proposing initial parts)
Expansion (learning part detectors)
Selection (identifying good parts)

38 Step 1: Seeding
Segment the image. Find proposal regions based on “objectness”. Compute HOG features for each.

39 Step 2: Expansion
Train an Exemplar SVM for each seed region [Malisiewicz et al]. Apply it on the validation set to collect more examples. Retrain and repeat.

40 Step 3: Selection
Good parts should occur frequently in a small number of classes but infrequently in the rest. Collect the top 5 parts from each validation image; sort the occurrences of each part by score and keep the top r. Compute the entropy of each part over the class distribution and retain the lowest-entropy parts. Filter out any parts too similar to others (based on the cosine similarity of their SVM weights).
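The entropy-ranking idea can be sketched as follows (a minimal sketch; the input format is an assumption, and the top-r truncation and redundancy filtering are left out):

```python
import numpy as np

def select_parts(occurrences, num_keep):
    # occurrences[p]: class labels of the validation images on which
    # part p fires (after keeping only its top-r scoring occurrences).
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())
    # Low entropy over the class distribution = fires in few classes,
    # which is exactly the "distinctive part" criterion.
    return sorted(occurrences, key=lambda part: entropy(occurrences[part]))[:num_keep]
```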

41 Features and learning
Features explored: Dense RootSIFT, BoW, LLS, Improved Fisher Vectors. Learning: non-linear SVM (sqrt kernel).

42 Results on MIT Indoor-67

                Singh et al                     Juneja et al
Seeding         K-means on HOG                  Exemplar SVM
Feature space   HOG                             IFV
SVM             Linear                          Non-linear
Selection       Purity & discriminativeness     Entropy rank (allows for parts
                (penalizes parts that perform   that work for multiple clusters)
                well for multiple clusters)
AP on MIT 67    49.4                            61.1

43 Learning Collections of Parts for Object Recognition
[Endres, Shih, Jia and Hoiem, CVPR 2013]

44 Overview of the method
Seeding: random samples, including the full bounding box and sub-window boxes.
Expansion: Exemplar SVM with fast training (using LDA).
Selection: greedy method; pick parts so that each training example is explained by some part.
Appearance consistency: include parts that have high SVM scores.
Spatial consistency: prefer parts that come from the same location within the bounding box.
Training and detection: boosting over Category Independent Object Proposals [Endres & Hoiem].
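The greedy selection step can be sketched as a set-cover-style loop (a hypothetical sketch of the idea only; `part_covers` is an assumed input format, not the paper's API):

```python
def greedy_part_selection(part_covers, examples):
    # part_covers[p]: set of training-example ids that part p explains
    # (fires on). Greedily pick parts until every example is explained.
    uncovered = set(examples)
    chosen = []
    while uncovered:
        # Pick the part explaining the most still-unexplained examples.
        best = max(part_covers, key=lambda p: len(part_covers[p] & uncovered))
        gain = part_covers[best] & uncovered
        if not gain:          # no part explains anything new; stop
            break
        chosen.append(best)
        uncovered -= gain
    return chosen
```

This is the classic greedy set-cover heuristic: each round buys the largest marginal coverage, which keeps the part collection small while ensuring every training example is accounted for.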

45 Results on PASCAL 2010 detection
Averages of the patches from the top 15 detections on the validation set, for a set of parts

46 Agenda Beyond Fixed Keypoints Beyond Keypoints Open Discussion

47 Gender Recognition on Labeled Faces in the Wild
Much easier dataset – no occlusion, high resolution, centered frontal faces.

Method                  Gender AP
Kumar et al, ICCV 2009  95.52
Frontal Face poselet    96.43

[Zhang et al, arXiv: ]

48 Gender Recognition on Labeled Faces in the Wild
Much easier dataset – no occlusion, high resolution, centered frontal faces. Male or female?

Method                    Gender AP
Kumar et al, ICCV 2009    95.52
Frontal Face poselet      96.43
Poselets + Deep Learning  99.54

[Zhang et al, arXiv: ]

49 Poselets vs DPMs vs Discriminative Patches
                          DPMs                  Poselets                    Discriminative Patches
Approach                  Parametric            Non-parametric              Non-parametric
Speed                     Faster (fewer types)  Slower                      Slower (many types)
Redundancy                Little                A lot (improves accuracy)   A lot
Spatial model             Sophisticated         Primitive (threshold)       Primitive
Supervision requirements  Needs 2 keypoints     Needs more keypoints (10+)  No supervision
Uses multi-scale signal?  Two scale levels      Yes, multiple scales        Yes
Jointly trained           Yes                   No                          No
Attached semantics        –                     Medium                      –

50 Supervision spectrum: from unsupervised to strongly supervised

51 Questions for open discussion
What is the future for mid-level parts? More supervision vs. less supervision? Should low-level parts be hard-coded or jointly trained? Parametric vs. non-parametric approaches? Parts with or without associated semantics?
