Self-Paced Learning for Semantic Segmentation

Presentation on theme: "Self-Paced Learning for Semantic Segmentation"— Presentation transcript:

1 Self-Paced Learning for Semantic Segmentation
M. Pawan Kumar

2

3 Self-Paced Learning for Latent Structural SVM
M. Pawan Kumar, Benjamin Packer, Daphne Koller

4 Aim
To learn accurate parameters for a latent structural SVM. Input x, output y ∈ Y, hidden variable h ∈ H. Example: y = "Deer", with Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}.

5 Aim
To learn accurate parameters for a latent structural SVM. Feature Ψ(x,y,h) (HOG, BoW); parameters w. Prediction: (y*, h*) = argmax_{y∈Y, h∈H} w^T Ψ(x,y,h).
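To make this prediction rule concrete, here is a minimal sketch of exhaustive scoring over (y, h) pairs; `psi`, `labels`, and `latent_space` are illustrative placeholders standing in for the HOG/BoW feature map and the label and latent spaces of the talk, not code from the paper.

```python
# Minimal sketch of (y*, h*) = argmax_{y in Y, h in H} w^T Psi(x, y, h).
# `psi`, `labels`, and `latent_space` are assumed placeholders.
import numpy as np

def predict(w, x, labels, latent_space, psi):
    """Return the (label, latent) pair with the highest score w^T psi(x, y, h)."""
    best_score, best_pair = -np.inf, None
    for y in labels:
        for h in latent_space:
            score = float(w @ psi(x, y, h))
            if score > best_score:
                best_score, best_pair = score, (y, h)
    return best_pair
```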

6 Motivation
"Math is for losers!!" Real numbers, imaginary numbers, e^(iπ) + 1 = 0. FAILURE … BAD LOCAL MINIMUM.

7 Motivation
"Euler was a genius!!" Real numbers, imaginary numbers, e^(iπ) + 1 = 0. SUCCESS … GOOD LOCAL MINIMUM.

8 Motivation
Start with "easy" examples, then consider "hard" ones. Simultaneously estimate easiness and parameters. Easiness is a property of data sets, not of single instances. Hand-labeling easy vs. hard samples is expensive, and easy for a human ≠ easy for a machine.

9 Outline Latent Structural SVM Concave-Convex Procedure
Self-Paced Learning Experiments

10 Latent Structural SVM
Felzenszwalb et al., 2008; Yu and Joachims, 2009. Training samples x_i, ground-truth labels y_i, loss function Δ(y_i, y_i(w), h_i(w)).

11 Latent Structural SVM
(y_i(w), h_i(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x_i, y, h)
min_w ||w||² + C ∑_i Δ(y_i, y_i(w), h_i(w))
Non-convex objective. Minimize an upper bound.

12 Latent Structural SVM
(y_i(w), h_i(w)) = argmax_{y∈Y, h∈H} w^T Ψ(x_i, y, h)
min_w ||w||² + C ∑_i ξ_i
s.t. max_{h_i} w^T Ψ(x_i, y_i, h_i) − w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i, for all y, h
Still non-convex: a difference of convex functions. The CCCP algorithm converges to a local minimum.
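As a hedged illustration, the per-sample slack implied by this upper bound can be written as a difference of two maximizations: loss-augmented inference over all (y, h) minus the best latent completion of the ground-truth label. `psi` and `delta` are assumed helper functions, and the label and latent spaces are assumed to be small enough to enumerate.

```python
def sample_slack(w, x_i, y_i, labels, latent_space, psi, delta):
    """xi_i = max_{y,h} [w^T Psi(x_i,y,h) + Delta(y_i,y,h)] - max_h w^T Psi(x_i,y_i,h)."""
    # Loss-augmented inference over all candidate (y, h) pairs.
    augmented = max(float(w @ psi(x_i, y, h)) + delta(y_i, y, h)
                    for y in labels for h in latent_space)
    # Best latent completion of the ground-truth label (Delta is zero there).
    completed = max(float(w @ psi(x_i, y_i, h)) for h in latent_space)
    return max(0.0, augmented - completed)
```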

13 Outline Latent Structural SVM Concave-Convex Procedure
Self-Paced Learning Experiments

14 Concave-Convex Procedure
Start with an initial estimate w_0.
Update h_i = argmax_{h∈H} w_t^T Ψ(x_i, y_i, h).
Update w_{t+1} by solving the convex problem
min_w ||w||² + C ∑_i ξ_i
s.t. w^T Ψ(x_i, y_i, h_i) − w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i
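A sketch of the CCCP outer loop described here, assuming generic helpers `impute_latent` (latent completion) and `solve_ssvm` (a standard convex structural-SVM solver); both names are illustrative, not from any particular library.

```python
def cccp(data, w0, impute_latent, solve_ssvm, n_iters=20):
    """Alternate latent completion and a convex structural-SVM update."""
    w = w0
    for _ in range(n_iters):
        # h_i = argmax_h w^T Psi(x_i, y_i, h) with the current parameters.
        completed = [(x, y, impute_latent(w, x, y)) for (x, y) in data]
        # With h_i fixed, the problem is a standard (convex) structural SVM.
        w = solve_ssvm(completed)
    return w
```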

15 Concave-Convex Procedure
CCCP looks at all samples simultaneously, so "hard" samples cause confusion. Instead, start with "easy" samples, then consider "hard" ones.

16 Outline Latent Structural SVM Concave-Convex Procedure
Self-Paced Learning Experiments

17 Self-Paced Learning
REMINDER: Simultaneously estimate easiness and parameters. Easiness is a property of data sets, not of single instances.

18 wT(xi,yi,hi) - wT(xi,y,h)
Self-Paced Learning Start with an initial estimate w0 Update hi = maxhH wtT(xi,yi,h) Update wt+1 by solving a convex problem min ||w||2 + C∑i i wT(xi,yi,hi) - wT(xi,y,h) ≥ (yi, y, h) - i 18

19 wT(xi,yi,hi) - wT(xi,y,h)
Self-Paced Learning min ||w||2 + C∑i i wT(xi,yi,hi) - wT(xi,y,h) ≥ (yi, y, h) - i 19

20 wT(xi,yi,hi) - wT(xi,y,h)
Self-Paced Learning vi  {0,1} min ||w||2 + C∑i vii wT(xi,yi,hi) - wT(xi,y,h) ≥ (yi, y, h) - i Trivial Solution 20

21 Self-Paced Learning
min_{w,v} ||w||² + C ∑_i v_i ξ_i − ∑_i v_i / K
s.t. w^T Ψ(x_i, y_i, h_i) − w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i
Large K selects only a few (easy) samples; medium K selects more; small K selects nearly all.

22 Self-Paced Learning
Relax v_i ∈ [0,1]. The problem
min_{w,v} ||w||² + C ∑_i v_i ξ_i − ∑_i v_i / K
s.t. w^T Ψ(x_i, y_i, h_i) − w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i
is biconvex and is solved by Alternating Convex Search.
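Because each v_i enters the relaxed objective only through the linear term v_i·(C·ξ_i − 1/K), the v-step of Alternating Convex Search has a closed form for fixed w: select a sample exactly when its weighted slack falls below 1/K. A small illustrative sketch (the function name is an assumption):

```python
def select_easy_samples(slacks, C, K):
    """v_i = 1 iff C * xi_i < 1/K, which minimizes sum_i v_i * (C * xi_i - 1/K)."""
    return [1 if C * xi < 1.0 / K else 0 for xi in slacks]

# Large K: the threshold 1/K is small, so only very low-slack ("easy") samples are kept.
# Small K: the threshold 1/K is large, so nearly all samples are kept.
```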

23 Self-Paced Learning
Start with an initial estimate w_0.
Update h_i = argmax_{h∈H} w_t^T Ψ(x_i, y_i, h).
Update w_{t+1} by solving the convex problem
min_{w,v} ||w||² + C ∑_i v_i ξ_i − ∑_i v_i / K
s.t. w^T Ψ(x_i, y_i, h_i) − w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i
Decrease K ← K/μ (μ > 1).
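Putting the pieces together, a hedged sketch of the full self-paced loop on this slide is below; the helper functions, the single v/w pass per iteration, and the annealing factor `mu` are assumptions made for illustration rather than details taken from the paper.

```python
def self_paced_learning(data, w0, C, K0, mu, impute_latent,
                        compute_slacks, solve_weighted_ssvm, n_iters=20):
    """Alternate latent completion, easy-sample selection, and weight updates."""
    w, K = w0, K0
    for _ in range(n_iters):
        # h_i = argmax_h w^T Psi(x_i, y_i, h).
        completed = [(x, y, impute_latent(w, x, y)) for (x, y) in data]
        # One pass of Alternating Convex Search (shown unrolled for brevity):
        # choose the easy samples, then retrain on them.
        slacks = compute_slacks(w, completed)
        v = [1 if C * xi < 1.0 / K else 0 for xi in slacks]
        w = solve_weighted_ssvm(completed, v)
        # Anneal K so that harder samples are admitted in later iterations.
        K = K / mu
    return w
```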

24 Outline Latent Structural SVM Concave-Convex Procedure
Self-Paced Learning Experiments

25 Object Detection
Input x - image. Output y ∈ Y, Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}. Latent h - bounding box. Δ - 0/1 loss. Feature Ψ(x,y,h) - HOG.

26 Object Detection
Mammals dataset: 271 images, 6 classes, 90/10 train/test split, 4 folds.

27–30 Object Detection
Comparison of CCCP and Self-Paced results (image slides).

31 Object Detection: objective value and test error (plots).

32 Handwritten Digit Recognition
Input x - image. Output y ∈ Y, Y = {0, 1, …, 9}. Latent h - rotation. Δ - 0/1 loss. MNIST dataset. Feature Ψ(x,y,h) - PCA + projection.

33–36 Handwritten Digit Recognition
SPL vs. CCCP results (plots); markers indicate a significant difference.

37 Feature (x,y,h) - Ng and Cardie, ACL 2002
Motif Finding Input x - DNA Sequence Output y  Y Y = {0, 1} Latent h - Motif Location  - 0/1 Loss Feature (x,y,h) - Ng and Cardie, ACL 2002

38 Motif Finding
UniProbe dataset: 40,000 sequences, 50/50 train/test split, 5 folds.

39 Motif Finding: average Hamming distance of inferred motifs (plots).

40 Motif Finding SPL Objective Value

41 Motif Finding SPL Test Error

42 Noun Phrase Coreference
Input x - nouns. Output y - clustering. Latent h - spanning forest over the nouns. Feature Ψ(x,y,h) - Yu and Joachims, ICML 2009.

43 Noun Phrase Coreference
MUC6 dataset: 60 documents, 50/50 train/test split, 1 predefined fold.

44 Noun Phrase Coreference
MITRE loss and pairwise loss (plots); markers indicate a significant improvement or a significant decrement.

45–46 Noun Phrase Coreference
SPL results: MITRE loss and pairwise loss (plots).

47 Summary
Automatic Self-Paced Learning via a Concave-Biconvex Procedure. The idea generalizes to other latent-variable models, e.g. Expectation-Maximization: the E-step remains the same, while the M-step includes the indicator variables v_i. Kumar, Packer and Koller, NIPS 2010.

