
1 Curriculum Learning for Latent Structural SVM. M. Pawan Kumar, Daphne Koller, Benjamin Packer (under submission)

2 Aim To learn accurate parameters for a latent structural SVM. Input x; Output y ∈ Y ("Deer"); Hidden variable h ∈ H. Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}

3 Aim To learn accurate parameters for a latent structural SVM. Feature Φ(x,y,h) (HOG, BoW); Parameters w. Prediction: (y*, h*) = argmax_{y ∈ Y, h ∈ H} w^T Φ(x,y,h)
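As a concrete sketch of this prediction rule: `phi`, the label set, and the latent set below are hypothetical stand-ins, assuming both Y and H are small enough to enumerate (real systems replace the double loop with efficient inference).

```python
import numpy as np

def predict(x, w, labels, latent_vals, phi):
    """Latent structural SVM prediction: maximize w . phi(x, y, h)
    jointly over labels y and latent variables h (both enumerable here)."""
    best, best_score = None, -np.inf
    for y in labels:
        for h in latent_vals:
            score = w @ phi(x, y, h)
            if score > best_score:
                best, best_score = (y, h), score
    return best

# Toy check: 2-D features, two labels, two latent values (all illustrative).
phi = lambda x, y, h: np.array([x * (y == "deer"), x * h], dtype=float)
y_star, h_star = predict(1.0, np.array([1.0, 0.5]), ["deer", "bison"], [0, 1], phi)
```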

4 Motivation Real numbers, then imaginary numbers: e^{iπ} + 1 = 0. "Math is for losers!!" FAILURE … BAD LOCAL MINIMUM

5 Motivation Real numbers, then imaginary numbers: e^{iπ} + 1 = 0. "Euler was a genius!!" SUCCESS … GOOD LOCAL MINIMUM. Curriculum learning: Bengio et al., ICML 2009

6 Motivation Start with "easy" examples, then consider "hard" ones. But labeling easy vs. hard is expensive, and easy for a human ≠ easy for a machine. Instead: simultaneously estimate easiness and parameters. Easiness is a property of data sets, not single instances

7 Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

8 Latent Structural SVM Training samples x_i; ground-truth labels y_i; loss function Δ(y_i, y_i(w), h_i(w)). Felzenszwalb et al., 2008; Yu and Joachims, 2009

9 Latent Structural SVM (y_i(w), h_i(w)) = argmax_{y ∈ Y, h ∈ H} w^T Φ(x_i, y, h). min_w ||w||^2 + C ∑_i Δ(y_i, y_i(w), h_i(w)). Non-convex objective: minimize an upper bound

10 Latent Structural SVM min_w ||w||^2 + C ∑_i ξ_i, s.t. for all y, h: max_{h_i} w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, where (y_i(w), h_i(w)) = argmax_{y ∈ Y, h ∈ H} w^T Φ(x_i, y, h). Still non-convex, but a difference of convex functions: the CCCP algorithm applies and converges to a local minimum
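The "difference of convex" structure can be made explicit. Using the standard margin-rescaled bound for latent structural SVMs (consistent with the constraints on this slide), the objective splits as:

```latex
\min_w \; \|w\|^2 + C\sum_i \Bigg[
  \underbrace{\max_{y \in Y,\, h \in H}\Big( w^\top \Phi(x_i, y, h) + \Delta(y_i, y, h) \Big)}_{\text{convex in } w}
  \;-\;
  \underbrace{\max_{h \in H}\; w^\top \Phi(x_i, y_i, h)}_{\text{convex in } w}
\Bigg]
```

CCCP linearizes the concave part (the negated second max) at the current w_t, which is exactly the latent imputation h_i = argmax_h w_t^T Φ(x_i, y_i, h) used in the update step.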

11 Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

12 Concave-Convex Procedure Start with an initial estimate w_0. Update h_i = argmax_{h ∈ H} w_t^T Φ(x_i, y_i, h). Update w_{t+1} by solving a convex problem: min_w ||w||^2 + C ∑_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i for all y, h
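The two updates can be sketched in Python. This is a toy sketch, not the paper's solver: `phi`, `delta`, and the enumerable label/latent sets are hypothetical stand-ins, and the convex problem is only approximated by a few subgradient steps instead of an exact cutting-plane/QP solve.

```python
import numpy as np

def cccp_step(w, data, labels, latents, phi, delta, C=1.0, lr=0.01, inner=50):
    # Update 1: impute h_i = argmax_h w . phi(x_i, y_i, h) with the current w.
    imputed = [max(latents, key=lambda h: w @ phi(x, y, h)) for x, y in data]
    # Update 2: approximately solve the convex problem by subgradient descent on
    # ||w||^2 + C * sum_i max(0, max_{y,h}[w.phi + delta] - w.phi(x_i, y_i, h_i)).
    for _ in range(inner):
        g = 2.0 * w
        for (x, y_true), h_i in zip(data, imputed):
            # loss-augmented inference over (y, h)
            y_hat, h_hat = max(((y, h) for y in labels for h in latents),
                               key=lambda p: w @ phi(x, *p) + delta(y_true, *p))
            margin = (w @ phi(x, y_hat, h_hat) + delta(y_true, y_hat, h_hat)
                      - w @ phi(x, y_true, h_i))
            if margin > 0:
                g = g + C * (phi(x, y_hat, h_hat) - phi(x, y_true, h_i))
        w = w - lr * g
    return w

# Toy usage: scalar inputs, binary labels, binary latent state (all illustrative).
phi = lambda x, y, h: np.array([x * (1 if y == 1 else -1), float(h)])
delta = lambda y_true, y, h: float(y_true != y)
w = cccp_step(np.zeros(2), [(1.0, 1), (-1.0, 0)], [0, 1], [0, 1], phi, delta)
```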

13 Concave-Convex Procedure Looks at all samples simultaneously, so "hard" samples cause confusion early on. Better: start with "easy" samples, then consider "hard" ones

14 Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

15 Curriculum Learning REMINDER: simultaneously estimate easiness and parameters; easiness is a property of data sets, not single instances

16 Curriculum Learning Start with an initial estimate w_0. Update h_i = argmax_{h ∈ H} w_t^T Φ(x_i, y_i, h). Update w_{t+1} by solving a convex problem: min_w ||w||^2 + C ∑_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i for all y, h

17 Curriculum Learning min_w ||w||^2 + C ∑_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i

18 Curriculum Learning min_w ||w||^2 + C ∑_i v_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, with indicator variables v_i ∈ {0,1}. Trivial solution: set every v_i = 0

19 Curriculum Learning min_w ||w||^2 + C ∑_i v_i ξ_i - ∑_i v_i / K, s.t. w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, v_i ∈ {0,1}. Large K selects only the easiest samples; medium K admits more; small K selects all samples

20 Curriculum Learning Relax v_i ∈ [0,1]: min_w ||w||^2 + C ∑_i v_i ξ_i - ∑_i v_i / K, s.t. w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i. Biconvex problem: convex in w for fixed v, and convex in v for fixed w. Large K / medium K / small K behave as before
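Since the objective is linear in each v_i for fixed w, the relaxed problem has a closed-form v-update; a small sketch (the values of C, K, and the per-sample hinge losses below are purely illustrative):

```python
def update_v(losses, K, C=1.0):
    # For fixed w the objective term C*v_i*loss_i - v_i/K is linear in each v_i,
    # so the optimum over v_i in [0,1] sits at a corner:
    # v_i = 1 exactly when C * loss_i < 1/K (an "easy" sample), else v_i = 0.
    return [1.0 if C * l < 1.0 / K else 0.0 for l in losses]

# Illustrative per-sample hinge losses:
easy_first = update_v([0.05, 0.4, 2.0], K=10)   # large K: easiest sample only
everyone = update_v([0.05, 0.4, 2.0], K=0.4)    # small K: all samples selected
```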

21 Curriculum Learning Start with an initial estimate w_0. Update h_i = argmax_{h ∈ H} w_t^T Φ(x_i, y_i, h). Update w_{t+1} by solving the biconvex problem in (v, w): min ||w||^2 + C ∑_i v_i ξ_i - ∑_i v_i / K, s.t. w^T Φ(x_i, y_i, h_i) - w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i. Decrease K ← K/μ so that harder samples enter over the iterations
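Putting the pieces together, the whole loop might look as follows. This is a simplified sketch under stated assumptions: `phi` and `delta` are stand-in callables, the convex solve is approximated by subgradient steps, and v is updated once per outer iteration rather than alternated to convergence; the annealing constants K0 and mu are illustrative.

```python
import numpy as np

def self_paced_train(data, labels, latents, phi, delta,
                     K0=2.0, mu=1.3, C=1.0, lr=0.01, outer=8, inner=30):
    dim = phi(data[0][0], data[0][1], latents[0]).shape[0]
    w, K = np.zeros(dim), K0
    for _ in range(outer):
        # Impute latent variables with the current w.
        h_imp = [max(latents, key=lambda h: w @ phi(x, y, h)) for x, y in data]
        # Per-sample hinge losses, used to decide which samples are "easy".
        losses = []
        for (x, y), hi in zip(data, h_imp):
            s = max(w @ phi(x, yy, hh) + delta(y, yy, hh)
                    for yy in labels for hh in latents)
            losses.append(max(0.0, s - w @ phi(x, y, hi)))
        v = [1.0 if C * l < 1.0 / K else 0.0 for l in losses]
        # Approximate the convex solve on the selected subset.
        for _ in range(inner):
            g = 2.0 * w
            for (x, y), hi, vi in zip(data, h_imp, v):
                if vi == 0.0:
                    continue
                yh = max(((yy, hh) for yy in labels for hh in latents),
                         key=lambda p: w @ phi(x, *p) + delta(y, *p))
                if w @ phi(x, *yh) + delta(y, *yh) - w @ phi(x, y, hi) > 0:
                    g = g + C * (phi(x, *yh) - phi(x, y, hi))
            w = w - lr * g
        K /= mu  # anneal: harder samples count as "easy" in later rounds
    return w

# Toy usage: scalar inputs, binary labels, binary latent state (all illustrative).
phi = lambda x, y, h: np.array([x * (1 if y == 1 else -1), float(h)])
delta = lambda y_true, y, h: float(y_true != y)
w = self_paced_train([(1.0, 1), (-1.0, 0)], [0, 1], [0, 1], phi, delta)
```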

22 Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

23 Object Detection Input x: image; Output y ∈ Y; Latent h: box; Feature Φ(x,y,h): HOG; Δ: 0/1 loss. Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}

24 Object Detection Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds

25–28 Object Detection (image slides: example detections, CCCP vs. Curriculum)

29 Object Detection (plots: objective value and test error)

30 Handwritten Digit Recognition MNIST dataset. Input x: image; Output y ∈ Y, Y = {0, 1, …, 9}; Latent h: rotation; Feature Φ(x,y,h): PCA + projection; Δ: 0/1 loss

31–34 Handwritten Digit Recognition (plots over C; significant differences marked)

35 Motif Finding Input x: DNA sequence; Output y ∈ Y, Y = {0, 1}; Latent h: motif location; Feature Φ(x,y,h): Ng and Cardie, ACL 2002; Δ: 0/1 loss

36 Motif Finding UniProbe dataset: 40,000 sequences; 50/50 train/test split; 5 folds

37 Motif Finding Average Hamming Distance of Inferred Motifs

38 Motif Finding Objective Value

39 Motif Finding Test Error

40 Noun Phrase Coreference Input x: nouns; Output y: clustering; Latent h: spanning forest over nouns; Feature Φ(x,y,h): Yu and Joachims, ICML 2009

41 Noun Phrase Coreference MUC6 dataset: 60 documents; 50/50 train/test split; 1 predefined fold

42 Noun Phrase Coreference (plots: MITRE loss and pairwise loss; significant improvements and decrements marked)

43–44 Noun Phrase Coreference (plots: MITRE loss and pairwise loss)

45 Summary Automatic curriculum learning; concave-biconvex procedure. Generalization to other latent-variable models, e.g. Expectation-Maximization: the E-step remains the same; the M-step includes the indicator variables v_i
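As a sketch of what the EM generalization could look like (my reading of the summary; the slide does not spell it out), the E-step posterior is unchanged while the M-step gains the same v_i selection and 1/K reward, applied to the negative expected complete-data log-likelihood:

```latex
\text{E-step (unchanged):}\quad q_i(h) = p\big(h \mid x_i, y_i;\, \theta^{(t)}\big)
\\[4pt]
\text{M-step:}\quad \min_{\theta,\; v_i \in \{0,1\}}
  \;\sum_i v_i \Big( -\,\mathbb{E}_{q_i}\big[\log p(x_i, y_i, h;\, \theta)\big] \Big)
  \;-\; \sum_i \frac{v_i}{K}
```

As in the SVM case, for fixed θ each v_i has a closed-form corner solution, and annealing K admits progressively harder samples.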

