
1 Discriminative Machine Learning, Topic 4: Weak Supervision. M. Pawan Kumar, http://www.robots.ox.ac.uk/~oval/. Slides available online at http://mpawankumar.info

2 Computer Vision Data: datasets plotted by annotation detail (information) against log of dataset size. Segmentation-level annotation: ~2,000 images.

3 Computer Vision Data: adding bounding-box annotation: ~1 M images.

4 Computer Vision Data: adding image-level labels ("Car", "Chair"): > 14 M images.

5 Computer Vision Data: adding noisy labels: > 6 B images.

6 Computer Vision Data: detailed annotation is expensive, sometimes annotation is impossible, and the desired annotation keeps changing. Hence, learn with missing information (latent variables).

7 Annotation Mismatch. Input x, annotation y, latent variable h. Example: action classification, with x an image and y = "jumping". There is a mismatch between the desired and the available annotations; the exact value of the latent variable is not "important"; the desired output at test time is y.

8 Output Mismatch. Input x, annotation y, latent variable h. Example: action classification, with x an image and y = "jumping".

9 Output Mismatch. Input x, annotation y, latent variable h. Example: action detection, with x an image and y = "jumping". There is a mismatch between the desired output and the available annotations; the exact value of the latent variable is important; the desired output at test time is (y, h).

10 Annotation Mismatch. We will focus on this case; output mismatch is out of scope. The desired output at test time is y.

11 Outline: Latent SVM; Optimization; Practice. References: Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009.

12 Weakly Supervised Data. Input x, output y ∈ {-1,+1}, hidden variable h. Example: an image x with label y = +1 and hidden variable h.

13 Weakly Supervised Classification. Feature vector Φ(x,h); joint feature vector Ψ(x,y,h).

14 Weakly Supervised Classification. For y = +1 the joint feature vector is Ψ(x,+1,h) = [Φ(x,h); 0].

15 Weakly Supervised Classification. For y = -1 the joint feature vector is Ψ(x,-1,h) = [0; Φ(x,h)].

16 Weakly Supervised Classification. Score function f: Ψ(x,y,h) → (-∞, +∞). Prediction optimizes the score over all possible y and h.

17 Latent SVM. Scoring function: w^T Ψ(x,y,h), with parameters w. Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h).
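
To make slides 13-17 concrete, here is a minimal Python sketch (mine, not from the slides): the feature map phi, the candidate latent set H, the feature dimension d and all other names are hypothetical. It builds the joint feature vector, Ψ(x,+1,h) = [Φ(x,h); 0] and Ψ(x,-1,h) = [0; Φ(x,h)], and predicts by enumerating y and h to maximise the score w^T Ψ(x,y,h). Later sketches in this transcript reuse joint_feature from this block.

```python
import numpy as np

def joint_feature(phi, x, y, h, d):
    """Psi(x, y, h): place Phi(x, h) in the +1 block or the -1 block of a 2d-dim vector."""
    psi = np.zeros(2 * d)
    start = 0 if y == +1 else d          # y = +1 -> first block, y = -1 -> second block
    psi[start:start + d] = phi(x, h)
    return psi

def predict(w, phi, x, H, d):
    """(y(w), h(w)) = argmax_{y,h} w^T Psi(x, y, h), by brute-force enumeration."""
    best_score, best_y, best_h = -np.inf, None, None
    for y in (+1, -1):
        for h in H:                      # H: candidate latent values (hypothetical, e.g. boxes)
            score = w @ joint_feature(phi, x, y, h, d)
            if score > best_score:
                best_score, best_y, best_h = score, y, h
    return best_y, best_h
```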

18 Learning Latent SVM. Training data {(x_i, y_i), i = 1, 2, …, n} (annotation mismatch). Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)). No restriction on the loss function Δ.

19 Learning Latent SVM. The empirical risk min_w Σ_i Δ(y_i, y_i(w)) is non-convex and the parameters cannot be regularized, so we look for a regularization-sensitive upper bound.

20 Learning Latent SVM. Add and subtract the score of the prediction: Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w)).

21 Learning Latent SVM. Since (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h), the prediction's score is at least max_{h_i} w^T Ψ(x_i, y_i, h_i); subtracting the latter instead gives the upper bound Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i).

22 Learning Latent SVM. Bounding the first two terms by a maximum over all (y, h) and introducing slack variables gives min_w ||w||^2 + C Σ_i ξ_i, subject to max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i. Now the parameters can be regularized. Is this also convex?

23 Learning Latent SVM. min_w ||w||^2 + C Σ_i ξ_i, subject to max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i. Each constraint is convex minus convex: a difference-of-convex (DC) program.

24 Recap. Scoring function: w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h). Learning: min_w ||w||^2 + C Σ_i ξ_i, subject to w^T Ψ(x_i,y,h) + Δ(y_i,y) - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i for all y, h.
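
Building on the sketch above (it reuses joint_feature), the following hedged snippet evaluates the slack ξ_i of slide 24 for one training example: loss-augmented inference over all (y, h) minus the best completion of the ground-truth label over h. The loss delta is a hypothetical callable, e.g. the 0-1 loss of slide 36.

```python
def slack(w, phi, x_i, y_i, H, d, delta):
    """xi_i = max_{y,h} [w^T Psi(x_i,y,h) + Delta(y_i,y)] - max_h w^T Psi(x_i,y_i,h)."""
    # Loss-augmented inference over every label and latent value.
    aug = max(w @ joint_feature(phi, x_i, y, h, d) + delta(y_i, y)
              for y in (+1, -1) for h in H)
    # Best score of the ground-truth label over the hidden variable.
    gt = max(w @ joint_feature(phi, x_i, y_i, h, d) for h in H)
    return aug - gt   # non-negative whenever Delta(y_i, y_i) = 0
```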

25 Outline: Latent SVM; Optimization; Practice.

26 Learning Latent SVM. min_w ||w||^2 + C Σ_i ξ_i, subject to max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i) ≤ ξ_i. A difference-of-convex (DC) program.

27 Concave-Convex Procedure. Objective term per example: max_{y,h} [w^T Ψ(x_i,y,h) + Δ(y_i,y)] - max_{h_i} w^T Ψ(x_i,y_i,h_i). Step 1: linearly upper-bound the concave part.

28 Concave-Convex Procedure. Step 2: optimize the resulting convex upper bound.

29 Concave-Convex Procedure. Step 3: recompute the linear upper bound of the concave part.

30 Concave-Convex Procedure. Repeat until convergence.

31 Concave-Convex Procedure. How do we obtain the linear upper bound?

32 Linear Upper Bound. With the current estimate w_t, impute h_i* = argmax_{h_i} w_t^T Ψ(x_i,y_i,h_i). Then -w^T Ψ(x_i,y_i,h_i*) ≥ -max_{h_i} w^T Ψ(x_i,y_i,h_i), so it is a linear (in w) upper bound on the concave part.

33 CCCP for Latent SVM. Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i). Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C Σ_i ξ_i, subject to w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i. Repeat until convergence.
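
Below is a rough, self-contained sketch of the CCCP loop on slide 33, reusing joint_feature from the earlier sketch. It is not the tutorial's solver: instead of computing an exact ε-optimal solution of the inner QP, it runs plain subgradient descent on the convex upper bound, and all hyperparameter values are assumptions.

```python
import numpy as np

def cccp_latent_svm(data, phi, H, d, delta, C,
                    outer_iter=20, inner_iter=200, lr=1e-3, tol=1e-4):
    """CCCP for latent SVM (sketch). data is a list of (x_i, y_i) pairs."""
    w = np.zeros(2 * d)                                    # initial estimate w_0
    for _ in range(outer_iter):
        w_prev = w.copy()
        # Impute h_i* = argmax_h w_t^T Psi(x_i, y_i, h) with the current estimate.
        h_star = [max(H, key=lambda h: w @ joint_feature(phi, x, y, h, d))
                  for x, y in data]
        # Approximately solve the convex upper bound with the imputed h_i* held fixed.
        for _ in range(inner_iter):
            grad = 2.0 * w                                 # gradient of ||w||^2
            for (x, y_true), h_gt in zip(data, h_star):
                # Loss-augmented inference: argmax_{y,h} [w^T Psi + Delta].
                y_hat, h_hat = max(((y, h) for y in (+1, -1) for h in H),
                                   key=lambda yh: w @ joint_feature(phi, x, yh[0], yh[1], d)
                                                  + delta(y_true, yh[0]))
                margin = (w @ joint_feature(phi, x, y_hat, h_hat, d) + delta(y_true, y_hat)
                          - w @ joint_feature(phi, x, y_true, h_gt, d))
                if margin > 0:                             # the hinge term is active
                    grad += C * (joint_feature(phi, x, y_hat, h_hat, d)
                                 - joint_feature(phi, x, y_true, h_gt, d))
            w = w - lr * grad
        if np.linalg.norm(w - w_prev) < tol:               # outer loop has converged
            break
    return w
```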

34 Outline: Latent SVM; Optimization; Practice.

35 Action Classification. Input x (image), output y = "Using Computer". Dataset: PASCAL VOC 2011, 80/20 train/test split, 5 folds. Classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. Training data: inputs x_i with outputs y_i.

36 Setup. 0-1 loss function; poselet-based feature vector; 4 seeds for random initialization. Code, data, and train/test scripts with hyperparameter settings: http://mpawankumar.info/tutorials/cvpr2013/

37-40 Plots of the objective value, train error, test error, and running time (figures not included in the transcript).

41 Outline: Latent SVM; Optimization; Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function).

42 CCCP for Latent SVM (recap). Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i). Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C Σ_i ξ_i, subject to w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i. Repeat until convergence. Problem: overfitting in the initial iterations.

43 Annealing the Tolerance. Start with an initial estimate w_0 and a loose tolerance ε'. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i). Update w_{t+1} as the ε'-optimal solution of min ||w||^2 + C Σ_i ξ_i, subject to w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i. Repeat until convergence, tightening the tolerance (ε' ← ε'/K) until ε' = ε.
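
A minimal sketch of the annealed-tolerance schedule on slide 43. Here inner_cccp_update is a hypothetical stand-in for one CCCP iteration (impute h_i*, then solve the convex problem to ε'-optimality), and the starting tolerance eps0 and factor K are assumed values.

```python
import numpy as np

def cccp_annealed_tolerance(inner_cccp_update, data, phi, H, d, delta, C, eps,
                            K=10.0, eps0=1.0, tol=1e-4):
    """Run CCCP with a loose optimization tolerance first, tightening it each pass.

    inner_cccp_update(data, phi, H, d, delta, C, w, eps_prime) is a hypothetical
    callable performing one CCCP iteration to eps_prime-optimality."""
    w = np.zeros(2 * d)
    eps_prime = eps0                          # loose tolerance: cheaper, less prone to overfit
    while True:
        w_prev = w.copy()
        w = inner_cccp_update(data, phi, H, d, delta, C, w, eps_prime)
        eps_prime = max(eps, eps_prime / K)   # anneal the tolerance towards the target eps
        if eps_prime == eps and np.linalg.norm(w - w_prev) < tol:
            break
    return w
```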

44-51 Plots of the objective value, train error, test error, and running time for this setting (figures not included in the transcript).

52 Outline: Latent SVM; Optimization; Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function).

53 CCCP for Latent SVM (recap, as on slide 42). Problem: overfitting in the initial iterations.

54 Annealing the Regularization. Start with an initial estimate w_0 and a small C'. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i). Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C' Σ_i ξ_i, subject to w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i. Repeat until convergence, increasing the regularization weight (C' ← C' × K) until C' = C.
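
Analogously, a sketch of the annealed-regularization schedule on slide 54. The same hypothetical inner_cccp_update callable is assumed, and the starting value of C' is an assumption.

```python
import numpy as np

def cccp_annealed_regularization(inner_cccp_update, data, phi, H, d, delta, C, eps,
                                 K=10.0, tol=1e-4):
    """Start heavily regularized (small C') and multiply C' by K until it reaches C."""
    w = np.zeros(2 * d)
    C_prime = C / K ** 3                       # assumed small starting value
    while True:
        w_prev = w.copy()
        w = inner_cccp_update(data, phi, H, d, delta, C_prime, w, eps)
        C_prime = min(C, C_prime * K)          # anneal the regularization weight towards C
        if C_prime == C and np.linalg.norm(w - w_prev) < tol:
            break
    return w
```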

55-62 Plots of the objective value, train error, test error, and running time for this setting (figures not included in the transcript).

63 Outline: Latent SVM; Optimization; Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function). Reference: Kumar, Packer and Koller, NIPS 2010.

64 CCCP for Human Learning. Present all the examples at once: 1 + 1 = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0. Outcome: "Math is for losers!!" Failure, a bad local minimum.

65 Self-Paced Learning. Present the examples in order of difficulty: 1 + 1 = 2, then 1/3 + 1/6 = 1/2, then e^{iπ} + 1 = 0. Outcome: "Euler was a genius!!" Success, a good local minimum.

66 Self-Paced Learning. Start with "easy" examples, then consider "hard" ones. Deciding what is easy vs. hard is expensive, and easy for a human ≠ easy for the machine. So simultaneously estimate easiness and parameters; easiness is a property of data sets, not of single instances.

67 CCCP for Latent SVM (recap, as on slide 42).

68 Self-Paced Learning. Start from the CCCP inner problem: min ||w||^2 + C Σ_i ξ_i, subject to w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y,h) - ξ_i.

69 Self-Paced Learning. Introduce selection variables v_i ∈ {0,1}: min ||w||^2 + C Σ_i v_i ξ_i, subject to w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y,h) - ξ_i. This has a trivial solution (select no examples).

70 Self-Paced Learning. Add a reward for selecting examples: min ||w||^2 + C Σ_i v_i ξ_i - Σ_i v_i / K, subject to the same constraints, with v_i ∈ {0,1}. A large K selects only the easiest examples; a medium or small K admits progressively more.

71 Self-Paced Learning. Relax v_i ∈ [0,1]: min ||w||^2 + C Σ_i v_i ξ_i - Σ_i v_i / K, subject to the same constraints. This is a biconvex problem, solved by alternating convex search (fix w and optimize v, then fix v and optimize w), as sketched after the next slide.

72 SPL for Latent SVM. Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i,y_i,h_i). Update w_{t+1} as the ε-optimal solution of min ||w||^2 + C Σ_i v_i ξ_i - Σ_i v_i / K, subject to w^T Ψ(x_i,y_i,h_i*) - w^T Ψ(x_i,y,h) ≥ Δ(y_i,y) - ξ_i. Decrease K by a constant factor and repeat.
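
A sketch of the self-paced learning loop on slide 72, reusing joint_feature and slack from the earlier sketches. The selection rule (pick example i when C·ξ_i < 1/K) is the closed-form minimizer of the relaxed objective over v with w fixed; solve_weighted_ssvm (a solver that weights each example's slack by v_i), the starting K and the annealing factor mu are hypothetical assumptions.

```python
import numpy as np

def spl_latent_svm(solve_weighted_ssvm, data, phi, H, d, delta, C,
                   K0=100.0, mu=1.3, outer_iter=20):
    """Self-paced learning for latent SVM (sketch): alternate selection and refitting."""
    w = np.zeros(2 * d)
    K = K0
    for _ in range(outer_iter):
        # Impute h_i* and measure how hard each example currently is via its slack.
        h_star = [max(H, key=lambda h: w @ joint_feature(phi, x, y, h, d))
                  for x, y in data]
        xi = [slack(w, phi, x, y, H, d, delta) for x, y in data]
        # Closed-form v update with w fixed: select example i iff C * xi_i < 1 / K.
        v = [1.0 if C * s < 1.0 / K else 0.0 for s in xi]
        # Refit w on the selected ("easy") examples (hypothetical weighted solver).
        w = solve_weighted_ssvm(data, h_star, v, phi, d, delta, C)
        K /= mu                                # decrease K so harder examples enter later
    return w
```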

73-80 Plots of the objective value, train error, test error, and running time for this setting (figures not included in the transcript).

81 Outline: Latent SVM; Optimization; Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function). Reference: Behl, Mohapatra, Jawahar and Kumar, PAMI 2015.

82 Ranking. Six retrieved images shown in rank order (rank 1 to rank 6); this ranking has Average Precision = 1.

83 Ranking. Different rankings of the same six images: Average Precision = 1 with accuracy = 1; Average Precision = 0.92 with accuracy = 0.67; Average Precision = 0.81.

84 Ranking. During testing, AP is frequently used, but during training a surrogate loss is used. This is contradictory to loss-based learning; instead, optimize AP directly.
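
Since the slides contrast AP with surrogate losses, here is a small sketch (mine, not from the slides) of how average precision is computed from a ranked list of binary relevance labels. The example ranking is an assumption chosen so that the value matches the 0.92 quoted on slide 83, which is what you get with three relevant items ranked 1st, 2nd and 4th out of six.

```python
def average_precision(ranked_labels):
    """AP of a ranked list of binary relevance labels (1 = relevant, 0 = not):
    the mean of precision@k over the positions k that hold a relevant item."""
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)        # precision at this relevant item's rank
    return sum(precisions) / len(precisions) if precisions else 0.0

# Assumed example: relevant items at ranks 1, 2 and 4 out of six.
print(average_precision([1, 1, 0, 1, 0, 0]))   # (1/1 + 2/2 + 3/4) / 3 = 0.9166...
```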

85 Results: a statistically significant improvement.

86 Questions?

