
Slide 1: Max-Margin Latent Variable Models. M. Pawan Kumar

Slide 2: Max-Margin Latent Variable Models. M. Pawan Kumar, Daphne Koller, Ben Packer; with Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy

Slide 3: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000.

Slide 4: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000; Bounding Box, ~.

Slide 5: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000; Bounding Box, ~; Image-Level (“Car”, “Chair”), > 14 M.

Slide 6: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000; Bounding Box, ~; Image-Level, > 14 M; Noisy Label, > 6 B. Learn with missing information (latent variables).

Slide 7: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 8: Annotation Mismatch. Learn to classify an image: image x, annotation a = “Deer”, latent variable h. There is a mismatch between the desired and available annotations; the exact value of the latent variable is not “important”.

Slide 9: Annotation Mismatch. Learn to classify a DNA sequence: sequence x, annotation a ∈ {+1, −1}, latent variables h. There is a mismatch between the desired and possible annotations; the exact value of the latent variable is not “important”.

Slide 10: Output Mismatch. Learn to segment an image: image x, output y.

Slide 11: Output Mismatch. Learn to segment an image: “Bird”; (x, a) and (a, h).

Slide 12: Output Mismatch. Learn to segment an image: “Cow”; (x, a) and (a, h). There is a mismatch between the desired output and the available annotations; the exact value of the latent variable is important.

Slide 13: Output Mismatch. Learn to classify actions: (x, y).

Slide 14: Output Mismatch. Learn to classify actions, e.g. “jumping”: image x, latent variables h and h_b, annotation a = +1.

Slide 15: Output Mismatch. Learn to classify actions, e.g. “jumping”: image x, latent variables h and h_b, annotation a = −1. There is a mismatch between the desired output and the available annotations; the exact value of the latent variable is important.

Slide 16: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 17: Latent SVM. Image x, annotation a = “Deer”, latent variable h. Features Ψ(x, a, h); parameters w; score wᵀΨ(x, a, h). Prediction: (a(w), h(w)) = argmax_{a,h} wᵀΨ(x, a, h). [Andrews et al., 2001; Smola et al., 2005; Felzenszwalb et al., 2008; Yu and Joachims, 2009]

Slide 18: Parameter Learning. Score of the best completion of the ground-truth > score of all other outputs.

Slide 19: Parameter Learning. max_h wᵀΨ(x_i, a_i, h) > wᵀΨ(x_i, a, h) for all other (a, h).

Slide 20: Parameter Learning (Annotation Mismatch). min_w ||w||² + C Σ_i ξ_i, s.t. max_h wᵀΨ(x_i, a_i, h) ≥ wᵀΨ(x_i, a, h) + Δ(a_i, a) − ξ_i.

Slide 21: Optimization. Repeat until convergence: (1) update h_i* = argmax_h wᵀΨ(x_i, a_i, h); (2) update w by solving the convex problem min ||w||² + C Σ_i ξ_i, s.t. wᵀΨ(x_i, a_i, h_i*) − wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i.
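The alternation on this slide can be sketched end-to-end on a toy binary problem. Everything below is illustrative, not the paper's setup: `psi` is a made-up joint feature map (the latent h picks an index of x), Δ is taken as 0/1 loss, and the convex w-step is approximated by subgradient descent rather than an exact QP solve.

```python
import numpy as np

def psi(x, a, h):
    # Toy joint feature map: the latent h picks an index ("window") of x,
    # and the annotation a in {-1, +1} signs the feature plus a bias term.
    return a * np.array([x[h], 1.0])

def best_latent(w, x, a, H):
    # Step 1 of the alternation: h_i* = argmax_h w^T psi(x_i, a_i, h).
    return max(H, key=lambda h: w @ psi(x, a, h))

def predict(w, x, H):
    # Slide 17's inference: (a(w), h(w)) = argmax_{a,h} w^T psi(x, a, h).
    return max(((a, h) for a in (-1, 1) for h in H),
               key=lambda ah: w @ psi(x, ah[0], ah[1]))

def train(data, H, C=10.0, lr=0.05, epochs=20):
    # Step 2 via subgradient descent on the convex problem in w, with the
    # imputed h_i* held fixed (Delta(a_i, a) taken as 0/1 loss).
    w = np.zeros(2)
    for _ in range(epochs):
        for x, a in data:
            h_star = best_latent(w, x, a, H)   # best completion of ground truth
            h_bad = best_latent(w, x, -a, H)   # best rival annotation
            good, bad = psi(x, a, h_star), psi(x, -a, h_bad)
            w *= 1.0 - lr / C                  # shrinkage from the ||w||^2 term
            if w @ good - w @ bad < 1.0:       # margin constraint violated
                w += lr * (good - bad)
    return w

# Toy data: positives carry one strong feature at a random latent position.
rng = np.random.default_rng(0)
data = []
for _ in range(25):
    x = rng.uniform(0.0, 0.1, 4)
    x[rng.integers(4)] = 1.0
    data.append((x, 1))
    data.append((rng.uniform(0.0, 0.1, 4), -1))

w = train(data, range(4))
```

The two nested `best_latent` calls mirror the two uses of h on the slide: one completes the ground-truth annotation, the other performs loss-augmented inference over competing annotations.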

Slide 22: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 23: Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). Teaching everything at once (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): “Math is for losers!!” FAILURE … BAD LOCAL MINIMUM.

Slide 24: Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). Easy problems first, then harder ones (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): “Euler was a genius!!” SUCCESS … GOOD LOCAL MINIMUM.

Slide 25: Optimization. Repeat until convergence: (1) update h_i* = argmax_h wᵀΨ(x_i, a_i, h); (2) update w and v, with v_i ∈ {0, 1}, by solving the convex problem min ||w||² + C Σ_i v_i ξ_i − λ Σ_i v_i, s.t. wᵀΨ(x_i, a_i, h_i*) − wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i; (3) anneal λ ← λμ.
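The v-step of this biconvex problem is easy to sketch: with w fixed, each v_i has a closed form that simply compares the sample's current loss against the threshold λ, and annealing λ ← λμ admits harder samples over time. The function name and the toy losses below are illustrative.

```python
import numpy as np

def spl_select(losses, lam, C=1.0):
    # With w fixed, the optimal v_i in {0, 1} has a closed form:
    # v_i = 1 exactly when C * loss_i < lam, i.e. the sample is "easy".
    return (C * np.asarray(losses, float) < lam).astype(int)

# Toy per-sample hinge losses and the annealing loop lam <- lam * mu,
# which gradually lets harder samples into the objective.
losses = [0.2, 0.9, 2.0, 5.0]
lam, mu = 1.0, 2.0
history = []
for _ in range(4):
    history.append(spl_select(losses, lam).tolist())
    lam *= mu
# history == [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
```

The rounds show the curriculum emerging: only the two easiest samples participate at first, and all four are included once λ has grown past the largest loss.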

Slide 26: Image Classification. Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.

Slide 27: Image Classification (Kumar, Packer and Koller, NIPS 2010). Results: CCCP vs. SPL. HOG-based model (Dalal and Triggs, 2005).

Slide 28: Image Classification. PASCAL VOC 2007 dataset, car vs. not-car: ~5,000 images; 50/50 train/test split; 5 folds.

Slide 29: Image Classification (Witten, Miller, Kumar, Packer and Koller, In Preparation). Objective values for HOG + dense SIFT + dense color SIFT; SPL+. Different features choose different “easy” samples.

Slide 30: Image Classification (Witten, Miller, Kumar, Packer and Koller, In Preparation). Mean average precision for HOG + dense SIFT + dense color SIFT; SPL+. Different features choose different “easy” samples.

Slide 31: Motif Finding. UniProbe dataset, binding vs. not-binding: ~40,000 sequences; 50/50 train/test split; 5 folds.

Slide 32: Motif Finding (Kumar, Packer and Koller, NIPS 2010). Results: CCCP vs. SPL. Motif + Markov background model (Yu and Joachims, 2009).

Slide 33: Semantic Segmentation. Stanford Background (train images; validation, 53 images; test, 90 images) and VOC Segmentation 2009 (train, validation and test images).

Slide 34: Semantic Segmentation. Additional training data: VOC Detection train images (bounding-box data) and ImageNet train images (image-level data).

Slide 35: Semantic Segmentation (Kumar, Turki, Preston and Koller, ICCV 2011). Results: SUP vs. CCCP vs. SPL. Region-based model (Gould, Fulton and Koller, 2009). SUP = supervised learning (segmentation data only).

Slide 36: Action Classification. PASCAL VOC 2011: train, 3,000 instances (bounding-box data + noisy data); test, 3,000 instances.

Slide 37: Action Classification (Packer, Kumar, Tang and Koller, In Preparation). Results: SUP vs. CCCP vs. SPL. Poselet-based model (Maji, Bourdev and Malik, 2011).

Slide 38: Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, In Preparation). Integers, rational numbers, imaginary numbers (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): USE A FIXED MODEL.

Slide 39: Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, In Preparation). Integers, rational numbers, imaginary numbers (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): ADAPT THE MODEL COMPLEXITY.

Slide 40: Optimization. As in self-paced learning, repeat until convergence: (1) update h_i* = argmax_h wᵀΨ(x_i, a_i, h); (2) update w, v (v_i ∈ {0, 1}) and the kernel weights c by solving min ||w||² + C Σ_i v_i ξ_i − λ Σ_i v_i, s.t. wᵀΨ(x_i, a_i, h_i*) − wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i; (3) anneal λ ← λμ. Kernelized features: K_ij = Ψ(x_i, a_i, h_i)ᵀΨ(x_j, a_j, h_j), with the combined kernel K̂ = Σ_k c_k K_k.
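The kernel combination K̂ = Σ_k c_k K_k can be sketched directly: a conic (c_k ≥ 0) combination of positive semi-definite base kernels is itself a valid kernel for the learner. The feature maps below are toy stand-ins, not the models used in the experiments.

```python
import numpy as np

def combined_kernel(kernels, c):
    # K_hat = sum_k c_k K_k: with c_k >= 0, each base K_k PSD implies
    # the combination is PSD, hence a valid kernel.
    return sum(ck * Kk for ck, Kk in zip(c, kernels))

# Two base kernels built from toy feature maps (linear kernels
# K_k = Phi_k Phi_k^T), combined with illustrative weights c.
rng = np.random.default_rng(0)
phi1 = rng.normal(size=(6, 3))   # stand-in for one joint feature map
phi2 = rng.normal(size=(6, 5))   # stand-in for a second feature map
K1, K2 = phi1 @ phi1.T, phi2 @ phi2.T
c = np.array([0.7, 0.3])
K_hat = combined_kernel([K1, K2], c)

# Validity check: the combined Gram matrix stays symmetric and PSD.
eigvals = np.linalg.eigvalsh(K_hat)
```

In the self-paced variant, the weights c are updated alongside w and v, so the effective model complexity grows as harder samples are admitted.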

Slide 41: Image Classification. Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.

Slide 42: Image Classification (Kumar, Packer and Koller, In Preparation). Results: FIXED vs. SPMKL. HOG-based model (Dalal and Triggs, 2005).

Slide 43: Motif Finding. UniProbe dataset, binding vs. not-binding: ~40,000 sequences; 50/50 train/test split; 5 folds.

Slide 44: Motif Finding (Kumar, Packer and Koller, NIPS 2010). Results: FIXED vs. SPMKL. Motif + Markov background model (Yu and Joachims, 2009).

Slide 45: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 46: MAP Inference. Pr(a, h | x) = exp(wᵀΨ(x, a, h)) / Z(x); illustrated with Pr(a₁, h | x).

Slide 47: MAP Inference. min_{a,h} −log Pr(a, h | x), where Pr(a, h | x) = exp(wᵀΨ(x, a, h)) / Z(x); illustrated with Pr(a₁, h | x) and Pr(a₂, h | x). What about the value of the latent variable?

Slide 48: Min-Entropy Inference. min_a −log Pr(a | x) + H_α(Pr(h | a, x)) = min_a H_α(Q(a; x, w)), where Q(a; x, w) is the set of all {Pr(a, h | x)} and H_α is the Rényi entropy of this generalized distribution.
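The min-entropy score can be sketched numerically. Expanding −log Pr(a | x) + H_α(Pr(h | a, x)) with Z_a = Σ_h Pr(a, h | x) gives a closed form for H_α of the unnormalized distribution Q(a; x, w); the toy joint probabilities below are illustrative, not from the experiments.

```python
import numpy as np

def renyi_entropy_generalized(q, alpha):
    # Renyi entropy of a generalized (unnormalized) distribution q, with
    # q_h = Pr(a, h | x) for a fixed a. Expanding the slide's
    # -log Pr(a|x) + H_alpha(Pr(h|a,x)) with Z_a = sum_h q_h yields
    # H_alpha(Q(a)) = (log sum_h q_h^alpha - log sum_h q_h) / (1 - alpha).
    q = np.asarray(q, float)
    return (np.log((q ** alpha).sum()) - np.log(q.sum())) / (1.0 - alpha)

def min_entropy_inference(joint, alpha):
    # Min-entropy inference: min_a H_alpha(Q(a; x, w)).
    scores = {a: renyi_entropy_generalized(q, alpha) for a, q in joint.items()}
    return min(scores, key=scores.get), scores

# Toy joint Pr(a, h | x): both annotations have the same marginal
# Pr(a | x) = 0.5, but "deer" is more peaked over its latent variable.
joint = {"deer": [0.4, 0.1], "cow": [0.2, 0.2, 0.1]}
a_best, scores = min_entropy_inference(joint, alpha=100.0)
```

As α grows, H_α(Q(a)) approaches −log max_h Pr(a, h | x), which is exactly how the α = ∞ case on slide 50 reduces to latent SVM: the score of each annotation collapses to its best latent completion.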

Slide 49: Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). min ||w||² + C Σ_i ξ_i, s.t. H_α(Q(a; x_i, w)) − H_α(Q(a_i; x_i, w)) ≥ Δ(a_i, a) − ξ_i, ξ_i ≥ 0. Like latent SVM, this minimizes Δ(a_i, a_i(w)). In fact, when α = ∞ …

Slide 50: Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). In fact, when α = ∞ the problem becomes latent SVM: min ||w||² + C Σ_i ξ_i, s.t. max_h wᵀΨ(x_i, a_i, h) − max_h wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i, ξ_i ≥ 0. Like latent SVM, this minimizes Δ(a_i, a_i(w)).

Slide 51: Image Classification. Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.

Slide 52: Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). Results for the HOG-based model (Dalal and Triggs, 2005).


Slide 55: Motif Finding. UniProbe dataset, binding vs. not-binding: ~40,000 sequences; 50/50 train/test split; 5 folds.

Slide 56: Motif Finding (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). Results for the motif + Markov background model (Yu and Joachims, 2009).

Slide 57: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 58: Very Large Datasets. Initialize parameters using supervised data; impute latent variables (inference); select easy samples (very efficient); update parameters using incremental SVM; refine efficiently with proximal regularization.

Slide 59: Output Mismatch. Minimize over w and θ: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ), based on C. R. Rao's relative quadratic entropy.
Slides 60–64: Output Mismatch. The same objective, minimized alternately over w (slides 60–61) and over θ (slides 62–64), illustrated with Pr_θ(h, a | x) over (a₁, h) and (a₂, h).

Slide 65: Questions?
