
Slide 1: Max-Margin Latent Variable Models. M. Pawan Kumar

Slide 2: Max-Margin Latent Variable Models. M. Pawan Kumar, Daphne Koller, Ben Packer; with Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy

Slide 3: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000.

Slide 4: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000; Bounding Box, ~.

Slide 5: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000; Bounding Box, ~; Image-Level (“Car”, “Chair”), > 14 M.

Slide 6: Computer Vision Data. Information vs. log(size): Segmentation, ~2,000; Bounding Box, ~; Image-Level, > 14 M; Noisy Label, > 6 B. Learn with missing information (latent variables).

Slide 7: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 8: Annotation Mismatch. Learn to classify an image: image x, annotation a = “Deer”, latent variable h. There is a mismatch between the desired and available annotations; the exact value of the latent variable is not “important”.

Slide 9: Annotation Mismatch. Learn to classify a DNA sequence: sequence x, annotation a ∈ {+1, −1}, latent variables h. There is a mismatch between the desired and possible annotations; the exact value of the latent variable is not “important”.

Slide 10: Output Mismatch. Learn to segment an image: image x, output y.

Slide 11: Output Mismatch. Learn to segment an image: “Bird”; (x, a) and (a, h).

Slide 12: Output Mismatch. Learn to segment an image: “Cow”; (x, a) and (a, h). There is a mismatch between the desired output and the available annotations; the exact value of the latent variable is important.

Slide 13: Output Mismatch. Learn to classify actions: (x, y).

Slide 14: Output Mismatch. Learn to classify actions, e.g. “jumping”: image x, latent variables h and h_b, annotation a = +1.

Slide 15: Output Mismatch. Learn to classify actions, e.g. “jumping”: image x, latent variables h and h_b, annotation a = −1. There is a mismatch between the desired output and the available annotations; the exact value of the latent variable is important.

Slide 16: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 17: Latent SVM. Image x, annotation a = “Deer”, latent variable h. Features Ψ(x, a, h); parameters w; score wᵀΨ(x, a, h). Prediction: (a(w), h(w)) = argmax_{a,h} wᵀΨ(x, a, h). [Andrews et al., 2001; Smola et al., 2005; Felzenszwalb et al., 2008; Yu and Joachims, 2009]

Slide 18: Parameter Learning. Score of the best completion of the ground-truth > score of all other outputs.

Slide 19: Parameter Learning. max_h wᵀΨ(x_i, a_i, h) > wᵀΨ(x_i, a, h) for all other (a, h).

Slide 20: Parameter Learning (Annotation Mismatch). min_w ||w||² + C Σ_i ξ_i, s.t. max_h wᵀΨ(x_i, a_i, h) ≥ wᵀΨ(x_i, a, h) + Δ(a_i, a) − ξ_i.

Slide 21: Optimization. Repeat until convergence: (1) update h_i* = argmax_h wᵀΨ(x_i, a_i, h); (2) update w by solving the convex problem min ||w||² + C Σ_i ξ_i, s.t. wᵀΨ(x_i, a_i, h_i*) − wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i.
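The alternation on this slide can be sketched end-to-end on a toy binary problem. Everything below is illustrative, not the paper's setup: `psi` is a made-up joint feature map (the latent h picks an index of x), Δ is taken as 0/1 loss, and the convex w-step is approximated by subgradient descent rather than an exact QP solve.

```python
import numpy as np

def psi(x, a, h):
    # Toy joint feature map: the latent h picks an index ("window") of x,
    # and the annotation a in {-1, +1} signs the feature plus a bias term.
    return a * np.array([x[h], 1.0])

def best_latent(w, x, a, H):
    # Step 1 of the alternation: h_i* = argmax_h w^T psi(x_i, a_i, h).
    return max(H, key=lambda h: w @ psi(x, a, h))

def predict(w, x, H):
    # Slide 17's inference: (a(w), h(w)) = argmax_{a,h} w^T psi(x, a, h).
    return max(((a, h) for a in (-1, 1) for h in H),
               key=lambda ah: w @ psi(x, ah[0], ah[1]))

def train(data, H, C=10.0, lr=0.05, epochs=20):
    # Step 2 via subgradient descent on the convex problem in w, with the
    # imputed h_i* held fixed (Delta(a_i, a) taken as 0/1 loss).
    w = np.zeros(2)
    for _ in range(epochs):
        for x, a in data:
            h_star = best_latent(w, x, a, H)   # best completion of ground truth
            h_bad = best_latent(w, x, -a, H)   # best rival annotation
            good, bad = psi(x, a, h_star), psi(x, -a, h_bad)
            w *= 1.0 - lr / C                  # shrinkage from the ||w||^2 term
            if w @ good - w @ bad < 1.0:       # margin constraint violated
                w += lr * (good - bad)
    return w

# Toy data: positives carry one strong feature at a random latent position.
rng = np.random.default_rng(0)
data = []
for _ in range(25):
    x = rng.uniform(0.0, 0.1, 4)
    x[rng.integers(4)] = 1.0
    data.append((x, 1))
    data.append((rng.uniform(0.0, 0.1, 4), -1))

w = train(data, range(4))
```

The two nested `best_latent` calls mirror the two uses of h on the slide: one completes the ground-truth annotation, the other performs loss-augmented inference over competing annotations.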

Slide 22: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 23: Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). Teaching everything at once (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): “Math is for losers!!” FAILURE … BAD LOCAL MINIMUM.

Slide 24: Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). Easy problems first, then harder ones (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): “Euler was a genius!!” SUCCESS … GOOD LOCAL MINIMUM.

Slide 25: Optimization. Repeat until convergence: (1) update h_i* = argmax_h wᵀΨ(x_i, a_i, h); (2) update w and v, with v_i ∈ {0, 1}, by solving the convex problem min ||w||² + C Σ_i v_i ξ_i − λ Σ_i v_i, s.t. wᵀΨ(x_i, a_i, h_i*) − wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i; (3) anneal λ ← λμ.
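The v-step of this biconvex problem is easy to sketch: with w fixed, each v_i has a closed form that simply compares the sample's current loss against the threshold λ, and annealing λ ← λμ admits harder samples over time. The function name and the toy losses below are illustrative.

```python
import numpy as np

def spl_select(losses, lam, C=1.0):
    # With w fixed, the optimal v_i in {0, 1} has a closed form:
    # v_i = 1 exactly when C * loss_i < lam, i.e. the sample is "easy".
    return (C * np.asarray(losses, float) < lam).astype(int)

# Toy per-sample hinge losses and the annealing loop lam <- lam * mu,
# which gradually lets harder samples into the objective.
losses = [0.2, 0.9, 2.0, 5.0]
lam, mu = 1.0, 2.0
history = []
for _ in range(4):
    history.append(spl_select(losses, lam).tolist())
    lam *= mu
# history == [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
```

The rounds show the curriculum emerging: only the two easiest samples participate at first, and all four are included once λ has grown past the largest loss.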

Slide 26: Image Classification. Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.

Slide 27: Image Classification (Kumar, Packer and Koller, NIPS 2010). Results: CCCP vs. SPL. HOG-based model (Dalal and Triggs, 2005).

Slide 28: Image Classification. PASCAL VOC 2007 dataset, car vs. not-car: ~5,000 images; 50/50 train/test split; 5 folds.

Slide 29: Image Classification (Witten, Miller, Kumar, Packer and Koller, In Preparation). Objective values for HOG + dense SIFT + dense color SIFT; SPL+. Different features choose different “easy” samples.

Slide 30: Image Classification (Witten, Miller, Kumar, Packer and Koller, In Preparation). Mean average precision for HOG + dense SIFT + dense color SIFT; SPL+. Different features choose different “easy” samples.

Slide 31: Motif Finding. UniProbe dataset, binding vs. not-binding: ~40,000 sequences; 50/50 train/test split; 5 folds.

Slide 32: Motif Finding (Kumar, Packer and Koller, NIPS 2010). Results: CCCP vs. SPL. Motif + Markov background model (Yu and Joachims, 2009).

Slide 33: Semantic Segmentation. Stanford Background (train images; validation, 53 images; test, 90 images) and VOC Segmentation 2009 (train, validation and test images).

Slide 34: Semantic Segmentation. Additional training data: VOC Detection train images (bounding-box data) and ImageNet train images (image-level data).

Slide 35: Semantic Segmentation (Kumar, Turki, Preston and Koller, ICCV 2011). Results: SUP vs. CCCP vs. SPL. Region-based model (Gould, Fulton and Koller, 2009). SUP = supervised learning (segmentation data only).

Slide 36: Action Classification. PASCAL VOC 2011: train, 3,000 instances (bounding-box data + noisy data); test, 3,000 instances.

Slide 37: Action Classification (Packer, Kumar, Tang and Koller, In Preparation). Results: SUP vs. CCCP vs. SPL. Poselet-based model (Maji, Bourdev and Malik, 2011).

Slide 38: Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, In Preparation). Integers, rational numbers, imaginary numbers (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): USE A FIXED MODEL.

Slide 39: Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, In Preparation). Integers, rational numbers, imaginary numbers (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): ADAPT THE MODEL COMPLEXITY.

Slide 40: Optimization. As in self-paced learning, repeat until convergence: (1) update h_i* = argmax_h wᵀΨ(x_i, a_i, h); (2) update w, v (v_i ∈ {0, 1}) and the kernel weights c by solving min ||w||² + C Σ_i v_i ξ_i − λ Σ_i v_i, s.t. wᵀΨ(x_i, a_i, h_i*) − wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i; (3) anneal λ ← λμ. Kernelized features: K_ij = Ψ(x_i, a_i, h_i)ᵀΨ(x_j, a_j, h_j), with the combined kernel K̂ = Σ_k c_k K_k.
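The kernel combination K̂ = Σ_k c_k K_k can be sketched directly: a conic (c_k ≥ 0) combination of positive semi-definite base kernels is itself a valid kernel for the learner. The feature maps below are toy stand-ins, not the models used in the experiments.

```python
import numpy as np

def combined_kernel(kernels, c):
    # K_hat = sum_k c_k K_k: with c_k >= 0, each base K_k PSD implies
    # the combination is PSD, hence a valid kernel.
    return sum(ck * Kk for ck, Kk in zip(c, kernels))

# Two base kernels built from toy feature maps (linear kernels
# K_k = Phi_k Phi_k^T), combined with illustrative weights c.
rng = np.random.default_rng(0)
phi1 = rng.normal(size=(6, 3))   # stand-in for one joint feature map
phi2 = rng.normal(size=(6, 5))   # stand-in for a second feature map
K1, K2 = phi1 @ phi1.T, phi2 @ phi2.T
c = np.array([0.7, 0.3])
K_hat = combined_kernel([K1, K2], c)

# Validity check: the combined Gram matrix stays symmetric and PSD.
eigvals = np.linalg.eigvalsh(K_hat)
```

In the self-paced variant, the weights c are updated alongside w and v, so the effective model complexity grows as harder samples are admitted.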

Slide 41: Image Classification. Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.

Slide 42: Image Classification (Kumar, Packer and Koller, In Preparation). Results: FIXED vs. SPMKL. HOG-based model (Dalal and Triggs, 2005).

Slide 43: Motif Finding. UniProbe dataset, binding vs. not-binding: ~40,000 sequences; 50/50 train/test split; 5 folds.

Slide 44: Motif Finding (Kumar, Packer and Koller, NIPS 2010). Results: FIXED vs. SPMKL. Motif + Markov background model (Yu and Joachims, 2009).

Slide 45: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 46: MAP Inference. Pr(a, h | x) = exp(wᵀΨ(x, a, h)) / Z(x); illustrated with Pr(a₁, h | x).

Slide 47: MAP Inference. min_{a,h} −log Pr(a, h | x), where Pr(a, h | x) = exp(wᵀΨ(x, a, h)) / Z(x); illustrated with Pr(a₁, h | x) and Pr(a₂, h | x). What about the value of the latent variable?

Slide 48: Min-Entropy Inference. min_a −log Pr(a | x) + H_α(Pr(h | a, x)) = min_a H_α(Q(a; x, w)), where Q(a; x, w) is the set of all {Pr(a, h | x)} and H_α is the Rényi entropy of this generalized distribution.
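The min-entropy score can be sketched numerically. Expanding −log Pr(a | x) + H_α(Pr(h | a, x)) with Z_a = Σ_h Pr(a, h | x) gives a closed form for H_α of the unnormalized distribution Q(a; x, w); the toy joint probabilities below are illustrative, not from the experiments.

```python
import numpy as np

def renyi_entropy_generalized(q, alpha):
    # Renyi entropy of a generalized (unnormalized) distribution q, with
    # q_h = Pr(a, h | x) for a fixed a. Expanding the slide's
    # -log Pr(a|x) + H_alpha(Pr(h|a,x)) with Z_a = sum_h q_h yields
    # H_alpha(Q(a)) = (log sum_h q_h^alpha - log sum_h q_h) / (1 - alpha).
    q = np.asarray(q, float)
    return (np.log((q ** alpha).sum()) - np.log(q.sum())) / (1.0 - alpha)

def min_entropy_inference(joint, alpha):
    # Min-entropy inference: min_a H_alpha(Q(a; x, w)).
    scores = {a: renyi_entropy_generalized(q, alpha) for a, q in joint.items()}
    return min(scores, key=scores.get), scores

# Toy joint Pr(a, h | x): both annotations have the same marginal
# Pr(a | x) = 0.5, but "deer" is more peaked over its latent variable.
joint = {"deer": [0.4, 0.1], "cow": [0.2, 0.2, 0.1]}
a_best, scores = min_entropy_inference(joint, alpha=100.0)
```

As α grows, H_α(Q(a)) approaches −log max_h Pr(a, h | x), which is exactly how the α = ∞ case on slide 50 reduces to latent SVM: the score of each annotation collapses to its best latent completion.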

Slide 49: Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). min ||w||² + C Σ_i ξ_i, s.t. H_α(Q(a; x_i, w)) − H_α(Q(a_i; x_i, w)) ≥ Δ(a_i, a) − ξ_i, ξ_i ≥ 0. Like latent SVM, this minimizes Δ(a_i, a_i(w)). In fact, when α = ∞ …

Slide 50: Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). In fact, when α = ∞ the problem becomes latent SVM: min ||w||² + C Σ_i ξ_i, s.t. max_h wᵀΨ(x_i, a_i, h) − max_h wᵀΨ(x_i, a, h) ≥ Δ(a_i, a) − ξ_i, ξ_i ≥ 0. Like latent SVM, this minimizes Δ(a_i, a_i(w)).

Slide 51: Image Classification. Mammals dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.

Slide 52: Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). Results for the HOG-based model (Dalal and Triggs, 2005).


Slide 55: Motif Finding. UniProbe dataset, binding vs. not-binding: ~40,000 sequences; 50/50 train/test split; 5 folds.

Slide 56: Motif Finding (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). Results for the motif + Markov background model (Yu and Joachims, 2009).

Slide 57: Outline. Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.

Slide 58: Very Large Datasets. Initialize parameters using supervised data; impute latent variables (inference); select easy samples (very efficient); update parameters using incremental SVM; refine efficiently with proximal regularization.

Slide 59: Output Mismatch. Minimize over w and θ: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ), based on C. R. Rao's relative quadratic entropy.
Slides 60–64: Output Mismatch. The same objective, minimized alternately over w (slides 60–61) and over θ (slides 62–64), illustrated with Pr_θ(h, a | x) over (a₁, h) and (a₂, h).

Slide 65: Questions?
