1 Differentiable Sparse Coding. David Bradley and J. Andrew Bagnell, NIPS 2008.

2 Joint Optimization, 100,000 ft view: complex systems. [System diagram: cameras and ladar fill a voxel grid; voxel features feed a voxel classifier, which produces a column cost for a 2-D planner that outputs Y (path to goal). Initialize with "cheap" data.]

3 10,000 ft view: sparse coding = generative model; semi-supervised learning; KL-divergence regularization; implicit differentiation. [Diagram: unlabeled data feeds an optimization that produces a latent variable; the latent variable feeds a classifier and its loss, and the loss gradient flows back.]

4 Sparse Coding: understand X…

5 …as a combination of factors B.

6 Sparse coding uses optimization. Projection (feed-forward) maps the input x to some vector w that we then want to use to classify x; sparse coding instead finds w by minimizing a reconstruction loss function.
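As a rough sketch of the distinction this slide draws (array sizes and the toy objective are mine, not from the talk), a feed-forward projection is a single matrix multiply, whereas the optimization view solves a small problem for every input:

```python
import numpy as np
from scipy.optimize import minimize

B = np.random.randn(64, 128)   # basis: 64-dimensional inputs, 128 basis vectors
x = np.random.randn(64)        # one input example

# Projection (feed-forward): a single matrix multiply.
w_proj = B.T @ x

# Optimization: encode x by minimizing a reconstruction loss
# (the sparsity-inducing prior is introduced on the following slides).
recon_loss = lambda w: 0.5 * np.sum((x - B @ w) ** 2)
w_opt = minimize(recon_loss, np.zeros(B.shape[1]), method="L-BFGS-B").x
```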

7 Sparse vectors: only a few significant elements.

8 Example: X = handwritten digits. [Figure: basis vectors colored by w.]

9 Optimization vs. Projection. [Figure: input, basis, and the projection outputs.]

10 Optimization vs. Projection. [Figure: input, basis, and KL-regularized optimization outputs; the outputs are sparse for each example.]

11 Generative Model: examples are independent and the latent variables are independent, so the model factors into a prior on each latent vector w_i times a likelihood for each example x_i, roughly P(X, W | B) = ∏_i P(x_i | w_i, B) P(w_i).

12 Sparse Approximation: the MAP estimate of the latent weights under this prior and likelihood.

13 Sparse Approximation: minimize the distance between the reconstruction and the input, plus a regularization constant times the distance between the weight vector and the prior mean.
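In symbols (my notation, paraphrasing the slide), the sparse approximation is the weight vector

```latex
w^{*} \;=\; \arg\min_{w}\; L\bigl(x,\, Bw\bigr) \;+\; \lambda\, D\bigl(w,\, w_{0}\bigr)
```

where L measures the distance between the reconstruction Bw and the input x, D measures the distance between w and the prior mean w_0, and λ is the regularization constant.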

14 Example: squared loss + L1 prior. Convex and sparse (widely studied in engineering). Sparse coding solves for B as well (non-convex for now…). Shown to generate useful features on diverse problems. (Tropp, Signal Processing, 2006; Donoho and Elad, Proceedings of the National Academy of Sciences, 2002; Raina, Battle, Lee, Packer, Ng, ICML, 2007.)
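A minimal sketch of this squared-loss + L1 sparse approximation (ISTA-style proximal gradient; the penalty weight and iteration count are illustrative, not values from the cited work):

```python
import numpy as np

def l1_sparse_approx(B, x, lam=0.1, steps=500):
    """Solve min_w 0.5*||x - B w||^2 + lam*||w||_1 with proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(B, 2) ** 2     # 1 / Lipschitz constant of the smooth part
    w = np.zeros(B.shape[1])
    for _ in range(steps):
        grad = B.T @ (B @ w - x)               # gradient of the squared loss
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)   # soft-threshold
    return w
```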

15 L1 Sparse Coding: optimize B over all examples. Shown to generate useful features on diverse problems.
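A rough sketch of the full (non-convex) sparse coding loop that also optimizes B over all examples, reusing l1_sparse_approx from the sketch above (the least-squares basis update and column normalization are one common choice, not necessarily the one used in the cited work):

```python
import numpy as np

def l1_sparse_coding(X, n_basis=128, outer_iters=20, lam=0.1):
    """X holds one example per column; alternate between the codes W and the basis B."""
    rng = np.random.default_rng(0)
    B = rng.standard_normal((X.shape[0], n_basis))
    B /= np.linalg.norm(B, axis=0)                       # unit-norm basis vectors
    for _ in range(outer_iters):
        # Encode every example with the current basis (L1 sparse approximation).
        W = np.column_stack([l1_sparse_approx(B, x, lam) for x in X.T])
        # Least-squares update of the basis, then re-normalize its columns.
        B = X @ np.linalg.pinv(W)
        B /= np.linalg.norm(B, axis=0) + 1e-12
    return B, W
```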

16 Differentiable Sparse Coding (Bradley & Bagnell, "Differentiable Sparse Coding", NIPS 2008). [Block diagram: unlabeled data X enters a sparse-coding optimization module (B) trained with a reconstruction loss (Raina, Battle, Lee, Packer, Ng, ICML, 2007); labeled data (X, Y) then passes through the same optimization module to produce codes W, which feed a learning module (θ) with its own loss function.]

17 L1 Regularization is Not Differentiable (Bradley & Bagnell, "Differentiable Sparse Coding", NIPS 2008). [Same block diagram as the previous slide.]

18 Why is this unsatisfying?

19 Problem #1: Instability. L1 MAP estimates are discontinuous, so the outputs are not differentiable. Instead use KL-divergence regularization, which is proven to compete with L1 in online learning.

20 Problem #2: No closed-form equation. At the MAP estimate the gradient of the objective with respect to w vanishes, but this optimality condition cannot be solved for w* in closed form.
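The missing equation is the first-order optimality condition: at the MAP estimate the gradient of the objective (in the notation sketched after slide 13) vanishes,

```latex
g(w^{*}, B) \;\triangleq\; \nabla_{w}\Bigl[\, L\bigl(x,\, B w\bigr) + \lambda\, D\bigl(w,\, w_{0}\bigr) \Bigr]\Big|_{w = w^{*}} \;=\; 0,
```

and this condition has no closed-form solution for w*.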

21 Solution: Implicit Differentiation. Differentiate both sides of the optimality condition with respect to an element of B; since w* is a function of B, the chain rule introduces the term ∂w*/∂B, which we then solve for.
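Concretely (my notation): differentiating both sides of g(w*, B) = 0 with respect to an element B_{kj} and applying the chain rule, since w* depends on B, gives

```latex
\frac{\partial g}{\partial w}\,\frac{\partial w^{*}}{\partial B_{kj}}
  \;+\; \frac{\partial g}{\partial B_{kj}} \;=\; 0
\qquad\Longrightarrow\qquad
\frac{\partial w^{*}}{\partial B_{kj}}
  \;=\; -\left(\frac{\partial g}{\partial w}\right)^{-1}\frac{\partial g}{\partial B_{kj}},
```

so the quantity the slide says to solve for, ∂w*/∂B_{kj}, comes from one linear system in the Hessian of the objective.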

22 KL-Divergence Example: squared loss, KL prior.
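A small numerical sketch of this example, assuming the regularizer is the unnormalized KL divergence D(w, w0) = Σ_j (w_j log(w_j / w0_j) - w_j + w0_j) over positive weights (my reading of the setup; the exponentiated-gradient solver, step sizes, and problem sizes are illustrative):

```python
import numpy as np

def kl_encode(B, x, w0, lam=0.1, eta=0.01, steps=2000):
    """min_w 0.5*||x - B w||^2 + lam*sum(w*log(w/w0) - w + w0), with w > 0,
    solved by exponentiated-gradient steps (multiplicative updates keep w positive)."""
    w = w0.copy()
    for _ in range(steps):
        grad = B.T @ (B @ w - x) + lam * np.log(w / w0)
        w = w * np.exp(-eta * grad)
    return w

def dw_dB(B, x, w, lam, k, j):
    """Implicit differentiation: derivative of the optimal code w* w.r.t. B[k, j]."""
    H = B.T @ B + lam * np.diag(1.0 / w)      # dg/dw, the Hessian of the objective
    c = B[k, :] * w[j]                        # dg/dB[k, j] ...
    c[j] += (B @ w - x)[k]                    # ... plus the residual term
    return -np.linalg.solve(H, c)

# Finite-difference sanity check on a tiny random problem.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 8))
x = rng.standard_normal(5)
w0 = np.full(8, 1.0 / 8)
lam, k, j, eps = 0.1, 2, 3, 1e-5
w_star = kl_encode(B, x, w0, lam)
B_pert = B.copy()
B_pert[k, j] += eps
numeric = (kl_encode(B_pert, x, w0, lam) - w_star) / eps
print(np.max(np.abs(dw_dB(B, x, w_star, lam, k, j) - numeric)))   # should be small
```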

23 Handwritten Digit Recognition: 50,000-digit training set, 10,000-digit validation set, 10,000-digit test set.

24 Handwritten Digit Recognition, Step #1: unsupervised sparse coding on the training set with L2 loss and L1 prior (Raina, Battle, Lee, Packer, Ng, ICML, 2007).

25 Handwritten Digit Recognition, Step #2: sparse approximation of each training example with the learned basis, then train a maxent classifier on the resulting codes with its loss function.

26 Handwritten Digit Recognition, Step #3: supervised sparse coding, in which the maxent classifier's loss is propagated back through the sparse coder to update the basis as well.
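Step #3 can be sketched by chaining the classifier's gradient with respect to the code back to the basis using the implicit-differentiation formula above; vectorized over all elements of B, this reduces to one linear solve and two outer products (my derivation for the squared-loss + KL objective sketched after slide 22; the maxent classifier itself is omitted):

```python
import numpy as np

def dloss_dB(B, x, w_star, lam, dloss_dw):
    """Gradient of the classifier loss w.r.t. the whole basis B, given its gradient
    w.r.t. the code w* produced by the KL-regularized encoder."""
    H = B.T @ B + lam * np.diag(1.0 / w_star)   # Hessian of the encoding objective
    v = np.linalg.solve(H, dloss_dw)            # H^{-1} * dLoss/dw*
    r = B @ w_star - x                          # reconstruction residual
    return -(np.outer(B @ v, w_star) + np.outer(r, v))
```

In the training loop this gradient updates the basis B, while the classifier parameters θ are updated by ordinary backpropagation from the same loss.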

27 KL Maintains Sparsity. [Plot, log scale: weight is concentrated in a few elements.]

28 KL Adds Stability. [Figure comparing, for inputs X, the codes W (KL), W (backprop), and W (L1).] (Duda, Hart, Stork, Pattern Classification, 2001.)

29 Performance vs. Prior. [Results plot.]

30 Classifier Comparison. [Results plot.]

31 Comparison to other algorithms. [Results plot.]

32 Transfer to English Characters: 24,000-character training set, 12,000-character validation set, 12,000-character test set.

33 Transfer to English Characters, Step #1: sparse approximation using the digits basis, then train a maxent classifier with its loss function on the codes (Raina, Battle, Lee, Packer, Ng, ICML, 2007).

34 Transfer to English Characters, Step #2: supervised sparse coding with the maxent classifier's loss function.

35 Transfer to English Characters. [Results plot.]

36 Text Application: 5,000 movie reviews on a 10-point sentiment scale (1 = hated it, 10 = loved it) (Pang, Lee, Proceedings of the ACL, 2005). Step #1: unsupervised sparse coding of the review vectors X with KL loss + KL prior.

37 Text Application, Step #2: sparse approximation of each review, then linear regression with L2 loss on the codes, evaluated with 5-fold cross-validation on the 10-point sentiment scale.

38 Text Application, Step #3: supervised sparse coding with the linear regression (L2) loss, again evaluated with 5-fold cross-validation.

39 Movie Review Sentiment. [Results plot comparing the unsupervised basis, the supervised basis, and a state-of-the-art graphical model (Blei, McAuliffe, NIPS, 2007).]

40 Future Work. [System diagram: RGB camera, NIR camera, and ladar inputs; sparse coding alongside engineered features; a voxel classifier; labeled training data and example paths used for MMP training.]

41 Future Work: Convex Sparse Coding. Sparse approximation is convex, but sparse coding is not, because the fixed-size basis is a non-convex constraint. Sparse coding is equivalent to sparse approximation on an infinitely large basis plus a non-convex rank constraint; relax that to a convex L1 rank constraint, and use boosting to perform sparse approximation directly on the infinitely large basis. (Bengio, Le Roux, Vincent, Delalleau, Marcotte, NIPS, 2005; Zhao, Yu, Feature Selection for Data Mining, 2005; Rifkin, Lippert, Journal of Machine Learning Research, 2007.)

42 Questions?

