Visual Dictionaries Toyota Technological Institute at Chicago George Papandreou CVPR 2014 Tutorial on BASIS.

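2 Additive Image Patch Modeling. The patch-based image modeling approach: how can we span the space of all 8x8 image patches? Each patch, flattened to a $D$-dimensional vector $x$, is modeled as a weighted sum of the $K$ dictionary elements $w_k$: $x \approx \sum_{k=1}^{K} \alpha_k w_k$, with expansion coefficients $\alpha_1, \alpha_2, \alpha_3, \ldots$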
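A minimal sketch of the additive model, assuming each patch is flattened to a plain Python list and the dictionary is stored as a list of $K$ atoms (the function name `synthesize` is hypothetical):

```python
def synthesize(atoms, alpha):
    """Return the patch sum_k alpha[k] * atoms[k]; atoms is a list of K vectors of length D."""
    D = len(atoms[0])
    return [sum(a * w[d] for a, w in zip(alpha, atoms)) for d in range(D)]


# Toy example: K = 3 atoms of dimension D = 4 (a flattened 2x2 patch).
atoms = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
patch = synthesize(atoms, [2.0, -1.0, 0.5])
```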

4 Two Modeling Goals. Image reconstruction: use the dictionary to build an image prior (tasks: compression, denoising, deblurring, inpainting, ...). Image interpretation: use the dictionary for feature extraction (tasks: classification, recognition, ...).

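5 Three Modeling Regimes. Two inter-related properties: over-completeness (how big is the dictionary?) and sparsity (how many non-zero components?). The three regimes are PCA (compact dictionary, dense codes), sparse coding (over-complete dictionary, few non-zero components), and clustering (a single non-zero component per patch).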

6 Where Does the Dictionary Come From? (1) The dictionary is fixed, e.g., a basis or a union of bases: DCT (as in JPEG image compression), wavelets.
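As an example of a fixed dictionary, the sketch below builds the orthonormal 8x8 2-D DCT-II dictionary (64 atoms, each a separable product of two 1-D cosines) and checks that it is orthonormal. The function names are hypothetical, and JPEG's own scaling and quantization conventions are not modeled.

```python
import math


def dct_basis_1d(n):
    """Orthonormal DCT-II basis: n vectors of length n."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return basis


def dct_dictionary_2d(n=8):
    """Separable 2-D DCT atoms b_u(i) * b_v(j), flattened row-major to length n*n."""
    b = dct_basis_1d(n)
    return [[b[u][i] * b[v][j] for i in range(n) for j in range(n)]
            for u in range(n) for v in range(n)]


def max_gram_error(atoms):
    """Largest deviation of the Gram matrix from the identity (0 for an orthonormal basis)."""
    worst = 0.0
    for a in range(len(atoms)):
        for c in range(len(atoms)):
            g = sum(x * y for x, y in zip(atoms[a], atoms[c]))
            worst = max(worst, abs(g - (1.0 if a == c else 0.0)))
    return worst


W = dct_dictionary_2d(8)  # 64 atoms of dimension 64
```

Because the dictionary is orthonormal, coding a patch reduces to 64 inner products.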

7 Where Does the Dictionary Come From? (2) Learn a generic dictionary from a collection of images. Many algorithms are possible (see later).

8 Where Does the Dictionary Come From? (3) Learn an image-specific (image-adapted) dictionary. Many algorithms are possible (see later).

9 Where Does the Dictionary Come From? (4) Non-parametric: the dictionary is the set of all overlapping image patches (from one or many images). Examples: non-local means, the patch transform, etc.

10 Beyond Bases: Hierarchical Dictionaries. (1) Multi-scale image modeling: apply the same dictionary to the image at different scales (Gaussian and Laplacian pyramids, wavelets, ...). (2) Recursive hierarchical models: build recursive dictionaries (deep learning).

11 Key Problems. Coding: find the expansion coefficients given the dictionary. Dictionary learning: given data, learn a dictionary. Hierarchical modeling.

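12 Image Coding Problem: Least Squares. With the dictionary elements stacked as the columns of $W$, the least squares criterion has equivalent penalized and constrained formulations; in penalized form it reads $\min_\alpha \|x - W\alpha\|_2^2 + \lambda\|\alpha\|_2^2$. Solution (Tikhonov regularization, Wiener filtering): $\alpha = (W^T W + \lambda I)^{-1} W^T x = V^T x$, with $V = W (W^T W + \lambda I)^{-1}$. The columns of $V$ are the dual filters (dual dictionary), so processing is fast (one inner product per coefficient). Yields a dense code.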
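A sketch of Tikhonov-regularized least squares coding, assuming a small dictionary stored as a list of atoms (the columns of $W$) and a hand-written Gaussian elimination in place of a linear-algebra library. It computes $\alpha = (W^T W + \lambda I)^{-1} W^T x$ directly, then again through the dual filters $v_k$, the columns of $V = W (W^T W + \lambda I)^{-1}$, to show that each coefficient is a single inner product. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting (A small and square)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][j] * y[j] for j in range(r + 1, n))) / M[r][r]
    return y


def regularized_gram(atoms, lam):
    """W^T W + lam * I for the atoms (columns of W)."""
    K = len(atoms)
    return [[dot(atoms[i], atoms[j]) + (lam if i == j else 0.0) for j in range(K)]
            for i in range(K)]


def ridge_code(atoms, x, lam):
    """alpha = (W^T W + lam I)^{-1} W^T x."""
    return solve(regularized_gram(atoms, lam), [dot(w, x) for w in atoms])


def dual_filters(atoms, lam):
    """Columns v_k of V = W (W^T W + lam I)^{-1}, so that alpha_k = <v_k, x>."""
    K, D = len(atoms), len(atoms[0])
    G = regularized_gram(atoms, lam)
    V = []
    for k in range(K):
        c = solve(G, [1.0 if j == k else 0.0 for j in range(K)])  # column k of G^{-1}
        V.append([sum(c[j] * atoms[j][d] for j in range(K)) for d in range(D)])
    return V


atoms = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # K = 3 atoms in D = 2 (over-complete)
x = [2.0, 1.0]
alpha = ridge_code(atoms, x, lam=0.1)
alpha_dual = [dot(v, x) for v in dual_filters(atoms, lam=0.1)]
```

The dual filters depend only on $W$ and $\lambda$, so they can be computed once and reused for every patch; note that all three coefficients come out non-zero, the dense code the slide mentions.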

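13 Image Coding Problem: Vector Quantization. VQ allows exactly one active coefficient, equal to one, so coding reduces to nearest-neighbor search; equivalently, $k^* = \arg\min_k \|x - w_k\|_2^2 = \arg\max_k \left(w_k^T x - \tfrac{1}{2}\|w_k\|_2^2\right)$. Solution: exact search costs O(DK), one inner product for each dictionary element; approximate nearest-neighbor (ANN) search costs O(D log K).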
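A sketch of exact VQ coding in O(DK): since $\|x - w_k\|^2 = \|x\|^2 - 2 w_k^T x + \|w_k\|^2$, the nearest atom maximizes $w_k^T x - \frac{1}{2}\|w_k\|^2$, which needs one inner product per atom once the half-norms are precomputed. The function names are hypothetical; a brute-force distance search is included only for comparison.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def vq_encode(atoms, x):
    """Exact VQ coding: argmax_k <w_k, x> - ||w_k||^2 / 2, returned with its one-hot code."""
    half_norms = [0.5 * dot(w, w) for w in atoms]  # can be precomputed once per dictionary
    scores = [dot(w, x) - h for w, h in zip(atoms, half_norms)]
    k = max(range(len(atoms)), key=scores.__getitem__)
    code = [0.0] * len(atoms)
    code[k] = 1.0  # a single active coefficient, equal to one
    return k, code


def nearest_atom(atoms, x):
    """Brute-force nearest neighbor by squared Euclidean distance, for comparison."""
    return min(range(len(atoms)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(atoms[k], x)))


atoms = [[0.0, 0.0], [4.0, 0.0], [0.0, 3.0], [2.0, 2.0]]
k, code = vq_encode(atoms, [1.8, 2.5])
```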

14 Sparse Coding Problem. Assume only L non-zero coefficients: $\min_\alpha \|x - W\alpha\|_2^2$ subject to $\|\alpha\|_0 \le L$. This is a much harder combinatorial problem: in the worst case there are $\binom{K}{L}$ possible active sets. If we knew the active set of coefficients, the problem would reduce to least squares. Two very effective families of approximate algorithms: greedy algorithms and relaxation algorithms.
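To make the combinatorial nature concrete, the sketch below solves the L-sparse problem exactly by brute force: it enumerates every active set of size L with `itertools.combinations` and solves a least squares problem (via the normal equations) on each. It is only feasible for tiny K; the function names are hypothetical, and it assumes every set of L atoms is linearly independent.

```python
import itertools
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system A y = b."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][j] * y[j] for j in range(r + 1, n))) / M[r][r]
    return y


def l0_exhaustive(atoms, x, L):
    """Try all C(K, L) active sets; least squares on each via the normal equations."""
    best_err, best_code = math.inf, None
    for S in itertools.combinations(range(len(atoms)), L):
        G = [[dot(atoms[i], atoms[j]) for j in S] for i in S]
        coef = solve(G, [dot(atoms[i], x) for i in S])
        r = [xd - sum(c * atoms[i][d] for c, i in zip(coef, S)) for d, xd in enumerate(x)]
        err = dot(r, r)
        if err < best_err:
            best_err, best_code = err, [0.0] * len(atoms)
            for c, i in zip(coef, S):
                best_code[i] = c
    return best_code, best_err


atoms = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]  # K = 4, D = 3
code, err = l0_exhaustive(atoms, [1, 1, 3], L=2)  # searches C(4, 2) = 6 active sets
```

The count grows quickly: for an illustrative K = 256 and L = 4, `math.comb(256, 4)` is 174,792,640 active sets, which is why the greedy and relaxation families are used instead.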

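15 Greedy Sparse Coding: Matching Pursuit. Greedily add T terms one by one. Algorithm (basic matching pursuit, unit-norm atoms):
1. Initialize the residual $r = x$ and the code $\alpha = 0$.
2. Find the atom that best explains the residual, $k^* = \arg\max_k |w_k^T r|$ (a VQ problem at each iteration).
3. With $c = w_{k^*}^T r$, update the code and the residual: $\alpha_{k^*} \leftarrow \alpha_{k^*} + c$, $r \leftarrow r - c\, w_{k^*}$.
4. Return if the stopping criterion is met; otherwise go to step 2.
Many variants exist (e.g., orthogonal matching pursuit, OMP), with efficient implementations. Mallat (2009); SPAMS.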
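A direct transcription of basic matching pursuit into Python, assuming unit-norm atoms stored as lists and a stopping rule of "T steps or a numerically zero residual" (the tolerance and the function name are our choices):

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(atoms, x, T, tol=1e-12):
    """Basic matching pursuit with unit-norm atoms: at most T greedy steps."""
    r = list(x)                      # 1. residual r = x, code alpha = 0
    alpha = [0.0] * len(atoms)
    for _ in range(T):
        scores = [dot(w, r) for w in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))  # 2. best atom
        c = scores[k]
        alpha[k] += c                # 3. update the code and the residual
        r = [ri - c * wi for ri, wi in zip(r, atoms[k])]
        if dot(r, r) <= tol:         # 4. stop once the residual is numerically zero
            break
    return alpha, r


s = 1 / math.sqrt(2)
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]  # over-complete, unit-norm atoms in D = 2
alpha, r = matching_pursuit(atoms, [1.0, 1.0], T=5)
```

On this toy dictionary the diagonal atom explains $x = (1, 1)$ in a single step, whereas the orthonormal pair alone would need two.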

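16 Basic Matching Pursuit Convergence Analysis. Exponential convergence (recall the VQ analysis): with unit-norm atoms, each step removes $|w_{k^*}^T r_t|^2$ from the residual energy, so $\|r_{t+1}\|_2^2 \le (1 - \mu^2)\|r_t\|_2^2$ and hence $\|r_T\|_2^2 \le (1 - \mu^2)^T \|x\|_2^2$, where the dictionary coherence is $\mu = \min_{r \neq 0} \max_k |w_k^T r| / \|r\|_2$. Note that $\mu > 0$ if the dictionary spans $\mathbb{R}^D$. Basic matching pursuit costs T times more than VQ.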

17 Relaxed Sparse Coding. Continuous relaxation of the combinatorial problem: $\min_\alpha \|x - W\alpha\|_2^2 + \lambda\|\alpha\|_p^p$. Prominent case: p = 1 (L1, convex optimization).

18 Basis Pursuit Coding. L1-penalized problem (a.k.a. basis pursuit, LASSO): $\min_\alpha \|x - W\alpha\|_2^2 + \lambda\|\alpha\|_1$. Since the problem is convex, the global optimum can be found. Huge literature: algorithms for large-scale problems; recovery guarantees (compressed sensing); related extensions and tools such as TV minimization and ADMM. Further extensions: re-weighted L1; non-convex relaxations with 0 < p < 1. Mallat (2009), Elad (2010); SPAMS.
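The slide does not name a particular solver. As one standard choice, the sketch below uses iterative soft thresholding (ISTA), a proximal-gradient method, for $\min_\alpha \|x - W\alpha\|_2^2 + \lambda\|\alpha\|_1$. The function names, the fixed iteration count, and the step size derived from the Frobenius norm (a safe but conservative bound on the largest eigenvalue of $W^T W$) are our assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft(v, t):
    """Soft thresholding: sign(v) * max(|v| - t, 0)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def ista(atoms, x, lam, n_iter=500):
    """Minimize ||x - W alpha||_2^2 + lam * ||alpha||_1 by iterative soft thresholding."""
    K, D = len(atoms), len(x)
    # c = ||W||_F^2 >= largest eigenvalue of W^T W, so the gradient step 1 / (2c) is safe.
    c = sum(dot(w, w) for w in atoms)
    alpha = [0.0] * K
    for _ in range(n_iter):
        r = [x[d] - sum(alpha[k] * atoms[k][d] for k in range(K)) for d in range(D)]
        # The gradient of the quadratic term is -2 W^T r: take the step, then shrink.
        alpha = [soft(a + dot(w, r) / c, lam / (2 * c)) for a, w in zip(alpha, atoms)]
    return alpha


alpha = ista([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

With an orthonormal dictionary the fixed point is soft thresholding of $W^T x$ at $\lambda/2$, so this example converges to $(2.5, 0)$: the small coefficient is set exactly to zero.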

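19 Thresholding Algorithms. Lp optimization with an orthonormal basis $W$ decomposes into a separable problem: the L2 norm is invariant to rotation, so $\|x - W\alpha\|_2^2 = \|W^T x - \alpha\|_2^2$, and the Lp norm is separable, which gives K independent 1-D problems $\min_{\alpha_k} (\beta_k - \alpha_k)^2 + \lambda|\alpha_k|^p$ with $\beta = W^T x$. Each 1-D optimization can be tabulated in a look-up table; closed forms exist for L0 / L1 (hard/soft thresholding) and L2 (linear shrinkage). Elad (2010)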
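A sketch of the 1-D subproblem $\min_a (\beta - a)^2 + \lambda|a|^p$. For this scaling of the penalty, p = 0 keeps $\beta$ only when $\beta^2 > \lambda$ (hard thresholding), p = 1 shrinks by $\lambda/2$ (soft thresholding), and p = 2 scales by $1/(1+\lambda)$ (linear shrinkage). A brute-force grid solver, standing in for a look-up table and usable for any p, is included as a cross-check. All function names and the grid size are hypothetical choices.

```python
import math


def hard_threshold(beta, lam):
    # p = 0: keep beta only if it pays for the penalty lam, i.e. beta^2 > lam.
    return beta if beta * beta > lam else 0.0


def soft_threshold(beta, lam):
    # p = 1: shrink toward zero by lam / 2.
    return math.copysign(max(abs(beta) - lam / 2.0, 0.0), beta)


def linear_shrinkage(beta, lam):
    # p = 2: scale by 1 / (1 + lam).
    return beta / (1.0 + lam)


def shrink_lookup(beta, lam, p, n_grid=20001):
    """Numerically minimize (beta - a)^2 + lam * |a|^p over a grid on [0, beta]."""
    best_a, best_cost = 0.0, beta * beta  # a = 0 pays no penalty
    for i in range(1, n_grid):
        a = beta * i / (n_grid - 1)
        cost = (beta - a) ** 2 + lam * abs(a) ** p
        if cost < best_cost:
            best_a, best_cost = a, cost
    return best_a
```

Searching only $[0, \beta]$ is enough here because, for these penalties, the minimizer has the same sign as $\beta$ and no larger magnitude.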

20 Recap: (Sparse) Coding Problem. Find the expansion coefficients given the dictionary. Exact methods: p = 2 (Fourier, PCA, etc.): a linear system; VQ (a single non-zero coefficient, equal to one): fast search; orthonormal dictionary: separable 1-D optimization. Approximate methods for sparse coding: p = 0: greedy matching pursuit; p = 1: convex relaxation.

21 Dictionary Learning. Find a dictionary W that best fits a dataset. For the L2 norm there is an exact solution via the SVD (PCA). For sparse norms this is a hard non-convex problem, even if the coding problem is convex. Main approach: alternating minimization. Recent advances in theory.

22 Alternating Minimization Methods. Update the dictionary given the codes (a least squares problem); update the codes given the dictionary, using any greedy or relaxation sparse coding algorithm. The method converges to a local minimum. K-SVD updates the dictionary elements sequentially; an online version is much faster for large datasets. Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010)

23 K-Means as Dictionary Learning Method. Update the dictionary given the codes; update the codes given the dictionary, such that each code has a single non-zero coefficient ($\|\alpha_i\|_0 = 1$). Special case of K-SVD using OMP-1 for coding. Extremely fast. Aharon+ (2006); Coates, Lee, Ng (2011)
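A sketch of K-means as dictionary learning, in the gain-shape ("spherical") form: each patch is coded with OMP-1 (one atom and its inner product), and each atom is then replaced by the normalized sum of its old value and the coefficient-weighted patches assigned to it. This follows the spirit of the Coates et al. line of work, but the update details, the `init` argument, and the function names are our assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v] if n > 0 else list(v)


def omp1(atoms, x):
    """OMP with a single atom: pick k maximizing |<w_k, x>| and keep that inner product."""
    scores = [dot(w, x) for w in atoms]
    k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
    return k, scores[k]


def kmeans_dictionary(patches, K, n_iter=20, init=None, seed=0):
    """Gain-shape K-means: alternate OMP-1 coding and a normalized dictionary update."""
    if init is None:
        init = random.Random(seed).sample(patches, K)  # random patches as initial atoms
    atoms = [normalize(w) for w in init]
    D = len(patches[0])
    for _ in range(n_iter):
        codes = [omp1(atoms, x) for x in patches]  # exactly one non-zero per patch
        new = [list(w) for w in atoms]  # keeping the old atom protects unused atoms
        for x, (k, s) in zip(patches, codes):
            for d in range(D):
                new[k][d] += s * x[d]
        atoms = [normalize(w) for w in new]
    return atoms


patches = [[2.0, 0.0], [-1.0, 0.0], [0.0, 3.0], [0.0, -1.0]]
atoms = kmeans_dictionary(patches, 2, init=[[1.0, 0.2], [0.2, 1.0]])
```

Because coefficients may be negative, a patch and its negation pull an atom in the same direction; this is the "opposite polarity" redundancy that a later slide discusses.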

24 Learned Dictionaries. Aharon+ (2006). [Figure: a generic K-SVD dictionary and a K-SVD dictionary adapted to the Barbara image.]

25 Learned Dictionaries. Aharon+ (2006), Coates+ (2011), Papandreou+ (2014). [Figure: a generic K-SVD dictionary and a generic K-means dictionary.]

26 Image Denoising with Learned Dictionaries. Noisy: 22.1 dB; denoised with K-SVD: 30.8 dB. Aharon+ (2006)

27 Image Inpainting with Learned Dictionaries. Joint dictionary learning and image inpainting. Mairal+ (2010)

28 K-SVD vs. K-Means Dictionaries in Denoising. Replace K-SVD with K-means in the dictionary learning step of the denoising algorithm. Noisy: 22.12 dB. K-SVD (OMP-32): 32.43 dB, 84 sec. K-means (OMP-1): 32.25 dB, 22 sec.

29 Recap: Dictionary Learning. Find a dictionary W that best fits a dataset. Non-convex problem; greedy alternating optimization methods. The K-means algorithm is very fast and works well for small image patches.

30 Image Patch Dictionaries in Visual Recognition. SIFT-based bag-of-words classification pipeline: patches, SIFT descriptors, dictionary (>10K words), classifier.

31 Patch Dictionaries in Image Classification. Image classification without SIFT. Varma, Zisserman (2003); Coates+ (2011). Key insights: K-means works well; whitening is crucial; larger dictionaries boost the recognition rate; the encoding has a huge effect on performance; promising results on CIFAR, but not on large image datasets.

32 Histograms of Sparse Codes for Object Detection. Ren, Ramanan (2013); also see Dikmen, Hoiem, Huang (2012). Key idea: build a HOG-like descriptor on top of a K-SVD-learned patch dictionary instead of gradients, then apply a deformable part model (DPM).

33 Hierarchical Modeling and Dictionary Learning. So far: modeling the appearance of small image patches, say 8x8 pixels. How about dictionaries of larger visual patterns? (1) Multiscale modeling: work with image pyramids. (2) Hierarchical modeling: model higher-order statistics of feature responses; recursively compose complex visual patterns; use unsupervised or supervised objectives.

34 Hierarchical Models of Objects. Fidler & Leonardis (2007); Zhu+ (2010)

35 Hierarchical Matching Pursuit (K-SVD). Bo, Ren, Fox (2013)

36 Deep Convolutional Networks. LeCun+ (1998); Krizhevsky+ (2012)

37 Transformation-Aware Dictionaries. How can we span the space of all 8x8 image patches? [Figure: a patch as a weighted sum $\sum_k \alpha_k w_k$ of $K$ dictionary elements of dimension $D$.]

38 Sources of Redundancy in Patch Dictionaries. How can we build less redundant dictionaries? The same pattern appears at different positions, with opposite polarity (x2 redundancy), and with different contrast.

39 The Epitome Data Structure. Epitomes: Jojic, Frey, Kannan, ICCV-03. [Figure: a patch and an epitome.]

40 Generating Patches from an Epitome

41 Generating Patches from an Epitome. A single epitome is essentially a large collection of translated copies of a visual pattern.
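A minimal sketch of patch generation, assuming a square epitome stored as a list of rows: every position of an n x n window inside an m x m epitome yields one patch, for (m - n + 1)^2 translated patches in total. The function name is hypothetical.

```python
def epitome_patches(epitome, n):
    """All n x n patches of a square epitome (list of rows), keyed by top-left position (r, c)."""
    m = len(epitome)
    return {
        (r, c): [row[c:c + n] for row in epitome[r:r + n]]
        for r in range(m - n + 1)
        for c in range(m - n + 1)
    }


# A 12x12 epitome with 8x8 patches yields (12 - 8 + 1)^2 = 25 translated patches.
epitome = [[12 * i + j for j in range(12)] for i in range(12)]
patches = epitome_patches(epitome, 8)
```

Read this way, an epitome works like a standard dictionary whose elements are all of these overlapping windows, while storing only m x m values.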

42 Position and Appearance Transformations

43 Epitomic Image Matching. Epitomes: Jojic, Frey, Kannan, ICCV-03

44 Dictionary of Mini-Epitomes. Papandreou, Chen, Yuille, CVPR-14

45 Coding and Learning with Epitomic Dictionaries. Dictionary learning: variational inference on a GMM model (Jojic+ '01); sparse dictionary learning (Aharon, Elad '08; Mairal+ '11); epitomic K-means (Papandreou+ '14). Patch coding in epitomic dictionaries: an epitomic dictionary is equivalent to a standard dictionary containing the patches at all possible positions in the epitome.

46 K-Means for the Mini-Epitome Model. Generative model: (1) select mini-epitome k with probability $\pi_k$; (2) select a position p within the epitome uniformly; (3) generate the patch. Epitomic K-means (hard EM): (1) epitomic matching (hard assignment); (2) epitome update; (3) diverse initialization with K-means++ (optional). Papandreou, Chen, Yuille, CVPR-14

47 K-Means for the Mini-Epitome Model. Maximum likelihood with hard EM: essentially an epitomic adaptation of K-means. Generative model: (1) select mini-epitome k with probability $\pi_k$; (2) select a position p within the epitome uniformly; (3) generate the patch. Convergence is faster with a diverse initialization of the mini-epitomes by an epitomic adaptation of K-means++.
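A sketch of the hard-EM loop, assuming square mini-epitomes and patches stored as lists of rows, plain squared error (no whitening, no contrast normalization), and uniform $\pi_k$. The matching step assigns each patch to the best (mini-epitome, position) pair; the update step sets each epitome pixel to the mean of all assigned patch pixels that cover it. Function names are hypothetical, and K-means++ initialization is left out.

```python
def extract(epitome, r, c, n):
    """The n x n window of `epitome` with top-left corner (r, c)."""
    return [row[c:c + n] for row in epitome[r:r + n]]


def sq_err(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def match(patch, epitomes):
    """Hard assignment: best (error, k, r, c) over all mini-epitomes k and positions (r, c)."""
    n = len(patch)
    best = None
    for k, e in enumerate(epitomes):
        span = len(e) - n + 1
        for r in range(span):
            for c in range(span):
                err = sq_err(patch, extract(e, r, c, n))
                if best is None or err < best[0]:
                    best = (err, k, r, c)
    return best


def epitomic_kmeans(patches, epitomes, n_iter=10):
    """Hard EM: alternate epitomic matching and epitome update; returns the error history."""
    epitomes = [[list(row) for row in e] for e in epitomes]
    n = len(patches[0])
    history = []
    for _ in range(n_iter):
        # 1. Epitomic matching (hard assignment to a mini-epitome and a position).
        assignments = [match(p, epitomes) for p in patches]
        history.append(sum(a[0] for a in assignments))
        # 2. Epitome update: each pixel becomes the mean of the assigned pixels covering it.
        sums = [[[0.0] * len(e) for _ in e] for e in epitomes]
        counts = [[[0] * len(e) for _ in e] for e in epitomes]
        for p, (_, k, r, c) in zip(patches, assignments):
            for i in range(n):
                for j in range(n):
                    sums[k][r + i][c + j] += p[i][j]
                    counts[k][r + i][c + j] += 1
        for k, e in enumerate(epitomes):
            for i in range(len(e)):
                for j in range(len(e)):
                    if counts[k][i][j]:  # pixels no patch covers keep their old value
                        e[i][j] = sums[k][i][j] / counts[k][i][j]
    return epitomes, history


true_epitome = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
patches = [extract(true_epitome, r, c, 2) for r in range(2) for c in range(2)]
learned, history = epitomic_kmeans(patches, [[[0.0] * 3 for _ in range(3)]], n_iter=5)
```

As in ordinary K-means, neither step can increase the total squared error, so `history` is non-increasing.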

48 A Generic Mini-Epitome Dictionary. Epitomic dictionary: 256 mini-epitomes (16x16). Non-epitomic dictionary: 1024 elements (8x8). Both trained on 10,000 Pascal images.

49 Evaluation on Image Reconstruction. Original image vs. epitome reconstruction (PSNR: 29.2 dB), an improvement over the non-epitomic dictionary.

50 Evaluation on Image Reconstruction

51 Evaluation on VOC-07 Image Classification

52 Max-Pooling vs. Epitomic Convolution. [Figure: max-pooling; epitomic convolution.]

53 Deep Epitomic Convolutional Nets. Convolution + max-pooling vs. epitomic convolution. Papandreou, arXiv-14. ImageNet top-5 error: 14.2% (max-pool) vs. 13.6% (epitome).

54 Epitomic Patch Matching. (1) We have K mini-epitomes (say the patch size is 8x8 pixels and the mini-epitome size is 12x12 pixels). (2) For each patch in the image and each mini-epitome k = 1:K, find the patch at position p in the epitome that minimizes the reconstruction error (whitening omitted); this example has (12-8+1)^2 = 25 candidate positions per epitome. (3) Algorithms: exact search (GPU, <0.5 sec/image), ANN, or a dynamic programming algorithm.
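A sketch of exact epitomic matching for the slide's example sizes (8x8 patches, 12x12 mini-epitomes). It assumes plain squared error as the reconstruction error, without whitening, and uses random mini-epitomes as stand-in data. It scans all 25 positions of each mini-epitome; the function names are hypothetical.

```python
import random


def extract(epitome, r, c, n):
    """The n x n patch of `epitome` whose top-left corner is at (r, c)."""
    return [row[c:c + n] for row in epitome[r:r + n]]


def sq_err(a, b):
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def num_positions(m, n):
    """Candidate positions of an n x n patch inside an m x m mini-epitome."""
    return (m - n + 1) ** 2


def epitomic_match(patch, epitomes):
    """Exact search over every mini-epitome k and position p = (r, c); returns (err, k, p)."""
    n = len(patch)
    best = None
    for k, e in enumerate(epitomes):
        span = len(e) - n + 1
        for r in range(span):
            for c in range(span):
                cand = (sq_err(patch, extract(e, r, c, n)), k, (r, c))
                if best is None or cand[0] < best[0]:
                    best = cand
    return best


rng = random.Random(0)
epitomes = [[[rng.random() for _ in range(12)] for _ in range(12)] for _ in range(3)]
patch = extract(epitomes[1], 3, 1, 8)  # plant a patch from mini-epitome 1, position (3, 1)
err, k, p = epitomic_match(patch, epitomes)
```

The cost is K x 25 patch comparisons per image patch, which is why the slide points to GPU exact search or approximate alternatives.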

55 Epitomic Match vs. Max-Pooling. (1) The position search is equivalent to epitomic convolution. (2) Epitomic convolution is an image-centric alternative to convolution followed by max-pooling: it is much easier to define image probability models based on epitomic convolution (EC) than on max-pooling (MP). Evaluation in discriminative tasks is underway.
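A 1-D toy sketch of the contrast, under our reading of the slide: with convolution + max-pooling, the filter is fixed and the input window slides over the pooling region; with epitomic convolution, the input patch is fixed and the filter slides within a larger epitome. Both take a maximum over the same number of offsets. The 1-D setting, the function names, and these exact definitions are assumptions for illustration, not the paper's code.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conv_maxpool(signal, w, i, pool):
    """Fixed filter w, sliding input window: max over p < pool of <w, signal[i+p : i+p+n]>."""
    n = len(w)
    return max(dot(w, signal[i + p:i + p + n]) for p in range(pool))


def epitomic_conv(signal, epitome, i, n):
    """Fixed input patch x = signal[i : i+n], sliding filter: max_p <epitome[p : p+n], x>."""
    x = signal[i:i + n]
    return max(dot(epitome[p:p + n], x) for p in range(len(epitome) - n + 1))


signal = [0, 1, 0, 0, 0]
mp = conv_maxpool(signal, [1, 0], 0, pool=2)  # input windows [0, 1] and [1, 0]
ec = epitomic_conv(signal, [0, 1, 0], 0, 2)   # filters [0, 1] and [1, 0] against x = [0, 1]
```

Both respond to the shifted spike. In the epitomic version each output is tied to a single input patch, which is what makes it simpler to embed in a patch-based probability model.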

56 Recap: Transformation-Aware Dictionaries. Reduce dictionary redundancy by explicitly modeling nuisance variables. Compact dictionaries for image reconstruction and recognition. Epitomes as a translation-aware data structure. Epitomic convolution as an alternative to a pair of consecutive convolution and max-pooling layers in deep networks.
