Published by Dalton Topliff. Modified over 3 years ago.

1
**Visual Dictionaries George Papandreou CVPR 2014 Tutorial on BASIS**

George Papandreou, Toyota Technological Institute at Chicago. CVPR 2014 Tutorial on BASIS.

2
**Additive Image Patch Modeling**

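The patch-based image modeling approach. How to span the space of all 8x8 image patches? Each patch is modeled as a weighted sum (Σ) of dictionary atoms with coefficients α1, α2, α3, …, where the dictionary holds K atoms of dimension D. Atoms shown correspond to the real part of the filters (source: the SPM DT-CWT review article).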

4
**Two Modeling Goals**

Image reconstruction: Use dictionary to build image prior. Tasks: compression, denoising, deblurring, inpainting, …

Image interpretation: Use dictionary for feature extraction. Tasks: classification, recognition, …

5
**Three Modeling Regimes**

Two inter-related properties:
* Over-completeness: How big is the dictionary?
* Sparsity: How many non-zero components?

The three regimes: PCA, sparse coding, clustering.

6
**Where Does the Dictionary Come From?**

(1) Dictionary is fixed, e.g., a basis or a union of bases. Examples: DCT (JPEG image compression), wavelets.

7
**Where Does the Dictionary Come From?**

(2) Learn a generic dictionary from a collection of images. Many algorithms are possible (see later).

8
**Where Does the Dictionary Come From?**

(3) Learn an image-specific (image-adapted) dictionary. Many algorithms are possible (see later).

9
**Where Does the Dictionary Come From?**

(4) Non-parametric: The dictionary is the set of all overlapping image patches (from one or many images). Examples: non-local means, patch transform, etc.

10
**Beyond Bases: Hierarchical Dictionaries**

(1) Multi-scale image modeling: Apply the same dictionary to the image at different scales. Examples: Gaussian and Laplacian pyramids, wavelets, …

(2) Recursive hierarchical models: Build recursive dictionaries. Example: deep learning.

11
**Key Problems**

* Coding: Find the expansion coefficients given the dictionary.
* Dictionary learning: Given data, learn a dictionary.
* Hierarchical modeling.

12
**Image Coding Problem: Least Squares**

Least squares criterion, with equivalent formulations. The solution (Tikhonov regularization, Wiener filtering) is linear in the patch: columns of V are the dual filters (dual dictionary). Fast processing (inner products). Yields a dense code.
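The slide's equations are not in this transcript, so the following is only a sketch under stated assumptions: the criterion is taken to be the penalized form min_α ||x - Wα||² + λ||α||², with W a D x K dictionary, and the dual dictionary is then V = W (WᵀW + λI)⁻¹. The sizes, the value of `lam`, and the name `dual_dictionary` are illustrative.

```python
import numpy as np

def dual_dictionary(W, lam):
    """Return V (D x K) so that the regularized least-squares code is V.T @ x."""
    K = W.shape[1]
    # Solve (W^T W + lam * I) Z = W^T for Z = V^T instead of forming an inverse.
    Vt = np.linalg.solve(W.T @ W + lam * np.eye(K), W.T)
    return Vt.T

rng = np.random.default_rng(0)
D, K, lam = 64, 32, 0.1          # 8x8 patches, 32 atoms (illustrative sizes)
W = rng.standard_normal((D, K))  # stand-in dictionary, one atom per column
x = rng.standard_normal(D)       # stand-in vectorized patch
V = dual_dictionary(W, lam)      # computed once per dictionary
alpha = V.T @ x                  # coding is then K inner products with the dual filters
```

Because V depends only on the dictionary, it can be precomputed, which is what makes coding fast; the resulting code is generally dense.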

13
**Image Coding Problem: Vector Quantization**

Vector quantization has equivalent formulations. Solution:
* Exact, O(DK): one inner product for each basis element.
* Approximate, O(D log K): approximate nearest neighbor (ANN) search.
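A minimal sketch of the exact O(DK) search, assuming the single-coefficient formulation with unit-norm atoms (the slide's own formulas are not in the transcript). Under that assumption the best atom is the one with the largest absolute inner product, and the inner product is its coefficient. The name `vq_encode` is illustrative.

```python
import numpy as np

def vq_encode(x, W):
    """Exact single-atom coding; W is D x K with unit-norm columns (assumed)."""
    scores = W.T @ x                     # K inner products, O(DK) in total
    k = int(np.argmax(np.abs(scores)))   # atom most correlated with x
    return k, float(scores[k])           # atom index and its coefficient

W = np.eye(4)[:, :3]                     # three orthonormal atoms in R^4
x = np.array([0.5, -2.0, 1.0, 0.0])
k, a = vq_encode(x, W)
```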

14
**Sparse Coding Problem**

Assume only L non-zero coefficients. This is a much harder combinatorial problem. In the worst case there are $\binom{K}{L}$ possible active sets. If we knew the active set of coefficients, the problem would reduce to least squares. Two very effective families of approximate algorithms:
* Greedy algorithms
* Relaxation algorithms
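To make the combinatorial cost concrete, here is a brute-force sketch that tries every active set of size L and solves a least squares problem on each. It is only practical for tiny K and is not a method the tutorial proposes; the function name and problem sizes are illustrative.

```python
import itertools
from math import comb
import numpy as np

def exhaustive_sparse_code(x, W, L):
    """Try every size-L active set; solve least squares on each; keep the best."""
    K = W.shape[1]
    best_err, best_alpha = np.inf, np.zeros(K)
    for active in itertools.combinations(range(K), L):
        cols = list(active)
        coef, *_ = np.linalg.lstsq(W[:, cols], x, rcond=None)
        err = float(np.sum((x - W[:, cols] @ coef) ** 2))
        if err < best_err:
            best_err = err
            best_alpha = np.zeros(K)
            best_alpha[cols] = coef
    return best_alpha, best_err

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 12))     # D = 8, K = 12
true = np.zeros(12)
true[[2, 7]] = [1.5, -0.5]           # a 2-sparse ground-truth code
x = W @ true
alpha, err = exhaustive_sparse_code(x, W, L=2)
n_sets = comb(12, 2)                 # number of active sets visited
```

Even at K = 1024 and L = 4 the count `comb(1024, 4)` exceeds 10^10, which is why greedy and relaxation methods are used instead.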

15
**Greedy Sparse Coding: Matching Pursuit**

Greedily add T terms one by one. Algorithm (Basic Matching Pursuit):
1. Initialize the residual r = x.
2. Find the atom that best explains the residual.
3. Update the residual.
4. Return if the stopping criterion is met, otherwise go to 2.

This is a VQ problem at each iteration. Many variants (e.g., OMP). Efficient implementations exist. Reference: Mallat (2009). Software: SPAMS.
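The four steps can be sketched as follows, assuming unit-norm atoms stored as columns of W and a stopping rule based on a fixed budget of T terms plus a residual tolerance (both the tolerance and the function name are illustrative choices).

```python
import numpy as np

def matching_pursuit(x, W, T, tol=1e-10):
    """Basic matching pursuit; columns of W are assumed to have unit norm."""
    alpha = np.zeros(W.shape[1])
    r = np.array(x, dtype=float)              # 1. initialize the residual
    for _ in range(T):
        scores = W.T @ r                      # 2. VQ step: correlate all atoms
        k = int(np.argmax(np.abs(scores)))    #    pick the best-matching atom
        alpha[k] += scores[k]                 #    accumulate its coefficient
        r = r - scores[k] * W[:, k]           # 3. update the residual
        if np.linalg.norm(r) <= tol:          # 4. stop early if x is explained
            break
    return alpha, r

W = np.eye(4)                                 # trivially orthonormal dictionary
x = np.array([3.0, -1.0, 0.5, 0.0])
alpha, r = matching_pursuit(x, W, T=2)
```

Unlike OMP, this basic variant does not re-fit the coefficients of previously selected atoms, so the same atom may be picked more than once on a non-orthogonal dictionary.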

16
**Basic Matching Pursuit Convergence Analysis**

Exponential convergence (recall VQ analysis). The rate is governed by the dictionary coherence, which is strictly positive if the dictionary spans the patch space. Basic matching pursuit costs T times more than VQ.

17
**Relaxed Sparse Coding**

Continuous relaxation of the combinatorial problem. Prominent case: p = 1 (L1 convex optimization).

18
**Basis Pursuit Coding**

L1-penalized problem (a.k.a. basis pursuit, LASSO). Global optimum (convex optimization). Huge literature:
* Algorithms for large-scale problems
* Recovery guarantees: compressed sensing
* Extensions: TV minimization, ADMM
* Extensions: re-weighted L1
* Non-convex relaxations: 0 < p < 1

References: Mallat (2009), Elad (2010). Software: SPAMS.
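As one concrete instance of an L1 solver, here is a sketch of iterative soft thresholding (ISTA). The slides do not name this algorithm; it is swapped in here for illustration. The objective is assumed to be 0.5 ||x - Wα||² + λ ||α||₁, and the iteration count is arbitrary.

```python
import numpy as np

def soft_threshold(c, t):
    """Shrink each entry of c toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def ista(x, W, lam, n_iter=200):
    """Minimize 0.5 * ||x - W a||^2 + lam * ||a||_1 by proximal gradient steps."""
    step = 1.0 / np.linalg.norm(W, 2) ** 2    # 1 / Lipschitz constant of the gradient
    a = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ a - x)              # gradient of the quadratic term
        a = soft_threshold(a - step * grad, step * lam)
    return a

W = np.eye(3)                                 # orthonormal case: closed-form answer
x = np.array([3.0, -0.5, 1.2])
a = ista(x, W, lam=1.0)
```

With an orthonormal dictionary the iteration reaches the soft-thresholded coefficients after a single step, which gives a convenient sanity check.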

19
**Thresholding Algorithms**

Lp-optimization with an orthonormal basis decomposes into a separable problem:
* The L2 norm is invariant to rotation.
* The Lp norm is separable.

Each coefficient is then found by a 1-D optimization (look-up table):
* L0 / L1: hard / soft thresholding
* L2: linear shrinkage

Elad (2010)
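A small sketch of the three 1-D rules applied to analysis coefficients in an orthonormal basis. The thresholds are passed directly as `t`; how `t` relates to the penalty weight depends on how the objective is scaled, which the transcript does not show. The linear shrinkage rule assumes the objective ||x - Wα||² + λ||α||².

```python
import numpy as np

def hard_threshold(c, t):
    """L0 rule: keep coefficients whose magnitude exceeds t, zero the rest."""
    return np.where(np.abs(c) > t, c, 0.0)

def soft_threshold(c, t):
    """L1 rule: shrink every coefficient toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def linear_shrinkage(c, lam):
    """L2 rule: scale every coefficient by 1 / (1 + lam)."""
    return c / (1.0 + lam)

rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # a random orthonormal basis
c = np.array([3.0, -0.5, 1.2])
x = W @ c                                         # signal synthesized from c
coeffs = W.T @ x                                  # rotation recovers c exactly
```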

20
**Recap: (Sparse) Coding**

Problem: Find the expansion coefficients given the dictionary.

Exact methods:
* p = 2 (Fourier, PCA, etc.): linear system
* p = 0 and L = 1 (VQ): fast search
* Orthonormal dictionary: separable 1-D optimization

Approximate methods for sparse coding:
* p = 0: greedy matching pursuit
* p = 1: convex relaxation

21
**Dictionary Learning**

Find a dictionary W that best fits a dataset.
* Exact solution for the L2 norm via the SVD (PCA).
* For sparse norms this is a hard non-convex problem, even if the coding problem is convex.
* Main approach: alternating minimization.
* Recent advances in theory.
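For the L2 case, a sketch of the SVD solution: the K leading left singular vectors of the data matrix give the best K-atom orthonormal dictionary in the least squares sense. The data are assumed to be zero-mean and stored as columns of X; the random data and the name `pca_dictionary` are illustrative.

```python
import numpy as np

def pca_dictionary(X, K):
    """Return the K leading left singular vectors of X (D x N) and all singular values."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :K], s

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 200))   # 200 stand-in patches of dimension 16
W, s = pca_dictionary(X, K=4)
A = W.T @ X                          # codes: projections onto the atoms
err = float(np.sum((X - W @ A) ** 2))
```

The squared reconstruction error equals the energy in the discarded singular values, which is the sense in which this solution is exact.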

22
**Alternating Minimization Methods**

Alternate between two steps:
1. Update the codes given the dictionary: use any greedy or relaxation sparse coding algorithm.
2. Update the dictionary given the codes: least squares.

The method converges to a local minimum. K-SVD updates the dictionary atoms sequentially. The online version is much faster for large datasets. Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010)
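A compact sketch of the alternating scheme, assuming matching pursuit for the coding step and a batch least squares dictionary update followed by renormalization of the atoms. The handling of unused atoms, the initialization from data columns, and all names and sizes are illustrative choices, not details from the slides.

```python
import numpy as np

def mp_codes(X, W, T):
    """Matching pursuit codes for every column of X (unit-norm atoms assumed)."""
    K, N = W.shape[1], X.shape[1]
    A = np.zeros((K, N))
    R = np.array(X, dtype=float)             # residuals, one column per sample
    cols = np.arange(N)
    for _ in range(T):
        S = W.T @ R                          # K x N correlations
        k = np.argmax(np.abs(S), axis=0)     # best atom for each sample
        c = S[k, cols]
        A[k, cols] += c
        R -= W[:, k] * c                     # remove the selected component
    return A

def learn_dictionary(X, K, T=2, n_iter=5, seed=0):
    rng = np.random.default_rng(seed)
    W = np.array(X[:, rng.choice(X.shape[1], K, replace=False)], dtype=float)
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    for _ in range(n_iter):
        A = mp_codes(X, W, T)                # update codes given dictionary
        W = X @ np.linalg.pinv(A)            # least squares dictionary update
        dead = np.linalg.norm(W, axis=0) < 1e-12
        W[:, dead] = rng.standard_normal((X.shape[0], int(dead.sum())))
        W /= np.linalg.norm(W, axis=0, keepdims=True)
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((16, 200))           # stand-in data, one sample per column
W = learn_dictionary(X, K=8)
A = mp_codes(X, W, T=2)
```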

23
**K-Means as Dictionary Learning Method**

1. Update the codes given the dictionary, such that each code has a single non-zero coefficient.
2. Update the dictionary given the codes.

Special case of K-SVD using OMP-1 for coding. Extremely fast. Aharon+ (2006); Coates, Lee, Ng (2011)
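A sketch of this K-means view, assuming unit-norm atoms, a single signed coefficient per sample, and a dictionary update that sums the coefficient-weighted samples assigned to each atom and renormalizes. This particular update and the names below are assumptions for illustration; the slide's formulas are not in the transcript.

```python
import numpy as np

def one_hot_codes(X, W):
    """OMP-1 style coding: one non-zero coefficient per column of X."""
    S = W.T @ X
    k = np.argmax(np.abs(S), axis=0)          # best atom for each sample
    cols = np.arange(X.shape[1])
    A = np.zeros_like(S)
    A[k, cols] = S[k, cols]
    return A

def kmeans_dictionary(X, K, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[0], K))
    W /= np.linalg.norm(W, axis=0, keepdims=True)
    for _ in range(n_iter):
        A = one_hot_codes(X, W)               # update codes given dictionary
        W_new = X @ A.T                       # weighted sum of assigned samples
        norms = np.linalg.norm(W_new, axis=0)
        used = norms > 1e-12                  # leave unused atoms unchanged
        W[:, used] = W_new[:, used] / norms[used]
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((16, 500))            # stand-in data, one sample per column
W = kmeans_dictionary(X, K=8)
A = one_hot_codes(X, W)
```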

24
**Learned Dictionaries**

Figure: generic K-SVD dictionary and K-SVD dictionary learned on the Barbara image. Aharon+ (2006)

25
**Learned Dictionaries**

Figure: generic K-SVD dictionary and generic K-Means dictionary. Aharon+ (2006), Coates+ (2011), Papandreou+ (2014)

26
**Image Denoising with Learned Dictionaries**

Noisy image: 22.1 dB. Denoised with K-SVD: 30.8 dB. Aharon+ (2006)

27
**Image Inpainting with Learned Dictionaries**

Joint dictionary learning and image inpainting Mairal+ (2010)

28
**K-SVD vs K-Means Dictionaries in Denoising**

Replace K-SVD with K-Means in the dictionary learning step of the denoising algorithm.
* K-SVD: OMP-32 coding, 84 sec
* K-Means: OMP-1 coding, 22 sec

29
**Recap: Dictionary Learning**

Find a dictionary W that best fits a dataset.
* Non-convex problem
* Greedy alternating optimization methods
* The K-means algorithm is very fast and works well for small image patches

30
**Image Patch Dictionaries in Visual Recognition**

SIFT-based Bag-of-Words classification pipeline: Patches → SIFT → Dictionary (>10K words) → Classifier

31
**Patch Dictionaries in Image Classification**

Image classification without SIFT. Key insights:
* K-means works well
* Whitening is crucial
* Using larger dictionaries boosts recognition rate
* Encoding has a huge effect on performance

Promising results on CIFAR but not on large image datasets. Varma, Zisserman (2003); Coates+ (2011)

32
**Histograms of Sparse Codes for Object Detection**

Key idea: Build a HOG-like descriptor on top of a K-SVD learned patch dictionary instead of gradients, then apply DPM. Ren, Ramanan (2013); also see Dikmen, Hoiem, Huang (2012)

33
**Hierarchical Modeling and Dictionary Learning**

So far: Modeling the appearance of small image patches, say 8x8 pixels. How about dictionaries of larger visual patterns?

Multiscale modeling:
* Work with image pyramids

Hierarchical modeling:
* Model higher order statistics of feature responses
* Recursively compose complex visual patterns
* Use unsupervised or supervised objectives

34
**Hierarchical Models of Objects**

Fidler & Leonardis (2007); Zhu+ (2010)

35
**Hierarchical Matching Pursuit (K-SVD)**

Bo, Ren, Fox (2013)

36
**Deep Convolutional Networks**

LeCun+ (1998); Krizhevsky+ (2012)

37
**Transformation Aware Dictionaries**

How to span the space of all 8x8 image patches?

38
**Sources of Redundancy in Patch Dictionaries**

* Same pattern, different position
* Same pattern, opposite polarity (x2 redundancy)
* Same pattern, different contrast

How to build less redundant dictionaries?

39
**The Epitome Data Structure**

Figure: patch and epitome. Epitomes: Jojic, Frey, Kannan, ICCV-03

40
**Generating Patches from an Epitome**

41
**Generating Patches from an Epitome**

A single epitome essentially is a large collection of translated copies of a visual pattern.

42
**Position and Appearance Transformations**

43
**Epitomic Image Matching**

Epitomes: Jojic, Frey, Kannan, ICCV-03

44
**Dictionary of Mini-Epitomes**

Papandreou, Chen, Yuille, CVPR-14

45
**Coding and Learning with Epitomic Dictionaries**

Patch coding in epitomic dictionaries: an epitomic dictionary is equivalent to a standard dictionary with patches at all possible positions in the epitome.

Dictionary learning:
* Variational inference on GMM model (Jojic+ '01)
* Sparse dictionary learning (Aharon, Elad '08; Mairal+ '11)
* Epitomic K-Means (Papandreou+ '14)
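The equivalence with a standard dictionary can be sketched by unrolling every epitome into its translated sub-patches. The array layout (K epitomes of size H x H, patches of size h x h, atoms as columns) and the function name are assumptions made for illustration.

```python
import numpy as np

def unroll_epitomes(epitomes, h):
    """epitomes: K x H x H array. Returns an (h*h) x (K * P) matrix, P = (H-h+1)^2."""
    K, H, _ = epitomes.shape
    atoms = []
    for k in range(K):
        for i in range(H - h + 1):
            for j in range(H - h + 1):
                # Each translated h x h window becomes one standard atom.
                atoms.append(epitomes[k, i:i + h, j:j + h].ravel())
    return np.stack(atoms, axis=1)

rng = np.random.default_rng(0)
epitomes = rng.standard_normal((4, 12, 12))   # 4 stand-in mini-epitomes
W = unroll_epitomes(epitomes, h=8)            # 4 * 25 = 100 atoms of dimension 64
```

The unrolled matrix is highly redundant, since neighboring atoms share most of their pixels; the epitome stores the same information with far fewer parameters.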

46
**K-Means for the Mini-Epitome Model**

Generative model:
1. Select mini-epitome k according to its prior probability.
2. Select position p within the epitome uniformly.
3. Generate the patch.

Epitomic K-means (hard-EM):
* Epitomic matching (hard assignment)
* Epitome update
* Diverse initialization with K-means++ (optional)

Papandreou, Chen, Yuille, CVPR-14

47
**K-Means for the Mini-Epitome Model**

Generative model:
1. Select mini-epitome k according to its prior probability.
2. Select position p within the epitome uniformly.
3. Generate the patch.

Max likelihood, hard EM: essentially an epitomic adaptation of K-Means. Faster convergence using diverse initialization of mini-epitomes by an epitomic adaptation of K-Means++.

48
**A Generic Mini-Epitome Dictionary**

* Epitomic dictionary: 256 mini-epitomes (16x16)
* Non-epitomic dictionary: 1024 elements (8x8)

Both trained on 10,000 Pascal images.

49
**Evaluation on Image Reconstruction**

Figure: original image; epitome reconstruction (PSNR: 29.2 dB); improvement over non-epitome.

50
**Evaluation on Image Reconstruction**

51
**Evaluation on VOC-07 Image Classification**

52
**Max-Pooling vs. Epitomic Convolution**

53
**Deep Epitomic Convolutional Nets**

Convolution + max-pooling vs. epitomic convolution. ImageNet top-5 error: 14.2 (max-pool), 13.6 (epitome). Papandreou, arXiv-14

54
**Epitomic Patch Matching**

1. We have K mini-epitomes (say patch size is 8x8 pixels and mini-epitome size is 12x12 pixels).
2. For each patch in the image and each mini-epitome k = 1:K, find the patch at position p in the epitome which minimizes the reconstruction error (whitening omitted). There are (12-8+1)^2 = 25 candidate positions per epitome in this example.
3. Algorithms: exact search (GPU, <0.5 sec/image), ANN, or a dynamic programming algorithm.
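The exact search in step 2 can be sketched on the CPU as a triple loop over epitomes and positions. The reconstruction error is assumed here to be the plain squared difference, with no gain, contrast, or whitening terms, since the slide's formula is not in the transcript; the function name is illustrative.

```python
import numpy as np

def epitomic_match(patch, epitomes):
    """patch: h x h array; epitomes: K x H x H array.
    Returns (k, (row, col), err) for the best epitome and position."""
    h = patch.shape[0]
    K, H, _ = epitomes.shape
    best = (None, None, np.inf)
    for k in range(K):
        for i in range(H - h + 1):            # (H - h + 1)^2 positions per epitome
            for j in range(H - h + 1):
                window = epitomes[k, i:i + h, j:j + h]
                err = float(np.sum((patch - window) ** 2))
                if err < best[2]:
                    best = (k, (i, j), err)
    return best

rng = np.random.default_rng(0)
epitomes = rng.standard_normal((3, 12, 12))   # 3 stand-in mini-epitomes
patch = epitomes[1, 2:10, 3:11].copy()        # an 8x8 patch cut from epitome 1
k, pos, err = epitomic_match(patch, epitomes)
n_positions = (12 - 8 + 1) ** 2
```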

55
**Epitomic Match vs. Max Pooling**

1. Position search is equivalent to epitomic convolution.
2. Epitomic convolution is an image-centric alternative to convolution followed by “max-pooling”:
* It is much easier to define image probabilistic models based on epitomic convolution (EC) than on max-pooling (MP).
* Evaluation in discriminative tasks is underway.
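A 1-D sketch of the contrast, under the assumption that epitomic convolution takes, for each fixed image patch, the maximum inner product over all sub-windows of the epitome, whereas convolution followed by max-pooling takes the maximum response of one fixed filter over neighboring image positions. The formula on the slide is not in the transcript, and both function names are illustrative.

```python
import numpy as np

def epitomic_conv1d(x, e, h):
    """For each length-h patch of x, max inner product over sub-windows of epitome e."""
    n_pos = len(e) - h + 1
    return np.array([
        max(float(x[i:i + h] @ e[p:p + h]) for p in range(n_pos))
        for i in range(len(x) - h + 1)
    ])

def conv_maxpool1d(x, w, pool):
    """Correlate x with one filter w, then max-pool with stride 1."""
    h = len(w)
    resp = np.array([float(x[j:j + h] @ w) for j in range(len(x) - h + 1)])
    return np.array([resp[j:j + pool].max() for j in range(len(resp) - pool + 1)])

x = np.array([2.0, 5.0, 1.0])                 # toy signal
e = np.array([1.0, 0.0, -1.0])                # toy epitome with two length-2 windows
ec = epitomic_conv1d(x, e, h=2)               # search happens inside the epitome
mp = conv_maxpool1d(x, e[:2], pool=2)         # search happens over image positions
```

In the epitomic version every output is tied to one specific image patch, which is one way to read "image-centric" on the slide.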

56
**Recap: Transformation Aware Dictionaries**

* Reduce dictionary redundancy by explicitly modeling nuisance variables.
* Compact dictionaries for image reconstruction and recognition.
* Epitomes as a translation-aware data structure.
* Epitomic convolution as an alternative to a pair of consecutive convolution and max-pooling layers in deep networks.
