Presentation on theme: "Visual Dictionaries — George Papandreou, CVPR 2014 Tutorial on BASIS" — Presentation transcript:
1 Visual Dictionaries
George Papandreou
CVPR 2014 Tutorial on BASIS
Toyota Technological Institute at Chicago
2 Additive Image Patch Modeling
The patch-based image modeling approach: how to span the space of all 8x8 image patches?
[Figure: a patch x expanded as a weighted sum x ≈ Σ_k α_k w_k over K dictionary atoms of dimension D]
Atoms shown correspond to the real part of the filters (source: the SPM DT-CWT review article).
4 Two Modeling Goals
Image reconstruction: use the dictionary to build an image prior. Tasks: compression, denoising, deblurring, inpainting, …
Image interpretation: use the dictionary for feature extraction. Tasks: classification, recognition, …
5 Three Modeling Regimes
Two inter-related properties:
How big is the dictionary? (over-completeness)
How many non-zero components? (sparsity)
Regimes: PCA, Sparse Coding, Clustering
6 Where Does the Dictionary Come From? (1)
Dictionary is fixed, e.g., a basis or a union of bases.
Examples: DCT (JPEG image compression), wavelets.
7 Where Does the Dictionary Come From? (2)
Learn a generic dictionary from a collection of images.
Many algorithms possible (see later).
8 Where Does the Dictionary Come From? (3)
Learn an image-specific (image-adapted) dictionary.
Many algorithms possible (see later).
9 Where Does the Dictionary Come From? (4)
Non-parametric: the dictionary is the set of all overlapping image patches (from one or many images).
Non-local means, patch transform, etc.
10 Beyond Bases: Hierarchical Dictionaries
(1) Multi-scale image modeling: apply the same dictionary to the image at different scales (Gaussian/Laplacian pyramids, wavelets, …)
(2) Recursive hierarchical models: build recursive dictionaries (deep learning)
11 Key Problems
Coding: find the expansion coefficients given the dictionary.
Dictionary learning: given data, learn a dictionary.
Hierarchical modeling.
12 Image Coding Problem: Least Squares
Least-squares criterion: min_α ||x − Wα||² + λ||α||².
Solution (Tikhonov regularization, Wiener filtering): α = (WᵀW + λI)⁻¹ Wᵀ x = Vᵀ x.
Columns of V are the dual filters (dual dictionary).
Fast processing (inner products). Yields a dense code.
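The closed-form Tikhonov solution can be sketched in a few lines of numpy (an illustrative sketch, not the tutorial's code; the function names `ridge_code` and `dual_dictionary` are hypothetical):

```python
import numpy as np

def ridge_code(x, W, lam=0.1):
    """Dense code: alpha = (W^T W + lam*I)^{-1} W^T x."""
    K = W.shape[1]
    return np.linalg.solve(W.T @ W + lam * np.eye(K), W.T @ x)

def dual_dictionary(W, lam=0.1):
    """Dual dictionary V: its columns are the dual filters, so alpha = V^T x."""
    K = W.shape[1]
    return W @ np.linalg.inv(W.T @ W + lam * np.eye(K))
```

Because coding reduces to inner products with the dual filters, `V` can be precomputed once and applied to every patch.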
13 Image Coding Problem: Vector Quantization
Criterion: min over k and α of ||x − α w_k||², i.e., only one atom is active.
Solution: for unit-norm atoms, pick the k maximizing |w_kᵀ x|, with α = w_kᵀ x.
Exact, O(DK): one inner product for each atom.
Approximate, O(D log K): approximate nearest-neighbor (ANN) search.
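The exact O(DK) search is one matrix-vector product plus an argmax; a minimal sketch assuming unit-norm atoms (`vq_code` is a hypothetical name):

```python
import numpy as np

def vq_code(x, W):
    """Exact vector quantization: one inner product per atom (O(DK)).
    Assumes the columns of W are unit-norm; returns the best atom index
    and its coefficient alpha = w_k^T x."""
    scores = W.T @ x                 # inner product with every atom
    k = int(np.argmax(np.abs(scores)))
    return k, scores[k]
```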
14 Sparse Coding Problem
Assume only L non-zero coefficients: min_α ||x − Wα||² subject to ||α||₀ ≤ L.
This is a much harder combinatorial problem: in the worst case there are (K choose L) possible active sets.
If we knew the active set of coefficients, it would reduce to an LS problem.
Two very effective families of approximate algorithms:
Greedy algorithms
Relaxation algorithms
15 Greedy Sparse Coding: Matching Pursuit
Greedily add T terms one by one.
Algorithm (Basic Matching Pursuit):
1. Initialize the residual r = x
2. Find the atom that best explains the residual
3. Update the residual
4. Return if a stopping criterion is met, otherwise go to 2.
A VQ problem at each iteration.
Many variants (e.g., OMP). Efficient implementations (e.g., SPAMS).
Mallat (2009)
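The four steps above can be sketched directly in numpy (a minimal sketch of basic matching pursuit, assuming unit-norm atoms and a fixed budget of T terms as the stopping criterion):

```python
import numpy as np

def matching_pursuit(x, W, T):
    """Basic matching pursuit: greedily add T terms one by one."""
    r = x.astype(float).copy()        # 1. initialize the residual r = x
    alpha = np.zeros(W.shape[1])
    for _ in range(T):
        scores = W.T @ r              # 2. atom that best explains the residual
        k = int(np.argmax(np.abs(scores)))
        alpha[k] += scores[k]         #    (a VQ problem at each iteration)
        r -= scores[k] * W[:, k]      # 3. update the residual
    return alpha, r                   # 4. stop after T terms
```

With an orthonormal dictionary, T = K iterations recover the signal exactly; in general the residual decays exponentially, as the convergence analysis on the next slide states.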
16 Basic Matching Pursuit Convergence Analysis
Exponential convergence of the residual (recall the VQ analysis).
Dictionary coherence: μ = max over k ≠ l of |w_kᵀ w_l|.
Note that if the dictionary spans the signal space, the residual converges to zero.
Basic matching pursuit costs T times more than VQ.
17 Relaxed Sparse Coding
Continuous relaxation of the combinatorial problem: replace the L0 count with an Lp penalty.
Prominent case: p = 1 (L1, convex optimization).
18 Basis Pursuit Coding
L1-penalized problem (a.k.a. basis pursuit, LASSO): min_α ½||x − Wα||² + λ||α||₁.
Global optimum (convex optimization).
Huge literature:
Algorithms for large-scale problems
Recovery guarantees: compressed sensing
Extensions: TV minimization, ADMM, re-weighted L1, non-convex relaxations (0 < p < 1)
Mallat (2009), Elad (2010); SPAMS
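One standard large-scale solver for this L1 problem (my choice for illustration; the slide does not name a specific algorithm) is ISTA, proximal gradient descent with a soft-thresholding step:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, W, lam, n_iter=200):
    """ISTA for min_a 0.5*||x - W a||^2 + lam*||a||_1."""
    L = np.linalg.norm(W, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(W.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + W.T @ (x - W @ a) / L, lam / L)
    return a
```

For an orthonormal `W` this converges in one step to the soft-thresholded coefficients, matching the thresholding view on the next slide.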
19 Thresholding Algorithms
Lp-optimization with an orthonormal basis decomposes into a separable problem:
the L2 data term is invariant to rotation, and the Lp norm is separable.
Look-up table / 1-D optimization per coefficient:
L0 / L1: hard / soft thresholding
L2: linear shrinkage
Elad (2010)
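The three per-coefficient rules can be sketched together; this assumes the objective ½||x − Wα||² + λ||α||_p^p with orthonormal `W` (under other scalings of λ the thresholds change by constants), and `threshold_code` is a hypothetical name:

```python
import numpy as np

def threshold_code(x, W, lam, p):
    """Coding with an orthonormal W: rotate, then shrink each coefficient."""
    c = W.T @ x                                      # coefficients in the basis
    if p == 0:   # hard thresholding: keep c iff 0.5*c^2 > lam
        return np.where(np.abs(c) > np.sqrt(2.0 * lam), c, 0.0)
    if p == 1:   # soft thresholding
        return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)
    if p == 2:   # linear shrinkage
        return c / (1.0 + 2.0 * lam)
    raise ValueError("p must be 0, 1, or 2")
```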
20 Recap: (Sparse) Coding
Problem: find the expansion coefficients given the dictionary.
Exact methods:
p = 2 (Fourier, PCA, etc.): linear system
p = 0 with L = 1 (VQ): fast search
Orthonormal dictionary: separable 1-D optimization
Approximate methods for sparse coding:
p = 0: greedy matching pursuit
p = 1: convex relaxation
21 Dictionary Learning
Find a dictionary W that best fits a dataset.
Exact solution for the L2 norm via the SVD (PCA).
For sparse norms this is a hard non-convex problem, even though the coding problem alone is convex.
Main approach: alternating minimization.
Recent advances in theory.
22 Alternating Minimization Methods
1. Update the codes given the dictionary: use any greedy/relaxation sparse coding algorithm.
2. Update the dictionary given the codes: least squares.
The method converges to a local minimum.
K-SVD: updates dictionary atoms sequentially.
Online version much faster for large datasets.
Olshausen & Field (1996); Engan+ (1999); Aharon+ (2006); Mairal+ (2010)
23 K-Means as Dictionary Learning Method
1. Update the codes given the dictionary: assign each patch to its single best-matching atom (OMP-1 coding).
2. Update the dictionary given the codes: least-squares update of each atom from its assigned patches.
Special case of K-SVD using OMP-1 for coding.
Extremely fast.
Aharon+ (2006); Coates, Lee, Ng (2011)
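The two alternating steps can be sketched as a gain-shape (spherical) K-means loop; this is an illustrative sketch in the spirit of Coates, Lee, Ng (2011), not the authors' code, and `spherical_kmeans` is a hypothetical name:

```python
import numpy as np

def spherical_kmeans(X, K, n_iter=20, seed=0):
    """Dictionary learning with OMP-1 coding.
    X: (D, N) data columns. Returns a unit-norm dictionary W of shape (D, K)."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    W = rng.standard_normal((D, K))
    W /= np.linalg.norm(W, axis=0)               # unit-norm atoms
    for _ in range(n_iter):
        S = W.T @ X                              # step 1: OMP-1 coding
        assign = np.argmax(np.abs(S), axis=0)    # one active atom per sample
        coef = S[assign, np.arange(N)]           # its coefficient
        for k in range(K):                       # step 2: atom update
            idx = assign == k
            if not idx.any():
                continue                         # keep empty clusters unchanged
            wk = X[:, idx] @ coef[idx]           # LS direction, then renormalize
            n = np.linalg.norm(wk)
            if n > 0:
                W[:, k] = wk / n
    return W
```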
27 Image Inpainting with Learned Dictionaries
Joint dictionary learning and image inpainting.
Mairal+ (2010)
28 K-SVD vs K-Means Dictionaries in Denoising
Replace K-SVD with K-Means in the dictionary learning step of the denoising algorithm.
[Figure: noisy input vs. K-SVD result (OMP-32, 84 sec) vs. K-Means result (OMP-1, 22 sec); PSNR values in dB not recovered from the slide]
29 Recap: Dictionary Learning
Find a dictionary W that best fits a dataset.
Non-convex problem.
Greedy alternating optimization methods.
The K-means algorithm is very fast and works well for small image patches.
31 Patch Dictionaries in Image Classification
Image classification without SIFT. Key insights:
K-means works well
Whitening is crucial
Using larger dictionaries boosts the recognition rate
Encoding has a huge effect on performance
Promising results on CIFAR but not on large image datasets.
Varma, Zisserman (2003); Coates+ (2011)
32 Histograms of Sparse Codes for Object Detection
Key idea: build a HOG-like descriptor on top of a K-SVD-learned patch dictionary instead of gradients, then apply DPM.
Ren, Ramanan (2013); also see Dikmen, Hoiem, Huang (2012)
33 Hierarchical Modeling and Dictionary Learning
So far: modeling the appearance of small image patches, say 8x8 pixels. How about dictionaries of larger visual patterns?
Multiscale modeling: work with image pyramids.
Hierarchical modeling: model higher-order statistics of feature responses; recursively compose complex visual patterns; use unsupervised or supervised objectives.
44 Dictionary of Mini-Epitomes
Papandreou, Chen, Yuille, CVPR-14
45 Coding and Learning with Epitomic Dictionaries
Patch coding in epitomic dictionaries: an epitomic dictionary is equivalent to a standard dictionary containing the patches at all possible positions in each epitome.
Dictionary learning:
Variational inference on a GMM model (Jojic+ '01)
Sparse dictionary learning (Aharon, Elad '08; Mairal+ '11)
Epitomic K-Means (Papandreou+ '14)
46 K-Means for the Mini-Epitome Model
Generative model:
1. Select mini-epitome k with its mixing probability
2. Select position p within the epitome uniformly
3. Generate the patch
Epitomic K-means (hard EM):
Epitomic matching (hard assignment)
Epitome update
Diverse initialization with K-means++ (optional)
Papandreou, Chen, Yuille, CVPR-14
47 K-Means for the Mini-Epitome Model (cont.)
Maximum likelihood via hard EM: essentially an epitomic adaptation of K-Means.
Faster convergence using diverse initialization of the mini-epitomes by an epitomic adaptation of K-Means++.
48 A Generic Mini-Epitome Dictionary
Epitomic dictionary: 256 mini-epitomes (16x16).
Non-epitomic dictionary: 1024 elements (8x8).
Both trained on 10,000 PASCAL images.
49 Evaluation on Image Reconstruction
[Figure: original image vs. epitomic reconstruction, PSNR 29.2 dB; improvement over the non-epitomic dictionary]
54 Epitomic Patch Matching
1. We have K mini-epitomes (say the patch size is 8x8 pixels and the mini-epitome size is 12x12 pixels).
2. For each patch in the image and each mini-epitome k = 1:K, find the patch at position p in the epitome which minimizes the reconstruction error (whitening omitted). There are (12-8+1)^2 = 25 candidate positions per epitome in this example.
3. Algorithms: exact search (GPU, <0.5 sec/image), ANN, or a dynamic programming algorithm.
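The exhaustive search in step 2 can be sketched directly (an illustrative brute-force sketch with whitening omitted, as on the slide; `epitomic_match` is a hypothetical name, and the GPU/ANN/DP variants would replace its inner loops):

```python
import numpy as np

def epitomic_match(patch, epitomes):
    """For an h x h patch and a list of e x e mini-epitomes (e >= h), return the
    (epitome index, position, error) minimizing squared reconstruction error."""
    h = patch.shape[0]
    best = (None, None, np.inf)
    for k, E in enumerate(epitomes):
        e = E.shape[0]
        for i in range(e - h + 1):        # (e-h+1)^2 candidate positions
            for j in range(e - h + 1):
                err = np.sum((E[i:i+h, j:j+h] - patch) ** 2)
                if err < best[2]:
                    best = (k, (i, j), err)
    return best
```

With 8x8 patches and 12x12 mini-epitomes this scans the (12-8+1)^2 = 25 positions per epitome mentioned above.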
55 Epitomic Match vs. Max Pooling
1. The position search is equivalent to epitomic convolution.
2. Epitomic convolution is an image-centric alternative to convolution followed by max-pooling:
It is much easier to define probabilistic image models based on epitomic convolution than on max-pooling.
Evaluation in discriminative tasks is underway.
56 Recap: Transformation-Aware Dictionaries
Reduce dictionary redundancy by explicitly modeling nuisance variables.
Compact dictionaries for image reconstruction and recognition.
Epitomes as a translation-aware data structure.
Epitomic convolution as an alternative to a pair of consecutive convolution and max-pooling layers in deep networks.