LOCUS: Learning Object Classes with Unsupervised Segmentation 16-721: Advanced Perception Nik Melchior April 10, 2006
Outline
- Learning Flexible Sprites in Video Layers (Jojic & Frey 2001)
- LOCUS
4/10/06 Nik Melchior - LOCUS
Motivation
- Object category recognition and segmentation in cluttered scenes
- Challenges:
  - Spatial variability
  - Texture variability
  - Occlusion
  - Pose
  - Lighting
slide: Kumar, Torr, Zisserman
Motivation
- Goal: given an image and an object category, segment the object
- [Figure: Cow Image + Object Category Model -> Segmentation -> Segmented Cow]
- So far, we've seen attempts to do this using:
  - template-like outline matching
  - DT, Chamfer distance
  - HoG
  - fragments
  - ISM
- Segmentation should (ideally) be:
  - shaped like the object, e.g. cow-like
  - obtained efficiently in an unsupervised manner
  - able to handle self-occlusion
slide: Fei-Fei
Flexible Sprites
- Provides a stable segmentation of whole moving objects in video
- Each flexible sprite is a separate layer containing the transformation of a rigid sprite
Big Picture
- Generative model
- Seeks the best explanation for the pixels of all images, constrained by the number of layers
- Operates directly on pixels rather than low-level features
- Rigorous formulation, similar to DDMCMC
Flexible Sprites
- User specifies the number of layers
- Each layer contains:
  - Sprite appearance $s$
  - Real-valued mask $m$
  - Discretized simple transformation $T$: a "permutation matrix that rearranges the pixels in s"
- Image formation, with background $b$, inverse mask $\bar{m} = 1 - m$, and $*$ the element-wise product:
$$x = Tm * Ts + T\bar{m} * b + \text{noise}$$
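The single-layer formation equation can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: it assumes `T` is a cyclic translation implemented with `np.roll` (one simple kind of pixel permutation), and the names `translate` and `render_single_layer` are hypothetical.

```python
import numpy as np

def translate(img, shift):
    """Cyclic shift by (dy, dx); an assumed stand-in for the permutation T."""
    return np.roll(img, shift=shift, axis=(0, 1))

def render_single_layer(s, m, b, shift, noise_std=0.0, rng=None):
    """x = Tm * Ts + T(1 - m) * b + noise, with * the element-wise product."""
    Tm = translate(m, shift)
    Ts = translate(s, shift)
    # T permutes pixels, so T(1 - m) equals 1 - Tm.
    x = Tm * Ts + (1.0 - Tm) * b
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        x = x + rng.normal(0.0, noise_std, size=x.shape)
    return x

# Toy example: a 2x2 white sprite moved onto a grey background.
s = np.zeros((4, 4)); s[0:2, 0:2] = 1.0   # sprite appearance
m = np.zeros((4, 4)); m[0:2, 0:2] = 1.0   # mask: 1 where the sprite is opaque
b = np.full((4, 4), 0.5)                  # background
x = render_single_layer(s, m, b, shift=(1, 2))
```

With a real-valued mask the same expression blends sprite and background, which is why the mask need not be binary.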
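Formulation
- Likelihood given appearance, mask, and transform in each layer, with zero-mean Gaussian noise $\beta$ (recursive formula for image appearance; $\bar{m}_i = 1 - m_i$):
$$p(x \mid \{s_l, m_l, T_l\}) = \mathcal{N}\Big(x;\ \sum_{l=1}^{L} \Big(\prod_{i=l+1}^{L} T_i \bar{m}_i\Big) * T_l m_l * T_l s_l,\ \beta\Big)$$
- Joint, assuming independence of each sprite class, appearance, mask, and transformation in each layer:
$$p(x, \{c_l, T_l, s_l, m_l\}) = \mathcal{N}\Big(x;\ \sum_{l=1}^{L} \Big(\prod_{i=l+1}^{L} T_i \bar{m}_i\Big) * T_l m_l * T_l s_l,\ \beta\Big) \prod_{l=1}^{L} \mathcal{N}(s_l; \mu_{c_l}, \phi_{c_l})\, \mathcal{N}(m_l; \eta_{c_l}, \psi_{c_l})\, \pi_{c_l}\, \rho_{T_l}$$
- Priors:
  - Appearance and mask: Gaussian for $s$ (mean $\mu_c$, variance $\phi_c$) and $m$ (mean $\eta_c$, variance $\psi_c$)
  - Class prior $\pi_c$ (to be learned)
  - Transformation prior $\rho_T$ (uniform, due in part to the discretization used)
- Need to learn:
  - prior over $c$ ($\pi$)
  - means and variances of sprite appearances and masks
  - noise $\beta$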
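The mean of the multi-layer likelihood can be computed either directly from the sum-of-products expression or by compositing layers from back to front. The sketch below checks that the two agree on random data. It assumes cyclic translations as the transformations and layers ordered from back ($l = 1$) to front ($l = L$); the function names are hypothetical.

```python
import numpy as np

def translate(img, shift):
    """Cyclic shift by (dy, dx); an assumed stand-in for the permutation T_l."""
    return np.roll(img, shift=shift, axis=(0, 1))

def composite_mean(sprites, masks, shifts):
    """Recursive form: each layer is blended over what lies behind it."""
    canvas = np.zeros_like(sprites[0], dtype=float)
    for s, m, shift in zip(sprites, masks, shifts):
        Tm, Ts = translate(m, shift), translate(s, shift)
        canvas = (1.0 - Tm) * canvas + Tm * Ts
    return canvas

def composite_mean_direct(sprites, masks, shifts):
    """Direct form: sum over l of (prod over i > l of T_i(1 - m_i)) * T_l m_l * T_l s_l."""
    L = len(sprites)
    Tm = [translate(m, sh) for m, sh in zip(masks, shifts)]
    Ts = [translate(s, sh) for s, sh in zip(sprites, shifts)]
    total = np.zeros_like(Ts[0], dtype=float)
    for l in range(L):
        visible = np.ones_like(total)
        for i in range(l + 1, L):
            visible *= 1.0 - Tm[i]   # occlusion by every layer in front of l
        total += visible * Tm[l] * Ts[l]
    return total

# Toy data: three random 5x5 layers with real-valued masks in [0, 1).
rng = np.random.default_rng(0)
sprites = [rng.random((5, 5)) for _ in range(3)]
masks = [rng.random((5, 5)) for _ in range(3)]
shifts = [(0, 0), (1, 2), (-1, 3)]
mean_recursive = composite_mean(sprites, masks, shifts)
mean_direct = composite_mean_direct(sprites, masks, shifts)
```

The recursive form costs one pass over the layers, while the direct form mirrors the equation term by term.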
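Formulation
- Use EM to maximize the posterior, approximated by a factorized distribution:
$$p(\{c_l, T_l, s_l, m_l\} \mid x) = \frac{p(x, \{c_l, T_l, s_l, m_l\})}{\sum_{\{c_l, T_l\}} \int p(x, \{c_l, T_l, s_l, m_l\})\, d\{s_l, m_l\}} \approx \prod_{l=1}^{L} q(c_l, T_l)\, q(s_l)\, q(m_l)$$
- Exact posterior is intractable: $(CJ)^L$ configurations
  - $C$ = # classes
  - $J$ = # transformations
  - $L$ = # layers
  - factorize to make it linear: $CJL$
- Compute the KL divergence of the two distributions (with the posterior in the numerator, $D$ is the negative of the KL divergence, so $D \le 0$):
$$D = \sum_{\{c_l, T_l\}} \int_{\{s_l, m_l\}} \prod_{l=1}^{L} q(c_l, T_l)\, q(s_l)\, q(m_l)\ \ln \frac{p(\{c_l, T_l, s_l, m_l\} \mid x)}{\prod_{l=1}^{L} q(c_l, T_l)\, q(s_l)\, q(m_l)}$$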
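To get a feel for why the factorization matters, the two counts can be compared numerically. The values of `C`, `J`, and `L` below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def exact_posterior_states(C, J, L):
    """Joint (class, transformation) configurations across all L layers."""
    return (C * J) ** L

def factorized_posterior_states(C, J, L):
    """Entries needed when q(c_l, T_l) is kept separately for each layer."""
    return C * J * L

# Hypothetical sizes: 3 classes, 1000 discrete transformations, 3 layers.
C, J, L = 3, 1000, 3
n_exact = exact_posterior_states(C, J, L)            # 27,000,000,000
n_factorized = factorized_posterior_states(C, J, L)  # 9,000
```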
Formulation
- Maximize $F$:
$$F = D + \ln p(x) = \sum_{\{c_l, T_l\}} \int_{\{s_l, m_l\}} \prod_{l=1}^{L} q(c_l, T_l)\, q(s_l)\, q(m_l)\ \ln \frac{p(x, \{c_l, T_l, s_l, m_l\})}{\prod_{l=1}^{L} q(c_l, T_l)\, q(s_l)\, q(m_l)}$$
- The log of the numerator is a sum of Mahalanobis distances (from the Gaussian terms)
- $q \ln(1/q)$ is the entropy of $q$
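The step from $D$ to $F$ can be spelled out as follows. This is a sketch under two assumptions: $h$ abbreviates all hidden variables $\{c_l, T_l, s_l, m_l\}$ with $q(h) = \prod_l q(c_l, T_l)\, q(s_l)\, q(m_l)$, and $D$ is taken as the expected log-ratio of posterior to $q$ (the negative KL divergence).

```latex
\begin{align*}
F &= D + \ln p(x) \\
  &= \mathbb{E}_{q}\!\left[\ln \frac{p(h \mid x)}{q(h)}\right] + \ln p(x)
     && \text{definition of } D \\
  &= \mathbb{E}_{q}\!\left[\ln \frac{p(h \mid x)\, p(x)}{q(h)}\right]
     && \text{since } \textstyle\sum\!\int q(h) = 1 \\
  &= \mathbb{E}_{q}\!\left[\ln \frac{p(x, h)}{q(h)}\right] \\
  &= \underbrace{\mathbb{E}_{q}\!\left[\ln p(x, h)\right]}_{\text{Gaussian terms: Mahalanobis distances}}
   + \underbrace{\mathbb{E}_{q}\!\left[\ln \frac{1}{q(h)}\right]}_{\text{entropy of } q}
\end{align*}
```

Because $D \le 0$, $F \le \ln p(x)$, so maximizing $F$ over $q$ tightens the approximation to the posterior, and maximizing it over the model parameters raises a lower bound on the data log-likelihood.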
E step
- For each single image:
  - the mean should stay close to the overall mean, but be large where the image does not match the background
  - the variance should be high where the transformed image is similar to the background
  - a similar update applies to $\xi_T$, the transform, which ensures that the image matches the transformed sprite
M step
Results
- 1 fps at 320x240
LOCUS: Learning Object Classes with Unsupervised Segmentation
Comparison to Flexible Sprites
- Classes instead of single sprites
- More complex appearance model
- Addition of edge model
- 2 layers: object and background
- Weak prior of palettes (appearance): one for object, one for background
LOCUS model
- Shared between images:
  - Class shape π
  - Class edge sprite μo, σo
- Different for each image:
  - Deformation field D
  - Position & size T
  - Mask m
  - Edge image e
  - Background appearance λ0
  - Object appearance λ1
  - Image
- Note: the class model (edge + mask) is shared; all other variables are per-image. Everything is learned simultaneously.
Winn and Jojic, 2005
LOCUS model
- Deformation field D: Markov Random Field; penalises squared distance between neighboring vectors
- Transform T: scaling and translation; rotation or full affine transform possible
- Mask m: favors short segmentation boundaries
- Edge model e: not applied to background due to expected noise
- Appearance model: Gaussian mixture model (histogram of colors or textures)
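The two smoothness terms named above can be illustrated with a small NumPy sketch. It is not the LOCUS implementation: the 4-connected neighbourhood, the `weight` parameter, and both function names are assumptions made for the example.

```python
import numpy as np

def deformation_smoothness_energy(D, weight=1.0):
    """Sum of squared differences between neighbouring displacement vectors.

    D has shape (H, W, 2): one 2-D displacement per pixel.
    A 4-connected grid is assumed here.
    """
    dv = D[1:, :, :] - D[:-1, :, :]   # vertical neighbour differences
    dh = D[:, 1:, :] - D[:, :-1, :]   # horizontal neighbour differences
    return weight * float(np.sum(dv ** 2) + np.sum(dh ** 2))

def mask_boundary_length(m):
    """Count 4-neighbour pixel pairs whose binary mask labels differ."""
    m = np.asarray(m).astype(int)
    vertical = np.sum(m[1:, :] != m[:-1, :])
    horizontal = np.sum(m[:, 1:] != m[:, :-1])
    return int(vertical + horizontal)

# A constant field costs nothing; one outlier vector is penalised.
flat = np.zeros((3, 3, 2))
bumpy = flat.copy()
bumpy[1, 1] = [1.0, 0.0]
e_flat = deformation_smoothness_energy(flat)
e_bumpy = deformation_smoothness_energy(bumpy)

# A single foreground pixel has four boundary edges.
mask = np.zeros((3, 3), dtype=int)
mask[1, 1] = 1
boundary = mask_boundary_length(mask)
```

Lower values of either quantity correspond to smoother deformations and shorter segmentation boundaries.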
EM formulation
- Posterior similar to flexible sprites:
$$P(\pi, \mu_o, \sigma_o, \mu_b, \sigma_b \mid I) \approx \prod_i Q(\pi_i)\, Q(\mu_{o,i}, \sigma_{o,i})\, Q(\mu_{b,i}, \sigma_{b,i})$$
- Iteratively optimize over 9 latent variables, $\lambda, m, T, D, \pi, \mu_o, \sigma_o, \mu_b, \sigma_b$:
  - mixture model parameters ($\lambda$)
  - binary mask ($m$)
  - position and size transformation ($T$)
  - deformation field ($D$)
  - real-valued mask ($\pi$)
  - object edge model ($\mu_o, \sigma_o$)
  - background edge model ($\mu_b, \sigma_b$)
Results
Results
- Comparison to Borenstein fragments
- Percentages indicate correctly labeled pixels across all images
- Last row removes class shape and edge information
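The reported metric, the percentage of correctly labeled pixels across all images, could be computed along these lines. The sketch assumes binary object/background masks and pools pixels over the whole set rather than averaging per-image scores; `pixel_accuracy` is a hypothetical helper, and the toy masks are made up.

```python
import numpy as np

def pixel_accuracy(predicted_masks, true_masks):
    """Percentage of pixels whose label matches ground truth, pooled over images."""
    correct = 0
    total = 0
    for pred, truth in zip(predicted_masks, true_masks):
        pred = np.asarray(pred, dtype=bool)
        truth = np.asarray(truth, dtype=bool)
        if pred.shape != truth.shape:
            raise ValueError("predicted and true masks must have the same shape")
        correct += int(np.sum(pred == truth))
        total += truth.size
    return 100.0 * correct / total

# Toy data: 3 of 4 pixels correct in the first image, 4 of 4 in the second.
preds = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
truths = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
accuracy = pixel_accuracy(preds, truths)
```

Pooling weights large images more heavily than small ones; a per-image average would be a different, equally plausible reading of the slide.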
Results