
Generative Models for Image Understanding Nebojsa Jojic and Thomas Huang Beckman Institute and ECE Dept. University of Illinois

Problem: Summarization of High-Dimensional Data
– Pattern analysis: for several classes c = 1, ..., C of the data, define probability distribution functions p(x|c)
– Compression: define a probabilistic model p(x) and devise an optimal coding approach
– Video summary: drop most of the frames in a video sequence and keep the interesting information that summarizes it

Generative density modeling
Find a probability model that
– reflects the desired structure,
– randomly generates plausible images,
– represents the data by its parameters.
ML estimation; p(image|class) is used for recognition, detection, ...

Problems we attacked
– Transformation as a discrete variable in generative models of intensity images
– Tracking articulated objects in dense stereo maps
– Unsupervised learning for video summary
Idea: the structure of the generative model reveals the interesting objects we want to extract.

Mixture of Gaussians
The probability of pixel intensities z given that the image is from cluster c is p(z|c) = N(z; μ_c, Φ_c), and P(c) = π_c.

Mixture of Gaussians
P(c) = π_c, p(z|c) = N(z; μ_c, Φ_c)
The parameters π_c, μ_c and Φ_c represent the data.
For an input z, the cluster responsibilities are P(c|z) = p(z|c)P(c) / Σ_c′ p(z|c′)P(c′)

Example: Simulation
Sample a cluster c from P(c) = π_c with π_1 = 0.6, π_2 = 0.4; given c, sample an image z from p(z|c) = N(z; μ_c, Φ_c). (Two draws shown: one with c = 1, one with c = 2.)

Example: Learning - E step
For each image z from the data set, compute the cluster responsibilities P(c = 1|z) and P(c = 2|z) under the current parameters (initially π_1 = 0.5, π_2 = 0.5).

Example: Learning - M step
Set μ_1 to the average of z weighted by P(c = 1|z); set μ_2 to the average of z weighted by P(c = 2|z). (Current mixing proportions: π_1 = 0.5, π_2 = 0.5.)

Example: Learning - M step
Set Φ_1 to the average of diag((z − μ_1)(z − μ_1)^T) weighted by P(c = 1|z); set Φ_2 to the average of diag((z − μ_2)(z − μ_2)^T) weighted by P(c = 2|z), i.e., the responsibility-weighted per-pixel variances.
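To make these E and M steps concrete, here is a minimal NumPy sketch of EM for a diagonal-covariance mixture of Gaussians over vectorized images. It is an illustrative implementation, not the original MATLAB code; all names (mog_em, Z, etc.) are made up for this sketch.

```python
import numpy as np

def mog_em(Z, C=2, iters=15, eps=1e-6):
    """EM for a mixture of diagonal-covariance Gaussians.
    Z: (T, N) array of T vectorized images with N pixels each."""
    T, N = Z.shape
    rng = np.random.default_rng(0)
    pi = np.full(C, 1.0 / C)                          # mixing proportions pi_c
    mu = Z[rng.choice(T, C, replace=False)]           # cluster means mu_c
    phi = np.var(Z, axis=0) * np.ones((C, N)) + eps   # per-pixel variances phi_c

    for _ in range(iters):
        # E step: responsibilities P(c|z) from log N(z; mu_c, phi_c) + log pi_c
        logp = np.empty((T, C))
        for c in range(C):
            logp[:, c] = (np.log(pi[c])
                          - 0.5 * np.sum(np.log(2 * np.pi * phi[c]))
                          - 0.5 * np.sum((Z - mu[c]) ** 2 / phi[c], axis=1))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M step: responsibility-weighted averages
        Nc = resp.sum(axis=0) + eps
        pi = Nc / T
        mu = (resp.T @ Z) / Nc[:, None]
        for c in range(C):
            phi[c] = (resp[:, c] @ (Z - mu[c]) ** 2) / Nc[c] + eps
    return pi, mu, phi, resp
```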

Transformation as a Discrete Latent Variable with Brendan J. Frey Computer Science, University of Waterloo, Canada Beckman Institute & ECE, Univ of Illinois at Urbana

The kind of data we're interested in: even after tracking, the features still have unknown positions, rotations, scales, levels of shearing, ...

One approach: Images → Normalization (requires labor) → Normalized images → Pattern analysis

Our approach: Images → Joint normalization and pattern analysis

What transforming an image does in the vector space of pixel intensities: a continuous transformation moves an image along a continuous curve. Our subspace model should assign images near this nonlinear manifold to the same point in the subspace.

Tractable approaches to modeling the transformation manifold
– Linear approximation: good locally
– Discrete approximation: good globally

Adding “transformation” as a discrete latent variable
Say there are N pixels. We assume we are given a set of sparse N × N transformation-generating matrices G_1, ..., G_l, ..., G_L. These generate the points G_l z from a point z.
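For intuition, here is one way such sparse matrices could be built for integer pixel shifts with wrap-around (SciPy sparse matrices; shift_matrix and the particular window of shifts are illustrative choices, not taken from the talk).

```python
import numpy as np
from scipy.sparse import coo_matrix

def shift_matrix(h, w, dy, dx):
    """Sparse (h*w) x (h*w) permutation matrix G that shifts a vectorized
    h x w image by (dy, dx) pixels, with wrap-around at the borders."""
    rows_im, cols_im = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src = (rows_im * w + cols_im).ravel()                          # source pixel index
    dst = (((rows_im + dy) % h) * w + (cols_im + dx) % w).ravel()  # destination index
    return coo_matrix((np.ones(h * w), (dst, src)), shape=(h * w, h * w))

# The set {G_l}: all shifts in a small window, e.g. -1, 0, +1 in each direction.
h, w = 44, 28
Gs = [shift_matrix(h, w, dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
z = np.random.rand(h * w)   # a latent (vectorized) image
x = Gs[0] @ z               # G_l z: the image shifted up and left by one pixel
```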

Transformed Mixture of Gaussians
P(c) = π_c, p(z|c) = N(z; μ_c, Φ_c), P(l) = ρ_l, p(x|z,l) = N(x; G_l z, Ψ)
The parameters ρ_l, π_c, μ_c and Φ_c represent the data. The cluster/transformation responsibilities P(c,l|x) are quite easy to compute.

Example: Simulation
With G_1 = shift left and up, G_2 = I, G_3 = shift right and up: sample c and l (here c = 1, l = 1), sample the latent image z from p(z|c), then sample the observed image x from p(x|z,l) = N(x; G_l z, Ψ).

ML estimation of a Transformed Mixture of Gaussians using EM
E step: compute P(l|x), P(c|x) and p(z|c,x) for each x in the data
M step: set
– π_c = average of P(c|x)
– ρ_l = average of P(l|x)
– μ_c = average mean of p(z|c,x)
– Φ_c = average variance of p(z|c,x)
– Ψ = average variance of p(x − G_l z | x)
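A sketch of the E-step responsibilities P(c, l | x), under the simplifying assumptions that each G_l is a pure pixel permutation (a shift) and that Φ_c and Ψ are diagonal, so the latent image z can be integrated out in closed form. This is an illustration of the idea, not the authors' implementation.

```python
import numpy as np

def tmg_responsibilities(X, Gs, pi, rho, mu, phi, psi):
    """E-step responsibilities P(c, l | x) for a transformed mixture of Gaussians,
    assuming each G_l is a permutation (pure shift) and Phi_c, Psi are diagonal,
    so that integrating out z gives p(x|c,l) = N(x; G_l mu_c, diag(G_l phi_c + psi)).
    X: (T, N) images; Gs: list of L sparse N x N matrices;
    pi: (C,); rho: (L,); mu, phi: (C, N); psi: (N,)."""
    T, N = X.shape
    C, L = len(pi), len(Gs)
    logp = np.empty((T, C, L))
    for l, G in enumerate(Gs):
        for c in range(C):
            m = G @ mu[c]          # transformed cluster mean
            v = G @ phi[c] + psi   # marginal per-pixel variance
            logp[:, c, l] = (np.log(pi[c]) + np.log(rho[l])
                             - 0.5 * np.sum(np.log(2 * np.pi * v))
                             - 0.5 * np.sum((X - m) ** 2 / v, axis=1))
    logp -= logp.max(axis=(1, 2), keepdims=True)
    resp = np.exp(logp)
    resp /= resp.sum(axis=(1, 2), keepdims=True)
    return resp                     # resp[t, c, l] = P(c, l | x_t)
```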

Face Clustering Examples of 400 outdoor images of 2 people (44 x 28 pixels)

Mixture of Gaussians: 15 iterations of EM (1 minute in MATLAB). Cluster means for c = 1, ..., 4 are shown.

Transformed mixture of Gaussians: 30 iterations of EM. Cluster means for c = 1, ..., 4 are shown.

Video Analysis Using Generative Models with Brendan Frey, Nemanja Petrovic and Thomas Huang

Idea
– Use generative models of video sequences to do unsupervised learning
– Use the resulting model for video summarization, filtering, stabilization, recognition of objects, retrieval, etc.

Transformed Hidden Markov Model
The TMG variables (c, l, z, x) are replicated at each frame; the class and transformation at time t depend on the previous frames through P(c, l | past).

THMM Transition Models
– Independent distributions for class and transformation; relative motion: P(c_t, l_t | past) = P(c_t | c_{t-1}) P(d(l_t, l_{t-1}))
– Relative motion dependent on the class: P(c_t, l_t | past) = P(c_t | c_{t-1}) P(d(l_t, l_{t-1}) | c_t)
– Autoregressive model for the transformation distribution
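As an illustration of the second (class-dependent) transition model, a sketch that assembles the joint transition matrix over states (c, l) from a class transition matrix and a per-class displacement model; disp_prob is a hypothetical stand-in for whichever motion model is chosen.

```python
import numpy as np

def thmm_transition_matrix(class_trans, shift_coords, disp_prob):
    """Joint transition P(c_t, l_t | c_{t-1}, l_{t-1}) for the class-dependent
    relative-motion model P(c_t | c_{t-1}) P(d(l_t, l_{t-1}) | c_t).
    class_trans:  (C, C) array, class_trans[c_prev, c] = P(c | c_prev)
    shift_coords: (L, 2) integer (dy, dx) coordinates of each transformation l
    disp_prob:    callable (c, d) -> unnormalized P(displacement d | class c)."""
    C, L = class_trans.shape[0], shift_coords.shape[0]
    A = np.zeros((C * L, C * L))
    for cp in range(C):
        for lp in range(L):
            for c in range(C):
                for l in range(L):
                    d = shift_coords[l] - shift_coords[lp]   # relative motion d(l_t, l_{t-1})
                    A[cp * L + lp, c * L + l] = class_trans[cp, c] * disp_prob(c, d)
    # Renormalize each row over the joint target state (c_t, l_t); this also
    # accounts for displacements clipped at the edge of the transformation grid.
    A /= A.sum(axis=1, keepdims=True)
    return A
```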

Inference in THMM
Tasks:
– Find the most likely state at time t given the whole observed sequence {x_t} and the model parameters (class means and variances, transition probabilities, etc.)
– Find the distribution over states for each time t
– Find the most likely state sequence
– Learn the parameters that maximize the likelihood of the observed data
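The second task in this list, the per-frame state posterior, can be computed with standard forward-backward recursions over the joint discrete state (c, l). A log-domain sketch, where the emission log-likelihoods and the transition matrix are assumed to come from the TMG likelihood and the transition models above:

```python
import numpy as np
from scipy.special import logsumexp

def state_posteriors(log_lik, log_A, log_prior):
    """Forward-backward in the log domain for a discrete-state HMM.
    log_lik:   (T, S) log p(x_t | s_t), one column per joint state s = (c, l)
    log_A:     (S, S) log P(s_t = j | s_{t-1} = i), rows indexed by i
    log_prior: (S,)   log P(s_1)
    Returns gamma: (T, S) with gamma[t, s] = P(s_t = s | x_1, ..., x_T)."""
    T, S = log_lik.shape
    alpha = np.zeros((T, S))
    beta = np.zeros((T, S))
    alpha[0] = log_prior + log_lik[0]
    for t in range(1, T):                 # forward pass
        alpha[t] = log_lik[t] + logsumexp(alpha[t - 1][:, None] + log_A, axis=0)
    for t in range(T - 2, -1, -1):        # backward pass
        beta[t] = logsumexp(log_A + (log_lik[t + 1] + beta[t + 1])[None, :], axis=1)
    gamma = alpha + beta
    return np.exp(gamma - logsumexp(gamma, axis=1, keepdims=True))
```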

Video Summary and Filtering
Model: p(z|c) = N(z; μ_c, Φ_c), p(x|z,l) = N(x; G_l z, Ψ)
Applications: video summary, image segmentation, removal of sensor noise, image stabilization

Example: Learning
Data: hand-held camera, moving subject, cluttered background.
Class means learned with 1 class and with 5 classes, using 121 translations (11 vertical and 11 horizontal shifts).

Examples: normalized sequence, simulated sequence, de-noising, seeing through distractions.

Future work
– Fast approximate learning and inference
– Multiple layers
– Learning transformations from images
Nebojsa Jojic:

Subspace models of images
Example: an image in R^1200 is modeled as f(y) with y in R^2; the two subspace coordinates correspond to the degree of frowning and of shutting the eyes.

Factor analysis (generative PCA)
The density of pixel intensities z given the subspace point y is p(z|y) = N(z; μ + Λy, Φ), with prior p(y) = N(y; 0, I).
Manifold: f(y) = μ + Λy, linear.

Factor analysis (generative PCA)
p(y) = N(y; 0, I), p(z|y) = N(z; μ + Λy, Φ)
The parameters μ, Λ represent the manifold.
Observing z induces a Gaussian p(y|z):
COV[y|z] = (Λ^T Φ^{-1} Λ + I)^{-1}
E[y|z] = COV[y|z] Λ^T Φ^{-1} (z − μ)
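A minimal NumPy sketch of this posterior computation, with the diagonal Φ stored as a vector of per-pixel variances (names are illustrative):

```python
import numpy as np

def fa_posterior(z, mu, Lam, Phi):
    """Posterior p(y|z) for factor analysis: z = mu + Lam y + noise,
    noise ~ N(0, diag(Phi)), prior y ~ N(0, I).
    Returns the posterior mean E[y|z] and covariance COV[y|z]."""
    K = Lam.shape[1]
    LtPinv = Lam.T / Phi                              # Lam^T Phi^{-1} for diagonal Phi
    cov = np.linalg.inv(LtPinv @ Lam + np.eye(K))     # COV[y|z]
    mean = cov @ (LtPinv @ (z - mu))                  # E[y|z]
    return mean, cov
```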

Example: Simulation
Sample y (coordinates: frown, shut eyes), then sample z from p(z|y) = N(z; μ + Λy, Φ). Different draws of y yield different combinations of frowning and eye closure. (Three simulated draws shown.)

Transformed Component Analysis
p(y) = N(y; 0, I), p(z|y) = N(z; μ + Λy, Φ), P(l) = ρ_l
The probability of the observed image x is p(x|z,l) = N(x; G_l z, Ψ)

Example: Simulation
With G_1 = shift left & up, G_2 = I, G_3 = shift right & up: sample y, sample the latent image z from p(z|y), sample l (here l = 3), then sample the observed image x from N(x; G_l z, Ψ).

Example: Inference
With G_1 = shift left & up, G_2 = I, G_3 = shift right & up, each transformation hypothesis l yields its own estimate of the latent image z and subspace point y from the observed x. Wrong hypotheses produce garbage latent images and receive low posterior probability P(l|x); the correct hypothesis receives high probability.

EM algorithm for TCA
Initialize μ, Λ, Φ, Ψ, ρ to random values.
E step: for each training case x^(t), infer q^(t)(l,z,y) = p(l,z,y | x^(t))
M step: compute μ_new, Λ_new, Φ_new, Ψ_new, ρ_new to maximize Σ_t E[log p(y) p(z|y) P(l) p(x^(t)|z,l)], where E[·] is with respect to q^(t)(l,z,y)
Each iteration increases log p(Data).
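One piece of this E step, the posterior over the discrete transformation, can be illustrated by integrating out y and z analytically. The sketch below does this with dense covariance matrices, which is fine for small images such as the 9 × 9 toy problem later in the talk but would need the low-rank structure exploited for larger ones; it is an illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def tca_transformation_posterior(x, Gs, rho, mu, Lam, Phi, Psi):
    """P(l | x) for transformed component analysis, obtained by integrating
    out y and z: p(x|l) = N(x; G_l mu, G_l (Lam Lam^T + diag(Phi)) G_l^T + diag(Psi)).
    x: (N,); Gs: list of dense (N, N) arrays; rho: (L,); mu: (N,);
    Lam: (N, K); Phi, Psi: (N,)."""
    Cz = Lam @ Lam.T + np.diag(Phi)   # covariance of the latent image z
    logp = np.array([
        np.log(r) + multivariate_normal.logpdf(x, G @ mu, G @ Cz @ G.T + np.diag(Psi))
        for r, G in zip(rho, Gs)
    ])
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()
```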

A tough toy problem
– 144 images, 9 × 9 pixels
– 1 shape (a pyramid)
– 3-D lighting
– cluttered background
– 25 possible locations

First 8 principal components (shown), compared with TCA using 3 components (Λ_1, Λ_2, Λ_3):
– 81 transformations (9 horizontal shifts, 9 vertical shifts)
– 10 iterations of EM
The model generates realistic examples.

Expression modeling: 24 training images, with variation in expression and imperfect alignment.

PCA: mean + first 10 principal components. Factor analysis: mean + 10 factors after 70 iterations of EM. TCA: mean + 10 factors after 70 iterations of EM.

Fantasies from the FA model vs. fantasies from the TCA model.

Modeling handwritten digits
– 8 images of each digit
– preprocessing normalizes vertical/horizontal translation and scale
– different writing angles (shearing) remain; see the “7”

TCA:
– 29 shearing + translation combinations
– 10 components per digit
– 30 iterations of EM per digit
Shown: the mean of each digit and the transformed means.

FA: Mean + 10 components per digit TCA: Mean + 10 components per digit

Classification Performance
Training: 200 cases per digit, 20 components, 50 EM iterations
Testing: 1000 cases; p(x|class) used for classification
Results:
Method                               Error rate
k-nearest neighbors (optimized k)    7.6%
Factor analysis                      3.2%
Transformed component analysis       2.7%
Bonus: P(l|x) infers the writing angle!
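The classification rule here is the generative one: score a test image under each per-digit model and pick the class with the highest likelihood, optionally adding a log prior. A trivial sketch, with class_log_likelihoods standing in for whichever model (FA or TCA) provides log p(x|class):

```python
import numpy as np

def classify(x, class_log_likelihoods, log_priors=None):
    """Pick the class c maximizing log p(x | class c) (+ log P(c) if given).
    class_log_likelihoods: list of callables, one per class, each returning
    log p(x | class c) under that class's generative model."""
    scores = np.array([loglik(x) for loglik in class_log_likelihoods], dtype=float)
    if log_priors is not None:
        scores = scores + np.asarray(log_priors)
    return int(np.argmax(scores))
```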

Wrap-up
Papers, MATLAB scripts:
Other domains: audio, bioinformatics, …
Other latent image models p(z):
– mixtures of factor analyzers (NIPS99)
– layers, multiple objects, occlusions
– time series (in preparation)

Wrap-up
– Discrete + linear combination: set some components equal to derivatives of μ with respect to the transformations
– Multiresolution approach
– Fast variational methods, belief propagation, ...

Other generative models Modeling human appearance in stereo images: articulated, self-occluding Gaussians