
1 Generative Models for Image Understanding Nebojsa Jojic and Thomas Huang Beckman Institute and ECE Dept. University of Illinois

2 Problem: Summarization of High Dimensional Data
Pattern Analysis: For several classes c = 1,…,C of the data, define probability distribution functions p(x|c)
Compression: Define a probabilistic model p(x) and devise an optimal coding approach
Video Summary: Drop most of the frames in a video sequence and keep interesting information that summarizes it

3 Generative density modeling
Find a probability model that
– reflects desired structure
– randomly generates plausible images
– represents the data by parameters
ML estimation
p(image|class) used for recognition, detection, …

4 Problems we attacked
Transformation as a discrete variable in generative models of intensity images
Tracking articulated objects in dense stereo maps
Unsupervised learning for video summary
Idea: the structure of the generative model reveals the interesting objects we want to extract

5 Mixture of Gaussians
(graphical model: class c → image z)
The probability of pixel intensities z given that the image is from cluster c is
p(z|c) = N(z; μ_c, Φ_c)
P(c) = π_c

6 Mixture of Gaussians
P(c) = π_c
p(z|c) = N(z; μ_c, Φ_c)
Parameters π_c, μ_c and Φ_c represent the data.
For input z, the cluster responsibilities are
P(c|z) = p(z|c) P(c) / Σ_c′ p(z|c′) P(c′)
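The responsibility computation on this slide can be sketched in a few lines. This is an illustrative NumPy version (not the authors' MATLAB code), assuming diagonal-covariance clusters and working in log space for numerical stability:

```python
import numpy as np

def log_gaussian_diag(z, mu, var):
    """Log density of z under a diagonal-covariance Gaussian N(mu, diag(var))."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var)

def responsibilities(z, pis, mus, variances):
    """P(c|z) = p(z|c) P(c) / sum_c' p(z|c') P(c'), computed in log space."""
    log_post = np.array([
        np.log(pi) + log_gaussian_diag(z, mu, var)
        for pi, mu, var in zip(pis, mus, variances)
    ])
    log_post -= log_post.max()            # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Two toy 1-D "pixel" clusters
pis = [0.6, 0.4]
mus = [np.array([0.0]), np.array([5.0])]
variances = [np.array([1.0]), np.array([1.0])]
r = responsibilities(np.array([0.2]), pis, mus, variances)
```

An observation near the first cluster mean yields a responsibility vector dominated by c = 1.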

7 Example: Simulation
Sample c = 1 from P(c) = π_c (π_1 = 0.6, π_2 = 0.4), then sample the image z from p(z|c) = N(z; μ_c, Φ_c)

8 Example: Simulation
Sample c = 2 from P(c) = π_c (π_1 = 0.6, π_2 = 0.4), then sample the image z from p(z|c) = N(z; μ_c, Φ_c)

9 Example: Learning - E step
For an image z from the data set, compute the responsibilities under the current parameters (π_1 = 0.5, π_2 = 0.5): here P(c=1|z) = 0.52, P(c=2|z) = 0.48

10 Example: Learning - E step
For another image z from the data set, under the same parameters (π_1 = 0.5, π_2 = 0.5): P(c=1|z) = 0.48, P(c=2|z) = 0.52

11 Example: Learning - M step
Set μ_1 to the average of z weighted by P(c=1|z)
Set μ_2 to the average of z weighted by P(c=2|z)

12 Example: Learning - M step
Set diag(Φ_1) to the average of diag((z − μ_1)(z − μ_1)^T) weighted by P(c=1|z)
Set diag(Φ_2) to the average of diag((z − μ_2)(z − μ_2)^T) weighted by P(c=2|z)
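The M-step updates on the two slides above can be sketched as follows. This is a minimal NumPy illustration (responsibilities `R` are assumed to come from the E step; names are illustrative, not from the original code):

```python
import numpy as np

def m_step(Z, R):
    """M step for a diagonal-covariance mixture of Gaussians.
    Z: (T, N) data matrix, R: (T, C) responsibilities P(c|z_t)."""
    Nc = R.sum(axis=0)                       # effective count per cluster
    pis = Nc / R.shape[0]                    # pi_c = average of P(c|z)
    mus = (R.T @ Z) / Nc[:, None]            # mu_c = weighted average of z
    variances = np.stack([
        (R[:, c:c + 1] * (Z - mus[c]) ** 2).sum(axis=0) / Nc[c]
        for c in range(R.shape[1])
    ])                                       # diag((z - mu_c)(z - mu_c)^T), weighted
    return pis, mus, variances

# Toy data: two well-separated 1-D clusters with hard responsibilities
Z = np.array([[0.0], [0.2], [5.0], [4.8]])
R = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
pis, mus, variances = m_step(Z, R)
```

With hard responsibilities, the updates reduce to per-cluster sample means and variances, which makes them easy to sanity-check.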

13 Transformation as a Discrete Latent Variable with Brendan J. Frey Computer Science, University of Waterloo, Canada Beckman Institute & ECE, Univ of Illinois at Urbana

14 Kind of data we’re interested in Even after tracking, the features still have unknown positions, rotations, scales, levels of shearing,...

15 One approach
Images → Normalization (manual labor) → Normalized images → Pattern Analysis

16 Our approach
Images → Joint Normalization and Pattern Analysis

17 What transforming an image does in the vector space of pixel intensities
A continuous transformation moves an image, z, along a continuous curve. Our subspace model should assign images near this nonlinear manifold to the same point in the subspace.

18 Tractable approaches to modeling the transformation manifold
Linear approximation - good locally
Discrete approximation - good globally

19 Adding “transformation” as a discrete latent variable
Say there are N pixels. We assume we are given a set of sparse N x N transformation generating matrices G_1, …, G_l, …, G_L. These generate points G_l z from the point z.
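For translations, each G_l is just a permutation of the pixel indices. A minimal NumPy sketch (illustrative only; stored densely here, though each row and column has a single nonzero entry, so it is sparse in the sense the slide means):

```python
import numpy as np

def shift_matrix(h, w, dy, dx):
    """N x N matrix (N = h*w) that shifts a vectorized h x w image by
    (dy, dx) pixels with wrap-around, so that G @ z.ravel() is the shifted image."""
    N = h * w
    G = np.zeros((N, N))
    for y in range(h):
        for x in range(w):
            src = y * w + x
            dst = ((y + dy) % h) * w + (x + dx) % w
            G[dst, src] = 1.0               # pixel (y, x) moves to (y+dy, x+dx)
    return G

h, w = 3, 3
img = np.arange(9.0).reshape(h, w)
G = shift_matrix(h, w, 0, 1)                # shift right by one pixel
shifted = (G @ img.ravel()).reshape(h, w)
```

Building one such matrix per allowed shift gives the family G_1, …, G_L; in practice a sparse matrix type would be used for large N.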

20 Transformed Mixture of Gaussians
P(c) = π_c
p(z|c) = N(z; μ_c, Φ_c)
P(l) = ρ_l
p(x|z,l) = N(x; G_l z, Ψ)
Parameters ρ_l, π_c, μ_c and Φ_c represent the data. The cluster/transformation responsibilities, P(c,l|x), are quite easy to compute.

21 Example: Simulation
G_1 = shift left and up, G_2 = I, G_3 = shift right and up
Sample c = 1 and l = 1, draw the latent image z from p(z|c), then generate the observed image x = G_1 z (plus noise)

22 ML estimation of a Transformed Mixture of Gaussians using EM
E step: Compute P(l|x), P(c|x) and p(z|c,x) for each x in data
M step: Set
– π_c = avg of P(c|x)
– ρ_l = avg of P(l|x)
– μ_c = avg mean of p(z|c,x)
– Φ_c = avg variance of p(z|c,x)
– Ψ = avg var of p(x − G_l z | x)

23 Face Clustering Examples of 400 outdoor images of 2 people (44 x 28 pixels)

24 Mixture of Gaussians 15 iterations of EM (MATLAB takes 1 minute) Cluster means c = 1 c = 2 c = 3 c = 4

25 Transformed mixture of Gaussians
30 iterations of EM
Cluster means c = 1 c = 2 c = 3 c = 4

26 Video Analysis Using Generative Models with Brendan Frey, Nemanja Petrovic and Thomas Huang

27 Idea Use generative models of video sequences to do unsupervised learning Use the resulting model for video summarization, filtering, stabilization, recognition of objects, retrieval, etc.

28 Transformed Hidden Markov Model
At each time step, latent variables l, c, z generate the frame x; the state at time t depends on the state at time t−1 through P(c,l | past)

29 THMM Transition Models
Independent probability distributions for class and transformations; relative motion:
P(c_t, l_t | past) = P(c_t | c_{t−1}) P(d(l_t, l_{t−1}))
Relative motion dependent on the class:
P(c_t, l_t | past) = P(c_t | c_{t−1}) P(d(l_t, l_{t−1}) | c_t)
Autoregressive model for transformation distribution
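The first (class-independent) transition model can be sketched concretely. A minimal illustration, assuming 1-D shift states where the displacement is d = l_t − l_{t−1} (the sizes and probability values below are made up for the example):

```python
import numpy as np

C, L = 2, 5                                  # hypothetical class / shift-state counts
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])                   # P(c_t | c_{t-1})
p_disp = {-1: 0.25, 0: 0.5, 1: 0.25}         # P(d), favoring small relative motion

def transition(c_prev, l_prev, c_t, l_t):
    """P(c_t, l_t | c_{t-1}, l_{t-1}) = P(c_t | c_{t-1}) P(l_t - l_{t-1})."""
    return A[c_prev, c_t] * p_disp.get(l_t - l_prev, 0.0)

# From an interior shift state, the joint transition sums to one
# (at the edges of the shift range some displacement mass would be clipped)
total = sum(transition(0, 2, c, l) for c in range(C) for l in range(L))
```

The class-dependent variant on the slide would simply replace `p_disp` with one displacement table per class c_t.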

30 Inference in THMM
Tasks:
– Find the most likely state at time t given the whole observed sequence {x_t} and the model parameters (class means and variances, transition probabilities, etc.)
– Find the distribution over states for each time t
– Find the most likely state sequence
– Learn the parameters that maximize the likelihood of the observed data

31 Video Summary and Filtering
p(z|c) = N(z; μ_c, Φ_c)
p(x|z,l) = N(x; G_l z, Ψ)
Applications: video summary, image segmentation, removal of sensor noise, image stabilization

32 Example: Learning
DATA: hand-held camera, moving subject, cluttered background
Learned cluster means μ_c: 1 class, 121 translations (11 vertical and 11 horizontal shifts); 5 classes

33 Examples
Normalized sequence
Simulated sequence
De-noising
Seeing through distractions

34 Future work
Fast approximate learning and inference
Multiple layers
Learning transformations from images
Nebojsa Jojic: www.ifp.uiuc.edu/~jojic

35 Subspace models of images
Example: Image z ∈ R^1200, z = f(y), y ∈ R^2
The subspace coordinates capture, e.g., Frown and Shut eyes

36 Factor analysis (generative PCA)
The density of pixel intensities z given subspace point y is
p(z|y) = N(z; μ + Λy, Φ)
p(y) = N(y; 0, I)
Manifold: f(y) = μ + Λy, linear

37 Factor analysis (generative PCA)
p(z|y) = N(z; μ + Λy, Φ), p(y) = N(y; 0, I)
Parameters μ, Λ represent the manifold.
Observing z induces a Gaussian p(y|z):
COV[y|z] = (I + Λ^T Φ^{−1} Λ)^{−1}
E[y|z] = COV[y|z] Λ^T Φ^{−1} (z − μ)
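These two posterior formulas translate directly into code. A minimal NumPy sketch (illustrative, assuming a diagonal noise variance Φ stored as a vector):

```python
import numpy as np

def fa_posterior(z, mu, Lam, Phi):
    """Posterior over factors y given image z in factor analysis:
    Cov[y|z] = (I + Lam^T Phi^{-1} Lam)^{-1}
    E[y|z]   = Cov[y|z] Lam^T Phi^{-1} (z - mu)."""
    K = Lam.shape[1]
    PhiInvLam = Lam / Phi[:, None]           # Phi^{-1} Lam for diagonal Phi
    cov = np.linalg.inv(np.eye(K) + Lam.T @ PhiInvLam)
    mean = cov @ PhiInvLam.T @ (z - mu)
    return mean, cov

rng = np.random.default_rng(0)
N, K = 6, 2
Lam = rng.standard_normal((N, K))
mu = np.zeros(N)
Phi = np.ones(N)
y_true = np.array([1.0, -2.0])
z = mu + Lam @ y_true                        # a point exactly on the manifold
mean, cov = fa_posterior(z, mu, Lam, Phi)
```

Note that even for a noiseless observation, the posterior mean is shrunk toward zero by the prior p(y) = N(y; 0, I), as the formula implies.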

38 Example: Simulation
Sample y from p(y) = N(y; 0, I), then z from p(z|y) = N(z; μ + Λy, Φ). The columns of Λ are the Frown (Frn) and Shut-eyes (SE) components.

39 Example: Simulation
A second sample of y and the corresponding image z = μ + Λy + noise.

40 Example: Simulation
A third sample of y and the corresponding image z = μ + Λy + noise.

41 Transformed Component Analysis
p(y) = N(y; 0, I)
p(z|y) = N(z; μ + Λy, Φ)
P(l) = ρ_l
The probability of the observed image x is
p(x|z,l) = N(x; G_l z, Ψ)

42 Example: Simulation
G_1 = shift left & up, G_2 = I, G_3 = shift right & up
Sample y (Frn, SE coordinates), draw the latent image z from p(z|y), sample l = 3, and generate the observed image x = G_3 z (plus noise)

43 Example: Inference
G_1 = shift left & up, G_2 = I, G_3 = shift right & up
For an observed x, each candidate transformation l implies a latent image z and subspace point y (Frn, SE). Only the correct shift yields a plausible latent image; the other hypotheses produce garbage, so P(l|x) concentrates on the correct transformation.

44 EM algorithm for TCA
Initialize μ, Λ, Φ, Ψ, ρ to random values
E Step
– For each training case x^(t), infer q^(t)(l,z,y) = p(l,z,y | x^(t))
M Step
– Compute μ^new, Λ^new, Φ^new, Ψ^new, ρ^new to maximize Σ_t E[log p(y) p(z|y) P(l) p(x^(t)|z,l)], where E[] is wrt q^(t)(l,z,y)
Each iteration increases log p(Data)

45 A tough toy problem
144 9 x 9 images
1 shape (pyramid)
3-D lighting
cluttered background
25 possible locations

46 1st 8 principal components vs. TCA
TCA: 3 components (columns Λ:1, Λ:2, Λ:3), 81 transformations (9 horiz shifts x 9 vert shifts), 10 iters of EM
Model generates realistic examples

47 Expression modeling
100 16 x 24 training images
variation in expression
imperfect alignment

48 PCA: Mean + 1st 10 principal components Factor Analysis: Mean + 10 factors after 70 its of EM TCA: Mean + 10 factors after 70 its of EM

49 Fantasies from FA model
Fantasies from TCA model

50 Modeling handwritten digits
200 8 x 8 images of each digit
preprocessing normalizes vert/horiz translation and scale
different writing angles (shearing) - see “7”

51 TCA: - 29 shearing + translation combinations - 10 components per digit - 30 iterations of EM per digit Mean of each digit Transformed means

52 FA: Mean + 10 components per digit TCA: Mean + 10 components per digit

53 Classification Performance
Training: 200 cases/digit, 20 components, 50 EM iters
Testing: 1000 cases, p(x|class) used for classification
Results:
Method / Error rate
k-nearest neighbors (optimized k): 7.6%
Factor analysis: 3.2%
Transformed component analysis: 2.7%
Bonus: P(l|x) infers the writing angle!
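The "p(x|class) used for classification" step amounts to picking the class whose generative model best explains the test image. A minimal sketch, with toy stand-in models (the callable-per-class API and isotropic Gaussians are illustrative assumptions, not the TCA models from the experiment):

```python
import numpy as np

def classify(x, class_models, priors=None):
    """Return argmax_c [log p(x|c) + log P(c)].
    class_models: one callable per class returning a log-likelihood of x
    (a hypothetical interface for this example)."""
    C = len(class_models)
    log_prior = np.log(priors if priors is not None else np.full(C, 1.0 / C))
    scores = np.array([m(x) for m in class_models]) + log_prior
    return int(np.argmax(scores))

def make_model(mu):
    # Toy per-class model: isotropic Gaussian log-likelihood up to a constant
    return lambda x: -0.5 * np.sum((x - mu) ** 2)

models = [make_model(np.zeros(2)), make_model(np.array([3.0, 3.0]))]
label = classify(np.array([2.9, 3.2]), models)
```

In the digit experiment, each per-class model would be a learned TCA density, and the same argmax over class likelihoods applies.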

54 Wrap-up
Papers, MATLAB scripts: www.ifp.uiuc.edu/~jojic www.cs.uwaterloo.ca/~frey
Other domains: audio, bioinformatics, …
Other latent image models, p(z)
– mixtures of factor analyzers (NIPS99)
– layers, multiple objects, occlusions
– time series (in preparation)

55 Wrap-up
Discrete+Linear Combination: set some components of Λ equal to derivatives of μ wrt transformations
Multiresolution approach
Fast variational methods, belief propagation, …

56 Other generative models Modeling human appearance in stereo images: articulated, self-occluding Gaussians

