Dynamic Scene Reconstruction using GPCA


1 Dynamic Scene Reconstruction using GPCA
René Vidal, Center for Imaging Science, Johns Hopkins University

2 Vision based landing of a UAV
Landing on the ground. Tracking two-meter waves.

3 Probabilistic pursuit-evasion games
Hierarchical control architecture. High-level: map building and pursuit policy. Mid-level: trajectory planning and obstacle avoidance. Low-level: regulation and control.

4 Formation control of nonholonomic robots
Examples of formations: flocks of birds, schools of fish, satellite clustering, automated highway systems.

5 Reconstruction of static scenes
Structure from motion and 3D reconstruction. Input: corresponding points in multiple images. Output: camera motion, scene structure, calibration. Theory: multiview geometry (multiple view matrix, multiple view normalized epipolar constraint, linear self-calibration). Algorithms: multiple view matrix factorization; multiple view factorization for planar motions.

6 Reconstruction of dynamic scenes
Multibody structure from motion. 2D motion segmentation: multibody brightness constancy constraint. 3D motion segmentation: multibody fundamental matrix, multibody trifocal tensor. Generalized PCA: segmentation of mixtures of subspaces of unknown and varying dimensions; segmentation of algebraic varieties (bilinear and trilinear).

7 Principal Component Analysis (PCA)
Identify a linear subspace S from sample points. Applications: data compression, regression, image analysis (eigenfaces), pattern recognition.
As we all know, the Principal Component Analysis problem refers to the problem of estimating a SINGLE subspace from sample data points. Although there are various ways of solving PCA, a simple solution consists of building a matrix with all the data points, computing its SVD, and then extracting a basis for the subspace from the columns of the U matrix and the dimension of the subspace from the rank of the data matrix. There is no question that PCA is one of the most popular techniques for dimensionality reduction in various engineering disciplines. In computer vision, in particular, it has found a successful application in face recognition under the name of eigenfaces.
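The SVD recipe just described can be sketched in a few lines of numpy. This is an illustrative sketch, not code from the talk; the helper name `pca_basis` and the tolerance are my own choices.

```python
import numpy as np

# Minimal PCA sketch: recover a basis for a single d-dimensional subspace S
# from N sample points stacked as the columns of W.
def pca_basis(W, tol=1e-8):
    # SVD of the data matrix; the left singular vectors span the data.
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    d = int(np.sum(s > tol * s[0]))   # dimension from the numerical rank
    return U[:, :d]                   # basis for S

# Example: 200 points on a random 2-D subspace of R^5.
rng = np.random.default_rng(0)
B = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # orthonormal basis
W = B @ rng.standard_normal((2, 200))
U = pca_basis(W)
# U spans the same subspace as B: projecting B onto span(U) changes nothing.
assert np.allclose(U @ U.T @ B, B, atol=1e-8)
```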

8 Generalized PCA (GPCA)
Extensions of PCA. Probabilistic PCA (Tipping-Bishop '99): identify a subspace from noisy data; Gaussian noise leads to standard PCA; noise in the exponential family (Collins et al. '01). Nonlinear PCA (Scholkopf-Smola-Muller '98): identify a nonlinear manifold from sample points by embedding the data in a higher-dimensional space and applying standard PCA; but what embedding should be used? Mixtures of PCA (Tipping-Bishop '99): identify a collection of subspaces from sample points; this is Generalized PCA (GPCA).
There have been various attempts to generalize PCA in different directions. For example, Probabilistic PCA considers the case of noisy data and tries to estimate THE subspace in a maximum likelihood sense. For the case of noise in the exponential family, Collins has shown that this can be done using convex optimization techniques. Other extensions consider the case of data lying on a manifold, so-called Nonlinear PCA or Kernel PCA. This problem is usually solved by embedding the data into a higher dimensional space and then assuming that the embedded data DOES live in a linear subspace. Of course the correct embedding to use depends on the problem at hand, and learning the embedding is currently a topic of research in the machine learning community. A third extension considers the case of identifying multiple subspaces at the same time. It is this last extension that I will talk about in this talk, under the name of Generalized PCA.

9 Applications of GPCA in vision and control
Geometry: vanishing points. Image compression. Segmentation: intensity (black-white), texture, motion (2-D, 3-D), scene (host-guest). Recognition: faces (eigenfaces), man vs. woman, human gaits, dynamic textures (water-steam). Biomedical imaging. Hybrid systems identification.
One of the reasons we are interested in GPCA is that various problems in computer vision have to do with the simultaneous estimation of multiple models from visual data. Consider for example segmenting an image into different regions based on intensity, texture, or motion information. Consider also the recognition of various static and dynamic processes, such as human faces or human gaits, from visual data. Although in this talk I will only consider the first class of problems, it turns out that, at least from a mathematical perspective, all the above problems can be converted into the following generalization of principal component analysis, which we conveniently refer to as GPCA.

10 Generalized Principal Component Analysis
Given points on multiple subspaces, identify: the number of subspaces and their dimensions; a basis for each subspace; the segmentation of the data points. This is a "chicken-and-egg" problem: given the segmentation, estimate the subspaces; given the subspaces, segment the data. Prior work. Iterative algorithms: K-subspaces (Ho et al. '03), RANSAC, subspace selection and growing (Leonardis et al. '02). Probabilistic approaches learn the parameters of a mixture model using, e.g., EM (Tipping-Bishop '99): but how to initialize? Geometric approaches: 2 planes in R3 (Shizawa-Maze '91). Factorization approaches: independent, orthogonal subspaces of equal dimension (Boult-Brown '91, Costeira-Kanade '98, Kanatani '01).
Given a set of data points lying on a collection of linear subspaces, without knowing which point corresponds to which subspace and without even knowing the number of subspaces, estimate the number of subspaces, a basis for each subspace, and the segmentation of the data. As stated, the GPCA problem is one of those challenging CHICKEN-AND-EGG problems in the following sense. If we knew the segmentation of the data points, and hence the number of subspaces, then we could solve the problem by applying standard PCA to each group. On the other hand, if we had a basis for each subspace, we could easily assign each point to a cluster. The main challenge is that we know neither the segmentation of the data points nor a basis for each subspace, and the question is how to estimate everything from the data alone. Previous geometric approaches to GPCA have been proposed in the context of motion segmentation, under the assumption that the subspaces do NOT intersect. Such approaches are based on first clustering the data, using for example K-means or spectral clustering techniques, and then applying standard PCA to each group. Statistical methods, on the other hand, model the data with a mixture model whose parameters are the mixing proportions and the basis for each subspace.
The estimation of the mixture is a (usually non-convex) optimization problem that can be solved with various techniques. One of the most popular is the expectation maximization (EM) algorithm, which alternates between the clustering of the data and the estimation of the subspaces. Unfortunately, the convergence of such iterative algorithms depends on good initialization. At present, initialization is mostly done at random, using another iterative algorithm such as K-means, or in an ad hoc fashion depending on the particular application. It is here where our main motivation comes into the picture: we would like to know whether we can initialize iterative algorithms in an algebraic fashion. For example, looking at the two lines in this picture, there has to be a way of estimating those two lines directly from the data in one shot, at least in the ABSENCE of noise.

11 Our approach to segmentation: GPCA
Towards an analytic solution to segmentation: can we estimate ALL models simultaneously using ALL the data? When can we do so analytically? In closed form? Is there a formula for the number of models? We consider the most general case: subspaces of unknown and possibly different dimensions; subspaces that may intersect arbitrarily (not only at the origin); subspaces that need not be orthogonal. We propose an algebraic geometric approach to data segmentation: number of subspaces = degree of a polynomial; subspace bases = derivatives of a polynomial. Subspace clustering is algebraically equivalent to polynomial fitting and polynomial differentiation.
We are therefore interested in finding an analytic solution to the above segmentation problem. In particular, we would like to know whether we can estimate all the subspaces simultaneously from all the data, without having to iterate between data clustering and subspace estimation. Furthermore, we would like to know whether we can do so in an ANALYTIC fashion, under what conditions we can do it in closed form, and whether there is a formula for the number of subspaces. It turns out that all these questions can be answered using tools from algebraic geometry. Since at first data segmentation and algebraic geometry may seem totally disconnected, let me first tell you what the connection is. We will model the number of subspaces as the degree of a polynomial and the subspace parameters (bases) as the roots of the polynomial, or more precisely as its factors. With this algebraic geometric interpretation in mind, we will be able to prove the following about the GPCA problem. First, there are geometric constraints that do not depend on the segmentation of the data; therefore the chicken-and-egg dilemma can be resolved, since the segmentation of the data can be algebraically eliminated.
Second, one can prove that the problem has a unique solution, which can be obtained in closed form if and only if the number of subspaces is at most four. And third, the solution can be computed using linear algebraic techniques. In the presence of noise, one can use the algebraic solution to GPCA to initialize any iterative algorithm, for example EM. In the case of zero-mean Gaussian noise, we will show that the E-step can be algebraically eliminated, since we derive an objective function that depends on the model parameters only.

12 Introductory example: algebraic clustering in 1D
Number of groups? Given unlabeled samples x drawn from n cluster centers c_1, ..., c_n on the real line, every sample satisfies the polynomial p(x) = (x - c_1)(x - c_2)...(x - c_n) = 0, no matter which center generated it.

13 Introductory example: algebraic clustering in 1D
How to compute n, the centers c_i, and the coefficients of p? The coefficients can be found by solving a linear system in the monomials of the data; the number of clusters n is the degree of p, and the cluster centers are its roots. The solution is unique given sufficiently many distinct samples, and it is in closed form if n is at most 4, since only polynomials up to degree four are solvable in radicals.
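The 1-D recipe can be sketched numerically. This is an illustrative sketch, not code from the talk; the helper `cluster_1d` and the noise-free example data are my own.

```python
import numpy as np

# 1-D algebraic clustering sketch: every sample x drawn from one of n centers
# c_1..c_n satisfies p(x) = (x - c_1)...(x - c_n)
#                         = x^n + a_{n-1} x^{n-1} + ... + a_0 = 0,
# so the coefficients a_i solve a linear system and the centers are the roots.
def cluster_1d(x, n):
    V = np.vander(x, n + 1)          # columns x^n, x^{n-1}, ..., 1
    A, b = V[:, 1:], -V[:, 0]        # fix the leading coefficient to 1
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.sort(np.roots(np.concatenate(([1.0], a))))

x = np.array([2.0, 2.0, 5.0, 2.0, 5.0, 5.0])   # noise-free samples, centers 2 and 5
centers = cluster_1d(x, 2)
assert np.allclose(centers, [2.0, 5.0])
```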

14 Introductory example: algebraic clustering in 2D
What about dimension 2? Identify the plane with the complex numbers: map each point (x, y) to z = x + iy and apply the same polynomial trick. What about higher dimensions? There is no straightforward analogue of complex numbers there: how would one find the roots of a polynomial over the quaternions? Instead: project the data onto a one- or two-dimensional space, and apply the same algorithm to the projected data.
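The complex-number trick for dimension 2 can be sketched directly. This is an illustrative sketch with made-up sample points, not code from the talk.

```python
import numpy as np

# 2-D clustering via the complex embedding: encode each planar point (x, y)
# as z = x + iy; clusters around n planar centers then satisfy the complex
# polynomial (z - c_1)...(z - c_n) = 0, and the 1-D root-finding recipe applies.
pts = np.array([[1.0, 1.0], [1.0, 1.0], [4.0, -2.0], [4.0, -2.0], [4.0, -2.0]])
z = pts[:, 0] + 1j * pts[:, 1]

n = 2
V = np.vander(z, n + 1)                      # columns z^2, z, 1
a, *_ = np.linalg.lstsq(V[:, 1:], -V[:, 0], rcond=None)
roots = np.roots(np.concatenate(([1.0 + 0j], a)))
centers = sorted((r.real, r.imag) for r in roots)
assert np.allclose(centers, [(1.0, 1.0), (4.0, -2.0)])
```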

15 Intensity-based image segmentation
Apply GPCA to the vector of image intensities (N = #pixels). n = 3 groups: black, gray, white. We are looking for segmentation by intensity only, not recognition and not objects.

16 Intensity-based image segmentation
Black group, gray group, white group. Image size: 147 x 221. Time: 2 (sec), 10 (sec).

17 Representing one subspace
One plane. One line. One subspace can be represented with a set of linear equations, i.e., a set of polynomials of degree 1.

18 Representing n subspaces
Two planes. One plane and one line. A union of n subspaces can be represented with a set of homogeneous polynomials of degree n: by De Morgan's rule, a point lies on subspace 1 OR subspace 2 exactly when the product of the linear forms defining them vanishes.
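The product idea fits in a few lines. This is an illustrative sketch with two hand-picked planes, not an example from the slides.

```python
import numpy as np

# De Morgan idea as code: a point lies on the union of two planes with
# normals b1, b2 iff the degree-2 product polynomial
#   p(x) = (b1 . x) * (b2 . x)
# vanishes, turning "one of the linear equations holds" into a single equation.
b1 = np.array([1.0, 0.0, 0.0])     # plane x = 0
b2 = np.array([0.0, 1.0, -1.0])    # plane y = z

def p(x):
    return (b1 @ x) * (b2 @ x)

assert p(np.array([0.0, 3.0, 7.0])) == 0.0     # on plane 1 only
assert p(np.array([5.0, 2.0, 2.0])) == 0.0     # on plane 2 only
assert p(np.array([1.0, 1.0, 0.0])) != 0.0     # on neither plane
```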

19 Fitting polynomials to data points
Polynomials can be written linearly in terms of the vector of coefficients by using a polynomial embedding (the Veronese map, which sends each point to the vector of all its monomials of degree n). The coefficients of the polynomials can then be computed from the nullspace of the embedded data matrix, solved using least squares (N = #data points).
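The fitting step can be sketched for two lines in the plane. This is an illustrative sketch; the hand-rolled `veronese2` helper and the sample lines are my own choices.

```python
import numpy as np

# Polynomial fitting via the Veronese embedding: for two lines in R^2, every
# homogeneous degree-2 polynomial p(x, y) = c1 x^2 + c2 xy + c3 y^2 that
# vanishes on the data has coefficient vector c in the nullspace of the
# embedded data matrix with rows [x^2, xy, y^2].
def veronese2(X):
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([x * x, x * y, y * y])

# Samples from the lines y = 0 and x = y (union: p(x, y) = y(x - y) up to scale).
X = np.array([[1.0, 0.0], [2.0, 0.0], [-3.0, 0.0],
              [1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]])
Vn = veronese2(X)
_, s, Vt = np.linalg.svd(Vn)
c = Vt[-1]                                   # nullspace vector = coefficients
# The fitted polynomial vanishes on all the data.
assert np.allclose(Vn @ c, 0.0, atol=1e-10)
```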

20 Finding a basis for each subspace
Case of hyperplanes: there is only one polynomial; its degree gives the number of subspaces, and the bases are the normal vectors (the factors of the polynomial). Polynomial Factorization Algorithm (PFA) [CVPR 2003]: find the roots of a polynomial of degree n in one variable, then solve linear systems in the remaining variables; the solution is obtained in closed form for n at most 4. Problems: computing roots may be sensitive to noise; the estimated polynomial may not factor perfectly with noisy data; polynomials are estimated only up to a change of basis, hence they may not factor even with perfect data; and the method cannot be applied to subspaces of different dimensions.

21 Finding a basis for each subspace
Theorem (GPCA by differentiation): the gradient of a vanishing polynomial, evaluated at a point on one of the subspaces, is a normal vector to that subspace. Hence, to learn a mixture of subspaces we just need one positive example per class.
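The theorem in miniature, for the two-line example above. This is an illustrative sketch with a hand-computed gradient, not code from the talk.

```python
import numpy as np

# GPCA by differentiation: for the union of the lines y = 0 and x = y, a
# vanishing polynomial is p(x, y) = xy - y^2. Its gradient, evaluated at a
# point on one line, is a normal to THAT line, so a basis for each subspace
# is read off without segmenting the data first.
def grad_p(x, y):
    # p = x*y - y**2  =>  dp/dx = y, dp/dy = x - 2y
    return np.array([y, x - 2.0 * y])

def unit(v):
    return v / np.linalg.norm(v)

n1 = grad_p(3.0, 0.0)        # point on the line y = 0
n2 = grad_p(2.0, 2.0)        # point on the line x = y

assert np.allclose(unit(n1), [0.0, 1.0])                       # normal to y = 0
assert np.allclose(unit(n2), np.array([1.0, -1.0]) / np.sqrt(2))  # normal to x = y
```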

22 Choosing one point per subspace
With noise and outliers, the fitted polynomials may not define a perfect union of subspaces. The normals can still be estimated correctly by choosing the points at which to differentiate optimally. But how do we measure the distance to the closest subspace without knowing the segmentation?

23 Subspaces of different dimensions
There are multiple polynomials fitting the data, and the derivative of each polynomial gives a different normal vector. A basis for each subspace is obtained by applying PCA to these normal vectors.

24 GPCA: Algorithm summary
Apply the polynomial embedding to the projected data. Obtain the multibody subspace model by polynomial fitting: solve for the coefficient vectors (the number of subspaces must be known). Obtain bases and dimensions by polynomial differentiation. Optimally choose one point per subspace using the distance function.

25 Dealing with high-dimensional data
Minimum number of points: K = dimension of the ambient space, n = number of subspaces. In practice the dimension k_i of each subspace is much smaller than K. The number and dimensions of the subspaces are preserved by a generic linear projection onto a subspace of dimension max(k_i) + 1. Outliers can be removed by robustly fitting the projection subspace. Open problem: how to choose the projection? PCA?
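A quick numerical illustration of the projection fact. This is a sketch under my own random-data setup; the specific dimensions (two lines in R^100, a 2 x 100 projection) are arbitrary choices.

```python
import numpy as np

# A generic linear projection preserves the number and dimensions of the
# subspaces: two lines in R^100, pushed through a random 2 x 100 map, are
# still two distinct lines, so GPCA can run in the low-dimensional space.
rng = np.random.default_rng(3)
d1, d2 = rng.standard_normal(100), rng.standard_normal(100)   # line directions
X = np.column_stack([t * d for d in (d1, d2) for t in (1.0, -2.0, 0.5)])  # 100 x 6
Pmat = rng.standard_normal((2, 100))
Y = Pmat @ X                                  # projected data, 2 x 6

# Each projected line is still 1-dimensional, and the two remain distinct.
l1, l2 = Y[:, :3], Y[:, 3:]
assert np.linalg.matrix_rank(l1) == 1 and np.linalg.matrix_rank(l2) == 1
assert np.linalg.matrix_rank(Y) == 2
```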

26 GPCA with spectral clustering
Build a similarity matrix between pairs of points and use its eigenvectors to cluster the data. How to define a similarity for subspaces? We want points in the same subspace to be close and points in different subspaces to be far. Use GPCA to get a basis at each point; the distance is defined via subspace angles.
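One concrete way to turn subspace angles into a similarity, as a numpy-only sketch. The helper `subspace_similarity` and the product-of-cosines form are my own illustrative choices, not necessarily the exact affinity used in the talk.

```python
import numpy as np

# The angles between two subspaces with orthonormal bases Q1, Q2 come from
# the singular values of Q1.T @ Q2 (cosines of the principal angles); bases
# that are nearly aligned get similarity near 1, orthogonal ones get 0.
def subspace_similarity(Q1, Q2):
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    cos_theta = np.clip(s, 0.0, 1.0)
    return float(np.prod(cos_theta))     # 1 when aligned, 0 when orthogonal

e1 = np.array([[1.0], [0.0], [0.0]])     # the x-axis in R^3
e2 = np.array([[0.0], [1.0], [0.0]])     # the y-axis in R^3
assert np.isclose(subspace_similarity(e1, e1), 1.0)
assert np.isclose(subspace_similarity(e1, e2), 0.0)
```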

27 Comparison of PFA, PDA, K-sub, EM

28 Image compression using GPCA (courtesy of K Huang)
More efficient representations than the pixel-based one? Fixed basis: discrete Fourier transform (DFT) and discrete cosine transform (DCT) (Ahmed '74); JPEG uses the DCT. Wavelet transforms (multi-resolution) (Daubechies '88, Mallat '92); JPEG-2000 uses the multi-resolution bior4.4 wavelet. Adaptive basis: the Karhunen-Loeve transform (KLT), also known as PCA (Jain '89); the KLT is known to be the optimal transform if the signals are drawn from a single stationary second-order stochastic model (Effros '95). Sparse component analysis (Field '96, Donoho et al. '00).

29 Image compression using GPCA (courtesy of K Huang)
Single adaptive linear model: PCA. Multiple adaptive linear models: GPCA.

30 Image compression using GPCA (courtesy of K Huang)
KLT, Haar wavelet, and GPCA reconstructions vs. the original, for the same number of basis vectors.

31 Texture-based image segmentation
GPCA: 24 (sec). Human segmentation shown for comparison.

32 Texture-based image segmentation

33 Face clustering under varying illumination
Yale Face Database B: n = 3 faces, N = 64 views per face, K = 30x40 pixels per face. Apply PCA to obtain 3 principal components, then apply GPCA in homogeneous coordinates: n = 3 subspaces of dimensions 3, 2 and 2.

34 2-D and 3-D motion segmentation
A static scene: multiple 2-D motion models. A dynamic scene: multiple 3-D motion models. Given an image sequence, determine: the number of motion models; the motion model for each group, affine (2-D) or Euclidean (3-D); the segmentation, i.e., the model to which each pixel belongs.

35 A unified approach to motion segmentation
This applies to most motion models in computer vision. All motion models can be segmented algebraically by: fitting a multibody model, a real or complex polynomial, to all the data; then fitting each individual model by differentiating the polynomial at a data point.

36 3-D motion segmentation problem
Mathematical solution: extract a set of points in each frame; track those points over time and/or establish correspondences; cluster the point trajectories; estimate the motion of each group. Use GPCA for affine cameras or linear motion models, and quadratic surface analysis for bilinear motion models such as fundamental matrices or planar homographies.

37 3-D motion segmentation for affine cameras
Structure = 3-D surface; motion = camera position and orientation. Under the affine camera model (p = point index, f = frame index), the trajectories of points on one rigid body live in a 4-D subspace (Boult and Brown '91, Tomasi and Kanade '92). P = #points, F = #frames.
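The rank-4 fact behind the 4-D subspace can be checked numerically. This is an illustrative sketch with random affine cameras and structure, not data from the talk.

```python
import numpy as np

# Under an affine camera, image points are x_fp = A_f @ [X_p; 1] with A_f a
# 2x4 matrix per frame, so the 2F x P measurement matrix W factors as
# [A_1; ...; A_F] @ [X; 1] and has rank at most 4, no matter how many points
# or frames there are. That 4-D subspace is what GPCA segments per rigid body.
rng = np.random.default_rng(1)
P, F = 50, 10
X = np.vstack([rng.standard_normal((3, P)), np.ones((1, P))])       # 4 x P structure
W = np.vstack([rng.standard_normal((2, 4)) @ X for _ in range(F)])  # 2F x P measurements

assert W.shape == (20, 50)
assert np.linalg.matrix_rank(W) <= 4
```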

38 3-D motion segmentation for affine cameras
Percentage of correct classification on sequences A, B, and C.

39 Temporal video segmentation
Segmenting N=30 frames of a sequence containing n=3 scenes: host, guest, both. The image intensities are the output of a linear dynamical system, x_{t+1} = A x_t + v_t (dynamics), y_t = C x_t + w_t (appearance, images). Apply GPCA to fit n=3 observability subspaces.

40 Temporal video segmentation
Segmenting N=60 frames of a sequence containing n=3 scenes: burning wheel, burnt car with people, burning car. The image intensities are the output of a linear dynamical system, x_{t+1} = A x_t + v_t (dynamics), y_t = C x_t + w_t (appearance, images). Apply GPCA to fit n=3 observability subspaces.

41 Modeling nonmoving dynamic textures
A dynamic texture is the output of a linear dynamical model (Soatto et al. 2001): x_{t+1} = A x_t + v_t (dynamics), y_t = C x_t + w_t (appearance, images). Dynamic texture segmentation using level sets (Doretto et al. 2003).
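The model above can be fitted from frames with a simple SVD-based procedure, in the spirit of the suboptimal learning scheme used for dynamic textures; this is a simplified numpy sketch on noise-free synthetic data, not the authors' code.

```python
import numpy as np

# Fit x_{t+1} = A x_t, y_t = C x_t from frames stacked as columns of Y:
# take C and the hidden states from an SVD of Y, then get A by least squares
# on successive states.
def fit_lds(Y, d):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :d]                        # appearance
    X = np.diag(s[:d]) @ Vt[:d]         # hidden states, d x T
    A, *_ = np.linalg.lstsq(X[:, :-1].T, X[:, 1:].T, rcond=None)
    return A.T, C, X                    # dynamics, appearance, states

# Noise-free synthetic "texture": 2 hidden states, 30-pixel frames.
rng = np.random.default_rng(2)
A_true = np.array([[0.9, -0.2], [0.2, 0.9]])
C_true = rng.standard_normal((30, 2))
x = np.array([1.0, 0.5])
frames = []
for _ in range(40):
    frames.append(C_true @ x)
    x = A_true @ x
Y = np.column_stack(frames)

A, C, X = fit_lds(Y, 2)
# The fit reproduces the observed frames, and the states follow A.
assert np.allclose(C @ X, Y, atol=1e-8)
assert np.allclose(A @ X[:, :-1], X[:, 1:], atol=1e-6)
```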

42 Optical flow of a moving dynamic texture
Time-varying model. For static textures, optical flow is computed from the brightness constancy constraint (BCC); for dynamic textures, optical flow is computed from the dynamic texture constancy constraint (DTCC).

43 Segmenting nonmoving dynamic textures
One dynamic texture lives in the observability subspace of its model; multiple textures live in multiple subspaces. Apply PCA to the image intensities, then apply GPCA to the projected data.

44 Optical flow of multiple moving dynamic textures
Segmentation of multiple moving subspaces: apply PCA to the intensities in a moving time window, then apply GPCA to the projected data.

45 MRI-based heart motion analysis
Goal: recognize multiple types of arrhythmias from heart MRI images. Model: the visual dynamics are modeled as multiple dynamic textures. Heart motion: nonrigid. Chest motion: respiration. Pipeline: original image, segmentation by intensity, dynamic texture segmentation, morphological operations.

46 MRI-based heart motion analysis
Heart/chest segmentation results for various slices.

47 DTI-based spinal cord injury detection
Diffusion tensor imaging: each voxel has an intensity and an orientation (a tensor/ellipsoid), from which the fibers in the tissue can be reconstructed. Spinal cord injury detection: find the regions where fibers no longer exist.

48 DTI-based spinal cord injury detection
Axial segmentation of fibers in white matter. Injury detection.

49 Fiber clustering results

50 Summary: GPCA, an algorithm for clustering subspaces
GPCA deals with unknown and possibly different dimensions, and with arbitrary intersections among the subspaces. Our approach is based on: projecting the data onto a low-dimensional subspace; fitting polynomials to the projected data; differentiating the polynomials to obtain a basis. Applications in image processing and computer vision: image compression and segmentation, 2-D and 3-D motion segmentation, temporal video segmentation, dynamic texture segmentation, hybrid system identification.
In this paper, we have presented a new way of looking at segmentation problems that is based on algebraic geometric techniques. In essence, we showed that segmentation is equivalent to polynomial factorization and presented an algebraic solution for the case of mixtures of subspaces, which is a natural generalization of the well-known Principal Component Analysis (PCA). Of course, as this theory is fairly new, there are still many open problems: for example, how to do polynomial factorization in the presence of noise, or how to deal with outliers. We are currently working on such problems and expect to report our results in the near future. We are also working on generalizing the ideas in this paper to subspaces of different dimensions and to other types of data. In fact, I cannot resist doing some advertising for my talk tomorrow, where I will discuss the segmentation of multiple rigid motions, which corresponds to the case of bilinear data. Finally, before taking your questions, I would like to say that I think it is time we start putting segmentation problems on a theoretical footing, by building a mathematical theory of segmentation that incorporates not only well-known statistical modeling techniques but also the geometric constraints inherent in various applications in computer vision.
We hope that by combining the algebraic geometric approach to segmentation presented in this paper with the more standard statistical approaches, we will be able to build such a theory. It should find applications in various disciplines, including many segmentation and recognition problems in computer vision.

51 References

52 Thanks to collaborators and PhD students
University of Illinois: Yi Ma, Kun Huang. Johns Hopkins University: Jacopo Piazzi, Alvina Goh, Avinash Ravichandran. Australian National University: Richard Hartley. UCLA: Stefano Soatto. UC Berkeley: Shankar Sastry.

