Bilinear models and Riemannian metrics for motion classification
Fabio Cuzzolin, Microsoft Research, Cambridge, UK, 11/7/2006

Myself
- Master's thesis on gesture recognition at the University of Padova
- Visiting student, ESSRL, Washington University in St. Louis
- Ph.D. thesis on the theory of belief functions
- Young researcher in Milan with the Image and Sound Processing group
- Post-doc at UCLA in the Vision Lab

My research
- Discrete mathematics: linear independence on lattices
- Belief functions and imprecise probabilities: geometric approach, algebraic analysis, combinatorial analysis
- Computer vision: object and body tracking, data association, gesture and action recognition, identity recognition

Today's talk
- Motion classification is one of the most popular vision problems; applications include surveillance, biometrics, and human-computer interaction
- Issue: the influence of nuisance factors, addressed here with bilinear models for invariant gaitID
- Issue: the choice of distance function, addressed here by learning Riemannian metrics for motion classification

Outline
- Bilinear models for invariant gaitID
  - The identity recognition problem
  - View-invariance in gaitID
  - Bilinear models
  - HMMs and a three-layer model
  - Four experiments on the Mobo database
- Riemannian metrics for classification
  - Distances between dynamical models
  - Learning a metric from a training set
  - Pullback metrics
  - Spaces of linear systems and the Fisher metric
  - Experiments on scalar models

GaitID
- Biometrics are increasingly popular
- Cooperative methods: face recognition, retinal analysis
- Surveillance context: non-cooperative users
- The problem: recognizing the identity of humans from their gait
- Methods: dimensionality reduction, silhouette analysis
- Issues: nuisance factors, viewpoint dependence

A brief review
- Gait signatures: silhouettes [Collins 02, Wang 03]; optical flow, velocity moments, shape symmetry, static body parameters
- Baseline algorithm [Sarkar 05]: computes similarity scores between a probe sequence and each gallery (training) sequence by pairwise frame correlation
- Methodologies: mostly pattern recognition after dimensionality reduction
  - Eigenspaces [Abdelkader 01]
  - PCA/MDA [Tolliver 03, Han 04]
  - Stochastic models (HMMs) [Kale 02, Debrunner 00]
  - KL-divergence between Markov models

The view-invariance issue
- Many different nuisance factors are involved: viewpoint, illumination, clothes, shoes, carried objects, trajectory
- Issue: view-invariance
- Possible approaches: 3D tracking, virtual view reconstruction, static body parameters

Approaches to view-invariant gaitID
- [Cunado 99]: evidence-gathering technique; coupled oscillators, Fourier description, inclination of thigh and leg
- [Urtasun, Fua 04]: fitting 3D temporal motion models to synchronized video sequences; motion parameters are the coefficients of the singular value decomposition of the estimated model angles
- [Bhanu, Han 02]: matching a 3D kinematic model to 2D silhouettes, then extracting a number of feature angles from the fitted model
- [Kale 03]: synthetic side view of the moving person using a single camera
- [Shakhnarovich 01]: view normalization from the volumetric intersection of the visual hulls
- [Johnson, Bobick 01]: static body parameters recovered across multiple views

Bilinear models
- From view-invariance to style invariance: motions usually possess several labels (action, identity, viewpoint, emotional state, etc.)
- Bilinear models (Tenenbaum and Freeman) can be used to separate the influence of two of those factors, called style and content (the label to classify)
- $y^{sc}$ denotes a training set of k-dimensional observations with style label s and content label c; $b^c$ is a parameter vector representing content, while $A^s$ is a style-specific linear map from the content space onto the observation space

Bilinear models (continued)
- The content of an observation can be thought of as a vector $b^c$ in an abstract content space of some dimension J
- Observations are then derived linearly from the content vector, through a map that depends on the style parameter: $y^{sc} = A^s b^c$
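For reference, the two standard Tenenbaum-Freeman forms of the model (the slide shows only the asymmetric one) can be written as:

```latex
% Symmetric model: style and content interact through a tensor W
y^{sc}_k = \sum_{i=1}^{I}\sum_{j=1}^{J} w_{ijk}\, a^s_i\, b^c_j
% Asymmetric model: the style-specific map A^s absorbs the interaction
y^{sc} = A^s b^c
```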

Learning an asymmetric bilinear model
- Given the training observations $y^{sc}$, an asymmetric bilinear model can be fitted to the data through the SVD $Y = U S V^T$ of the stacked observation matrix Y
- The asymmetric model can be written as $Y = AB$, where the least-squares optimal style and content parameters are $A = US$ truncated to the first J columns and $B = V^T$ truncated to the first J rows (a sketch follows)
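A minimal numpy sketch of this SVD fit; the stacking convention, the dimension J, and the function name are illustrative assumptions rather than the talk's exact code:

```python
import numpy as np

def fit_asymmetric_bilinear(Y_by_style, J):
    """Fit an asymmetric bilinear model Y = A B by truncated SVD.

    Y_by_style: list of S arrays, each k x C (one k-dimensional
    observation per content class, for a fixed style).
    """
    Y = np.vstack(Y_by_style)             # (S*k) x C stacked matrix
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A = U[:, :J] * s[:J]                  # stacked style maps A^s
    B = Vt[:J, :]                         # content vectors b^c as columns
    return A, B
```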

Content classification of unknown style
- Consider a training set in which persons (content = ID) are seen walking from different viewpoints (style = viewpoint)
- When new motions are acquired in which a known person is walking from a different viewpoint (unknown style), an iterative EM procedure can be set up to classify the content (identity); a rough sketch follows
- E-step: estimation of p(c|s), the probability of the content given the current estimate s of the style
- M-step: estimation of the linear map for the unknown style s
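A rough numpy sketch of such an EM loop, under an assumed isotropic Gaussian noise model; the update equations, initialization, and names are illustrative reconstructions, not the paper's exact algorithm:

```python
import numpy as np

def classify_unknown_style(y, B, n_iter=50, sigma2=1.0):
    """EM sketch: classify the content of an observation of unknown style.

    y: k-dimensional new observation; B: J x C matrix of known
    content vectors b^c (one per content class).
    """
    J, C = B.shape
    A = np.random.randn(len(y), J)        # init the unknown style map
    for _ in range(n_iter):
        # E-step: posterior over content given the current style map
        errs = ((y[:, None] - A @ B) ** 2).sum(axis=0)
        p = np.exp(-errs / (2 * sigma2))
        p /= p.sum()
        # M-step: re-estimate the style map by weighted least squares
        Yw = (y[:, None] * p) @ B.T       # k x J
        Bw = (B * p) @ B.T                # J x J
        A = Yw @ np.linalg.pinv(Bw)
    return int(np.argmax(p)), A
```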

Hidden Markov models
- Finite-state representation of an observation process: the state process {X_k} is a Markov chain
- Given a sequence of observations (feature matrix), the EM algorithm learns the parameters (Moore)
- A: transition probabilities (motion dynamics); C: means of the state-output distributions (poses)

Motions as stacked HMMs
- Interpretation of the C matrix: its columns are the means of the output distributions associated with the states of the model
- In gaitID (cyclic motions) the dynamics is the same for all sequences, so A can be neglected
- A sequence can then be represented as a collection of poses: the stacked columns of its C matrix (see the sketch below)
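A minimal sketch of this encoding using the hmmlearn library (an assumption; any HMM implementation works). The number of states is an illustrative choice, and a real pipeline would also need to order states consistently across fits:

```python
import numpy as np
from hmmlearn import hmm  # assumed available

def sequence_to_pose_vector(features, n_states=4):
    """Encode a sequence as the stacked means of an HMM's output
    distributions, i.e. the stacked columns of the C matrix.

    features: T x k array of per-frame feature vectors.
    """
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag")
    model.fit(features)
    return model.means_.ravel()           # stacked pose means
```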

Three-layer model
1. First layer (feature representation): projection of the contour of the silhouette onto a sheaf of lines passing through the center (sketched below)
2. Second layer: each sequence is encoded as a Markov model, and its C matrix is stacked into an observation vector
3. Third layer: a bilinear model is trained over those vectors
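A possible numpy sketch of the first-layer feature, assuming we keep the maximal extent of the centered contour along each line of the sheaf; both the number of lines and the use of the maximum are guesses at the talk's exact definition:

```python
import numpy as np

def contour_projection_feature(contour, n_lines=36):
    """Project silhouette contour points onto a sheaf of lines through
    the centroid and keep the maximal extent along each direction.

    contour: N x 2 array of contour point coordinates.
    """
    c = contour.mean(axis=0)                       # centroid
    pts = contour - c                              # centered contour points
    angles = np.linspace(0, np.pi, n_lines, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    proj = pts @ dirs.T                            # points x lines
    return np.abs(proj).max(axis=0)                # extent along each line
```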

The Mobo database
- 25 people performing 4 different walking actions, viewed from 6 cameras
- Each sequence has three labels: action, identity, view

Four experiments
We can then set up four experiments in which one label is chosen as content, another as style, and the remaining one is treated as a nuisance factor:

Experiment                          Content  Style   Nuisance
View-invariant action recognition   action   view    ID
ID-invariant action recognition     action   ID      view
Action-invariant gaitID             ID       action  view
View-invariant gaitID               ID       view    action

Results: ID versus view
Compared performance with the baseline algorithm and with straight k-NN on sequence HMMs.

Results: ID versus action
Performance of the bilinear classifier in the ID vs. action experiment as a function of the nuisance (view = 1:5), averaged over all possible choices of the test action. The average best-match performance of the bilinear classifier is shown in solid red (minimum and maximum in magenta); the best-3-matches ratio is in dotted red. The average performance of the KL nearest-neighbor classifier is shown in solid black (minimum and maximum in blue). Pure chance is in dashed black.

Feature extraction
- Type 1: projection of the contour of the silhouette onto a sheaf of lines passing through the center
- Type 2: size functions [Frosini 90]
- Type 3: Lee's moments

Results: influence of features
Left: ID-invariant action recognition using the bilinear classifier. The entire dataset is considered, regardless of the viewpoint. The correct classification percentage is shown as a function of the test identity, in black for models using Lee's features and in red for contour projections; the corresponding mean levels are drawn as dotted lines. Right: view-invariant action recognition.

Conclusions
- Nuisance factors are of paramount importance in gaitID, and some approaches to view-invariance are expensive and sensitive
- Bilinear and multilinear models provide a way to separate the different factors
- We proposed a three-layer model in which sequences are represented through HMMs
- Experiments on the Mobo database show how effective factor separation is for motion classification
- Future: multilinear models, testing on more realistic setups (many factors, UCF database)

Distances between dynamical models
- Problem: motion classification
- Approach: represent each movement as a linear dynamical model; for instance, each image sequence can be mapped to an ARMA or AR linear model (see the fitting sketch below)
- Classification is then reduced to finding a suitable distance function in the space of dynamical models
- We can then use this distance in any distance-based classification scheme: k-NN, SVM, etc.
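For instance, a scalar feature sequence can be mapped to an AR(2) model by ordinary least squares. A minimal sketch; the talk's identification procedure may differ:

```python
import numpy as np

def fit_ar2(x):
    """Least-squares fit of an AR(2) model
        x_t = a1 * x_{t-1} + a2 * x_{t-2} + e_t
    to a 1-D scalar feature sequence x."""
    X = np.column_stack([x[1:-1], x[:-2]])   # regressors x_{t-1}, x_{t-2}
    y = x[2:]                                 # targets x_t
    (a1, a2), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a1, a2
```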

A review of the literature
Several distances have been proposed:
- Fisher information matrix [Amari]: a family of probability distributions depending on an n-dimensional parameter can in fact be regarded as an n-dimensional manifold, with the Fisher information matrix as its metric
- Kullback-Leibler divergence
- Gap metric [Zames, El-Sakkary]: compares the graphs associated with linear systems thought of as input-output maps
- Cepstrum norm [Martin]
- Subspace angles between the column spaces of the observability matrices
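For completeness, the Fisher information matrix that equips an n-parameter family of densities $p(x;\theta)$ with a Riemannian structure is the standard

```latex
g_{ij}(\theta) \;=\; \mathbb{E}_{x\sim p(\cdot\,;\theta)}\!\left[
  \frac{\partial \log p(x;\theta)}{\partial \theta_i}\,
  \frac{\partial \log p(x;\theta)}{\partial \theta_j}\right].
```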

Learning metrics from a training set
- None of those metrics is tailored to the classification task at hand; besides, it makes no sense to choose a single distance for all possible classification problems, as labels can be assigned arbitrarily to dynamical systems, no matter what the underlying structure is
- When some a-priori information is available (a training set), we can learn the best metric for the classification problem in a supervised fashion
- A feasible approach: volume minimization of pullback metrics

Learning distances
- Many unsupervised algorithms take an input dataset and embed it in some other space, implicitly learning a metric (LLE, Laplacian eigenmaps, etc.); however, they fail to learn a full metric for the whole input space, learning only the images of a set of samples
- Optimal Mahalanobis distance [Xing, Jordan]: maximizes classification performance over linear maps y = A^{1/2} x (sketched below); finding the optimal Mahalanobis distance reduces to convex optimization
- Relevant component analysis [Shental et al.]: changes the feature space by a global linear transformation which assigns large weights to "relevant dimensions" and low weights to irrelevant dimensions
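As a reminder of what such a learned linear metric computes, a tiny illustrative helper (not Xing et al.'s code; A would come from their convex optimization):

```python
import numpy as np

def mahalanobis(x, y, A):
    """Mahalanobis-style distance d_A(x, y) = sqrt((x-y)^T A (x-y)),
    with A a learned positive semidefinite matrix."""
    d = x - y
    return float(np.sqrt(d @ A @ d))
```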

Learning pullback metrics
- Some notions of differential geometry give us a tool to build a parameterized family of metrics
- Consider then a family of diffeomorphisms F between the original space M and a metric space N
- Each diffeomorphism F induces on M a pullback metric
- The geodesics of the pullback metric are the liftings of the geodesics associated with the original metric

Pullback metrics in detail
- Diffeomorphism: a smooth invertible map F: M -> N with smooth inverse
- Push-forward map: the differential $F_*: TM \to TN$ of F
- Given a metric g on N, the pullback metric on M is $(F^*g)_m(u, v) = g_{F(m)}(F_* u,\, F_* v)$ for all tangent vectors $u, v \in T_mM$

Inverse volume maximization
- The natural criterion would be to optimize the classification performance directly; in a nonlinear setup this is hard to formulate and solve
- It is reasonable to choose a different but related objective function: the inverse volume of the pullback metric around the training data
- Effect: finding the manifold which better interpolates the data, i.e. forcing the geodesics to pass through crowded regions
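The slide's formula did not survive extraction; one plausible reconstruction, assuming the standard Riemannian volume element $\sqrt{\det g}$ and a sum of inverse volume elements over the training models $x_i$, is an objective of the form

```latex
\hat{F} \;=\; \arg\max_{F}\;\sum_{i}
  \frac{1}{\sqrt{\det\big((F^{*}g)(x_i)\big)}}.
```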

Space of AR(2) models
- Given an input sequence, we can identify the parameters of the linear model which best describes it
- We chose the class of autoregressive models of order 2, AR(2)
- Equip the AR(2) manifold with the Fisher metric; to get a distance, compute the geodesics of the pullback metric on M

Space of M(1,1,1) models
- Consider instead the class of stable discrete-time linear systems of order 1; after choosing the canonical setting c = 1, the transfer function becomes h(z) = b/(z - a)
- Under stability (|a| < 1) and minimality (b != 0) this family forms a manifold, on which the Fisher tensor can be computed in closed form

Families of diffeomorphisms
We chose two different parameterized families of diffeomorphisms: one for AR(2) systems and one for M(1,1,1) systems.

Classification of scalar models
- Task: recognition of actions and identities from image sequences; we used the Mobo database
- Scalar feature, AR(2) and M(1,1,1) models
- Compared the performance of all the known distances with the pullback Fisher metric
- Built the geodesic distance and used a nearest-neighbor algorithm to classify new sequences (see the sketch below)
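The final classification step only needs the matrix of pairwise geodesic distances between models; a minimal sketch of the nearest-neighbor rule (how the geodesic distances themselves are computed is outside this sketch):

```python
import numpy as np

def nn_classify(D_test_train, train_labels):
    """Nearest-neighbor classification from a precomputed distance matrix.

    D_test_train[i, j]: (e.g. geodesic) distance between test model i
    and training model j.
    """
    nearest = np.argmin(D_test_train, axis=1)
    return np.asarray(train_labels)[nearest]
```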

Results: action recognition
- Action recognition performance, all views considered: second-best distance function vs. pullback Fisher metric
- Action recognition, view 5 only: difference between the classification rates of the pullback metric and the second-best distance

Results: action recognition (2)
Recognition performance of the second-best distance (blue) and the optimal pullback metric (red) for increasing size of the training set, shown for views 1, 3, 5, and 6.

Effect of the training set
The size of the training set obviously affects the recognition rate. Shown for systems of the class M(1,1,1), with increasing training-set size on the abscissae: all views considered, and view 2 only.

Conclusions
- Movements can be represented as dynamical systems; motion classification then reduces to finding a distance between dynamical models
- Given a training set of such models, we can learn the best metric for a given classification problem, and use it to classify new sequences
- Pullback metrics induced by the Fisher metric structure on linear models are a possible choice, together with the design of a suitable family of diffeomorphisms
- Future: multidimensional observations, better objective functions