
1 Learning pullback action manifolds. Heriot-Watt University, 26/5/2010. Fabio Cuzzolin, Oxford Brookes Vision Group

2 Learning pullback action manifolds: the role of dynamics in action recognition; pullback metrics and distances; pullback distances between dynamical models; the case of autoregressive AR(2) models; results on gait recognition; the case of hidden Markov models; results on action recognition

3 Action recognition. Action recognition is a very natural application of vision; however, action/activity recognition is a much harder problem than it may look: extreme variability and nuisance factors (illumination, background, viewpoint, locality, time warping, temporal segmentation...). Recently, methods that "ignore" or avoid encoding dynamics have proven effective; most common are features extracted from spatio-temporal volumes [Kim and Cipolla, Laptev et al].

4 Spatio-temporal methods. Actions "in the wild": the Hollywood dataset. Spatio-temporal shapes [Gorelick et al, PAMI07].

5 Dynamical models for action recognition. Encoding action dynamics can be useful when the dynamics is discriminative, and is effective for time warping and temporal segmentation. HMMs, for instance (or LDSs [Doretto]), have long been used for "elementary action" recognition; they are less effective for activity representation → more complex models (NLDS [Chaudhry et al 2009]).

6 Identifying dynamical models. Idea: suppose that an observation sequence is generated by some sort of dynamical model. Such a model can be deterministic and linear (Linear Dynamical System), deterministic and nonlinear (NLDS), or stochastic (hidden Markov model, VLMM). Typically, the observation sequence is the one generated by feature extraction on the individual images. [Diagram: image sequence → feature sequence → identification → model parameters]

7 Classifying dynamical models. Classifying actions becomes classifying models: how? Suppose a training set of models has been acquired: if the models are stochastic, pick the label of the maximum-likelihood one; or, measure distances between them, and pick the label of the closest one. [Diagram: training set vs. test model]

8 Learning metrics. It makes no sense to choose a single distance for all possible classification problems, as labels can be assigned arbitrarily to dynamical systems, no matter what their structure is (example: identity, action, emotional state, etc.). When some a-priori information is available (a training set), we can learn in a supervised fashion the "best" metric for the classification problem!

9 The linear case. In the linear case, we can find linear maps that optimize classification. [Xing, Jordan]: maximize classification performance over linear maps y = A^{1/2}x → an optimal Mahalanobis distance. [Shental et al]: relevant component analysis changes the feature space by a global linear transformation which assigns large weights to "relevant dimensions". But dynamical models live in nonlinear spaces! The central notion is the Riemannian manifold; we will still look for (nonlinear) mappings. (A toy sketch of the Mahalanobis case follows.)
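
A minimal numerical sketch of the point above, under a made-up toy setup (the PSD matrix A and the data are ours, not from the slides): once A has been learned, the Mahalanobis distance it induces coincides with the Euclidean distance after the linear map y = A^{1/2}x.

```python
import numpy as np

def mahalanobis(x1, x2, A):
    """Distance induced by a learned positive semi-definite matrix A."""
    d = x1 - x2
    return np.sqrt(d @ A @ d)

def linear_map(X, A):
    """Map y = A^(1/2) x, under which the learned metric becomes Euclidean."""
    w, V = np.linalg.eigh(A)  # A = V diag(w) V^T, with w >= 0 for PSD A
    A_sqrt = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    return X @ A_sqrt.T

# toy check: Mahalanobis distance equals Euclidean distance after mapping
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3)); A = B @ B.T  # a random PSD matrix
x1, x2 = rng.normal(size=3), rng.normal(size=3)
y1, y2 = linear_map(np.vstack([x1, x2]), A)
assert np.isclose(mahalanobis(x1, x2, A), np.linalg.norm(y1 - y2))
```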

10 Learning pullback action manifolds: the role of dynamics in action recognition; pullback metrics and distances; pullback distances between dynamical models; the case of autoregressive AR(2) models; results on gait recognition; the case of hidden Markov models; results on action recognition

11 Learning pullback metrics. Consider a Riemannian manifold M (endowed with a metric), and a family of automorphisms (differentiable, invertible maps) F between the original space M and itself. Any automorphism F induces on M a "pullback" metric.

12 Pullback metrics - detail. An automorphism of M is a differentiable, invertible map F: M → M. Its push-forward (differential) dF_x: T_xM → T_{F(x)}M maps tangent vectors at x to tangent vectors at F(x). Given a metric g on M, i.e. an inner product g_x(u, v) on each tangent space, the pullback metric is (F*g)_x(u, v) = g_{F(x)}(dF_x u, dF_x v).

13 Families of pullback metrics. On a Riemannian manifold, distances are measured along geodesics (shortest paths); for pullback metrics, geodesics are "liftings" of the original ones, so we can easily compute distances using pullbacks. Now, if we design a family of automorphisms depending on one parameter lambda, we get an entire parameterized family of pullback metrics; we can then optimize over it to find the "optimal" one! Of course, we need to decide what to optimize (a toy sketch follows).
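
A hedged toy sketch of the mechanism, not the paper's construction: F below is a made-up one-parameter automorphism of R^2, and the pullback of the Euclidean metric is evaluated through a numerical Jacobian; on an actual model manifold both ingredients are replaced by the constructions of the later slides.

```python
import numpy as np

def F(x, lam):
    """Hypothetical automorphism of R^2 (invertible for lam != 0)."""
    return np.array([lam * x[0], x[1] + (1 - lam) * x[0]])

def jacobian(x, lam, eps=1e-6):
    """Numerical Jacobian dF_x, i.e. the push-forward of tangent vectors."""
    n = len(x)
    J = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        J[:, i] = (F(x + e, lam) - F(x - e, lam)) / (2 * eps)
    return J

def pullback_metric(x, lam):
    """(F*g)_x(u, v) = <dF_x u, dF_x v>, i.e. the matrix J^T J for g = I."""
    J = jacobian(x, lam)
    return J.T @ J

# squared length of a tangent vector u under the pullback metric at x
x, u = np.array([1.0, 2.0]), np.array([0.5, -0.5])
G = pullback_metric(x, lam=2.0)
print(u @ G @ u)
```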

14 Learning pullback action manifolds: the role of dynamics in action recognition; pullback metrics and distances; pullback distances between dynamical models; the case of autoregressive AR(2) models; results on gait recognition; the case of hidden Markov models; results on action recognition

15 Manifolds of dynamical models. Dynamical models can form manifolds! The Fisher information matrix [Amari] defines a metric on a family of probability distributions. For ARMA or ARX systems, metrics/norms have been defined (gap metric [Zames], cepstrum norm [Martin], subspace angles [De Cock]), but they are all task specific; for some other classes (e.g. HMMs), only pseudo-distances are available (Kullback-Leibler divergence).

16 Framework: pullback metrics/distances between models. A general framework for learning pullback metrics/distances between dynamical models of a given class.

17 Procedure. Step 1: use each image sequence to identify a dynamical model. Step 2: understand the manifold structure of the space of models. Step 3: design an appropriate parameterized family of automorphisms. Step 4: optimize an appropriate objective function. (A skeleton of steps 2-4 follows.)
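
A minimal skeleton of steps 2-4, where all the names (F, base_distance, objective) are hypothetical callables supplied by the user. It relies on the fact, stated on slide 13, that pullback geodesics are liftings of the original ones: the pullback distance between two models is then the base distance between their images under the optimal automorphism.

```python
def learn_pullback_distance(models, labels, F, base_distance,
                            candidate_params, objective):
    """F(model, lam): parameterized automorphism family (step 3);
    base_distance: distance on the base manifold of models (step 2);
    objective: scores a candidate lam, e.g. cross-validated
    classification rate or inverse volume (step 4)."""
    best = max(candidate_params,
               key=lambda lam: objective(models, labels, F, base_distance, lam))
    # F at the optimal lam is an isometry from the pullback metric onto the
    # base metric, so pullback distances reduce to distances between images:
    return lambda m1, m2: base_distance(F(m1, best), F(m2, best))
```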

18 What do we optimize, exactly? The natural criterion is to optimize classification performance, which is possible if the training set of models is labeled; in a nonlinear setup this is hard to formulate and solve analytically → use cross-validation. Alternatively, a purely geometric objective function, the inverse volume around the dataset D, finds the manifold which better interpolates the data, as geodesics have to pass through "crowded" regions; this is sensible when the training set is unlabeled (sketched below).
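
A sketch of the geometric objective under one reading of the slide (the exact functional in the paper may differ): sum, over the training models, the inverse of the Riemannian volume element of the pullback metric. Here G(x, lam) is assumed to return the pullback metric tensor at x, e.g. J^T J as in the earlier sketch.

```python
import numpy as np

def inverse_volume(dataset, G, lam):
    """Inverse-volume objective around the dataset: large where the volume
    element sqrt(det G) is small, i.e. in "crowded" regions of the data."""
    return sum(1.0 / np.sqrt(np.linalg.det(G(x, lam))) for x in dataset)
```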

19 Learning pullback action manifolds: the role of dynamics in action recognition; pullback metrics and distances; pullback distances between dynamical models; the case of autoregressive AR(2) models; results on gait recognition; the case of hidden Markov models; results on action recognition

20 Space of scalar AR(2,1) models. For each input sequence we identify the parameters of an autoregressive model of order 2, AR(2) (sketched below); the Fisher metric on AR(2) serves as the base metric; we then compute the geodesics of the pullback metric on M.
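
A sketch of the identification step via ordinary least squares, a standard way to fit a scalar AR(2) model y_t = a1*y_{t-1} + a2*y_{t-2} + e_t; the slides do not specify the estimator, so this choice is an assumption.

```python
import numpy as np

def identify_ar2(y):
    """Least-squares fit of the AR(2) coefficients [a1, a2] to a 1-D series."""
    Y = y[2:]                                  # targets y_t
    X = np.column_stack([y[1:-1], y[:-2]])     # regressors y_{t-1}, y_{t-2}
    a, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return a

# toy usage on a synthetic AR(2) sequence
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t-1] - 0.3 * y[t-2] + 0.1 * rng.normal()
print(identify_ar2(y))   # approximately [0.5, -0.3]
```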

21 An automorphism for AR(2,1). It multiplies each simplicial coordinate by a normalized factor lambda, stretching the triangle towards the vertex with the largest lambda (sketched below).
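
The automorphism above, as a sketch: barycentric (simplicial) coordinates are multiplied componentwise by positive factors and renormalized, which pulls points towards the vertex with the largest factor.

```python
import numpy as np

def simplex_automorphism(b, lam):
    """b: barycentric coordinates (sum to 1); lam: positive weights."""
    w = np.asarray(b) * np.asarray(lam)
    return w / w.sum()

print(simplex_automorphism([1/3, 1/3, 1/3], [4.0, 1.0, 1.0]))
# -> [0.667, 0.167, 0.167]: the barycentre is pulled towards vertex 1
```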

22 Multidimensional AR(2,p) case. The case of AR models with p output channels (p-dimensional observations) is necessary to cope with realistic feature vectors. To simplify, we can assume the channels are independent: the manifold AR(2,p) then becomes simply the product of the individual AR(2,1) triangles.

23 Product and global automorphisms. We can easily design two automorphisms: the product automorphism simply applies a scalar automorphism to each channel separately; the global automorphism multiplies each coordinate of the system in the simplex AR(2,p) by a normalized positive value. (Both are sketched below.)
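
One possible reading of the two constructions, as a sketch; the slide leaves the exact normalization of the global map unspecified, so the version below, which applies one shared weight vector to every channel, is a guess.

```python
import numpy as np

def simplex_automorphism(b, lam):        # as in the scalar sketch above
    w = np.asarray(b) * np.asarray(lam)
    return w / w.sum()

def product_automorphism(B, lams):
    """Independent scalar automorphism per channel; B, lams: (p, 3) arrays."""
    return np.vstack([simplex_automorphism(b, lam) for b, lam in zip(B, lams)])

def global_automorphism(B, lam):
    """Assumed reading: rescale every channel by the same positive weight
    vector lam (shape (3,)), renormalizing each channel row."""
    W = np.asarray(B) * np.asarray(lam)
    return W / W.sum(axis=1, keepdims=True)
```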

24 Learning pullback action manifolds: the role of dynamics in action recognition; pullback metrics and distances; pullback distances between dynamical models; the case of autoregressive AR(2) models; results on gait recognition; the case of hidden Markov models; results on action recognition

25 Experiments on ID recognition on the Mobo database. Action and ID recognition experiments on the Mobo database; silhouette-based features (which come with the dataset); nearest-neighbor (NN) classification of videos identified as AR(2) models (sketched below).
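
A sketch of the classification step: plain 1-NN under whichever distance is being compared (base or pullback); the function names are ours.

```python
import numpy as np

def nn_classify(test_model, train_models, train_labels, distance):
    """Assign the label of the closest training model under `distance`."""
    d = [distance(test_model, m) for m in train_models]
    return train_labels[int(np.argmin(d))]
```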

26 What we did. Identity recognition of 25 people from 6 different views (hard!); pullback metrics based on the two different automorphisms, product and global; for the product automorphism, we were able to analytically compute the parameters which optimize the inverse volume; in both cases, we also maximized classification performance by cross-validation; we compared performance with a few classical applicable a-priori distances: Fisher, Frobenius distance between HMMs...

27 Correct ID versus viewpoint. Performance of the competing metrics computed for image sequences coming from a single view (1 to 6), 15 people. [Plot legend: Fisher geodesic, Frobenius HMM, optimal pullback 1, optimal pullback 2, inverse-volume optimum]

28 Dependence on the number of IDs. The performance degrades (but not by much!) when the problem becomes more challenging (more people to distinguish). Tests on all viewpoints, separately.

29 Influence of parameters. Left: performance plotted vs. the size of the training set. Right: performance plotted vs. the number of automorphism parameter samples.

30 Learning pullback action manifolds: the role of dynamics in action recognition; pullback metrics and distances; pullback distances between dynamical models; the case of autoregressive AR(2) models; results on gait recognition; the case of hidden Markov models; results on action recognition

31 Hidden Markov models. HMMs have been extensively used for both action and identity recognition; they encode dynamics and allow temporal segmentation and time warping; they are too simple for complex activities, but good for elementary actions; a finite-state Markovian approximation of the motion trajectory.

32 Example of HMM

33 Some distances between HMMs. HMMs do not enjoy a proper manifold structure (at least, this has not been proved yet!), and no Fisher metric is analytically known. However, several distances or pseudo-distances have been proposed: the Kullback-Leibler divergence (of the output probability distributions generated by two models); the combined Frobenius norm ||A - A'||_F + ||C - C'||_F (sketched below); the modified Bhattacharyya distance.
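
The combined Frobenius pseudo-distance quoted above, as a sketch; note that it compares parameters entry-wise, so it implicitly assumes a consistent ordering of the hidden states of the two models.

```python
import numpy as np

def frobenius_hmm_distance(hmm1, hmm2):
    """hmm1, hmm2: (A, C) pairs of transition matrix and matrix of
    state-output means; returns ||A - A'||_F + ||C - C'||_F."""
    A1, C1 = hmm1
    A2, C2 = hmm2
    return (np.linalg.norm(A1 - A2, 'fro') +
            np.linalg.norm(C1 - C2, 'fro'))
```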

34 The space of HMMs. The parameters of an HMM are: the transition matrix A; the matrix C collecting the means of all the state-output Gaussian distributions. Hence, having fixed the number of states N and the dimensionality of the observations D, the space of all HMMs is the product M = H_A × H_C.

35 The transition matrix manifold. Transition matrices are "stochastic" matrices: all their columns are (conditional) probability distributions, hence they sum to 1. The "transition manifold" is therefore the product of N such simplices (a quick membership check is sketched below).
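
A tiny sketch of the membership condition, using the column convention of the slide (each column sums to 1):

```python
import numpy as np

def is_transition_matrix(A, tol=1e-8):
    """True iff every entry is (numerically) non-negative and every
    column lies on the probability simplex."""
    A = np.asarray(A)
    return bool(np.all(A >= -tol) and np.allclose(A.sum(axis=0), 1.0))
```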

36 The observation manifold. The observation manifold is in principle R^d, but when a training set is available we can try and approximate it. One option: embed all training observation (feature) vectors, and approximate the embedded cloud by a mixture of Gaussians (sketched below).
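
A sketch of that option using scikit-learn's GaussianMixture; the library and the number of components K are our choices, not specified in the slides.

```python
from sklearn.mixture import GaussianMixture

def fit_observation_manifold(features, K=5, seed=0):
    """features: (n_samples, d) array of all training feature vectors.
    Returns a mixture of K Gaussians approximating the embedded cloud."""
    return GaussianMixture(n_components=K, random_state=seed).fit(features)
```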

37 An automorphism of H. As H is a product space, we need to design an automorphism for both components, H_A and H_C. A simple automorphism of the transition space H_A is a generalization of the scalar one seen in the AR(2,p) case. For H_C, we can design a map of each column of C as a point of the (approximate) observation space, using as coordinates of each column its density values with respect to the mixture of Gaussians (see the sketch after the next slide).

38 An automorphism of H_C
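
One reading of the H_C construction of slide 37, as a sketch: each column c of C receives as coordinates its density values under the K components of the fitted mixture; the normalization to barycentric coordinates is our addition, so that a simplex automorphism like the one of the AR case can act on them.

```python
import numpy as np
from scipy.stats import multivariate_normal

def density_coordinates(c, gmm):
    """c: one column of C (a state-output mean); gmm: a fitted
    sklearn GaussianMixture. Returns normalized per-component densities."""
    dens = np.array([
        multivariate_normal.pdf(c, mean=gmm.means_[k],
                                cov=gmm.covariances_[k])
        for k in range(gmm.n_components)])
    return dens / dens.sum()   # barycentric coordinates (our assumption)
```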

39 Learning pullback action manifolds: the role of dynamics in action recognition; pullback metrics and distances; pullback distances between dynamical models; the case of autoregressive AR(2) models; results on gait recognition; the case of hidden Markov models; results on action recognition

40 Action recognition experiments. Weizmann dataset: 9 different actions performed by 16 subjects, quite static. KTH: 6 different actions performed by 25 subjects; four different scenarios are considered. Features: "action snippets" [Schindler and Van Gool], computed inside a rectangular bounding box; two independent pipelines associated with shape (Gabor filters) and motion (optical flow) features. HMM identification: observations = 20 such features from the first pipeline; HMM parameters identified using the EM algorithm, N = 3 states (sketched below). Optimal pullback metric induced by automorphisms of H; optimization of the classification rate by cross-validation.
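
A sketch of the identification step using hmmlearn; the library is our choice, the slides only state EM with N = 3 states and 20-dimensional observations.

```python
from hmmlearn import hmm

def identify_hmm(features, n_states=3, seed=0):
    """features: (n_frames, 20) array of snippet features for one video."""
    model = hmm.GaussianHMM(n_components=n_states, covariance_type='diag',
                            n_iter=50, random_state=seed)
    model.fit(features)   # EM (Baum-Welch)
    # note: hmmlearn's transmat_ is row-stochastic; transpose it if the
    # column convention of slide 35 is wanted
    return model.transmat_, model.means_
```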

41 Weizmann sample videos

42 KTH sample videos. A few sample videos from the KTH dataset: boxing, handwaving, walking...

43 Snippets – some details. The two pipelines of "action snippets" feature extraction.

44 Performance – KTH dataset. Improvement in action recognition rates on the KTH dataset associated with the pullback Frobenius (left) and pullback modified Bhattacharyya (right) distances. Subsets of the original datasets were chosen for speed/challenge.

45 Performance - Weizmann. Effect of pullback classification on action recognition; random subsets of the Weizmann dataset with 27 training and test sequences.

46 Effect of sampling. The effect of the density of sampling in the automorphism's parameter space (Weizmann dataset). 15 runs: the first with 110 parameter samples, the second with 588, the last with 1705.

47 Effect of training set size. Effect of the size of the training set used by the pullback learning procedure on the performance improvement. Number of training sequences: 9, 27, 45, 90.

48 Conclusions. Representing actions as dynamical systems is useful; classification can be based on distances between systems; given a training set, we can learn the "best" such metric via the formalism of pullback metrics, by designing a suitable family of diffeomorphisms. Done for AR(2) models and HMMs, fit for actions. Open directions: extension to classes of models more suitable for complex activities (VLMM, ...?); analytical optimization for the classification rate?

