Presentation on theme: "Multimodal Interaction Dr. Mike Spann"— Presentation transcript:

1 Multimodal Interaction Dr. Mike Spann m.spann@bham.ac.uk http://www.eee.bham.ac.uk/spannm

2 Contents: Statistical texture models; A PCA-based appearance model; Active appearance models; Summary

3 Statistical texture models We have seen how we can represent a statistical shape model using the distribution of landmark points over a training set of shapes. A shape representation of an object is sufficient in many machine vision applications, for example hand gesture recognition. But in most applications a representation of the overall object appearance is required, for example in face recognition.

4 Statistical texture models To build a statistical appearance model, we have to characterize the variation of colour (or greylevel) across the surface of the object. We can use PCA to do this, as we did with our shape model. Normally we refer to the colour variation as ‘texture’, by which we mean the patterns of colour variation over the object surface. Whereas a shape model is just a list of 2D coordinates at a small number of landmark points, a texture model is typically a list of (RGB) colour components at thousands of pixel positions.

5 (Figure: an example image together with its shape and its texture)

6 Statistical texture models Thus, in a similar fashion to modelling shape, we can represent a texture sample as a vector containing the colour at each pixel, c = (c_1, c_2, ..., c_n)ᵀ. Each c_i is the colour at pixel i, which is in reality a 3-element vector of RGB components, but we will keep the notation simple and imagine for the moment that it represents the brightness only.
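As an illustration (not from the slides), a minimal numpy sketch of building such a texture vector by masking the pixels inside the object region and flattening them; the image and the circular mask here are hypothetical stand-ins:

```python
import numpy as np

def texture_vector(image, mask):
    """Stack the greylevels (or colours) of the pixels inside `mask`
    into a single vector c = (c_1, ..., c_n)^T.

    image : (H, W) greylevel or (H, W, 3) RGB array
    mask  : (H, W) boolean array, True inside the object region
    """
    pixels = image[mask]          # shape (n,) or (n, 3)
    return pixels.reshape(-1)     # flatten RGB triples into one long vector

# Hypothetical usage: a 100x100 greylevel image with a circular object region
img = np.random.rand(100, 100)
yy, xx = np.mgrid[0:100, 0:100]
mask = (yy - 50) ** 2 + (xx - 50) ** 2 < 40 ** 2
c = texture_vector(img, mask)
print(c.shape)                    # one brightness value per pixel in the region
```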

7 Statistical texture models We can label each texture sample c_s for each training image, s = 1..S. Typically this vector will have many thousands of components, and we will have far fewer training images than components in the vector (unlike the case for shape models). Also, two other problems arise: image normalization and image registration.

8 Statistical texture models Normalization takes account of the differing brightness and contrast levels of the training images. We need to normalize them all so that the brightness and contrast are the same. This is rather similar to aligning the shapes (so effectively we are ‘normalizing’ the pose); we can think of it as ‘aligning’ the colour profiles. We can use an iterative algorithm similar to our shape alignment algorithm.

9 Statistical texture models

10 We also have to worry about registration of our training images. Images need to be ‘warped’ to a mean shape so that the colour profiles of the training images are ‘correlated’; otherwise, taking a mean profile results in a blurred image. We take the mean shape to be that of our shape model after landmark alignment. The warping algorithm is a simple piecewise affine warp computed from the mean and original landmark points.
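A minimal sketch of this warping step using scikit-image's PiecewiseAffineTransform, assuming landmarks are given as (x, y) pixel coordinates; the landmark arrays below are hypothetical:

```python
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def warp_to_mean_shape(image, landmarks, mean_landmarks, output_shape):
    """Warp `image` so that its `landmarks` move onto `mean_landmarks`.

    landmarks, mean_landmarks : (L, 2) arrays of (x, y) point coordinates.
    The transform is estimated from mean-shape coordinates to image
    coordinates because skimage's warp() expects the output->input map.
    """
    tform = PiecewiseAffineTransform()
    tform.estimate(mean_landmarks, landmarks)
    return warp(image, tform, output_shape=output_shape)

# Hypothetical usage with random data standing in for a face image
image = np.random.rand(200, 200)
landmarks = np.array([[30.0, 40.0], [170.0, 45.0], [100.0, 160.0], [100.0, 90.0]])
mean_landmarks = np.array([[40.0, 40.0], [160.0, 40.0], [100.0, 170.0], [100.0, 100.0]])
shape_free = warp_to_mean_shape(image, landmarks, mean_landmarks, (200, 200))
```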

11 (Figure: the original landmarks are warped onto the mean landmarks via a warping matrix)

12 Statistical texture models All training images are warped so that their colour profiles are sampled over a ‘shape free patch’; thus the landmark points of all the colour profiles coincide. This enables a mean profile to be obtained whose landmarks are the mean shape landmarks of the shape model. If this warping did not take place, the mean profile would not contain any facial features; they would average out to approximately skin tone.

13 (Figure: face training database with training landmarks and mean landmarks -> warped profiles -> normalize profiles -> mean profile)

14 Statistical texture models The mean profile is defined as the average of the warped profiles over the training set, c̄ = (1/S) Σ_{s=1..S} c_s. (Figure: the mean face computed with warping and without warping.)

15 Statistical texture models We need to normalize our colour profiles to compensate for varying mean brightness and differing contrast. The mean brightness of a training sample is the average colour (greylevel) over the profile, μ_s = (1/n) Σ_i c_{s,i}, and the contrast of a training sample is expressed as the colour (greylevel) standard deviation across the profile, σ_s = sqrt((1/n) Σ_i (c_{s,i} − μ_s)²).

16 Statistical texture models Rather like the geometrical alignment algorithm, colour normalization is carried out using an affine transformation of the profile, where α controls the contrast level and β controls the mean brightness. We compute α and β so that the profile c is ‘aligned’ with a reference profile c_r, typically the mean profile computed across the whole training set. By aligned we mean that α and β are chosen to minimize the sum of squared differences between the transformed profile and c_r.

17 Statistical texture models This problem is identical to our shape alignment problem, except that now there are only two parameters to determine rather than six; the solution follows in closed form by linear least squares.
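A minimal numpy sketch of this two-parameter fit, assuming the affine transform takes the form c -> α·c + β·1 and that α and β are chosen by linear least squares (the exact parameterization on the slide is not shown, so this form is an assumption):

```python
import numpy as np

def align_profile(c, c_ref):
    """Least-squares fit of alpha (contrast) and beta (brightness) so that
    alpha * c + beta best matches the reference profile c_ref."""
    A = np.column_stack([c, np.ones_like(c)])   # design matrix [c, 1]
    (alpha, beta), *_ = np.linalg.lstsq(A, c_ref, rcond=None)
    return alpha * c + beta, alpha, beta

# Hypothetical usage: a noisy, re-scaled copy of a reference profile
rng = np.random.default_rng(0)
c_ref = rng.random(5000)
c = 1.7 * c_ref + 0.3 + 0.01 * rng.standard_normal(5000)
c_aligned, alpha, beta = align_profile(c, c_ref)
```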

18 Statistical texture models We normally try to align our profiles with the mean profile, but the mean profile is itself defined in terms of the aligned profiles. We therefore need an iterative alignment algorithm, rather like the one we used for aligning our pointsets to the mean pointset. Essentially we are trying to minimize the deviation of the profiles from the mean profile across our training images.

19 Statistical texture models We can define this deviation as the sum over the training set of the squared differences between each aligned profile and the mean profile, E = Σ_{s=1..S} |c_s' − c̄|². We would expect this to drop once we have more tightly aligned our profiles to the mean profile, reflecting the fact that all profiles have now had their brightness and contrast values normalized. A sketch of the iterative scheme is given below.
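A minimal sketch of the iterative normalization loop; its structure (align all profiles to the current mean, recompute the mean, repeat) is an assumption mirroring the iterative shape alignment:

```python
import numpy as np

def align_profile(c, c_ref):
    """Two-parameter least-squares alignment of c onto c_ref (see slide 17)."""
    A = np.column_stack([c, np.ones_like(c)])
    (alpha, beta), *_ = np.linalg.lstsq(A, c_ref, rcond=None)
    return alpha * c + beta

def align_profiles_to_mean(profiles, n_iters=10):
    """profiles : (S, n) array of warped colour profiles.
    Returns the aligned profiles, their mean and the final deviation E."""
    aligned = profiles.copy()
    mean = aligned.mean(axis=0)
    for _ in range(n_iters):
        aligned = np.stack([align_profile(c, mean) for c in aligned])
        mean = aligned.mean(axis=0)
    deviation = np.sum((aligned - mean) ** 2)   # E = sum_s |c_s' - c_mean|^2
    return aligned, mean, deviation
```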

20

21 (Figure: ‘unaligned’ profiles compared with ‘aligned’ profiles)

22 A PCA-based appearance model We can perform PCA on the set of colour profiles just as we did for the landmark pointsets. Typically the dimension of our vectors c_s is many thousands, corresponding to the number of pixels in the template defined by the mean profile (in the case of face images). We will assume that the colour profiles have been normalized (aligned) and that we have computed the mean profile of these aligned profiles.

23 A PCA-based appearance model As before, we define the covariance matrix C = (1/S) Σ_{s=1..S} (c_s − c̄)(c_s − c̄)ᵀ. The covariance matrix is of dimension n x n, where n is of the order of several thousand. To complete the PCA we have to determine the eigenvectors and eigenvalues of this large matrix, and there are some computational issues here: the large size of the matrix makes direct eigenvector computation difficult, and there are at most S non-zero eigenvalues, where typically S << n.

24 A PCA-based appearance model We can use a simple trick of matrix algebra to drastically simplify the eigenvector computation. Define the matrix D = [c_1 − c̄, ..., c_S − c̄] whose columns are the mean-subtracted profiles; since the c_i are column vectors, D has n rows and S columns. Our normal (outer) n x n covariance matrix then becomes C = (1/S) D Dᵀ.

25 A PCA-based appearance model We also define an inner covariance matrix B = (1/S) Dᵀ D. B is an S x S matrix whose (i,j)-th element is (1/S)(c_i − c̄)·(c_j − c̄). Usually S is small, so eigenvector analysis on B is straightforward. Let e_i be the i-th eigenvector of B with eigenvalue λ_i.

26 A PCA-based appearance model It is easy to show that if e_i is an eigenvector of B then D e_i is an eigenvector of C: from B e_i = λ_i e_i we get C (D e_i) = (1/S) D Dᵀ D e_i = D (B e_i) = λ_i (D e_i). So D e_i is one of the S eigenvectors of C with non-zero eigenvalue; all the other eigenvalues of C are zero.
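A minimal numpy sketch of this eigenvector trick, assuming a 1/S scaling of the covariance and unit-length normalization of the resulting eigenvectors:

```python
import numpy as np

def texture_pca(profiles):
    """PCA on (S, n) normalized colour profiles via the small S x S matrix B."""
    S = profiles.shape[0]
    c_mean = profiles.mean(axis=0)
    D = (profiles - c_mean).T                  # n x S, columns are c_s - c_mean
    B = (D.T @ D) / S                          # inner S x S covariance matrix
    lam, E = np.linalg.eigh(B)                 # eigenvalues in ascending order
    lam, E = lam[::-1], E[:, ::-1]             # sort descending
    keep = lam > 1e-10 * lam.max()             # drop the (near-)zero eigenvalues
    lam, E = lam[keep], E[:, keep]
    Phi = D @ E                                # columns D e_i: eigenvectors of C
    Phi /= np.linalg.norm(Phi, axis=0)         # rescale to unit length
    return c_mean, Phi, lam

# Hypothetical usage: 40 training profiles of 5000 pixels each
rng = np.random.default_rng(1)
profiles = rng.random((40, 5000))
c_mean, Phi, lam = texture_pca(profiles)
b = Phi.T @ (profiles[0] - c_mean)             # texture parameters of sample 0
```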

27 A PCA-based appearance model As before, define Φ to be the matrix whose columns are the eigenvectors of the covariance matrix of the colour profiles. Our appearance model is then given in terms of a set of (texture) parameters b by c = c̄ + Φ b, where c is the normalized (and warped to the shape-free patch) colour profile.

28 A PCA-based appearance model All of the concepts we introduced when we talked about shape models also apply to appearance models. In particular we can look at the different modes of variation. The variance of texture parameter b_i is the corresponding eigenvalue λ_i. We can look at the variation in facial appearance as we vary each b_i (restricting the variation of b_i so that we generate appearances which resemble those in the training set); a sketch follows.
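A minimal sketch of generating the i-th texture mode, using the common (assumed) convention of sweeping b_i over ±3 standard deviations; it reuses the texture_pca() output from the earlier sketch:

```python
import numpy as np

def texture_mode(c_mean, Phi, lam, i, k=3.0, n_steps=5):
    """Generate textures along mode i by varying b_i in [-k, k] std devs."""
    textures = []
    for t in np.linspace(-k, k, n_steps):
        b = np.zeros(Phi.shape[1])
        b[i] = t * np.sqrt(lam[i])        # b_i limited to +/- k * sqrt(lambda_i)
        textures.append(c_mean + Phi @ b) # c = c_mean + Phi b
    return np.stack(textures)

# Hypothetical usage with the texture_pca() output from the earlier sketch
# mode0 = texture_mode(c_mean, Phi, lam, i=0)
```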

29 A PCA-based appearance model (Demos: Demo\AAM Explorer\AAMExplorer.exe and an ImageJ demo)

30 A PCA-based appearance model We can represent our shape and appearance models by the parameter vectors b_s and b_c. Usually there is some correlation between the shape and appearance models; for example, in a face image model, facial gestures influence both the shape and the appearance (smiling alters the position of the landmarks and shows teeth!). We can therefore combine our shape and appearance model vectors into an overall combined model.

31 A PCA-based appearance model We can combine our parameter vectors into an overall parameter vector b = (W_s b_s, b_c)ᵀ, where b_s = Φ_sᵀ(x − x̄) and b_c = Φ_cᵀ(c − c̄) for landmark pointset x and colour profile c. W_s is a diagonal matrix of weights allowing for the difference in units between the shape and appearance models.

32 A PCA-based appearance model We can apply PCA to the set of overall parameter vectors b (which has zero mean), giving b = Φ d, where d is the vector of appearance parameters controlling both shape and colour (greylevel) and Φ is the matrix of eigenvectors of the covariance of the parameter vectors b. We write Φ in partitioned form as Φ = (Φ_ds ; Φ_dc), where the block Φ_ds generates the (weighted) shape parameters and Φ_dc generates the colour parameters.

33 A PCA-based appearance model In practice we want to be able to construct our shape and colours in terms of our appearance parameters d. We can easily do this using the partitioned form of matrix Φ: since W_s b_s = Φ_ds d and b_c = Φ_dc d, we have x = x̄ + Φ_s W_s⁻¹ Φ_ds d and c = c̄ + Φ_c Φ_dc d.
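A minimal sketch of building the combined model and recovering shape and texture from d; the scalar weight w standing in for the diagonal matrix W_s, and the helper names, are assumptions:

```python
import numpy as np

def build_combined_model(b_s, b_c, w):
    """b_s : (S, k_s) shape parameters, b_c : (S, k_c) texture parameters,
    w : scalar weight standing in for the diagonal matrix W_s.
    Returns the combined eigenvector matrix Phi, the per-sample appearance
    parameters d and the eigenvalues."""
    b = np.hstack([w * b_s, b_c])             # combined vectors (zero mean)
    cov = b.T @ b / b.shape[0]
    lam, Phi = np.linalg.eigh(cov)
    lam, Phi = lam[::-1], Phi[:, ::-1]        # descending order
    d = b @ Phi                               # appearance parameters per sample
    return Phi, d, lam

def shape_and_texture_from_d(d, Phi, k_s, w, x_mean, Phi_s, c_mean, Phi_c):
    """Recover shape x and texture c from a single appearance vector d."""
    b = Phi @ d                               # b = Phi d
    b_s, b_c = b[:k_s] / w, b[k_s:]           # undo the weighting on the shape part
    x = x_mean + Phi_s @ b_s                  # x = x_mean + Phi_s b_s
    c = c_mean + Phi_c @ b_c                  # c = c_mean + Phi_c b_c
    return x, c
```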

34 A PCA-based appearance model We can then synthesize an image for a given set of appearance parameters d by generating the shape-free colour or greylevel profile from the vector c and then warping it using the landmark points defined by x. (Figure: shape-free profile and mean landmarks -> warp matrix -> landmarked image.)

35 A PCA-based appearance model We can envisage such a system as the basis of a model-based face compression system. We store the appearance model data (Φ, Φ_s, Φ_c, W_s) and the mean landmarks and colour profile as our model; the coder then only needs to transmit the appearance parameters d together with the shape and profile alignment parameters. (Figure: block diagram of the coder.)

36 A PCA-based appearance model Typical results from an MEng 4 project (2006): around 200 bytes of data are required to compress a full RGB image, representing a compression ratio of about 4000:1!

37 Active appearance models Active appearance models (AAMs) attempt to find the set of appearance model parameters d which best describe an image in some sense: its position, shape and greylevel or colour appearance. (Figure: image -> appearance model -> d -> further processing, e.g. face recognition.)

38 Active appearance models Given a parameter vector d we can synthesize a ‘model’ image I_m and compare it with the actual image I. A difference image is computed in terms of the parameter vector d, δI(d) = I − I_m(d). The goal is to minimize |δI(d)|² by varying the appearance model parameter vector d.
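A minimal sketch of the quantity being minimized; synthesize_model_image() is a hypothetical stand-in for the synthesis step of slide 34:

```python
import numpy as np

def aam_error(d, image, synthesize_model_image):
    """Return |dI(d)|^2, the squared norm of the difference image."""
    model_image = synthesize_model_image(d)   # hypothetical synthesis function
    dI = image - model_image                  # difference image dI(d)
    return np.sum(dI ** 2)
```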

39 Active appearance models The standard approach to this problem is to compute the variation of the error e(d) = |δI(d)|² with respect to d. But the appearance parameter vector (accounting for both location/shape and colour (greylevel)) is high-dimensional, and this derivative is in general difficult, if not impossible, to compute explicitly. The approach instead is to ‘learn’ the relationship between the residual δI(d) and the parameter update δd.

40 Active appearance models We have seen how our shape x and colour (greylevel) profile c are represented in the model frame in terms of the parameter vector d: x = x̄ + Φ_s W_s⁻¹ Φ_ds d and c = c̄ + Φ_c Φ_dc d. We can rewrite this more compactly as x = x̄ + Q_s d and c = c̄ + Q_c d, where Q_s and Q_c absorb the matrix products.

41 Active appearance models We have to be clear about comparing our images in the correct frame. The ‘image frame’ is the frame in which we take our image measurements; the ‘model frame’ is the frame in which the shape and colour (greylevel) models are generated from the model parameters d. We will assume that we generate a shape in the image frame from the model frame by applying a linear transformation S involving a scaling, rotation and translation, and that the linear colour (greylevel) transformation from model to image frame is T.

42 Active appearance models (Figure: the model shape is mapped to the image shape by S, the colour/texture c' is sampled from the image, scaled via T⁻¹ and projected into the normalized model frame.)

43 Active appearance models We can write the difference between the model profile and the measured image profile (projected into the normalized shape frame) as a residual r(p), a function of the affine alignment parameters S and T and the appearance model parameter vector d, where p represents the overall parameter vector. The task is to optimize e(p) = |r(p)|² with respect to p.

44 Active appearance models We use a gradient-based approach to optimize e(p) = |r(p)|². We can do a first-order Taylor expansion of r(p): r(p + δp) ≈ r(p) + (∂r/∂p) δp. Note that the derivative term is a matrix whose (i,j)-th element is ∂r_i/∂p_j.

45 Active appearance models Such an optimization in a high-dimensional space is usually an iterative search procedure in which, on each iteration, we wish to minimize |r(p + δp)|² with respect to δp. Using the Taylor expansion this is equivalent to minimizing |r(p) + (∂r/∂p) δp|², which is an easy problem, since p (and hence r(p)) are assumed constant with respect to δp.

46 Active appearance models The result is the standard least-squares solution δp = −R r(p), where R = ((∂r/∂p)ᵀ (∂r/∂p))⁻¹ (∂r/∂p)ᵀ is the pseudo-inverse of the derivative matrix. At each iteration the derivative must, in principle, be recalculated, which is computationally expensive, so AAM search algorithms must make a simplifying assumption in order to be computationally feasible.

47 Active appearance models The simplifying assumption is that R is approximately constant and independent of p. Since the error is computed in the normalized model frame, R can be approximated from the training set using numeric differentiation: each parameter p_i is displaced by a known amount, the resulting change ∆r in the residual is measured, and the derivative is averaged across the training set. This assumption is only partially valid and accounts for the often poor performance of AAM search algorithms. A sketch of this offline training step follows.
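A minimal sketch of this offline estimation, assuming a hypothetical helper residual(p, image) that returns the normalized-frame residual r(p), and a single displacement magnitude per parameter:

```python
import numpy as np

def estimate_R(residual, p0_list, images, deltas):
    """Numerically estimate the (constant) update matrix R = pinv(dr/dp).

    residual : callable residual(p, image) -> residual vector r(p)
    p0_list  : list of known ground-truth parameter vectors, one per image
    images   : the corresponding training images
    deltas   : (n_params,) displacement magnitude used for each parameter
    """
    n_params = len(deltas)
    J_sum = None
    for p0, image in zip(p0_list, images):
        r0 = residual(p0, image)
        J = np.zeros((r0.size, n_params))
        for i in range(n_params):
            p = p0.copy()
            p[i] += deltas[i]                     # displace parameter i
            J[:, i] = (residual(p, image) - r0) / deltas[i]
        J_sum = J if J_sum is None else J_sum + J
    J_mean = J_sum / len(images)                  # average derivative over the set
    return np.linalg.pinv(J_mean)                 # R = (J^T J)^-1 J^T
```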

48 Active appearance models An iterative algorithm is then used to search for the best-fitting model in an image, adjusting the location, pose and shape/appearance model parameters. For example, it can be used to find a face in an image given the model parameters of the face. Implementations of active appearance search algorithms require a good initialization in order to work; a sketch of the search loop follows.
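A minimal sketch of the search loop using the precomputed R, again with the hypothetical residual(p, image) helper; real implementations also try several scaled step sizes and stop when the error no longer decreases:

```python
import numpy as np

def aam_search(p_init, image, residual, R, n_iters=30):
    """Iteratively refine the parameter vector p to fit the model to `image`."""
    p = p_init.copy()
    best_err = np.sum(residual(p, image) ** 2)
    for _ in range(n_iters):
        r = residual(p, image)
        dp = -R @ r                                # predicted parameter update
        err = np.sum(residual(p + dp, image) ** 2)
        if err >= best_err:                        # no improvement: stop
            break
        p, best_err = p + dp, err
    return p, best_err
```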

49 Active appearance models

50

51 Summary We have looked at how we can build statistical models describing the variation of colour (or greylevel), often known as texture, across a region of an image. We have seen how we can use PCA to describe this variation in terms of a model parameter vector, resulting in an ‘appearance’ model. We have seen how we can combine a shape model and an appearance model to produce an overall model described by a single parameter vector. Finally, we have looked at a model search algorithm to fit a shape/appearance model to an image.

