Multimodal Interaction Dr. Mike Spann

Multimodal Interaction Dr. Mike Spann m.spann@bham.ac.uk http://www.eee.bham.ac.uk/spannm

Contents: Introduction; Lip feature extraction and tracking; Summary

Lip feature extraction and tracking
Lip feature tracking is an important step in combining audio and visual cues for speech recognition systems. Typically the lip boundaries (inner, outer or both) are tracked and shape features are passed to the speech recognition module.
Previous approaches: the active contour model (snakes). Energy function minimisation is used to control the contour shape (curvature) and its attraction to the local greylevel (colour) gradient. The result can be dependent on weighting parameters which need to be tuned.

Lip feature extraction and tracking
Typically an energy function E is defined in terms of the parameterised snake v(s) = (x(s), y(s)), where s is the distance along the snake. The first two terms of E represent the snake's internal energy and control its tension and rigidity. The third term attracts the snake to object boundaries with a high greylevel gradient. Often an additional term is added for a 'balloon' snake, to either inflate or deflate the snake.
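A commonly used form of the snake energy that is consistent with the three terms described above is sketched here. The weights α, β, γ and the image symbol I are notation assumed for illustration; they do not appear in the slides.

```latex
E = \int \Big( \alpha \, |v'(s)|^2 + \beta \, |v''(s)|^2 - \gamma \, |\nabla I(v(s))|^2 \Big) \, ds
```

Here α weights the tension term, β the rigidity term, and γ the attraction to high-gradient boundaries. These are the weighting parameters that need tuning.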

Lip feature extraction and tracking

More recent approaches to lip localisation and tracking have been model-based. A statistical shape model of the inner and outer lip contours can be built from training data. Landmarks on the contours form pointsets. We need to align the pointsets and then build a statistical model using principal component analysis (PCA).

Lip feature extraction and tracking
Pointsets of lip feature landmarks must be normalized for translation, scale and rotation. We can use a simple iterative algorithm to align them to the mean pointset.
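The slides do not spell out the alignment algorithm. The following is a minimal sketch of one common choice (a generalised Procrustes-style iteration), assuming each pointset is an (n, 2) NumPy array. The function names `align_to`, `align_pointsets` and `similarity`, and the toy lip-like shape, are hypothetical.

```python
import numpy as np

def align_to(shape, ref):
    """Least-squares similarity alignment of one (n, 2) pointset to a reference."""
    a = shape - shape.mean(axis=0)          # remove translation
    b = ref - ref.mean(axis=0)
    za = a[:, 0] + 1j * a[:, 1]             # treat 2-D points as complex numbers
    zb = b[:, 0] + 1j * b[:, 1]
    w = np.vdot(za, zb) / np.vdot(za, za)   # one complex factor = scale and rotation
    zr = w * za
    return np.column_stack([zr.real, zr.imag])

def align_pointsets(shapes, n_iter=10):
    """Iteratively align all pointsets to their re-normalised mean."""
    mean = shapes[0] - shapes[0].mean(axis=0)
    mean = mean / np.linalg.norm(mean)      # fix the scale of the reference
    for _ in range(n_iter):
        shapes = [align_to(s, mean) for s in shapes]
        mean = np.mean(shapes, axis=0)
        mean = mean / np.linalg.norm(mean)
    return shapes, mean

def similarity(shape, s, theta, t):
    """Apply scale s, rotation theta and translation t to an (n, 2) pointset."""
    c, si = np.cos(theta), np.sin(theta)
    R = np.array([[c, -si], [si, c]])
    return s * shape @ R.T + np.asarray(t)

# Toy data: the same four-landmark shape seen at three different poses.
base = np.array([[0.0, 0.0], [2.0, 0.5], [4.0, 0.0], [2.0, -1.0]])
shapes = [base,
          similarity(base, 2.0, 0.3, (5.0, 1.0)),
          similarity(base, 0.5, -0.7, (-3.0, 2.0))]
aligned, mean_shape = align_pointsets(shapes)
```

Because the toy pointsets differ only in pose, they coincide after alignment. With real training data the residual differences are what PCA then models.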

Lip feature extraction and tracking
PCA is based on the mean and covariance of the pointset vectors computed across the training set. We then compute our shape model by solving the eigenvector/eigenvalue equation of the covariance matrix, in which Λ is a diagonal matrix of eigenvalues.
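In the usual PCA formulation these quantities can be written as follows. The symbols N (number of training pointsets), S (covariance matrix) and P (matrix of eigenvectors) are assumed notation, and the 1/(N-1) normalisation is one common convention rather than something stated in the slides.

```latex
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad
S = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T

S P = P \Lambda, \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots)
```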

Lip feature extraction and tracking
We can represent each landmark pointset x by a corresponding shape vector b. The set of b_i's across all of the pointsets in the database represents the i-th mode of variation of the original data. We can vary each b_i to get realistic versions of lip shapes. Typically the range of b_i is set by the i-th eigenvalue λ_i, which is the variance of b_i over the training set.
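A minimal sketch of building and using such a shape model is given below. It assumes the aligned pointsets are stacked as rows of a matrix X, and it clamps each b_i to ±3√λ_i, a limit that is common in the active shape model literature but is an assumption here. All function names and the synthetic one-mode data are hypothetical.

```python
import numpy as np

def build_shape_model(X, n_modes):
    """X: (N, 2n) matrix with one aligned pointset vector per row."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)               # (2n, 2n), normalised by N-1
    evals, evecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:n_modes]   # keep the largest modes
    return mean, evecs[:, order], evals[order]

def shape_params(x, mean, P):
    """Shape vector b for a pointset vector x."""
    return P.T @ (x - mean)

def reconstruct(b, mean, P):
    """Pointset vector generated by the shape vector b."""
    return mean + P @ b

def clamp_params(b, evals, k=3.0):
    """Limit each b_i to +/- k*sqrt(lambda_i) (k = 3 is an assumed choice)."""
    lim = k * np.sqrt(np.maximum(evals, 0.0))
    return np.clip(b, -lim, lim)

# Synthetic training set: a mean shape plus a single mode of variation.
rng = np.random.default_rng(0)
m = np.array([-2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, -1.0])
d = np.array([0.5, 0.0, 0.0, 1.0, -0.5, 0.0, 0.0, -1.0])
d /= np.linalg.norm(d)
a = rng.normal(scale=2.0, size=50)
X = m + np.outer(a, d)

mean, P, evals = build_shape_model(X, n_modes=1)
b = shape_params(X[0], mean, P)
x_rec = reconstruct(b, mean, P)
b_clamped = clamp_params(np.array([1e6]), evals)
```

Since the synthetic data varies along a single direction, one mode reconstructs every training shape exactly. Real lip data would need more modes, and the reconstruction would be approximate.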

An active shape model samples greylevels along profiles perpendicular to the lip contour and centred at the model points.

Lip feature extraction and tracking
We sample the profiles perpendicular to the contour at each model point j. Training image i then gives us a vector of greylevels g_ij. We concatenate all these greylevel vectors to give us a global profile vector h_i. We build a statistical model out of these profile vectors to enable the main modes of variation of the profiles about the model boundaries to be computed.
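The profile sampling can be sketched as follows, assuming a closed contour given as an (n, 2) array of (x, y) landmarks and a 2-D greylevel image. Nearest-pixel sampling is used for brevity (a real tracker would more likely interpolate). The names `contour_normals` and `sample_profiles` and the synthetic disk image are hypothetical.

```python
import numpy as np

def contour_normals(points):
    """Unit normals of a closed contour given as an (n, 2) array of (x, y)."""
    tangents = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    normals = np.column_stack([tangents[:, 1], -tangents[:, 0]])
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def sample_profiles(image, points, half_len=3):
    """Concatenate the greylevel profiles of all landmarks into one vector h."""
    normals = contour_normals(points)
    offsets = np.arange(-half_len, half_len + 1)
    # Sample positions along each normal, shape (n, 2*half_len+1, 2).
    pos = points[:, None, :] + offsets[None, :, None] * normals[:, None, :]
    cols = np.clip(np.rint(pos[..., 0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(pos[..., 1]).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols].astype(float).ravel()

# Synthetic test image: a bright disk (greylevel 200) on a dark background (50).
yy, xx = np.mgrid[0:64, 0:64]
image = np.where((xx - 32) ** 2 + (yy - 32) ** 2 <= 15 ** 2, 200, 50)

# Twelve landmarks placed on the disk boundary.
phi = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
points = np.column_stack([32 + 15 * np.cos(phi), 32 + 15 * np.sin(phi)])

h = sample_profiles(image, points, half_len=3)
profiles = h.reshape(len(points), -1)   # one row per landmark
```

Each row of `profiles` crosses the disk edge, running from bright samples inside the contour to dark samples outside it. This edge-crossing pattern is what the profile model learns for a real lip boundary.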

Lip feature extraction and tracking
The profile weight vector b_h can be used as a parameter in a cost function to determine how well the actual profile fits the model.
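By analogy with the shape model, the profile model and its weight vector can be written as below. The mean profile \bar{h} and the matrix P_h of leading profile eigenvectors are assumed notation, not symbols taken from the slides.

```latex
h \approx \bar{h} + P_h b_h, \qquad b_h = P_h^T (h - \bar{h})
```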

Lip feature extraction and tracking
The greylevels between profile vectors can be interpolated to visualise the greylevel models. Some smoothing using a median filter helps remove any artefacts of the interpolation. We can visualise several modes corresponding to the first few eigenvectors. The corresponding components of the weight vector b_h are varied in proportion to √λ_i; for example, we can set b_hi to ±2√λ_i for i = 1, 2, 3.

Lip feature extraction and tracking
Mode 1: global illumination differences
Mode 2: lower/upper lip intensity difference
Mode 3: skin/lip contrast differences
Higher modes: illumination variations, visibility of teeth and tongue etc.

Lip feature extraction and tracking
In order to apply an ASM search algorithm, a coarse estimate of the region of interest containing the lips is first found. This can be input interactively or computed automatically using a segmentation or edge-based feature extraction algorithm. It provides an estimate of the scale of the lips and limits the search area.

In order to use the greylevel and shape models in a search algorithm, we can use the greylevel model to best fit the model greylevel profile to the current greylevel profile. Shape and pose parameters can then be updated. We need a cost function which describes the fit between the model greylevel profile and the profile measured in the image at the current model position. Several statistical approaches are possible: maximizing the probability assuming Gaussian distributions, or minimizing the mean square error between the profiles.

[Figure: the current model position and the sampled profile h]

Lip feature extraction and tracking
We can define an error function E describing the mismatch between the actual profile h, measured at the current position estimate, and our model profile h_m. Substituting the profile model for h_m expresses E in terms of the weight vector b_h. Typically h_m would comprise only the first few modes of variation.
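A minimal sketch of one such error measure follows, assuming E is the squared residual (h - h_m)^T (h - h_m) with h_m built from a mean profile and a few orthonormal profile modes. The names `profile_error`, `h_mean` and `P_h`, and the random toy values, are assumptions for illustration.

```python
import numpy as np

def profile_error(h, h_mean, P_h):
    """Squared residual between a sampled profile h and its best model fit."""
    d = h - h_mean
    b_h = P_h.T @ d              # profile weights
    h_m = h_mean + P_h @ b_h     # model profile from the retained modes
    r = h - h_m
    return float(r @ r), b_h

rng = np.random.default_rng(1)
P_h, _ = np.linalg.qr(rng.normal(size=(10, 3)))   # 3 orthonormal modes (toy values)
h_mean = rng.normal(size=10)
h = rng.normal(size=10)

E, b_h = profile_error(h, h_mean, P_h)

# With orthonormal modes the substitution collapses to a cheaper expression.
E_short = float((h - h_mean) @ (h - h_mean) - b_h @ b_h)

# A profile generated by the model itself has (numerically) zero error.
E_on_model, _ = profile_error(h_mean + P_h @ np.array([1.0, -2.0, 0.5]), h_mean, P_h)
```

The short form shows why b_h is a convenient parameter of the cost function: once b_h is known, E follows without rebuilding h_m.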

Lip feature extraction and tracking
The model is initialized with the mean shape computed over the aligned shapes in the training set. Our goal is to minimize our energy function E in terms of the translation components t_x and t_y, a scale parameter s and a rotation angle θ, along with the profile parameter vector.

Lip feature extraction and tracking
Optimization is carried out by perturbing individual parameters and evaluating their effects on the energy function E. Only a few shape modes (typically 10-20) are used in the search to ease the computational burden. Perturbations in b_i are limited to a range set by the corresponding eigenvalue. For a given position of the model landmarks, the profile h is sampled and b_h is computed by projecting h onto the profile model.
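The slides do not name the optimizer. The sketch below uses a simple coordinate-wise perturbation search chosen purely for illustration: each parameter is nudged up and down, a change is kept only if it lowers the cost, and the step size is shrunk when a full sweep brings no improvement. The function `perturb_search`, the symmetric `limits` argument and the quadratic toy cost are all hypothetical stand-ins for the real energy E.

```python
import numpy as np

def perturb_search(cost, params, steps, limits=None, n_sweeps=50, shrink=0.5):
    """Minimise cost by perturbing one parameter at a time."""
    params = np.asarray(params, dtype=float).copy()
    steps = np.asarray(steps, dtype=float).copy()
    best = cost(params)
    for _ in range(n_sweeps):
        improved = False
        for i in range(len(params)):
            for delta in (steps[i], -steps[i]):
                trial = params.copy()
                trial[i] += delta
                if limits is not None:
                    trial = np.clip(trial, -limits, limits)   # keep within bounds
                e = cost(trial)
                if e < best:
                    params, best, improved = trial, e, True
                    break
        if not improved:
            steps *= shrink   # refine the perturbation size
    return params, best

# Toy cost with a known minimum, standing in for the energy function E.
target = np.array([1.0, -2.0, 0.5])
cost = lambda p: float(np.sum((p - target) ** 2))

p_free, e_free = perturb_search(cost, np.zeros(3), np.ones(3))
p_lim, e_lim = perturb_search(cost, np.zeros(3), np.ones(3),
                              limits=np.array([0.5, 3.0, 3.0]))
```

In the second call the first parameter is held at its limit of 0.5, mirroring how the shape parameters are kept within a plausible range during the search.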

Lip feature extraction and tracking
We can devise an iterative algorithm to update the pose and shape parameters sequentially based on our error measure. The algorithm alternates between 'model space' and 'image space'. The object boundary in model space is defined by the shape parameters. We can use the greylevel or colour profile information to measure the error in image space. Conversion between the two spaces is done via the pose parameters.

Lip feature extraction and tracking
[Figure: model space (shape parameters b) and image space (profile parameters b_h)]

Lip feature extraction and tracking
1. Initialize the shape parameters b to zero and the image points y.
2. Generate the model point positions x from the mean shape and the shape parameters b.
3. Find the pose parameters t_x, t_y, s, θ which best fit the model points to the image points y.
4. Project the model points into the image frame, x -> T(x), and compute the image profile vector h.
5. At each projected model point, search normal to the model boundary and find the image points y' which minimize E, producing a new image profile vector h'.

Lip feature extraction and tracking
6. Project the image points y' into the model coordinate frame by inverting the transformation T.
7. Update the model parameters.
8. If not converged, set y -> y' and go to step 2.
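The model-space half of the loop above (generating model points, fitting the pose, projecting back and updating the shape parameters) can be sketched as follows. The image search for y' is deliberately left out: the image points y are taken as given, so this illustrates only the alternation between the two spaces. The function names, the complex-number pose representation, the ±3√λ clamp and the four-landmark toy model are all assumptions.

```python
import numpy as np

def to_complex(p):
    return p[:, 0] + 1j * p[:, 1]

def fit_pose(x, y):
    """Least-squares similarity transform mapping model points x to image points y.
    Returns (w, t): w encodes scale and rotation, t the translation."""
    zx, zy = to_complex(x), to_complex(y)
    cx, cy = zx.mean(), zy.mean()
    w = np.vdot(zx - cx, zy - cy) / np.vdot(zx - cx, zx - cx)
    return w, cy - w * cx

def apply_pose(x, w, t):
    """Model frame -> image frame, x -> T(x)."""
    z = w * to_complex(x) + t
    return np.column_stack([z.real, z.imag])

def invert_pose(y, w, t):
    """Image frame -> model frame (inverse of T)."""
    z = (to_complex(y) - t) / w
    return np.column_stack([z.real, z.imag])

def fit_shape(y, mean, P, evals, n_iter=20):
    b = np.zeros(P.shape[1])                        # initialise shape parameters
    lim = 3.0 * np.sqrt(evals)                      # assumed limit on each b_i
    for _ in range(n_iter):
        x = (mean + P @ b).reshape(-1, 2)           # generate model points
        w, t = fit_pose(x, y)                       # fit pose to image points
        y_model = invert_pose(y, w, t)              # project into model frame
        b = np.clip(P.T @ (y_model.ravel() - mean), -lim, lim)   # update b
    return b, w, t

# Toy model: four landmarks and one "mouth opening" mode of variation.
mean = np.array([-2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, -1.0])
d = np.array([0.5, 0.0, 0.0, 1.0, -0.5, 0.0, 0.0, -1.0])
P = (d / np.linalg.norm(d))[:, None]
evals = np.array([1.0])

# Synthetic image points: shape b = 0.5 seen at scale 2, rotation 0.3 rad.
w_true, t_true = 2.0 * np.exp(0.3j), 5.0 + 1.0j
y = apply_pose((mean + P @ np.array([0.5])).reshape(-1, 2), w_true, t_true)

b, w, t = fit_shape(y, mean, P, evals)
y_fit = apply_pose((mean + P @ b).reshape(-1, 2), w, t)
```

In a full search, each pass would replace y with the points y' found by the profile search before repeating, as in the last step of the list.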

Lip feature extraction and tracking
[Figure: the image boundary, a model point and the nearest image point to that model point]

Lip feature extraction and tracking
It is easier to track the outer lips than the inner ones: the greylevel profile is more constant and is easier to model, for example with active shape modelling. But the outer contour is less appropriate for lip gesture recognition and speech recognition algorithms. Often, using a full appearance model rather than just a shape model gives better speech recognition performance. For example, the appearance of the teeth and tongue gives clues to particular types of vocal sounds.

Lip feature extraction and tracking
Results of off-centre initialization of the ASM using local greylevel profiles, after 5, 10 and 20 iterations

Lip feature extraction and tracking Results using ASM search with local greylevel profiles

Demo http://www.ee.surrey.ac.uk/Projects/M2VTS/experiments/lip_tracking/index.html

Summary
We have looked at how a shape model, together with a model describing greylevel or colour variation local to the shape model landmark positions, can be used for finding the lip contour location in face images. We have described an iterative model-based search algorithm for lip contour location. We have shown lip tracking results based on this algorithm.
