SecurePhone Workshop - 24/25 June 2004
Speaking Faces Verification
Kevin McTait, Raphaël Blouet, Gérard Chollet, Silvia Colón, Guido Aversano


Outline
- Speaking faces verification problem
- State of the art in speaking faces verification
- Choice of system architecture
- Fusion of audio and visual modalities
- Initial results using the BANCA database (Becars: voice-only system)

Problem definition
- Detection and tracking of lips in the video sequence:
  - Locate the head/face in the image frame
  - Locate the mouth/lips area (region of interest)
  - Determine the lip contour coordinates and intensity parameters (visual feature extraction)
  - Other parameters: visible teeth, tongue, jaw movement, eyebrows, cheeks, etc.
- Modelling parameters:
  - Model the deformation of the lip (or other) parameters over time: HMMs, GMMs...
  - Fusion of visual and acoustic parameters/models
- Calculate the likelihood of the model relative to the client/world model in order to accept/reject
- Augment the in-house speaker verification system (Becars) with visual parameters

Limitations
- Limited device (storage and CPU processing power)
- Subject variability (ageing, beard, glasses...), pose, illumination
- Low-complexity algorithms required:
  - Subspace transforms, learning methods
  - Image-based approaches: hue colouration/chromaticity cues
  - Model-based approaches

Active Shape Models
- Identification based on spatio-temporal analysis of the video sequence
- The person is represented by a deformable parametric model of the visible speech articulators (usually the lips) together with their temporal characteristics
- An Active Shape Model consists of shape parameters (lip contours) and greyscale/colour intensity parameters (to handle illumination)
- The model is trained on a training set using PCA to recover the principal modes of deformation
- The model is used to track the lips over time; model parameters are recovered from the lip-tracking results
- Shape and intensity are modelled by GMMs; temporal dependencies (state transition probabilities) by HMMs
- Verification: using the Viterbi algorithm, if the estimated likelihood that the client model generated the observed feature sequence is above a threshold, accept; otherwise reject
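The Viterbi accept/reject step above can be sketched as follows. This is an illustrative log-domain implementation, not the actual Becars code; for simplicity the client and world models are assumed to share transition parameters, and the decision is a log-likelihood ratio against a threshold.

```python
def viterbi_loglik(obs_loglik, log_trans, log_init):
    """Log-likelihood of the best state path through an HMM.

    obs_loglik[t][j] = log b_j(o_t), log_trans[i][j] = log a_ij.
    Working in the log domain avoids numerical underflow on long sequences.
    """
    n = len(log_init)
    delta = [log_init[j] + obs_loglik[0][j] for j in range(n)]
    for t in range(1, len(obs_loglik)):
        delta = [max(delta[i] + log_trans[i][j] for i in range(n)) + obs_loglik[t][j]
                 for j in range(n)]
    return max(delta)


def verify(obs_loglik_client, obs_loglik_world, log_trans, log_init, threshold=0.0):
    """Accept the claimed identity if the client/world log-likelihood ratio
    exceeds the decision threshold (shared transitions: an illustrative
    simplification only)."""
    ratio = (viterbi_loglik(obs_loglik_client, log_trans, log_init)
             - viterbi_loglik(obs_loglik_world, log_trans, log_init))
    return ratio > threshold
```

In practice each model would have its own transition matrix, and the threshold is tuned on development data to trade off false acceptances against false rejections.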

Active Shape Models (2)
- Robust detection, tracking and parameterisation of visual features
- Statistical: avoids the use of ad-hoc constraints, thresholds and penalties
- The model is only allowed to deform to shapes similar to those seen in the training set (trained using PCA)
- The object is represented by a set of labelled points describing contours, height, width, area, etc.
- The model consists of 5 quadratic Bézier curves (B-spline functions), each defined by two end points P0 and P2 and one control point P1: P(t) = θ0(t)P0 + θ1(t)P1 + θ2(t)P2
[Figure: point distribution model; shape approximation]
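Each of the five curve segments can be evaluated directly from this definition. A minimal sketch follows; the basis functions θ0..θ2 are taken to be the standard degree-2 Bernstein polynomials, which is an assumption since the slide does not spell them out.

```python
def bezier_point(p0, p1, p2, t):
    """Evaluate P(t) = theta0(t)*P0 + theta1(t)*P1 + theta2(t)*P2, 0 <= t <= 1."""
    theta0 = (1 - t) ** 2      # weight of end point P0
    theta1 = 2 * t * (1 - t)   # weight of control point P1
    theta2 = t ** 2            # weight of end point P2
    x = theta0 * p0[0] + theta1 * p1[0] + theta2 * p2[0]
    y = theta0 * p0[1] + theta1 * p1[1] + theta2 * p2[1]
    return (x, y)
```

At t = 0 and t = 1 the curve passes through the end points, while the control point P1 pulls the curve towards itself in between, which is what lets the five segments approximate the lip contour.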

Spatio-temporal model
- Visual observation of the speaker: O = o1, o2, ..., oT
- Assumption: feature vectors follow a normal distribution, as in the acoustic domain, and are modelled by GMMs
- Assumption: temporal changes are piecewise stationary and follow a first-order Markov process
- Each state in the HMM represents several consecutive feature vectors
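Under these two assumptions, the per-state emission density is a Gaussian mixture. A minimal sketch of the GMM log-likelihood of one feature vector, assuming diagonal covariances for illustration:

```python
import math

def diag_gauss_logpdf(x, mean, var):
    # log N(x; mean, diag(var)) for one feature vector x
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    # log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp for stability
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))
```

These per-frame scores are exactly the emission terms the HMM combines with its state transition probabilities over the sequence o1..oT.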

ASM: Training

ASM: Tracking

ASM: Lip Tracking Examples

Image-Based Approach
- Hue and saturation levels used to find the lip region (ROI)
- Outliers (other red blobs) eliminated by constraints (geometric, gradient, saturation)
- Motion constraints: difference image, i.e. pixelwise absolute difference between two adjacent frames
  - a) greyscale image
  - b) hue image
  - c) binary image after hue/saturation thresholding
  - d) accumulated difference image
  - e) binary image after thresholding
  - f) combined binary image: c AND e
- Find the largest connected region
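The combination of the colour mask (step c) with the motion mask (step e) can be sketched as follows. The threshold values and the single-frame (rather than accumulated) difference are illustrative assumptions, not the actual system's parameters.

```python
import numpy as np

def lip_candidate_mask(hue, sat, prev_frame, cur_frame,
                       hue_max=0.08, sat_min=0.3, motion_min=10):
    """Combine the colour mask (step c) with the motion mask (step e).

    hue and sat hold per-pixel values in [0, 1]; the frames are greyscale.
    All thresholds here are illustrative placeholders.
    """
    colour = (hue <= hue_max) & (sat >= sat_min)  # reddish, saturated pixels (step c)
    motion = np.abs(cur_frame.astype(int) - prev_frame.astype(int)) > motion_min  # step e
    return colour & motion  # step f: c AND e
```

The largest connected region of the resulting mask then gives the lip ROI candidate.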

Image-Based Approach (2)
- Derive lip dimensions using colour and edge information
- Markov random field framework to combine the two sources of information and segment the lips from the background
- Implementation close to completion

Other Approaches
- Deformable template/model/contour based:
  - Geometric shapes, shape models, eigenvectors, appearance models; deform so as to minimise an energy/distance function relating the template parameters to the image; template matching (correlation), best-fit template; active shape models, active appearance models; a model-fitting problem
- Learning-based approaches: MLPs, SVMs...
- Knowledge-based approaches: subject rules or prior information to find and extract features, e.g. eye/nose detection, symmetry
- Visual motion analysis:
  - Motion analysis techniques, motion cues, difference images after thresholding and filtering
  - Optical flow, filter tracking (computationally expensive)
- Hue and saturation thresholding:
  - Intensity of ruddy areas; problem of removing outliers
- Image subspace transforms: DCT, PCA, discrete wavelet transform, KLT (DWT + PCA analysis of the ROI), FFT

Fusion of audio-visual information
- An instance of the general classifier problem (bimodal classifier)
- Two observation streams, audio and video, each providing information about the hidden class labels
- Typically each observation stream is used to train a single-modality classifier
- Aim: combine both streams to produce a bimodal classifier that recognises the pertinent classes with higher accuracy
- Two general types/levels of fusion:
  - Feature fusion
  - Decision fusion

Feature Fusion
- A single HMM classifier operates on a concatenated feature vector of audio and visual parameters; the features must be time synchronous, possibly requiring upsampling of the video stream
- The generative process of the feature vector is modelled by a single-stream HMM with emission (class-conditional observation) probabilities given by Gaussian mixture distributions
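The time-synchronous concatenation above can be sketched as follows. Upsampling by frame repetition and an integer audio/video rate ratio (e.g. 100 fps vs 25 fps) are assumptions for illustration; interpolation is another common choice.

```python
import numpy as np

def fuse_features(audio_feats, video_feats):
    """Concatenate audio and video feature vectors frame by frame.

    The slower video stream is upsampled to the audio frame rate by
    repeating each video frame; assumes an integer rate ratio.
    """
    factor = len(audio_feats) // len(video_feats)
    video_up = np.repeat(video_feats, factor, axis=0)[:len(audio_feats)]
    return np.hstack([audio_feats, video_up])
```

The fused vectors are then fed to a single-stream HMM exactly as if they came from one modality.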

Decision Fusion
- State-synchronous decision fusion:
  - Captures the reliability of each stream
  - Operates at the HMM state level
  - Combines the outputs of the single-modality HMM classifiers
  - Class-conditional log-likelihoods from the two classifiers are linearly combined with appropriate weights
  - Fusion can operate at various levels: state, phone, syllable, word...
  - Multi-stream HMM classifier: state emission probabilities combined across streams
  - Product HMMs, factorial HMMs...
- Other classifiers: SVMs, Bayesian classifiers, MLPs...
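The linear combination of per-stream log-likelihoods can be sketched in one line; the weight value here is an illustrative placeholder, since in practice the stream weights are tuned to reflect the estimated reliability of each modality.

```python
def fuse_scores(loglik_audio, loglik_video, w_audio=0.7):
    """Linearly combine per-stream class-conditional log-likelihoods.

    w_audio is an illustrative weight; the two weights sum to one and
    encode the relative reliability of the audio and video streams.
    """
    return w_audio * loglik_audio + (1.0 - w_audio) * loglik_video
```

The same combination can be applied at the state, phone, syllable or word level, depending on where the two classifiers are synchronised.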

BANCA: results