
1 Designing Facial Animation For Speaking Persian Language Hadi Rahimzadeh 81271003 rahimzad@ce.sharif.edu June 2005

2 System Description. Input: speech signal. Output: facial animation of a generic 3D face in the MPEG-4 standard, played back together with the speech stream.

3 Agenda: MPEG-4 standard, speech processing, different approaches, learning phase, face feature extraction, training neural networks, experimental results, conclusion.

4 MPEG-4 Standard. Multimedia communication standard (1999, Moving Picture Experts Group). High quality at low bit rates; interaction of users with media; object oriented, with per-object properties and scalable quality. SNHC (Synthetic/Natural Hybrid Coding) covers synthetic faces and bodies.

5 Facial Animation in MPEG-4. FDP (Face Definition Parameters): shape, defined by 84 feature points, and texture. FAP (Face Animation Parameters): animate the feature points; 68 parameters; high level and low level; global and local parameters; expressed in FAP units.

6 Face Definition Parameters

7 Face Animation Parameter Units
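
To make the FAP-unit idea concrete, here is a minimal sketch of converting a dimensionless FAP amplitude into a model-space displacement. The /1024 FAPU convention is from the MPEG-4 specification, but the face measurements and the fap_to_displacement helper are illustrative assumptions, not the author's code.

```python
# Minimal sketch: scaling a FAP amplitude into a displacement in model
# units via a FAPU (Face Animation Parameter Unit). MPEG-4 defines FAPUs
# as fractions of neutral-face distances, e.g. MNS0/1024 for the
# mouth-nose separation and MW0/1024 for the mouth width.

# Neutral-face measurements (hypothetical example values, model units).
MNS0 = 20.0   # mouth-nose separation
MW0 = 50.0    # mouth width

FAPU = {
    "MNS": MNS0 / 1024.0,  # unit for vertical mouth-region FAPs
    "MW": MW0 / 1024.0,    # unit for horizontal mouth-region FAPs
}

def fap_to_displacement(fap_value: float, unit: str) -> float:
    """Scale a dimensionless FAP amplitude into model units."""
    return fap_value * FAPU[unit]

# A hypothetical vertical lip FAP amplitude of 512 moves the feature
# point by half a mouth-nose separation.
print(fap_to_displacement(512.0, "MNS"))
```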

8 Speech Processing. Phases: noise reduction (simple noise), framing, feature extraction. Speech features: LPC, MFCC, Delta MFCC, Delta-Delta MFCC. (Figure: frame 1, frame 2, ... map to feature vectors X1, X2, ...)
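
As a concrete illustration of the framing and feature-extraction phases, here is a minimal sketch using the librosa library (an assumption; the original 2005 work predates it), with an assumed input file name and frame settings.

```python
import librosa
import numpy as np

# Load speech and frame it (assumed settings: 16 kHz audio, hop chosen
# to yield 50 feature vectors per second, matching the slides).
y, sr = librosa.load("speech.wav", sr=16000)  # hypothetical input file
hop = sr // 50

# MFCCs and their first/second temporal derivatives, as on the slide.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
d_mfcc = librosa.feature.delta(mfcc)            # Delta MFCC
dd_mfcc = librosa.feature.delta(mfcc, order=2)  # Delta-Delta MFCC

# One feature vector X_k per speech frame.
X = np.vstack([mfcc, d_mfcc, dd_mfcc]).T
print(X.shape)  # (num_frames, 39)
```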

9 Two Approaches. Phoneme-viseme mapping: transitions among visemes; discrete phonetic units; extremely stylized; language dependent. Acoustic-visual mapping: learns the relation between speech features and facial expressions; a function approximation problem; language independent; neural networks and HMMs serve as the learning machines for the mapping.

10 Learning Phase. From the speaker video, the speech stream goes through feature extraction and the image sequence goes through FAP extraction; the paired data train the neural network, and the resulting FAPs drive a FAP player.

11 Face Feature Extraction. Deformable-template-based approach, semi-automatic. Candide model: a parameterized wireframe model designed for model-based coding, with 113 vertices and 168 faces.

12 Candide Model. Parameters of the wireframe model (WFM): global (3D rotation, 2D translation, scale), shape units (lip width, eye distance, ...), and action units (lip shape, eyebrow, ...). Each parameter value is a real number; a texture is mapped onto the wireframe. A sketch of the parameterization follows below.
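
A minimal sketch of this parameterization, following the standard Candide formulation g = s R (g0 + S sigma + A alpha) + t; the vertex count matches the model, but the random bases below are placeholders for the real shape/action-unit data shipped with Candide.

```python
import numpy as np

def candide_vertices(g_bar, S, A, sigma, alpha, R, scale, t):
    """Standard Candide parameterization:
    g = scale * R @ (g_bar + S @ sigma + A @ alpha) + t

    g_bar : (113, 3) neutral vertex positions
    S     : (113, 3, n_shape) shape-unit basis (lip width, eye distance, ...)
    A     : (113, 3, n_action) action-unit basis (lip shape, eyebrow, ...)
    sigma, alpha : real-valued shape / action parameters
    R : (3, 3) global rotation; scale : float; t : (3,) translation
    """
    deformed = g_bar + S @ sigma + A @ alpha  # per-vertex deformation
    return scale * deformed @ R.T + t         # global transform

# Smoke test with random placeholder bases (not the real model data).
rng = np.random.default_rng(0)
g = candide_vertices(rng.normal(size=(113, 3)),
                     rng.normal(size=(113, 3, 4)),
                     rng.normal(size=(113, 3, 6)),
                     rng.normal(size=4), rng.normal(size=6),
                     np.eye(3), 1.0, np.zeros(3))
print(g.shape)  # (113, 3)
```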

13 New Face Generation

14 Transformation. (Figure: a point P inside the source triangle with vertices (a1, b1), (a2, b2), (a3, b3) is mapped to the point P* inside the target triangle with vertices (x1, y1), (x2, y2), (x3, y3).) Correspondences: (a1, b1) → (x1, y1), (a2, b2) → (x2, y2), (a3, b3) → (x3, y3).
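
The three vertex correspondences determine a unique 2D affine transform (six unknowns, six equations). Here is a minimal sketch of solving for it and warping a point; the sample triangle coordinates are made up.

```python
import numpy as np

def affine_from_triangle(src, dst):
    """Solve for the 2D affine map  p* = M @ p + t  that sends the three
    source triangle vertices `src` to the target vertices `dst`.
    src, dst : (3, 2) arrays. Six linear equations, six unknowns."""
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for i, ((ax, ay), (x, y)) in enumerate(zip(src, dst)):
        A[2 * i] = [ax, ay, 1, 0, 0, 0]
        A[2 * i + 1] = [0, 0, 0, ax, ay, 1]
        b[2 * i], b[2 * i + 1] = x, y
    m = np.linalg.solve(A, b).reshape(2, 3)
    return m[:, :2], m[:, 2]  # M, t

# Example (made-up triangles): map P in the source to P* in the target.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[2.0, 1.0], [4.0, 1.5], [2.5, 3.0]])
M, t = affine_from_triangle(src, dst)
P = np.array([0.2, 0.3])
print(M @ P + t)  # P*, the warped point
```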

15 Transformation (cont.)

16 New Face Generation

17 Model Adaptation: selecting optimal parameters. Global parameters: 3D rotation, 2D translation, scale. Lip parameters: upper lip, jaw open, lip width, vertical movement of the lip corners. A full search over the parameters is expensive, so the previous frame's information is used as the starting point (see the sketch below).
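
The slides do not spell out the search procedure, so the following is only one plausible reading: a greedy coordinate search initialized from the previous frame's parameters, with a placeholder cost function standing in for the image-vs-rendered-model mismatch.

```python
import numpy as np

def local_search(cost, params0, step=0.05, iters=20):
    """Greedy coordinate search started from the previous frame's
    parameters (params0), avoiding an expensive full grid search.
    `cost` scores how badly the rendered model matches the frame."""
    p = np.asarray(params0, dtype=float).copy()
    best = cost(p)
    for _ in range(iters):
        improved = False
        for i in range(len(p)):
            for d in (+step, -step):
                q = p.copy()
                q[i] += d
                c = cost(q)
                if c < best:
                    p, best, improved = q, c, True
        if not improved:
            step /= 2  # refine around the current optimum
    return p

# Placeholder quadratic cost; a real cost would compare pixel data.
target = np.array([0.3, -0.1, 0.8, 0.0])
prev_frame_params = np.zeros(4)
print(local_search(lambda p: np.sum((p - target) ** 2), prev_frame_params))
```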

18 Lip Reading. Color data is used to segment the lip area, and the extracted lip area is used to estimate the lip model parameters (upper lip, jaw open, mouth width, lip corners) through the related vertices of the Candide model. Two pixel regions are labeled in the first frame: lip and non-lip.

19 Lip Area Classification. Fisher Linear Discriminant (FLD): simple and fast. Given two point sets X and Y in n dimensions, let m1 and m2 be the means of the projections of X and Y onto a unit vector α, and s1², s2² the corresponding scatters. Find the α that maximizes the Fisher criterion J(α) = (m1 - m2)² / (s1² + s2²).

20 Estimating Lip Parameters. The FLD is trained on the first frame's pixels, using their color data; HSV works better than RGB, being more robust under different brightness conditions. A sketch of the training step follows below.
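
A minimal sketch of training the discriminant: the closed-form FLD direction α ∝ Sw⁻¹(μ1 - μ2) applied to HSV pixel data. The lip / non-lip sample arrays are random placeholders, and matplotlib's rgb_to_hsv stands in for whatever color conversion the original used.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def fit_fld(X, Y):
    """Fisher Linear Discriminant: the direction maximizing
    J(alpha) = (m1 - m2)^2 / (s1^2 + s2^2) is alpha ∝ Sw^-1 (mu1 - mu2)."""
    mu1, mu2 = X.mean(axis=0), Y.mean(axis=0)
    Sw = np.cov(X, rowvar=False) * (len(X) - 1) \
       + np.cov(Y, rowvar=False) * (len(Y) - 1)  # within-class scatter
    alpha = np.linalg.solve(Sw, mu1 - mu2)
    return alpha / np.linalg.norm(alpha)

# Placeholder training data: lip / non-lip pixels from the first frame,
# converted from RGB in [0, 1] to HSV for robustness to brightness.
rng = np.random.default_rng(1)
lip_rgb = rng.uniform(size=(200, 3)) * [1.0, 0.4, 0.4]     # reddish
nonlip_rgb = rng.uniform(size=(200, 3)) * [0.9, 0.8, 0.7]  # skin-ish
alpha = fit_fld(rgb_to_hsv(lip_rgb), rgb_to_hsv(nonlip_rgb))

# Classify a pixel by thresholding its projection onto alpha,
# using the midpoint of the two class means' projections.
threshold = 0.5 * (rgb_to_hsv(lip_rgb) @ alpha).mean() \
          + 0.5 * (rgb_to_hsv(nonlip_rgb) @ alpha).mean()
pixel = rgb_to_hsv(np.array([[0.8, 0.2, 0.2]]))[0]
print("lip" if pixel @ alpha > threshold else "non-lip")
```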

21 Lip Area Classification. A simple scanning approach estimates the lip parameters from the classified lip area: column scanning and row scanning (see the sketch below).
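
A minimal sketch of what row/column scanning over a binary lip mask could look like, assuming mouth width comes from the extreme lip columns, mouth opening from the extreme lip rows, and the corners from the outermost columns; the mask here is synthetic.

```python
import numpy as np

def scan_lip_mask(mask):
    """Row/column scanning of a binary lip mask.
    Columns containing lip pixels bound the mouth width; rows containing
    lip pixels bound the mouth opening; the outermost lip columns give
    the lip corner positions."""
    cols = np.flatnonzero(mask.any(axis=0))  # column scan
    rows = np.flatnonzero(mask.any(axis=1))  # row scan
    width = cols[-1] - cols[0] + 1
    height = rows[-1] - rows[0] + 1
    left_corner = (np.flatnonzero(mask[:, cols[0]]).mean(), cols[0])
    right_corner = (np.flatnonzero(mask[:, cols[-1]]).mean(), cols[-1])
    return width, height, (left_corner, right_corner)

# Synthetic elliptical "lip" mask for demonstration.
yy, xx = np.mgrid[0:60, 0:100]
mask = ((xx - 50) / 30) ** 2 + ((yy - 30) / 10) ** 2 <= 1.0
print(scan_lip_mask(mask))
```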

22 Generating FAPs from the Model. A FAP file is generated from the fitted model; the FAP file format was worked out by a trial-and-error approach. Open-source FAP players take the FAP file and the wave file as input.
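
A minimal sketch of writing such a file. The layout below (a header line with version, name, frame rate, and frame count, then per frame a 68-bit mask line and a line with the frame index plus values for the active FAPs) is an assumption based on common open-source FAP players of that era, not a spec quote; if a player expects a different layout, only the two formatting lines change.

```python
# Assumed ASCII FAP file layout; see the caveat in the text above.
NUM_FAPS = 68

def write_fap(path, name, fps, frames):
    """frames: list of dicts mapping FAP index (1-based) -> value."""
    with open(path, "w") as f:
        f.write(f"2.1 {name} {fps} {len(frames)}\n")
        for k, fap_values in enumerate(frames):
            # Mask line: which of the 68 FAPs are active this frame.
            mask = ["1" if i + 1 in fap_values else "0"
                    for i in range(NUM_FAPS)]
            f.write(" ".join(mask) + "\n")
            # Value line: frame index, then the active FAP amplitudes.
            values = [str(fap_values[i]) for i in sorted(fap_values)]
            f.write(" ".join([str(k)] + values) + "\n")

# Two example frames animating FAP 3 (open_jaw in MPEG-4) and one lip
# FAP (the second index here is illustrative).
write_fap("output.fap", "demo", 25,
          [{3: 200, 51: -80}, {3: 350, 51: -120}])
```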

23 Training Neural Networks. Data set: 60 videos; 45 sentences for training, 15 for testing. Multilayer perceptrons with one input layer, one hidden layer, and one output layer, trained with the back-propagation algorithm. Nine neurons in the output layer: five global parameters and four lip parameters.

24 Training Neural Networks (cont.). Four speech features: LPC, MFCC, Delta MFCC, Delta-Delta MFCC, with six networks per feature: one feature vector as input (30, 60, or 90 neurons in the hidden layer) or three feature vectors as input (90, 120, or 150 neurons in the hidden layer). Frame rates: video 25 fps, speech 50 fps, so two speech frames correspond to each video frame. A training sketch follows below.
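
A minimal sketch of one such network, using scikit-learn's MLPRegressor as a stand-in for the original back-propagation implementation; the data arrays are random placeholders with the slides' dimensions (39-dim MFCC+delta input, 9 outputs, 90 hidden neurons), and the frame counts are invented.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Placeholder data with the slides' dimensions: one 39-dim feature
# vector (MFCC + Delta + Delta-Delta) per speech frame, 9 outputs per
# frame (5 global + 4 lip parameters). Real data comes from the videos;
# the 4500/1500 frame split here is an invented 45-vs-15-sentence proxy.
X_train, y_train = rng.normal(size=(4500, 39)), rng.normal(size=(4500, 9))
X_test, y_test = rng.normal(size=(1500, 39)), rng.normal(size=(1500, 9))

# One hidden layer with 90 neurons, trained by back-propagation (SGD).
net = MLPRegressor(hidden_layer_sizes=(90,), activation="logistic",
                   solver="sgd", learning_rate_init=0.01, max_iter=500)
net.fit(X_train, y_train)
y_pred = net.predict(X_test)  # predicted animation parameters per frame
print(y_pred.shape)
```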

25 Generating Results From NNs Generating four lip parameters for each frame

26 Assessment Criterion. A performance metric measures the prediction accuracy of the audio-visual mapping: the correlation coefficient between the actual parameter trajectory x and the predicted trajectory y over the test set, G = Σ_k (x_k - x̄)(y_k - ȳ) / √(Σ_k (x_k - x̄)² Σ_k (y_k - ȳ)²), where k is the frame number and the sums run over the N frames of the test set; G is one if the two vectors are equal. A minimal implementation follows below.
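
A minimal implementation of this criterion; the trajectories below are synthetic placeholders for one predicted lip parameter.

```python
import numpy as np

def correlation_coefficient(x, y):
    """Correlation G between the actual trajectory x and the predicted
    trajectory y over the N test frames; G == 1 when the two vectors
    are equal (and for any positive affine rescaling of one of them)."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Placeholder trajectories for one lip parameter over N frames.
N = 200
t = np.linspace(0, 4 * np.pi, N)
actual = np.sin(t)
predicted = np.sin(t) + 0.2 * np.random.default_rng(3).normal(size=N)
print(correlation_coefficient(actual, predicted))  # close to 1
print(correlation_coefficient(actual, actual))     # exactly 1
```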

27 Results For LPC Networks

28 Results For MFCC Networks

29 Results For Delta MFCC Networks

30 Results For Delta Delta MFCC Networks

31 Conclusion. Speech-driven facial animation is possible! Delta-Delta MFCC gives the best performance, and using the previous and next speech frames as additional input improves the performance further; combining different speech features is another promising option.

32 Future Work. More training data; speaker-independent training data; support for multiple languages; other speech features and combinations of speech features; facial emotions; HMMs for storing the mappings.

33 Thanks…

