ESTIMATION OF A PHYSICAL MODEL OF THE VOCAL FOLDS VIA DYNAMIC PROGRAMMING TECHNIQUES. E. Marchetto, F. Avanzini, and C. Drioli


ESTIMATION OF A PHYSICAL MODEL OF THE VOCAL FOLDS VIA DYNAMIC PROGRAMMING TECHNIQUES
E. Marchetto (1), F. Avanzini (1), and C. Drioli (2)
(1) Dept. of Information Engineering, University of Padova, Italy
(2) Dept. of Computer Science, University of Verona, Italy
MAVEBA 2007, Firenze, Dec. 2007

Summary
- The physical model and its control
- A codebook between articulatory vectors and acoustical vectors
- The inverse problem and its codebook
- Non-univocity issue, cost function and dynamic programming
- Applications of the RBFNs by clustering
- Results with a resynthesis example
- Conclusions

The physical model
- We refer to the two-mass vocal folds model presented in [1]
- One-dimensional, quasi-stationary and incompressible flow
- Time-varying separation point
- Vocal tract modeled as an inertive load [2]

Control of the physical model
Low-level physical parameters are not independently controlled by a speaker: more physiologically motivated control spaces are needed.
In [4] a set of rules, derived from [3], was used to control a two-mass model. The rules link vocal fold geometry to the activation levels of three muscles: cricothyroid, thyroarytenoid and cricoarytenoid. We also consider the subglottal pressure.
The physical model is therefore completely controlled by a set of only four articulatory parameters: the three muscle activation levels and the subglottal pressure.
Values are normalized in [0, 1], except the subglottal pressure, in [ ] kPa.

The direct codebook
The glottal pulse is characterized by means of a set of well-known acoustic parameters:
- Fundamental frequency (F0)
- Open, Speed and Return Quotients (OQ, SQ, RQ)
- Normalized Amplitude Quotient (NAQ)
The direct codebook works as a dictionary:
- Articulatory vectors are the keys
- Acoustical vectors are the values
- Only one value for each key
[Figure: articulatory vector -> acoustical vector]

The direct codebook
The codebook is filled by a large number of numerical simulations of the two-mass model: it contains about 100k vectors.
[Figure: distributions of the acoustical parameters]
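The dictionary view of the direct codebook can be sketched in a few lines of Python. This is only an illustration: `toy_simulator` is a hypothetical stand-in for the two-mass model (its formulas are invented so that the example runs), the parameter names `a_ct`, `a_ta`, `a_ca` and `p_s` are assumptions, and the regular sampling grid is one possible choice rather than the one used by the authors.

```python
from itertools import product

def toy_simulator(articulatory):
    """Hypothetical stand-in for the two-mass model (NOT the real physics).

    Maps 4 articulatory parameters to 5 acoustical parameters
    (F0, OQ, SQ, RQ, NAQ) with made-up formulas.
    """
    a_ct, a_ta, a_ca, p_s = articulatory
    f0 = 100.0 + 150.0 * a_ct - 40.0 * a_ta + 20.0 * p_s
    oq = 0.4 + 0.3 * (1.0 - a_ca)
    sq = 1.5 + a_ta
    rq = 0.05 + 0.05 * a_ca
    naq = 0.1 + 0.1 * (1.0 - a_ca)
    return (f0, oq, sq, rq, naq)

def build_direct_codebook(grid_steps=3):
    """Run the simulator on a regular grid of articulatory vectors.

    Keys are articulatory vectors, values are acoustical vectors,
    so each key has exactly one value.
    """
    levels = [i / (grid_steps - 1) for i in range(grid_steps)]
    return {key: toy_simulator(key) for key in product(levels, repeat=4)}

direct = build_direct_codebook()
print(len(direct))  # one entry per grid point: 3**4 entries
```

With three levels per parameter the toy codebook has 81 entries; a finer grid would approach the size quoted on the slide.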

The inverse problem
Given a glottal flow, we want to estimate the articulatory vectors which, used as input to the simulator, lead to a resynthesis of the given glottal flow.
The problem is in principle non-unique.
We build an inverse codebook:
- Each acoustical vector is associated with one or more articulatory vectors
How to tackle the non-uniqueness problem during the inverse lookup process? Dynamic programming techniques.
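A minimal sketch of an inverse codebook as a multimap follows. It assumes that acoustical vectors are discretized by rounding to a fixed step per parameter; the `quantize` helper, the step sizes and the two-parameter toy data are all hypothetical and only serve to show how several articulatory vectors can end up under the same acoustical key.

```python
from collections import defaultdict

def quantize(acoustic, steps):
    """Discretize an acoustical vector onto a grid (assumed scheme)."""
    return tuple(round(x / s) for x, s in zip(acoustic, steps))

def build_inverse_codebook(direct, steps):
    """Invert the direct codebook: one acoustical cell -> list of keys."""
    inverse = defaultdict(list)
    for articulatory, acoustic in direct.items():
        inverse[quantize(acoustic, steps)].append(articulatory)
    return dict(inverse)

# Toy direct codebook: 2 articulatory parameters -> (F0, OQ).
direct = {
    (0.0, 0.5): (120.0, 0.60),
    (0.5, 0.0): (121.0, 0.61),   # nearly the same acoustics as above
    (1.0, 1.0): (200.0, 0.45),
}
steps = (5.0, 0.05)
inverse = build_inverse_codebook(direct, steps)

# The first two articulatory vectors share one acoustical cell,
# which is exactly the non-uniqueness the lookup has to resolve.
print(inverse[quantize((120.0, 0.60), steps)])
```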

Dynamic programming
Rather than work on single vectors, we subdivide the acoustical input sequence into frames.
In each frame we find the optimal sequence of articulatory vectors by minimizing a cost function with three terms:
- Acoustical distance between the input vector and its discretized companion in the codebook
- Articulatory effort: distance between each pair of consecutive articulatory vectors in the output sequence
- Accumulation term: provides a way to find the global minimum for the entire frame, but causes exponential complexity
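One plausible way to write a cost with these three terms is the recursion below. The slide does not give the formula, so every symbol here is an assumption: x_k is the k-th input acoustical vector of the frame, \hat{x}_k its discretized companion in the codebook, a_k^{(j)} the j-th articulatory candidate associated with \hat{x}_k, V_k the number of such candidates, and \alpha, \beta hypothetical weights.

```latex
% Accumulated cost of ending at candidate j of the k-th vector (assumed form)
C_1(j) = \alpha \, d_{\mathrm{ac}}(x_1, \hat{x}_1)
\qquad
C_k(j) = \underbrace{\alpha \, d_{\mathrm{ac}}(x_k, \hat{x}_k)}_{\text{acoustical distance}}
       + \min_{1 \le i \le V_{k-1}}
         \Big[ \underbrace{C_{k-1}(i)}_{\text{accumulation}}
             + \underbrace{\beta \, d_{\mathrm{art}}\big(a_{k-1}^{(i)}, a_k^{(j)}\big)}_{\text{articulatory effort}} \Big]
```

Evaluated naively over every path, the accumulation term requires \prod_k V_k evaluations for a frame of N vectors, which is the exponential growth mentioned above.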

Dynamic programming
We have N acoustical vectors in the frame, the k-th of which is associated with V_k possible articulatory vectors.
Lookup process in brief (for each frame):
- Forward: compute the cost function for each path
- Backward: minimize the cost function and choose the optimal output sequence for the frame
Dynamic programming cuts the complexity down from exponential to polynomial:
- By exploiting the optimal substructure we can store many values instead of recalculating them
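The forward and backward passes can be sketched as a Viterbi-style search. This is an illustrative implementation under stated assumptions, not the authors' code: the Euclidean distance, the `effort_weight` parameter and the function names are hypothetical, and the acoustical distance is taken to be the same for all candidates of a given input vector.

```python
import math

def euclid(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def best_sequence(candidates, acoustic_costs, effort_weight=1.0):
    """Pick one articulatory vector per acoustical vector in a frame.

    candidates[k]     : articulatory candidates for the k-th acoustical vector
    acoustic_costs[k] : acoustical distance for the k-th vector
    Returns (optimal sequence, total cost).
    """
    # Forward pass: best accumulated cost of reaching each candidate.
    cost = [acoustic_costs[0]] * len(candidates[0])
    back = []  # back[k-1][j] = index of the best predecessor of candidate j
    for k in range(1, len(candidates)):
        new_cost, pointers = [], []
        for cand in candidates[k]:
            options = [cost[i] + effort_weight * euclid(prev, cand)
                       for i, prev in enumerate(candidates[k - 1])]
            i_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(acoustic_costs[k] + options[i_best])
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)

    # Backward pass: start from the cheapest end point and follow pointers.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[k][j] for k, j in enumerate(path)], total

# Toy frame: three acoustical vectors, two articulatory candidates each.
candidates = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.1, 0.0), (0.9, 1.0)],
    [(0.2, 0.0), (1.0, 0.9)],
]
sequence, total = best_sequence(candidates, [0.0, 0.0, 0.0])
print(sequence, total)
```

Each step only compares V_(k-1) x V_k transitions, so the work per frame is the sum of those products rather than the product of all V_k that an exhaustive path search would need.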

Radial Basis Function Networks
A way to interpolate the articulatory space:
- The input vectors are rarely present in the codebook
- The output can only be the nearest approximation
We apply the RBFNs to interpolate from the acoustical space to the articulatory one [5].
We need to overcome the non-uniqueness: RBFNs are defined for functions, not for multimaps.
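As a rough illustration of interpolating from the acoustical space to the articulatory one, the sketch below uses a normalized Gaussian RBF network whose weights are simply the articulatory vectors stored at the centers. This is a simplification chosen to keep the example short: it skips the weight determination step of a full RBFN, and the function name, the `sigma` width and the toy data are assumptions.

```python
import math

def rbf_interpolate(x, centers, outputs, sigma=0.1):
    """Normalized Gaussian RBF interpolation (simplified variant).

    centers : acoustical vectors present in the codebook
    outputs : articulatory vector stored for each center (one per center,
              i.e. a single-valued function, not a multimap)
    """
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    if total == 0.0:
        # Far from every center the Gaussians underflow:
        # fall back to the nearest codebook entry.
        return outputs[sq_dists.index(min(sq_dists))]
    dim = len(outputs[0])
    return tuple(sum(w * o[k] for w, o in zip(weights, outputs)) / total
                 for k in range(dim))

# Toy data: 1-D acoustical input, 2-D articulatory output.
centers = [(0.0,), (1.0,)]
outputs = [(0.2, 0.4), (0.6, 0.8)]
print(rbf_interpolate((0.5,), centers, outputs, sigma=0.5))
```

An input halfway between the two centers yields the average of their articulatory vectors, while an input on a center reproduces (almost exactly) the stored vector.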

Clusters and subclusters
The algorithm avoids the non-uniqueness problem:
- Subdivide the acoustical space into clusters
- Associate with each cluster one or more subclusters in the articulatory space
[Figure: a cluster in the acoustical space mapped to subclusters in the articulatory space]
Subclusters are built by joining the nearest vectors:
- Find approximate hyperplanes in the articulatory space and group the nearest vectors together
- Create as many subclusters as are necessary to put every non-unique vector in a different subcluster
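The constraint that non-unique vectors must land in different subclusters can be illustrated with a simple greedy assignment. This is not the hyperplane-based construction named on the slide: it is a hypothetical nearest-centroid heuristic, and `build_subclusters` and the toy data are assumptions made only to show the constraint in action.

```python
import math

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(vectors):
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))

def build_subclusters(groups):
    """Greedy sketch for one acoustical cluster.

    groups: list of lists; each inner list holds the articulatory vectors
            that share one acoustical vector (the non-unique ones).
    Vectors of the same group are never placed in the same subcluster.
    """
    groups = sorted(groups, key=len, reverse=True)
    # As many subclusters as the most ambiguous acoustical vector needs.
    subclusters = [[v] for v in groups[0]]
    for group in groups[1:]:
        free = set(range(len(subclusters)))
        for v in group:
            j = min(sorted(free), key=lambda i: dist(v, centroid(subclusters[i])))
            subclusters[j].append(v)
            free.remove(j)  # siblings of v must go elsewhere
    return subclusters

groups = [
    [(0.1, 0.1), (0.9, 0.9)],    # two solutions for one acoustical vector
    [(0.2, 0.1)],                # a unique solution
    [(0.15, 0.2), (0.8, 0.95)],  # two solutions for another vector
]
subclusters = build_subclusters(groups)
print(subclusters)
```

Within each subcluster the mapping from acoustical to articulatory vectors is single-valued, so a separate interpolating function can be attached to each one.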

Results
We apply the described techniques to a complete resynthesis example.
The process in brief:
- Starting from a recorded utterance, we estimate the glottal flow and characterize it by means of the acoustical parameters described before
- The obtained vectors are used as input for dynamic programming and, optionally, the RBFNs
- The output articulatory vectors drive the numerical simulator, which outputs a fully synthetic flow
- Filtering the obtained flow with time-varying formants (taken from the recorded utterance) yields the resynthesized speech

Results / Articulatory vectors
About 160 vectors retrieved by dynamic programming. Notice the smoothness of the RBFNs vectors.
[Figure: articulatory vectors; legend: without RBFNs, with RBFNs]

Results / Acoustical vectors
Comparison between the reference (input) acoustical vectors and the ones obtained by a look-up in the direct codebook using the vectors of the previous slide as keys.
[Figure: acoustical vectors; legend: reference, without RBFNs, with RBFNs]

Conclusions
- We developed an effective approach to cope with the inverse problem, with reference to the glottal source
- The cost function seems to adequately model the physiological facts
- RBFNs have proved to be a good tool in this context, but some work remains to be done (weights determination and other peculiarities)
- The resynthesis is perceptually good
  - The time-varying vectors are also followed fairly well
- NAQ is usually followed with good accuracy
  - We recall the relation between NAQ and voice quality

References
[1] N. J. C. Lous, G. C. J. Hofmans, R. N. J. Veldhuis, and A. Hirschberg, "A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prosthesis design", Acta Acustica united with Acustica, vol. 84, 1998.
[2] I. R. Titze and B. H. Story, "Acoustic interactions of the voice source with the lower vocal tract", J. Acoust. Soc. Am., vol. 101(4), Apr.
[3] I. R. Titze and B. H. Story, "Rules for controlling low-dimensional vocal fold models with muscle activation", J. Acoust. Soc. Am., vol. 112(3), Sep.
[4] F. Avanzini, S. Maratea, and C. Drioli, "Physiological control of low-dimensional glottal models with applications to voice-source parameter matching", Acta Acustica united with Acustica, vol. 92, suppl. 1, Aug.
[5] T. Poggio and F. Girosi, "Networks for approximation and learning", Proceedings of the IEEE, vol. 78(9), Sep. 1990.