# Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 1 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Basics of HMM-based.

## Presentation on theme: "Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 1 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Basics of HMM-based."— Presentation transcript:

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 1 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Basics of HMM-based speech synthesis Contents Speech Parameter Generation Using Dynamic Features Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis Multi-Space Probability Distribution HMM Simultaneous Modelling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 2 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation using Dynamic Features (1) 1. Problem vector sequence of speech parameter state sequence of an HMM λ vector of speech parameter at time t c t … static feature vector (e.g. cepstral coefficient) Δc t … dynamic feature vector (e.g. delta cepstral coefficient)  Determine parameter sequence c that maximizes Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 3 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation using Dynamic Features (2) 2. Solution to the Problem For a given q, maximizing,with respect to c is equivalent to maximizing, with respect to c, since does not depend on O. … single mixture Gaussian distribution By setting to maximize we obtain a set of equations with Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 4 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation using Dynamic Features (3) 2.Solution to the Problem … mean vector of c t … covariance matrix of c t … mean vector of Δc t … covariance matrix of Δ c t  To obtain optimal q and c the set of equations has to be solved for every possible state sequence Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 5 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation using Dynamic Features (3) Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration 2.Solution to the Problem

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 6 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation using Dynamic Features (4) Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [1], [2] [2] 3. Example

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 7 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation Algorithms (1) Previously: state sequence was assumed to be known Now: (part of) state sequence is hidden Goal: determine speech parameter vector sequence so that is maximized with respect to O, where is the state and mixture sequence, i.e. (q, i) indicates the i-th mixture of state q  … speech parameter vector Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 8 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation Algorithms (2) 1.Derived Algorithm  Based on EM-Algorithm  Find critical point of the likelihood function P(O| λ) … auxiliary function of current and new parameter vector sequence O and O’, respectively Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration … occupancy probability

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 9 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation Algorithms (3) Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration 1.Derived Algorithm  Under condition O’=WC’, C’ which maximizes Q(O,O’) is given by  Solved with recursive algorithm for dealing with dynamic features

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 10 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Speech Parameter Generation Algorithms (4) 2.Example  5-state left-to-right HMMs  Speech sampled at 16 kHz & windowed by 25.6ms Blackman window with 5ms shift  Mel-cepstral coefficients obtained by mel-cepstral analysis  Sentence fragment “kiNzokuhiroo” Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [3]

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 11 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Observation o 1 consists of a set of space indices X 1 {1,2,G} and a 3D vector x 1 R³ x 1 is drawn from one of three spaces Ω 1, Ω 2, Ω 3 R³ Pdf for x1 : w 1 N 1 (x)+w 2 N 2 (x)+w G N G (x) Multi-Space Probability Distribution HMM (1) 1.Problem: We cannot apply both the conventional and continuous HMMs to observations which consist of continuous values and discrete symbols [4] Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [4]

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 12 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Multi-Space Probability Distribution HMM (2) 2.Example: A man fishing in a pond Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration Sample space Ω Ω 1 : 2D space for length and height of red fish Ω 2 : 2D space for length and height of blue fish Ω 3 : 1D space for diameters of tortoises Ω 4 : 0D space for articles of junk Weights w 1 to w 4 are determined by ratio of blue and red fish, tortoises and junk in the pond N 1, N 2 : 2D pdfs for sizes and heights of fish N 3 : 1D pdf for diameter of tortoise Man catches red fish: observation o=({1},x) is made Same by night: o=({1,2},x) Ω 1 =R² Ω 2 =R² Ω 3 =R¹ Ω 4 =Rº 4

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 13 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Multi-Space Probability Distribution HMM (3) 3.Algorithm Each state i has G pdfs N iG and their weights w iG The observation probability of O, P(O| λ) is calculated with the forward-backward algorithm Then, we need to maximize the observation likelihood of P(O| λ) over all parameters  reestimation formulas for the maximum likelihood estimation are calculated in analogy to the Baum-Welch-Algorithm Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [4]

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 14 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models 1.Simultaneous Modelling Pitch patterns modelled by MSD-HMM  observation sequence composed of continuous values and discrete symbols Spectrum modelled by continuous probability distribution State duration densities modelled by single Gaussian distributions (dimension of SDD equal to number of HMM states) Simultaneous modelling of Spectrum, Pitch and Duration (1) Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [5]

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 15 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Simultaneous modelling of Spectrum, Pitch and Duration (2) 2.Context Dependent Model Many contextual factors taken into account such as: Position of breath group in sentence Position of current phoneme in current accentual phrase {preceding, current succeeding} part of speech {preceding, current succeeding} phoneme (etc.) Decision-tree based context clustering applied because As contextual factors increase, their combinations increase exponentially  model parameters cannot be estimated with sufficient accuracy by limited training data Impossible to prepare speech database which includes all combinations of contextual factors Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 16 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Simultaneous modelling of Spectrum, Pitch and Duration (3) 2.Context Dependent Model spectrum, pitch and duration have their own influential contextual factors  distributions of the respective parameters are clustered independently Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [5]

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 17 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Simultaneous modelling of Spectrum, Pitch and Duration (4)  arbitrary text converted to context- based label sequence  sentence HMM constructed according to label sequence  State durations determined so as to maximize likelihood of state duration densities  Sequence of mel-cepstrum coefficients and pitch values generated  Speech synthesized directly by MLSA filter Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [5] 3. Text-to-Speech Synthesis System

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 18 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models References [1] K. Tokuda, T. Kobayashi and S. Imai, Speech Parameter Generation from HMM using Dynamic Features, Proc. ICASSP, 1995 [2] http://hts.sp.nitech.ac.jp/?Publications – see ‘Attach file’  tokuda_TTSworkshop2002.pdfhttp://hts.sp.nitech.ac.jp/?Publicationstokuda_TTSworkshop2002.pdf [3] Tokuda, Yoshimura, Masuko, Kobayashi, Kitamura, Speech Parameter Generation Algortihms for HMM- based Speech Synthesis, ICASSP, 2000 [4] Tokuda, Masuko, Miyazaki, Kobayashi, Multi-Space Probability Distribution HMM, IEICE Trans. Inf. & Syst., 2000

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 19 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models References [5] Yoshimura, Tokuda, Masuko, Kobayashi, Kitamura, Simultaneous modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis, Proc EU- ROSPEECH, 1999

Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 20 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Thanks for the attention

Download ppt "Advanced Signal Processing 2, SE Professor Horst Cerjak, 19.12.2005 1 Andrea Sereinig Graz, 2008-29-04 Basics of Hidden Markov Models Basics of HMM-based."

Similar presentations