1. Feature Selection, Acoustic Modeling and Adaptation: SDSG Review of Recent Work
Technical University of Crete, Speech Processing and Dialog Systems Group
Presenter: Alex Potamianos

2. Outline
Prior work:
–Adaptation
–Acoustic modeling
–Robust feature selection
Bridge over to the HIWIRE work plan:
–Robust features, acoustic modeling, adaptation
–New areas: audio-visual ASR, microphone arrays

3. Adaptation
–Transformation-based adaptation
–MAP adaptation (Bayesian learning approximation)
–Speaker clustering / speaker-space models
–Robust feature selection
–Combinations

4. Acoustic Model Adaptation: SDSG Selected Work
–Constrained estimation adaptation
–Maximum likelihood stochastic transformations (MLST)
–Combined transformation-MAP adaptation
–MLST basis vectors
–Incremental adaptation
–Dependency modeling of biases
–Vocal tract normalization with linear transformation

5. Constrained Estimation Adaptation (Digalakis 1995)
Hypothesize a sequence of feature-space linear transformations x̂_t = A_s x_t + b_s, with A_s diagonal.
The adapted models are then Gaussians with mean A_s μ + b_s and covariance A_s Σ A_s^T.
Adaptation is equivalent to estimating the state-dependent parameters {A_s, b_s}.
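A minimal model-space sketch of the idea above, assuming the diagonal A_s hypothesized on the slide; the function name and toy values are illustrative, not from the original work:

```python
import numpy as np

def adapt_gaussian_constrained(mu, Sigma, a_diag, b):
    """Constrained-estimation adaptation of one Gaussian.

    Transforming features as x' = A x + b is equivalent to moving the
    model: mean -> A mu + b, covariance -> A Sigma A^T.  Here A is
    diagonal, as on the slide, so it is passed as a vector of scales.
    """
    A = np.diag(a_diag)
    return A @ mu + b, A @ Sigma @ A.T

# Toy usage with illustrative numbers (3-dimensional features).
mu_a, Sigma_a = adapt_gaussian_constrained(
    mu=np.zeros(3), Sigma=np.eye(3),
    a_diag=np.array([1.1, 0.9, 1.0]), b=np.array([0.2, -0.1, 0.0]))
```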

6. Comparison with MLLR (Leggetter 1996)
–Both were published at about the same time.
–MLLR is model-space adaptation only.
–MLLR transforms only the model means; the transformation matrix in MLLR is block diagonal.
–Constrained estimation is more generic.

7. Limitations of the Linear Assumption
–The linear assumption may be too restrictive for modeling the dependency between training and testing conditions. Goal: try a more complex transformation.
–All Gaussians in a class are restricted to be transformed identically, with the same transformation. Goal: let each Gaussian in a class decide on its own transformation.
–Which transformation transforms each Gaussian is predefined. Goal: let the system automatically choose the transformation-Gaussian pairs.

8. ML Stochastic Transformations (MLST) (Diakoloukas and Digalakis 1997)
Hypothesize a sequence of feature-space stochastic transformations of the form x̂_t = A_j x_t + b_j, where component transform j is selected with probability λ_j; that is, each transformation is a probabilistic mixture of linear transforms rather than a single linear map.

9. MLST: Model Space
–Use a set of MLSTs instead of linear transformations.
–The adapted observation densities become mixtures over the component transforms: p(x) = Σ_j λ_j N(x; A_j μ + b_j, A_j Σ A_j^T).
–MLST Method I: A_j is diagonal.
–MLST Method II: A_j is block diagonal.

10. MLST: Reducing the Number of Mixture Components
After adaptation, a mixture of N speaker-independent Gaussians and J component transforms yields N·J Gaussians. Three ways to reduce the Gaussians back to their SI number (sketched in code below):
–HPT: apply the component transformation with the highest probability to each Gaussian.
–LCT: apply the linear combination of all component transforms.
–MTG: merge the transformed Gaussians.
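A minimal sketch of the expansion and of the HPT and LCT reductions described above; MTG (Gaussian merging) is omitted, and all names and data structures are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def mlst_adapt(components, transforms, lambdas):
    """Expand an SI mixture into the adapted N*J-component mixture:
    every (Gaussian, component-transform) pair yields one new Gaussian,
    weighted by the SI mixture weight times the transform weight."""
    adapted = []
    for w, mu, Sigma in components:
        for lam, (A, b) in zip(lambdas, transforms):
            adapted.append((w * lam, A @ mu + b, A @ Sigma @ A.T))
    return adapted

def reduce_hpt(components, transforms, lambdas):
    """HPT: apply only the highest-probability component transform,
    recovering the SI mixture size."""
    A, b = transforms[int(np.argmax(lambdas))]
    return [(w, A @ mu + b, A @ Sigma @ A.T) for w, mu, Sigma in components]

def reduce_lct(components, transforms, lambdas):
    """LCT: use the lambda-weighted linear combination of all transforms."""
    A = sum(l * Aj for l, (Aj, bj) in zip(lambdas, transforms))
    b = sum(l * bj for l, (Aj, bj) in zip(lambdas, transforms))
    return [(w, A @ mu + b, A @ Sigma @ A.T) for w, mu, Sigma in components]
```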

11. Schematic Representation of MLST Adaptation (figure)

12. MLST Properties
–A_sj and b_sj are shared at the state or state-cluster level.
–The transformation weights λ_j are estimated at the Gaussian level.
–MLST combines transformed Gaussians.
–MLST is flexible in how a transformation is selected for each Gaussian.
–MLST allows an arbitrary number of transformations per class.

13. MLST Compared to ML Linear Transforms
Hard versus soft decision:
–The linear component is chosen based on the training samples.
Adaptation resolution:
–Linear components are common to a transformation class.
–The transformation is chosen at the Gaussian level.
–Increased adaptation resolution with robust estimation.

14. MLST Basis Transforms (Boulis, Diakoloukas and Digalakis 2000)
Algorithm steps (see the sketch after this slide):
–Cluster the training-speaker space into classes.
–Train MLST component transforms using data from each training-speaker class.
–Use the adaptation data to estimate only the transformation weights.
It is like injecting a-priori knowledge into the estimation process. This results in rapid speaker adaptation, with significant gains for medium and small adaptation sets.
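A hedged sketch of the weight-only estimation step for a single Gaussian, assuming the basis transforms were pre-trained offline on the speaker clusters; the EM-style update is a plausible reading of the slide, not the paper's exact algorithm:

```python
import numpy as np
from scipy.stats import multivariate_normal

def estimate_mlst_weights(x_adapt, mu, Sigma, basis_transforms, n_iter=10):
    """Re-estimate only the transformation weights lambda_j from adaptation
    data of shape (T, d), keeping the pre-trained basis transforms
    (A_j, b_j) fixed; this is what makes the adaptation rapid."""
    J = len(basis_transforms)
    lam = np.full(J, 1.0 / J)                     # uniform initialization
    comps = [(A @ mu + b, A @ Sigma @ A.T) for A, b in basis_transforms]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component transform.
        lik = np.stack([lam[j] * multivariate_normal.pdf(x_adapt, m, S)
                        for j, (m, S) in enumerate(comps)]) + 1e-300
        resp = lik / lik.sum(axis=0, keepdims=True)
        # M-step: weights are the average responsibilities.
        lam = resp.mean(axis=1)
    return lam
```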

15. Combined Transformation-Bayesian Adaptation (Digalakis and Neumeyer 1996)
MAP estimation of a Gaussian mean can be expressed as μ_MAP = (τ μ_0 + Σ_t γ_t x_t) / (τ + Σ_t γ_t), where μ_0 is the prior mean, γ_t the occupation posteriors, and τ the prior weight; the combined scheme uses the transformation-adapted models as the priors.
–Retains the asymptotic properties of MAP.
–Retains the fast adaptation rates of transformations.
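A short sketch of the combined update for one Gaussian mean, under the assumption stated above that the transformation-adapted mean serves as the prior; variable names are illustrative:

```python
import numpy as np

def map_mean_update(mu_transformed, frames, gammas, tau=10.0):
    """MAP re-estimation of a Gaussian mean with the transformation-adapted
    mean as the prior.  With few adaptation frames the estimate stays near
    the transformed prior (fast adaptation); as the occupancy grows it
    converges to the ML estimate (the MAP asymptotic property)."""
    occupancy = gammas.sum()                 # sum_t gamma_t
    weighted_obs = gammas @ frames           # sum_t gamma_t * x_t
    return (tau * mu_transformed + weighted_obs) / (tau + occupancy)
```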

16. Rapid Speech Recognizer Adaptation (Digalakis et al. 2000)
Dependency models for the bias components of cascaded transforms. Techniques:
–Gaussian multiscale processes
–Hierarchical tree-structured priors
–Explicit correlation models
–Markov random fields

17. VTN with Linear Transformation (Potamianos and Rose 1997, Potamianos and Narayanan 1998)
Vocal tract normalization: select the optimal warping factor α according to
α̂ = arg max_α P(X^α | α, H),
where H is the transcription and X^α is the observation sequence frequency-warped by factor α.
VTN with linear transformation:
{α̂, θ̂} = arg max_{α,θ} P(h_θ(X^α) | α, θ, H),
where h_θ(·) is a parametric linear transformation with parameter θ.
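VTN of this kind is typically realized as a grid search over a small set of warping factors; a minimal sketch, where the likelihood callback (e.g., a forced alignment against the transcription H) is an assumed interface:

```python
def select_warp_factor(features_by_alpha, log_likelihood, transcription):
    """Grid search for the ML warping factor.

    features_by_alpha : dict alpha -> features extracted with that warp
    log_likelihood    : assumed callback (features, transcription) -> float
    """
    scores = {alpha: log_likelihood(feats, transcription)
              for alpha, feats in features_by_alpha.items()}
    return max(scores, key=scores.get)

# Typical grids place alpha on a few points around 1.0, e.g. 0.88 to 1.12.
```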

18. Acoustic Modeling: SDSG Selected Work
–Genones: a generalized Gaussian mixture tying scheme
–Stochastic segment models (SSMs)

19. Genones: Generalized Mixture Tying (Digalakis, Monaco and Murveit 1996)
Algorithm steps (the clustering step is sketched below):
–Cluster HMM states based on the similarity of their distributions.
–Splitting: construct seed codebooks for each state cluster, either by identifying the most likely mixture-component subset or by clustering down the original codebook.
–Re-estimate the parameters using Baum-Welch.
Genones give a better trade-off between modeling resolution and robustness, and are used in DECIPHER and Nuance.
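A simplified sketch of the first step only, clustering HMM states by distributional similarity; the actual genone procedure uses likelihood-based cluster distances and is followed by splitting and Baum-Welch re-estimation, so treat this as an illustration rather than the paper's algorithm:

```python
import numpy as np

def gaussian_sym_kl(m1, v1, m2, v2):
    """Symmetric KL divergence between two diagonal-covariance Gaussians,
    a common similarity measure for clustering state distributions."""
    def kl(ma, va, mb, vb):
        return 0.5 * np.sum(np.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0)
    return kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1)

def cluster_states(means, variances, n_clusters):
    """Greedy agglomerative clustering of HMM states into genone clusters:
    start from singletons and repeatedly merge the closest pair."""
    clusters = [[i] for i in range(len(means))]
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Distance between cluster representatives (first member),
                # a simplification of likelihood-based cluster distances.
                d = gaussian_sym_kl(means[clusters[i][0]], variances[clusters[i][0]],
                                    means[clusters[j][0]], variances[clusters[j][0]])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters
```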

20. Segment Models
HMM limitations:
–Weak duration modeling
–The conditional-independence assumption on observations
–Restrictions on feature extraction imposed by frame-based observations
Segment model motivation:
–A larger number of degrees of freedom in the model
–Use of segmental features
–Modeling the correlation of frame-based features
–Powerful modeling of transitions and longer-range speech dynamics
–Less distortion for segmental coding, so segmental recognition is more efficient

21. General Stochastic Segment Models
A segment s in an utterance of N frames is s = {(τ_a, τ_b): 1 ≤ τ_a ≤ τ_b ≤ N}.
Segment model density: a length-dependent joint density of the observations given the label a, together with an explicit duration distribution, p(y_1, ..., y_l | l, a) p(l | a).
Segment models generate a variable-length sequence of frames.

22. Stochastic Segment Model (Ostendorf and Digalakis 1992)
Problem: model the time correlation within a segment.
Solution: Gaussian model variations based on assumptions about the form of the statistical dependency (a Gauss-Markov sketch follows):
–Gauss-Markov model
–Dynamical system model
–Target state model
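As an illustration of the first dependency assumption, a minimal Gauss-Markov segment likelihood; all parameters (mu0, Sigma0, F, Q) are assumed given for the segment's label, and the real SSM variants are richer than this:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gauss_markov_segment_loglik(Y, mu0, Sigma0, F, Q):
    """Log-likelihood of a segment Y (shape (L, d)) under a first-order
    Gauss-Markov model: y_1 ~ N(mu0, Sigma0), y_t | y_{t-1} ~ N(F y_{t-1}, Q).
    Unlike an HMM state, this models frame-to-frame correlation explicitly
    within the segment."""
    ll = multivariate_normal.logpdf(Y[0], mu0, Sigma0)
    for t in range(1, len(Y)):
        ll += multivariate_normal.logpdf(Y[t], F @ Y[t - 1], Q)
    return ll
```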

23. SSM Viterbi Decoding (Ostendorf, Digalakis and Kimball 1996)
HMM Viterbi recognition finds the most likely state sequence, ŝ = arg max_s P(s, Y), and maps the state sequence to a word sequence.
The analogous SSM solution searches jointly over segmentations and segment labels, {â, ŝ} = arg max_{a,s} P(a, s, Y), and then maps the segment-label sequence to the appropriate word sequence (a dynamic-programming sketch follows).
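A dynamic-programming sketch of the joint search over segmentations and labels; the segment scoring callback and the duration cap are assumed interfaces, and a real decoder would add duration priors, language-model scores, and pruning:

```python
import numpy as np

def ssm_viterbi(T, labels, seg_score, max_len=30):
    """Search over segmentations and segment labels.

    T        : number of frames in the utterance
    labels   : candidate segment labels (e.g. phones)
    seg_score: assumed callback (start, end, label) -> segment log-density
               for frames [start, end)
    Returns the best log score and the (start, end, label) sequence.
    """
    best = np.full(T + 1, -np.inf)
    best[0] = 0.0
    back = [None] * (T + 1)
    for t in range(1, T + 1):
        for dur in range(1, min(max_len, t) + 1):   # explicit duration search
            s = t - dur
            for lab in labels:
                score = best[s] + seg_score(s, t, lab)
                if score > best[t]:
                    best[t], back[t] = score, (s, lab)
    # Trace back the winning segmentation.
    segs, t = [], T
    while t > 0:
        s, lab = back[t]
        segs.append((s, t, lab))
        t = s
    return best[T], segs[::-1]
```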

24. From HMMs to Segment Models (Ostendorf and Digalakis 1996)
–A unified view of stochastic modeling
–A general stochastic model that encompasses most segment-model variants
–Similarities in terms of correlation and parameter-tying assumptions
–Analogies between segment models and HMMs

25. Robust Feature Selection
–Time-frequency representations for ASR (Potamianos and Maragos 1999)
–Confidence-measure estimation for ASR features sent over wireless channels ("missing features") (Potamianos and Weerackody 2001)
–AM-FM model based features (Dimitriadis et al. 2002)

26. Other Work
–Multiple-source separation using microphone arrays (Sidiropoulos et al. 2001)

27. Prior Work Overview
MLST, constrained estimation adaptation, MAP (Bayesian) adaptation, Genones, segment models, VTLN, combinations, robust features.

28. HIWIRE Work Proposal
–Adaptation: Bayes-optimal classification
–Audio-visual ASR: baseline experiments
–Microphone arrays: speech/noise separation
–Feature selection: AM-FM features
–Acoustic modeling: segment models

29. Bayes Optimal Classification (HIWIRE proposal)
Classifier decision for a test data vector x_test: choose the class i that results in the highest value of the predictive density
P(x_test | X_i) = ∫ P(x_test | θ) p(θ | X_i) dθ,
where X_i denotes the training data of class i and θ the model parameters.

30. Bayes Optimal versus MAP
Assumption: the posterior p(θ | X) is sufficiently peaked around its most probable point, so the integral above collapses to the single term P(x_test | θ_MAP).
MAP approximation: θ_MAP is the set of parameters that maximizes p(θ | X) ∝ P(X | θ) p(θ) (a toy comparison follows).
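A toy conjugate-Gaussian example (illustrative numbers, not from the slides) showing both points at once: the Bayes-optimal predictive combines all parameter hypotheses, while MAP plugs in a single point and is only accurate when the posterior is peaked:

```python
import numpy as np
from scipy.stats import norm

# Unknown class mean with a conjugate Gaussian prior; known variance.
prior_mu, prior_var, obs_var = 0.0, 1.0, 1.0
x_train = np.array([0.8, 1.2, 0.9])          # few samples: posterior is wide

n = len(x_train)
post_var = 1.0 / (1.0 / prior_var + n / obs_var)
post_mu = post_var * (prior_mu / prior_var + x_train.sum() / obs_var)

x_test = 2.0
# MAP approximation: plug in the single most probable mean.
p_map = norm.pdf(x_test, loc=post_mu, scale=np.sqrt(obs_var))
# Bayes-optimal: integrate over the posterior; for this conjugate model
# the predictive is Gaussian with the posterior variance added.
p_bayes = norm.pdf(x_test, loc=post_mu, scale=np.sqrt(obs_var + post_var))

# With little training data the two differ (the predictive is broader);
# as n grows, post_var -> 0 and the MAP approximation becomes accurate.
```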

31. Why Bayes Optimal Classification
–Optimal classification criterion.
–The predictions of all parameter hypotheses are combined.
–Better discrimination.
–Less training data required.
–Faster asymptotic convergence to the ML estimate.
However:
–Computationally more expensive.
–Analytical solutions are difficult to find.
–...hence some approximations must still be considered.

32. Segment Models
–Phone-transition modeling (new features)
–Combination with HMMs
–Parametric modeling of feature trajectories

33. AM-FM Features
See the NTUA presentation.

34. Audio-Visual ASR Baseline

35. Microphone Arrays
Speech/noise source separation algorithms.

