HIWIRE Progress Report – July 2006. Technical University of Crete, Speech Processing and Dialog Systems Group. Presenter: Alex Potamianos

1 HIWIRE Progress Report – July 2006. Technical University of Crete, Speech Processing and Dialog Systems Group. Presenter: Alex Potamianos

2 Outline
• Work package 1
  - Task 1: Blind Source Separation for ASR
  - Task 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
• Work package 2
  - Task 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
• Work package 3
  - Task 1: Fixed platform integration

3 Blind Speech Separation (BSS) problem

4 Data Model – Problem Statement
• [symbol not preserved]: mixing impulse response matrix
• [symbol not preserved]: spatial signature of the i-th speaker for lag τ
• [symbol not preserved]: additive noise vector
• L: channel order
• Objective: estimate the inverse-channel impulse response matrix W(τ) from the observed signal
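The symbols on this slide were not preserved in the transcript. As a hedged reconstruction only, a standard convolutive mixing model that matches the surviving definitions could be written as below. The names x (observed signal), s (source vector), n (noise), H (mixing matrix), h_i (spatial signature), and N (number of speakers) are assumptions; only W(τ) and L appear in the original.

```latex
\mathbf{x}(t) = \sum_{\tau=0}^{L} \mathbf{H}(\tau)\,\mathbf{s}(t-\tau) + \mathbf{n}(t),
\qquad
\mathbf{H}(\tau) = \bigl[\,\mathbf{h}_1(\tau)\ \cdots\ \mathbf{h}_N(\tau)\,\bigr],
\qquad
\hat{\mathbf{s}}(t) = \sum_{\tau} \mathbf{W}(\tau)\,\mathbf{x}(t-\tau)
```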

5 BSS permutation problem
• Permutation problem: the “order” of the separated outputs may be different in the solution for each frequency bin
• To solve the permutation problem, combine:
  - Spatial constraints
  - Continuity constraints in the frequency domain
• The solution to the permutation problem can be formulated using an ILS minimization criterion
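A minimal sketch of the continuity idea only, assuming each separated output is summarized by a magnitude envelope over time in every frequency bin. It greedily reorders each bin to correlate best with the previous, already aligned bin. This is not the ILS formulation mentioned on the slide and it ignores the spatial constraints; the function names and the envelope representation are hypothetical.

```python
from itertools import permutations


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def align_permutations(envelopes):
    """envelopes[f][k] is the time envelope of output k in frequency bin f.

    Returns a reordered copy in which the outputs of each bin are permuted
    to maximize the summed correlation with the previous aligned bin.
    Exhaustive over permutations, so only practical for a few sources.
    """
    aligned = [list(envelopes[0])]
    for f in range(1, len(envelopes)):
        prev, cur = aligned[-1], envelopes[f]
        best = max(
            permutations(range(len(cur))),
            key=lambda p: sum(correlation(prev[k], cur[p[k]]) for k in range(len(cur))),
        )
        aligned.append([cur[k] for k in best])
    return aligned
```

With two outputs whose order is swapped in the second bin, `align_permutations` restores a consistent order across bins.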

6 Recent progress
• Improved solution to the permutation problem
  - Combining spatial and continuity constraints
  - Trying out different continuity criteria
• Created a synthetic database using typical room impulse responses
• First ASR experiments using the “synthetic” database

7 Outline
• Work package 1
  - Task 1: Blind Source Separation for ASR
  - Task 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
• Work package 2
  - Task 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
• Work package 3
  - Task 1: Fixed platform integration

8 Motivation
• Combining classifiers and information sources is an important problem in machine learning applications
• A simple, yet powerful, way to combine classifiers is the “multi-stream” approach, which assumes independent information sources
• Unsupervised stream weight computation for multi-stream classifiers is an open problem

9 Problem Definition
• Compute “optimal” exponent weights for each stream s [equation not preserved]
  [HMM Gaussian mixture formulation; similar expressions hold for MM, naïve Bayes, and Euclidean/Mahalanobis classifiers]
• Optimality in the sense of minimizing the “total classification error”
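To illustrate what exponent weights do, here is a minimal sketch that assumes a single Gaussian per stream and per class instead of the HMM Gaussian mixture on the slide. Raising each stream likelihood to a power w_s is the same as weighting its log-likelihood by w_s. All names and the toy model layout are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def multistream_log_score(obs, class_params, weights):
    """Weighted sum of per-stream log-likelihoods for one class.

    obs[s] is the scalar observation of stream s, class_params[s] is the
    (mean, variance) of that stream for the class, weights[s] is the
    exponent weight of stream s.
    """
    return sum(
        w * gaussian_log_pdf(x, mean, var)
        for x, (mean, var), w in zip(obs, class_params, weights)
    )


def classify(obs, models, weights):
    """Pick the class with the highest weighted multi-stream score."""
    return max(models, key=lambda c: multistream_log_score(obs, models[c], weights))
```

When two streams disagree, the decision follows the stream with the larger weight, which is why the choice of weights matters for the total classification error.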

10 Optimal Stream Weights: Result I
• Equal error rate in the single-stream classifiers: the optimal stream weights are inversely proportional to the total stream estimation error variance

11 Optimal Stream Weights: Result II
• Equal estimation error variance in each stream: the optimal weights are approximately inversely proportional to the single-stream classification error
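Both results have the same shape: each weight is proportional to the reciprocal of a per-stream quantity (estimation error variance in Result I, classification error in Result II). A small sketch of that rule follows. Normalizing the weights to sum to one is an assumption made here for convenience and is not stated on the slides.

```python
def inverse_proportional_weights(values):
    """Return weights proportional to 1/value, normalized to sum to 1.

    `values` holds one positive number per stream, for example the
    estimation error variance or the single-stream error rate.
    """
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    inverses = [1.0 / v for v in values]
    total = sum(inverses)
    return [inv / total for inv in inverses]
```

For two streams whose error variances are 1 and 3, the first stream receives three times the weight of the second.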

12 Recent Progress
• Experiments with synthetic data (Gaussian distribution classification problem)
  - Results show a good match with the theoretical results
• Experimental verification for naïve Bayes classifiers
  - Utterance classification (NLP application)
• First experiments with “unsupervised” estimates of stream weights
  - “Intra-class” based metrics on observations
  - AV-ASR application

13 Outline
• Work package 1
  - Task 1: Blind Source Separation for ASR
  - Task 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
• Work package 2
  - Task 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
• Work package 3
  - Task 1: Fixed platform integration

14 Dynamical System Segment Model
• Based on a linear dynamical system [equations not preserved]
• x is the state, y the observation, u the control input, and w, v the noise terms
• The system parameters should guarantee identifiability, controllability, observability, and stability
• We investigated more generalized parameter structures
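The state-space equations themselves did not survive in the transcript. A common form that is consistent with the symbols named on the slide is given below as an assumption. The input matrix B and the reading of P and R as the state-noise and observation-noise covariances are not confirmed by the source.

```latex
\begin{aligned}
x_{k+1} &= F\,x_k + B\,u_k + w_k, & w_k &\sim \mathcal{N}(0, P),\\
y_k     &= H\,x_k + v_k,          & v_k &\sim \mathcal{N}(0, R).
\end{aligned}
```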

15 Generalized forms of parameter structures
• The system's parameters have an identifiable canonical form:
  - F: ones on the superdiagonal and zeros elsewhere, except for the rows r_i (i = 1,…,n), which hold free parameters
  - H: same column dimension as F, filled with zeros; taking r_0 = 0, row i has a one in column r_{i-1} + 1
  - P, R: filled with free parameters
• We propose a novel element-wise estimation based on the EM algorithm for system identification
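The structure described on this slide can be sketched as a small helper that builds the zero/one/free pattern of F and H. This is one reading of the slide text, not the authors' code: the function name, the `FREE` marker, and passing the row indices as a 1-based list `r` are all assumptions.

```python
FREE = None  # marks an entry that is a free parameter to be estimated


def canonical_structure(n, r):
    """Build the structural pattern of F (n x n) and H (len(r) x n).

    n: state dimension.
    r: increasing 1-based row indices r_1, ..., r_m of F that hold free
       parameters (assumed layout).
    """
    r = list(r)
    if not r or r[0] < 1 or r[-1] > n or any(a >= b for a, b in zip(r, r[1:])):
        raise ValueError("r must be increasing indices within 1..n")
    # Ones on the superdiagonal, zeros elsewhere.
    F = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        F[i][i + 1] = 1
    # Rows r_i are replaced entirely by free parameters.
    for ri in r:
        F[ri - 1] = [FREE] * n
    # With r_0 = 0, row i of H has a one in column r_{i-1} + 1 (1-based).
    H = [[0] * n for _ in r]
    previous = [0] + r[:-1]
    for i, prev in enumerate(previous):
        H[i][prev] = 1  # 1-based column prev + 1 is 0-based index prev
    return F, H
```

For n = 4 and r = [2, 4], rows 2 and 4 of F are free, rows 1 and 3 keep their superdiagonal one, and H selects the first and third state components.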

16 Application on speech
• Experiments on clean data from AURORA 2
• 11 word models (one…nine, zero, oh)
• The number of segments in each model depends on the number of phones in the word model
• HTK for feature extraction (14 MFCCs)
• Alignments taken by HTK using HMMs
• 4000 training sentences; 600 isolated words for testing

17 Results
• Fig. (a): classification performance (using 3 different initializations)
• Fig. (b): the log-likelihood is increasing for the same runs

18 Conclusions & Future Work
• Developed new forms of linear state-space models
• Proposed a novel element-wise parameter estimation process
• Performed training and classification on AURORA 2 based on speech segments and LDS
• Results show a correlation between performance and initialization
• In the future:
  - Investigation of optimal initialization
  - Feature-segment alignment (through dynamic programming)
  - Investigation of the state-space dimension

19 Outline
• Work package 1
  - Task 1: Blind Source Separation for ASR
  - Task 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
• Work package 2
  - Task 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
• Work package 3
  - Task 1: Fixed platform integration

20 Vocal Tract Length Normalization
• Linear and non-linear frequency warping
• Multi-parameter frequency warping
• Warping and spectral bias addition by ML estimation

21 Linear and Non-Linear Warping: Analysis
• An optimal warping factor a is computed for each phoneme so that the Euclidean spectral distance (MSE) between the warped spectrum g(X) and the corresponding unwarped spectrum X is minimized. The optimization is carried out by full search.
• The mapped spectrum is warped according to this optimal warping factor.
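The full search can be sketched as a grid search over candidate warping factors. The version below assumes linear warping implemented by re-sampling a spectral envelope with linear interpolation, and it picks the factor with the lowest MSE against a reference envelope. The names `spectrum`, `reference`, and the candidate grid are hypothetical.

```python
def warp_linear(spectrum, alpha):
    """Re-sample `spectrum` at positions alpha * k using linear interpolation.

    Positions past the last bin are clamped to the last bin. Assumes alpha >= 0.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(alpha * k, n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_warping_factor(spectrum, reference, candidates):
    """Full search: return the candidate factor with the lowest MSE."""
    return min(candidates, key=lambda a: mse(warp_linear(spectrum, a), reference))
```

In a per-phoneme setup, `best_warping_factor` would be called once per phoneme with that phoneme's envelopes.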

22 Linear and Non-Linear Warping
• Frequency warping is implemented by re-sampling the spectral envelope at linearly or non-linearly spaced frequency indices, i.e.
  1. Linear [equation not preserved]
  2. Piece-wise non-linear [equation not preserved]
  3. Power [equation not preserved]
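The three warping equations are missing from the transcript. The sketch below uses forms that are common in the VTLN literature, purely as assumptions: a straight scaling, a piece-wise map that scales up to a cut-off frequency and then bends to meet the upper band edge, and a power law normalized by the upper band edge. The parameters `f_max` and `f_cut` are hypothetical.

```python
def warp_freq_linear(f, alpha):
    """Linear warping: scale the frequency by alpha."""
    return alpha * f


def warp_freq_piecewise(f, alpha, f_max, f_cut):
    """Piece-wise warping (assumed form).

    Scales by alpha below f_cut, then follows the straight line from
    (f_cut, alpha * f_cut) to (f_max, f_max) so the band edge is preserved.
    """
    if f <= f_cut:
        return alpha * f
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (f - f_cut)


def warp_freq_power(f, alpha, f_max):
    """Power warping (assumed form): f_max * (f / f_max) ** alpha."""
    return f_max * (f / f_max) ** alpha
```

The piece-wise and power forms both map 0 to 0 and `f_max` to `f_max`, which keeps the warped axis inside the analysis band.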

23 Multi-Parameter Frequency Warping
After computing the optimal warping factor, we explore alternative piecewise-linear frequency warping strategies:
• Bi-parametric warping function (2pts): different warping factors are evaluated for the low (F < 3 kHz) and high (F ≥ 3 kHz) frequencies.
• Four-parametric warping function (4pts): different warping factors are evaluated for the frequency ranges 0–1.5, 1.5–3, 3–4.5, and 4.5–8 kHz.
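One way to realize a multi-parameter piecewise-linear warp is to give each frequency band its own slope and accumulate the warped length band by band, which keeps the map continuous at the band edges. Continuity and the function name are assumptions; the slides only state which bands get separate factors.

```python
def multi_band_warp(f, edges, factors):
    """Continuous piecewise-linear warping with one factor per band.

    edges: increasing band edges in Hz, e.g. [0, 3000, 8000].
    factors: one warping factor (slope) per band, len(edges) - 1 entries.
    """
    if len(factors) != len(edges) - 1:
        raise ValueError("need exactly one factor per band")
    warped = 0.0
    for lo, hi, alpha in zip(edges, edges[1:], factors):
        if f <= lo:
            break
        # Add the warped length of the part of this band that lies below f.
        warped += alpha * (min(f, hi) - lo)
    return warped
```

The bi-parametric case uses `edges = [0, 3000, 8000]`, and the four-parametric case uses `edges = [0, 1500, 3000, 4500, 8000]` with four factors.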

24 Reduction in MSE: Non-linear warping

25 Reduction in MSE: Multi-parametric warping

26 Reduction in MSE: Bias Removal and Multi-parametric warping

27 Ongoing work
• Implementation of “phone-dependent” warping in HTK
• Implementation of multi-parametric warping and bias removal in HTK

28 Outline
• Work package 1
  - Task 1: Blind Source Separation for ASR
  - Task 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
• Work package 2
  - Task 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
• Work package 3
  - Task 1: Fixed platform integration

29 Optimal Bayes Adaptation
• The central problem is to determine [expression not preserved]
• Using Bayes' rule, we have [expression not preserved]
• Two-step process:
  - Obtain the priors from the SI (speaker-independent) models
  - Compute the likelihoods

30 Phone-Based Clustering
[Figure: genone 1 and genone 2, each with mixture components 1, 2, …, M; axes: number of dimensions (cepstrum coefficients) and number of mixture components]
• Cluster the output distributions based on a common central phone
• θ is every component of the above representation and stands for the prior

31 Our Implementation
• Computation of priors using: [expression not preserved]
• Computation of likelihoods using the Baum-Welch algorithm and ML
• After computing the posterior probabilities, we apply smoothing. Such techniques are:
  - Flooring
  - Uniform
  - Delta
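A minimal sketch of the posterior step follows, assuming a discrete set of hypotheses with known priors and log-likelihoods. The two smoothing helpers are one plausible reading of “flooring” and “uniform” (clamp to a minimum and renormalize; interpolate with a uniform distribution). The slides do not define them, and “delta” is left out because its definition is not given.

```python
import math


def posteriors(priors, log_likelihoods):
    """Bayes' rule in the log domain: posterior is proportional to prior * likelihood."""
    log_joint = [math.log(p) + ll for p, ll in zip(priors, log_likelihoods)]
    peak = max(log_joint)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(v - peak) for v in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def floor_smooth(probs, floor):
    """Assumed 'flooring': clamp each probability to at least `floor`, renormalize."""
    clipped = [max(p, floor) for p in probs]
    total = sum(clipped)
    return [c / total for c in clipped]


def uniform_smooth(probs, lam):
    """Assumed 'uniform' smoothing: interpolate with a uniform distribution."""
    n = len(probs)
    return [(1 - lam) * p + lam / n for p in probs]
```

Both smoothers return a valid distribution and keep every hypothesis at a non-zero probability, which is the usual reason for smoothing posteriors estimated from limited adaptation data.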

32 Outline
• Work package 1
  - Task 1: Blind Source Separation for ASR
  - Task 2, 5: Feature extraction and fusion
  - Task 4: Segment models for ASR
• Work package 2
  - Task 1, 2: VTLN
  - Task 2: Bayes optimal adaptation
• Work package 3
  - Task 1: Fixed platform integration

