RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.

1 RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise
Amin Fazel, Sharif University of Technology
Hossein Sameti, Mohammad T. Manzuri
February 2005
Computer Engineering Department, Sharif University of Technology

2 Outline
- Introduction
- Feature based methods
  - MFCC, RCC, CMN, PLP, RASTA
- Mean Normalization Root Cepstral Coefficients
- Experimental Results
  - Experiment 1 – Sharif CSR and TFARSDAT Database
  - Experiment 2 – HTK CSR and AURORA 2 Database
- Summary
Wednesday, February 18, 2005 Computer Engineering Department Sharif University of Technology

3 Effect of Noise on ASR
- Two phases in most ASR systems
  - Training
  - Operating (testing)
- Mismatch between the two phases reduces accuracy
- Mismatch occurs because of
  - Environment: microphone, babble, distance, transmission channel
  - Speaker
    - Specific speaker: speed, …
    - Various speakers: gender, age, accent, …

4 Effect of Noise on ASR
- Noise
  - Additive noise: babble, car, subway, exhibition, office, …
  - Convolutional noise
    - Channel, telephone line
    - Microphone effect
    - Distance of speaker to microphone
  - Others: Lombard effect, reflection of building noise
- Stationary or non-stationary

5 Effect of Noise on ASR
- Robust speech recognition is the study of building speech recognizers that handle mismatched conditions.
- Simple model (diagram): clean speech → convolutional noise → additive noise → corrupted speech
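
The simple model on this slide is commonly written as y[n] = (x * h)[n] + d[n], where x is the clean speech, h is the channel impulse response (the convolutional noise) and d is the additive noise. The following is a minimal pure-Python sketch under that assumption; the function names and the toy signal values are hypothetical and do not come from the slides.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def corrupt(clean, channel, additive_noise):
    """Return y[n] = (clean * channel)[n] + additive_noise[n].

    The convolution is truncated to len(clean) so that the corrupted
    signal has the same length as the clean one (an illustrative choice).
    """
    filtered = convolve(clean, channel)[:len(clean)]
    return [f + d for f, d in zip(filtered, additive_noise)]


# Toy example: a two-tap channel and a constant noise floor (hypothetical values).
corrupted = corrupt([1.0, 2.0, 3.0], [1.0, 0.5], [0.1, 0.1, 0.1])
```

In the spectral domain the channel becomes a multiplication and, after log compression, an additive offset, which is why several of the feature-based methods that follow work on compressed spectra.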

6 Robustness Methods
- Signal: speech enhancement
- Feature: robust feature extraction
- Model: change of the model parameters, model training
(Diagram: in both the training phase and the testing phase, the speech signal goes through feature extraction to give features; training turns the features into the model.)

8 Mel-Frequency Cepstral Coefficients (MFCC)
1. Compute the magnitude-squared of the Fourier transform
2. Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
3. Take the log of the outputs (for RCC, take a root instead of the log)
4. Compute the cepstrum using the discrete cosine transform
5. Smooth by dropping higher-order coefficients
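
The last three steps can be sketched as follows, assuming the triangular filterbank energies have already been computed. This is an illustrative pure-Python sketch, not the authors' implementation: the function names, the unnormalised DCT-II, the default of 13 coefficients and the example root exponent of 0.1 are all assumptions, since the slides give no numeric settings.

```python
import math


def dct_ii(values, num_coeffs):
    """Unnormalised DCT-II, keeping only the first num_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def cepstral_coefficients(filterbank_energies, num_coeffs=13, root=None):
    """Compress filterbank energies and take the DCT.

    root=None  -> log compression (MFCC-style)
    root=gamma -> energy ** gamma  (RCC-style root compression)
    Keeping only num_coeffs terms is the "drop higher-order coefficients" step.
    """
    if root is None:
        # A small floor avoids log(0); the floor value is an assumption.
        compressed = [math.log(max(e, 1e-10)) for e in filterbank_energies]
    else:
        compressed = [e ** root for e in filterbank_energies]
    return dct_ii(compressed, num_coeffs)


# Hypothetical energies for an 8-channel filterbank.
energies = [1.0, 2.0, 4.0, 8.0, 4.0, 2.0, 1.0, 0.5]
mfcc_like = cepstral_coefficients(energies, num_coeffs=4)
rcc_like = cepstral_coefficients(energies, num_coeffs=4, root=0.1)
```

The only difference between the two feature types in this sketch is the compression function, which matches the parenthetical remark in step 3.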

9 Temporal processing
Captures the temporal features of the spectral envelope and provides robustness:
- Delta features: first- and second-order differences; regression
- Cepstral mean subtraction: normalizes for channel effects and adjusts for spectral slope
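
Both operations act on a sequence of cepstral vectors, one per frame. The sketch below is a minimal illustration and makes two assumptions that are not stated on the slide: the delta uses the common regression formula with a window of two frames and replicated edge frames, and the mean is taken over the whole utterance. Function names are hypothetical.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of an utterance.

    A fixed channel adds a constant offset to every log-domain cepstral
    vector, so removing the mean removes that offset.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


def delta(frames, window=2):
    """Regression-based delta features.

    d[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..window,
    with the first and last frames replicated at the edges.
    Applying delta() to its own output gives second-order features.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, num_frames - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out
```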

10 Perceptual Linear Prediction (PLP)
1. Compute the magnitude-squared of the Fourier transform
2. Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
3. Apply compressive nonlinearities
4. Compute the discrete cosine transform
5. Smooth using autoregressive modeling
6. Compute the cepstrum using a linear recursion
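
The final step, going from autoregressive coefficients to cepstral coefficients, is usually done with the standard LPC-to-cepstrum recursion. The sketch below assumes the all-pole model is written as H(z) = G / (1 - sum_k a_k z^-k) and omits the gain term c_0 = ln G; the function name and this sign convention are assumptions, not details from the slides.

```python
def lpc_to_cepstrum(a, num_ceps):
    """Cepstral coefficients c_1..c_num_ceps of an all-pole model.

    a[k-1] holds predictor coefficient a_k of H(z) = G / (1 - sum_k a_k z^-k).
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n greater than the model order.
    """
    order = len(a)
    c = []
    for n in range(1, num_ceps + 1):
        value = a[n - 1] if n <= order else 0.0
        for k in range(1, n):
            if n - k <= order:
                value += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(value)
    return c


# For a single pole at 0.5 the cepstrum is known in closed form: c_n = 0.5**n / n.
ceps = lpc_to_cepstrum([0.5], 4)
```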

11 PLP (Cont.)
Algorithm: speech signal → critical-band analysis → equal-loudness pre-emphasis → intensity-loudness conversion → inverse DFT → find autoregressive coefficients → all-pole model
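
The last stages of this chain can be sketched as below, starting from band energies that are assumed to have already been through critical-band analysis and equal-loudness pre-emphasis. The cube-root intensity-to-loudness law follows the usual description of PLP; the function names, the way the half spectrum is mirrored before the inverse DFT, and the example values are assumptions made for illustration.

```python
import math


def intensity_to_loudness(band_energies):
    """Cube-root amplitude compression."""
    return [e ** (1.0 / 3.0) for e in band_energies]


def autocorrelation_from_spectrum(half_spectrum, num_lags):
    """Inverse DFT of a real, even spectrum sampled on 0..pi (endpoints included)."""
    m = len(half_spectrum)
    n_full = 2 * (m - 1)  # length of the mirrored, full spectrum
    lags = []
    for lag in range(num_lags):
        total = half_spectrum[0] + half_spectrum[-1] * math.cos(math.pi * lag)
        for k in range(1, m - 1):
            total += 2.0 * half_spectrum[k] * math.cos(math.pi * k * lag / (m - 1))
        lags.append(total / n_full)
    return lags


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for an all-pole model.

    Returns (a, err): predictor coefficients a_1..a_order for
    x[n] ~ sum_k a_k x[n-k], and the final prediction error.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def plp_all_pole(band_energies, order):
    """Loudness conversion, inverse DFT, then autoregressive modeling."""
    loudness = intensity_to_loudness(band_energies)
    r = autocorrelation_from_spectrum(loudness, order + 1)
    return levinson_durbin(r, order)
```

A flat auditory spectrum gives an autocorrelation that is zero at every non-zero lag, so the fitted predictor coefficients are all zero, which is a quick sanity check for the chain.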

12 RelAtive SpecTrAl (RASTA) Analysis
- Makes PLP (and possibly some other short-term-spectrum-based techniques) more robust to linear spectral distortions
- The new spectral estimate is less sensitive to slow variations in the short-term spectrum
- Filters the temporal trajectory of some function of each spectral value to provide more reliable spectral features
- The filter is usually a band-pass filter that preserves the linguistically important modulations of the spectral envelope (1-16 Hz)

13 RASTA (Cont.)
Algorithm: speech signal → spectral analysis → bank of compressing static nonlinearities → bank of linear band-pass filters → bank of expanding static nonlinearities → optional processing
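
A minimal sketch of the three middle stages is given below, assuming log compression, exponential expansion, and a causal form of the band-pass filter commonly quoted in the RASTA literature, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1). The slides do not state the filter coefficients, so they are an assumption here, as are the function names.

```python
import math

# Assumed filter coefficients; not given on the slides.
RASTA_NUMERATOR = [0.2, 0.1, 0.0, -0.1, -0.2]
RASTA_POLE = 0.98


def rasta_filter(trajectory, pole=RASTA_POLE):
    """Band-pass filter one temporal trajectory (one band across frames)."""
    out = []
    prev_y = 0.0
    for t in range(len(trajectory)):
        fir = sum(
            b * trajectory[t - i]
            for i, b in enumerate(RASTA_NUMERATOR)
            if t - i >= 0
        )
        prev_y = fir + pole * prev_y
        out.append(prev_y)
    return out


def rasta_process(band_energies):
    """Compress, filter each band over time, then expand.

    band_energies: list of frames, each a list of positive band energies.
    """
    num_frames = len(band_energies)
    num_bands = len(band_energies[0])
    # Compressing static nonlinearity: log of each band energy.
    log_traj = [[math.log(frame[b]) for frame in band_energies] for b in range(num_bands)]
    # Linear band-pass filter applied independently to every band.
    filtered = [rasta_filter(traj) for traj in log_traj]
    # Expanding static nonlinearity: exponential.
    return [[math.exp(filtered[b][t]) for b in range(num_bands)] for t in range(num_frames)]
```

Because the numerator coefficients sum to zero, a constant offset in the log spectrum (such as a fixed channel) decays to zero at the output, which is the sense in which the estimate becomes less sensitive to slow variations.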

14 RASTA-PLP Algorithm
(Diagram not preserved in this transcript.)

16 RCC-Mean Normalization
- Root Cepstral Coefficients (RCC): derived using root compression rather than log compression on the filterbank energies
- Advantages of RCC over MFCC
  - More immune to noise
  - Faster decoding
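
The sketch below illustrates how mean normalization of root cepstral coefficients might reduce the effect of a fixed channel. It is an illustration under assumptions, not the authors' procedure: the root exponent of 0.1, the unnormalised DCT, utterance-level mean subtraction, the function names and the toy energies are all hypothetical.

```python
import math


def dct_ii(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def rcc(frames, gamma, num_coeffs):
    """Root cepstral coefficients: root-compress the energies, then DCT."""
    return [dct_ii([e ** gamma for e in frame], num_coeffs) for frame in frames]


def mean_normalize(features):
    """Subtract the per-coefficient mean over all frames."""
    num_frames = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / num_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in features]


def max_abs_gap(a, b):
    """Largest absolute difference between two feature sequences."""
    return max(abs(x - y) for fa, fb in zip(a, b) for x, y in zip(fa, fb))


def channel_gap(clean_frames, channel_gains, gamma=0.1, num_coeffs=2):
    """Compare clean and channel-distorted RCCs, without and with mean normalization."""
    distorted = [[e * h for e, h in zip(frame, channel_gains)] for frame in clean_frames]
    clean_rcc = rcc(clean_frames, gamma, num_coeffs)
    distorted_rcc = rcc(distorted, gamma, num_coeffs)
    raw = max_abs_gap(clean_rcc, distorted_rcc)
    normalized = max_abs_gap(mean_normalize(clean_rcc), mean_normalize(distorted_rcc))
    return raw, normalized


# Toy two-band energies and a fixed per-band channel gain (hypothetical values).
raw_gap, normalized_gap = channel_gap(
    [[1.0, 4.0], [2.0, 8.0], [4.0, 16.0]], [0.5, 2.0]
)
```

In this toy case the mismatch between clean and distorted features is smaller after mean normalization. Unlike the log case, the cancellation is only approximate, because a channel gain scales root-compressed energies rather than shifting them.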

17 RCC-Mean Normalization
If we approximate the root with a logarithm:
(Equation not preserved in this transcript.)
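
The equation for this slide is missing from the transcript. One plausible reading of "approximate root with logarithm" is the first-order expansion below, offered as an assumption rather than as the authors' derivation. The notation is hypothetical: E_t is a filterbank energy at frame t, H is a time-invariant channel gain in that band, gamma is the root exponent, and T is the number of frames.

```latex
\begin{align}
E^{\gamma} &= e^{\gamma \ln E} \approx 1 + \gamma \ln E
  \qquad (|\gamma \ln E| \ll 1) \\
(H E_t)^{\gamma} &\approx 1 + \gamma \ln H + \gamma \ln E_t \\
(H E_t)^{\gamma} - \frac{1}{T} \sum_{\tau=1}^{T} (H E_\tau)^{\gamma}
  &\approx \gamma \Bigl( \ln E_t - \frac{1}{T} \sum_{\tau=1}^{T} \ln E_\tau \Bigr)
\end{align}
```

Under this approximation the channel term gamma ln H cancels in the mean subtraction, just as it does for log-compressed features, and because the DCT is linear the same holds for the resulting cepstral coefficients.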

19 Experiment 1
- Database: TFARSDAT
  - 64 speakers
  - 8 hours of telephone speech data
- ASR: Sharif ASR system
  - HMM based
  - Training: segmental K-means
  - Search: beam Viterbi

20 Experiment 1
Test results

Feature      Accuracy (%)   Correctness (%)
MFCC         54.97          59.32
MFCC_CMS     51.62          56.63
RASTA_PLP    58.38          65.59
RCC          55.67          59.85
RCC_MN       56.89          64.31
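
The slide does not define the two scores. Assuming they follow the usual HTK-style definitions, correctness ignores insertion errors while accuracy penalises them, which would explain why accuracy is the lower figure in every row. The sketch below uses hypothetical error counts, not data from the experiment.

```python
def correctness(num_ref, deletions, substitutions):
    """Percent correct: (N - D - S) / N * 100."""
    return 100.0 * (num_ref - deletions - substitutions) / num_ref


def accuracy(num_ref, deletions, substitutions, insertions):
    """Percent accuracy: (N - D - S - I) / N * 100."""
    return 100.0 * (num_ref - deletions - substitutions - insertions) / num_ref


# Hypothetical counts: 100 reference units, 10 deletions, 20 substitutions, 5 insertions.
example_correctness = correctness(100, 10, 20)
example_accuracy = accuracy(100, 10, 20, 5)
```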

21 Experiment 2
- Aurora 2.0
  - Noisy connected digits recognition
  - 4 hours of training data, 2 hours of test data in 70 noise type/SNR conditions
- HTK
  - HMM based
  - One model for each digit: 16 states with 3 Gaussian mixtures

22 Experiment 2
Average results on AURORA: average obtained over the various SNRs of each noise (chart not preserved)

23 Experiment 2
Subway noise at various SNRs (chart not preserved)

24 Experiment 2
Babble noise at various SNRs (chart not preserved)

25 Experiment 2
Car noise at various SNRs (chart not preserved)

26 Experiment 2
Exhibition noise at various SNRs (chart not preserved)

28 Summary
- Various robust features were tested
- RCC_MN was introduced
- In the first experiment, RASTA-PLP gave the best results, although RCC_MN also performed well
- In the second experiment, RCC_MN gave the best results

30 Thanks for your patience!

