
1 Spectral/Temporal Acoustic Features for Automatic Speech Recognition
Stephen A. Zahorian, Hongbing Hu, Jiang Wu
Department of Electrical and Computer Engineering, Binghamton University, State University of New York
November 16th, 2010

2 Overview of talk
 Background/Introduction
 Review of traditional spectral/temporal features
 DCTC/DCS features
 Experimental results
 Conclusions

3 Most Typical Speech Features for ASR
 Spectral Features (Static Features)
   Represent the vocal tract information
   MFCCs (Mel-Frequency Cepstral Coefficients)
 Temporal Features (Dynamic Features)
   Capture time variation (trajectory) of spectral features
   Delta and Delta-Delta terms of the MFCCs
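The delta (dynamic) terms mentioned above are typically computed by a simple linear regression over a window of static frames. A minimal sketch, assuming the common regression formula with window half-width M = 2 (the exact window used in these experiments is not stated on the slides):

```python
import numpy as np

def delta(features, M=2):
    """Regression-based delta (dynamic) features over static coefficients.

    features: (num_frames, num_coeffs) array of static features (e.g. MFCCs).
    M: half-width of the regression window (M = 2 is a common default).
    """
    num_frames = features.shape[0]
    # Repeat the edge frames so every frame has a full regression window
    padded = np.pad(features, ((M, M), (0, 0)), mode="edge")
    denom = 2 * sum(m * m for m in range(1, M + 1))
    deltas = np.empty_like(features)
    for t in range(num_frames):
        acc = np.zeros(features.shape[1])
        for m in range(1, M + 1):
            acc += m * (padded[t + M + m] - padded[t + M - m])
        deltas[t] = acc / denom
    return deltas

# Delta-Delta ("acceleration") terms are the delta of the deltas:
# dd = delta(delta(mfcc))
```

For a linearly increasing coefficient, the interior delta values recover the slope exactly, which is the intended "trajectory" interpretation.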

4 MFCCs (Mel-Frequency Cepstral Coefficients)
 Mel-Frequency Scale: Mel(f) = 2595 log10(1 + f/700)
 The coefficients c_i are calculated from the log amplitudes m_j of the N mel-scale filter banks (N = 20 here) using the cosine transform:
   c_i = sqrt(2/N) Σ_{j=1..N} m_j cos(π i (j − 0.5) / N)
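A sketch of this cosine transform and the mel mapping, assuming the HTK-style DCT convention (the slide's equation image was not preserved, so the exact normalization is an assumption):

```python
import numpy as np

def hz_to_mel(f):
    """Mel scale: Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mfcc_from_log_fbank(m, num_ceps=12):
    """Cosine transform of log mel filter-bank amplitudes (HTK-style DCT).

    m: (N,) log filter-bank amplitudes for one frame (N ~ 20 banks).
    Returns c_1 .. c_num_ceps.
    """
    N = len(m)
    j = np.arange(1, N + 1)
    return np.array([
        np.sqrt(2.0 / N) * np.sum(m * np.cos(np.pi * i / N * (j - 0.5)))
        for i in range(1, num_ceps + 1)
    ])
```

Note that a flat log spectrum yields all-zero cepstral coefficients (for i ≥ 1), since each cosine basis vector sums to zero over the filter banks.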

5 Speech Recognition Architecture
 [Block diagram] Speech Waveform → Feature Extraction → Speech Features → Classification (Recognition) by the Recognizer (HMM/NN) → Phonemes → Words (e.g. "I need a ...")

6 Hidden Markov Models (HMMs)
 Speech vectors are generated by a Markov model
 The overall probability is calculated as the product of the transition and output probabilities
 Likelihood can be approximated by only considering the most likely state sequence

7 DCTC Features
 Discrete Cosine Transform Coefficients (DCTCs)
 Given the spectrum X, with the frequency f normalized to the range [0, 1], the ith DCTC is calculated as
   DCTC(i) = ∫₀¹ a(X(f)) φ_i(f) df
 Basis vector: φ_i(f) = cos(π i g(f)) g′(f)
 a(X): nonlinear amplitude scaling (log)
 g(f): nonlinear frequency warping (Mel-like function)
 [Figure: first 3 DCTC basis vectors]
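A rough discretization of the DCTC idea follows. The specific forms of a() and g() here (log amplitude, mel-like warp normalized to [0, 1] over an assumed 8 kHz band) are illustrative assumptions, and the warping derivative is folded into evaluating the cosine on the warped axis:

```python
import numpy as np

def dctc(spectrum, freqs, num_terms=13,
         a=np.log,
         g=lambda f: np.log10(1 + f / 700.0) / np.log10(1 + 8000.0 / 700.0)):
    """Sketch of DCTC computation with assumed a() and g().

    spectrum: magnitude-spectrum samples; freqs: their frequencies in Hz.
    a(): nonlinear amplitude scaling (log here).
    g(): mel-like warping, normalized so it maps [0, 8000] Hz onto [0, 1].
    The ith DCTC is the inner product of the scaled spectrum with a
    cosine basis evaluated on the warped frequency axis.
    """
    warped = g(freqs)                     # warped, normalized frequency
    scaled = a(spectrum)                  # amplitude-scaled spectrum
    basis = np.cos(np.pi * np.outer(np.arange(num_terms), warped))
    # Approximate the integral by a discrete average over the samples
    return basis @ scaled / len(freqs)
```

Term 0 is then just the average of the scaled spectrum, i.e. an overall log-energy-like quantity, while higher terms capture progressively finer global spectral shape.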

8 DCS Features
 Discrete Cosine Series Coefficients (DCSCs)
 Represent the spectral evolution of the DCTCs over time and encode the modulation spectrum
 Basis vectors: ψ_j(t) = cos(π j h(t)) h′(t)
 h(t): time "warping" function, giving non-uniform time resolution
 [Figure: first 3 DCSC basis vectors]

9 Example
 Original spectrogram and its reconstructions from different feature selections:
 [Figure: original spectrogram]
 [Figure: rebuilt with 13 DCTC and 3 DCS terms]
 [Figure: rebuilt with 8 DCTC and 5 DCS terms]

10 DCTC/DCS Computation
 [Diagram: spectrogram → DCTCs (e.g. DCTC 1–5) computed per frame over the frame length → DCS terms (e.g. DCS 1–3) computed over each block of frames (block length) → DCTC/DCS features]
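The second stage of this computation (a cosine series over each block of DCTC frames) can be sketched as follows; a uniform h(t) = t is assumed here for simplicity, whereas the slides use a warped h(t) for non-uniform time resolution:

```python
import numpy as np

def dcs_features(dctc_frames, num_dcs=3):
    """DCS expansion over one block of DCTC frames.

    dctc_frames: (num_frames_in_block, num_dctc) DCTC trajectory.
    Each DCTC trajectory is expanded over the block with a cosine
    series in normalized time (uniform resolution assumed).
    Returns the flattened num_dctc * num_dcs feature vector, e.g.
    8 DCTCs x 5 DCSs = 40 spectral/temporal features per block.
    """
    L = dctc_frames.shape[0]
    t = (np.arange(L) + 0.5) / L                              # time in [0, 1]
    basis = np.cos(np.pi * np.outer(np.arange(num_dcs), t))   # (num_dcs, L)
    coeffs = (basis @ dctc_frames) / L                        # (num_dcs, num_dctc)
    return coeffs.T.reshape(-1)
```

Sliding the block along the spectrogram (block space) then yields the feature stream fed to the HMM. DCS term 0 of each trajectory is its block average, so constant trajectories produce zero higher-order DCS terms.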

11 Experimental Evaluation
 Database: TIMIT ("SI" and "SX" sentences only)
   Phoneme set: reduced 48-phone set, mapped down from the TIMIT 62-phone set
   Training data: 3696 sentences (462 speakers)
   Testing data: 1344 sentences (168 speakers)
 Recognizer: HMMs
   Left-to-right Markov models with no skip
   48 monophone HMMs created using the HTK toolkit
   Bigram phone information used as the language model
 Cambridge University/Microsoft HTK toolkit (Ver. 3.4)
   Provides powerful tools for data preparation, HMM training and testing, and result analysis

12 Experimental Evaluation
 TIMIT database
   630 total speakers, 10 sentences each
   462 speakers for training, 168 speakers for testing
 3-state HMM phone models
 Results given as phone accuracy for the 39 "standard" phone categories
 Number of mixtures per state kept "relatively" high to maximize accuracy

13 Evaluation with Static-Only Features
 Frame length varied from 5 ms to 30 ms (5 ms frame space)
 Number of DCTCs varied (7, 10, 13, 16, 19)
 8 GMM mixtures for each HMM state

14 Evaluation with Dynamic Features
 Small numbers of DCTCs (1, 2, 3, or 4), varying the number of DCSs
 Number of frames per block varied, so that DCS terms are computed over 50, 100, or 300 ms
 10 ms frame length, 5 ms frame space
 8 GMM mixtures for each HMM state

15 [Results figures; not preserved in transcript]

16 Evaluation with Spectral/Temporal Features
 40 features total, 40 GMM mixtures
 Frame length and the number of frames per block varied
 2 ms frame space, 8 ms block space
 Combinations of different numbers of DCTCs and DCSs varied, with the total number of parameters fixed at 40

17 Evaluation with Spectral/Temporal Features
 Condition 1: 8 DCTCs and 5 DCSs
 Condition 2: 9 DCTCs and 5 DCSs
 Condition 3: 10 DCTCs and 4 DCSs
 Condition 4: 11 DCTCs and 4 DCSs
 Condition 5: 12 DCTCs and 4 DCSs
 Condition 6: 13 DCTCs and 4 DCSs
 Condition 7: 14 DCTCs and 3 DCSs
 Condition 8: 15 DCTCs and 3 DCSs

18 Conclusions from these results
 Features which represent trajectories of global spectral shape carry considerable information for ASR
 There are tradeoffs between "static" spectral features and "dynamic" spectral trajectory features
 Spectral resolution can be relatively low for spectral ASR features
 "Information" in trajectory features is more "dilute" than in spectral features

19 Questions?