Speech Processing Using HTK Trevor Bowden 12/08/2008.


1 Speech Processing Using HTK Trevor Bowden 12/08/2008

2 Outline
- Concept of Project
- HTK Feature Extraction Capabilities
- Details of Feature Extraction Script
- Future Development

3 Concept of Project
- Explore HTK Feature Extraction Capabilities
  - Feature Output Types
  - Additional Feature Parameters
- Ideal Solution
  - Derive Any Feature Type from Any Corpus

4 HTK Feature Extraction Models
[Block diagrams of the extraction chains. Blocks shown: Hamming Window, FFT(), Log(), Cepstral Analysis, Linear Prediction Analysis.]

5 HTK Feature Extraction Capabilities
- Feature Extraction Methods
  - Linear Prediction Analysis
  - Cepstral Analysis
  - Mel-Scaling
  - Perceptual Linear Prediction Analysis
- Additional Feature Information
  - Signal Energy
  - Derivative Information

6 Linear Prediction Analysis
- Vocal Tract Transfer Function
- Transfer Function Coefficients Solution
- Autocorrelation Matrices
- Autocorrelation of Speech
- Amplitude of Model
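The equations from this slide did not survive in the transcript. As a rough illustration of the autocorrelation method it names, the sketch below computes the autocorrelation of one speech frame and solves the resulting normal equations with the Levinson-Durbin recursion. The function names and the sign convention (the predictor is s[n] ≈ Σ a_k·s[n−k]) are assumptions made for this example; HTK's own implementation and sign convention may differ.

```python
def autocorrelation(frame, max_lag):
    """Return [r(0), ..., r(max_lag)] for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve sum_k a[k] * r(|i - k|) = r(i), i = 1..order.

    Returns (a, error): a[k-1] multiplies s[n-k] in the predictor and
    error is the residual prediction energy. Assumes the frame is not
    perfectly predictable (error stays above zero during the recursion).
    """
    if r[0] == 0.0:                      # silent frame: nothing to predict
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)              # a[0] is unused padding
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error
```

The square root of the returned residual energy is one common choice for the gain (amplitude) of the all-pole model.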

7 Cepstral Analysis
- Logarithmic Spectral Domain (Cepstral Domain)
- Allows for Separation of Convolved Signals
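The separation property can be illustrated with a small real-cepstrum sketch: window the frame, take the DFT, take the log of the magnitude, and transform back. Because convolution in time becomes multiplication in the spectrum, and the logarithm turns that product into a sum, the cepstra of two convolved signals add. This is a generic textbook construction with hypothetical function names, written in pure Python for clarity. It is not a description of HTK's internals, which derive cepstral coefficients by other routes.

```python
import cmath
import math


def hamming(length):
    """Hamming window of the given length (length >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def dft(samples, inverse=False):
    """Naive O(N^2) discrete Fourier transform; fine for short frames."""
    n = len(samples)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [scale * sum(samples[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                        for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, window=True, floor=1e-10):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    if window:
        frame = [s * w for s, w in zip(frame, hamming(len(frame)))]
    # Floor the magnitude so that log() never sees zero.
    log_magnitude = [math.log(max(abs(c), floor)) for c in dft(frame)]
    return [c.real for c in dft(log_magnitude, inverse=True)]
```

With `window=False`, the cepstrum of a circular convolution of two sequences equals the sum of their individual cepstra, which is the additive separation the slide refers to.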

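8 Mel-Scaling
Human perception of pitch is non-linear in frequency: pitches that listeners judge to be equally spaced are not equally spaced in Hz. The Mel scale warps the frequency axis so that equal steps in Mels correspond to roughly equal perceived steps in pitch.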
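A minimal sketch of the warping, using the widely used formula Mel(f) = 2595·log10(1 + f/700). The helper names, and the idea of listing filter centre frequencies, are illustrative assumptions rather than part of the original slides.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to Mels."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_centres(low_hz, high_hz, count):
    """Return `count` frequencies in Hz that are equally spaced in Mels,
    strictly between low_hz and high_hz."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(count)]
```

Frequencies returned by `mel_spaced_centres` crowd together at the low end and spread out at the high end, which is the non-linearity described above.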

9 Perceptual Linear Prediction Analysis
Perceptual linear prediction combines linear prediction with cepstral analysis. The spectrum of the speech data is first warped onto the Mel scale. The resulting values are then compressed by taking the cube root, and linear prediction coefficients are computed from them. Finally, cepstral coefficients are derived from the linear prediction coefficients.

10 Signal Energy and Derivatives
- Signal Energy
- Delta Coefficients
- Acceleration Coefficients
- Third Differential Coefficients
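Delta coefficients are commonly computed with a regression over neighbouring frames, d_t = Σ θ·(c_{t+θ} − c_{t−θ}) / (2·Σ θ²) for θ = 1..Θ. The sketch below implements that formula with the first and last frame repeated at the edges. The function name, the default window of 2, and the edge handling are assumptions for illustration. Acceleration and third differential coefficients follow by applying the same function repeatedly.

```python
def deltas(frames, window=2):
    """Regression-based time derivatives of a sequence of feature vectors.

    frames is a list of equal-length lists of floats, one per time step.
    Frames beyond either end are replaced by the first or last frame.
    """
    denominator = 2.0 * sum(theta * theta for theta in range(1, window + 1))
    last = len(frames) - 1
    result = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            numerator = sum(
                theta * (frames[min(t + theta, last)][d]
                         - frames[max(t - theta, 0)][d])
                for theta in range(1, window + 1))
            row.append(numerator / denominator)
        result.append(row)
    return result
```

For example, `deltas(deltas(frames))` gives acceleration coefficients, and one further application gives third differentials.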

11 Speech Processing of the AMI Corpus
- Ideal Solution Yields Generic Feature Types from Generic Corpus
- Corpora Have Varying Audio File Types and Varying Organizational Structures
- Corpora Have Varying Methods for Annotation

12 Speech Processing of the AMI Corpus
- Project Solution Yields Generic Feature Types from Corpora with RIFF Format WAV Audio Files
- Two Main Functions of Script
  - Traverse Corpus Directory Tree
    - Generate List of Audio Files
  - Produce Feature Data
    - Using User-Defined Configuration File
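The original script is not included in the transcript. The following is only a sketch of how the two functions described above might look: walk the corpus tree to collect WAV files, write an HTK script file pairing each source file with a target feature file, and build an `HCopy` command that reads a user-supplied configuration. All function names, the `.mfc` extension, and the output layout are hypothetical. `HCopy -C <config> -S <script>` is the standard HTK invocation, and paths containing spaces are not handled here.

```python
import os


def find_wav_files(corpus_root):
    """Walk the corpus directory tree and return all .wav paths, sorted."""
    wavs = []
    for dirpath, _dirnames, filenames in os.walk(corpus_root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                wavs.append(os.path.join(dirpath, name))
    return sorted(wavs)


def write_hcopy_script(wav_paths, corpus_root, output_root, scp_path,
                       extension=".mfc"):
    """Write one 'source target' line per audio file, mirroring the
    corpus layout under output_root."""
    with open(scp_path, "w") as scp:
        for wav in wav_paths:
            relative = os.path.relpath(wav, corpus_root)
            target = os.path.join(output_root,
                                  os.path.splitext(relative)[0] + extension)
            # Create the target directory so the feature file can be written.
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            scp.write(f"{wav} {target}\n")


def hcopy_command(config_path, scp_path):
    """Argument list for running HCopy over every pair in the script file."""
    return ["HCopy", "-C", config_path, "-S", scp_path]
```

With HTK installed, the command could be run with `subprocess.run(hcopy_command("features.cfg", "files.scp"), check=True)`, where the configuration file selects the feature type, for instance with entries such as `SOURCEFORMAT = WAV` and `TARGETKIND = MFCC_E_D_A`.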

13 Future Development
- Expand Script to Handle Audio Inputs of Any File Type
- Include Processing for Specific Corpus Annotations

