EEL 6586: AUTOMATIC SPEECH PROCESSING
Speech Features Lecture
Mark D. Skowronski
Computational Neuro-Engineering Lab, University of Florida
February 27, 2004

What are ‘speech features’?
Speech features are:
– A linear/nonlinear projection of raw speech,
– A compressed representation,
– Salient and succinct characteristics (for a given application).

Why extract features?
Applications:
– Communications
– Automatic speech recognition
– Speaker identification/verification
Feature extraction allows expert information to be added to the solution.

Application example
Automatic speech recognition: compare two speech utterances x(n) and y(n).
Naïve approach: measure the distance between the raw waveforms,
E = Σ_n [x(n) − y(n)]²
Problems with this approach?

Naïve approach limitations
– x(n) = −1·y(n), yet E ≠ 0
– x(n) = α·y(n), yet E ≠ 0
– x(n) = y(n − m), yet E ≠ 0
These variations can be removed by considering the normalized magnitude spectrum: a feature vector of the raw speech signal!

Frequency domain features
The Fourier transform:
X(k) = Σ_n x(n) e^(−j2πkn/N)
Then consider the Euclidean distance between |X(k)| and |Y(k)|:
D = Σ_k [|X(k)| − |Y(k)|]²
What about pitch?
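
To make the comparison concrete, here is a minimal NumPy sketch (not from the original lecture) that evaluates both the waveform distance E and the distance between normalized magnitude spectra for the three cases listed above; the frame length, random test signal, and circular delay are illustrative assumptions.

```python
# Sketch: the waveform distance E fails under sign flip, scaling, and delay,
# while the distance between normalized magnitude spectra is (nearly) zero.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(512)            # stand-in for one frame of speech x(n)

def waveform_E(x, y):
    """Naive distance: sum of squared sample differences."""
    return np.sum((x - y) ** 2)

def spectral_D(x, y):
    """Euclidean distance between normalized magnitude spectra."""
    X = np.abs(np.fft.rfft(x)); X /= X.sum()
    Y = np.abs(np.fft.rfft(y)); Y /= Y.sum()
    return np.sum((X - Y) ** 2)

for label, y in [("sign flip (-x)", -x),
                 ("scaling (0.5x)", 0.5 * x),
                 ("delay (32 samples)", np.roll(x, 32))]:
    print(f"{label:20s} E = {waveform_E(x, y):9.1f}   D = {spectral_D(x, y):.2e}")
```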

Pitch harmonics
Pitch harmonics reduce overlap between spectra. Can we remove pitch? How?

Pitch-free speech features
Linear prediction (1967)
– Parametric estimator: all-pole filter for the vocal tract model
– Hugs the peaks of the spectrum
– Computationally inexpensive
– Transformable to more stable domains (cepstrum, reflection coefficients, pole pairs)
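
As an illustration of the autocorrelation method, the following Python sketch implements the Levinson-Durbin recursion; the Hamming window, model order p = 10, and the toy two-sinusoid frame are demonstration assumptions, not values prescribed by the lecture.

```python
# Sketch of the autocorrelation method of linear prediction (Levinson-Durbin).
import numpy as np

def levinson(r, p):
    """Levinson-Durbin recursion: solve for LP coefficients a (a[0] = 1)
    from autocorrelation lags r[0..p]; returns (a, prediction_error)."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]   # update previous coefficients
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def lpc(frame, p=10):
    """Autocorrelation method: window the frame, compute lags, run Levinson."""
    w = frame * np.hamming(len(frame))
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(p + 1)])
    return levinson(r, p)

# Toy usage: the all-pole envelope 1/|A(e^jw)| "hugs the peaks" of the spectrum.
fs, n = 8000, np.arange(256)
frame = (np.sin(2 * np.pi * 500 * n / fs)
         + 0.5 * np.sin(2 * np.pi * 1500 * n / fs)
         + 0.05 * np.random.default_rng(1).standard_normal(256))
a, err = lpc(frame, p=10)
envelope = 1.0 / np.abs(np.fft.rfft(a, 512))   # LP spectral envelope
```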

Pitch-free speech features
Linear prediction (1967), drawbacks:
– Parameters sensitive to noise and numeric precision
– Doesn't model zeros in the vocal tract transfer function (nasals, additive noise)
– Model order determined empirically:
  – Too low: misses formants
  – Too high: represents pitch information

Pitch-free speech features
Cepstrum (1962)
– Nonparametric estimator: homomorphic filtering transforms convolution into addition
– Pitch removed by low-time liftering in the quefrency domain
– Orthogonal outputs
– Cepstral mean subtraction (removes stationary convolutive channel effects)
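
A minimal Python sketch of these two steps (homomorphic transform plus low-time liftering) is shown below; the FFT size, lifter cutoff of 20 quefrency bins, and the random test frame are illustrative choices, not the lecture's settings.

```python
# Sketch of homomorphic processing: the real cepstrum turns the
# source * vocal-tract convolution into a sum, so low-time liftering keeps
# the slowly varying envelope and discards the pitch excitation.
import numpy as np

def real_cepstrum(frame, nfft=512):
    """log|FFT| followed by the inverse FFT gives the real cepstrum."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), nfft))
    return np.fft.irfft(np.log(spectrum + 1e-12), nfft)   # avoid log(0)

def low_time_lifter(cep, n_keep=20):
    """Keep only low-quefrency bins (vocal-tract envelope); higher
    quefrencies, which carry pitch, are zeroed."""
    liftered = np.zeros_like(cep)
    liftered[:n_keep] = cep[:n_keep]
    liftered[-(n_keep - 1):] = cep[-(n_keep - 1):]   # mirror half (cepstrum is symmetric)
    return liftered

# Toy usage: a smoothed, pitch-free log spectrum from one frame.
frame = np.random.default_rng(0).standard_normal(256)
smooth_log_spectrum = np.fft.rfft(low_time_lifter(real_cepstrum(frame))).real
```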

Pitch-free speech features
Cepstrum (1962), drawbacks:
– Doesn't consider human auditory system characteristics (critical bands)
– Sensitive to outliers from log compression of a noisy spectrum (“sum of the log” approach)

Modern improvements
Perceptual linear prediction (Hermansky, 1990)
– Performs LP on the output of perceptually motivated filter banks
– Filter bank smooths pitch (and noise)
– All the same benefits as LPC
Mel frequency cepstral coefficients (Davis & Mermelstein, 1980)
– Replace the magnitude spectrum with mel-spaced filter bank energies
– Filter bank smooths pitch (and noise)
– Orthogonal outputs (Gaussian modeling)
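
The MFCC pipeline can be sketched in a few lines of Python; the 26-filter mel bank, 512-point FFT, 13 retained coefficients, and 8 kHz sampling rate below are common defaults assumed for illustration, not values fixed by Davis & Mermelstein or the lecture.

```python
# Sketch of MFCC extraction: magnitude spectrum -> mel filter bank energies
# -> log compression -> DCT (which approximately decorrelates the outputs).
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filt=26, nfft=512, fs=8000):
    """Triangular filters with centers spaced uniformly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filt, nfft // 2 + 1))
    for i in range(1, n_filt + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, lo:c] = (np.arange(lo, c) - lo) / max(c - lo, 1)
        fbank[i - 1, c:hi] = (hi - np.arange(c, hi)) / max(hi - c, 1)
    return fbank

def mfcc(frame, fs=8000, nfft=512, n_ceps=13):
    mag = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), nfft))
    energies = mel_filterbank(nfft=nfft, fs=fs) @ (mag ** 2)   # smooths pitch
    return dct(np.log(energies + 1e-12), type=2, norm='ortho')[:n_ceps]

# Example: 13 MFCCs from one 256-sample frame
coeffs = mfcc(np.random.default_rng(0).standard_normal(256))
```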

Modern improvements
Human factor cepstral coefficients (Skowronski & Harris, 2002)
– Decouples filter bandwidth from filter spacing
– Sets bandwidth according to critical band expressions for the human auditory system
– Bandwidth may also be optimized to control the trade-off between local SNR and spectral resolution
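
The bandwidth-decoupling idea can be sketched as follows; the Moore & Glasberg (1983) ERB quadratic and the "E-factor" scaling are assumptions about the exact expressions used in HFCC, so treat the numbers as purely illustrative.

```python
# Hedged sketch of the HFCC bandwidth idea: filter centers stay mel-spaced as
# in MFCC, but each filter's bandwidth follows a critical-band (ERB)
# expression scaled by a tunable factor, independent of the spacing.
import numpy as np

def erb_hz(fc_hz, e_factor=1.0):
    """ERB at center frequency fc (Hz), Moore & Glasberg (1983) quadratic,
    scaled by an E-factor (assumed form; values are illustrative)."""
    f_khz = fc_hz / 1000.0
    return e_factor * (6.23 * f_khz ** 2 + 93.39 * f_khz + 28.52)

# A larger E-factor widens every filter (more pitch/noise smoothing, better
# local SNR); a smaller one narrows them (finer spectral resolution), without
# moving the filter centers.
for fc in (300.0, 1000.0, 3000.0):
    print(f"fc = {fc:6.0f} Hz   ERB = {erb_hz(fc):6.1f} Hz")
```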

Other features
Temporal features
– Static features (position)
– Δ: first derivative in time of each feature (velocity) (1981)
– ΔΔ: second derivative in time (acceleration) (1981)
Cepstral mean subtraction (1974)
– Convolutive channel constant → additive constant in the cepstral domain
– Removes static channel effects (microphone)
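
A short Python sketch of these post-processing steps follows; the simple central-difference delta (via np.gradient) stands in for the regression-window delta typically used, and the 100-frame, 13-coefficient matrix is a made-up example.

```python
# Sketch of temporal features + cepstral mean subtraction (CMS).
import numpy as np

def add_dynamics_and_cms(C):
    """C: (n_frames, n_ceps) cepstral features.  Subtract the per-utterance
    cepstral mean (CMS), then append first (velocity) and second
    (acceleration) time differences."""
    C = C - C.mean(axis=0, keepdims=True)   # convolutive channel -> additive constant -> removed
    velocity = np.gradient(C, axis=0)       # simple central-difference delta
    acceleration = np.gradient(velocity, axis=0)
    return np.hstack([C, velocity, acceleration])

# Example: 100 frames of 13 cepstral coefficients -> 100 x 39 feature matrix
C = np.random.default_rng(0).standard_normal((100, 13))
features = add_dynamics_and_cms(C)
print(features.shape)   # (100, 39)
```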

Typical feature matrix
[Figure: feature matrix with frames along the time axis; feature rows grouped as position, velocity, and acceleration.]

References
– Auditory Toolbox for Matlab: Malcolm Slaney, MFCC code
– HFCC and other Matlab tools:
  – blockX2.m: change a speech vector into a column matrix of overlapping windows of speech
  – fbInit.m: create the HFCC filter bank and DCT matrix
  – getFeatures.m: extract HFCC features