Speech Signal Representations I Seminar Speech Recognition 2002 F.R. Verhage.



Decomposition of the speech signal (x[n]) as a source (e[n]) passed through a linear time-varying filter (h[n]).
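The decomposition can be illustrated with a minimal sketch. It assumes the filter is held fixed over a short segment, so that x[n] is the convolution of e[n] with h[n]. The sampling rate, the 100 Hz impulse-train source and the single 700 Hz resonance are illustrative assumptions, not values from the slides.

```python
import numpy as np

fs = 8000                          # sampling rate in Hz (assumed)
n = np.arange(400)                 # 50 ms of samples

# Source e[n]: impulse train with a period of 80 samples (100 Hz pitch, assumed).
e = np.zeros(len(n))
e[::80] = 1.0

# Filter h[n]: one decaying resonance at 700 Hz (assumed), truncated to 200 samples.
m = np.arange(200)
h = (0.97 ** m) * np.cos(2 * np.pi * 700 * m / fs)

# Speech-like signal x[n] = e[n] * h[n] (linear convolution, cut to segment length).
x = np.convolve(e, h)[:len(e)]
```

Each impulse in e[n] starts a new copy of h[n], so x[n] is a sum of shifted impulse responses.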

Estimation of the filter, inspired by:
• Speech production models
  – Linear Predictive Coding (LPC)
  – Cepstral analysis
• Speech perception models (part II)
  – Mel-frequency cepstrum
  – Perceptual Linear Prediction (PLP)
Speech recognizers estimate the filter characteristics and ignore the source.

Short-Time Fourier Analysis
• Spectrogram
  – Representation of a signal highlighting several of its properties, based on short-time Fourier analysis
  – Two-dimensional: time horizontal and frequency vertical
  – Third 'dimension': gray or color level indicating energy

Short-Time Fourier Analysis
• Spectrogram
  – Narrow band
    • Long windows (> 20 ms) → narrow bandwidth
    • Lower time resolution, better frequency resolution
  – Wide band
    • Short windows (< 10 ms) → wide bandwidth
    • Good time resolution, lower frequency resolution
  – Pitch synchronous
    • Requires knowledge of the local pitch period
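The trade-off between narrow-band and wide-band analysis can be sketched with a small short-time Fourier transform. The helper `stft_magnitude`, the 8 kHz sampling rate, the 5 ms hop and the two-tone test signal are assumptions made for illustration only.

```python
import numpy as np

def stft_magnitude(x, fs, win_ms, hop_ms=5.0):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    n_win = int(round(fs * win_ms / 1000))
    n_hop = int(round(fs * hop_ms / 1000))
    window = np.hamming(n_win)
    starts = range(0, len(x) - n_win + 1, n_hop)
    frames = np.stack([x[s:s + n_win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000                                   # assumed sampling rate
t = np.arange(fs) / fs                      # one second of signal
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 600 * t)

narrow = stft_magnitude(x, fs, win_ms=40.0)  # long window: 25 Hz bin spacing
wide = stft_magnitude(x, fs, win_ms=5.0)     # short window: 200 Hz bin spacing
```

With the 40 ms window the two tones at 500 Hz and 600 Hz fall in separate bins (20 and 24), while the 5 ms window merges them into one broad peak but produces frames that each cover far less time.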

Short-Time Fourier Analysis
• Spectrogram (figure)

Short-Time Fourier Analysis
• Window analysis
  – Series of short segments (analysis frames)
  – Short enough so that the signal is stationary
  – Usually constant, ms
  – Overlaps possible
  – Different types of window functions (w_m[n]):
    • Rectangular (equal to no window function)
    • Hamming
    • Hanning
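Framing and windowing can be sketched as follows. The helper `frame_signal`, the 16 kHz sampling rate, the 25 ms frame length and the 10 ms hop are assumptions chosen for illustration; the slides do not state these values.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping analysis frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

fs = 16000                                   # assumed sampling rate
x = np.random.default_rng(0).standard_normal(fs)   # stand-in for one second of speech
frame_len = 400                              # 25 ms at 16 kHz (assumed)
hop = 160                                    # 10 ms at 16 kHz, so frames overlap (assumed)

frames = frame_signal(x, frame_len, hop)

# The three window functions named on the slide, applied to every frame.
windows = {
    "rectangular": np.ones(frame_len),       # same as applying no window
    "hamming": np.hamming(frame_len),
    "hanning": np.hanning(frame_len),
}
windowed = {name: frames * w for name, w in windows.items()}
```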

Short-Time Fourier Analysis
• Window analysis
  – Window size (N) must be long enough relative to the pitch period (M)
    • Rectangular: N ≥ M
    • Hamming, Hanning: N ≥ 2M
  – Pitch period not known in advance →
  – Prepare for the lowest pitch, i.e. the longest pitch period →
  – At least 20 ms for rectangular or 40 ms for Hamming/Hanning (50 Hz)
  – But longer windows give a more averaged spectrum instead of distinct spectra →
  – Rectangular window has better time resolution
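The 20 ms and 40 ms figures follow from the period of a 50 Hz pitch, 1/50 s = 20 ms. A minimal sketch of the arithmetic, with `min_window_ms` as a hypothetical helper name:

```python
def min_window_ms(lowest_pitch_hz, window="rectangular"):
    """Shortest window (in ms) covering one pitch period (rectangular)
    or two pitch periods (Hamming/Hanning) at the lowest expected pitch."""
    period_ms = 1000.0 / lowest_pitch_hz
    factor = 1 if window == "rectangular" else 2
    return factor * period_ms

rect_ms = min_window_ms(50.0, "rectangular")   # one period of a 50 Hz pitch
hamming_ms = min_window_ms(50.0, "hamming")    # two periods of a 50 Hz pitch
```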

Short-Time Fourier Analysis
• Window analysis
  – Frequency response not completely zero outside the main lobe → spectral leakage
  – Second lobe of a Hamming window is approx. 43 dB below the main lobe → less spectral leakage
  – Hamming, Hanning, triangular windows offer less spectral leakage →
  – Rectangular windows are rarely used despite their better time resolution
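The leakage difference can be checked numerically by comparing the highest side lobe of each window with its main lobe. This is a sketch under assumptions: `peak_sidelobe_db` is a hypothetical helper, and the window length (64) and FFT size are arbitrary choices.

```python
import numpy as np

def peak_sidelobe_db(window, nfft=8192):
    """Level of the highest side lobe relative to the main-lobe peak, in dB."""
    spectrum = np.abs(np.fft.rfft(window, nfft))
    mag_db = 20 * np.log10(spectrum / spectrum[0] + 1e-12)
    # Walk down the main lobe until the first local minimum (its edge).
    k = 1
    while k < len(mag_db) - 1 and mag_db[k + 1] < mag_db[k]:
        k += 1
    # Everything beyond the main lobe counts as leakage.
    return float(mag_db[k:].max())

n = 64                                       # window length in samples (assumed)
rect_db = peak_sidelobe_db(np.ones(n))       # rectangular window
hamming_db = peak_sidelobe_db(np.hamming(n)) # Hamming window
```

The rectangular window should come out roughly 13 dB below its main lobe and the Hamming window in the low 40s, in line with the figure quoted on the slide.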

Short-Time Fourier Analysis
Short-time spectrum of male voice speech
a) Time signal /ah/, local pitch 110 Hz
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window

Short-Time Fourier Analysis
Short-time spectrum of female voice speech
a) Time signal /aa/, local pitch 200 Hz
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window

Short-Time Fourier Analysis
Short-time spectrum of unvoiced speech
a) Time signal
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window

Linear Predictive Coding
• LPC a.k.a. auto-regressive (AR) modeling
• An all-pole filter is a good approximation of speech, with p as the order of the LPC analysis
• Predicts the current sample as a linear combination of the past p samples
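The slide equations are not preserved in this transcript. In one common convention (an assumption here; some texts use the opposite sign for a_k), the all-pole model and the corresponding predictor can be written as:

```latex
H(z) = \frac{X(z)}{E(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}} = \frac{1}{A(z)}

x[n] = \sum_{k=1}^{p} a_k \, x[n-k] + e[n]

\tilde{x}[n] = \sum_{k=1}^{p} a_k \, x[n-k], \qquad e[n] = x[n] - \tilde{x}[n]
```

Here \tilde{x}[n] denotes the predicted sample and e[n] the prediction error, which under the model equals the excitation.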

Linear Predictive Coding
• To estimate the predictor coefficients (a_k), use a short-term analysis technique
• Per segment, minimize the total squared prediction error
• Take the derivative with respect to each a_k and equate it to 0; the result is a set of p linear equations: the Yule-Walker equations
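A minimal sketch of this estimation, assuming the autocorrelation formulation with a Hamming-windowed segment and the sign convention x[n] ≈ Σ a_k x[n-k]. The helper `lpc_autocorrelation` and the synthetic AR(2) test signal are illustrative assumptions; the equations are solved here with a general-purpose solver rather than a specialized recursion.

```python
import numpy as np

def lpc_autocorrelation(segment, order):
    """Solve the Yule-Walker equations R a = r for one windowed segment."""
    w = segment * np.hamming(len(segment))
    # Short-term autocorrelation for lags 0..order.
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    # Toeplitz matrix with entries R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    return a, r

# Synthetic test signal: AR(2) process x[n] = 1.3 x[n-1] - 0.8 x[n-2] + e[n].
rng = np.random.default_rng(0)
e = rng.standard_normal(4000)
x = np.zeros(len(e))
for n in range(len(e)):
    x[n] = e[n]
    if n >= 1:
        x[n] += 1.3 * x[n - 1]
    if n >= 2:
        x[n] -= 0.8 * x[n - 2]

a_hat, r = lpc_autocorrelation(x, order=2)   # should be close to [1.3, -0.8]
```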

Linear Predictive Coding
• Solution of the Yule-Walker equations:
  – Any standard matrix inversion package
  – Due to the special form of the matrix, efficient solutions exist:
    • Covariance method, using the Cholesky decomposition
    • Autocorrelation method, using windows; results in equations with Toeplitz matrices, solved by the Durbin recursion algorithm
    • Lattice method, equivalent to the Levinson-Durbin recursion; often used in fixed-point implementations because lack of precision doesn't result in unstable filters
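The Durbin recursion for the autocorrelation method can be sketched as below. It assumes the convention x[n] ≈ Σ a_k x[n-k]; `levinson_durbin` is a hypothetical helper name and the short test sequence is arbitrary.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations from autocorrelations r[0..order].

    Returns predictor coefficients a, reflection coefficients k and the
    final prediction error energy.
    """
    a = np.zeros(order)
    k = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Correlation not yet explained by the order-i predictor.
        acc = r[i + 1] - np.dot(a[:i], r[i:0:-1])
        k[i] = acc / err
        prev = a[:i].copy()
        a[:i] = prev - k[i] * prev[::-1]     # update lower-order coefficients
        a[i] = k[i]                          # newest coefficient is the reflection coefficient
        err *= 1.0 - k[i] ** 2               # error energy shrinks at every order
    return a, k, err

# Autocorrelation of a short arbitrary sequence (illustrative data).
x = np.array([1.0, 0.5, -0.3, 0.8, -0.6, 0.2, 0.4, -0.1])
order = 3
r = np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(order + 1)])

a, k, err = levinson_durbin(r, order)

# Reference: direct solution of the same equations.
R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
a_direct = np.linalg.solve(R, r[1:])
```

The recursion needs on the order of p² operations instead of the p³ of a general solver, and it produces the reflection coefficients as a by-product.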

Linear Predictive Coding
• Spectral analysis via LPC
  – All-pole (IIR) filter
  – Spectral peaks near the frequencies of the roots of the denominator (the poles)
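A sketch of how the LPC spectrum and its peaks could be computed, assuming the denominator A(z) = 1 - Σ a_k z^{-k}. The helper names, the 8 kHz sampling rate and the single pole pair at 1000 Hz are illustrative assumptions.

```python
import numpy as np

def lpc_spectrum_db(a, gain=1.0, nfft=512):
    """Magnitude response (dB) of the all-pole filter gain / A(z)."""
    denominator = np.concatenate(([1.0], -np.asarray(a)))
    A = np.fft.rfft(denominator, nfft)
    return 20 * np.log10(gain / np.abs(A))

def pole_frequencies(a, fs):
    """Frequencies and approximate bandwidths (Hz) of the complex pole pairs."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a))))
    roots = roots[np.imag(roots) > 0]        # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    return freqs, bandwidths

fs = 8000                                    # assumed sampling rate
radius, theta = 0.95, 2 * np.pi * 1000 / fs  # one pole pair at 1000 Hz (assumed)
a = np.array([2 * radius * np.cos(theta), -radius ** 2])

spectrum_db = lpc_spectrum_db(a)
peak_hz = np.argmax(spectrum_db) * fs / 512  # frequency of the highest spectral peak
freqs, bandwidths = pole_frequencies(a, fs)
```

The spectral peak lies close to, but not exactly at, the pole angle; the closer the pole is to the unit circle, the sharper the peak and the smaller the bandwidth.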

Linear Predictive Coding
• Prediction error
  – Should be (approximately) the excitation
  – Unvoiced speech: expect white noise; OK
  – Voiced speech: expect an impulse train; not OK
    • All-pole assumption not altogether valid
    • Real speech not perfectly periodic
    • Pitch-synchronous analysis gives better results
  – LPC order
    • Larger p gives lower prediction errors
    • Too large a p results in fitting the individual harmonics → separation between filter and source will not be so good

Linear Predictive Coding
• Prediction error
  – Inverse LPC filter gives the residual signal
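Inverse filtering can be sketched as follows, assuming A(z) = 1 - Σ a_k z^{-k} and, for the sake of the example, coefficients that are known exactly. The helpers `synthesize` and `lpc_residual` and the impulse-train excitation are illustrative assumptions; with coefficients estimated from real speech the residual only approximates the excitation.

```python
import numpy as np

def synthesize(excitation, a):
    """Pass the excitation through the all-pole filter 1 / A(z)."""
    x = np.zeros(len(excitation))
    for n in range(len(x)):
        past = sum(a[k] * x[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        x[n] = excitation[n] + past
    return x

def lpc_residual(x, a):
    """Apply the inverse (FIR) filter A(z) to obtain the prediction residual."""
    inverse_filter = np.concatenate(([1.0], -np.asarray(a)))
    return np.convolve(x, inverse_filter)[:len(x)]

a = np.array([1.3435, -0.9025])              # assumed stable second-order predictor
excitation = np.zeros(400)
excitation[::80] = 1.0                       # idealized voiced excitation (impulse train)

x = synthesize(excitation, a)
residual = lpc_residual(x, a)                # recovers the impulse train in this ideal case
```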

Linear Predictive Coding
• Alternatives for the predictor coefficients
  – Line Spectral Frequencies
    • Local sensitivity
    • Efficiency
  – Reflection coefficients
    • Guaranteed stable → useful when coefficients are interpolated over time
  – Log-area ratios
    • Flat spectral sensitivity
  – Roots of the polynomial
    • Represent resonance frequencies and bandwidths
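The stability property of reflection coefficients can be sketched with a step-up conversion to predictor coefficients. The sketch assumes A(z) = 1 - Σ a_k z^{-k}; the helper names are hypothetical, and the sign convention of the log-area ratio differs between texts, so the form used here is one assumed choice.

```python
import numpy as np

def reflection_to_predictor(k):
    """Step-up recursion: reflection coefficients -> predictor coefficients."""
    a = np.zeros(0)
    for k_i in k:
        a = np.concatenate((a - k_i * a[::-1], [k_i]))
    return a

def log_area_ratios(k):
    """Log-area ratios under one common sign convention (assumed)."""
    k = np.asarray(k)
    return np.log((1 - k) / (1 + k))

def is_stable(a):
    """True if all poles of 1 / A(z) lie inside the unit circle."""
    roots = np.roots(np.concatenate(([1.0], -np.asarray(a))))
    return bool(np.all(np.abs(roots) < 1.0))

k_start = np.array([0.9, -0.5, 0.3])         # illustrative reflection coefficients
k_end = np.array([-0.2, 0.7, -0.6])
k_mid = 0.5 * (k_start + k_end)              # interpolation keeps every |k_i| < 1

a_start = reflection_to_predictor(k_start)
a_mid = reflection_to_predictor(k_mid)
lar = log_area_ratios(k_start)
```

Because every interpolated reflection coefficient stays inside (-1, 1), the interpolated filter stays stable, which is not guaranteed when the predictor coefficients a_k are interpolated directly.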

Cepstral Processing
  – A homomorphic transformation converts a convolution into a sum.
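The property can be checked numerically with the real cepstrum, taken here as the inverse DFT of the log magnitude spectrum. The sketch assumes circular convolution so that the DFT relation X = E·H holds exactly; the signals and the helper `real_cepstrum` are illustrative.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)))).real

rng = np.random.default_rng(0)
e = rng.standard_normal(64)                  # stand-in for the source e[n]
h = 0.9 ** np.arange(64)                     # stand-in for the filter h[n]

# Circular convolution x = e * h, computed through the DFT.
x = np.fft.ifft(np.fft.fft(e) * np.fft.fft(h)).real

# log|X| = log|E| + log|H|, so the cepstra add.
c_x = real_cepstrum(x)
c_sum = real_cepstrum(e) + real_cepstrum(h)
```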

Speech Signal Representations I Cepstral Processing