Speech Signal Representations I Seminar Speech Recognition 2002 F.R. Verhage.


Decomposition of the speech signal (x[n]) as a source (e[n]) passed through a linear time-varying filter (h[n]): x[n] = e[n] * h[n], where * denotes convolution.

Estimation of the filter, inspired by:
• Speech production models
  – Linear Predictive Coding (LPC)
  – Cepstral analysis
• Speech perception models (part II)
  – Mel-frequency cepstrum
  – Perceptual Linear Prediction (PLP)
Speech recognizers estimate the filter characteristics and ignore the source.

Short-Time Fourier Analysis
• Spectrogram
  – Representation of a signal, based on short-time Fourier analysis, that highlights several of its properties
  – Two-dimensional: time on the horizontal axis, frequency on the vertical axis
  – Third 'dimension': gray or color level indicating energy

Short-Time Fourier Analysis
• Spectrogram
  – Narrow band
    • Long windows (> 20 ms) → narrow bandwidth
    • Lower time resolution, better frequency resolution
  – Wide band
    • Short windows (< 10 ms) → wide bandwidth
    • Good time resolution, lower frequency resolution
  – Pitch synchronous
    • Requires knowledge of the local pitch period


Short-Time Fourier Analysis
• Window analysis
  – The signal is split into a series of short segments (analysis frames)
  – Frames are short enough that the signal can be treated as stationary within each one
  – Frame length is usually constant, 20-30 ms
  – Frames may overlap
  – Different types of window functions (w_m[n]):
    • Rectangular (equivalent to applying no window function)
    • Hamming
    • Hanning
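The framing and windowing steps above can be sketched as follows. This is a minimal illustration, not the method used to produce the slides: the function name `short_time_spectra`, the 25 ms frame, the 10 ms hop and the 512-point FFT are all assumptions chosen for the example.

```python
import numpy as np

def short_time_spectra(signal, fs, frame_ms=25.0, hop_ms=10.0, nfft=512):
    """Return (magnitudes, freqs): one magnitude spectrum per analysis frame.

    frame_ms, hop_ms and nfft are illustrative defaults, not values from the slides.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = np.hamming(frame_len)  # w_m[n]; np.ones(frame_len) would be rectangular
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice overlapping frames, one per row
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Window each frame and take the zero-padded FFT along the frame axis
    magnitudes = np.abs(np.fft.rfft(frames * window, n=nfft, axis=1))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    return magnitudes, freqs

# Usage: a 1 kHz tone sampled at 16 kHz should peak near 1000 Hz in every frame
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
mags, freqs = short_time_spectra(tone, fs)
peak_hz = freqs[mags.mean(axis=0).argmax()]
```

Plotting `20 * np.log10(mags.T)` as an image gives a spectrogram; shortening `frame_ms` moves it toward the wide-band case and lengthening it toward the narrow-band case.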

Short-Time Fourier Analysis
• Window analysis
  – Window size must be long enough (N: window length, M: pitch period, both in samples)
    • Rectangular: N ≥ M
    • Hamming, Hanning: N ≥ 2M
  – The pitch period is not known in advance → prepare for the lowest pitch (50 Hz, i.e. a 20 ms period) → at least 20 ms for a rectangular window or 40 ms for Hamming/Hanning
  – But longer windows give a more averaged spectrum instead of distinct spectra → the rectangular window has better time resolution
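The arithmetic behind the 20 ms and 40 ms figures can be written out as a small helper. The function name and the string labels for the window types are hypothetical; only the N ≥ M and N ≥ 2M rules come from the slide.

```python
def min_window_ms(lowest_pitch_hz, window="hamming"):
    """Shortest window, in ms, that satisfies N >= M (rectangular) or N >= 2M (Hamming/Hanning)."""
    period_ms = 1000.0 / lowest_pitch_hz  # pitch period M expressed in milliseconds
    periods_needed = 1 if window == "rectangular" else 2
    return periods_needed * period_ms

rect_ms = min_window_ms(50.0, "rectangular")
hamming_ms = min_window_ms(50.0, "hamming")
```

For a speaker whose pitch never drops below 100 Hz, the same rule would allow windows half as long.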

Short-Time Fourier Analysis
• Window analysis
  – The frequency response is not completely zero outside the main lobe → spectral leakage
  – The second lobe of a Hamming window is approx. 43 dB below the main lobe → less spectral leakage
  – Hamming, Hanning and triangular windows offer less spectral leakage → rectangular windows are rarely used despite their better time resolution
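The sidelobe levels mentioned above can be checked numerically. The sketch below is only an illustration: the helper name `peak_sidelobe_db`, the 480-sample window (30 ms at an assumed 16 kHz sampling rate) and the FFT size are choices made for this example.

```python
import numpy as np

def peak_sidelobe_db(window, nfft=16384):
    """Highest sidelobe level in dB relative to the main-lobe peak."""
    spectrum = np.abs(np.fft.rfft(window, nfft))
    mag_db = 20.0 * np.log10(np.maximum(spectrum, 1e-12) / spectrum[0])
    # Walk down the main lobe until the magnitude stops decreasing (first null)
    k = 1
    while k < len(mag_db) - 1 and mag_db[k + 1] < mag_db[k]:
        k += 1
    # Everything from the first null onward is sidelobe territory
    return mag_db[k:].max()

n = 480  # assumed: 30 ms at 16 kHz
rect_db = peak_sidelobe_db(np.ones(n))
hamming_db = peak_sidelobe_db(np.hamming(n))
```

The rectangular window comes out roughly 13 dB below its main lobe, the Hamming window roughly 43 dB below, which is why the latter leaks far less energy into neighbouring frequencies.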

Short-Time Fourier Analysis
Short-time spectrum of male voice speech
a) Time signal /ah/, local pitch 110 Hz
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window

Short-Time Fourier Analysis
Short-time spectrum of female voice speech
a) Time signal /aa/, local pitch 200 Hz
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window

Short-Time Fourier Analysis
Short-time spectrum of unvoiced speech
a) Time signal
b) 30 ms rectangular window
c) 15 ms rectangular window
d) 30 ms Hamming window
e) 15 ms Hamming window

Linear Predictive Coding
• LPC is also known as auto-regressive (AR) modeling
• An all-pole filter is a good approximation of speech, with p as the order of the LPC analysis: H(z) = 1 / (1 − Σ_{k=1}^{p} a_k z^{−k})
• It predicts the current sample as a linear combination of the past p samples: x̃[n] = Σ_{k=1}^{p} a_k x[n−k]

Linear Predictive Coding
• To estimate the predictor coefficients (a_k), use a short-term analysis technique
• Per segment, minimize the total squared prediction error
• Take the derivative with respect to each a_k and equate it to 0; this gives a set of p linear equations: the Yule-Walker equations

Linear Predictive Coding
• Solution of the Yule-Walker equations:
  – Any standard matrix inversion package
  – Due to the special form of the matrix, efficient solutions exist:
    • Covariance method, using the Cholesky decomposition
    • Autocorrelation method, using windows; results in equations with Toeplitz matrices, solved by the Levinson-Durbin recursion
    • Lattice method, equivalent to the Levinson-Durbin recursion; often used in fixed-point implementations because lack of precision doesn't result in unstable filters
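A minimal sketch of the autocorrelation method with the Levinson-Durbin recursion is given below. It assumes the sign convention x[n] ≈ Σ a_k x[n−k], which the slides do not state explicitly, and a Hamming window; the function names and the synthetic AR(2) test signal are hypothetical and serve only to check that known coefficients are recovered.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for a_1..a_p given autocorrelation r[0..p]."""
    a = np.zeros(p + 1)   # a[1..p] hold the predictor coefficients; a[0] is unused
    refl = np.zeros(p)    # reflection coefficients k_1..k_p
    err = r[0]            # prediction error energy of the order-0 predictor
    for i in range(1, p + 1):
        # k_i = (r[i] - sum_j a_j r[i-j]) / E_{i-1}
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        refl[i - 1] = k
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]   # a_j <- a_j - k_i * a_{i-j}
        a = a_new
        err *= 1.0 - k * k
    return a[1:], refl, err

def lpc_autocorrelation(frame, p):
    """Estimate p predictor coefficients from one frame (Hamming window assumed)."""
    xw = frame * np.hamming(len(frame))
    n = len(xw)
    r = np.array([np.dot(xw[:n - k], xw[k:]) for k in range(p + 1)])
    return levinson_durbin(r, p)

# Synthetic check: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + e[n] with white-noise e[n]
rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(len(x)):
    x[n] = e[n]
    if n >= 1:
        x[n] += true_a[0] * x[n - 1]
    if n >= 2:
        x[n] += true_a[1] * x[n - 2]
a_hat, refl, err = lpc_autocorrelation(x, p=2)
```

With the autocorrelation method every reflection coefficient stays below 1 in magnitude, so the resulting all-pole filter is stable.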

Linear Predictive Coding
• Spectral analysis via LPC
  – All-pole (IIR) filter
  – Peaks at the roots of the denominator
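The LPC spectrum can be evaluated by sampling the denominator polynomial on the unit circle. The sketch below assumes the convention A(z) = 1 − Σ a_k z^{−k}; the function name `lpc_envelope_db`, the unit gain and the two-pole example (pole radius 0.95, angle π/4) are illustrative choices, not values from the slides.

```python
import numpy as np

def lpc_envelope_db(a, gain=1.0, nfft=1024):
    """Return (omega, envelope_db) for H(e^jw) = gain / A(e^jw) on [0, pi]."""
    # Coefficients of A(z) = 1 - a_1 z^-1 - ... - a_p z^-p
    a_poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    A = np.fft.rfft(a_poly, n=nfft)          # A(e^jw) at nfft//2 + 1 frequencies
    omega = np.linspace(0.0, np.pi, nfft // 2 + 1)
    return omega, 20.0 * np.log10(gain / np.abs(A))

# Hypothetical resonator: one complex pole pair at radius 0.95 and angle pi/4
r, theta = 0.95, np.pi / 4
a = [2.0 * r * np.cos(theta), -r * r]
omega, env_db = lpc_envelope_db(a)
peak_omega = omega[env_db.argmax()]
```

The envelope peaks close to the pole angle; the closer the pole is to the unit circle, the sharper the peak.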

Linear Predictive Coding
• Prediction error
  – Should be (approximately) the excitation
  – Unvoiced speech: expect white noise; OK
  – Voiced speech: expect an impulse train; not OK
    • All-pole assumption not altogether valid
    • Real speech is not perfectly periodic
    • Pitch-synchronous analysis gives better results
  – LPC order
    • Larger p gives lower prediction errors
    • Too large a p results in fitting the individual harmonics → the separation between filter and source will not be so good

Linear Predictive Coding
• Prediction error
  – The inverse LPC filter gives the residual signal
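Inverse filtering can be sketched as a convolution with the coefficients of A(z) = 1 − Σ a_k z^{−k}. The example assumes zero initial conditions and uses a hypothetical impulse-train excitation passed through a known two-pole filter, so that the residual can be compared with the true source.

```python
import numpy as np

def lpc_residual(x, a):
    """Residual e[n] = x[n] - sum_k a_k x[n-k], assuming x[n] = 0 for n < 0."""
    inverse_filter = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.convolve(x, inverse_filter)[:len(x)]

# Hypothetical voiced-like source: one impulse every 80 samples
excitation = np.zeros(800)
excitation[::80] = 1.0

# Known all-pole filter (pole radius 0.95, angle pi/4), applied sample by sample
a = np.array([2.0 * 0.95 * np.cos(np.pi / 4), -0.95 ** 2])
x = np.zeros_like(excitation)
for n in range(len(x)):
    x[n] = excitation[n]
    if n >= 1:
        x[n] += a[0] * x[n - 1]
    if n >= 2:
        x[n] += a[1] * x[n - 2]

residual = lpc_residual(x, a)
```

Here the residual reproduces the impulse train exactly because the filter is known; with coefficients estimated from real speech it is only an approximation of the excitation.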

Linear Predictive Coding
• Alternatives for the predictor coefficients
  – Line Spectral Frequencies
    • Local sensitivity
    • Efficiency
  – Reflection coefficients
    • Guaranteed stable → useful when coefficients are interpolated over time
  – Log-area ratios
    • Flat spectral sensitivity
  – Roots of the polynomial
    • Represent resonance frequencies and bandwidths
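As an illustration of the last representation, the roots of the predictor polynomial can be converted to resonance frequencies and bandwidths. The sketch assumes A(z) = 1 − Σ a_k z^{−k} and the commonly used relations F = θ·fs/(2π) and B = −ln(r)·fs/π for a root at radius r and angle θ; the function name and the 8 kHz sampling rate are hypothetical.

```python
import numpy as np

def lpc_resonances(a, fs):
    """Return (frequencies_hz, bandwidths_hz) from the complex roots of A(z)."""
    # z^p * A(z) = z^p - a_1 z^(p-1) - ... - a_p
    poly = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    roots = np.roots(poly)
    roots = roots[np.imag(roots) > 0]          # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    order = np.argsort(freqs)
    return freqs[order], bandwidths[order]

# Hypothetical two-pole example: radius 0.95, angle pi/4, fs = 8000 Hz
a = [2.0 * 0.95 * np.cos(np.pi / 4), -0.95 ** 2]
freqs, bandwidths = lpc_resonances(a, fs=8000)
```

For this example the single resonance lies at 1000 Hz with a bandwidth of roughly 130 Hz.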

Cepstral Processing
  – A homomorphic transformation D converts a convolution into a sum: x̂[n] = D(x[n]) = D(e[n] * h[n]) = ê[n] + ĥ[n]
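The convolution-to-sum property can be verified numerically with the real cepstrum, which is one instance of such a transformation (the slides may equally have intended the complex cepstrum). The sequences `e` and `h` below are short hypothetical examples chosen so that neither spectrum has zeros on the unit circle.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(x, n=nfft)
    return np.real(np.fft.ifft(np.log(np.abs(spectrum))))

e = np.array([1.0, 0.5])          # hypothetical source
h = np.array([1.0, -0.3, 0.2])    # hypothetical filter impulse response
x = np.convolve(e, h)             # x[n] = e[n] * h[n]

nfft = 64                         # long enough to hold the linear convolution
cep_x = real_cepstrum(x, nfft)
cep_sum = real_cepstrum(e, nfft) + real_cepstrum(h, nfft)
```

`cep_x` and `cep_sum` agree to within rounding error: the DFT turns the convolution into a product, and the logarithm turns the product into a sum.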

