Topics covered in this chapter

Presentation transcript:

Topics covered in this chapter Three basic problems in pattern comparison: (1) how to detect the speech signal in a recording interval (i.e., separate speech from background); (2) how to locally compare spectra from two speech utterances (local spectral distortion measure); and (3) how to globally align and normalize the distance between two speech patterns (sequences of spectral vectors), which may or may not represent the same linguistic sequence of sounds (word, phrase, sentence, etc.).

Distortion Measures Mathematical considerations for measuring the dissimilarity between two feature vectors. Let x and y be two vectors defined on a vector space X. A metric or distance function d on X is a real-valued function on the Cartesian product X × X, defined as ……
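For reference, the standard axioms of a metric are written out below. This is a sketch under the assumption that the definition above goes on to list exactly these three properties; the labels (a) to (c) and the auxiliary vector z are introduced here for illustration.

```latex
\begin{align*}
\text{(a) positive definiteness:}\quad & 0 \le d(x,y) < \infty \ \text{ for } x, y \in X,
   \quad d(x,y) = 0 \iff x = y \\
\text{(b) symmetry:}\quad & d(x,y) = d(y,x) \ \text{ for } x, y \in X \\
\text{(c) triangle inequality:}\quad & d(x,y) \le d(x,z) + d(z,y) \ \text{ for } x, y, z \in X
\end{align*}
```

A measure that satisfies only (a) is what the following slides call a distortion measure.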

Distortion Measures

Distortion Measures If a distance measure d satisfies only the positive definiteness property, it is called a distortion measure, in particular when the vectors represent speech spectra. In speech recognition, distance means a measure of dissimilarity. For speech processing, an important consideration in choosing a distance measure is its subjective meaningfulness: to be useful, the mathematical measure of distance should reflect the linguistic characteristics of the sounds.

Distortion Measures For example, a large waveform error does not always imply a large subjective difference.

Distortion Measures Perceptual considerations: the key to choosing an appropriate measure of spectral dissimilarity is the subjective judgment of sound difference, or phonetic relevance. Spectral changes that keep the sound perceptually the same should be associated with small distances, and spectral changes that make the sound perceptually different should be associated with large distances.

Distortion Measures Consider comparing two spectral representations, S(w) and S’(w), using a distance measure d(S,S’). If the two spectra are phonetically the same (the same sound), the distance d should ideally be very small.

Distortion Measures Spectral changes that correspond to a large phonetic distance include: significant differences in formant locations, i.e., the spectral resonances of S(w) and S’(w) occur at very different frequencies; and significant differences in formant bandwidths, i.e., the frequency widths of the spectral resonances of S(w) and S’(w) are very different. In each of these cases the sounds are different, so the spectral distance d(S,S’) should ideally be very large.

Distortion Measures To relate a physical measure of difference to a subjectively perceived measure of difference, it is important to understand auditory sensitivity to changes in the frequencies and bandwidths of the speech spectrum, signal intensity, and fundamental frequency.

Distortion Measures This sensitivity is expressed as the just discriminable change: the change in a physical parameter that the auditory system can reliably detect, as measured in a standard listening test. Terms used to describe the just discriminable change include the difference limen (DL), the just noticeable difference (JND), and the differential threshold.

Spectral-distortion measures Measuring the difference between two speech patterns in terms of average spectral distortion is reasonable both for its mathematical tractability and for its computational efficiency. Perceived sound differences can be interpreted in terms of differences in spectral features.

Log spectral distance Consider two spectra S(w) and S’(w). The difference between the two spectra on a log magnitude versus frequency scale is defined by V(w) = log S(w) - log S’(w). A distance or distortion measure between S and S’ can be defined by the L_p norm of V: d(S,S’)^p = ∫_{-π}^{π} |V(w)|^p dw/2π.

This is related to how humans perceive sound differences

Log spectral distance For p = 1 the above equation defines the mean absolute log spectral distortion. For p = 2 it defines the rms log spectral distortion, which is used in many speech processing systems. As p tends to infinity, it reduces to the peak log spectral distortion.
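The three cases can be sketched numerically. The snippet below is a minimal illustration, not a reference implementation: it assumes both power spectra are sampled on the same uniform frequency grid over one period, so that the mean over samples approximates the integral over dw/2π. The helper `toy_spectrum` and the pole values 0.5 and 0.3 are hypothetical test data.

```python
import math

def log_spectral_distance(S, S_prime, p):
    """L_p log spectral distance between two sampled power spectra.

    S and S_prime are sequences of strictly positive power-spectrum
    samples on the same uniform frequency grid (assumption).
    """
    V = [math.log(s) - math.log(sp) for s, sp in zip(S, S_prime)]
    if p == math.inf:
        # Limit p -> infinity: peak log spectral distortion.
        return max(abs(v) for v in V)
    # Mean over the grid approximates the integral of |V|^p dw / (2*pi).
    mean_p = sum(abs(v) ** p for v in V) / len(V)
    return mean_p ** (1.0 / p)

def toy_spectrum(a, num_points=256):
    """Hypothetical first-order all-pole power spectrum 1/|1 - a e^{-jw}|^2."""
    return [1.0 / (1.0 - 2.0 * a * math.cos(2.0 * math.pi * k / num_points) + a * a)
            for k in range(num_points)]

S = toy_spectrum(0.5)
S_prime = toy_spectrum(0.3)

d1 = log_spectral_distance(S, S_prime, 1)            # mean absolute
d2 = log_spectral_distance(S, S_prime, 2)            # rms
d_inf = log_spectral_distance(S, S_prime, math.inf)  # peak
print(d1, d2, d_inf)
```

Because the measure dw/2π is normalized, the three values are ordered: mean absolute ≤ rms ≤ peak.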

Log spectral distance Since the perceived loudness of a signal is approximately logarithmic, the log spectral distance family appears to be closely tied to the subjective assessment of sound differences; hence, it is a perceptually relevant distortion measure. The distortion can be calculated from short-time FFT power spectra or from LPC model spectra (smooth all-pole model spectra). The smooth spectral difference allows a closer examination of the properties of the distortion measure.

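Cepstral distances For cepstral coefficients we use the rms log spectral distance, which by Parseval's theorem can be written as a sum over cepstral differences: d_2^2 = Σ_{n=-∞}^{∞} (c_n - c’_n)^2, where c_n and c’_n are the cepstral coefficients of S(w) and S’(w).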

Cepstral distances

Cepstral distances Since the cepstrum is a decaying sequence, the summation in equation 3 does not require an infinite number of terms. For an LPC model of order p, the number of terms L should be no less than p. The truncated cepstral distance is defined as d_c^2(L) = Σ_{n=1}^{L} (c_n - c’_n)^2. It is a very efficient way of estimating the rms log spectral distance.
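The link between the truncated cepstral distance and the rms log spectral distance can be checked on a toy case. The sketch below assumes a first-order all-pole spectrum 1/|1 - a e^{-jw}|^2 with unit gain, for which the cepstrum is known in closed form (c_n = a^n / n for n ≥ 1, c_0 = 0, and c_{-n} = c_n). The function names and the values 0.5 and 0.3 are illustrative choices, not taken from the slides.

```python
import math

def first_order_cepstrum(a, num_terms):
    """Cepstral coefficients c_1..c_num_terms of 1/|1 - a e^{-jw}|^2 (|a| < 1)."""
    return [a ** n / n for n in range(1, num_terms + 1)]

def truncated_cepstral_distance_sq(c, c_prime, L):
    """d_c^2(L) = sum over n = 1..L of (c_n - c'_n)^2; index 0 holds c_1."""
    return sum((c[n] - c_prime[n]) ** 2 for n in range(L))

def rms_log_spectral_distance_sq(a, a_prime, num_points=4096):
    """Numerical d_2^2: mean of (log S - log S')^2 over a uniform grid."""
    total = 0.0
    for k in range(num_points):
        w = -math.pi + 2.0 * math.pi * (k + 0.5) / num_points
        log_s = -math.log(1.0 - 2.0 * a * math.cos(w) + a * a)
        log_sp = -math.log(1.0 - 2.0 * a_prime * math.cos(w) + a_prime * a_prime)
        total += (log_s - log_sp) ** 2
    return total / num_points

L = 12
c = first_order_cepstrum(0.5, L)
c_prime = first_order_cepstrum(0.3, L)

d_c_sq = truncated_cepstral_distance_sq(c, c_prime, L)
d_2_sq = rms_log_spectral_distance_sq(0.5, 0.3)

# The two-sided sum counts every n >= 1 term twice and c_0 cancels here,
# so 2 * d_c^2(L) should approach d_2^2 as L grows.
print(2.0 * d_c_sq, d_2_sq)
```

With only L = 12 terms the sum already agrees closely with the integral, which requires thousands of logarithm evaluations; this is the efficiency argument made above.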

Weighted cepstral distances and liftering Several other properties of the cepstrum, when properly utilized, are beneficial for speech recognition applications. It can be shown that, under certain regular conditions, the cepstral coefficients, except c_0, have zero mean and a variance essentially inversely proportional to the square of the coefficient index, i.e., E{c_n^2} ∝ 1/n^2.

Weighted cepstral distances and liftering Liftering makes the system more robust to noise, equalizes the variances of the cepstral coefficients, and significantly improves recognition performance. If we incorporate the n^2 factor into the cepstral distance to normalize the contribution from each cepstral term, the distance becomes Σ_n n^2 (c_n - c’_n)^2. Spectral slope: the amplitude of the harmonics resulting from vocal fold vibration falls off by 12 dB per octave, that is, each time the frequency doubles, the amplitude of the harmonics decreases by 12 dB. This is called the spectral slope, tilt, or roll-off of the source spectrum. The low-order cepstral coefficients mainly affect the spectral slope.

Weighted cepstral distances and liftering The variability of the higher cepstral coefficients is influenced more by the inherent artifacts of LPC analysis than that of the lower cepstral coefficients. For speech recognition, therefore, suppressing the higher cepstral coefficients in the calculation of a cepstral distance should lead to a more reliable measurement of spectral differences.

Weighted cepstral distances and liftering The LPC spectrum also includes components that are strong functions of the speaker’s glottal shape and vocal cord duty cycle. These components mainly affect the first few cepstral coefficients. For speech recognition, the phonetic content of the sound is what matters, not these components, so they need to be de-emphasized.

Weighted cepstral distances and liftering A cepstral weighting or liftering procedure, w(n), can therefore be designed to control the non-information-bearing cepstral variabilities for reliable discrimination of sounds. The index weighting used in equation 2 is a simple example of cepstral weighting.

Weighted cepstral distances and liftering The original sharp spectral peaks are highly sensitive to the LPC analysis conditions, and the resulting peakiness creates unnecessary sensitivity in spectral comparison. The liftering process tends to reduce this sensitivity without altering the fundamental “formant” structure, i.e., the undesirable (noise-like) components of the LPC spectrum are reduced or removed, while the essential characteristics of the “formant” structure are retained.

Weighted cepstral distances and liftering A useful form of weighted cepstral distance is d_cW^2 = Σ_{n=1}^{L} [w(n)(c_n - c’_n)]^2, where w(n) is any lifter function.
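A minimal sketch of the weighted cepstral distance with three interchangeable lifters follows. The flat and index lifters correspond to the unweighted and n-weighted distances discussed above. The raised-sine lifter w(n) = 1 + (L/2) sin(πn/L) is a commonly used bandpass lifter and is an assumption here, since the slides do not name a specific w(n). The cepstral values are hypothetical test data (first-order all-pole cepstra a^n / n).

```python
import math

def flat_lifter(L):
    """w(n) = 1: reduces to the plain truncated cepstral distance."""
    return [1.0] * L

def index_lifter(L):
    """w(n) = n: equalizes the roughly 1/n^2 variance of c_n."""
    return [float(n) for n in range(1, L + 1)]

def raised_sine_lifter(L):
    """w(n) = 1 + (L/2) sin(pi n / L): de-emphasizes both low and high n
    relative to the middle of the range (assumed choice of lifter)."""
    return [1.0 + 0.5 * L * math.sin(math.pi * n / L) for n in range(1, L + 1)]

def weighted_cepstral_distance_sq(c, c_prime, lifter):
    """Sum over n = 1..L of [w(n) (c_n - c'_n)]^2; index 0 holds n = 1."""
    return sum((w * (x - y)) ** 2 for w, x, y in zip(lifter, c, c_prime))

L = 12
# Hypothetical cepstra c_1..c_L of two first-order all-pole spectra.
c = [0.5 ** n / n for n in range(1, L + 1)]
c_prime = [0.3 ** n / n for n in range(1, L + 1)]

d_flat = weighted_cepstral_distance_sq(c, c_prime, flat_lifter(L))
d_index = weighted_cepstral_distance_sq(c, c_prime, index_lifter(L))
d_sine = weighted_cepstral_distance_sq(c, c_prime, raised_sine_lifter(L))
print(d_flat, d_index, d_sine)
```

Passing the lifter as a plain list keeps the distance function independent of the weighting choice, so other window shapes can be tried without changing it.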

Itakura and Saito The log spectral difference V(w) = log S(w) – log S’(w) is the basis of many distortion measures. The distortion measure proposed by Itakura and Saito in their formulation of linear prediction as approximate maximum likelihood estimation is d_IS(S,S’) = ∫_{-π}^{π} [e^{V(w)} - V(w) - 1] dw/2π.
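The Itakura-Saito distortion can be evaluated directly from sampled spectra. The sketch below assumes the usual form, the mean over frequency of e^V - V - 1 with V(w) = log S(w) - log S’(w), and uniform sampling over one period. The flat test spectrum and the gain factor 2 are hypothetical values chosen so that the result can be checked by hand.

```python
import math

def itakura_saito(S, S_prime):
    """Itakura-Saito distortion between two sampled power spectra.

    Approximates the integral of [exp(V) - V - 1] dw / (2*pi) by a mean
    over a uniform frequency grid (assumption), V = log S - log S'.
    """
    total = 0.0
    for s, sp in zip(S, S_prime):
        V = math.log(s) - math.log(sp)
        total += math.exp(V) - V - 1.0
    return total / len(S)

# Hypothetical example: the same flat spectrum at two gain levels.
S = [1.0] * 64
S_louder = [2.0 * s for s in S]

d_forward = itakura_saito(S, S_louder)   # V = -log 2 everywhere
d_backward = itakura_saito(S_louder, S)  # V = +log 2 everywhere
print(d_forward, d_backward)
```

Here d_forward equals log 2 - 1/2 and d_backward equals 1 - log 2, so the measure is non-negative, zero only for identical spectra, asymmetric, and sensitive to gain.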

Itakura and Saito

Itakura and Saito The Itakura-Saito distortion measure can be used to illustrate the spectral matching properties by replacing S’(w) with the pth-order all-pole spectrum.

Itakura

Likelihood Distortions The role of the gain terms is not explicit in the Itakura distortion, because the signal level makes essentially no difference to human understanding of speech as long as the speech is unambiguously heard. A gain-independent distortion measure, called the likelihood ratio distortion, can be derived directly from the IS distortion measure.

Likelihood Distortions When the distortion is very small the Itakura distortion measure is not very different from the likelihood distortion measure.
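The closeness of the two measures for small distortions follows from their usual definitions. The block below is a sketch under the assumption that both are taken between unit-gain all-pole spectra; the symbols A(e^{jω}) and A_p(e^{jω}) for the two inverse-filter polynomials, d_LR for the likelihood ratio distortion, and d_I for the Itakura distortion are introduced here for illustration (ω is written w elsewhere in this transcript).

```latex
\begin{align*}
d_{LR} &= \int_{-\pi}^{\pi}
          \frac{\lvert A(e^{j\omega})\rvert^{2}}{\lvert A_p(e^{j\omega})\rvert^{2}}
          \,\frac{d\omega}{2\pi} \;-\; 1, \\
d_{I}  &= \log \int_{-\pi}^{\pi}
          \frac{\lvert A(e^{j\omega})\rvert^{2}}{\lvert A_p(e^{j\omega})\rvert^{2}}
          \,\frac{d\omega}{2\pi}
        \;=\; \log\bigl(1 + d_{LR}\bigr)
        \;\approx\; d_{LR} - \tfrac{1}{2}\, d_{LR}^{2}
        \quad \text{for small } d_{LR}.
\end{align*}
```

Since log(1 + x) ≈ x when x is small, the two measures agree to first order and differ only as the distortion grows.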

Variations of likelihood distortions Compared to the cepstral distance, the likelihood distortions are asymmetric. There are two ways to symmetrize the distortion measure: the COSH distortion and the weighted likelihood ratio distortion.

COSH distortion The COSH distortion is obtained by symmetrizing the Itakura-Saito distortion, i.e., by combining d_IS(S,S’) and d_IS(S’,S). The COSH distortion is almost identical to twice the log spectral distance for small distortions.
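The symmetrization can be sketched numerically. The code below assumes one common convention, in which the two Itakura-Saito directions are averaged, giving the mean over frequency of cosh V(w) - 1; some references omit the factor 1/2, which doubles the value. The function names and the toy spectra (pole values 0.5 and 0.3) are hypothetical.

```python
import math

def itakura_saito(S, S_prime):
    """Mean over the grid of exp(V) - V - 1, with V = log S - log S'."""
    total = 0.0
    for s, sp in zip(S, S_prime):
        V = math.log(s) - math.log(sp)
        total += math.exp(V) - V - 1.0
    return total / len(S)

def cosh_distortion(S, S_prime):
    """Mean over the grid of cosh(V) - 1 (assumed convention with factor 1/2)."""
    total = 0.0
    for s, sp in zip(S, S_prime):
        V = math.log(s) - math.log(sp)
        total += math.cosh(V) - 1.0
    return total / len(S)

def toy_spectrum(a, num_points=256):
    """Hypothetical first-order all-pole power spectrum 1/|1 - a e^{-jw}|^2."""
    return [1.0 / (1.0 - 2.0 * a * math.cos(2.0 * math.pi * k / num_points) + a * a)
            for k in range(num_points)]

S = toy_spectrum(0.5)
S_prime = toy_spectrum(0.3)

d_cosh = cosh_distortion(S, S_prime)
d_avg = 0.5 * (itakura_saito(S, S_prime) + itakura_saito(S_prime, S))
print(d_cosh, d_avg)
```

The V and -V terms of the two directions cancel when they are added, which is why the result depends only on the even function cosh V and is therefore symmetric in S and S’.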

Weighted likelihood ratio distortion The purpose of the weighting is to take the spectral shape into account, so that different spectral components along the frequency axis can be emphasized or de-emphasized to reflect some of the observed perceptual effects.

Weighted likelihood ratio distortion

Comparison of d_WLR and d_2^2

Weighted slope metric distortion measure In a series of experiments designed to measure the subjective “phonetic” distance between pairs of synthetic vowels and fricatives, several acoustic parameters and spectral distortions were varied in a controlled way, including formant frequency, formant amplitude, spectral tilt, and highpass, lowpass, and notch filtering. Of these, only formant frequency deviation was found to be phonetically relevant.

Weighted slope metric distortion measure The WSM attaches a weight to the spectral slope difference near spectral peaks, rather than to the spectral amplitude difference, and takes the overall energy difference explicitly into consideration. Critical band: the bandwidth at which subjective responses such as loudness become significantly different; once the bandwidth exceeds the critical band, an increase in loudness is perceived.

Summary The spectral distortion measures are designed to measure the dissimilarity or distance between two (power) spectra of speech. Many of these dissimilarity measures are not metrics because they do not satisfy the symmetry property d(S,S’) = d(S’,S). If an objective speech distortion measure needs to reflect the subjective reality of human perception of sound differences, or even phonetic disparity, the asymmetry actually seems desirable.

Summary All of the distortion measures are worth considering, because some may perform better in a less noisy environment, while others may be more robust when the background is noisier.

Summary Log spectral: the L_p metric requires a large amount of computation, because we need two FFTs to obtain S(w) and S’(w), the logarithms of all values of S and S’, and an integral.

Summary Truncated and weighted cepstral: require only L operations, where L is of the order of 12-16; hence far fewer calculations are required than for the L_p metric.

Summary The likelihood, Itakura-Saito, Itakura, and COSH measures: all require on the order of p operations, where p is the LPC order of the all-pole polynomial (8-12). Hence the computation is comparable to that of the cepstral measures.

Summary

Summary Weighted likelihood ratio distortion: Requires L operations, similar to that of the cepstral measures

Summary Weighted Slope metric (WSM): Requires K operations, where K is the number of frequency bands used in computations (32-64)

Summary From all these points, we can say that all the measures except the L_p metrics are both physically reasonable and computationally tractable for speech recognition. Hence, in practice, we are going to use all of these measures to study the speech recognition system.