1 PATTERN COMPARISON TECHNIQUES Test Pattern:Reference Pattern:

Presentation transcript:

1 PATTERN COMPARISON TECHNIQUES Test Pattern:Reference Pattern:

2 4.2 SPEECH (ENDPOINT) DETECTION

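3 4.3 DISTORTION MEASURES - MATHEMATICAL CONSIDERATIONS
x and y: two feature vectors defined on a vector space X.
The properties of a metric or distance function d:
1) 0 ≤ d(x, y) < ∞ for x, y in X, and d(x, y) = 0 if and only if x = y (positive definiteness)
2) d(x, y) = d(y, x) (symmetry)
3) d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)
A distance function is called invariant if d(x + z, y + z) = d(x, y).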
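The metric properties and translation invariance can be spot-checked numerically for a candidate distance. The following is a minimal sketch, assuming the Euclidean distance as the example; the helper names `euclidean` and `check_metric_properties` are hypothetical and not part of the slides.

```python
import numpy as np

def euclidean(x, y):
    """L2 distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

def check_metric_properties(d, x, y, z, tol=1e-9):
    """Spot-check the metric properties for one triple (x, y, z)."""
    return {
        "nonnegative": d(x, y) >= 0.0,
        "zero_on_identity": d(x, x) <= tol,
        "symmetric": abs(d(x, y) - d(y, x)) <= tol,
        "triangle": d(x, y) <= d(x, z) + d(z, y) + tol,
        # Invariance: shifting both vectors by z leaves the distance unchanged.
        "invariant": abs(d(x + z, y + z) - d(x, y)) <= tol,
    }

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 12))  # three random 12-dimensional vectors
props = check_metric_properties(euclidean, x, y, z)
```

A passing check on random triples is only evidence, not a proof; several distortion measures later in the deck are not symmetric and would fail it.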

4 PERCEPTUAL CONSIDERATIONS Spectral changes that do not fundamentally change the perceived sound include:

5 PERCEPTUAL CONSIDERATIONS Spectral changes that lead to phonetically different sounds include:

6 PERCEPTUAL CONSIDERATIONS Just-discriminable change: known as JND (just-noticeable difference), DL (difference limen), or differential threshold

7 4.4 DISTORTION MEASURES- PERCEPTUAL CONSIDERATIONS

9 Spectral Distortion Measures: Spectral Density; Fourier Coefficients of Spectral Density; Autocorrelation Function

10 Spectral Distortion Measures: Short-term autocorrelation of a finite frame. Then its Fourier transform is an energy spectral density.
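A minimal sketch of this relationship, assuming the usual definition r(i) = sum over n of x(n) x(n+i) for a finite frame (the slide's own equation is not recoverable). It checks that the Fourier transform of the two-sided autocorrelation equals the squared magnitude spectrum of the frame. The function name `short_term_autocorr` is hypothetical.

```python
import numpy as np

def short_term_autocorr(x):
    """r(i) = sum_n x(n) x(n+i) for lags i = 0..N-1 of a finite frame."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - i], x[i:]) for i in range(n)])

rng = np.random.default_rng(1)
frame = rng.normal(size=64) * np.hamming(64)
r = short_term_autocorr(frame)

# Energy spectral density straight from the frame (zero-padded to 2N points).
nfft = 2 * len(frame)
esd_from_fft = np.abs(np.fft.fft(frame, nfft)) ** 2

# Same quantity from the autocorrelation: place lags -(N-1)..(N-1)
# in circular order and take the DFT.
r_circ = np.zeros(nfft)
r_circ[: len(r)] = r               # lags 0..N-1
r_circ[-(len(r) - 1):] = r[:0:-1]  # lags -(N-1)..-1
esd_from_r = np.fft.fft(r_circ).real
```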

11 Spectral Distortion Measures Autocorrelation matrices

12 Spectral Distortion Measures: If σ/A(z) is the all-pole model for the speech spectrum, the residual energy resulting from "inverse filtering" the input signal with an all-zero filter A(z) is:
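A minimal sketch of the residual energy, assuming the autocorrelation method of linear prediction and an inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p (the sign convention is an assumption). It computes the energy once by filtering in the time domain and once as the quadratic form a^T R a. The function names are hypothetical.

```python
import numpy as np

def autocorr(x, max_lag):
    """Short-term autocorrelation r(0..max_lag) of a finite frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(max_lag + 1)])

def toeplitz_from(r, size):
    """Symmetric Toeplitz autocorrelation matrix with entries r(|i - j|)."""
    return np.array([[r[abs(i - j)] for j in range(size)] for i in range(size)])

def lpc_autocorrelation(x, p):
    """Inverse-filter coefficients a = [1, a1, ..., ap] from the normal equations."""
    r = autocorr(x, p)
    coeffs = np.linalg.solve(toeplitz_from(r, p), -r[1:])
    return np.concatenate(([1.0], coeffs)), r

rng = np.random.default_rng(2)
frame = rng.normal(size=240) * np.hamming(240)
a, r = lpc_autocorrelation(frame, p=10)

# Residual energy by inverse filtering the frame with the all-zero filter A(z).
residual = np.convolve(a, frame)
energy_time = float(np.sum(residual ** 2))

# Same energy as a quadratic form in the autocorrelation matrix.
energy_quadratic = float(a @ toeplitz_from(r, 11) @ a)
```

The quadratic form avoids running the filter at all, which is why the likelihood-type distortions later in the deck are cheap to evaluate.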

13 Spectral Distortion Measures Important properties of all-pole modeling: The recursive minimization relationship:

14 LOG SPECTRAL DISTANCE

15 LOG SPECTRAL DISTANCE
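The equations for these two slides did not survive extraction. A minimal sketch, assuming the common definition of the log spectral distance as an L_p norm of the difference V(ω) = log S(ω) - log S'(ω), approximated on a uniform FFT grid; the function names and the spectral floor are assumptions.

```python
import numpy as np

def log_power_spectrum(frame, nfft=512, floor=1e-12):
    """Log of the power spectrum on a uniform grid from 0 to the Nyquist frequency."""
    spec = np.abs(np.fft.rfft(frame, nfft)) ** 2
    return np.log(np.maximum(spec, floor))

def log_spectral_distance(frame_a, frame_b, p=2, nfft=512):
    """L_p norm of the log spectral difference (p may be np.inf)."""
    v = log_power_spectrum(frame_a, nfft) - log_power_spectrum(frame_b, nfft)
    if np.isinf(p):
        return float(np.max(np.abs(v)))
    return float(np.mean(np.abs(v) ** p) ** (1.0 / p))

rng = np.random.default_rng(3)
frame_a = rng.normal(size=256) * np.hamming(256)
frame_b = rng.normal(size=256) * np.hamming(256)

d1 = log_spectral_distance(frame_a, frame_b, p=1)       # mean absolute log difference
d2 = log_spectral_distance(frame_a, frame_b, p=2)       # rms log difference
dinf = log_spectral_distance(frame_a, frame_b, p=np.inf)  # peak log difference
```

Because the measure works on log spectra, a pure gain change shifts V(ω) by a constant, so this distance is not gain-independent.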

16 CEPSTRAL DISTANCES: The complex cepstrum of a signal is defined as the Fourier transform of the log of the signal spectrum.

17 CEPSTRAL DISTANCES Truncated cepstral distance
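A minimal sketch of the truncated cepstral distance, assuming cepstral coefficients taken from the log power spectrum via an inverse DFT and a distance that sums squared differences over n = 1..L. It also checks Parseval's relation between the full cepstral distance and the mean-square log spectral difference. Function names and the choice L = 12 are assumptions.

```python
import numpy as np

def log_power_spectrum(frame, nfft=512, floor=1e-12):
    """Log power spectrum on the full DFT grid (both halves)."""
    return np.log(np.maximum(np.abs(np.fft.fft(frame, nfft)) ** 2, floor))

def power_cepstrum(frame, nfft=512):
    """Cepstral coefficients c_n as the inverse DFT of the log power spectrum."""
    return np.fft.ifft(log_power_spectrum(frame, nfft)).real

def truncated_cepstral_distance(c_a, c_b, L=12):
    """Sum of squared cepstral differences over n = 1..L (c_0 is excluded)."""
    diff = c_a[1 : L + 1] - c_b[1 : L + 1]
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(4)
frame_a = rng.normal(size=256) * np.hamming(256)
frame_b = rng.normal(size=256) * np.hamming(256)
c_a, c_b = power_cepstrum(frame_a), power_cepstrum(frame_b)

d_trunc = truncated_cepstral_distance(c_a, c_b, L=12)
d_full = float(np.sum((c_a - c_b) ** 2))  # all coefficients, including c_0
v = log_power_spectrum(frame_a) - log_power_spectrum(frame_b)
d_log_spec_sq = float(np.mean(v ** 2))    # mean-square log spectral difference
```

For smooth model spectra the low-order coefficients carry most of the difference. For the noisy FFT spectra used here the truncation discards much more, so `d_trunc` is only a lower bound.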

18 CEPSTRAL DISTANCES

19 CEPSTRAL DISTANCES

20 Weighted Cepstral Distances and Liftering
It can be shown that under certain regularity conditions, the cepstral coefficients, except c0, have:
1) Zero means
2) Variances essentially inversely proportional to the square of the coefficient index:
If we normalize the cepstral distance by the inverse of the variance:

21 Weighted Cepstral Distances and Liftering: Differentiating both sides of the Fourier series equation of the spectrum: This is an L2 distance based upon the differences between the spectral slopes.
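A minimal sketch of the index-weighted distance, assuming the weighting takes the form of the sum over n of n^2 (c_n - c'_n)^2. This is what results from normalizing by a variance proportional to 1/n^2, and also from differentiating the cosine series of the log spectrum. The function name is hypothetical.

```python
import numpy as np

def index_weighted_cepstral_distance(c_a, c_b):
    """Sum over n = 1..L of (n * (c_n - c'_n))^2; inputs hold c_1..c_L only."""
    c_a = np.asarray(c_a, dtype=float)
    c_b = np.asarray(c_b, dtype=float)
    n = np.arange(1, len(c_a) + 1)
    return float(np.sum((n * (c_a - c_b)) ** 2))

# A unit difference in c_3 costs nine times as much as one in c_1.
d_low = index_weighted_cepstral_distance([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])
d_high = index_weighted_cepstral_distance([0.0, 0.0, 1.0], [0.0, 0.0, 0.0])
```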

22 Cepstral Weighting or Liftering Procedure: h is usually chosen as L/2, and L is typically 10 to 16.

23 A useful form of weighted cepstral distance:
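A minimal sketch of liftering followed by a weighted cepstral distance. The window shape is an assumption, since the slide's equation is lost: it uses the widely cited raised-sine lifter w(n) = 1 + h sin(nπ/L) for n = 1..L, with h defaulting to L/2 as the previous slide suggests. Function names are hypothetical.

```python
import numpy as np

def raised_sine_lifter(L, h=None):
    """w(n) = 1 + h * sin(n * pi / L) for n = 1..L (assumed window shape)."""
    if h is None:
        h = L / 2.0
    n = np.arange(1, L + 1)
    return 1.0 + h * np.sin(np.pi * n / L)

def weighted_cepstral_distance(c_a, c_b, weights):
    """Sum over n of (w(n) c_n - w(n) c'_n)^2; inputs hold c_1..c_L only."""
    diff = weights * (np.asarray(c_a, dtype=float) - np.asarray(c_b, dtype=float))
    return float(np.sum(diff ** 2))

w = raised_sine_lifter(12)
rng = np.random.default_rng(5)
c_a, c_b = rng.normal(scale=0.1, size=(2, 12))
d_w = weighted_cepstral_distance(c_a, c_b, w)
```

The window peaks at n = L/2 and returns to about 1 at n = L, so it de-emphasizes both the lowest and the highest coefficients relative to the middle ones.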

24 Likelihood Distortions: Previously defined: Itakura-Saito distortion measure, where the two quantities are the one-step prediction errors of the two spectra being compared, as defined:

26 Likelihood Distortions The residual energy can be easily evaluated by:

27 Likelihood Distortions: By replacing the spectrum by its optimal p-th order LPC model spectrum: If we set σ² to match the residual energy α: which is often referred to as the Itakura distortion measure.

28 Likelihood Distortions Another way to write the Itakura distortion measure is: Another gain-independent distortion measure is called the Likelihood Ratio distortion:
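A minimal sketch of both gain-independent measures. It assumes they are built from the ratio of two residual energies measured on the same reference autocorrelation matrix R: the energy left by another frame's LPC vector over the energy left by the reference frame's own optimal vector. Under that assumption the Itakura distortion is the log of the ratio and the likelihood ratio distortion is the ratio minus one. Which frame plays the reference role, and all function names, are assumptions.

```python
import numpy as np

def autocorr_matrix(x, p):
    """(p+1) x (p+1) Toeplitz autocorrelation matrix of a finite frame."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - i], x[i:]) for i in range(p + 1)])
    return np.array([[r[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])

def lpc_from_matrix(R):
    """Optimal inverse filter a = [1, a1, ..., ap] for the matrix R."""
    p = R.shape[0] - 1
    return np.concatenate(([1.0], np.linalg.solve(R[:p, :p], -R[1:, 0])))

def itakura_and_lr(R_ref, a_ref, a_test):
    """Return (Itakura, likelihood-ratio) distortions from a residual-energy ratio."""
    ratio = (a_test @ R_ref @ a_test) / (a_ref @ R_ref @ a_ref)
    return float(np.log(ratio)), float(ratio - 1.0)

rng = np.random.default_rng(6)
win = np.hamming(240)
# Two differently colored noise frames stand in for speech frames.
frame_x = np.convolve(rng.normal(size=240), [1.0, 0.8])[:240] * win
frame_y = np.convolve(rng.normal(size=240), [1.0, -0.5])[:240] * win

R_x = autocorr_matrix(frame_x, p=10)
a_x = lpc_from_matrix(R_x)
a_y = lpc_from_matrix(autocorr_matrix(frame_y, p=10))
d_itakura, d_lr = itakura_and_lr(R_x, a_x, a_y)
```

Since log(1 + u) ≤ u, the Itakura value never exceeds the likelihood ratio value, and the two agree to first order when the ratio is close to one.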

Likelihood Distortions

Likelihood Distortions: That is, when the distortion is small, the Itakura distortion measure is not very different from the LR distortion measure.

Likelihood Distortions

Likelihood Distortions Consider the Itakura-Saito distortion between the input and output of a linear system H(z)

Likelihood Distortions

Likelihood Distortions

Variations of Likelihood Distortions Symmetric distortion measures:

Variations of Likelihood Distortions COSH distortion
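A minimal sketch of the COSH distortion on a uniform frequency grid. It assumes the Itakura-Saito measure is written as the mean of e^V - V - 1, with V(ω) = log S(ω) - log S'(ω), so that symmetrizing it by averaging both directions gives the mean of cosh(V) - 1. The two all-pole spectra and all function names are illustrative assumptions.

```python
import numpy as np

def allpole_spectrum(a, gain=1.0, nfft=512):
    """Model spectrum gain / |A(e^{jw})|^2 on a uniform grid over the unit circle."""
    return gain / np.abs(np.fft.fft(a, nfft)) ** 2

def itakura_saito(s, s_ref):
    """Grid approximation of the mean of exp(V) - V - 1."""
    v = np.log(s) - np.log(s_ref)
    return float(np.mean(np.exp(v) - v - 1.0))

def cosh_distortion(s, s_ref):
    """Grid approximation of the mean of cosh(V) - 1."""
    v = np.log(s) - np.log(s_ref)
    return float(np.mean(np.cosh(v) - 1.0))

s_a = allpole_spectrum([1.0, -0.9])
s_b = allpole_spectrum([1.0, -0.5, 0.2], gain=2.0)

d_is_ab = itakura_saito(s_a, s_b)
d_is_ba = itakura_saito(s_b, s_a)
d_cosh = cosh_distortion(s_a, s_b)
```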

Variations of Likelihood Distortions

Spectral Distortion Using a Warped Frequency Scale: Psychophysical studies have shown that human perception of the frequency content of sounds does not follow a linear scale. This research has led to the idea of defining the subjective pitch of pure tones. For each tone with an actual frequency, f, measured in Hz, a subjective pitch is measured on a scale called the "mel" scale. As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels.
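The slides do not give a closed form for the mel scale. A minimal sketch, assuming the widely used analytic approximation mel = 2595 log10(1 + f/700), which maps 1 kHz to roughly 1000 mels; the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Approximate subjective pitch in mels for a frequency in Hz (assumed formula)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

freqs = np.array([100.0, 500.0, 1000.0, 2000.0, 4000.0])
mels = hz_to_mel(freqs)
```

The mapping is close to linear below about 1 kHz and close to logarithmic above it, which is the behaviour the warped distortion measures try to capture.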

Spectral Distortion Using a Warped Frequency Scale

Spectral Distortion Using a Warped Frequency Scale

Spectral Distortion Using a Warped Frequency Scale

43 Examples of Critical bandwidth

44 Warped cepstral distance b is the frequency in Barks, S(θ(b)) is the spectrum on a Bark scale, and B is the Nyquist frequency in Barks.
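A minimal sketch of a warped cepstral distance. Two things are assumptions because the slide equations are lost: the Hz-to-Bark mapping (a common arctangent approximation is used here) and the cepstral coefficients, taken as cosine-series coefficients of the log spectrum over the Bark axis from 0 to B. The integral is approximated by a mean over a uniform Bark grid. All function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Assumed Hz-to-Bark approximation (arctangent form)."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)

def warped_cepstrum(log_spec, sample_rate, n_ceps=12, n_points=128):
    """Cosine-series coefficients of a log spectrum resampled uniformly in Barks."""
    freqs = np.linspace(0.0, sample_rate / 2.0, len(log_spec))
    barks = hz_to_bark(freqs)
    big_b = barks[-1]                           # Nyquist frequency in Barks
    grid = np.linspace(0.0, big_b, n_points)    # uniform Bark grid
    warped = np.interp(grid, barks, log_spec)   # S(theta(b)) on that grid
    n = np.arange(1, n_ceps + 1)[:, None]
    return np.mean(warped * np.cos(n * np.pi * grid / big_b), axis=1)

def warped_cepstral_distance(log_spec_a, log_spec_b, sample_rate, n_ceps=12):
    diff = (warped_cepstrum(log_spec_a, sample_rate, n_ceps)
            - warped_cepstrum(log_spec_b, sample_rate, n_ceps))
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(7)
frames = rng.normal(size=(2, 256)) * np.hamming(256)
log_specs = np.log(np.abs(np.fft.rfft(frames, 512, axis=1)) ** 2 + 1e-12)
d_warped = warped_cepstral_distance(log_specs[0], log_specs[1], sample_rate=8000)
```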

Spectral Distortion Using a Warped Frequency Scale Where the warping function is defined by

Spectral Distortion Using a Warped Frequency Scale

Spectral Distortion Using a Warped Frequency Scale

Spectral Distortion Using a Warped Frequency Scale

Spectral Distortion Using a Warped Frequency Scale: Mel-frequency cepstrum: computed from the output power of the triangular filters. Mel-frequency cepstral distance:
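A minimal sketch of the mel-frequency cepstrum and its distance, assuming the common cosine-transform form c_n = sum over k = 1..K of log(S_k) cos[n (k - 1/2) π / K], where S_k is the output power of the k-th triangular filter. The filterbank itself is not built here; the filter powers are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def mel_cepstrum(filter_powers, n_ceps):
    """Cosine transform of the log filterbank powers, n = 1..n_ceps."""
    log_s = np.log(np.asarray(filter_powers, dtype=float))
    num_filters = len(log_s)
    k = np.arange(1, num_filters + 1)
    n = np.arange(1, n_ceps + 1)[:, None]
    basis = np.cos(n * (k - 0.5) * np.pi / num_filters)
    return basis @ log_s

def mel_cepstral_distance(powers_a, powers_b, n_ceps=12):
    """Sum of squared mel-cepstral differences over n = 1..n_ceps."""
    diff = mel_cepstrum(powers_a, n_ceps) - mel_cepstrum(powers_b, n_ceps)
    return float(np.sum(diff ** 2))

rng = np.random.default_rng(8)
powers_a = rng.uniform(0.5, 5.0, size=20)  # placeholder outputs of 20 filters
powers_b = rng.uniform(0.5, 5.0, size=20)
d_mel = mel_cepstral_distance(powers_a, powers_b)
```

Because c_0 is excluded and the basis sums to zero over k, scaling every filter output by the same gain leaves the coefficients and the distance unchanged.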

Alternative Spectral Representations and Distortion Measures

Alternative Spectral Representations and Distortion Measures Wave reflection occurs at each sectional boundary with reflection coefficients denoted by

Alternative Spectral Representations and Distortion Measures: Another possible parametric representation of the all-pole spectrum is the set of line spectral frequencies (LSFs), defined as the roots of the following two polynomials based upon the inverse filter A(z): These two polynomials are equivalent to artificially augmenting the p-section nonuniform acoustic tube with an extra section that is either completely closed (area=0) or completely open (area=∞). LSF parameters, due to their particular structure, possess properties similar to those of the formant frequencies and bandwidths.
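A minimal sketch of computing LSFs, assuming the standard pair P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1), since the slide's own equations are lost. The roots are found with a general polynomial root finder, which is adequate for illustration but not how production coders do it. The example poles and function names are hypothetical.

```python
import numpy as np

def lsf_polynomials(a):
    """Coefficients (in powers of z^-1) of the sum and difference polynomials."""
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate((a, [0.0]))        # A(z), padded to order p+1
    a_rev = np.concatenate(([0.0], a[::-1]))  # z^-(p+1) A(z^-1)
    return a_ext + a_rev, a_ext - a_rev

def line_spectral_frequencies(a, eps=1e-6):
    """Root angles in (0, pi) of P and Q, excluding the trivial roots at z = +1, -1."""
    def angles(poly):
        ang = np.angle(np.roots(poly))
        return np.sort(ang[(ang > eps) & (ang < np.pi - eps)])
    poly_p, poly_q = lsf_polynomials(a)
    return angles(poly_p), angles(poly_q)

# A stable 4th-order inverse filter built from two pole pairs inside the unit circle.
poles = [0.9 * np.exp(0.5j), 0.9 * np.exp(-0.5j),
         0.8 * np.exp(1.5j), 0.8 * np.exp(-1.5j)]
a = np.poly(poles).real
lsf_p, lsf_q = line_spectral_frequencies(a)
```

For a minimum-phase A(z) the roots of both polynomials lie on the unit circle and interlace, which makes the LSFs easy to order and to check for stability.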

Alternative Spectral Representations and Distortion Measures Weighted slope metric proposed by Klatt:

Alternative Spectral Representations and Distortion Measures

Alternative Spectral Representations and Distortion Measures

56 Summary of Spectral Distortion Measures (table columns: Distortion Measure, Notation, Expression, Computation)

57 Summary of Spectral Distortion Measures (continued; same columns)

58 Summary of Spectral Distortion Measures (continued; same columns)

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE A first-order differential (log) spectrum is defined by:

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE: Fitting the cepstral trajectory by a second-order polynomial, choose h1, h2, and h3 such that the fitting error E is minimized. Differentiating E with respect to h1, h2, and h3 and setting the results to zero gives 3 equations:

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE The solutions to these equations are:

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE The first and second time derivatives of cn can be obtained by differentiating the fitting curve, giving
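A minimal sketch of the fit and its derivatives, assuming the trajectory of one cepstral coefficient is sampled at frames t = -M..M and fitted by h1 + h2 t + h3 t^2 in the least-squares sense. The closed forms below follow from the three normal equations. At the centre frame the first derivative is h2 and the second derivative is 2 h3. The function name is hypothetical.

```python
import numpy as np

def fit_quadratic(c_window):
    """Least-squares h1, h2, h3 for c(t) ~ h1 + h2*t + h3*t^2 over t = -M..M."""
    c = np.asarray(c_window, dtype=float)
    n_frames = len(c)                  # must be odd: 2M + 1
    half = (n_frames - 1) // 2
    t = np.arange(-half, half + 1, dtype=float)
    t2_sum, t4_sum = np.sum(t ** 2), np.sum(t ** 4)
    s0, s1, s2 = np.sum(c), np.sum(t * c), np.sum(t ** 2 * c)
    h2 = s1 / t2_sum
    h3 = (t2_sum * s0 - n_frames * s2) / (t2_sum ** 2 - n_frames * t4_sum)
    h1 = (s0 - h3 * t2_sum) / n_frames
    return h1, h2, h3

rng = np.random.default_rng(9)
window = rng.normal(size=7)            # one coefficient over 7 frames (M = 3)
h1, h2, h3 = fit_quadratic(window)
first_derivative = h2                  # d/dt of the fit at t = 0
second_derivative = 2.0 * h3           # d^2/dt^2 of the fit at t = 0
```

The first-derivative estimate is the familiar regression-based delta coefficient: a weighted sum of the surrounding frames with weights proportional to t.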

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE A differential spectral distance: A second differential spectral distance:

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE: Cepstral weighting or liftering by differentiating. Combining the first and second differential spectral distances with the cepstral distance results in:
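A minimal sketch of a combined distance, assuming it is a weighted sum of three squared Euclidean terms: the cepstral distance at the centre frame, the distance between first-derivative estimates, and the distance between second-derivative estimates. The weights `gamma1` and `gamma2`, their default values, and the function names are hypothetical, since the slide's expression is lost.

```python
import numpy as np

def window_derivatives(window):
    """Centre-frame cepstrum plus first and second derivative estimates.

    window has shape (2M+1, L): 2M+1 consecutive frames of L cepstral coefficients.
    """
    w = np.asarray(window, dtype=float)
    n_frames = w.shape[0]
    half = (n_frames - 1) // 2
    t = np.arange(-half, half + 1, dtype=float)[:, None]
    t2_sum, t4_sum = np.sum(t ** 2), np.sum(t ** 4)
    first = np.sum(t * w, axis=0) / t2_sum
    h3 = ((t2_sum * np.sum(w, axis=0) - n_frames * np.sum(t ** 2 * w, axis=0))
          / (t2_sum ** 2 - n_frames * t4_sum))
    return w[half], first, 2.0 * h3

def combined_distance(win_a, win_b, gamma1=1.0, gamma2=1.0):
    """Cepstral distance plus weighted first and second differential distances."""
    c_a, d1_a, d2_a = window_derivatives(win_a)
    c_b, d1_b, d2_b = window_derivatives(win_b)
    return float(np.sum((c_a - c_b) ** 2)
                 + gamma1 * np.sum((d1_a - d1_b) ** 2)
                 + gamma2 * np.sum((d2_a - d2_b) ** 2))

rng = np.random.default_rng(10)
win_a = rng.normal(scale=0.1, size=(5, 12))  # 5 frames of 12 coefficients
win_b = rng.normal(scale=0.1, size=(5, 12))
d_combined = combined_distance(win_a, win_b)
d_static_only = combined_distance(win_a, win_b, gamma1=0.0, gamma2=0.0)
```

In practice the weights have to be tuned, because the derivative terms have a different dynamic range from the static cepstral term.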

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE A weighted differential cepstral distance:

INCORPORATION OF SPECTRAL DYNAMIC FEATURES INTO THE DISTORTION MEASURE: Taking the L2 distance: Other operators can be added to produce a combined representation of the spectrum and the differential spectra. As an example: