2.5.4.1 Basics of Neural Networks. 2.5.4.2 Neural Network Topologies.

Presentation transcript:

Basics of Neural Networks

Neural Network Topologies

5 TDNN (Time-Delay Neural Network)

Neural Network Structures for Speech Recognition

Spectral Analysis Models

THE BANK-OF-FILTERS FRONT-END PROCESSOR

Types of Filter Bank Used for Speech Recognition

16 Nonuniform Filter Banks

Implementations of Filter Banks. Instead of direct convolution, which is computationally expensive, we assume each bandpass filter impulse response to be represented by a modulated version of a fixed lowpass filter w(n).
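
As a rough Python sketch of this idea, assuming the usual complex-modulation form h_i(n) = w(n) e^{jω_i n} with a Hamming window as the fixed lowpass prototype (the filter length, band count, and uniform center frequencies below are illustrative, not taken from the slides):

```python
import numpy as np

def modulated_filter_bank(num_bands, filt_len, fs):
    """Bandpass impulse responses h_i(n) = w(n) * exp(j*w_i*n),
    all built from one fixed lowpass prototype w(n) (a Hamming window here)."""
    n = np.arange(filt_len)
    w = np.hamming(filt_len)                                        # fixed lowpass prototype
    centers = (np.arange(num_bands) + 0.5) * fs / (2 * num_bands)   # uniform spacing (illustrative)
    return [w * np.exp(2j * np.pi * fc / fs * n) for fc in centers], centers

def analyze(signal, filters):
    """Filter the signal through each band and return the magnitude envelopes."""
    return np.array([np.abs(np.convolve(signal, h, mode="same")) for h in filters])

# usage: a 16-band uniform bank at 8 kHz applied to a 1 kHz test tone
fs = 8000
filters, centers = modulated_filter_bank(num_bands=16, filt_len=101, fs=fs)
t = np.arange(fs) / fs
envelopes = analyze(np.sin(2 * np.pi * 1000 * t), filters)
print(envelopes.shape)   # (16, 8000)
```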

Frequency Domain Interpretation of the Short-Time Fourier Transform

26 Linear Filter Interpretation of the STFT

FFT Implementation of a Uniform Filter Bank

28 Direct implementation of an arbitrary filter bank

Nonuniform FIR Filter Bank Implementations

Tree Structure Realizations of Nonuniform Filter Banks

Practical Examples of Speech-Recognition Filter Banks

Generalizations of Filter-Bank Analyzer

46 The Mel-Cepstrum Method (MFCC): time-domain signal -> framing -> |FFT|² -> Mel-scaling -> logarithm -> IDCT -> low-order coefficients -> cepstra; a differentiator then yields the delta and delta-delta cepstra.
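
A minimal Python sketch of that pipeline follows; the 512-point FFT, 24-filter mel bank, 13 retained coefficients, and Hamming window are illustrative choices, not values from the slides, and the delta/delta-delta step is deferred to the temporal-processing slide below.

```python
import numpy as np

def mfcc(signal, fs, frame_len=400, hop=160, n_fft=512, n_mels=24, n_ceps=13):
    """Mel-cepstrum pipeline: framing -> |FFT|^2 -> mel filter bank -> log -> IDCT (DCT) -> low-order cepstra.
    signal: 1-D numpy array, fs: sampling rate in Hz."""
    # framing with a Hamming window
    frames = np.array([signal[i:i + frame_len] * np.hamming(frame_len)
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2          # |FFT|^2 per frame

    # triangular mel filter bank (mel scale: 2595*log10(1 + f/700))
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(0, mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fbank = np.zeros((n_mels, power.shape[1]))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = np.linspace(0, 1, c - l, endpoint=False)   # rising edge
        fbank[i, c:r] = np.linspace(1, 0, r - c, endpoint=False)   # falling edge

    log_mel = np.log(power @ fbank.T + 1e-10)                      # logarithm
    # inverse transform (DCT-II) and keep only the low-order coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[:, None] + 0.5) * n[None, :n_ceps])
    return log_mel @ dct                                           # cepstra, shape (n_frames, n_ceps)
```

Delta and delta-delta cepstra would then be obtained by differencing these coefficients across frames, as sketched under the temporal-processing slide below.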

47 Time-Frequency Analysis: Short-Term Fourier Transform. The standard way of doing frequency analysis is to decompose the incoming signal into its constituent frequency components. In the short-term Fourier transform, w(n) is the windowing function, N is the frame length, and p is the step size.
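
A minimal sketch of the short-term Fourier transform under the convention X(m, k) = Σ_n w(n) x(mp + n) e^{-j2πkn/N}; the Hamming window and the default frame/step sizes are illustrative assumptions:

```python
import numpy as np

def stft(x, N=400, p=160, window=None):
    """Short-term Fourier transform: X[m, k] = sum_n w(n) x(m*p + n) e^{-j 2 pi k n / N}."""
    w = np.hamming(N) if window is None else window   # w(n): windowing function
    starts = range(0, len(x) - N + 1, p)               # N: frame length, p: step size
    return np.array([np.fft.rfft(w * x[s:s + N]) for s in starts])
```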

48 Critical band integration. This is related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise. Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole.

49 Bark scale
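
One commonly used approximation of the Bark critical-band scale is Zwicker's formula, sketched below; the choice of this particular approximation is an assumption, since the slide does not state which formula it uses:

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker's approximation of the Bark critical-band scale."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

# usage: critical-band rate of a few frequencies in Hz
print(hz_to_bark([100, 1000, 4000]))   # roughly [1.0, 8.5, 17.3] Bark
```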

50 Feature orthogonalization. Spectral values in adjacent frequency channels are highly correlated. This correlation leads to a Gaussian model with many parameters: all the elements of the covariance matrix have to be estimated. Decorrelation is therefore useful to improve the parameter estimation.

51 Cepstrum. Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal. Because the log magnitude is real and symmetric, the transform is equivalent to the Discrete Cosine Transform, and the resulting coefficients are approximately decorrelated.
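
A direct sketch of that definition; the small floor added before the logarithm is an implementation detail of this sketch, not from the slides:

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse Fourier transform of the log magnitude of the Fourier transform."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-10)   # real and symmetric
    return np.real(np.fft.ifft(log_mag))                  # equivalent to a DCT of log|X|
```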

52-53 Principal Component Analysis (PCA). A mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PCs). PCA finds an orthogonal basis such that the reconstruction error over the training set is minimized, which turns out to be equivalent to diagonalizing the sample autocovariance matrix. It achieves complete decorrelation and computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.

54 PCA (Cont.): Algorithm. From the N-dimensional input vectors, estimate the covariance matrix, compute its eigenvalues and eigenvectors, build the transform matrix from the leading eigenvectors, and apply the transform to obtain R-dimensional output vectors.
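
A small sketch of that algorithm via eigendecomposition of the sample covariance matrix; the random data and the choice of R = 2 retained components in the usage lines are illustrative:

```python
import numpy as np

def pca_transform(X, R):
    """PCA: diagonalize the sample covariance of the N-dim inputs and
    project onto the R leading eigenvectors (R-dim, decorrelated outputs)."""
    Xc = X - X.mean(axis=0)                      # center the data
    cov = np.cov(Xc, rowvar=False)               # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:R]        # keep the R largest
    W = eigvecs[:, order]                        # transform matrix (N x R)
    return Xc @ W, W

# usage: 100 samples of 20-dim features reduced to 2 principal components
X = np.random.randn(100, 20)
Y, W = pca_transform(X, R=2)
print(Y.shape)   # (100, 2)
```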

55 PCA (Cont.): PCA in speech recognition systems.

56 Linear Discriminant Analysis (LDA). Find an orthogonal basis such that the ratio of the between-class variance to the within-class variance is maximized. This also turns out to be a generalized eigenvalue-eigenvector problem. LDA achieves complete decorrelation and provides the optimal linear separability, though only under quite restrictive assumptions.
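
A sketch of that formulation as the generalized eigenvalue problem S_b v = λ S_w v, solved here with scipy.linalg.eigh; the small regularization term is an implementation detail of this sketch, not from the slides:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, y, R):
    """LDA: maximize the ratio of between-class to within-class variance
    by solving the generalized eigenvalue problem S_b v = lambda * S_w v."""
    y = np.asarray(y)
    mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))      # within-class scatter
    Sb = np.zeros_like(Sw)                       # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    Sw += 1e-6 * np.eye(Sw.shape[0])             # regularize for invertibility
    eigvals, eigvecs = eigh(Sb, Sw)              # generalized eigenproblem
    W = eigvecs[:, np.argsort(eigvals)[::-1][:R]]
    return X @ W, W
```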

57 PCA vs. LDA

58 Spectral smoothing. Formant information is crucial for recognition, so the goal is to enhance and preserve it, either by truncating the number of cepstral coefficients or by linear prediction, with its peak-hugging property.
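
A sketch of the cepstral-truncation idea: keep only the low-quefrency coefficients and transform back to obtain a smoothed log spectrum (the cutoff of 20 coefficients is an illustrative assumption):

```python
import numpy as np

def smoothed_log_spectrum(frame, n_keep=20):
    """Spectral smoothing by cepstral truncation: zero out the high-order
    cepstral coefficients, then return to the log-spectral domain."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-10)
    cep = np.real(np.fft.ifft(log_mag))
    cep[n_keep:-n_keep] = 0.0                    # keep only low-quefrency terms (symmetric)
    return np.real(np.fft.fft(cep))              # smoothed log magnitude spectrum
```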

59 Temporal processing. To capture the temporal features of the spectral envelope and to provide robustness: delta features (first- and second-order differences, or regression coefficients) and cepstral mean subtraction, which normalizes channel effects and adjusts for spectral slope.
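
A sketch of both operations, using a simple regression over ±2 frames for the deltas and per-utterance mean removal; the window width is an illustrative choice:

```python
import numpy as np

def delta(cepstra, width=2):
    """First-order delta features as a regression over +/- width frames.
    cepstra has shape (n_frames, n_coeffs)."""
    T = len(cepstra)
    padded = np.pad(cepstra, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (padded[width + k:T + width + k] - padded[width - k:T + width - k])
              for k in range(1, width + 1))
    return num / (2 * sum(k * k for k in range(1, width + 1)))

def cepstral_mean_subtraction(cepstra):
    """Remove the per-utterance cepstral mean to normalize channel effects."""
    return cepstra - cepstra.mean(axis=0)

# usage: second-order (delta-delta) features are delta(delta(cepstra))
```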

60 RASTA (RelAtive SpecTral Analysis). Filtering of the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features. The filter is usually a bandpass filter that preserves the linguistically important modulations of the spectral envelope (about 1-16 Hz).
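
A sketch of such temporal filtering, applied along the frame axis of each spectral trajectory; the filter coefficients below (numerator 0.1·[2, 1, 0, -1, -2], pole at 0.98) are the commonly quoted RASTA band-pass filter and should be treated as an assumption rather than values taken from these slides:

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectra):
    """Band-pass filter each temporal trajectory of the log spectral values.
    log_spectra has shape (n_frames, n_channels); filtering runs along time."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])   # FIR differentiator part
    a = np.array([1.0, -0.98])                        # integrator pole at 0.98
    return lfilter(b, a, log_spectra, axis=0)
```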

62 RASTA-PLP
