Digital Systems: Hardware Organization and Design

Presentation transcript:

Digital Systems: Hardware Organization and Design 2/22/2019 Speech Recognition Speech Signal Representations Architecture of a Representative 32 Bit Processor

Speech Signal Representations
Fourier Analysis
  Discrete-time Fourier transform
  Short-time Fourier transform
  Discrete Fourier transform
Cepstral Analysis
  The complex cepstrum and the cepstrum
  Computational considerations
  Cepstral analysis of speech
  Applications to speech recognition
  Mel-Frequency cepstral representation
Performance Comparison of Various Representations
Veton Këpuska, 22 February 2019

Discrete-Time Fourier Transform
Definition: $X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$
Sufficient condition for convergence: $\sum_{n=-\infty}^{\infty} |x[n]| < \infty$
Although x[n] is discrete, $X(e^{j\omega})$ is continuous and periodic with period $2\pi$.
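The periodicity claim can be checked numerically. The sketch below is illustrative only: the helper name `dtft` and the test sequence are assumptions, and the sequence is taken to be zero outside n = 0..3 so that the infinite sum becomes finite.

```python
import cmath
import math

def dtft(x, omega):
    """Evaluate sum_n x[n] * exp(-j*omega*n) for a finite sequence.

    x[n] is assumed to start at n = 0 and to be zero elsewhere.
    """
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

x = [1.0, 0.5, 0.25, 0.125]          # absolutely summable, so the DTFT exists
w = 0.3                               # an arbitrary frequency in rad/sample
X_w = dtft(x, w)
X_w_shifted = dtft(x, w + 2 * math.pi)   # same frequency, one period later
dc_value = dtft(x, 0.0)                   # at omega = 0 the DTFT is the sum of x[n]
```

Because omega is a continuous variable, `dtft` can be evaluated at any real frequency, not only on a fixed grid.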

Discrete-Time Fourier Transform
Convolution/multiplication duality:
$y[n] = x[n] * h[n] \iff Y(e^{j\omega}) = X(e^{j\omega})\, H(e^{j\omega})$
$y[n] = x[n]\, w[n] \iff Y(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(e^{j\theta})\, X(e^{j(\omega-\theta)})\, d\theta$
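A minimal numerical check of the convolution side of the duality, assuming short finite sequences and using the DFT as a sampled DTFT. The helper names `dft` and `convolve` and the sample values are illustrative. Both inputs are zero-padded to the length of the linear convolution so that the product of spectra corresponds to linear rather than circular convolution.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT; adequate for short illustrative sequences."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def convolve(x, h):
    """Linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x = [1.0, 2.0, 3.0]
h = [1.0, -1.0]
y = convolve(x, h)                      # [1.0, 1.0, 1.0, -3.0]
L = len(y)
X = dft(x + [0.0] * (L - len(x)))       # zero-pad to the output length
H = dft(h + [0.0] * (L - len(h)))
Y = dft(y)
max_err = max(abs(Y[k] - X[k] * H[k]) for k in range(L))
```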

Short-Time Fourier Analysis (Time-Dependent Fourier Transform)
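The transcript keeps only the title of this slide. As a rough sketch of the idea, assuming the usual frame-based formulation (slide a window along the signal and take a DFT of each windowed segment), a short-time transform could look like the following. The function name `stft`, the hop size, and the test signal are all assumptions made for illustration.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) DFT; adequate for short frames."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def stft(x, win, hop):
    """Return one DFT per frame; frame i covers x[i*hop : i*hop + len(win)]."""
    L = len(win)
    frames = []
    for start in range(0, len(x) - L + 1, hop):
        segment = [x[start + m] * win[m] for m in range(L)]
        frames.append(dft(segment))
    return frames

# Test signal: a cosine placed exactly on DFT bin 4 of a 32-point frame.
L, hop = 32, 16
x = [math.cos(2 * math.pi * 4 * n / L) for n in range(128)]
rect = [1.0] * L
frames = stft(x, rect, hop)
# Strongest bin in the lower half of each frame's spectrum.
peak_bins = [max(range(L // 2), key=lambda k: abs(F[k])) for F in frames]
```

Each entry of `frames` is one column of a spectrogram; the window length trades time resolution against frequency resolution.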

Rectangular Window

Hamming Window

Comparison of Windows

Comparison of Windows (cont'd)
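The window figures are not preserved in the transcript. The sketch below compares the two windows numerically, assuming the common symmetric Hamming definition w[n] = 0.54 - 0.46 cos(2πn/(N-1)); the function names, the window length of 32, and the grid density are illustrative choices. It reports where the main lobe ends and how high the largest side lobe is.

```python
import cmath
import math

def rectangular(N):
    return [1.0] * N

def hamming(N):
    # Symmetric Hamming window (assumed definition, see lead-in).
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def window_stats(w, grid=2048):
    """Return (first spectral minimum in rad/sample, peak side-lobe level in dB)."""
    omegas = [math.pi * i / grid for i in range(grid + 1)]
    mag = [abs(sum(wn * cmath.exp(-1j * om * n) for n, wn in enumerate(w)))
           for om in omegas]
    i = 0
    while i + 1 < len(mag) and mag[i + 1] <= mag[i]:
        i += 1                      # walk down the main lobe to its first minimum
    peak_sidelobe = max(mag[i:])    # largest magnitude beyond the main lobe
    return omegas[i], 20 * math.log10(peak_sidelobe / mag[0])

rect_null, rect_sl = window_stats(rectangular(32))
ham_null, ham_sl = window_stats(hamming(32))
```

The rectangular window gives the narrower main lobe, while the Hamming window gives much lower side lobes at the cost of a wider main lobe.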

A Wideband Spectrogram

A Narrowband Spectrogram

Discrete Fourier Transform
In general, the number of input points, N, and the number of frequency samples, M, need not be the same.
If M > N, we must zero-pad the signal.
If M < N, we must time-alias the signal.
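Both cases can be illustrated with a short sketch. The helper names (`dtft_samples`, `zero_pad`, `time_alias`) and the six-sample test signal are assumptions; the point is that an M-point DFT of the padded or aliased signal equals M uniformly spaced samples of the DTFT of the original signal.

```python
import cmath

def dtft_samples(x, M):
    """Sample the DTFT of finite x at omega_k = 2*pi*k/M, k = 0..M-1."""
    return [sum(xn * cmath.exp(-2j * cmath.pi * k * n / M) for n, xn in enumerate(x))
            for k in range(M)]

def dft(x):
    """N-point DFT, which is the DTFT sampled at N points."""
    return dtft_samples(x, len(x))

def zero_pad(x, M):
    return x + [0.0] * (M - len(x))

def time_alias(x, M):
    """Fold x onto M points: y[n] = sum_r x[n + r*M]."""
    y = [0.0] * M
    for n, xn in enumerate(x):
        y[n % M] += xn
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # N = 6 input points
padded = zero_pad(x, 8)                  # M = 8 > N
aliased = time_alias(x, 4)               # M = 4 < N gives [6.0, 8.0, 3.0, 4.0]

err_pad = max(abs(a - b) for a, b in zip(dft(padded), dtft_samples(x, 8)))
err_alias = max(abs(a - b) for a, b in zip(dft(aliased), dtft_samples(x, 4)))
```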

Examples of Various Spectral Representations

Cepstral Analysis of Speech
The speech signal is often assumed to be the output of an LTI system; i.e., it is the convolution of the input and the impulse response.
If we are interested in characterizing the signal in terms of the parameters of such a model, we must go through the process of de-convolution.
Cepstral analysis is a common procedure used for such de-convolution.

Cepstral Analysis
Cepstral analysis for convolution is based on the observation that:
$x[n] = x_1[n] * x_2[n] \Rightarrow X(z) = X_1(z)\, X_2(z)$
By taking the complex logarithm of X(z):
$\log\{X(z)\} = \log\{X_1(z)\} + \log\{X_2(z)\} = \hat{X}(z)$
If the complex logarithm is unique, and if $\hat{X}(z)$ is a valid z-transform, then
$\hat{x}[n] = \hat{x}_1[n] + \hat{x}_2[n]$
The two convolved signals will be additive in this new, cepstral domain.
If we restrict ourselves to the unit circle, $z = e^{j\omega}$, then:
$\hat{X}(e^{j\omega}) = \log|X(e^{j\omega})| + j \arg\{X(e^{j\omega})\}$
It can be shown that one approach to dealing with the problem of uniqueness is to require that $\arg\{X(e^{j\omega})\}$ be a continuous, odd, periodic function of $\omega$.
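The additivity can be checked numerically. To avoid the phase-uniqueness issue, this sketch uses the log magnitude only (the real cepstrum) and a DFT in place of the z-transform; both are simplifying assumptions, as are the helper names and the two short test sequences. The DFT length is chosen at least as long as the linear convolution so that the spectra multiply exactly.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(N^2) DFT or inverse DFT."""
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def real_cepstrum(x, N):
    """Inverse DFT of log|X[k]| after zero-padding x to N points."""
    X = dft(x + [0.0] * (N - len(x)))
    log_mag = [math.log(abs(Xk)) for Xk in X]
    return [v.real for v in dft(log_mag, inverse=True)]

def convolve(x, h):
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x1 = [1.0, 0.5]
x2 = [1.0, -0.3, 0.2]
N = 32
c1 = real_cepstrum(x1, N)
c2 = real_cepstrum(x2, N)
c12 = real_cepstrum(convolve(x1, x2), N)   # cepstrum of the convolved signal
max_err = max(abs(c12[n] - (c1[n] + c2[n])) for n in range(N))
```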

Cepstral Analysis (cont'd)
To the extent that $\hat{X}(z) = \log\{X(z)\}$ is valid,
complex cepstrum: $\hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log\{X(e^{j\omega})\}\, e^{j\omega n}\, d\omega$
cepstrum: $c[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|X(e^{j\omega})|\, e^{j\omega n}\, d\omega$
It can easily be shown that c[n] is the even part of $\hat{x}[n]$.
If $\hat{x}[n]$ is real and causal, then $\hat{x}[n]$ can be recovered from c[n]. This is known as the Minimum Phase condition.
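A small worked case, chosen here for illustration: for x[n] = δ[n] + a·δ[n-1] with |a| < 1, the power series of log(1 + a z^{-1}) gives a causal complex cepstrum with value (-1)^(n+1) a^n / n for n ≥ 1 and zero at n = 0. Since c[n] is the even part of a causal sequence, doubling c[n] for n > 0 should recover it. The sketch below approximates c[n] with a long DFT; the helper names and the values a = 0.5, N = 256 are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def real_cepstrum(x):
    log_mag = [math.log(abs(Xk)) for Xk in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

a, N = 0.5, 256
x = [1.0, a] + [0.0] * (N - 2)          # minimum phase: its only zero is at z = -a
c = real_cepstrum(x)

# Causal sequence recovered from its even part: keep n = 0, double n > 0.
x_hat_recovered = [c[0]] + [2.0 * c[n] for n in range(1, 6)]
# Analytic complex cepstrum from the series for log(1 + a/z).
x_hat_analytic = [0.0] + [(-1) ** (n + 1) * a ** n / n for n in range(1, 6)]
max_err = max(abs(r - t) for r, t in zip(x_hat_recovered, x_hat_analytic))
```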

An Example

An Example (cont'd)

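Computational Considerations
We now replace the Fourier transform expressions by the discrete Fourier transform expressions:
$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \quad 0 \le k \le N-1$
$\hat{X}_p[k] = \log\{X[k]\}$
$\hat{x}_p[n] = \frac{1}{N} \sum_{k=0}^{N-1} \hat{X}_p[k]\, e^{j 2\pi k n / N}$
$\hat{X}_p[k]$ is a sampled version of $\hat{X}(e^{j\omega})$. Therefore,
$\hat{x}_p[n] = \sum_{r=-\infty}^{\infty} \hat{x}[n + rN]$
Likewise,
$c_p[n] = \sum_{r=-\infty}^{\infty} c[n + rN]$
where
$c_p[n] = \frac{1}{N} \sum_{k=0}^{N-1} \log|X[k]|\, e^{j 2\pi k n / N}$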

Computational Considerations (cont.)
To minimize aliasing, N must be large.
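The effect of N can be seen on a simple case picked for illustration: x[n] = δ[n] + a·δ[n-1] with a = 0.9, whose cepstrum decays slowly and has the exact value c[1] = a/2. The function name `cepstrum_lag_one` and the two DFT lengths are assumptions; the sketch compares the DFT-based value at lag 1 against the exact one.

```python
import cmath
import math

def cepstrum_lag_one(a, N):
    """DFT-based cepstrum c_p[1] of x[n] = delta[n] + a*delta[n-1]."""
    # log|X[k]| for X[k] = 1 + a*exp(-j*2*pi*k/N)
    log_mag = [math.log(abs(1.0 + a * cmath.exp(-2j * cmath.pi * k / N)))
               for k in range(N)]
    # Inverse DFT evaluated at n = 1 only.
    return sum(log_mag[k] * cmath.exp(2j * cmath.pi * k / N) for k in range(N)).real / N

a = 0.9
exact = a / 2                                   # even part of the complex cepstrum at n = 1
err_small_N = abs(cepstrum_lag_one(a, 8) - exact)
err_large_N = abs(cepstrum_lag_one(a, 256) - exact)
```

With N = 8 the slowly decaying cepstral terms fold back onto lag 1 and the error is clearly visible; with N = 256 the folded terms are negligible.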

Cepstral Analysis of Speech
For voiced speech: $s[n] = p[n] * g[n] * v[n] * r[n] = p[n] * h_v[n]$
For unvoiced speech: $s[n] = w[n] * v[n] * r[n] = w[n] * h_u[n]$
Contributions to the cepstrum due to periodic excitation will occur at integer multiples of the fundamental period.
Contributions due to the glottal waveform (for voiced speech), vocal tract, and radiation will be concentrated in the low quefrency region, and will decay rapidly with n.
Deconvolution can be achieved by multiplying the cepstrum with an appropriate window, l[n].
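The following sketch illustrates the idea on a synthetic signal, not on real speech. All values are assumptions chosen for illustration: a decaying impulse train with a period of 40 samples stands in for the periodic excitation, a truncated two-pole resonator stands in for the combined glottal, vocal tract, and radiation response, and the low/high quefrency boundary is set at 20 samples. The real cepstrum is computed with a 256-point DFT.

```python
import cmath
import math

def dft(x, inverse=False):
    N = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N) for n in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out

def real_cepstrum(x):
    log_mag = [math.log(abs(Xk)) for Xk in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

N, Np = 256, 40
# Excitation: four impulses spaced Np samples apart, decaying by 0.9 each period.
p = [0.0] * N
for r in range(4):
    p[r * Np] = 0.9 ** r
# System response: two-pole resonator impulse response, truncated to 60 samples.
rho, theta = 0.9, 0.5
h = [rho ** n * math.sin(theta * (n + 1)) / math.sin(theta) for n in range(60)]
# Synthetic voiced signal s[n] = p[n] * h[n] (linear convolution, fits inside N).
s = [0.0] * N
for i, amp in enumerate(p):
    if amp:
        for j, hj in enumerate(h):
            s[i + j] += amp * hj

c = real_cepstrum(s)
cutoff = 20
# Pitch estimate: largest cepstral value in the high quefrency region.
pitch_period = max(range(cutoff, N // 2), key=lambda n: c[n])
# Low-time lifter l[n]: keep only the low quefrency part (both ends, c is even).
lifter = [1.0 if (n < cutoff or n > N - cutoff) else 0.0 for n in range(N)]
c_low = [cn * ln for cn, ln in zip(c, lifter)]
```

Taking a DFT of `c_low` would give a smoothed log spectrum dominated by the system response, while the peak location in the high quefrency region tracks the excitation period.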

Cepstral Analysis of Speech
$D_*$ is the characteristic system that converts convolution into addition.
Thus cepstral analysis can be used for pitch extraction and formant tracking.

Example of Cepstral Analysis of Vowel (Rectangular Window)

Example of Cepstral Analysis of Vowel (Tapering Window)

Example of Cepstral Analysis of Fricative (Rectangular Window)

Example of Cepstral Analysis of Fricative (Tapering Window)

The Use of Cepstrum for Speech Recognition
Many current speech recognition systems represent the speech signal as a set of cepstral coefficients, computed at a fixed frame rate.
In addition, the time derivatives of the cepstral coefficients have also been used.
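The slide does not say how the time derivatives are computed. One common choice, assumed here, is a linear regression slope over a few neighbouring frames, with edge frames handled by repeating the first and last frame. The function name `delta` and the window half-width K = 2 are illustrative.

```python
def delta(frames, K=2):
    """Regression-based time derivative of per-frame coefficient vectors.

    frames[t][i] is coefficient i at frame t. Frames beyond either end are
    replaced by the nearest valid frame (an assumed edge policy).
    """
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out = []
    for t in range(T):
        row = []
        for i in range(len(frames[t])):
            num = sum(k * (frames[min(t + k, T - 1)][i] - frames[max(t - k, 0)][i])
                      for k in range(1, K + 1))
            row.append(num / denom)
        out.append(row)
    return out

# Toy input: first coefficient rises by 2 per frame, second is constant.
frames = [[2.0 * t, 1.0] for t in range(6)]
deltas = delta(frames)
```

For interior frames the first coefficient's delta equals its slope of 2 per frame and the constant coefficient's delta is zero.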

Statistical Properties of Cepstral Coefficients (Tohkura, 1987)
From a digit database (100 speakers) over dial-up telephone lines.

Mel-Frequency Cepstral Representation (Mermelstein & Davis 1980)
Some recognition systems use Mel-scale cepstral coefficients to mimic auditory processing. (The Mel frequency scale is approximately linear up to 1000 Hz and logarithmic thereafter.)
This is done by multiplying the magnitude (or log magnitude) of $S(e^{j\omega})$ with a set of filter weights.
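The filter weights themselves are not preserved in the transcript. The sketch below builds triangular weights spaced uniformly on a mel axis, assuming the widely used approximation mel(f) = 2595 log10(1 + f/700); this formula, the function names, and the parameter values (10 filters, 256-point DFT, 8 kHz sampling) are assumptions rather than the slide's exact design.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Return (weights, edges_hz); weights[m][k] applies to DFT bin k."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # num_filters + 2 edge frequencies, equally spaced in mel.
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, center, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft
            if lo < f <= center:
                row.append((f - lo) / (center - lo))     # rising slope
            elif center < f < hi:
                row.append((hi - f) / (hi - center))     # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank, edges_hz

def apply_filterbank(bank, magnitude):
    """Weighted sum of spectral magnitudes for each filter."""
    return [sum(w * s for w, s in zip(row, magnitude)) for row in bank]

bank, edges_hz = mel_filterbank(num_filters=10, n_fft=256, sample_rate=8000)
spacings = [b - a for a, b in zip(edges_hz, edges_hz[1:])]
flat_energies = apply_filterbank(bank, [1.0] * 129)   # response to a flat spectrum
```

The filter spacing in Hz grows with frequency, which is what gives the representation finer resolution at low frequencies.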

Typical MFCC Based System
Front-End Processing of a Speech Recognizer
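The block diagram for this slide is not in the transcript. In many MFCC front ends the last stage is a cosine transform of the log filter-bank outputs; the sketch below assumes that design, with an unnormalized DCT-II. The function names and the choice of 8 filters and 4 coefficients are illustrative.

```python
import math

def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II: c[i] = sum_m values[m] * cos(pi*i*(m + 0.5)/M)."""
    M = len(values)
    return [sum(v * math.cos(math.pi * i * (m + 0.5) / M) for m, v in enumerate(values))
            for i in range(num_coeffs)]

def mfcc_from_filterbank(energies, num_coeffs):
    """Cepstral coefficients from positive mel filter-bank outputs."""
    log_e = [math.log(e) for e in energies]
    return dct_ii(log_e, num_coeffs)

# A flat filter-bank output puts everything into coefficient 0.
flat = mfcc_from_filterbank([math.e] * 8, 4)
```

Coefficient 0 tracks overall log energy, while higher coefficients describe progressively finer spectral shape.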

Signal Representation Comparisons
Many researchers have compared cepstral representations with Fourier-, LPC-, and auditory-based representations.
Cepstral representation typically out-performs Fourier- and LPC-based representations.
Example: Classification of 16 vowels using ANN (Meng, 1991)

Signal Representation Comparisons (cont.)
Performance of various signal representations cannot be compared without considering how the features will be used, i.e., the pattern classification techniques used (Leung, et al., 1993).

Things to Ponder...
Are there other spectral representations that we should consider (e.g., models of the human auditory system)?
What about representing the speech signal in terms of phonetically motivated attributes (e.g., formants, durations, fundamental frequency contours)?
How do we make use of these (sometimes heterogeneous) features for recognition (i.e., what are the appropriate methods for modeling them)?

References
Tohkura, Y., "A Weighted Cepstral Distance Measure for Speech Recognition," IEEE Trans. ASSP, Vol. ASSP-35, No. 10, 1414-1422, 1987.
Mermelstein, P. and Davis, S., "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences," IEEE Trans. ASSP, Vol. ASSP-28, No. 4, 357-366, 1980.
Meng, H., The Use of Distinctive Features for Automatic Speech Recognition, SM Thesis, MIT EECS, 1991.
Leung, H., Chigier, B., and Glass, J., "A Comparative Study of Signal Representation and Classification Techniques for Speech Recognition," Proc. ICASSP, Vol. II, 680-683, 1993.