Acoustic Analysis of Speech
Robert A. Prosek, Ph.D., CSD 301

Acoustic Analysis
Instrumental acoustical analyses have been used for over 100 years. Analog techniques dominated the first 60 of these years; more recently, digital techniques have dominated the field. We will begin by introducing a few of the important analog methods, then turn to the digital ones.

Oscillograph/Oscillogram
Any device that can display a waveform is an oscillograph. The output (display or hardcopy) is an oscillogram. A waveform offers only limited information: silence, bursts, noise, and periodicity.

Filter Bank Analysis
In this procedure, a filter bank or a single filter is used to divide the signal energy into frequency bands. The output energy is displayed for each band, typically in the form of a histogram. This is a form of spectral analysis. The technique is very common in audiology and hearing applications.
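The band-energy idea can be sketched in a few lines. This is a digital stand-in, not the analog filter bank the slide describes: it sums squared DFT magnitudes inside each band instead of running bandpass filters. The function name `band_energies`, the band edges, and the two-tone test signal are all assumptions made for illustration.

```python
import cmath
import math

def band_energies(samples, fs_hz, band_edges_hz):
    """Sum squared DFT magnitudes inside each band [edge_i, edge_i+1)."""
    n_samples = len(samples)
    energies = [0.0] * (len(band_edges_hz) - 1)
    for k in range(n_samples // 2 + 1):  # bins from 0 Hz up to fs/2
        component = sum(
            samples[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        freq_hz = k * fs_hz / n_samples
        for band in range(len(energies)):
            if band_edges_hz[band] <= freq_hz < band_edges_hz[band + 1]:
                energies[band] += abs(component) ** 2
    return energies

# Hypothetical test signal: 100 Hz at amplitude 1.0 plus 300 Hz at amplitude 0.5.
fs_hz = 1600
signal = [
    math.sin(2 * math.pi * 100 * n / fs_hz)
    + 0.5 * math.sin(2 * math.pi * 300 * n / fs_hz)
    for n in range(160)
]
edges_hz = [0, 200, 400, 800]
energies = band_energies(signal, fs_hz, edges_hz)

# Crude text histogram: one bar per band, scaled to the strongest band.
for band, energy in enumerate(energies):
    bar = "#" * round(20 * energy / max(energies))
    print(f"{edges_hz[band]:>4}-{edges_hz[band + 1]:<4} Hz {bar}")
```

The lowest band holds the most energy because the 100 Hz component has twice the amplitude, and so four times the energy, of the 300 Hz component.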

Sound Spectrograph/Spectrogram
The instrument is called a spectrograph; its output (usually a hardcopy) is a spectrogram. This is the most commonly used device in speech research. The spectrograph can capture the dynamics of speech: acoustic signals vary only in frequency, amplitude, and time, and the sound spectrograph captures all of these.

Sound Spectrogram
The abscissa is time and the ordinate is frequency. Intensity is shown as shades of gray: black areas indicate the highest amplitudes, white areas indicate the noise floor, and amplitudes between these extremes are shown in varying shades of gray. The more intense the signal is at a particular frequency and time, the darker the trace.

Digital Signal Processing (1)
In the late 1960s, general-purpose digital computers made it possible to analyze acoustic signals on the computer. These techniques are necessarily discrete as well as digital. Once in discrete form, the signal can be stored conveniently and analyzed in many ways that were not possible with analog techniques.

Digital Signal Processing (2)
Presampling or brickwall filtering. Nyquist Theorem: in order to represent a signal faithfully, it must be sampled at a rate of at least twice its highest frequency. The Nyquist frequency is half the sampling rate, and the brickwall filter removes all of the energy above it. The clinician/researcher determines the Nyquist frequency, so some knowledge of speech and of speech and language disorders is required.

Digital Signal Processing (3)
Sampling (analog-to-digital conversion). The signal must be sampled at no less than the Nyquist rate. Sampling decides the times at which the signal will be measured, and it converts the acoustic signal into a series of numbers. Instead of amplitudes at all instants of time, no matter how small the time interval, amplitudes in the digital world exist only at the sampling interval. Aliasing: energy above half the sampling rate that is not filtered out before sampling appears at a false, lower frequency.
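A short sketch may make aliasing concrete. It computes the frequency at which a pure tone would appear after sampling, assuming no presampling filter at all. The function name `alias_frequency` and the 10 kHz sampling rate are assumptions chosen for illustration.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz after sampling at fs_hz."""
    f_mod = f_hz % fs_hz               # sampling cannot tell f from f + k*fs
    return min(f_mod, fs_hz - f_mod)   # fold the result into 0 .. fs/2

fs_hz = 10_000  # hypothetical sampling rate; the Nyquist frequency is 5 kHz
for f_hz in (1_000, 4_000, 6_000, 9_000):
    print(f"{f_hz} Hz is seen as {alias_frequency(f_hz, fs_hz)} Hz")
```

Tones below 5 kHz come through unchanged, while the 6 kHz and 9 kHz tones fold down to 4 kHz and 1 kHz. This is why the brickwall filter has to come before the sampler.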

Digital Signal Processing (4)
Quantization: a discrete number of amplitude levels. The more quantizer levels available, the more closely the discrete signal represents the original analog signal. In our applications, 16-bit quantizers over a 20-volt range are typical. This yields an amplitude resolution of about 300 μV and a signal-to-noise ratio of about 96 dB.
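The two figures on this slide can be checked with a little arithmetic. The sketch below assumes the step size is the voltage range divided by the number of levels, and uses the common rule of thumb 20·log10(levels), roughly 6 dB per bit, for the signal-to-noise figure. The function name `quantizer_stats` is hypothetical.

```python
import math

def quantizer_stats(bits, voltage_range):
    """Return (levels, step size in volts, level ratio in dB)."""
    levels = 2 ** bits
    step_volts = voltage_range / levels
    ratio_db = 20 * math.log10(levels)  # about 6.02 dB per bit
    return levels, step_volts, ratio_db

levels, step_volts, ratio_db = quantizer_stats(16, 20.0)
print(f"{levels} levels, {step_volts * 1e6:.0f} microvolts per step, {ratio_db:.1f} dB")
```

For 16 bits over 20 volts this gives 65536 levels, a step of about 305 μV, and about 96.3 dB, in line with the rounded values on the slide.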

Digital Signal Processing (5)
After A/D conversion, the signal is stored as a stream of numbers. The time of each sample is given by its index together with the sampling rate, and the amplitude is the stored number. In this form, many operations can be performed.
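The relation between index and time is simply division by the sampling rate. A minimal sketch, with hypothetical helper names and assuming the first sample has index 0 at time 0:

```python
def sample_time(index, fs_hz):
    """Time in seconds of the sample at a given index."""
    return index / fs_hz

def sample_index(time_s, fs_hz):
    """Index of the sample nearest to a given time in seconds."""
    return round(time_s * fs_hz)

print(sample_time(22050, 44100))  # half a second into a 44.1 kHz recording
print(sample_index(0.25, 8000))   # sample nearest 250 ms at 8 kHz
```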

Waveform Display
Duration measurements: speech changes gradually, so some consistent rules need to be adopted. Signal editing: again, some consistent rules need to be adopted. Amplitude measurements: rms is the most common. Vocal fundamental frequency can also be measured from the waveform.
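The three measurements named here can be sketched on a stored sample sequence. The slide does not say how fundamental frequency is measured; autocorrelation peak picking is one common approach and is used here purely as an assumption. The function names, the 50 to 400 Hz search range, and the synthetic 100 Hz tone are likewise illustrative.

```python
import math

def rms(samples):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_f0(samples, fs_hz, f0_min=50.0, f0_max=400.0):
    """Pick the lag with the largest autocorrelation inside the search range."""
    min_lag = int(fs_hz / f0_max)
    max_lag = int(fs_hz / f0_min)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(
            samples[n] * samples[n + lag] for n in range(len(samples) - lag)
        )
        if value > best_value:
            best_lag, best_value = lag, value
    return fs_hz / best_lag

fs_hz = 8000
tone = [math.sin(2 * math.pi * 100 * n / fs_hz) for n in range(800)]

duration_s = len(tone) / fs_hz  # duration from sample count and rate
print(duration_s, rms(tone), estimate_f0(tone, fs_hz))
```

Real speech is only roughly periodic, so a practical pitch tracker would add windowing, voicing decisions, and smoothing that this sketch leaves out.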

Digital Spectrum Analysis
The Fourier Transform revisited (FFT). Periodic waveforms can be thought of as a series of sinusoids, each with its own amplitude and phase. The Fourier Transform and the Inverse Fourier Transform allow powerful analysis-by-synthesis techniques.
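To illustrate how a transform recovers the amplitude and phase of each sinusoid, the sketch below uses a plain discrete Fourier transform computed by direct summation. The FFT gives the same result much faster; the slow form is used here only to stay within the standard library. The test signal and the function name `dft` are assumptions.

```python
import cmath
import math

def dft(samples):
    """Discrete Fourier transform by direct summation (O(N^2))."""
    n_samples = len(samples)
    return [
        sum(
            samples[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        for k in range(n_samples)
    ]

# Hypothetical periodic signal: a 100 Hz cosine plus a weaker 300 Hz sine.
fs_hz = 1600
n_samples = 160  # bin spacing is fs / N = 10 Hz
signal = [
    math.cos(2 * math.pi * 100 * n / fs_hz)
    + 0.5 * math.sin(2 * math.pi * 300 * n / fs_hz)
    for n in range(n_samples)
]

spectrum = dft(signal)
half = spectrum[: n_samples // 2]
# The 2/N scaling recovers sinusoid amplitudes for bins 1 .. N/2 - 1.
amplitudes = [2 * abs(c) / n_samples for c in half]
phases = [cmath.phase(c) for c in half]

for k in (10, 30):
    freq_hz = k * fs_hz / n_samples
    print(f"{freq_hz:.0f} Hz: amplitude {amplitudes[k]:.2f}, phase {phases[k]:.2f} rad")
```

Bin 10 (100 Hz) shows amplitude 1.0 with phase near 0, and bin 30 (300 Hz) shows amplitude 0.5 with phase near -π/2, which is how a sine appears relative to a cosine reference.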

Digital Spectrograph
This is a series of spectra based on the FFT or LPC (see below). The amplitude is depicted as shades of gray. PRAAT is an example of a digital spectrograph. Speech Filing System, Speech Station 2, Wavesurfer, and many other free or commercial spectrographs are available.

Linear Predictive Coding (1)
Speech is highly predictable over the short term. It is not hard to predict the amplitude of the next time sample of the speech waveform from a knowledge of the previous amplitudes. As few as 10 to 15 previous samples are all that is required.

LPC (2)
From statistics, we know that:
$y = a_0 + a_1 x_{-1} + a_2 x_{-2} + \dots + a_n x_{-n}$
where $y$ is the amplitude of the next sample and $x_{-k}$ is the sample $k$ steps before it. This is linear prediction.
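As a rough sketch of linear prediction, the code below fits the coefficients by least squares over a stretch of signal and then predicts one sample from the preceding ones. Several things here are assumptions rather than anything stated on the slide: the constant term a0 is taken to be zero (reasonable for a zero-mean signal), the fit uses the normal equations of the covariance method, and the helper names and the damped-cosine test signal are made up for illustration.

```python
import math

def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, size + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution

def lpc_coefficients(samples, order):
    """Least-squares a_1..a_order with x[n] ~ sum of a_k * x[n-k] (a_0 = 0)."""
    rows = range(order, len(samples))
    phi = [
        [sum(samples[n - i] * samples[n - j] for n in rows)
         for j in range(1, order + 1)]
        for i in range(1, order + 1)
    ]
    psi = [sum(samples[n] * samples[n - i] for n in rows)
           for i in range(1, order + 1)]
    return solve_linear_system(phi, psi)

def predict_next(previous, coeffs):
    """Predict the sample that follows `previous` (most recent sample last)."""
    return sum(a * previous[-k] for k, a in enumerate(coeffs, start=1))

# Hypothetical signal: a damped cosine, which two coefficients predict exactly.
x = [0.95 ** n * math.cos(0.3 * n) for n in range(200)]
coeffs = lpc_coefficients(x, order=2)
print(coeffs, predict_next(x[:-1], coeffs), x[-1])
```

A single damped resonance needs only two coefficients. Speech contains several resonances at once, which fits the 10 to 15 previous samples mentioned earlier in the deck.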

LPC (3)
Linear Predictive Coding (LPC) is one of the most powerful techniques in speech analysis. The a's in the previous equation can be used to estimate the resonances of the vocal tract. They can also represent sections of the vocal tract.
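How predictor coefficients relate to a resonance is easiest to see for a two-coefficient predictor, sketched below. It assumes the constant term is zero and that the two coefficients describe a single resonance: the predictor x[n] = a1·x[n-1] + a2·x[n-2] then has a complex pole pair at radius r and angle θ, with a1 = 2r·cos θ and a2 = -r². A full analysis with 10 to 15 coefficients would need a polynomial root finder, which is omitted here. The function name and the example values are hypothetical.

```python
import math

def resonance_from_second_order(a1, a2, fs_hz):
    """Resonance frequency and bandwidth (Hz) from two predictor coefficients.

    Assumes a complex pole pair, i.e. a2 < 0 and a1**2 < -4 * a2.
    """
    radius = math.sqrt(-a2)                      # a2 = -r^2
    angle = math.acos(a1 / (2.0 * radius))       # a1 = 2 r cos(theta)
    frequency_hz = angle * fs_hz / (2.0 * math.pi)
    bandwidth_hz = -math.log(radius) * fs_hz / math.pi
    return frequency_hz, bandwidth_hz

# Hypothetical coefficients built from a 500 Hz resonance sampled at 10 kHz.
fs_hz = 10_000
r = 0.97
theta = 2 * math.pi * 500 / fs_hz
frequency_hz, bandwidth_hz = resonance_from_second_order(
    2 * r * math.cos(theta), -r * r, fs_hz
)
print(f"{frequency_hz:.1f} Hz, bandwidth {bandwidth_hz:.1f} Hz")
```

The closer the pole radius is to 1, the narrower the bandwidth, meaning a sharper resonance.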

Wideband versus Narrowband Spectrograms
Wideband (0.005, 0.007, 0.009): short time window; good for measuring formant frequencies.
Narrowband (0.1, 0.05): long time window; good for showing and measuring harmonics.
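The trade-off can be sketched numerically. Two assumptions are made here that the slide does not state: the numbers in parentheses are taken to be analysis window lengths in seconds, and the analysis bandwidth is approximated as the reciprocal of the window length (the exact constant depends on the window shape). The function names and the 100 Hz voice are illustrative.

```python
def approx_bandwidth_hz(window_length_s):
    """Rule-of-thumb analysis bandwidth: about 1 / window length."""
    return 1.0 / window_length_s

def resolves_harmonics(window_length_s, f0_hz):
    """Harmonics are f0 apart, so they separate only if bandwidth < f0."""
    return approx_bandwidth_hz(window_length_s) < f0_hz

f0_hz = 100.0  # hypothetical voice with a 100 Hz fundamental
for window_s in (0.005, 0.007, 0.009, 0.05, 0.1):
    print(
        f"window {window_s} s: about {approx_bandwidth_hz(window_s):.0f} Hz, "
        f"harmonics resolved: {resolves_harmonics(window_s, f0_hz)}"
    )
```

Under these assumptions the short windows smear neighboring harmonics together, leaving the broad formant bands visible, while the long windows separate individual harmonics at the cost of time resolution.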