93 SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES.

Slides:

Advertisements

Similar presentations

UNLOCKING THE SECRETS Patterns in Data. Techniques Plotting Log-log, semi-log, scaling Statistics Frequency domain (fft) xmgrace xy plotting tool Derivatives.

Advertisements

Acoustic/Prosodic Features

Figures for Chapter 7 Advanced signal processing Dillon (2001) Hearing Aids.

Acoustic Characteristics of Consonants

Analysis and Digital Implementation of the Talk Box Effect Yuan Chen Advisor: Professor Paul Cuff.

Liner Predictive Pitch Synchronization Voiced speech detection, analysis and synthesis Jim Bryan Florida Institute of Technology ECE5525 Final Project.

ACHIZITIA IN TIMP REAL A SEMNALELOR. Three frames of a sampled time domain signal. The Fast Fourier Transform (FFT) is the heart of the real-time spectrum.

Fundamentals of Data & Signals (Part II) School of Business Eastern Illinois University © Abdou Illia, Spring 2015 (February18, 2015)

SPPA 403 Speech Science1 Unit 3 outline The Vocal Tract (VT) Source-Filter Theory of Speech Production Capturing Speech Dynamics The Vowels The Diphthongs.

Lock-in amplifiers Signals and noise Frequency dependence of noise Low frequency ~ 1 / f –example: temperature (0.1 Hz), pressure.

The frequency spectrum

Abstract Binaural microphones were utilised to detect phonation in a human subject (figure 1). This detection was used to cut the audio waveform in two.

Filtering Filtering is one of the most widely used complex signal processing operations The system implementing this operation is called a filter A filter.

Chapter 7 Principles of Analog Synthesis and Voltage Control Contents Understanding Musical Sound Electronic Sound Generation Voltage Control Fundamentals.

CHAPTER 4 DIGITAL MODULATION Part 1.

Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.

ACOUSTICAL THEORY OF SPEECH PRODUCTION

The Human Voice Chapters 15 and 17. Main Vocal Organs Lungs Reservoir and energy source Larynx Vocal folds Cavities: pharynx, nasal, oral Air exits through.

Itay Ben-Lulu & Uri Goldfeld Instructor : Dr. Yizhar Lavner Spring /9/2004.

PH 105 Dr. Cecilia Vogel Lecture 14. OUTLINE  consonants  vowels  vocal folds as sound source  formants  speech spectrograms  singing.

Eva Björkner Helsinki University of Technology Laboratory of Acoustics and Audio Signal Processing HUT, Helsinki, Finland KTH – Royal Institute of Technology.

Vowels Vowels: Articulatory Description (Ferrand, 2001) Tongue Position.

Anatomic Aspects Larynx: Sytem of muscles, cartileges and ligaments.

3.1 Chapter 3 Data and Signals Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

SPPA 403 Speech Science1 Unit 3 outline The Vocal Tract (VT) Source-Filter Theory of Speech Production Capturing Speech Dynamics The Vowels The Diphthongs.

Voice Transformations Challenges: Signal processing techniques have advanced faster than our understanding of the physics Examples: – Rate of articulation.

Pitch Prediction for Glottal Spectrum Estimation with Applications in Speaker Recognition Nengheng Zheng Supervised under Professor P.C. Ching Nov. 26,

A PRESENTATION BY SHAMALEE DESHPANDE

Magnitude and Phase Measurements

Page 1 Time_vs_Freq.ppt 9/21/99 Time vs. Frequency September 21, 1999 Ron Denton.

Physics 434 Module 3 - T. Burnett 1 Physics 434 Module 3 Acoustic excitation of a physical system.

Basics of Signal Processing. frequency = 1/T  speed of sound × T, where T is a period sine wave period (frequency) amplitude phase.

Lock-in amplifiers

Fundamentals of...Fundamentals of... Fundamentals of Digital Audio.

Representing Acoustic Information

Source/Filter Theory and Vowels February 4, 2010.

DSP. What is DSP? DSP: Digital Signal Processing---Using a digital process (e.g., a program running on a microprocessor) to modify a digital representation.

Ni.com Data Analysis: Time and Frequency Domain. ni.com Typical Data Acquisition System.

DIGITAL VOICE NETWORKS ECE 421E Tuesday, October 02, 2012.

GG 313 Lecture 26 11/29/05 Sampling Theorem Transfer Functions.

Fourier Concepts ES3 © 2001 KEDMI Scientific Computing. All Rights Reserved. Square wave example: V(t)= 4/  sin(t) + 4/3  sin(3t) + 4/5  sin(5t) +

MUSIC 318 MINI-COURSE ON SPEECH AND SINGING

Dual-Channel FFT Analysis: A Presentation Prepared for Syn-Aud-Con: Test and Measurement Seminars Louisville, KY Aug , 2002.

Acoustic Analysis of Speech Robert A. Prosek, Ph.D. CSD 301 Robert A. Prosek, Ph.D. CSD 301.

ECE 598: The Speech Chain Lecture 7: Fourier Transform; Speech Sources and Filters.

Chapter 16 Speech Synthesis Algorithms 16.1 Synthesis based on LPC 16.2 Synthesis based on formants 16.3 Synthesis based on homomorphic processing 16.4.

Speech Science Fall 2009 Oct 28, Outline Acoustical characteristics of Nasal Speech Sounds Stop Consonants Fricatives Affricates.

1 Analysis and Synthesis of Pathological Vowels Prospectus Brian C. Gabelman 6/13/2003.

Eva Björkner Helsinki University of Technology Laboratory of Acoustics and Audio Signal Processing HUT, Helsinki, Finland KTH – Royal Institute of Technology.

Introduction to SOUND.

Vibrationdata 1 Unit 9 White Noise FFT. Vibrationdata 2 Fourier Transform, Sine Function A Fourier transform will give the exact magnitude and frequency.

ELECTRONIC COMMUNICATIONS A SYSTEMS APPROACH CHAPTER Copyright © 2014 by Pearson Education, Inc. All Rights Reserved Electronic Communications: A Systems.

Developing a model to explain and stimulate the perception of sounds in three dimensions David Kraljevich and Chris Dove.

Structure of Spoken Language

Speech Science VI Resonances WS Resonances Reading: Borden, Harris & Raphael, p Kentp Pompino-Marschallp Reetzp

Lecture 2: Measurement and Instrumentation. Time vs. Frequency Domain Different ways of looking at a problem –Interchangeable: no information is lost.

P105 Lecture #27 visuals 20 March 2013.

Acoustic Phonetics 3/14/00.

FREQUENCY (HZ) LOG10 MAGNITUDE Figure 2.30a. PSD of original voice

Physics 434 Module 4 - T. Burnett 1 Physics 434 Module 4 Acoustic excitation of a physical system: time domain.

SOUNDS RECORDING AND REPRODUCTION The Volume of the Wave n The Amplitude is a measure of volume n The wave pink is softer than the blue wave. n It represents.

Copyright  Muhammad A M Islam. SBE202A Introduction to Electronics 1 10/2/2016 Introduction to Electronics.

P105 Lecture #26 visuals 18 March 2013.

Lock-in amplifiers

Fundamentals of Data & Signals (Part II)

ESTIMATED INVERSE SYSTEM

Naoki Watanabe et al. BTS 2017;2:

The Production of Speech

Presentation transcript:

93 SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM SOURCE TIME SERIES VOCAL TRACT TRANSFER FUNCTION VOICE TIME SERIES VOICE SPECTRUM CASE 0: NORMAL SOURCE AND NORMAL VOCAL TRACT CASE 1: “BREATHY” SOURCE AND NORMAL VOCAL TRACT CASE 2: NORMAL SOURCE AND ABNORMAL VOCAL TRACT Figure 3.1. Ambiguity between the source and vocal tract models is illustrated with three examples. In case 0, a normal glottal source and vocal tract give rise to a normal voice; the normal LF glottal flow derivative waveform, normal vocal tract log spectrum transfer function magnitude, and normal voice time series and spectrum for /a/ are plotted. Case 1 shows a pathological glottal source missing the normal sharp return; it is convolved with a normal vocal tract model to give rise to a pathological /a/ with sinusoidal appearance and missing high frequency formants. This voice is perceived as “breathy.” Case 2 combines the normal glottal source with a vocal tract model with greatly attenuated high frequency formants; the resulting voice time series and spectrum is exactly the same as case 1. Thus, working backwards from the resulting time series of cases 1 and 2 (as is attempted in inverse filtering and formant analysis), it is impossible to arrive at the correct vocal tract and source model without additional assumptions or information.

94 PC #1: STIMULUS GEN. STIMULUS SIGNAL D/A AMPLIFIER XDUCER /a/ ACOUSTIC CONDUIT PC #2: DAQ SIGNAL CONDITIONER A/D MEM. MICROPHONE CH 1 CH 2 VOCAL TRACT TUBE OF LENGTH L WITH CLOSED END. FORMANTS AT F = C/4L, 3C/4L, 5C/4L,... Figure 3.2. System setup for external stimulation of the vocal tract. Two PCs are used: PC #1 generates a stimulus signal to excite the vocal tract. An audio amplifier and transducer transform the stimulus to sound, which is then conducted via an acoustic conduit to either a quarter wave tube model or the subject’s vocal tract. PC #2 is used to record the stimulus and response signals. A sample of the stimulus is acquired on channel 1 of an analog to digital converter and is stored to memory. The response is recorded with a small microphone, conditioned and acquired on channel 2 of the analog to digital converter. The tube model is used for proof of principle and system validation. SIGNAL CONDITIONER

95 EXTERNAL STIMULUS AMP & XDCR DYNAMICS H1(s) ROOM/FACE ACOUSTICS H2(s) A/D CH 1 GLOTTAL SOURCE S0 S1 + + VOCAL TRACT V(s) MIC & SIG COND. H3(s) S2 A/D CH 2 x(t) r(t) g(t) Figure 3.3. Model of external source formant analysis system. In addition to the normal glottal source g(t) and vocal tract V(s) pathway, an external source x(t) is added to stimulate the vocal tract. Stimulus electronics, transducer, and environmental acoustics are modeled in H1 and H2; microphone and signal conditioning are modeled in H3. Switches S0 and S2 represent presence/absence of x(t) and g(t); S1 represents presence/absence of the vocal tract (mouth open or shut).

x SPECTRUM WITH RAG, W/O RAG, & MAG OF T.F. FREQUENCY (Hz) MAGNITUDE (dB) |HB(f)| |HA(f)| |V(f)| TUBE FORMANTS Figure 3.4. Validation of external source testing setup using a simple quarter wave tube closed at one end. HA is the spectrum recorded by the microphone near the open end of the tube when a 0 – 10KHz stimulus sinewave sweep is executed. HB is the “background” spectrum recorded when the tube is stuffed with a rag, effectively eliminating the tube formants. V is HA – HB, revealing the expected quarter wave tube formants. All spectra become noise above 10KHz at the end of the sweep.

FREQUENCY (Hz) MAGNITUDE (dB) VOCAL TRACT FREQ. RESP. MOUTH SHUT FREQ. RESP. MOUTH OPEN FREQ. RESP. Figure 3.5. Vocal tract transfer function for /a/ with a normal male voice. Analogous to Fig 3.4, the chirp response spectra of mouth open/shut (top curves) are subtracted to yield the vocal tract frequency response (bottom). The expected formants for /a/ are present, plus additional peaks. Ripples at 94 Hz spacing in the top curves due to acoustic conduit resonances are effectively cancelled by the subtraction process.

COMPARE TF USING MOUTH OPEN SWEEP1 VS SWEEP2 FREQUENCY (Hz) MAGNITUDE (dB) F1F2F3F4 Figure 3.6. Drift in formant peaks during testing. In order to estimate effects of articulator movements, the vocal tract calculation is repeated using different chirps from the same time series. In this case, the peaks remained fairly constant with about 100 Hz change at 3200 Hz ( 3% ).

FREQUENCY (Hz) MAGNITUDE (dB) F1 F3 F4 LP FFT EXT SRC T.F.’S F2 Figure 3.7. Comparison of traditional FFT/LP analysis with the ES analysis of Fig Formant peak shifts between FFT/LP and ES resonances are probably due to involuntary articulator movement from the vocalizing to non-vocalizing transition. Many resonances are shown in the ES data that are not revealed by FFT/LP.

TIME (SEC) SIGNALS MIC SIGNAL (TOP), EXT. SRC (BOT) EXTERNAL SOURCE MOUTH OPEN MIC SIGNAL MOUTH SHUT VOCALIZING /a/ PULSE+CHIRP Figure 3.8. Time history of vocal tract audio output (top) and the ES (external source) stimulus (bottom). Periodic pulse + sine chirps are interspersed with periods of silence, during which vocalization (alone) can be recorded (0.3 s to 0.8 s). The resonances of the vocal tract are clearly visible in the mouth open responses at 1.7 s and 2.6 s. Note that the first chirp occurs during vocalizing; this was done to allow collection of data for future analysis using correlation techniques to separate the responses to chirp and glottis.

F1F2 F3F4 LP FFT EXT SRC T.F.’S Figure 3.9. FFT/LP/ES analysis of another instance of normal male /a/. Compare with Fig 3.9b which is a breathy /a/ from the same time series. MAGNITUDE (dB) FREQUENCY (Hz)

FREQUENCY (Hz) MAGNITUDE (dB) F1F2 F3F4 LP FFT EXT SRC T.F.’S Figure 3.9b. FFT/LP/ES analysis of a breathy /a/ from the same time series as Fig. 3.9.

x SAMPLE # FSR=44100 NORMALIZED SIGNAL EXTERNAL SOURCE MOUTH OPEN MIC SIGNAL MOUTH SHUT BREATHY /a/ PULSE+ CHIRP MODAL /a/ Figure Time history of vocal tract output and ES (analogous to Fig. 3.8) with an added breathy segment at the beginning. The voice is the author’s.

FREQUENCY (Hz) MAGNITUDE (dB) F1F2 F3 F4 LP FFT EXT SRC T.F. Figure FFT/LP/ES analysis of a normal female /a/. Compare to Fig 3.11b which contains a breathy /a/ from the same time series.

FREQUENCY (Hz) MAGNITUDE (dB) F1F2 F4 LP FFT EXT SRC T.F. F4 Figure 3.11b. FFT/LP/ES analysis of a female breathy /a/. Compare to Fig 3.11 which contains a normal /a/ from the same time series.