
The Effect of Glottal Opening on the Acoustic Response of the Vocal Tract
Anna Barney, Antonio De Stefano (ISVR, University of Southampton, UK) & Nathalie Henrich (LAM, Université Paris VI, France)

Introduction
We are interested in the interaction between the voice source and the vocal tract. We hope that a better understanding of source-tract interaction will improve the naturalness of synthesised speech.

Structure of this talk
– Types of source-tract interaction
– Effect of source-tract interaction on formant frequencies: theory
– Mechanical model
– Measurement of the effect of source-tract interaction: static
– Measurement of the effect of source-tract interaction: dynamic
– Conclusions & future work

Assumptions of source-filter theory
– The source and the vocal-tract filter do not interact
– Non-linear effects are normally lumped into the source model
– Formants are the resonances of the vocal tract, calculated when the glottal impedance is infinite

Source Tract Interaction (STI)
Childers & Wong (1994) define three principal types of STI:
– Loading of the source by the vocal-tract impedance
– Dissipation of vocal-tract energy through the glottal opening (mainly at F1)
– Carry-over of energy from one glottal period to the next (when glottal damping is low)
(D.G. Childers and C.-F. Wong, 'Measuring and Modeling Vocal Source-Tract Interaction', IEEE Transactions on Biomedical Engineering, Vol. 41, No. 7, 1994)

Source Tract Interaction (STI)
Flanagan (Speech Analysis, Synthesis and Perception, 1965) considered the effect of a finite glottal impedance on a transmission-line model of the vocal tract.
[Figure: transmission-line model comprising the subglottal vocal tract, the glottis and the supraglottal vocal tract, with elements labelled Z_a, Z_b, Z_g, Z_l and Z_o]

Source Tract Interaction (STI)
Flanagan stated that a finite glottal impedance would raise F1 and increase formant damping. He predicted an increase in F1 of 1.4% for a glottal area of 5 mm².

Source Tract Interaction (STI)
Ananthapadmanabha & Fant (1982) found the theoretical effect of glottal inertance to be small (T.V. Ananthapadmanabha and G. Fant, 'Calculation of the true glottal volume velocity and its components', Speech Communication 1 (1982)).

Source Tract Interaction (STI)
Badin & Fant (1984) modelled the sub-glottal system as a short circuit and the glottis as an inductance only. With a glottal area of … mm², F1 increased by 0.2% (P. Badin and G. Fant, 'Notes on vocal tract computation', STL-QPSR 2-3/1984).
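The direction of this effect can be checked numerically. The sketch below is a minimal illustration, assuming a lossless uniform duct, zero pressure at the lips, a short-circuited sub-glottal system and a glottis represented by inertance only, so that the formants satisfy cos(kL) + (Z_c/(ωL_g)) sin(kL) = 0. The air constants, the 3 mm glottal thickness in the hypothetical helper `glottal_inertance` and the glottal areas in the usage lines are illustration values, not the authors' parameters, so only the direction of the shift is meaningful, not its size.

```python
import numpy as np

RHO = 1.14   # air density in kg/m^3 (assumed value)
C = 350.0    # speed of sound in m/s (assumed value)


def glottal_inertance(glottal_area, thickness=3e-3):
    """Acoustic mass rho*d/A_g of a short glottal channel (3 mm thickness is hypothetical)."""
    return RHO * thickness / glottal_area


def first_formant(length, area, inertance=None, f_max=2000.0, df=0.5):
    """Lowest resonance (Hz) of a lossless uniform duct with zero pressure at the lips.

    inertance=None models a closed glottis (infinite glottal impedance). Otherwise the
    glottis is a pure inertance Z_g = j*omega*inertance to a short-circuited subglottal
    system, and the resonances are zeros of cos(kL) + Z_c/(omega*inertance) * sin(kL).
    """
    f = np.arange(df, f_max, df)
    omega = 2 * np.pi * f
    kl = omega * length / C
    char = np.cos(kl)
    if inertance is not None:
        z_c = RHO * C / area  # characteristic impedance of the duct
        char = char + z_c / (omega * inertance) * np.sin(kl)
    i = np.nonzero(np.sign(char[:-1]) != np.sign(char[1:]))[0][0]
    # Linear interpolation between the two grid points that bracket the first zero.
    return f[i] - char[i] * (f[i + 1] - f[i]) / (char[i + 1] - char[i])


L_TRACT, A_TRACT = 0.175, 2.89e-4  # 17.5 cm long duct with a 2.89 cm^2 cross-section
f1_closed = first_formant(L_TRACT, A_TRACT)
f1_open = [first_formant(L_TRACT, A_TRACT, glottal_inertance(a)) for a in (1e-6, 5e-6, 20e-6)]
```

Under these assumptions `f1_closed` comes out at about 500 Hz (the quarter-wave value c/4L), and the entries of `f1_open` rise with glottal area while staying below the 1000 Hz half-wave limit that a zero glottal impedance would give.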

Measurements on real speech
Formant estimates are known to vary depending on where in the pitch period the analysis window is placed. F1 values estimated using group-delay characteristics and a minimum-phase assumption are generally a little higher during the open phase than during the closed phase (B. Yegnanarayana and R. Veldhuis, IEEE Transactions on Speech and Audio Processing, 6(4), 1998). Closed-phase formant analysis is used to obtain estimates of the vocal-tract formants that are reliably decoupled from any sub-glottal formants (L.C. Wood and D.J.P. Pearce, IEE Proceedings, 136, Pt. 1, No. …).

Source Tract Interaction (STI)
Shifts in F1 may be small, but they may correlate with:
– changes in glottal OQ and/or
– changes in glottal amplitude
They may therefore be of interest when considering voice quality and the naturalness of synthesis. Also, the glottal areas considered in the literature are always at the small end of the range found in normal voicing.

Flanagan’s model
We implemented Flanagan’s transmission-line model with a uniform duct of length 17.5 cm and area 2.89 cm² to explore how the formants change as the glottal width increases.
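As a reference point for this duct, the ideal closed-glottis case (infinite glottal impedance, zero pressure at the lips) has the quarter-wave resonances F_n = (2n − 1)c/(4L). The sketch assumes c = 350 m/s; for a uniform lossless duct the cross-sectional area does not affect these frequencies.

```python
def closed_open_resonances(length_m, n_modes=3, c=350.0):
    """Resonances (2n - 1) * c / (4L) of an ideal uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_modes + 1)]


formants = closed_open_resonances(0.175)  # approximately [500, 1500, 2500] Hz for c = 350 m/s
```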

The formant shift – theory
[Figure: log amplitude vs frequency (Hz)]

Theoretical modelling of the formant shift – static glottis
To match our experimental measurements we elaborated on Flanagan’s model:
– We used 4 T-sections for the supra-glottal vocal tract, with the other parameters chosen to match those of our mechanical model
– We chose the boundary condition at the lips to match the boundary condition in our measurements
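To show what a T-section discretisation does, here is a minimal sketch, assuming a uniform lossless 17.5 cm duct split into equal lumped sections (half the segment inertance in each series arm, the segment compliance as the shunt element), a closed glottis and zero pressure at the lips. These are illustration choices, not the parameters of the authors' model. With p = 0 at the lips the glottal volume velocity equals D·U_lips, so resonances are zeros of the D element of the cascaded ABCD matrix.

```python
import numpy as np

RHO, C = 1.14, 350.0  # assumed air density (kg/m^3) and speed of sound (m/s)


def t_section(omega, seg_len, area):
    """ABCD matrix of one lossless lumped T-section: series L/2, shunt C, series L/2."""
    half_series = 1j * omega * (RHO * seg_len / area) / 2      # half of the segment inertance
    shunt = 1j * omega * (area * seg_len / (RHO * C ** 2))      # admittance of the segment compliance
    series_m = np.array([[1, half_series], [0, 1]])
    shunt_m = np.array([[1, 0], [shunt, 1]])
    return series_m @ shunt_m @ series_m


def lumped_f1(n_sections=4, length=0.175, area=2.89e-4, df=0.5):
    """F1 (Hz) of a T-section ladder with a closed glottis and zero pressure at the lips."""
    f = np.arange(df, 1000.0, df)
    d = np.array([
        np.linalg.matrix_power(t_section(2 * np.pi * fi, length / n_sections, area), n_sections)[1, 1].real
        for fi in f
    ])
    i = np.nonzero(np.sign(d[:-1]) != np.sign(d[1:]))[0][0]
    # Interpolate the first zero crossing of the D element.
    return f[i] - d[i] * (f[i + 1] - f[i]) / (d[i + 1] - d[i])


f1_four = lumped_f1(4)
f1_forty = lumped_f1(40)
```

With four sections the estimate lands slightly below the 500 Hz quarter-wave value (within 1% of it), and adding sections moves it closer.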

Theoretical modelling of the formant shift – glottal impedance model
Flanagan (1965) and others, for a finite glottal impedance:
[equation not preserved in transcript]
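For orientation only, here is a sketch of the glottal impedance in the form commonly quoted from Flanagan (1965): a viscous resistance R_v = 12μdl²/A³ plus a flow-dependent kinetic resistance R_k = 0.875ρ|U|/(2A²), in series with an inertance L_g = ρd/A. Treat the constants, the slit length and thickness, and the mean flow in the usage line as assumptions; the version used in the talk may differ in detail.

```python
import numpy as np

MU = 1.86e-5  # dynamic viscosity of air, Pa*s (assumed)
RHO = 1.14    # air density, kg/m^3 (assumed)


def glottal_impedance(f, width, length=0.018, thickness=3e-3, flow=0.0):
    """Glottal impedance R_v + R_k + j*omega*L_g of a rectangular slit, SI units.

    width: glottal opening (m); length, thickness and mean flow (m^3/s) are hypothetical defaults.
    """
    area = width * length
    r_viscous = 12 * MU * thickness * length ** 2 / area ** 3   # Poiseuille-type slit resistance
    r_kinetic = 0.875 * RHO * abs(flow) / (2 * area ** 2)       # flow-dependent kinetic resistance
    inertance = RHO * thickness / area                          # acoustic mass of the air in the slit
    return r_viscous + r_kinetic + 1j * 2 * np.pi * f * inertance


z = [glottal_impedance(500.0, w * 1e-3, flow=2e-4) for w in (1, 2, 3)]  # widths of 1, 2, 3 mm
```

Every term falls as the glottal area grows, which is why wider glottal openings load the tract more strongly.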

Theoretical modelling of the formant shift – glottal impedance model
Laine & Karjalainen (1986):
[equation and definitions not preserved in transcript]

Theoretical modelling of the formant shift – glottal impedance model
Rösler & Strube (1989):
[equation and definitions not preserved in transcript]

Theoretical modelling of the formant shift – glottal impedance model
How should we model the sub-glottal impedance? Speech models often assume that the lower end of the trachea is a fully absorbing boundary (reflection coefficient r = 0), so that there are no sub-glottal resonances.
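The consequence of r = 0 can be seen by treating the sub-glottal airway as a uniform lossless tube whose far end has pressure reflection coefficient r. This is a sketch; the tube dimensions in the usage lines are hypothetical. With r = 0 the input impedance equals the characteristic impedance ρc/A at every frequency, so there is nothing to resonate; with 0 < r < 1 peaks and dips appear.

```python
import numpy as np


def tube_input_impedance(f, length, area, r_end, rho=1.14, c=350.0):
    """Input impedance of a lossless uniform tube whose far end has pressure reflection coefficient r_end.

    r_end = 0 is a fully absorbing termination; r_end = 1 is a rigid end.
    """
    k = 2 * np.pi * np.asarray(f, dtype=float) / c
    z_c = rho * c / area
    refl = r_end * np.exp(-2j * k * length)  # reflected wave referred back to the input
    return z_c * (1 + refl) / (1 - refl)


freqs = np.linspace(50.0, 2000.0, 400)
z_absorbing = tube_input_impedance(freqs, 0.15, 2.5e-4, r_end=0.0)   # hypothetical trachea size
z_reflecting = tube_input_impedance(freqs, 0.15, 2.5e-4, r_end=0.9)
```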

Theoretical modelling of the formant shift – glottal impedance model
We wanted to compare our theoretical model with measurements, so we tried all three glottal impedance models and a range of sub-glottal impedance models to find the best fit to the data.
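The results table later in the talk reports an MSE for each model, so a natural selection rule is to pick the candidate with the lowest mean squared error against the measured transfer function. The sketch below assumes MSE in dB over matching frequency points. The numbers and model names in the usage lines are toy values, not measurements from the talk.

```python
import numpy as np


def mse_db(measured_db, predicted_db):
    """Mean squared error (dB^2) between measured and predicted transfer-function magnitudes."""
    measured = np.asarray(measured_db, dtype=float)
    predicted = np.asarray(predicted_db, dtype=float)
    return float(np.mean((measured - predicted) ** 2))


def best_fit(measured_db, candidates):
    """Return (name, scores) for the candidate prediction with the lowest MSE."""
    scores = {name: mse_db(measured_db, pred) for name, pred in candidates.items()}
    return min(scores, key=scores.get), scores


# Toy values for illustration only; these are not the measurements from the talk.
measured = [-2.0, 4.0, 12.0, 5.0]
name, scores = best_fit(measured, {
    "model_a": [-1.0, 5.0, 10.0, 4.0],
    "model_b": [-2.0, 4.5, 11.5, 5.0],
})
```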

The Mechanical Model
We made our measurements of the F1 shift using a mechanical model of the larynx and vocal tract.

The mechanical model

Shutter driver system
[Figure: the shutter region]

Schematic diagram of the model
[Figure: schematic showing the pressure-transducer positions pt1, pt2 and pt… and the flow; all dimensions in mm, not to scale]

Instrumentation
– Rotameter: inlet volume flow rate
– Manometer: mean pressure upstream
– Entran EPE-54 miniature pressure transducers (diameter 2.36 mm, range 0 to 14 kPa): time-varying pressure at the duct wall at up to 4 locations
– Shutter driving signal: shutter position
All time histories are captured by a simultaneous-sampling ADC connected to a PC, with a sampling frequency of 8928 Hz.

Experimental measurements – static case
– Glottal widths of 0, 1, 2 and 3 mm
– Excitation provided by a loudspeaker at the duct outlet: discrete tones between 300 Hz and 2 kHz
– The loudspeaker modified the duct boundary condition at the “lips” so that it was closer to a closed-end condition; the impedance there was held constant throughout the measurements

Experimental measurements – static case
– 2 pressure transducers between the “glottis” and the “lips”, separated by 80 mm
– Standing-wave component pressure amplitudes extracted as specified by K.R. Holland & P.O.A.L. Davies (The measurement of sound power flux in flow ducts, Journal of Sound and Vibration 230 (2000))
– Transfer function from “glottis” to “lips” obtained
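The idea behind extracting standing-wave components from two transducers can be sketched with the textbook two-microphone decomposition. This is not the Holland & Davies procedure itself: it assumes plane waves, no mean flow, no losses and an exp(jωt) convention, p(x) = A e^(−jkx) + B e^(jkx). The helper names are hypothetical. Once A and B are known, the pressure can be evaluated at any position along the duct.

```python
import numpy as np


def decompose_waves(p1, p2, f, x1, spacing=0.08, c=350.0):
    """Split complex pressures at x1 and x1 + spacing into forward (A) and backward (B) waves.

    Assumes p(x) = A*exp(-1j*k*x) + B*exp(1j*k*x); singular when k*spacing is a multiple of pi.
    """
    k = 2 * np.pi * f / c
    x2 = x1 + spacing
    m = np.array([[np.exp(-1j * k * x1), np.exp(1j * k * x1)],
                  [np.exp(-1j * k * x2), np.exp(1j * k * x2)]])
    a, b = np.linalg.solve(m, np.array([p1, p2], dtype=complex))
    return a, b


def pressure_at(a, b, f, x, c=350.0):
    """Reconstruct the complex pressure at position x from the two wave amplitudes."""
    k = 2 * np.pi * f / c
    return a * np.exp(-1j * k * x) + b * np.exp(1j * k * x)


a_true, b_true = 1.0 + 0.5j, 0.3 - 0.2j
p1 = pressure_at(a_true, b_true, 700.0, 0.05)
p2 = pressure_at(a_true, b_true, 700.0, 0.13)
a_est, b_est = decompose_waves(p1, p2, 700.0, 0.05)  # recovers a_true and b_true
```

With an 80 mm spacing and c = 350 m/s the first singular frequency is c/(2 × 0.08) = 2187.5 Hz, just above the 2 kHz upper limit of the excitation tones.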

Transfer function from glottis to lips – measured & theoretical, static case
[Figure, repeated over four slides: transfer-function magnitude (dB), measured and theoretical]

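[Table: MSE between measured and predicted transfer functions for each glottal width (rows, starting at 1 mm) and each glottal impedance model (columns: Flanagan model with a Flanagan factor of 6/5; L & K model; R & S model); the numerical values are not preserved in the transcript]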

[Figure: transfer function (dB) for glottal widths of 0, 1, 2 and 3 mm]

Static case – summary
– F1 and F2 increased with increasing glottal width
– Predicted values of F1 (799 Hz, 854 Hz, 882 Hz, 896 Hz) match the measurements well
– The increase in F1 between a closed glottis and a 1 mm wide glottis is ~6%
– The increase in F1 between a closed glottis and a 3 mm wide glottis is ~13%
– The increase in F1 is larger than found by previous researchers, perhaps because we used greater glottal widths

Dynamic experimental measurements
– How do our measurements for the static case transfer to a model excited by a vibrating larynx?
– What is the dependence of F1 on the open quotient?
– What is the dependence of F1 on the glottal amplitude?

Experimental measurements – dynamic
Moving shutters:
– 10–40 Hz square-wave excitation
– OQ: 20, 40, 60, 80%
– Glottal width: 0.25 mm to 4 mm
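An idealised version of this drive can be written down directly: the glottal width is at its peak for a fraction OQ of each cycle and zero otherwise. The sketch is hypothetical in its details. Real shutters have finite opening and closing times, and the default sampling rate is simply the 8928 Hz ADC rate from the instrumentation slide.

```python
import numpy as np


def shutter_width(f0, open_quotient, peak_width_mm, fs=8928.0, duration=1.0):
    """Idealised rectangular glottal-width waveform, open for open_quotient of each cycle."""
    t = np.arange(int(round(fs * duration))) / fs
    phase = (t * f0) % 1.0  # position within the current cycle, 0..1
    return np.where(phase < open_quotient, peak_width_mm, 0.0)


width = shutter_width(f0=20.0, open_quotient=0.4, peak_width_mm=2.0)  # 20 Hz, OQ = 40%, 2 mm
```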

Peak glottal width versus OQ for all excitation frequencies
[Figure: glottal amplitude vs open quotient]

Pressure time history at p1 in the duct
[Figure: pressure (Pa) vs time (s), with closure and opening instants marked]

Experimental measurements – dynamic
F1 frequency found from AR spectral estimation:
– AR analysis uses the whole glottal cycle to ensure that STI effects are included in the analysis
– AR analysis uses the Yule-Walker algorithm with an order of ceil((Fs/1000) + 2) = 11 (Fs = 8928 Hz)

Experimental measurements – dynamic
The F1 peak is defined as the maximum of the spectrum between 200 Hz and 1 kHz. A data set was rejected if no peak was visible in this range, hence the small data set for OQ = 80%.
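The F1 procedure on these two slides can be sketched with SciPy. The interpretation is an assumption: biased autocorrelation estimates, a 4096-point AR spectrum, and "peak visible" taken to mean a local maximum of the AR spectrum inside the 200 Hz to 1 kHz band. `resonator_noise` is a hypothetical helper that produces a synthetic single-resonance test signal; it is not part of the authors' analysis.

```python
import math

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import find_peaks, freqz, lfilter


def ar_order(fs):
    """Order rule quoted on the slide: ceil(fs/1000 + 2), which gives 11 for fs = 8928 Hz."""
    return math.ceil(fs / 1000 + 2)


def yule_walker_f1(x, fs, f_lo=200.0, f_hi=1000.0, order=None):
    """Estimate F1 as the highest AR spectral peak in [f_lo, f_hi]; None if no peak is found."""
    order = ar_order(fs) if order is None else order
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[: n - lag], x[lag:]) / n for lag in range(order + 1)])
    # Yule-Walker equations: Toeplitz(r[0..p-1]) a = r[1..p]
    a = solve_toeplitz(r[:order], r[1:])
    freqs, h = freqz([1.0], np.concatenate(([1.0], -a)), worN=4096, fs=fs)
    psd = np.abs(h) ** 2
    peaks, _ = find_peaks(psd)
    in_band = peaks[(freqs[peaks] >= f_lo) & (freqs[peaks] <= f_hi)]
    if in_band.size == 0:
        return None  # mirrors rejecting a data set with no visible peak
    return float(freqs[in_band[np.argmax(psd[in_band])]])


def resonator_noise(f_res, fs, n, radius=0.98, seed=0):
    """Synthetic test signal: white noise through a single two-pole resonance at f_res."""
    theta = 2 * np.pi * f_res / fs
    noise = np.random.default_rng(seed).standard_normal(n)
    return lfilter([1.0], [1.0, -2 * radius * np.cos(theta), radius ** 2], noise)
```

For example, `yule_walker_f1(resonator_noise(600.0, 8928.0, 8928), 8928.0)` should return a value close to 600 Hz.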

AR analysis

Frequency of F1 for changing glottal width and OQ
[Figure: F1 (Hz) vs glottal width (mm)]

Summary – dynamic measurements
– F1 increases with increasing glottal width for a fixed OQ
– F1 increases with increasing OQ for a fixed glottal width, at least at small glottal widths
– Observed values of F1 are much higher than normally predicted for an open-closed tube of the same length, or expected for real speech

Theoretical model – dynamic
Simulink model, adapted from one created by Nicolas Montgermont and Benoit Fabre (LAM) for investigating the flute.

Simulink model of the dynamic case
[Figure: block diagram with blocks for the duct model, the switchable glottal impedance and the glottal excitation]

Pressure time history at p1 – simulated
[Figure: simulated pressure time history with open and closed phases marked]

F1 values for dynamic simulation

Simulation – summary
– The simulation does show a change in formant frequency as OQ changes
– The increase in F1 is much smaller than observed in the dynamic model experiments
– The dynamic (mechanical) model has much greater damping than the simulation, especially during closure

Future work
– To make a theoretical model of the formant shift in the dynamic case that matches the measurements more closely
– To make similar measurements in real speakers