1 Advances in Speech Synthesis
Advances in simulation of sentence-level speech production with kinematic models of the vocal tract and vocal folds
ASA Fall 2009 – San Antonio, TX
Brad Story, Speech, Language, and Hearing Sciences, University of Arizona
Research supported by NIH R

2 Goal: Develop a model to facilitate understanding of the human sound production system:
- nonlinear interaction of source and filter (vocal tract)
- acoustics produced by the vocal folds and vocal tract:
  - anatomic/physiologic scaling
  - time-varying changes
- perceptual response to sounds produced by the system
Brad Story, U. of Arizona

3 To some degree, the model should allow access to three levels:
- Coordinated movement
- Speech production
- Speech perception
Implication: Model must produce intelligible speech at the syllable, word, or sentence level

4 “Replicate” Natural Speech
Fleshpoint tracking: map temporal patterns of vocal tract shape change, constriction locations,… to model parameters
Tracking acoustic characteristics: map acoustic characteristics to parameters for vocal fold vibration and vocal tract shape
[Figure: “The black cat”, formant tracks F1, F2, F3; axes: Time, Frequency (Hz)]
Data: U. Wisconsin X-ray microbeam

5 Parts of the Model
1. Voice source*: kinematic vocal fold model. Driven motion of the medial surfaces; glottal flow is interactive with vocal tract pressures
2. Vocal tract (and trachea): 1D tubular waveguide
*Titze, I.R. (2006). The myoelastic aerodynamic theory of phonation, NCVS, pp

6 1. Model of Vocal Fold Kinematics
Adductory maneuver + vibrational displacement
Glottal width: slowly-varying postural component plus vibrational displacement
[Figure: medial surfaces of the vocal folds, labeled L and T]

7 1. Model of Vocal Fold Kinematics
Adductory maneuver + vibrational displacement
Glottal width: slowly-varying postural component plus vibrational displacement
[Figure: medial surfaces of the vocal folds and the resulting glottal area]
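
The decomposition on this slide can be sketched numerically. What follows is a minimal illustration, not the kinematic model used in the talk: it assumes a rectangular glottis of fixed length, a constant postural separation, and a purely sinusoidal vibrational displacement. The function names and all parameter values (`f0`, `posture`, `amplitude`, `length`) are hypothetical.

```python
import math

def glottal_width(t, f0=110.0, posture=0.0005, amplitude=0.001):
    """Glottal width in metres at time t (seconds).

    posture   : slowly-varying postural component (held constant here)
    amplitude : peak vibrational displacement
    Both values are illustrative assumptions, not taken from the slides.
    """
    vibration = amplitude * math.sin(2.0 * math.pi * f0 * t)
    # A negative sum means the folds are in contact, so clip at zero.
    return max(0.0, posture + vibration)

def glottal_area(t, length=0.016, **kwargs):
    """Area of an assumed rectangular glottis: length times width."""
    return length * glottal_width(t, **kwargs)

if __name__ == "__main__":
    period = 1.0 / 110.0
    for k in range(5):
        t = k * period / 4.0
        print(f"t = {t * 1000:.2f} ms, area = {glottal_area(t) * 1e6:.2f} mm^2")
```

Raising `posture` keeps the glottis open over more of the cycle, while lowering it toward or below zero lengthens the closed phase. In this toy version, that is the sense in which an adductory maneuver changes the glottal area waveform.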

8 Example: “Typical” vibration
[Figure: glottal area; medial surfaces of the vocal folds]

9 2. Model of Time-Varying Vocal Tract Shape
Three Categories of Vocal Tract Movements
Shaping: slowly-varying changes to the shape of the entire vocal tract - vowels
Valving: modulate vowels with constrictions - consonants
Tuning: modify parts of the vocal tract shape to enhance voice quality or facilitate voice production
Gracco, V.L. (1992); Perkell, J. (1969); Ohman, S. E. G. (1966; 1967).

10 TubeTalker*
Hierarchical control tiers
Tier I: Overall vocal tract shaping
Tier II: Valving
Composite time-varying vocal tract
*Story (2005). JASA, 117,

11 Tier I (Shaping): Derived from principal component analysis of vocal tract shapes for a collection of vowels from a specific speaker
[Figure: average (mean) VT shape Ω with perturbations +φ1, -φ1, +φ2, -φ2; [F1, F2] vowel space]
Story and Titze (1998), J. Phonetics; Story (2005), JASA
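
A rough sketch of how a mean shape and two principal components could be combined into a vocal tract shape. It assumes (the slide does not say so) that the analysis is carried out on equivalent diameters of circular cross-sections, so that area is π/4 times the squared diameter. The mode shapes, the four-section tract, and the names `area_function`, `MEAN_SHAPE`, `MODE_1`, `MODE_2` are invented for illustration; the cited papers are the place to check the actual formulation.

```python
import math

def area_function(mean_shape, mode1, mode2, q1, q2):
    """Cross-sectional areas from a mean shape plus two weighted modes.

    mean_shape, mode1, mode2 : equal-length lists, one value per tube section
    q1, q2                   : scaling coefficients for the two modes
    Assumes the lists hold equivalent diameters (an assumption of this sketch).
    """
    areas = []
    for omega, phi1, phi2 in zip(mean_shape, mode1, mode2):
        # A diameter cannot be negative; zero means the section is closed.
        diameter = max(0.0, omega + q1 * phi1 + q2 * phi2)
        areas.append(math.pi / 4.0 * diameter ** 2)
    return areas

# Toy four-section tract, glottis to lips; all values are made up.
MEAN_SHAPE = [1.0, 1.5, 2.0, 1.5]
MODE_1 = [-0.3, -0.1, 0.2, 0.4]
MODE_2 = [0.2, -0.3, -0.1, 0.3]

if __name__ == "__main__":
    for q1, q2 in [(0.0, 0.0), (1.0, 0.0), (0.0, -1.0)]:
        areas = area_function(MEAN_SHAPE, MODE_1, MODE_2, q1, q2)
        print(q1, q2, [round(a, 2) for a in areas])
```

With both coefficients at zero the result is the mean shape itself. Moving a coefficient in the positive or negative direction deforms the whole tract along one mode, which is what the plus and minus panels on the slide depict.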

12 Tier I (Shaping): Mapping of [F1,F2] frequencies to model coefficients
“The black cat”
[Figure: [F1, F2] vowel space and the corresponding [q1, q2] coefficient space]
Interpolate between trajectories
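
One way to picture a formant-to-coefficient mapping is a lookup table built from a forward model and inverted by interpolation. The sketch below is only that picture. `toy_formants` is a made-up linear stand-in for a real acoustic calculation, and inverse-distance weighting over the nearest table entries is an interpolation scheme chosen here for brevity, not necessarily the one used in the talk.

```python
import math

def toy_formants(q1, q2):
    """Hypothetical forward map from coefficients to (F1, F2) in Hz."""
    return (500.0 + 100.0 * q1 + 30.0 * q2,
            1500.0 - 50.0 * q1 + 200.0 * q2)

def build_table(step=0.5, limit=2.0):
    """Tabulate the forward map on a regular grid of (q1, q2) values."""
    n = int(round(2 * limit / step))
    grid = [-limit + i * step for i in range(n + 1)]
    return [((q1, q2), toy_formants(q1, q2)) for q1 in grid for q2 in grid]

def formants_to_coeffs(f1, f2, table, k=3):
    """Estimate (q1, q2) from the k table entries nearest in formant space."""
    scored = sorted((math.hypot(f1 - f[0], f2 - f[1]), q) for q, f in table)
    nearest = scored[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # exact hit on a table entry
    weights = [1.0 / d for d, _ in nearest]
    total = sum(weights)
    q1 = sum(w * q[0] for w, (_, q) in zip(weights, nearest)) / total
    q2 = sum(w * q[1] for w, (_, q) in zip(weights, nearest)) / total
    return (q1, q2)

TABLE = build_table()

if __name__ == "__main__":
    # A short, invented formant trajectory mapped frame by frame.
    for f1, f2 in [(520.0, 1550.0), (560.0, 1600.0), (600.0, 1500.0)]:
        print((f1, f2), formants_to_coeffs(f1, f2, TABLE))
```

Applying the inverse map to each frame of a measured [F1, F2] track yields a [q1, q2] trajectory that can then drive the shaping tier.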

13 “The black cat”: Tier I vowel transitions
Overall shape changes
Postural component
Vocal tract deformation due to articulation

14 “The black cat” – vowel transitions only
Tier I: modulation of the overall shape of the vocal tract
[Figure: time-varying formant frequencies F1, F2, F3 (continuous)]

15 “The black cat”: Vocal Fold and Respiratory Parameters
Fundamental frequency
Separation of the vocal folds at the vocal processes
Respiratory pressure

16 Tier II (consonant valving):
requires specification of constriction location, degree of closure, and temporal characteristics
[Figure panels: constriction location/degree; time course of the constriction]
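
The three ingredients named here (location, degree, time course) can be combined into a single scaling function applied to a vowel-shaped tube. This is a hedged sketch only: the Gaussian spatial profile, the raised-cosine time course, and the names `constriction_scale` and `apply_valving` are assumptions made for illustration, not the functions defined in the cited TubeTalker paper.

```python
import math

def constriction_scale(x, t, loc, width, t_on, t_off, degree=1.0):
    """Multiplier in [0, 1] for the cross-section at position x and time t.

    loc, width  : centre and spatial extent of the constriction
    t_on, t_off : start and end of the constriction gesture
    degree      : 1.0 gives full closure at the gesture midpoint
    """
    if t <= t_on or t >= t_off:
        return 1.0  # no constriction outside the gesture
    phase = (t - t_on) / (t_off - t_on)
    # Raised cosine: 0 at onset and offset, 1 at the midpoint.
    magnitude = degree * 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))
    spatial = math.exp(-0.5 * ((x - loc) / width) ** 2)
    return 1.0 - magnitude * spatial

def apply_valving(areas, positions, t, loc, width, t_on, t_off, degree=1.0):
    """Impose the constriction on an existing (vowel) area function."""
    return [a * constriction_scale(x, t, loc, width, t_on, t_off, degree)
            for a, x in zip(areas, positions)]

if __name__ == "__main__":
    positions = [float(i) for i in range(0, 18, 2)]  # cm from the glottis
    vowel = [2.0] * len(positions)                   # uniform toy tube
    for t in (0.00, 0.05, 0.10):
        closed = apply_valving(vowel, positions, t, loc=8.0, width=1.5,
                               t_on=0.0, t_off=0.2)
        print(t, [round(a, 2) for a in closed])
```

Changing only `loc` moves the closure along the tract while leaving the underlying vowel shape and timing untouched.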

17 “The black cat”: Constriction location and timing
Movement in the midsagittal plane
Time-varying cross-distance
Data: U. Wisconsin X-ray microbeam

18 “The black cat”
Tier I - Shaping: vowel shapes are imposed on the neutral vocal tract
Tier II - Valving: consonant perturbations are imposed on the vowel substrate

19 “The black cat”: Shaping + Valving = composite time-varying vocal tract shape
[Figure: time-varying formant frequencies F1, F2, F3 for vowel transitions only and for vowel transitions + consonant perturbations]

20 Spectrographic Comparison: Simulated/Natural

21 Modification: change constriction location
“The black cat” “The black bat”

22 Modification: change constriction location and open nasal port
“The black cat” “The black gnat”
[Figure legend: marker = nasal port open]

23 “black cat” “black bat” “black gnat”

24 Variations of the phrase
Black cat – vowels
Black cat – vowels w/adduct
Black cat – normal
Black cat – widened epilarynx
Black cat – constricted epilarynx
Black cat – long duration, tremor
Black cat – altered F0 contour
Black cat – short VT, high F0
Black cat – shortened VT, increased F0
Black cat – Halloween voice

25 The End
Operation Black Cat

