CENTER FOR SPOKEN LANGUAGE UNDERSTANDING 1 PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS Jan P.H. van Santen and Xiaochuan.


1 CENTER FOR SPOKEN LANGUAGE UNDERSTANDING
PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu
Center for Spoken Language Understanding, OGI School of Science & Technology at OHSU

2 OVERVIEW
1. IMPORTANCE OF SPECTRAL BALANCE
2. MEASUREMENT OF SPECTRAL BALANCE
3. ANALYSIS METHODS
4. RESULTS
5. SYNTHESIS
6. CONCLUSIONS

3 1. IMPORTANCE OF SPECTRAL BALANCE
Linguistic Control Factors
– Stress-like factors
– Positional factors
– Phonemic factors
Acoustic Correlates
– Traditionally TTS-controlled: Pitch, timing, amplitude
– Demonstrated in natural speech, but usually not TTS-controlled:
    Spectral tilt, balance
    Formant dynamics
    …

4 2. MEASUREMENT OF SPECTRAL BALANCE
Data:
– 472 greedily selected sentences
    Genre: newspaper
    Greedy features: linguistic control factors
– One female speaker
– Manual segmentation
– Accent: independent rating by 3 judges
    0-3 score

5 2. MEASUREMENT OF SPECTRAL BALANCE
Energy in 5 formant-range frequency bands
– B0: 100-300 Hz [~F0]
– B1: 300-800 Hz [~F1]
– B2: 800-2500 Hz [~F2]
– B3: 2500-3500 Hz [~F3]
– B4: 3500-max Hz [~fricative noise]
In other words, multidimensional measure
Filter bank → Square → Average [1 ms rect.] → 20 log10(Bi)
Subtract estimated per-utterance means
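The measurement chain above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: a plain DFT with hard band edges stands in for the filter bank, whole-frame averaging stands in for the slide's averaging window, and the function names are hypothetical. Energy is reported as 10 log10 of mean-square power, which equals 20 log10 of the RMS amplitude.

```python
import cmath
import math

# Lower band edges in Hz, taken from the slide; B4 runs up to the Nyquist frequency.
BAND_EDGES_HZ = [100.0, 300.0, 800.0, 2500.0, 3500.0]


def band_index(freq_hz, nyquist_hz):
    """Return the index of the band containing freq_hz, or None if outside all bands."""
    if freq_hz < BAND_EDGES_HZ[0] or freq_hz > nyquist_hz:
        return None
    return max(i for i, edge in enumerate(BAND_EDGES_HZ) if freq_hz >= edge)


def band_energies_db(frame, sample_rate):
    """Energy in dB of one frame in bands B0..B4 (DFT stand-in for a filter bank)."""
    n = len(frame)
    power = [0.0] * len(BAND_EDGES_HZ)
    for k in range(1, n // 2 + 1):
        band = band_index(k * sample_rate / n, sample_rate / 2.0)
        if band is None:
            continue
        # DFT coefficient of bin k, computed directly from the definition.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power[band] += abs(coeff) ** 2 / n ** 2
    floor = 1e-12  # keeps log10 finite for empty bands
    return [10.0 * math.log10(p + floor) for p in power]


def subtract_utterance_means(frames_db):
    """Subtract the per-band mean over all frames of one utterance."""
    n_bands = len(frames_db[0])
    means = [sum(f[b] for f in frames_db) / len(frames_db) for b in range(n_bands)]
    return [[f[b] - means[b] for b in range(n_bands)] for f in frames_db]
```

A frame holding a pure 1000 Hz tone should show its largest value in B2, and a 200 Hz tone in B0, which is a quick sanity check on the band edges.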

6 2. MEASUREMENT OF SPECTRAL BALANCE
Details:
– Confounding with F0
    Measure pitch-corrected and raw
      – For certain wave shapes, pitch directly related to fixed-frame energy
      – Why do both: wave shapes may change in unknown ways
    F0 not confined to B0 [female speech]
– Vowel formants not quite confined to bands [e.g., F1 for /EE/ and F3 for /ER/]

7 2. MEASUREMENT OF SPECTRAL BALANCE
Why not more or different bands?
– Multiple interacting Linguistic Control Factors
    Need measurements that minimize interactions
– 5 bands → Different vowels “behave similarly”
    Can model vowels as a class
Why not simply spectral tilt?
– 5 bands more information than single measure
– Supply more information for synthesis

8 3. ANALYSIS METHODS
Measures likely to behave like segmental duration:
– Multiple interacting, confounded factors:
    Interaction: Magnitude of the effect of one factor may depend on other factors
    Confounding: Unequal frequencies of control factor combinations
– “Directional Invariance”
    Direction of the effect of one factor is independent of other factors

9 3. ANALYSIS METHODS
Need method that
– can handle multiple interacting, confounded factors and
– takes advantage of Directional Invariance
Used: Sums of Products Model:
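The model equation on this slide did not survive in the transcript. As a hedged reconstruction, the general sums-of-products form that van Santen published for segmental duration is shown below. It uses the same index sets K and I_i as the special cases on the following slide. The symbols y (the predicted measure), f_j (the level of factor j) and S_{i,j} (a per-factor parameter) are assumed notation and may differ from what the slide showed.

```latex
y(f_0, \ldots, f_n) \;=\; \sum_{i \in K} \; \prod_{j \in I_i} S_{i,j}(f_j)
```

Under this form, K = {1} with I_1 = {0,…,n} leaves a single product over all factors, and K = {0,…,n} with I_i = {i} leaves a sum of one-factor terms.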

10 3. ANALYSIS METHODS
Special cases:
– Multiplicative model: K = {1}, I_1 = {0,…,n}
– Additive model: K = {0,…,n}, I_i = {i}

11 3. ANALYSIS METHODS
Used additive model
Note: Parameter estimates are:
– Estimates of marginal means …
– … in balanced design:
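To illustrate why additive-model parameters behave like marginal means of a balanced design, here is a minimal sketch that fits a two-factor additive model to deliberately unbalanced, noise-free toy data. The alternating-update fitting procedure, the factor names and all numbers are assumptions made for the example; the slides do not say how the model was estimated.

```python
from collections import defaultdict


def _level_means(observations, key_and_value):
    """Average the values returned by key_and_value, grouped by the returned key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for f1, f2, y in observations:
        level, value = key_and_value(f1, f2, y)
        sums[level] += value
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}


def fit_additive(observations, n_iter=200):
    """Fit y ~ mu + a[f1] + b[f2] by alternating least-squares updates.

    Only differences between levels of a factor are identifiable.
    """
    a = defaultdict(float)
    b = defaultdict(float)
    mu = 0.0
    for _ in range(n_iter):
        mu = sum(y - a[f1] - b[f2] for f1, f2, y in observations) / len(observations)
        a = _level_means(observations, lambda f1, f2, y: (f1, y - mu - b[f2]))
        b = _level_means(observations, lambda f1, f2, y: (f2, y - mu - a[f1]))
    return mu, dict(a), dict(b)


def raw_means(observations, which):
    """Uncorrected per-level means of y for factor 0 or factor 1."""
    return _level_means(observations, lambda f1, f2, y: ((f1, f2)[which], y))


# Hypothetical toy data: true stress effect is 4 dB, true position effect is 2 dB,
# and stressed vowels occur mostly in left position (a confounded design).
TRUE_STRESS = {"stressed": 2.0, "unstressed": -2.0}
TRUE_POSITION = {"left": 1.0, "right": -1.0}
COUNTS = {("stressed", "left"): 8, ("stressed", "right"): 2,
          ("unstressed", "left"): 2, ("unstressed", "right"): 8}
data = [(s, p, 10.0 + TRUE_STRESS[s] + TRUE_POSITION[p])
        for (s, p), count in COUNTS.items() for _ in range(count)]

mu, stress, position = fit_additive(data)
raw = raw_means(data, 0)
print("fitted stress effect:", stress["stressed"] - stress["unstressed"])
print("raw stress difference:", raw["stressed"] - raw["unstressed"])
```

On this toy data the fitted stress effect recovers the true difference of 4, while the raw difference of means is inflated because stress and position are confounded.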

12 3. ANALYSIS METHODS
Pitch correction:
Confounding with F0:
Show both and:

13 4. RESULTS: (A) POSITIONAL EFFECTS
5 Bands, not pitch-corrected
Solid: right position, dashed: left position. Y-axis: corrected mean

14 4. RESULTS: (A) POSITIONAL EFFECTS
5 Bands, pitch-corrected

15 4. RESULTS: (A) POSITIONAL EFFECTS
4 Bands, not pitch-corrected

16 4. RESULTS: (A) POSITIONAL EFFECTS
4 Bands, pitch-corrected

17 4. RESULTS: (B) STRESS/ACCENT EFFECTS
5 Bands, not pitch-corrected
Solid: stressed syllable, dashed: unstressed. Y-axis: corrected mean

18 4. RESULTS: (B) STRESS/ACCENT EFFECTS
5 Bands, pitch-corrected

19 4. RESULTS: (B) STRESS/ACCENT EFFECTS
4 Bands, not pitch-corrected

20 4. RESULTS: (B) STRESS/ACCENT EFFECTS
4 Bands, pitch-corrected

21 4. RESULTS: (C) TILT EFFECTS

22 5. SYNTHESIS
Use ABS/OLA sinusoidal model:
– s[n] = sum of overlapped short-time signal frames s_k[n]
– s_k[n] = sum of quasi-harmonic sinusoidal components:
    s_k[n] = \sum_l A_{k,l} \cos(\omega_{k,l} n + \phi_{k,l})
Each frame of unit is represented by a set of quasi-harmonic sinusoidal parameters;
Given the desired F0 contour, pitch shift is applied to the sinusoidal parameter component of the unit to obtain the target parameter A_{k,l};
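The overlap-add part of this model can be sketched as follows. It is a simplified illustration, not the ABS/OLA algorithm itself: the analysis-by-synthesis parameter estimation is omitted, and the triangular window, the hop size and the function names are assumptions chosen for the example.

```python
import math


def synth_frame(amps, freqs, phases, hop):
    """One frame s_k[n] = sum_l A_l cos(w_l n + phi_l).

    n runs from -hop to +hop around the frame centre; freqs are in
    radians per sample.
    """
    return [
        sum(a * math.cos(w * n + p) for a, w, p in zip(amps, freqs, phases))
        for n in range(-hop, hop + 1)
    ]


def overlap_add(frames, hop):
    """Sum triangular-windowed frames whose centres are hop samples apart."""
    out = [0.0] * ((len(frames) - 1) * hop + 1)
    for k, frame in enumerate(frames):
        centre = k * hop
        for m, value in enumerate(frame):
            n = m - hop          # offset from the frame centre
            t = centre + n       # position in the output signal
            if 0 <= t < len(out):
                out[t] += (1.0 - abs(n) / hop) * value
    return out


# Usage: a steady 200 Hz component at a 16 kHz sampling rate (hypothetical values).
hop = 80
omega = 2.0 * math.pi * 200.0 / 16000.0
frames = [synth_frame([1.0], [omega], [omega * k * hop], hop) for k in range(5)]
signal = overlap_add(frames, hop)
```

Because adjacent triangular windows sum to one, frames with matching parameters and consistent phases reproduce a continuous sinusoid; when the parameters differ between frames, the overlap cross-fades from one set to the next.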

23 5. SYNTHESIS
– From the differences in prosodic factors between the original and target unit, compute the band differences
– Transform the band differences into weights applied to the sinusoidal parameters; the weight for the j'th harmonic is taken from the i'th band when the j'th harmonic is located in the i'th band
– Apply spectral smoothing across unit boundaries
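The weighting equations from this slide were not captured in the transcript. The sketch below shows one plausible reading, stated as an assumption: each band difference is expressed in dB and converted to a linear gain 10^(Δ/20), which scales every harmonic amplitude falling in that band. The band edges come from the measurement slide; the function names and the choice to leave harmonics below 100 Hz unchanged are hypothetical.

```python
# Lower band edges in Hz for B0..B4, as listed on the measurement slide.
BAND_EDGES_HZ = [100.0, 300.0, 800.0, 2500.0, 3500.0]


def band_of(freq_hz):
    """Index of the band containing freq_hz, or None below the lowest edge."""
    if freq_hz < BAND_EDGES_HZ[0]:
        return None
    return max(i for i, edge in enumerate(BAND_EDGES_HZ) if freq_hz >= edge)


def apply_band_differences(amps, freqs_hz, band_diff_db):
    """Scale each harmonic amplitude by the dB difference of the band it lies in.

    band_diff_db[i] is the target-minus-original energy of band i in dB
    (an assumed definition).
    """
    scaled = []
    for amp, freq in zip(amps, freqs_hz):
        band = band_of(freq)
        # Harmonics outside every band are left unchanged (assumption).
        gain = 1.0 if band is None else 10.0 ** (band_diff_db[band] / 20.0)
        scaled.append(amp * gain)
    return scaled


# Usage: raise B0 by 6 dB and lower B2 by 6 dB for three example harmonics.
new_amps = apply_band_differences([1.0, 1.0, 1.0], [200.0, 1000.0, 50.0],
                                  [6.0, 0.0, -6.0, 0.0, 0.0])
```

A hard per-band gain like this produces steps at the band edges, so a real system would likely smooth the gains across frequency; the slide itself only mentions smoothing across unit boundaries.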

24 5. SYNTHESIS
5 Bands modification example [i:]

25 CONCLUSIONS
Described simple methods for predicting and synthesizing spectral balance
But: Spectral balance is only one “non-standard acoustic correlate”
Others that remain to be addressed:
– Spectral dynamics
– Phase

