Speaking Style Conversion. Dr. Elizabeth Godoy. Speech Processing Guest Lecture, December 11, 2012.

2 Speaking Style Conversion Dr. Elizabeth Godoy Speech Processing Guest Lecture December 11, 2012

3 Apply voice conversion (VC) principles to a different problem…

4 Speech Intelligibility Context
- Speech is often heard in adverse conditions
  - Noisy environments
  - Listener has difficulty hearing/understanding
- How to transform speech to make it more intelligible…?
  - To make speech synthesis systems more effective
Example of speech with environmental barriers (audio samples: noise, no noise):
- The speech is not very intelligible!

5 Intelligible Speaking Styles
I. Lombard speech
   - Speaker is immersed in noise
   - Human reflex to increase the speech loudness
II. Clear speech
   - Listener faces barrier (noise, hearing, language, …)
   - Speaker adapts strategy to increase speech clarity
[Audio samples: normal, Lombard; casual, clear]

6 VC to improve speech intelligibility?
- Voice Conversion
  - Modify speech to change the speaker identity
  - Learn transformation from source-to-target speaker
- Speaking Style Conversion
  - Modify speech to improve intelligibility
  - Determine transformation from normal-to-intelligible style
- Spectral Envelope: still very important!

7 Overview: Analyses-to-Modifications
I. Acoustic analyses to identify (mainly spectral) characteristics of Lombard & Clear styles
   i. Average Spectra
   ii. Vowel Spaces
II. Results of the analyses inspire spectral modifications to improve intelligibility
   i. Spectral energy band boosting (corrective filters)
   ii. Formant shifting (frequency warping)

8 Corpora
- Lombard-normal: Grid
  - 8 speakers (4 male, 4 female)
  - 50 sentences each
  - Lombard: Ninf96, most extreme (Lu & Cooke)
- Clear-casual: LUCID read sentences
  - 8 speakers (4 male, 4 female)
  - 50 sentences each
  - Read speech: most exaggerated (Baker & Hazan)

9 Average Relative Spectra
- Recall Amplitude Scaling in DFWA
- The average relative spectrum is similar:
  - Difference between normal (X) and intelligible (Y) style
  - Averaged across all frames
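
A minimal sketch of the computation described on this slide, not the lecture's actual code. It assumes the spectra are magnitude spectra (or spectral envelopes) stored as one row per frame, that the difference is taken in dB, and that each style is averaged over its own frames. The function name and the `eps` guard are hypothetical.

```python
import numpy as np

def average_relative_spectrum(X_frames, Y_frames, eps=1e-12):
    """Average relative spectrum in dB, per frequency bin.

    X_frames: normal-style magnitude spectra, shape (n_frames_x, n_bins)
    Y_frames: intelligible-style magnitude spectra, shape (n_frames_y, n_bins)
    The two styles may have different numbers of frames.
    """
    # Convert magnitudes to dB; eps avoids log(0) on silent bins (assumption).
    X_db = 20.0 * np.log10(np.asarray(X_frames, dtype=float) + eps)
    Y_db = 20.0 * np.log10(np.asarray(Y_frames, dtype=float) + eps)
    # Average each style across all of its frames, then take Y minus X.
    return Y_db.mean(axis=0) - X_db.mean(axis=0)
```

Positive values mark frequency bins where the intelligible style carries more energy on average than the normal style.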

10 Average Relative Spectra (by Speaker)
[Figure panels: Lombard-normal; Clear-casual]

11 Average Relative Spectra (Overall)
- Lombard speech: Spectral energy boosting “where formants are” (~500-4500 Hz)
- Clear speech: Varies depending on speaker strategy; extent of differences mild overall

12 Vowel Spaces (average for all speakers)
- Lombard speech: Vowel Space Translation
- Clear speech: Vowel Space Expansion
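
One way to tell translation from expansion numerically is sketched below. It assumes each vowel is summarised by a mean (F1, F2) point in Hz and that the points are listed in order around the vowel polygon. The function name is hypothetical and this is not necessarily how the lecture's vowel spaces were measured.

```python
import numpy as np

def vowel_space_stats(formants):
    """Return (mean point, polygon area) of a vowel space.

    formants: sequence of (F1, F2) pairs in Hz, one per vowel,
              ordered around the polygon (assumption).
    """
    pts = np.asarray(formants, dtype=float)
    f1, f2 = pts[:, 0], pts[:, 1]
    # Shoelace formula for the area of a simple polygon.
    area = 0.5 * abs(np.dot(f1, np.roll(f2, -1)) - np.dot(f2, np.roll(f1, -1)))
    # Mean of the vowel points (not the area-weighted polygon centroid).
    return pts.mean(axis=0), area
```

Under this view, a translation moves the mean point while leaving the area unchanged, and an expansion increases the area.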

13 Inspiration for Speech Modifications
1. Spectral energy band boosting (Lombard)
2. Vowel space expansion (Clear)
- Features associated with increased speech intelligibility
  - Though not observed together in human speech production…
  - Signal processing algorithms can accomplish both!

14 Spectral Energy Band Boosting
- Corrective Filters
[Figure panels: Lombard-inspired & Enhanced (high SII); Corrective Filter: Varying Gain]
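
The slides do not specify the corrective filter's design, so the following is only a rough sketch of band boosting. It applies a flat gain to one frequency band in the FFT domain. The ~500-4500 Hz band comes from the Lombard analysis earlier in the deck. The 6 dB gain, the brick-wall band edges and the function name are assumptions made for illustration.

```python
import numpy as np

def boost_band(signal, fs, f_lo=500.0, f_hi=4500.0, gain_db=6.0):
    """Boost the energy of `signal` between f_lo and f_hi (Hz) by gain_db.

    Hypothetical helper: a flat, brick-wall gain applied in the frequency
    domain, not the lecture's actual corrective filter.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    # Unit gain everywhere, linear gain of 10^(dB/20) inside the band.
    gain = np.ones_like(freqs)
    gain[(freqs >= f_lo) & (freqs <= f_hi)] = 10.0 ** (gain_db / 20.0)
    return np.fft.irfft(spectrum * gain, n=len(signal))
```

A practical filter would likely use smooth band edges and process the signal frame by frame. Making the gain a parameter is a nod to the "Varying Gain" panel.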

15 Frequency Warping for Vowel Space (VS) Expansion
- Curve fitting formant shifts inspires warping…
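
As an illustration of frequency warping in general, the sketch below moves spectral-envelope peaks with a piecewise-linear warping function defined by anchor frequencies. The fitted curve from the lecture is not reproduced here. The anchor values in the usage note and the function name are hypothetical.

```python
import numpy as np

def warp_envelope(envelope, freqs, src_anchors, dst_anchors):
    """Warp a spectral envelope so content at src_anchors moves to dst_anchors.

    envelope:    envelope values sampled at `freqs` (Hz, increasing)
    src_anchors: increasing source frequencies, including 0 and Nyquist
    dst_anchors: increasing target frequencies, same length as src_anchors
    """
    freqs = np.asarray(freqs, dtype=float)
    # Inverse warp: for each output frequency, find the source frequency
    # that the piecewise-linear warping function maps onto it.
    src_freqs = np.interp(freqs, dst_anchors, src_anchors)
    # Read the original envelope at those source frequencies.
    return np.interp(src_freqs, freqs, envelope)
```

For example, with hypothetical anchors `[0, 1000, 8000]` mapped to `[0, 1200, 8000]`, an envelope peak at 1000 Hz moves to 1200 Hz while 0 Hz and 8000 Hz stay fixed.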

16 Sound Samples
With Noise (SSN, 0 dB)
- Original
- Warp
- Boost
- BW
No Noise
- Original
- WarpE
- Boost
- BW

17 Want more?
- See Maria’s presentation for more details…

18 Voice & Speaking Style Conversion Parallels
- Voice Conversion
  - Dynamic Frequency Warping + Amplitude Scaling (based on acoustic-phonetic spaces of source & target speakers)
- Speaking Style Conversion
  - Frequency Warping + Corrective Filter
    1. Clear-speech inspired frequency warping for vowel space expansion
    2. Lombard-speech inspired corrective filters to increase loudness

19 Thank you! More Questions?

20 Extras…

21 Objective Metrics for Evaluation
I. Loudness
   - Energy in frequency bands weighted based on human hearing
II. Speech Intelligibility Index (SII)
   - Energy & modulations in frequency bands relative to a noise masker
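
To make "energy in frequency bands relative to a noise masker" concrete, here is a heavily simplified SII-style index. It is not the standardised SII and not the extended SII used in the lecture. It keeps only the core idea that each band contributes according to its speech-to-noise ratio, mapped linearly from -15 dB to +15 dB onto 0 to 1. Uniform band importance is an assumption, and hearing thresholds, spread of masking and modulations are all omitted. The function name is hypothetical.

```python
import numpy as np

def sii_like_index(speech_band_db, noise_band_db, importance=None):
    """Simplified SII-style index in [0, 1] from per-band levels in dB.

    speech_band_db, noise_band_db: band levels for speech and masker
    importance: band weights summing to 1 (uniform if omitted; assumption)
    """
    snr = np.asarray(speech_band_db, dtype=float) - np.asarray(noise_band_db, dtype=float)
    # Band audibility: 0 at -15 dB SNR or below, 1 at +15 dB SNR or above.
    audibility = np.clip((snr + 15.0) / 30.0, 0.0, 1.0)
    if importance is None:
        importance = np.full(snr.shape, 1.0 / snr.size)
    return float(np.sum(np.asarray(importance, dtype=float) * audibility))
```

With equal speech and noise levels in every band the index is 0.5. Raising speech energy in heavily weighted bands raises the index, which is the effect band boosting aims for.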

22 Loudness Distributions
- Lombard speech: “louder” for voiced (bi-modal)
- Clear speech: not “louder” than casual speech
- Transients: neither style distinguishes on average

23 Extended SII Distributions
- extSII highly correlated with average loudness
- Lombard speech objectively more intelligible
- Clear speech intelligibility gain not captured by extSII
  - Limitations of objective intelligibility metrics

24 Observations from Analyses
- Lombard Speech
  - Spectral boosting in inclusive formant region
  - Increase in Loudness (also extSII)
  - Vowel space translation, but no expansion
- Clear Speech
  - Small changes in average spectra (slight spectral “flattening”)
  - Consistent vowel space expansion
    - Greater vowel discrimination
- Comparison between styles
  - Acoustic differences translate into perceptual distinctions linked to intelligibility gains
  - Spectral boosting & Vowel space expansion: mutually exclusive

