Speaking Style Conversion
Dr. Elizabeth Godoy
Speech Processing Guest Lecture
December 11, 2012
Apply voice conversion (VC) principles to a different problem…
Speech Intelligibility Context
Speech is often heard in adverse conditions:
- noisy environments
- a listener who has difficulty hearing or understanding
How can speech be transformed to make it more intelligible? One goal is to make speech synthesis systems more effective.
Example of speech with environmental barriers (audio: noise / no noise): the speech is not very intelligible!
Intelligible Speaking Styles
I. Lombard speech
- The speaker is immersed in noise.
- Human reflex: increase the speech loudness.
II. Clear speech
- The listener faces a barrier (noise, hearing, language, …).
- The speaker adapts a strategy to increase speech clarity.
Audio examples: normal vs. Lombard; casual vs. clear
VC to Improve Speech Intelligibility?
Voice Conversion
- Modify speech to change the speaker identity.
- Learn the transformation from source to target speaker.
Speaking Style Conversion
- Modify speech to improve intelligibility.
- Determine the transformation from normal to intelligible style.
The spectral envelope is still very important!
Overview: Analyses-to-Modifications
I. Acoustic analyses identify the (mainly spectral) characteristics of the Lombard and clear styles:
  i. average spectra
  ii. vowel spaces
II. The results of the analyses inspire spectral modifications to improve intelligibility:
  i. spectral energy band boosting (corrective filters)
  ii. formant shifting (frequency warping)
Corpora
Lombard-normal: GRID
- 8 speakers (4 male, 4 female), 50 sentences each
- Lombard condition Ninf96: the most extreme (Lu & Cooke)
Clear-casual: LUCID read sentences
- 8 speakers (4 male, 4 female), 50 sentences each
- Read speech: the most exaggerated (Baker & Hazan)
Average Relative Spectra
Recall amplitude scaling in DFWA (dynamic frequency warping with amplitude scaling).
The average relative spectrum is similar: it is the difference between the normal (X) and intelligible (Y) styles, averaged across all frames.
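As a minimal sketch, assuming the relative spectrum is taken in the log (dB) domain as the frame-averaged log-magnitude spectrum of the intelligible style (Y) minus that of the normal style (X), the computation looks like this. The helper name and the dB convention are illustrative assumptions, not the exact procedure from the lecture; averaging each style separately avoids needing a frame alignment between X and Y.

```python
import math

def average_relative_spectrum(x_frames, y_frames, eps=1e-12):
    """Hypothetical helper: average relative spectrum in dB.

    x_frames: magnitude spectra of normal-style frames (equal-length lists)
    y_frames: magnitude spectra of intelligible-style frames (same bin count)
    Returns, per frequency bin, mean dB(Y) minus mean dB(X).
    """
    def mean_db(frames):
        sums = [0.0] * len(frames[0])
        for frame in frames:
            for k, mag in enumerate(frame):
                # eps guards against log10(0) on silent bins
                sums[k] += 20.0 * math.log10(mag + eps)
        return [s / len(frames) for s in sums]

    mean_x = mean_db(x_frames)
    mean_y = mean_db(y_frames)
    return [my - mx for mx, my in zip(mean_x, mean_y)]

# Toy example: Y equals X except that the middle bin is 10x larger (+20 dB).
x = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
y = [[1.0, 10.0, 1.0], [2.0, 20.0, 2.0]]
print([round(v, 3) for v in average_relative_spectrum(x, y)])  # [0.0, 20.0, 0.0]
```

Positive values mark bins where the intelligible style carries more energy on average, which is how a band boost would show up.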
Average Relative Spectra (by Speaker)
[Figure: average relative spectra per speaker; panels: Lombard-normal, clear-casual]
Average Relative Spectra (Overall)
- Lombard speech: spectral energy boosting “where the formants are” (~500-4500 Hz).
- Clear speech: varies with the speaker's strategy; the differences are mild overall.
Vowel Spaces (average over all speakers)
- Lombard speech: vowel space translation
- Clear speech: vowel space expansion
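One way to make "translation" versus "expansion" concrete, assuming a vowel space is summarized by the polygon of mean (F1, F2) values of a few corner vowels, is to compare the polygon's area (expansion) and its centroid (translation) between styles. The helper names and the formant values below are hypothetical, chosen only to illustrate the two cases.

```python
def polygon_area(points):
    """Shoelace area of a polygon given as ordered (F1, F2) vertices in Hz."""
    twice_area = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def centroid(points):
    """Mean of the vertices: a simple summary of where the vowel space sits."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def compare_vowel_spaces(ref, mod):
    """Return (area ratio, centroid shift in Hz) of `mod` relative to `ref`."""
    ratio = polygon_area(mod) / polygon_area(ref)
    (c1, c2), (d1, d2) = centroid(ref), centroid(mod)
    return ratio, (d1 - c1, d2 - c2)

# Hypothetical corner vowels /i/, /a/, /u/ as (F1, F2) in Hz.
normal = [(300.0, 2300.0), (750.0, 1300.0), (350.0, 900.0)]
# Translation: every vowel moves up 100 Hz in F1; the area is unchanged.
translated = [(f1 + 100.0, f2) for f1, f2 in normal]
# Expansion: vowels pushed 20 % away from the centroid; the area grows by 1.44.
c = centroid(normal)
expanded = [(c[0] + 1.2 * (f1 - c[0]), c[1] + 1.2 * (f2 - c[1])) for f1, f2 in normal]

print(compare_vowel_spaces(normal, translated))
print(compare_vowel_spaces(normal, expanded))
```

An area ratio near 1 with a nonzero shift reads as translation; a ratio above 1 with a near-zero shift reads as expansion.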
Inspiration for Speech Modifications
1. Spectral energy band boosting (Lombard)
2. Vowel space expansion (clear)
Both features are associated with increased speech intelligibility. Though they are not observed together in human speech production, signal processing algorithms can accomplish both!
Spectral Energy Band Boosting
Corrective filters: Lombard-inspired and enhanced (high SII)
[Figure: corrective filter, varying gain]
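A corrective filter of this kind can be sketched as a frequency-dependent gain applied to each frame's magnitude spectrum. The band edges (500 to 4500 Hz, borrowed from the Lombard average-relative-spectra observation), the 6 dB peak gain, the raised-cosine transitions, and the function names are all illustrative assumptions, not the filters used in this work; a real system would apply the gain inside an STFT analysis/overlap-add loop.

```python
import math

def corrective_gain_db(freq_hz, low=500.0, high=4500.0, peak_db=6.0, ramp=250.0):
    """Hypothetical band-boost gain in dB: 0 dB outside the band, peak_db
    inside it, with raised-cosine ramps of width `ramp` Hz at each edge."""
    if freq_hz <= low - ramp or freq_hz >= high + ramp:
        return 0.0
    if low <= freq_hz <= high:
        return peak_db
    # Position inside the transition region, normalised to [0, 1].
    if freq_hz < low:
        t = (freq_hz - (low - ramp)) / ramp
    else:
        t = ((high + ramp) - freq_hz) / ramp
    return peak_db * 0.5 * (1.0 - math.cos(math.pi * t))

def apply_corrective_filter(magnitudes, sample_rate, **gain_kwargs):
    """Scale a one-sided magnitude spectrum (bins 0..N/2) by the gain curve."""
    nyquist = sample_rate / 2.0
    n_bins = len(magnitudes)
    out = []
    for k, mag in enumerate(magnitudes):
        f = k * nyquist / (n_bins - 1)  # bin centre frequency in Hz
        out.append(mag * 10.0 ** (corrective_gain_db(f, **gain_kwargs) / 20.0))
    return out

flat = [1.0] * 9  # 9 bins spanning 0..8000 Hz at fs = 16 kHz (1 kHz spacing)
print([round(m, 2) for m in apply_corrective_filter(flat, 16000)])
# [1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]
```

The smooth ramps avoid abrupt gain steps at the band edges; a varying gain, as on the slide, would replace the flat `peak_db` with a frequency-dependent curve.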
Frequency Warping for Vowel Space (VS) Expansion
Curve fitting of the formant shifts inspires the warping…
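A piecewise-linear warping function is one simple way to turn observed formant shifts into a frequency warping; it is a swapped-in illustration, not necessarily the curve fit used in this work. Anchor pairs (source frequency, target frequency) move each formant region, and the warped envelope is obtained by reading the original envelope at the inverse-warped frequency, S'(f) = S(W^{-1}(f)). The anchor values and helper names are hypothetical.

```python
def warp_points(anchors, nyquist):
    """Knots of a piecewise-linear warp W(f): (0, 0), the anchors, (nyq, nyq).

    anchors: (source_hz, target_hz) pairs, e.g. mean formant frequencies of
    the normal style mapped to desired (expanded) positions. Both coordinates
    must be strictly increasing so that W is invertible.
    """
    pts = [(0.0, 0.0)] + sorted(anchors) + [(float(nyquist), float(nyquist))]
    return [p[0] for p in pts], [p[1] for p in pts]

def interp(x, xs, ys):
    """Linear interpolation of y(x) on the increasing grid xs (clamped)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]

def warp_envelope(envelope, nyquist, anchors):
    """Warp a one-sided spectral envelope so that S'(W(f)) = S(f)."""
    src, tgt = warp_points(anchors, nyquist)
    n = len(envelope)
    freqs = [k * nyquist / (n - 1) for k in range(n)]
    warped = []
    for f in freqs:
        f_src = interp(f, tgt, src)  # inverse warp: W^{-1}(f)
        warped.append(interp(f_src, freqs, envelope))
    return warped

nyq = 4000.0
freqs = [k * 100.0 for k in range(41)]
env_db = [-abs(f - 1000.0) / 100.0 for f in freqs]  # toy envelope, peak at 1000 Hz
out = warp_envelope(env_db, nyq, [(1000.0, 1200.0)])
peak_bin = max(range(len(out)), key=out.__getitem__)
print(peak_bin * 100.0)  # 1200.0: the peak has moved from 1000 Hz to 1200 Hz
```

Fixing 0 Hz and the Nyquist frequency keeps the warp inside the band; adding more anchors (one per formant) would move each formant region independently.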
Sound Samples
With noise (speech-shaped noise, SSN, at 0 dB SNR): Original, Warp, Boost, BW
No noise: Original, WarpE, Boost, BW
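The noisy samples are mixed with SSN at 0 dB. As a sketch of how such a mixture is usually produced, the noise is rescaled so that the ratio of speech power to noise power equals the target SNR before it is added. Generating the SSN itself (typically noise shaped to the long-term average speech spectrum) is outside this sketch; the sine tone and Gaussian noise below are stand-ins, and `mix_at_snr` is a hypothetical name.

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it so that
    10*log10(P_speech / P_noise) == snr_db, with power = mean squared sample.
    Returns (mixture, noise_gain)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Gain that brings the noise power to p_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)], gain

random.seed(0)
fs = 16000
speech = [0.5 * math.sin(2 * math.pi * 220 * t / fs) for t in range(fs)]  # stand-in speech
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]  # stand-in for SSN
mixture, g = mix_at_snr(speech, noise, 0.0)

p_s = sum(s * s for s in speech) / len(speech)
p_n = sum((g * n) ** 2 for n in noise) / len(noise)
# Achieved SNR in dB; equals the 0 dB target up to floating-point rounding.
print(10 * math.log10(p_s / p_n))
```

At 0 dB the scaled noise carries as much power as the speech, which is why intelligibility drops so clearly in the noisy originals.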
Want more? See Maria's presentation for more details…
Voice & Speaking Style Conversion Parallels
Voice Conversion
- Dynamic frequency warping + amplitude scaling (based on the acoustic-phonetic spaces of the source and target speakers)
Speaking Style Conversion
- Frequency warping + corrective filter:
  1. clear-speech-inspired frequency warping for vowel space expansion
  2. Lombard-speech-inspired corrective filters to increase loudness
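In a common notation (an assumption here, not taken from the slides), both systems can be written as a warp of the input spectral envelope S_X followed by a frequency-dependent gain. For voice conversion, W comes from the source and target acoustic-phonetic spaces and A(f) is the amplitude-scaling term; for speaking style conversion, W is the clear-speech-inspired warp and A(f) is the Lombard-inspired corrective filter gain.

```latex
\hat{S}_Y(f) = A(f)\,S_X\!\left(W^{-1}(f)\right),
\qquad\text{or in dB:}\qquad
\hat{S}_Y^{\mathrm{dB}}(f) = A^{\mathrm{dB}}(f) + S_X^{\mathrm{dB}}\!\left(W^{-1}(f)\right)
```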
Thank you! More Questions?
Extras…
Objective Metrics for Evaluation
I. Loudness: energy in frequency bands, weighted according to human hearing
II. Speech Intelligibility Index (SII): energy and modulations in frequency bands relative to a noise masker
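The core of the SII (ANSI S3.5-1997) is a band-importance-weighted sum of band audibilities, where each band's audibility maps its SNR linearly from -15 dB (audibility 0) to +15 dB (audibility 1). The sketch below implements only that core, computed per short frame and then averaged over time, which is the basic idea behind the extended SII for fluctuating maskers. It omits the standard's hearing-threshold, level-distortion, and spread-of-masking terms, and the equal band-importance weights are placeholders, not the standard's tables; all function names are hypothetical.

```python
def band_audibility(snr_db):
    """Map a band SNR (dB) to audibility in [0, 1]: -15 dB -> 0, +15 dB -> 1."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

def sii_frame(speech_band_db, noise_band_db, importance):
    """Simplified SII of one frame: importance-weighted mean band audibility."""
    total = sum(importance)
    return sum(w * band_audibility(s - n)
               for s, n, w in zip(speech_band_db, noise_band_db, importance)) / total

def extended_sii(speech_frames_db, noise_frames_db, importance):
    """Average the frame-level SII over time (extended-SII style)."""
    values = [sii_frame(s, n, importance)
              for s, n in zip(speech_frames_db, noise_frames_db)]
    return sum(values) / len(values)

importance = [1.0, 1.0, 1.0, 1.0]  # placeholder weights, 4 hypothetical bands
speech = [[60.0, 60.0, 60.0, 60.0], [75.0, 75.0, 60.0, 60.0]]  # band levels, dB
noise = [[60.0, 60.0, 60.0, 60.0], [60.0, 60.0, 60.0, 60.0]]
print(extended_sii(speech, noise, importance))  # 0.625
```

Frame 1 has 0 dB SNR in every band (audibility 0.5 each, SII 0.5); frame 2 raises two bands to +15 dB (SII 0.75), so the time average is 0.625. Boosting energy in high-importance bands raises the index, which is why a loudness increase tends to move extSII with it.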
Loudness Distributions
- Lombard speech: “louder” for voiced segments (bimodal distribution)
- Clear speech: not “louder” than casual speech
- Transients: neither style stands out on average
Extended SII Distributions
- extSII is highly correlated with average loudness.
- Lombard speech is objectively more intelligible.
- The intelligibility gain of clear speech is not captured by extSII, which shows the limitations of objective intelligibility metrics.
Observations from Analyses
Lombard speech
- Spectral boosting in the inclusive formant region
- Increase in loudness (and in extSII)
- Vowel space translation, but no expansion
Clear speech
- Small changes in average spectra (slight spectral “flattening”)
- Consistent vowel space expansion
- Greater vowel discrimination
Comparison between styles
- Acoustic differences translate into perceptual distinctions linked to intelligibility gains.
- Spectral boosting and vowel space expansion are mutually exclusive.