Speaking Style Conversion Dr. Elizabeth Godoy Speech Processing Guest Lecture December 11, 2012.

Presentation transcript:

Speaking Style Conversion Dr. Elizabeth Godoy Speech Processing Guest Lecture December 11, 2012

Apply voice conversion (VC) principles to a different problem…

Speech Intelligibility Context
• Speech is often heard in adverse conditions
  • Noisy environments
  • Listener has difficulty hearing/understanding
• How to transform speech to make it more intelligible…?
  • To make speech synthesis systems more effective
• Example of speech with environmental barriers (audio samples: noise, no noise): the speech is not very intelligible!

Intelligible Speaking Styles
I. Lombard speech
  • Speaker is immersed in noise
  • Human reflex to increase the speech loudness
II. Clear speech
  • Listener faces barrier (noise, hearing, language, …)
  • Speaker adapts strategy to increase speech clarity
Audio samples: normal vs. Lombard; casual vs. clear

VC to improve speech intelligibility?
• Voice Conversion
  • Modify speech to change the speaker identity
  • Learn transformation from source-to-target speaker
• Speaking Style Conversion
  • Modify speech to improve intelligibility
  • Determine transformation from normal-to-intelligible style
• Spectral Envelope: still very important!

Overview: Analyses-to-Modifications
I. Acoustic analyses to identify (mainly spectral) characteristics of Lombard & Clear styles
  i. Average Spectra
  ii. Vowel Spaces
II. Results of the analyses inspire spectral modifications to improve intelligibility
  i. Spectral energy band boosting (corrective filters)
  ii. Formant shifting (frequency warping)

Corpora
• Lombard-normal: Grid
  • 8 speakers (4 male, 4 female)
  • 50 sentences each
  • Lombard: Ninf96, most extreme (Lu & Cooke)
• Clear-casual: LUCID read sentences
  • 8 speakers (4 male, 4 female)
  • 50 sentences each
  • Read speech: most exaggerated (Baker & Hazan)

Average Relative Spectra
• Recall Amplitude Scaling in DFWA
• The average relative spectrum is similar:
  • Difference between normal (X) and intelligible (Y) style
  • Average across all frames
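
The slide's formula did not survive in this transcript, so the following is only a minimal sketch of one plausible reading: average the log-magnitude spectrum (in dB) over all frames of each style, then subtract normal (X) from intelligible (Y). The function names, the dB scale, and the toy frames are assumptions made for illustration, not the lecture's exact definition.

```python
import math

def average_log_spectrum(frames, floor=1e-12):
    """Mean log-magnitude spectrum in dB over frames.

    Each frame is a list of magnitude values, one per frequency bin.
    """
    n_bins = len(frames[0])
    totals = [0.0] * n_bins
    for frame in frames:
        for k, mag in enumerate(frame):
            # The floor avoids log10(0) for empty bins.
            totals[k] += 20.0 * math.log10(max(mag, floor))
    return [t / len(frames) for t in totals]

def average_relative_spectrum(normal_frames, intelligible_frames):
    """Per-bin dB difference: average of style Y minus average of style X."""
    x = average_log_spectrum(normal_frames)
    y = average_log_spectrum(intelligible_frames)
    return [yk - xk for xk, yk in zip(x, y)]

# Toy data (hypothetical): the "intelligible" style has 10x magnitude in bin 1.
normal = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
lombard = [[1.0, 10.0, 1.0], [1.0, 10.0, 1.0], [1.0, 10.0, 1.0]]
rel = average_relative_spectrum(normal, lombard)
print(rel)
```

Averaging each style separately means the two recordings do not need the same number of frames or any time alignment.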

Average Relative Spectra (by Speaker)
[Figure panels: Lombard-normal; Clear-casual]

Average Relative Spectra (Overall)
• Lombard speech: Spectral energy boosting “where formants are” (~ Hz)
• Clear speech: Varies depending on speaker strategy; extent of differences mild overall

Vowel Spaces (average for all speakers)
• Lombard speech: Vowel Space Translation
• Clear speech: Vowel Space Expansion
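
To make the translation/expansion distinction concrete, here is a small sketch that treats a vowel space as a polygon of (F1, F2) points. A translation moves the polygon without changing its area, while an expansion about the centroid increases the area. The formant values, the shift amounts, and the 1.2 expansion factor are illustrative assumptions, not measurements from the lecture.

```python
def polygon_area(points):
    """Area of a polygon given as ordered (F1, F2) vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def centroid(points):
    """Mean of the vertices (adequate as a reference point for scaling)."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def translate(points, d_f1, d_f2):
    """Shift every vowel by the same (F1, F2) offset."""
    return [(f1 + d_f1, f2 + d_f2) for f1, f2 in points]

def expand(points, factor):
    """Scale every vowel away from the centroid by `factor`."""
    c1, c2 = centroid(points)
    return [(c1 + factor * (f1 - c1), c2 + factor * (f2 - c2)) for f1, f2 in points]

# Hypothetical corner vowels /i/, /a/, /u/ as (F1, F2) in Hz.
casual = [(300.0, 2300.0), (750.0, 1300.0), (320.0, 800.0)]
shifted = translate(casual, 50.0, 100.0)   # translation: area unchanged
expanded = expand(casual, 1.2)             # expansion: area grows by factor**2

print(polygon_area(casual), polygon_area(shifted), polygon_area(expanded))
```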

Inspiration for Speech Modifications
1. Spectral energy band boosting (Lombard)
2. Vowel space expansion (Clear)
• Features associated with increased speech intelligibility
• Though not observed together in human speech production…
• Signal processing algorithms can accomplish both!

Spectral Energy Band Boosting
• Corrective Filters
[Figure panels: Lombard-inspired & Enhanced (high SII); Corrective Filter: Varying Gain]
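
As a rough illustration of band boosting, the sketch below applies a fixed gain (in dB) to the bins of a magnitude spectrum that fall inside a chosen band. The filter shapes shown on the slide are not recoverable from the transcript, so the rectangular band, the band edges, and the 6 dB gain are hypothetical choices; a practical corrective filter would likely use smooth transitions.

```python
def boost_band(magnitudes, sample_rate, low_hz, high_hz, gain_db):
    """Return a copy of `magnitudes` with bins in [low_hz, high_hz] boosted.

    `magnitudes` is assumed to cover 0 Hz to the Nyquist frequency inclusive,
    with evenly spaced bins.
    """
    n_bins = len(magnitudes)
    gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor
    boosted = []
    for k, mag in enumerate(magnitudes):
        freq = k * (sample_rate / 2.0) / (n_bins - 1)
        boosted.append(mag * gain if low_hz <= freq <= high_hz else mag)
    return boosted

# Flat toy spectrum with bins at 0, 1000, ..., 8000 Hz (16 kHz sampling).
flat = [1.0] * 9
# Hypothetical band edges and gain, chosen only for the example.
boosted = boost_band(flat, 16000, 1000.0, 4000.0, 6.0)
print(boosted)
```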

Frequency Warping for Vowel Space (VS) Expansion
• Curve fitting of formant shifts inspires the warping…
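
The fitted warping curve from the slide is not available here. As a stand-in, this sketch uses a piecewise-linear warping function defined by anchor frequencies and resamples a spectral envelope so that content at each source anchor moves to the matching target anchor. The anchor values and helper names are assumptions for illustration only.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation; `xs` must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])

def warp_envelope(envelope, nyquist_hz, src_anchors, dst_anchors):
    """Move envelope content from each source anchor to its target anchor.

    For every output bin we look up which source frequency maps onto it
    (the inverse warp) and read the envelope there.
    """
    n = len(envelope)
    bin_freqs = [k * nyquist_hz / (n - 1) for k in range(n)]
    warped = []
    for f_out in bin_freqs:
        f_src = interp(f_out, dst_anchors, src_anchors)  # inverse warp
        warped.append(interp(f_src, bin_freqs, envelope))
    return warped

# Toy envelope with a single peak at 1000 Hz (bins at 0, 1000, ..., 8000 Hz).
envelope = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Hypothetical anchors: shift the 1000 Hz region up to 2000 Hz.
warped = warp_envelope(envelope, 8000.0, [0.0, 1000.0, 8000.0], [0.0, 2000.0, 8000.0])
print(warped)
```

Keeping the end anchors fixed at 0 Hz and the Nyquist frequency ensures the warp covers the full band without leaving gaps.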

Sound Samples
• With Noise (SSN, 0 dB): Original, Warp, Boost, BW
• No Noise: Original, WarpE, Boost, BW

Want more?
• See Maria’s presentation for more details…

Voice & Speaking Style Conversion Parallels
• Voice Conversion
  • Dynamic Frequency Warping + Amplitude Scaling (based on acoustic-phonetic spaces of source & target speakers)
• Speaking Style Conversion
  • Frequency Warping + Corrective Filter
    1. Clear-speech inspired frequency warping for vowel space expansion
    2. Lombard-speech inspired corrective filters to increase loudness

Thank you! More Questions?

Extras…

Objective Metrics for Evaluation
I. Loudness
  • Energy in frequency bands weighted based on human hearing
II. Speech Intelligibility Index (SII)
  • Energy & modulations in frequency bands relative to a noise masker
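
For intuition only, the sketch below computes a heavily simplified SII-style score: each band's signal-to-noise ratio is mapped to an audibility value between 0 and 1 (using the common (SNR + 15) / 30 mapping), and the values are combined with band-importance weights. It ignores masking spread, hearing thresholds, level distortion, and the modulation aspects mentioned on the slide, and the band levels and weights are hypothetical. It is not the extended SII used in the lecture.

```python
def simplified_sii(speech_db, noise_db, importance):
    """Importance-weighted band audibility, in the range 0..1.

    `speech_db` and `noise_db` are per-band levels in dB; `importance`
    holds per-band weights that are assumed to sum to 1.
    """
    score = 0.0
    for s, n, w in zip(speech_db, noise_db, importance):
        snr = s - n
        # Map SNR from [-15, +15] dB onto [0, 1], clipping outside that range.
        audibility = min(1.0, max(0.0, (snr + 15.0) / 30.0))
        score += w * audibility
    return score

# Hypothetical two-band example: one band well above the noise, one buried in it.
score = simplified_sii([60.0, 60.0], [45.0, 75.0], [0.5, 0.5])
print(score)
```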

Loudness Distributions
• Lombard speech: “louder” for voiced (bi-modal)
• Clear speech: not “louder” than casual speech
• Transients: neither style distinguishes on average

Extended SII Distributions
• extSII highly correlated with average loudness
• Lombard speech objectively more intelligible
• Clear speech intelligibility gain not captured by extSII
  • Limitations of objective intelligibility metrics

Observations from Analyses
• Lombard Speech
  • Spectral boosting in inclusive formant region
  • Increase in loudness (also extSII)
  • Vowel space translation, but no expansion
• Clear Speech
  • Small changes in average spectra (slight spectral “flattening”)
  • Consistent vowel space expansion
  • Greater vowel discrimination
• Comparison between styles
  • Acoustic differences → translate into perceptual distinctions → linked to intelligibility gains
  • Spectral boosting & vowel space expansion: mutually exclusive