PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu

Presentation transcript:

CENTER FOR SPOKEN LANGUAGE UNDERSTANDING
PREDICTION AND SYNTHESIS OF PROSODIC EFFECTS ON SPECTRAL BALANCE OF VOWELS
Jan P.H. van Santen and Xiaochuan Niu
Center for Spoken Language Understanding, OGI School of Science & Technology at OHSU

OVERVIEW
1. IMPORTANCE OF SPECTRAL BALANCE
2. MEASUREMENT OF SPECTRAL BALANCE
3. ANALYSIS METHODS
4. RESULTS
5. SYNTHESIS
6. CONCLUSIONS

1. IMPORTANCE OF SPECTRAL BALANCE
Linguistic Control Factors
– Stress-like factors
– Positional factors
– Phonemic factors
Acoustic Correlates
– Traditionally TTS-controlled: pitch, timing, amplitude
– Demonstrated in natural speech, but usually not TTS-controlled:
    Spectral tilt, balance
    Formant dynamics
    …

2. MEASUREMENT OF SPECTRAL BALANCE
Data:
– 472 greedily selected sentences
    Genre: newspaper
    Greedy features: linguistic control factors
– One female speaker
– Manual segmentation
– Accent: independent rating by 3 judges
    0-3 score

2. MEASUREMENT OF SPECTRAL BALANCE
Energy in 5 formant-range frequency bands
– B0: Hz [~F0]
– B1: Hz [~F1]
– B2: Hz [~F2]
– B3: Hz [~F3]
– B4: 3500-max Hz [~fricative noise]
In other words, a multidimensional measure
Filter bank → Square → Average [1 ms rect.] → 20 log10(B_i)
Subtract estimated per-utterance means
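The measurement chain on this slide (band-limit, square, average, take the log, subtract the per-utterance mean) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it swaps the filter bank for a naive DFT whose bin powers are summed per band, and every band edge in `EXAMPLE_BANDS` except the 3500 Hz lower edge of B4 is a hypothetical placeholder, because the slide's ranges for B0 to B3 did not survive. The function names are made up for this sketch.

```python
import math

# Hypothetical band edges in Hz. Only the 3500 Hz lower edge of the last
# band comes from the slide; the other edges are placeholders.
EXAMPLE_BANDS = [(0, 500), (500, 1500), (1500, 2500), (2500, 3500),
                 (3500, math.inf)]

def band_levels_db(frame, sample_rate, bands):
    """Return one level in dB per band for a single frame of samples."""
    n = len(frame)
    power = []
    # Naive DFT power for bins 0..n/2 (stand-in for a real filter bank).
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power.append((re * re + im * im) / (n * n))
    levels = []
    for lo, hi in bands:
        total = sum(p for k, p in enumerate(power) if lo <= k * sample_rate / n < hi)
        # 10*log10 of power equals 20*log10 of the corresponding amplitude.
        levels.append(10 * math.log10(total + 1e-12))
    return levels

def subtract_utterance_mean(frame_levels):
    """Subtract each band's mean level over all frames of one utterance."""
    n_bands = len(frame_levels[0])
    means = [sum(f[b] for f in frame_levels) / len(frame_levels)
             for b in range(n_bands)]
    return [[f[b] - means[b] for b in range(n_bands)] for f in frame_levels]

# Usage: a 1 kHz sine sampled at 8 kHz falls in the second example band.
frame = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(64)]
levels = band_levels_db(frame, 8000, EXAMPLE_BANDS)
```

Subtracting the per-utterance mean removes overall level differences between recordings, so what remains is the relative balance across bands.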

2. MEASUREMENT OF SPECTRAL BALANCE
Details:
– Confounding with F0
    Measure pitch-corrected and raw
    – For certain wave shapes, pitch directly related to fixed-frame energy
    – Why do both: wave shapes may change in unknown ways
    F0 not confined to B0 [female speech]
– Vowel formants not quite confined to bands [e.g., F1 for /EE/ and F3 for /ER/]

2. MEASUREMENT OF SPECTRAL BALANCE
Why not more or different bands?
– Multiple interacting Linguistic Control Factors
    Need measurements that minimize interactions
– 5 bands → different vowels “behave similarly”
    Can model vowels as a class
Why not simply spectral tilt?
– 5 bands: more information than a single measure
– Supply more information for synthesis

3. ANALYSIS METHODS
Measures likely to behave like segmental duration:
– Multiple interacting, confounded factors:
    Interaction: magnitude of the effects of one factor may depend on other factors
    Confounding: unequal frequencies of control factor combinations
– “Directional Invariance”
    Direction of the effects of one factor is independent of other factors

3. ANALYSIS METHODS
Need method that
– can handle multiple interacting, confounded factors, and
– takes advantage of Directional Invariance
Used: Sums of Products Model:
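The slide's formula did not survive extraction. For reference, the sums-of-products model as it is usually written in van Santen's duration-modeling work has the general form below. The symbols $y$, $S_{i,j}$, and $f_j$ are assumed notation, chosen to agree with the index sets $K$ and $I_i$ that appear in the special cases of the model.

```latex
y(f_0, \ldots, f_n) \;=\; \sum_{i \in K} \; \prod_{j \in I_i} S_{i,j}(f_j)
```

Here $f_j$ is the level of control factor $j$, $K$ indexes the product terms, $I_i$ lists the factors that take part in term $i$, and $S_{i,j}(f_j)$ is the parameter for factor $j$ at level $f_j$ in term $i$.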

3. ANALYSIS METHODS
Special cases:
– Multiplicative model: K = {1}, I_1 = {0,…,n}
– Additive model: K = {0,…,n}, I_i = {i}
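To make the two special cases concrete, here is a small sketch that evaluates a sums-of-products model, assuming the usual reading: a sum over terms i in K of the product, over factors j in I_i, of a per-level parameter. The factor names, levels, and parameter values are invented for illustration and are not taken from the study.

```python
from math import prod

def sums_of_products(levels, K, I, S):
    """Sum over i in K of the product over j in I[i] of S[(i, j)][levels[j]]."""
    return sum(prod(S[(i, j)][levels[j]] for j in I[i]) for i in K)

# Two hypothetical factors: 0 = stress, 1 = position.
levels = {0: "stressed", 1: "final"}

# Multiplicative model: a single term containing every factor.
# K = {1}, I_1 = {0, 1}
mult = sums_of_products(
    levels,
    K=[1],
    I={1: [0, 1]},
    S={(1, 0): {"stressed": 1.5, "unstressed": 1.0},
       (1, 1): {"final": 2.0, "medial": 1.0}},
)

# Additive model: one term per factor, each containing only that factor.
# K = {0, 1}, I_i = {i}
add = sums_of_products(
    levels,
    K=[0, 1],
    I={0: [0], 1: [1]},
    S={(0, 0): {"stressed": 1.5, "unstressed": 1.0},
       (1, 1): {"final": 2.0, "medial": 1.0}},
)
```

With the same parameter tables, the multiplicative case multiplies the stress and position parameters, while the additive case sums them.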

3. ANALYSIS METHODS
Used additive model
Note: Parameter estimates are:
– Estimates of marginal means …
– … in balanced design:
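The link between additive-model parameters and marginal means in a balanced design can be checked on toy numbers. In the sketch below the factor names, levels, and effect sizes are all made up. It builds a balanced table from known additive effects and recovers the stress effects as marginal means minus the grand mean. It then removes one cell to show how unequal frequencies (confounding) bias the raw marginal mean.

```python
from itertools import product
from statistics import mean

# Hypothetical additive effects (each set sums to zero) and grand mean.
stress_effect = {"stressed": 2.0, "unstressed": -2.0}
position_effect = {"initial": -1.0, "medial": 0.0, "final": 1.0}
grand_mean = 10.0

# Balanced design: every factor combination occurs exactly once.
cells = {(s, p): grand_mean + stress_effect[s] + position_effect[p]
         for s, p in product(stress_effect, position_effect)}

def marginal_mean(table, axis, level):
    """Mean of all cells whose key has `level` at position `axis`."""
    return mean(v for key, v in table.items() if key[axis] == level)

overall = mean(cells.values())
estimated_stress = {s: marginal_mean(cells, 0, s) - overall
                    for s in stress_effect}

# Unbalanced design: dropping one cell makes the raw marginal mean biased.
unbalanced = {k: v for k, v in cells.items() if k != ("unstressed", "final")}
biased_unstressed = (marginal_mean(unbalanced, 0, "unstressed")
                     - mean(unbalanced.values()))
```

In the balanced table the estimates equal the true effects. In the unbalanced table the raw estimate for "unstressed" drifts away from -2.0, which is why a model fit is needed rather than raw averages.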

3. ANALYSIS METHODS
Pitch correction:
Confounding with F0: Show both and:

4. RESULTS: (A) POSITIONAL EFFECTS
[Figure: 5 bands, not pitch-corrected. Solid: right position, dashed: left position. Y-axis: corrected mean]
[Figure: 5 bands, pitch-corrected]
[Figure: 4 bands, not pitch-corrected]
[Figure: 4 bands, pitch-corrected]

4. RESULTS: (B) STRESS/ACCENT EFFECTS
[Figure: 5 bands, not pitch-corrected. Solid: stressed syllable, dashed: unstressed. Y-axis: corrected mean]
[Figure: 5 bands, pitch-corrected]
[Figure: 4 bands, not pitch-corrected]
[Figure: 4 bands, pitch-corrected]

4. RESULTS: (C) TILT EFFECTS

5. SYNTHESIS
Use ABS/OLA sinusoidal model:
– s[n] = sum of overlapped short-time signal frames s_k[n]
– s_k[n] = sum of quasi-harmonic sinusoidal components: $s_k[n] = \sum_l A_{k,l} \cos(\omega_{k,l} n + \phi_{k,l})$
Each frame of a unit is represented by a set of quasi-harmonic sinusoidal parameters.
Given the desired F0 contour, a pitch shift is applied to the sinusoidal parameters of the unit to obtain the target parameter $A_{k,l}$.
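The frame equation and the overlap-add step can be illustrated with a short sketch. It is a simplification under stated assumptions, not the ABS/OLA algorithm itself: frequencies are given in radians per sample, each frame restarts its time index at zero, and frames are cross-faded with a plain triangular window whose length is twice the hop. The function names are hypothetical.

```python
import math

def synth_frame(amps, freqs, phases, length):
    """One frame: s_k[n] = sum over l of A_l * cos(w_l * n + phi_l)."""
    return [sum(a * math.cos(w * n + p)
                for a, w, p in zip(amps, freqs, phases))
            for n in range(length)]

def overlap_add(frames, hop):
    """Sum windowed frames, each shifted by `hop` samples.

    Assumes every frame has length 2 * hop, so that the triangular
    windows of neighbouring frames add up to exactly 1 where they overlap.
    """
    length = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + length)
    window = [1.0 - abs(n - hop) / hop for n in range(length)]
    for k, frame in enumerate(frames):
        for n, x in enumerate(frame):
            out[k * hop + n] += window[n] * x
    return out

# Usage: three identical constant frames cross-fade back to a constant.
constant = synth_frame([1.0], [0.0], [0.0], 8)
signal = overlap_add([constant, constant, constant], hop=4)

# A single component at a quarter of the sample rate.
quarter = synth_frame([2.0], [math.pi / 2], [0.0], 4)
```

Because the windows sum to one in the overlap region, a signal whose frames agree is reproduced unchanged there; only the first and last half-frames are tapered.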

5. SYNTHESIS
Given the differences in prosodic factors between the original and target units, compute the band differences.
Transform the band differences into weights applied to the sinusoidal parameters: the j-th harmonic takes the weight of the i-th band when it is located in that band.
Apply spectral smoothing across unit boundaries.
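The slide's formulas for the band differences and weights are not recoverable, so the following is only one plausible reading. It assumes that a band difference is expressed in dB and that it is converted to a linear amplitude gain with 10^(dB/20), which is the standard dB-to-amplitude conversion rather than anything confirmed by the source. The band edges and function names are hypothetical.

```python
def band_index(freq_hz, band_edges):
    """Return the index of the [low, high) band containing freq_hz, or None."""
    for i, (low, high) in enumerate(band_edges):
        if low <= freq_hz < high:
            return i
    return None

def apply_band_weights(amps, freqs_hz, band_diff_db, band_edges):
    """Scale each harmonic amplitude by the gain of the band it falls in.

    Assumption: band_diff_db[i] is the target-minus-original level of
    band i in dB. Harmonics outside every band are left unchanged.
    """
    weighted = []
    for amp, freq in zip(amps, freqs_hz):
        i = band_index(freq, band_edges)
        gain = 1.0 if i is None else 10 ** (band_diff_db[i] / 20)
        weighted.append(amp * gain)
    return weighted

# Usage with made-up band edges: boost the second band by 20 dB.
edges = [(0, 500), (500, 1500)]
new_amps = apply_band_weights([1.0, 1.0, 1.0], [200, 1000, 3000],
                              [0.0, 20.0], edges)
```

A step-wise gain like this creates discontinuities at band edges, so a practical version would likely interpolate the weights across neighbouring bands before applying them.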

5. SYNTHESIS
[Figure: 5-band modification example for the vowel [i:]]

6. CONCLUSIONS
Described simple methods for predicting and synthesizing spectral balance
But: spectral balance is only one “non-standard acoustic correlate”
Others that remain to be addressed:
– Spectral dynamics
– Phase