Chapter 16 Speech Synthesis Algorithms 16.1 Synthesis based on LPC 16.2 Synthesis based on formants 16.3 Synthesis based on homomorphic processing 16.4.

Presentation transcript:

Chapter 16 Speech Synthesis Algorithms
16.1 Synthesis based on LPC
16.2 Synthesis based on formants
16.3 Synthesis based on homomorphic processing
16.4 PSOLA (Pitch Synchronous Overlap-Add) Algorithm for Synthesis
16.5 Synthesis based on addition of sine functions

16.1 Synthesis based on LPC (1)
$x(n) = \sum_{i=1}^{p} a_i\, x(n-i)$
For every frame of the original speech, the p coefficients $a_i$ are extracted by the LPC algorithm and stored in memory together with the first p samples of the frame. When synthesis is required, the later samples can be generated from the formula above.
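The recursion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lpc_synthesize` is hypothetical, the coefficients are taken as already known (no LPC analysis is performed), and no excitation term is added, exactly as in the slide's formula.

```python
import math

def lpc_synthesize(coeffs, seed, length):
    """Generate `length` samples with x(n) = sum_{i=1..p} a_i * x(n-i).

    coeffs -- [a_1, ..., a_p], the stored predictor coefficients
    seed   -- the first p stored samples of the frame
    """
    p = len(coeffs)
    x = list(seed[:p])
    for n in range(p, length):
        # coeffs[i] holds a_{i+1}, which multiplies x(n - (i + 1))
        x.append(sum(coeffs[i] * x[n - 1 - i] for i in range(p)))
    return x

# Example with p = 2: a_1 = 2*cos(w), a_2 = -1 regenerates cos(w*n)
# from its first two samples.
w = 0.3
frame = lpc_synthesize([2 * math.cos(w), -1.0], [1.0, math.cos(w)], 50)
```

The example works because a pure cosine satisfies this second-order recursion exactly; real speech frames need a higher order p and coefficients estimated from the signal.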

16.2 Synthesis based on formants (1)
The transfer characteristic of a formant filter is
$y(n) = a\,x(n) - b\,y(n-1) - c\,y(n-2)$
where $a = 1 + b + c$, $b = -2\exp(-\pi B T_s)\cos(2\pi F T_s)$ and $c = \exp(-2\pi B T_s)$.
B is the bandwidth and F the resonance frequency of the filter; $T_s$ is the sampling period (the reciprocal of the sampling frequency).
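A minimal sketch of this second-order resonator follows, assuming a sampling frequency `fs` so that $T_s = 1/fs$. The function names and the example values (F = 500 Hz, B = 60 Hz, fs = 8000 Hz) are illustrative and do not come from the slides.

```python
import math

def formant_coeffs(F, B, fs):
    """Return (a, b, c) for y(n) = a*x(n) - b*y(n-1) - c*y(n-2)."""
    Ts = 1.0 / fs                      # sampling period
    c = math.exp(-2 * math.pi * B * Ts)
    b = -2 * math.exp(-math.pi * B * Ts) * math.cos(2 * math.pi * F * Ts)
    a = 1 + b + c                      # makes the gain at 0 Hz equal to 1
    return a, b, c

def formant_filter(x, F, B, fs):
    """Run the input sequence x through one formant resonator."""
    a, b, c = formant_coeffs(F, B, fs)
    y1 = y2 = 0.0                      # y(n-1) and y(n-2)
    out = []
    for xn in x:
        yn = a * xn - b * y1 - c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out

# A constant input settles at the same constant, because a = 1 + b + c.
step = formant_filter([1.0] * 4000, F=500.0, B=60.0, fs=8000.0)
```

Choosing $a = 1 + b + c$ normalizes the low-frequency gain, so stacking several resonators does not change the overall level at 0 Hz.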

Synthesis based on formants (2)
If several such filters are deployed over the formant range, with $F_1, F_2, F_3, \ldots$ as their resonance frequencies, the response of the whole system comes close to the transfer characteristics of the vocal tract. The formant filters can be connected in cascade (series) or in parallel.
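To illustrate the cascade connection, the sketch below evaluates the frequency response of a chain of second-order formant resonators of the form $y(n) = a\,x(n) - b\,y(n-1) - c\,y(n-2)$ with $a = 1 + b + c$. The three formant frequencies and bandwidths, the sampling frequency and the function names are hypothetical example values, not taken from the slides.

```python
import cmath
import math

def resonator_response(f, F, B, fs):
    """Complex gain at frequency f (Hz) of one formant resonator."""
    Ts = 1.0 / fs
    c = math.exp(-2 * math.pi * B * Ts)
    b = -2 * math.exp(-math.pi * B * Ts) * math.cos(2 * math.pi * F * Ts)
    a = 1 + b + c
    z1 = cmath.exp(-2j * math.pi * f * Ts)   # z^-1 on the unit circle
    return a / (1 + b * z1 + c * z1 * z1)

def cascade_response(f, formants, fs):
    """Gain of resonators in series: the individual gains multiply."""
    h = 1.0 + 0.0j
    for F, B in formants:
        h *= resonator_response(f, F, B, fs)
    return h

# Hypothetical (frequency, bandwidth) pairs in Hz for three formants.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 120.0)]
FS = 10000.0
gains = {f: abs(cascade_response(f, FORMANTS, FS))
         for f in (0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0)}
```

In this example the magnitude is larger at each formant frequency than midway between neighbouring formants, which is the behaviour the cascade is meant to reproduce. A parallel connection would add the branch outputs instead of multiplying the gains and needs a separate amplitude for each branch.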

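16.3 Synthesis based on homomorphic processing (1)
After homomorphic processing the signal is the sum of an excitation component and a vocal-tract component:
$x(n) = e(n) + v(n)$
For voiced speech e(n) is a periodic sequence. If the period is N,
$e(n) = \sum_{r=0}^{R} \delta(n - rN)$
so e(n) is nonzero only at multiples of N. It is therefore easy to separate e(n) from v(n) and to restore it.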

16.4 PSOLA (Pitch Synchronous Overlap-Add) Algorithm for Synthesis (1)
This algorithm was proposed by F. Charpentier and E. Moulines at the end of the 1980s. Its advantages are relatively low computational complexity together with good clarity and naturalness. In particular, TD-PSOLA (time-domain PSOLA) can meet real-time requirements.
The principle of PSOLA

PSOLA Algorithm for Synthesis (2)
The algorithm originates from the addition of signals reconstructed from the short-time Fourier transform. The short-time Fourier transform of x(n) is
$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\omega m}$
For each n it is a continuous function of frequency, so the representation is redundant and it is enough to keep one spectrum every R samples:
$Y_r(e^{j\omega}) = X_n(e^{j\omega})\big|_{n=rR}$
Its inverse transform is

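PSOLA Algorithm for Synthesis (3)
$y_r(m) = \frac{1}{2\pi}\int_{-\pi}^{\pi} Y_r(e^{j\omega})\, e^{j\omega m}\, d\omega = x(m)\, w(rR-m)$
Adding up the $y_r(m)$ gives
$y(m) = \sum_{r=-\infty}^{\infty} y_r(m) = \sum_{r=-\infty}^{\infty} x(m)\, w(rR-m) = x(m) \sum_{r=-\infty}^{\infty} w(rR-m)$
It can be proved that when $R \le N/4$, with N the window length,
$\sum_{r=-\infty}^{\infty} w(rR-m) \approx W(e^{j0})/R$, so $y(n) \approx x(n)\, W(e^{j0})/R$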

PSOLA Algorithm for Synthesis (4)
So y(n) differs from x(n) only by a constant factor. If a Hanning window is used, an exact relation can be derived:
$\sum_{r=-\infty}^{\infty} w(rN/2 - m) \equiv 1$ for any m
If x(n) is voiced speech with period $N_p$, a Hanning window can be used to cut out segments two periods long ($2N_p$), which are then added with a delay of $N_p$ between them. Under ideal periodic conditions the original signal is restored:
$x(n) = \sum_{r} w(rN_p - n)\, x(n)$
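The exact Hanning relation can be checked numerically. The sketch below assumes the periodic form of the window, $w(k) = 0.5 - 0.5\cos(2\pi k/N)$ for $k = 0, \ldots, N-1$, with an even length N and a hop of N/2; the function names are illustrative.

```python
import math

def hann(N):
    """Periodic Hanning window of even length N."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / N) for k in range(N)]

def overlap_add(x, N):
    """Window x with Hanning windows spaced N/2 apart and add the pieces."""
    w = hann(N)
    hop = N // 2
    y = [0.0] * len(x)
    # Starting one hop before the signal gives every sample two windows.
    for start in range(-hop, len(x), hop):
        for k in range(N):
            n = start + k
            if 0 <= n < len(x):
                y[n] += w[k] * x[n]
    return y

# With N = 2 * Np (two pitch periods) the hop equals one period Np.
signal = [math.sin(0.1 * n) + 0.3 for n in range(200)]
rebuilt = overlap_add(signal, N=40)
```

Because the two windows covering any sample satisfy $w(k) + w(k + N/2) = 1$, `rebuilt` equals `signal` up to rounding error, which is the reconstruction property stated above.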

PSOLA Algorithm for Synthesis (5)
In practice the signal is never ideally periodic, so the reconstruction condition is not completely satisfied. Moreover, the aim is to change the pitch, duration and intensity, not to reconstruct the original signal. PSOLA instead makes the mean-square spectral difference between the two signals minimal.

PSOLA Algorithm for Synthesis (6)
$D[x(n), y(n)] = \int_{-\pi}^{\pi} \left| X_{t_m}(e^{j\omega}) - Y_{t_q}(e^{j\omega}) \right|^2 d\omega$
where $t_m$ and $t_q$ are the pitch marks of x(n) and y(n), respectively.
The procedure for PSOLA
1. Pitch-synchronous analysis: mark the pitch as accurately as possible.
2. Time-scale change: for a given pitch adjustment parameter β and time adjustment parameter γ, determine the relation between the original pitch-mark sequence and the synthesized pitch-mark sequence.

PSOLA Algorithm for Synthesis (7)
3. Modify the analyzed short-time signals to create the synthesized short-time signals (TD-PSOLA only applies a delay; FD-PSOLA additionally adjusts the signal in the frequency domain).
4. Pitch-synchronous overlap-add: produce the final version of the synthesized speech signal.

PSOLA Algorithm for Synthesis (8)
Pitch-synchronous analysis: for unvoiced speech the marks are set at a fixed period; for voiced segments the pitch marks must be set accurately. This gives a series of pitch marks $\{t_m,\ m = 1, 2, \ldots, M\}$. Multiplying x(n) by a series of window functions yields a series of short-time signals $x_m(n)$:

PSOLA Algorithm for Synthesis (9)
$x_m(n) = w_m(t_m - n)\, x(n)$
These $x_m(n)$ are an intermediate representation of the waveform. $w_m$ is a Hanning window, longer than one pitch period and centered on the pitch mark, so adjacent frames partly overlap.

PSOLA Algorithm for Synthesis (10)
Time-scale changing
To perform prosodic modification, we must determine the new pitch-mark positions $t_q$ (q = 1, …, Q) on the synthesis axis and the mapping $t_m \to t_q$. The duration adjustment function γ(n) and the pitch adjustment function β(n) are the two parameters that determine the new marks and the mapping, and they may change at the same time. Changing the pitch changes the pitch period and therefore the duration, so the duration has to be adjusted to get back to the original duration. Both adjustments can also be done in one step.
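One simple way to build the synthesis marks and the mapping is sketched below. It rests on assumptions that the slides do not state: β and γ are constants, β scales the pitch frequency (so the synthesis period is the local analysis period divided by β), γ scales the duration, and each synthesis mark is paired with the nearest analysis mark. The function name `map_pitch_marks` is hypothetical.

```python
def map_pitch_marks(analysis_marks, beta, gamma):
    """Return (synthesis_marks, mapping) for constant beta and gamma.

    mapping[q] is the index m of the analysis mark t_m paired with
    the synthesis mark t_q.
    """
    marks = analysis_marks
    synthesis_marks, mapping = [], []
    t_q = gamma * marks[0]
    end = gamma * marks[-1]
    while t_q <= end + 1e-9:
        t_a = t_q / gamma              # matching time on the analysis axis
        m = min(range(len(marks)), key=lambda i: abs(marks[i] - t_a))
        # Local pitch period around analysis mark m.
        if m + 1 < len(marks):
            period = marks[m + 1] - marks[m]
        else:
            period = marks[m] - marks[m - 1]
        synthesis_marks.append(t_q)
        mapping.append(m)
        t_q += period / beta           # higher pitch means closer marks
    return synthesis_marks, mapping

marks = list(range(0, 1000, 100))                 # period of 100 samples
slow, slow_map = map_pitch_marks(marks, beta=1.0, gamma=2.0)
high, high_map = map_pitch_marks(marks, beta=2.0, gamma=1.0)
```

With γ = 2 the analysis frames are repeated to fill twice the duration at the same pitch; with β = 2 the marks are half a period apart while the duration stays the same, so frames are again reused. Both cases show the change in the number of short-time signals that the mapping produces.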

PSOLA Algorithm for Synthesis (11)
$x_m(n)$ is changed into $x_q(n)$ by the modification, and the $x_q(n)$ are then synthesized according to the new marks. This involves three steps: changing the number of short-time signals, changing the delay of the short-time signals, and changing each short-time signal itself.
For TD-PSOLA, each synthesized short-time signal is simply a copy of an analyzed one. First the analyzed signals to be used are selected; then each one is shifted by the delay $\delta_q = t_q - t_m$, so that the segment centered on $t_m$ becomes centered on $t_q$:
$x_q(n) = x_m(n - \delta_q) = x_m(n + t_m - t_q)$
For FD-PSOLA, in addition to the above processing, the shifted short-time signal must be transformed in the frequency domain.

PSOLA Algorithm for Synthesis (12)
The overlap-add
There are several ways to add the short-time signals. One is
$y(n) = \dfrac{\sum_q \alpha_q\, x_q(n)\, w_q(t_q - n)}{\sum_q w_q^2(t_q - n)}$
where y(n) is the synthesized signal, the $\alpha_q$ are normalization factors and $w_q$ is the synthesis window. A simpler way is
$y(n) = \dfrac{\sum_q \alpha_q\, x_q(n)}{\sum_q w_q(t_q - n)}$
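The first overlap-add formula can be sketched for TD-PSOLA as follows. Several simplifications are assumed here and are not from the slides: integer pitch marks, one fixed symmetric Hanning window of half-length `half` used for both analysis and synthesis, a single constant `alpha`, and a mapping supplied by the caller. The function names are hypothetical.

```python
import math

def hann_centered(half):
    """Symmetric Hanning window over offsets -half..half (zero at the ends)."""
    return [0.5 + 0.5 * math.cos(math.pi * k / half)
            for k in range(-half, half + 1)]

def td_psola(x, analysis_marks, synthesis_marks, mapping, half, alpha=1.0):
    """Overlap-add shifted copies of windowed segments of x.

    Segment m is x windowed around analysis_marks[m]; it is moved to
    synthesis_marks[q] for every q with mapping[q] == m.
    """
    w = hann_centered(half)
    out_len = synthesis_marks[-1] + half + 1
    num = [0.0] * out_len              # sum of alpha * x_q(n) * w(t_q - n)
    den = [0.0] * out_len              # sum of w(t_q - n) ** 2
    for t_q, m in zip(synthesis_marks, mapping):
        t_m = analysis_marks[m]
        for k in range(-half, half + 1):
            n_in, n_out = t_m + k, t_q + k
            if 0 <= n_in < len(x) and 0 <= n_out < out_len:
                wk = w[k + half]
                x_q = wk * x[n_in]     # windowed sample moved from t_m to t_q
                num[n_out] += alpha * x_q * wk
                den[n_out] += wk * wk
    return [a / d if d > 1e-12 else 0.0 for a, d in zip(num, den)]

# Sanity check: unchanged marks reproduce the input between the marks.
x = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
marks = list(range(100, 1000, 100))
y = td_psola(x, marks, marks, list(range(len(marks))), half=100)
```

With identical analysis and synthesis marks the numerator is x(n) times the sum of squared windows, so the division returns x(n) wherever at least one window is nonzero. Passing different synthesis marks and a mapping changes pitch and duration without any frequency-domain processing.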

16.5 Synthesis based on Sin Models (1)
This technique starts from a frequency-spectrum decomposition of the speech signal, which yields a series of frequencies, amplitudes and phases. After matching the frequency parameters and adjusting the amplitudes and phases, a new speech signal is synthesized by adding the sine waves back together.
Sin Model of Speech for Synthesis by Analysis
The generation of speech can be seen as the result of passing a glottal excitation through a linear time-varying system:
$s(t) = \int_0^t h(t-\tau, t)\, e(\tau)\, d\tau$

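Synthesis based on Sin Models (2)
$e(t) = \sum_{l=1}^{L} a_l(t) \cos[\Omega_l(t)]$, with $\Omega_l(t) = \int_0^t \omega_l(\sigma)\, d\sigma + \varphi_l$
$s(t) = \sum_{l=1}^{L} A_l(t) \cos[\theta_l(t)]$
The transfer function H(ω,t) of the vocal tract is the Fourier transform of h(t-τ,t):
$H(\omega, t) = M(\omega, t)\exp(j\psi(\omega, t))$
$A_l(t) = a_l(t)\, M_l(t)$, $\theta_l(t) = \Omega_l(t) + \psi_l(t)$
where $M_l(t)$ and $\psi_l(t)$ are M and ψ evaluated at the frequency $\omega_l(t)$.
Speech Synthesis by Analysis Based on Sin Models
1. Estimation of the frequency, amplitude and phase parameters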

Synthesis based on Sin Models (3)
The conclusion is that the frequencies of the synthesized speech signal correspond to the frequencies at the peaks of the short-time Fourier transform (DFT) of the frame, and the amplitudes and phases are those found at these frequencies.
In practice, the frequency, amplitude and phase parameters are estimated by peak extraction on a series of windowed short-time speech signals. For good performance the window should be longer than two current pitch periods. The window used is a Hamming window of length 256 with 0%-50% overlap.
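Peak extraction on one frame might look like the sketch below. It is an illustration under assumptions: a plain DFT sum is used instead of an FFT library so that the block is self-contained, the frame is zero-padded to 512 points, only peaks above a relative threshold (a hypothetical parameter `rel_threshold`) are kept, and frequencies are not refined between bins.

```python
import cmath
import math

def hamming(N):
    """Symmetric Hamming window of length N."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def pick_peaks(frame, nfft=512, rel_threshold=0.1):
    """Return [(omega, amplitude, phase)] at the spectral peaks of a frame.

    omega is in radians per sample; the phase refers to the first
    sample of the frame.
    """
    w = hamming(len(frame))
    xw = [wn * xn for wn, xn in zip(w, frame)]
    scale = 2.0 / sum(w)               # maps |X(k)| to a sinusoid amplitude
    spec = []
    for k in range(nfft // 2 + 1):     # zero-padded DFT, positive frequencies
        step = -2j * math.pi * k / nfft
        spec.append(sum(v * cmath.exp(step * n) for n, v in enumerate(xw)))
    mag = [abs(X) for X in spec]
    floor = rel_threshold * max(mag)
    peaks = []
    for k in range(1, nfft // 2):
        if mag[k] > floor and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]:
            peaks.append((2 * math.pi * k / nfft, scale * mag[k],
                          cmath.phase(spec[k])))
    return peaks

# Test frame: two sinusoids placed exactly on bins 32 and 80 of 512.
frame = [math.cos(2 * math.pi * 32 * n / 512)
         + 0.5 * math.cos(2 * math.pi * 80 * n / 512 + 1.0)
         for n in range(256)]
peaks = pick_peaks(frame)
```

The threshold keeps window side lobes from being reported as peaks. For real speech, whose components rarely fall exactly on a bin, the picked frequency is only accurate to the bin spacing unless some interpolation between bins is added.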

Synthesis based on Sin Models (4)
After a 512-point FFT the spectrum is obtained. Peak extraction then gives the frequencies $\omega_l$, amplitudes $A_l$ and phases $\theta_l$, l = 1, …, L (L generally is …).
2. Frequency matching
Adjacent frames need frequency matching to make the interpolation possible. After matching, the frequency-matching locus is obtained.
3. Interpolation of amplitude and phase between two frames
Experiment Results
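The matching and interpolation steps can be sketched in simplified form. The choices below are assumptions for illustration and are not the slides' method: peaks are matched greedily to the nearest unused peak of the next frame within `max_jump`, amplitude and frequency are interpolated linearly across the hop, and the phase is obtained by accumulating the interpolated frequency rather than by a dedicated phase-interpolation scheme.

```python
import math

def match_peaks(freqs_a, freqs_b, max_jump):
    """Pair each frequency of frame A with the nearest free one of frame B.

    Returns a list of index pairs (i, j); peaks with no partner within
    max_jump are left unmatched.
    """
    pairs, used = [], set()
    for i, fa in enumerate(freqs_a):
        best = None
        for j, fb in enumerate(freqs_b):
            if j in used or abs(fb - fa) > max_jump:
                continue
            if best is None or abs(fb - fa) < abs(freqs_b[best] - fa):
                best = j
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

def synthesize_track(a0, a1, w0, w1, phi0, hop):
    """One sine track over `hop` samples between two matched peaks.

    a0, a1 -- amplitudes at the two frames
    w0, w1 -- frequencies (radians per sample) at the two frames
    phi0   -- phase at the first sample
    """
    out, theta = [], phi0
    for n in range(hop):
        frac = n / hop
        out.append((a0 + (a1 - a0) * frac) * math.cos(theta))
        theta += w0 + (w1 - w0) * frac     # running sum of the frequency
    return out

pairs = match_peaks([100.0, 200.0, 300.0], [105.0, 310.0, 500.0], max_jump=20.0)
tone = synthesize_track(1.0, 1.0, math.pi / 2, math.pi / 2, 0.0, 4)
```

Summing `synthesize_track` over all matched pairs gives the synthesized frame. Unmatched peaks would have to be faded in or out, which this sketch does not handle.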

Synthesis based on Sin Models (1)

Synthesis based on Sin Models (2)