Page 0 of 34 MBE Vocoder

Page 1 of 34 Outline
Introduction to vocoders
MBE vocoder
–MBE Parameters
–Parameter estimation
–Analysis and synthesis algorithm
AMBE
IMBE

Page 2 of 34 Vocoders - Analyzer
1. Speech is first analyzed by segmenting it with a window (e.g. a Hamming window)
2. Excitation and system parameters are calculated for each segment
 1. Excitation parameters: voiced/unvoiced decision, pitch period
 2. System parameters: spectral envelope / system impulse response
3. These parameters are sent to the synthesizer
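A minimal sketch of the segmentation step in Python, assuming a mono speech signal in a NumPy array (the frame and hop sizes are illustrative, not taken from the slides):

```python
import numpy as np

def frame_signal(speech, frame_len=256, hop=128):
    """Split speech into overlapping Hamming-windowed segments."""
    window = np.hamming(frame_len)
    n_frames = max(0, 1 + (len(speech) - frame_len) // hop)
    frames = np.empty((n_frames, frame_len))
    for i in range(n_frames):
        frames[i] = speech[i * hop : i * hop + frame_len] * window
    return frames
```

Each row of the result is one analysis segment; the excitation and system parameters are then estimated per row.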

Page 3 of 34 Vocoders - Synthesizer
[Block diagram: an excitation signal (pulse train for voiced segments, white noise for unvoiced) drives a filter described by the system parameters to produce the synthesized voice.]
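A hedged sketch of this two-source synthesizer: a pulse train excites the system filter for voiced segments, white noise for unvoiced ones. The filter coefficients b and a stand in for whatever form the transmitted system parameters take:

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_segment(voiced, pitch_period, frame_len, b, a,
                       rng=np.random.default_rng()):
    """Drive the system filter with a pulse train (voiced) or white noise (unvoiced)."""
    if voiced:
        excitation = np.zeros(frame_len)
        excitation[::pitch_period] = 1.0      # one impulse per pitch period
    else:
        excitation = rng.standard_normal(frame_len)
    return lfilter(b, a, excitation)          # shape the excitation by the spectral envelope
```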

Page 4 of 34 Vocoders
But vocoders usually have poor quality, because of:
–Fundamental limitations in the speech models
–Inaccurate parameter estimation
–The inability of a pure pulse-train / white-noise excitation to produce all kinds of speech
Speech synthesized entirely with a periodic source exhibits a "buzzy" quality, and speech synthesized entirely with a noise source exhibits a "hoarse" quality.
A potential solution to the buzziness of vocoders is to use mixed excitation models. In these vocoders, periodic and noise-like excitations are mixed with a calculated ratio, and this ratio is sent along with the other parameters.

Page 5 of 34 Multi Band Excitation Speech Model
Because a speech signal is only short-time stationary, a window w(n) is usually applied to the signal.
The Fourier transform of a windowed segment can be modeled as the product of a spectral envelope and an excitation spectrum.
In most models, the spectral envelope is a smoothed version of the original speech spectrum.
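In symbols (a reconstruction in the usual MBE notation; the symbol names are assumed rather than taken from the slide):

\[ \tilde{S}_w(\omega) = H_w(\omega)\, E_w(\omega) \]

where $\tilde{S}_w(\omega)$ is the modeled spectrum of the windowed segment, $H_w(\omega)$ the spectral envelope, and $E_w(\omega)$ the excitation spectrum.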

Page 6 of 34 MBE model (Cont'd)
The spectral envelope must be represented accurately enough to prevent degradations in the spectral envelope from dominating the quality improvements achieved by the addition of a frequency-dependent voiced/unvoiced mixture function.
In previous simple models, the excitation spectrum is totally specified by the fundamental frequency $\omega_0$ and a single voiced/unvoiced decision for the entire spectrum.
In the MBE model, the excitation spectrum is specified by the fundamental frequency $\omega_0$ and a frequency-dependent voiced/unvoiced mixture function.

Page 7 of 34 Multi Banding
In general, a continuously varying frequency-dependent voiced/unvoiced mixture function would require a large number of parameters to represent it accurately, and adding that many parameters would severely decrease the utility of the model in applications such as bit-rate reduction. The mixture function is therefore restricted to a binary voiced/unvoiced decision.
To further reduce the number of these binary parameters, the spectrum is divided into multiple frequency bands and one binary voiced/unvoiced parameter is allocated to each band.
The MBE model differs from previous models in that the spectrum is divided into a large number of frequency bands (typically 20 or more), whereas previous models used three frequency bands at most.

Page 8 of 34 Multi Banding
[Figure: construction of the synthetic spectrum from the original spectrum, spectral envelope, periodic spectrum, V/UV information, noise spectrum, and excitation spectrum]

Page 9 of 34 MBE Parameters
The parameters used in the MBE model are:
1. the spectral envelope
2. the fundamental frequency
3. the V/UV information for each harmonic
4. the phase of each harmonic declared voiced
The phases of harmonics in frequency bands declared unvoiced are not included, since they are not required by the synthesis algorithm.
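As a data structure, one frame of MBE parameters might look like the following sketch (the field names are mine, not from the slides):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MBEFrame:
    fundamental: float       # fundamental frequency omega_0, in radians per sample
    magnitudes: np.ndarray   # |A_m|: spectral envelope sample for each harmonic
    voiced: np.ndarray       # binary V/UV flag per frequency band
    phases: np.ndarray       # phases, kept only for harmonics declared voiced
```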

Page 10 of 34 Parameter Estimation
In many approaches (e.g. LPC-based algorithms), the algorithms for estimating the excitation parameters and the spectral envelope parameters operate independently.
These parameters are usually estimated from heuristic criteria, without explicit consideration of how close the synthesized speech will be to the original speech.
–This can result in a synthetic spectrum quite different from the original spectrum.
In MBE, the excitation and spectral envelope parameters are estimated simultaneously, so that the synthesized spectrum is closest in the least-squares sense to the spectrum of the original speech ("analysis by synthesis").

Page 11 of 34 Parameter Estimation (Cont'd)
The estimation process is divided into two major steps:
1. In the first step, the pitch period and spectral envelope parameters are estimated to minimize the error between the original spectrum and the synthetic spectrum.
2. Then, the V/UV decisions are made based on the closeness of fit between the original and the synthetic spectrum at each harmonic of the estimated fundamental.

Page 12 of 34 Parameter Estimation (cont'd)
The parameters are estimated by minimizing the following error criterion:

\[ \varepsilon = \frac{1}{2\pi} \int_{-\pi}^{\pi} G(\omega) \left| S_w(\omega) - \tilde{S}_w(\omega) \right|^2 d\omega \]

–where $S_w(\omega)$ is the original windowed spectrum, $\tilde{S}_w(\omega) = A_m E_w(\omega)$ is the synthetic spectrum over the $m$-th harmonic interval $[a_m, b_m]$, and $G(\omega)$ is a frequency-dependent weighting function.
The error in an interval is minimized at:

\[ \hat{A}_m = \frac{\int_{a_m}^{b_m} S_w(\omega)\, E_w^{*}(\omega)\, d\omega}{\int_{a_m}^{b_m} \left| E_w(\omega) \right|^2 d\omega} \]
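A sketch of this closed-form amplitude estimate, assuming DFT samples of the original spectrum and of the candidate excitation spectrum over one harmonic interval:

```python
import numpy as np

def estimate_amplitude(S_w, E_w):
    """Least-squares envelope sample A_m over one harmonic interval.

    S_w, E_w: complex DFT samples of the original spectrum and the
    excitation spectrum, restricted to the interval [a_m, b_m].
    """
    num = np.sum(S_w * np.conj(E_w))
    den = np.sum(np.abs(E_w) ** 2)
    return num / den
```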

Page 13 of 34 Pitch Estimation and Spectral Envelope
An efficient method for obtaining a good approximation of the periodic transform P(ω) in an interval is to precompute samples of the Fourier transform of the window w(n) and center them around the harmonic frequency associated with that interval.
For unvoiced frequency intervals, the envelope parameters are estimated by substituting idealized white noise (unity magnitude across the band) for |E_w(ω)| in the previous formulas, which reduces to averaging the original spectrum in each frequency interval.
For unvoiced regions, only the magnitude of A_m is estimated, since the phase of A_m is not required for speech synthesis.
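A sketch of the unvoiced case; one common reading of "averaging the original spectrum" is the RMS average of the spectrum magnitude over the band, which is what the substitution $|E_w(\omega)| = 1$ yields in the least-squares formula (this interpretation is an assumption):

```python
import numpy as np

def unvoiced_amplitude(S_w_band):
    """Unvoiced envelope magnitude |A_m| as the RMS of the spectrum over the band."""
    return np.sqrt(np.mean(np.abs(S_w_band) ** 2))
```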

Page 14 of 34 More about pitch estimation
Experimentally, the error ε tends to vary slowly with the pitch period P, so the initial estimate is obtained by evaluating the error at integer pitch periods only.
Since integer multiples of the correct pitch period have spectra with harmonics at the correct frequencies, the error ε will be comparable for the correct pitch period and its integer multiples; this is why a pitch tracker is used to select among the candidates.
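A sketch of the coarse search over integer pitch periods; `fit_error` is a placeholder for the spectral error criterion of the previous slides, and the period range is illustrative:

```python
def coarse_pitch(frame, fit_error, p_min=20, p_max=120):
    """Evaluate the spectral-fit error at each integer pitch period; keep the minimum."""
    errors = {p: fit_error(frame, p) for p in range(p_min, p_max + 1)}
    return min(errors, key=errors.get)
```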

Page 15 of 34 More about pitch estimation (Cont'd)
[Figure: a speech segment and its original spectrum; the error as a function of pitch period; and the original vs. synthetic spectra for P = 42.48 and for P = 42]

Page 16 of 34 V/UV Decision
The voiced/unvoiced decision for each harmonic is made by comparing the normalized error over each harmonic of the estimated fundamental to a threshold.
When the normalized error over the m-th harmonic is below the threshold, that band is marked voiced; otherwise it is marked unvoiced.
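A sketch of this thresholding, assuming the normalized error has already been computed per harmonic (the threshold value is illustrative, not from the slides):

```python
import numpy as np

def vuv_decisions(normalized_errors, threshold=0.2):
    """Return a boolean array: True (voiced) where the fit error is below threshold."""
    return np.asarray(normalized_errors) < threshold
```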

Page 17 of 34 Analysis Algorithm Flowchart
Start
1. Window the speech segment
2. Compute the error vs. pitch period (autocorrelation approach)
3. Select the initial pitch period (dynamic-programming pitch tracker)
4. Refine the initial pitch period (frequency-domain approach)
5. Make the V/UV decision for each frequency band
6. Select the voiced/unvoiced spectral envelope parameters for each frequency band
Stop

Page 18 of 34 Speech Synthesis
The voiced signal can be synthesized as the sum of sinusoidal oscillators, with frequencies at the harmonics of the fundamental and amplitudes set by the spectral envelope parameters (the time-domain method).
The unvoiced signal can be synthesized as the sum of bandpass-filtered white noise signals.
The frequency-domain method was selected for synthesizing the unvoiced portion of the synthetic speech.
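A minimal time-domain sketch of the voiced path: an oscillator bank at the harmonics of $\omega_0$, with constant pitch within the frame (the real algorithm adds the interframe interpolation described below):

```python
import numpy as np

def synthesize_voiced(A, phi, w0, n_samples):
    """Sum sinusoids at harmonics of w0; A[m-1], phi[m-1] are harmonic m's magnitude/phase."""
    n = np.arange(n_samples)
    s = np.zeros(n_samples)
    for m, (a, p) in enumerate(zip(A, phi), start=1):
        s += a * np.cos(m * w0 * n + p)
    return s
```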

Page 19 of 34 Synthesis algorithm block diagram
[Block diagram: the V/UV decisions separate the spectral envelope samples into voiced and unvoiced sets. Voiced path: the voiced envelope samples, with linear interpolation between frames, drive a bank of harmonic oscillators to produce voiced speech. Unvoiced path: a white-noise sequence is analyzed with an STFT, its envelope is replaced by the unvoiced envelope samples, and weighted overlap-add produces unvoiced speech.]

Page 20 of 34 MBE Synthesis algorithm
First, the spectral envelope samples are separated into voiced and unvoiced spectral envelope samples, depending on whether they lie in frequency bands declared voiced or unvoiced.
Voiced envelope samples include both magnitude and phase, whereas unvoiced envelope samples include only the magnitude.
Voiced speech is synthesized from the voiced envelope samples by summing the outputs of a bank of sinusoidal oscillators running at the harmonics of the fundamental frequency.

Page 21 of 34 MBE Synthesis algorithm (Voiced)
The phase function is determined by an initial phase and a frequency track as follows:

\[ \theta_m(n) = \theta_m(0) + \sum_{k=0}^{n-1} \omega_m(k) \]

The frequency track is linearly interpolated between the m-th harmonic of the current frame and that of the next frame by:

\[ \omega_m(n) = m \left[ \omega_0^{(t)}\, \frac{S-n}{S} + \omega_0^{(t+1)}\, \frac{n}{S} \right], \qquad 0 \le n < S \]

where S is the synthesis frame length and $\omega_0^{(t)}$, $\omega_0^{(t+1)}$ are the fundamental frequencies of the current and next frames.
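A sketch of this interpolation and phase accumulation for one harmonic, following the notation above (the discrete running sum is an assumption about the underlying continuous formula):

```python
import numpy as np

def harmonic_phase_track(m, w0_curr, w0_next, S, theta0=0.0):
    """Linearly interpolate harmonic m's frequency across the frame; accumulate phase."""
    n = np.arange(S)
    w_m = m * (w0_curr * (S - n) / S + w0_next * n / S)  # interpolated frequency track
    theta = theta0 + np.cumsum(w_m)                      # phase as a running sum of frequency
    return theta
```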

Page 22 of 34 MBE Synthesis algorithm (Unvoiced)
Unvoiced speech is synthesized from the unvoiced envelope samples by first synthesizing a white noise sequence.
For each frame, the white noise sequence is windowed and an FFT is applied to produce samples of the Fourier transform.
In each unvoiced frequency band, the noise transform samples are normalized to have unity magnitude.
The unvoiced spectral envelope is constructed by linearly interpolating between the envelope samples |A_m(t)|.
The normalized noise transform is multiplied by the spectral envelope to produce the synthetic transform.
The synthetic transforms are then used to synthesize unvoiced speech using the weighted overlap-add method.
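A compressed sketch of these steps for one frame; the per-bin envelope is assumed to be already interpolated, and the full weighted overlap-add across frames is omitted:

```python
import numpy as np

def synthesize_unvoiced_frame(envelope, unvoiced_mask, frame_len,
                              rng=np.random.default_rng()):
    """envelope, unvoiced_mask: per-FFT-bin target magnitude and unvoiced flag
    (both of length frame_len // 2 + 1)."""
    window = np.hamming(frame_len)
    noise = rng.standard_normal(frame_len) * window        # windowed white noise
    spec = np.fft.rfft(noise)
    mag = np.abs(spec)
    mag[mag == 0] = 1.0                                    # guard against division by zero
    spec = np.where(unvoiced_mask, spec / mag * envelope,  # unit-magnitude noise times envelope
                    0.0)                                   # voiced bins contribute nothing here
    return np.fft.irfft(spec, n=frame_len) * window        # windowed again for overlap-add
```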

Page 23 of 34 MBE Synthesis (Cont'd)
The final synthesized speech is generated by summing the voiced and unvoiced synthesized speech signals.

Page 24 of 34 Bit Allocation

Parameter               Bits
Fundamental Frequency   9
Harmonic Magnitude
Harmonic Phase          0-45
Voiced/Unvoiced Bits    12
Total                   160

Page 25 of 34 Advanced MBE (AMBE)
MBE coding rate: 2400 bps
AMBE coding rate: 1200/2400 bps
Four new features:
1. Enhanced V/UV decision
2. Initial pitch detection
3. Refined pitch determination
4. Dual-rate coding

Page 26 of 34 Enhanced V/UV decision
AMBE divides the whole speech frequency band into 4 subbands at 2.4 kbps and 2 subbands at 1.2 kbps.
That is to say, only 4 bits and 2 bits are used to encode the V/UV decisions for the 2.4 kb/s and 1.2 kb/s vocoders, respectively.

Page 27 of 34 Initial pitch detection
MBE takes 2 steps to detect the refined initial pitch period:
–a spectrum-matching technique to find the initial pitch period
–a DTW-based (Dynamic Time Warping) technique to smooth the estimate
Its computational complexity is very high.
In AMBE, a modified three-level center-clipped autocorrelation method is used to detect the initial pitch period, together with a simple smoothing method to correct pitch errors; a sketch follows.
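A sketch of three-level center clipping followed by autocorrelation: samples above +c map to +1, below -c to -1, everything else to 0, and the autocorrelation of the clipped signal peaks near the pitch lag (the clip fraction and lag range are illustrative):

```python
import numpy as np

def clipped_autocorr_pitch(frame, p_min=20, p_max=120, clip_frac=0.6):
    """Initial pitch period via autocorrelation of a three-level center-clipped frame."""
    c = clip_frac * np.max(np.abs(frame))
    clipped = np.where(frame > c, 1.0, np.where(frame < -c, -1.0, 0.0))
    ac = np.correlate(clipped, clipped, mode='full')[len(frame) - 1:]  # lags 0..N-1
    return p_min + int(np.argmax(ac[p_min:p_max + 1]))
```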

Page 28 of 34 Refined pitch determination
To find the best pitch, the basic method is to assume a pitch period and compute the error between the original speech spectrum and the voiced speech spectrum shaped under that assumption.
The candidate pitch whose spectrum error is minimum is chosen as the final pitch.
To reduce the computational complexity, AMBE uses a 256-point FFT to get the speech spectrum, and a 5-point window spectrum is used to form the voiced harmonic spectrum.
To get the refined pitch, AMBE performs the spectrum-matching process seven times. Each time, AMBE first sets a candidate pitch, then shapes a harmonic spectrum over the whole frequency band according to that pitch and the window spectrum; an error is then calculated by subtracting the shaped spectrum from the speech spectrum. After the seven matching passes, the refined pitch can easily be determined.

Page 29 of 34 Dual rate coding

Parameter               2400 bps   1200 bps
Pitch quantization      8          6
V/UV decision           4          2
Amplitude quantization  41         19
Total                   53         27

Page 30 of 34 Improved MBE (IMBE)
A 2400 bps coder based on MBE
Substantially better than the U.S. government standard LPC-10e
The parameters of the MBE speech model:
–the fundamental frequency
–voiced/unvoiced information
–the spectral envelope

Page 31 of 34 IMBE algorithm
Estimate the excitation and system parameters that minimize the distance between the original and synthetic speech spectra (analysis by synthesis).
Once these parameters are estimated, voiced/unvoiced decisions are made by comparing the spectral error over a series of harmonics to a prescribed threshold.

Page 32 of 34 IMBE block diagram
[Figure: IMBE algorithm block diagram]

Page 33 of 34 IMBE Coding
IMBE is offered at 2.4, 4.8, and 8.0 kbps.
The analysis and synthesis routines are the same except for the bit allocation.
The fundamental frequency needs an accuracy of about 1 Hz and requires about 9 bits per frame.
The V/UV decisions are encoded with one bit per decision.
The remaining bits are allocated to error control and the spectral envelope information.