Patrick-André Savard, Philippe Gournay and Roch Lefebvre Université de Sherbrooke, Québec, Canada.

 Problem description
 Prior art
◦ Synchronized overlap-add with fixed synthesis (SOLAFS)
◦ Improved phase vocoder
 Hybrid time-scale modification
◦ High-level algorithm
◦ Classification
◦ Main algorithm
◦ Mode transitions
 Performance evaluation
◦ Classification performance
◦ Subjective testing results

 What is time-scale modification?
 Subject of interest:
◦ Subjective quality of time-scaled signals
 Existing methods:
◦ Time-domain vs. frequency-domain approaches
◦ High-quality results on specific types of signals
 TSM applied to various signal types:
◦ Speech, music, or mixed-type signals
 There is a need for a more “universal” method

[SOLAFS block diagram: the input signal is read at analysis step S_a and overlap-added into the output signal at fixed synthesis step S_s, using windows of length W_LEN; a small search delay aligns each window before overlap-add]
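The SOLAFS scheme in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Hann window, the offset-search range `k_max`, and the default step sizes are assumptions; only the roles of S_a, S_s and W_LEN come from the slide.

```python
import numpy as np

def solafs(x, rate, w_len=512, s_s=256, k_max=128):
    """SOLAFS sketch: fixed synthesis step s_s, analysis step
    s_a = rate * s_s (rate > 1 shortens, rate < 1 stretches).
    Each analysis window may be shifted by a small offset k that
    maximizes its normalized cross-correlation with the synthesis
    built so far, then it is overlap-added with a Hann window."""
    s_a = int(round(rate * s_s))
    win = np.hanning(w_len)
    ov = w_len - s_s                        # length of the overlap region
    n_frames = (len(x) - w_len - k_max) // s_a
    y = np.zeros(n_frames * s_s + w_len)
    norm = np.zeros_like(y)                 # window sum, for gain control
    for m in range(n_frames):
        a, s = m * s_a, m * s_s             # analysis / synthesis positions
        k = 0
        if m > 0:
            tail = y[s:s + ov]              # already-synthesized overlap part
            best = -np.inf
            for kk in range(k_max):
                seg = x[a + kk:a + kk + ov]
                c = np.dot(seg, tail) / (np.linalg.norm(seg) * np.linalg.norm(tail) + 1e-12)
                if c > best:
                    best, k = c, kk
        y[s:s + w_len] += win * x[a + k:a + k + w_len]
        norm[s:s + w_len] += win
    return y / np.maximum(norm, 1e-3)
```

With `rate = 1.5` the output is roughly two-thirds the input length; the offset search is what keeps successive windows phase-aligned on periodic (monophonic) signals.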

 Based on the block-by-block STFT analysis/synthesis model
 STFT phases are updated so as to preserve instantaneous frequencies
 STFT amplitudes are preserved
 Improvements to the STFT modification stage:
◦ Peak detection
◦ Compute instantaneous frequencies for peaks
◦ Define regions of influence (ROIs)
◦ Update peak phases
◦ Apply phase locking to ROIs
[Block diagram: FFT of size N at analysis hop R_a → STFT modification stage → IFFT, overlap-add and gain control at synthesis hop R_s]
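The basic phase-update rule can be sketched as follows. This is the standard phase vocoder without the peak/region-of-influence phase locking the slide adds on top of it; FFT size, hop and window are assumed values.

```python
import numpy as np

def pv_timescale(x, rate, n_fft=1024, r_a=256):
    """Phase vocoder sketch: analysis hop r_a, synthesis hop
    r_s = r_a / rate (rate > 1 shortens). Bin phases are propagated
    across hops so that each bin's instantaneous frequency is
    preserved; magnitudes are kept unchanged."""
    r_s = int(round(r_a / rate))
    win = np.hanning(n_fft)
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft  # bin freqs, rad/sample
    n_frames = (len(x) - n_fft) // r_a
    y = np.zeros(n_frames * r_s + n_fft)
    norm = np.zeros_like(y)
    acc_phase, prev_phase = None, None
    for m in range(n_frames):
        spec = np.fft.rfft(win * x[m * r_a:m * r_a + n_fft])
        phase = np.angle(spec)
        if acc_phase is None:
            acc_phase = phase.copy()
        else:
            # deviation of the measured phase advance from the bin center
            dphi = phase - prev_phase - omega * r_a
            dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))  # principal value
            inst_freq = omega + dphi / r_a                    # instantaneous frequency
            acc_phase = acc_phase + inst_freq * r_s           # advance by synthesis hop
        prev_phase = phase
        frame = np.fft.irfft(np.abs(spec) * np.exp(1j * acc_phase))
        y[m * r_s:m * r_s + n_fft] += win * frame
        norm[m * r_s:m * r_s + n_fft] += win ** 2
    return y / np.maximum(norm, 1e-3)
```

Without phase locking, neighboring bins around a spectral peak drift apart over time, which is exactly what the peak/ROI improvements on this slide address.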

 Uses a frame-by-frame model
 Each frame goes through a classifier
 Signals identified as monophonic are processed using SOLAFS
 Signals identified as polyphonic or noisy are processed using the phase vocoder

 Goal:
◦ Discriminate between monophonic, polyphonic and noise signals
 Method used:
◦ Test the maximum of the normalized cross-correlation (C.C.) measure in SOLAFS for each analysis window
[Illustration: music signal and speech signal waveforms, with voiced and unvoiced regions marked]
◦ Voiced speech: high C.C.
◦ Music: low to medium C.C.
◦ Unvoiced speech: low and high C.C.
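A minimal version of this classification measure, as a sketch: in the actual system the maximum normalized cross-correlation comes out of the SOLAFS alignment search itself, whereas here it is recomputed on a lagged copy of the window; the lag range (roughly 20–250 Hz pitch at 8 kHz) and window length are assumptions.

```python
import numpy as np

def max_norm_xcorr(frame, lag_min=32, lag_max=400):
    """Maximum of the normalized cross-correlation of a window with a
    lagged copy of itself: near 1 for strongly periodic (monophonic)
    signals such as voiced speech, lower for polyphonic music and noise."""
    best = 0.0
    for lag in range(lag_min, min(lag_max, len(frame) - 1)):
        a, b = frame[:-lag], frame[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0.0:
            best = max(best, float(np.dot(a, b)) / denom)
    return best

def classify_window(frame, t_xcorr=0.6):
    """Monophonic windows (R_max >= T_xcorr) go to SOLAFS,
    the rest to the phase vocoder."""
    return "solafs" if max_norm_xcorr(frame) >= t_xcorr else "phase_vocoder"
```

A periodic window classifies as monophonic, while white noise stays well below the threshold.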

 Default method: SOLAFS
 Switches to the phase vocoder when R_max < T_xcorr
 Constraint on the minimum length of a SOLAFS synthesis segment
[Illustration: frame 1 is processed by SOLAFS as long as R_max ≥ T_xcorr; in frame 2, R_max drops below T_xcorr early, so the short SOLAFS segment is discarded and the whole frame is processed by the phase vocoder]
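The switching logic above can be sketched in plain Python. The per-window R_max values are assumed to come from the SOLAFS alignment search; the minimum-segment constraint is expressed here in windows, and `min_solafs_wins` is an assumed value, not one from the slides.

```python
def plan_frame(r_max_values, hop=256, t_xcorr=0.6, min_solafs_wins=4):
    """Given per-window R_max values, decide how a frame is processed.
    SOLAFS is the default; the frame switches to the phase vocoder at
    the first window with R_max < t_xcorr, and a SOLAFS prefix shorter
    than min_solafs_wins windows is discarded entirely (the
    minimum-segment-length constraint)."""
    n = 0
    for r in r_max_values:
        if r < t_xcorr:
            break
        n += 1
    if n == len(r_max_values):
        return ("solafs", None)    # whole frame by SOLAFS
    if n < min_solafs_wins:
        return ("pv", 0)           # prefix too short: all phase vocoder
    return ("mixed", n * hop)      # switch point, in samples
```

This mirrors the two cases in the illustration: frame 1 stays in SOLAFS (or switches late enough to keep its prefix), while frame 2's short prefix is discarded.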

 Phase vocoder initialization:
◦ Synthesis padded with input samples
◦ Initialization based on matching input/output samples
 Gain control:
◦ More padding needed
◦ Synthesis further padded and windowed to reproduce a phase vocoder output
[Illustration: starting from the last SOLAFS synthesis window, the output is padded with input samples (matched to the output) and windowed so that the first phase vocoder synthesis window overlaps coherently]

 Current frame’s first analysis window is out of phase with the current output signal
 Assume that the current input frame contains a stationary signal
 First input window is taken one phase vocoder analysis step ahead
 First SOLAFS segment is overlap-added at the last phase vocoder synthesis step
 SOLAFS synthesis samples (after the first OLA region) replace the synthesis samples obtained by the phase vocoder
[Illustration: taken one analysis step ahead, the first SOLAFS synthesis window is approximately in phase with the current output; subsequent SOLAFS windows follow]

 Signal length = 1 second
 Threshold T_xcorr = 0.6
 Unvoiced speech is successfully detected
 Triggers phase vocoder processing

 Signal length = 25 seconds
 Threshold T_xcorr = 0.6
 Classification results:
◦ 91 % phase vocoder
◦ 9 % SOLAFS

 A/B comparison method
 Speech, music and mixed-content (speech over music) samples tested
 Hybrid method compared to the stand-alone techniques
 Comparisons performed on both compressed and expanded signals
 Eight listeners took part in the test
 Samples evaluated on a 5-step scale

 A hybrid TSM method was presented
◦ Uses a frame-by-frame classification stage
◦ Selects the best method based on the monophonic/polyphonic/noise character of the input signal
◦ Handles mode transitions
 High-quality results are obtained
◦ On speech, music and mixed-content signals
 Future work
◦ Refine the classification criterion
◦ Use phase flexibility to improve phase coherence in phase vocoder to SOLAFS transitions

 Contact: