Communications & Multimedia Signal Processing Meeting 6 Esfandiar Zavarehei Department of Electronic and Computer Engineering Brunel University 6 July,

Presentation transcript:

Communications & Multimedia Signal Processing Meeting 6 Esfandiar Zavarehei Department of Electronic and Computer Engineering Brunel University 6 July, 2005

Contents
Review of Noise Reduction Methods (more results)
–Review of the methods
–DFT-Kalman, a new method for parameter estimation
–Evaluation results and sample speech signals
FTLP-HNM Model
–FTLP-HNM for gap restoration
Noise Station
–An interface for the programs

Review of Noise Reduction Methods
Most noise reduction systems fit this block diagram.
The de-noising method is based on:
–Spectral subtraction, or
–Bayesian estimation

Spectral Subtraction
Spectral subtraction is generally formulated in terms of the following quantities: S, X and N are the speech, noisy speech and noise spectral amplitudes, k is the frequency index, α is the power exponent, A and B are the attenuation and subtraction coefficients respectively, and T is the dynamic threshold.
Spectral subtraction methods differ in how A and B are estimated.
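The equation itself did not survive in this transcript. As a rough sketch only, one widely used form consistent with the symbols defined on this slide is |Ŝ(k)|^α = A·|X(k)|^α − B·|N(k)|^α, with the result clamped when it falls below the threshold T. The exact form used on the slide, the clamping rule and the function name `spectral_subtract` are assumptions made for illustration.

```python
def spectral_subtract(x_amp, n_amp, a=1.0, b=1.0, alpha=2.0, threshold=0.0):
    """Per-bin spectral subtraction on amplitude spectra (illustrative form).

    x_amp, n_amp: noisy-speech and noise amplitudes, one value per bin k.
    a, b:         attenuation and subtraction coefficients (constant here).
    alpha:        power exponent (1 = amplitude domain, 2 = power domain).
    threshold:    lower bound applied to the subtracted value (assumed rule;
                  in practice T may be dynamic).
    """
    s_amp = []
    for x, n in zip(x_amp, n_amp):
        diff = a * x ** alpha - b * n ** alpha
        # The difference goes negative when the noise estimate exceeds the
        # noisy spectrum, so clamp it to the threshold (never below zero).
        diff = max(diff, threshold, 0.0)
        s_amp.append(diff ** (1.0 / alpha))
    return s_amp


noisy = [2.0, 1.0, 0.5]
noise = [1.0, 1.0, 1.0]
print(spectral_subtract(noisy, noise, alpha=2.0))
```

With A = B = 1, T = 0 and α = 2 this reduces to simple power spectral subtraction; the last bin shows a negative difference being clamped to zero.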

Spectral Subtraction
Simple SS: constant A and B (e.g. A=1, B=1, T=0, α=1 or 2)
Adaptive spectral subtraction:
–Using the a posteriori SNR (uses only the speech information in the current frame)
–Using the a priori SNR (tracks the fluctuations of speech in successive frames)
–Using both the a posteriori and a priori SNRs (e.g. optimized to give the MMSE)
Different algorithms are used to calculate the threshold.
The number of negative values resulting from spectral subtraction can be large and depends on the noise spectrum and the SNR.
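The two SNRs can be illustrated with a short sketch. It assumes the a posteriori SNR is the noisy power divided by the noise power in the current frame, and that the a priori SNR is tracked with the decision-directed rule commonly associated with Ephraim and Malah's estimators. Whether the methods compared in these slides use exactly this rule is an assumption; the default smoothing value 0.99 is the decision-directed parameter that appears in the program template at the end of the presentation. The function names are hypothetical.

```python
def a_posteriori_snr(x_amp, noise_power):
    """gamma(k) = |X(k)|^2 / noise power; uses the current frame only."""
    return [x * x / lam for x, lam in zip(x_amp, noise_power)]


def a_priori_snr(prev_s_amp, gamma, noise_power, dd_alpha=0.99):
    """Decision-directed estimate of the a priori SNR.

    Mixes the previous frame's clean-speech amplitude estimate with the
    current frame's (gamma - 1), floored at zero, so the estimate follows
    the speech across successive frames.
    """
    xi = []
    for s_prev, g, lam in zip(prev_s_amp, gamma, noise_power):
        xi.append(dd_alpha * s_prev * s_prev / lam
                  + (1.0 - dd_alpha) * max(g - 1.0, 0.0))
    return xi


noise_power = [1.0, 1.0]
gamma = a_posteriori_snr([3.0, 0.5], noise_power)
xi = a_priori_snr([2.0, 0.0], gamma, noise_power)
print(gamma, xi)
```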

Bayesian Estimation
Frames are independent:
–Estimation of ST-DFT components (real and imaginary)
    Gaussian-Gaussian (Wiener)
    Other distributions for speech and noise (various estimators by Martin)
–Estimation of the amplitude, using the noisy phase
    Amplitude, log-amplitude, power (different parameters to be estimated)
    Gaussian, Gaussian mixtures (needs training), Laplacian (computationally not feasible)
    Criteria: MMSE, MAP, joint phase and amplitude MAP, etc.
–Methods for parameter estimation use inter-frame information
Frames are not independent:
–DFT-Kalman

Bayesian Estimation
Wiener: speech is always suppressed
Distributions vary from phoneme to phoneme and from frequency to frequency
Average symmetric Kullback-Leibler distance
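The remark about the Wiener filter follows from its gain: for an a priori SNR ξ the gain is ξ/(1+ξ), which stays below 1 for every finite ξ, so each spectral component is attenuated to some degree even at high SNR. A minimal sketch (the function name `wiener_gain` is chosen here for illustration):

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi (linear scale, xi >= 0)."""
    return xi / (1.0 + xi)


# The gain approaches 1 as the SNR grows but never reaches it.
for snr_db in (-10, 0, 10, 30):
    xi = 10.0 ** (snr_db / 10.0)
    print(f"{snr_db:>4} dB -> gain {wiener_gain(xi):.4f}")
```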

DFT-Kalman
Incorporates the AR model of the short-time DFT (ST-DFT) trajectories into the estimation
Gaussian distribution
Noise in each ST-DFT channel is assumed to be white Gaussian noise (WGN)
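As a simplified illustration of the idea, the sketch below runs a scalar Kalman filter along one real-valued ST-DFT trajectory. It assumes a first-order AR model with a known coefficient and known variances; the method in the slides presumably uses a higher-order AR model with estimated parameters, and all names here are hypothetical.

```python
def kalman_ar1(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[n] = a*x[n-1] + e[n], y[n] = x[n] + v[n].

    a: AR(1) coefficient of the trajectory (assumed known here).
    q: variance of the AR excitation e[n].
    r: variance of the white Gaussian observation noise v[n].
    Returns the filtered estimates of x[n].
    """
    x_est, p_est = x0, p0
    estimates = []
    for y in observations:
        # Predict the next value from the AR model.
        x_pred = a * x_est
        p_pred = a * a * p_est + q
        # Correct the prediction with the noisy observation.
        gain = p_pred / (p_pred + r)
        x_est = x_pred + gain * (y - x_pred)
        p_est = (1.0 - gain) * p_pred
        estimates.append(x_est)
    return estimates


noisy_track = [1.2, 0.7, 1.1, 0.4, 0.9]
print(kalman_ar1(noisy_track, a=0.9, q=0.1, r=0.5))
```

When the excitation variance q is tiny relative to r the gain collapses and the filter largely ignores the observations, which is why the choice of q matters in practice.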

DFT-Kalman
During noise-only periods the output converges to zero, making the whole output zero.
To avoid values of the LP error covariance, Q, that are too small during speech-active periods:
Q = max(Q, m×|X(k)|²), with (0.05)² < m < (0.30)²
Small values of m give further reduction of the background noise but more distortion of the speech signal.
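The floor on Q can be written as a one-line helper. This is only a sketch of the rule as stated on the slide; the function name and the default value of m (an arbitrary mid-range pick) are assumptions.

```python
def floor_excitation_variance(q, x_k, m=0.15 ** 2):
    """Return max(q, m*|X(k)|^2) so q cannot collapse while speech is active.

    x_k: noisy ST-DFT value at frequency k (complex or real).
    m:   floor factor; the slide quotes the range (0.05)^2 < m < (0.30)^2.
    """
    if not 0.05 ** 2 < m < 0.30 ** 2:
        raise ValueError("m is outside the range quoted on the slide")
    return max(q, m * abs(x_k) ** 2)


# A very small q is lifted to m*|X(k)|^2; a large q is left unchanged.
print(floor_excitation_variance(1e-6, 3 + 4j, m=0.01))
print(floor_excitation_variance(1.0, 0.1, m=0.01))
```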

DFT-Kalman
Another method is based on spectral subtraction of the ST-DFT trajectories. An autocorrelation vector is obtained using spectral subtraction at the start of the speech after long noise-only periods. Here L+1 is the number of samples used in the calculation of the autocorrelation vector and X_r(n) is the real component of the ST-DFT trajectory at frame n and an arbitrary frequency. Similar equations hold for the imaginary components.

DFT-Kalman
Here n_1 is the frame index of the first speech segment detected.
This autocorrelation is linearly combined with the estimated autocorrelation obtained from the previously estimated samples.
Regardless of the presence of speech, if the variance of the excitation of the AR model is lower than a fixed threshold, a weighted average of the spectral-subtraction-based autocorrelation and the autocorrelation of the previous estimates of the ST-DFT trajectories is used.
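The equations for these steps are not preserved in the transcript, so the following is only a guess at their general shape. It assumes a biased sample autocorrelation over the L+1 available samples of the real trajectory, removes the noise by subtracting its variance at lag zero (white noise contributes only there), and blends the result with a previous autocorrelation using a weight `beta`. The function names, the lag-zero subtraction and the weight are all assumptions.

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a real sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def subtract_white_noise(r_noisy, noise_var):
    """Subtract the white-noise variance at lag 0, keeping r[0] >= 0."""
    return [max(r_noisy[0] - noise_var, 0.0)] + list(r_noisy[1:])


def combine(r_ss, r_prev, beta):
    """Weighted average of two autocorrelation vectors, 0 <= beta <= 1."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(r_ss, r_prev)]


trajectory = [1.0, 2.0, 3.0, 4.0]              # L + 1 = 4 noisy samples
r_noisy = autocorrelation(trajectory, max_lag=1)
r_ss = subtract_white_noise(r_noisy, noise_var=0.5)
print(combine(r_ss, r_prev=[5.0, 3.0], beta=0.5))
```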

Evaluation of the methods
The correlation coefficient between different distortion measures and the mean opinion score (MOS) is calculated over 90 sentences (noisy, clean and de-noised; 10 listeners).
PESQ has the highest correlation with the MOS results.
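For reference, a correlation coefficient of this kind can be computed as below. The sketch assumes the Pearson coefficient is meant, which the slide does not state, and the numbers are invented purely to exercise the function; they are not results from the presentation.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    std_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    std_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (std_u * std_v)


# Made-up objective scores and MOS values, one pair per sentence.
objective_scores = [1.0, 2.0, 3.0, 4.0]
mos_values = [2.0, 4.0, 6.0, 8.0]
print(pearson(objective_scores, mos_values))
```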

PESQ – Car Noise

PESQ – Train Noise

Mean Opinion Score – Car Noise

Mean Opinion Score – Train Noise

Legend for the four result charts:
SASS: Simple Amplitude SS
BPSS: a post. Power SS
MBSS: Multiband SS
SSAPR: a priori Amplitude SS
PSS: Parametric SS
MMSE STSA: Ephraim's Amp. Estimator
MMSE LSA: Ephraim's Log-Amp. Estimator
GGDFT: Martin's Gamma-Gamma DFT Estimator

Sample Speech Signals
Car Noise: Noisy, SASS, BPSS, MBSS, SSAPR, PSS, Wiener, MMSE STSA, MMSE LSA, GGDFT, DFTK, DFTSS
Train Noise: Noisy, SASS, BPSS, MBSS, SSAPR, PSS, Wiener, MMSE STSA, MMSE LSA, GGDFT, DFTK, DFTSS
Clean Signal
SASS: Simple Amplitude SS; BPSS: a post. Power SS; MBSS: Multiband SS; SSAPR: a priori Amplitude SS; PSS: Parametric SS; MMSE STSA: Ephraim's Amp. Estimator; MMSE LSA: Ephraim's Log-Amp. Estimator; GGDFT: Martin's Gamma-Gamma DFT Estimator

Future and Present Work
Investigate the effect of incorporating the noise AR model in the Kalman formulation, where the F's are the state transition matrices of speech and noise.
Clean speech would be a by-product of the Kalman filtering.

Future and Present Work
Development of the FTLP-HNM model together with the group, and exploration of its potential for:
–Gap restoration,
–Speech enhancement, and
–(possibly) Coding
The problem with phase in gap restoration
Sample

Future and Present Work
Further development of the Noise Station program

Future and Present Work
Current capabilities:
–Open/Close/Save/Amplify/Play/Resample wave signals
–Frame-by-frame and overall viewing of signal/FFT/LP spectrum/excitation/formants/pitch frequency/harmonics
–Add noise/De-noise (different methods)/Distortion measurement
–Formant/pitch/harmonic tracking and viewing
Future capabilities:
–An option for easily adding new methods (de-noising, pitch tracking, etc.)

Future and Present Work
Template for the Programs

function output=MMSESTSA84_NS(signal,fs,P)
% output=MMSESTSA84_NS(signal,fs,P)
% HELP AND DIRECTIONS APPEAR HERE
% Author: -
% Date: Dec-04

% INITIALIZE ALL THE PARAMETERS HERE
IS=.25; %INITIAL SILENCE LENGTH
alpha=.99; %DECISION DIRECTED PARAMETER

if (nargin>=3 && isstruct(P)) %EXTRACTING PARAMETERS
    if isfield(P,'alpha')
        alpha=P.alpha; %DECISION DIRECTED PARAMETER (READ FROM P, NOT IS)
    else
        alpha=.99; %DECISION DIRECTED PARAMETER
    end
    if isfield(P,'IS')
        IS=P.IS;
    else
        IS=.25; %INITIAL SILENCE LENGTH
    end
end

%THE PROGRAM STARTS HERE