Advanced Speech Enhancement in Noisy Environments


Advanced Speech Enhancement in Noisy Environments. Qiming Zhu. Supervisor: Prof. John Soraghan. Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering. q.zhu@strath.ac.uk

Presentation structure: Introduction; Speech Enhancement; Improved Minima Controlled Recursive Averaging (IMCRA); Robust Voice Activity Detection (VAD); 1-D Local Binary Pattern (LBP); 1-D LBP of energy based VAD; Performance Evaluation; Improved IMCRA; Discussion & Conclusion.

Introduction. Automatic speech recognition (ASR): a speech recognition system aims to create intelligent machines that can ‘hear’, ‘understand’ and ‘comply with’ speech input. Speech enhancement and VAD are integral parts of an ASR system. Aim of current research: improve recognition performance in babble noise backgrounds.

IMCRA. [Figure: IMCRA processing.] *Israel Cohen, ‘Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging’, IEEE Trans. on Speech and Audio Processing, 2003.

IMCRA with babble noise: IMCRA performance. [Figure panels: clean signal; noisy signal at 0 dB; signal enhanced by IMCRA.]

1-D LBP. 2-D LBP: extensively used in 2-D image processing. 1-D LBP: used for 1-D signal processing (Navin Chatlani, EUSIPCO 2010; Qiming Zhu, EUSIPCO 2012) and for onset detection of myoelectric signals (Paul McCool, EUSIPCO 2012). LBP code calculation:
$$\mathrm{LBP}_P(x[i]) = \sum_{r=0}^{P/2-1} \left\{ S\big[x[i+r-P/2] - x[i]\big]\, 2^{r} + S\big[x[i+r+1] - x[i]\big]\, 2^{r+P/2} \right\}$$
where P is the number of neighbouring samples used. The sign function S[∙] is:
$$S[x] = \begin{cases} 1, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases}$$
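
The LBP code formula above can be sketched in a few lines of Python. This is only an illustrative reading of the equation, not the authors' implementation; the function names `sign`, `lbp_code` and `lbp_codes` are hypothetical, and 0-based indexing is assumed.

```python
from typing import List, Sequence


def sign(x: float) -> int:
    """Sign function S[x] from the slide: 1 for x >= 0, 0 for x < 0."""
    return 1 if x >= 0 else 0


def lbp_code(x: Sequence[float], i: int, P: int = 8) -> int:
    """1-D LBP code of sample x[i], using P/2 neighbours on each side."""
    half = P // 2
    if i < half or i + half >= len(x):
        raise IndexError("sample i needs P/2 neighbours on both sides")
    code = 0
    for r in range(half):
        # Left neighbours x[i-P/2] .. x[i-1] set bits 0 .. P/2-1.
        code += sign(x[i + r - half] - x[i]) << r
        # Right neighbours x[i+1] .. x[i+P/2] set bits P/2 .. P-1.
        code += sign(x[i + r + 1] - x[i]) << (r + half)
    return code


def lbp_codes(x: Sequence[float], P: int = 8) -> List[int]:
    """LBP codes for every sample that has a full neighbourhood."""
    half = P // 2
    return [lbp_code(x, i, P) for i in range(half, len(x) - half)]


peak = [1, 2, 3, 4, 5, 4, 3, 2, 1]
print(lbp_code(peak, 4))  # 0: every neighbour lies below the centre sample
```

A local peak maps to code 0 because every thresholded difference is negative, while a local minimum or a flat segment maps to 2^P - 1.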

1-D LBP code. The 1-D LBP calculates the LBP code after thresholding the neighbouring samples against the centre sample. [Figure: LBP code calculation for P = 8.] *Navin Chatlani et al., ‘Local binary patterns for 1-D signal processing’, EUSIPCO 2010.

1-D LBP histogram. The distribution of the LBP codes over a window of N samples forms a histogram that describes the signal x[i]:
$$H_b = \sum_{P/2 \,\le\, i \,\le\, N - P/2} \delta\big(\mathrm{LBP}_P(x[i]),\, b\big)$$
where b = 0, 1, ⋯, B−1 and B is the number of histogram bins. δ(i, j) is the Kronecker delta function. [Figure: overview of the 1-D LBP procedure, in which the histogram is formed from the windowed data.]
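
As a rough sketch of the histogram step, the code below counts how often each LBP code occurs inside one window. It assumes one bin per possible code (B = 2^P) and 0-based indexing, so the last usable sample is N - P/2 - 1; the names `lbp_code` and `lbp_histogram` are hypothetical.

```python
from typing import List, Sequence


def lbp_code(x: Sequence[float], i: int, P: int) -> int:
    """1-D LBP code of x[i]; a neighbour contributes a 1 bit when it is >= x[i]."""
    half = P // 2
    code = 0
    for r in range(half):
        code += (1 if x[i + r - half] - x[i] >= 0 else 0) << r
        code += (1 if x[i + r + 1] - x[i] >= 0 else 0) << (r + half)
    return code


def lbp_histogram(window: Sequence[float], P: int = 8) -> List[int]:
    """Histogram H_b of LBP codes over one window (assumes B = 2**P bins)."""
    half = P // 2
    hist = [0] * (1 << P)
    # Only samples with P/2 neighbours on both sides get a code.
    for i in range(half, len(window) - half):
        # The Kronecker delta in the formula is a plain bin increment here.
        hist[lbp_code(window, i, P)] += 1
    return hist


ramp = list(range(10))
print(lbp_histogram(ramp, P=2))  # [0, 0, 8, 0]: a rising ramp only produces code 2
```

With P = 2 a monotonically rising window puts all of its mass in bin 2 (left neighbour below, right neighbour above), so the histogram shape reflects the local structure of the signal.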

1-D LBP of energy: short-time energy and the histogram. [Figure: Speech signals and the short-time energy. a) energy of clean speech signal, b) energy of noisy speech signal, c) histogram of clean speech energy, d) histogram of noisy speech energy.]

1-D LBP of energy with offset value. LBP code with offset value α:
$$\mathrm{LBP}'_P(E[m]) = \sum_{r=0}^{P/2-1} \left\{ S\big[E[m+r-P/2] - E[m] - \alpha\big]\, 2^{r} + S\big[E[m+r+1] - E[m] - \alpha\big]\, 2^{r+P/2} \right\}$$
[Figure: H_0 of the energy with different offset values α. a) E[m] of noisy signal, b) H_0 with α = 0.01, c) H_0 with α = 0.02, d) H_0 with α = 0.03, e) H_0 with α = 0.04, f) H_0 with α = 0.05.]
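
The offset variant can be sketched as follows. The slides do not define the short-time energy, so this example assumes it is the sum of squared samples over non-overlapping frames; the names `short_time_energy` and `lbp_energy_code` are hypothetical, and the defaults P = 2 and α = 0.03 are borrowed from the experimental settings reported later in the deck.

```python
from typing import List, Sequence


def short_time_energy(x: Sequence[float], frame_len: int) -> List[float]:
    """Assumed definition: sum of squares over non-overlapping frames."""
    return [
        sum(s * s for s in x[k:k + frame_len])
        for k in range(0, len(x) - frame_len + 1, frame_len)
    ]


def lbp_energy_code(E: Sequence[float], m: int, P: int = 2,
                    alpha: float = 0.03) -> int:
    """Offset LBP code of E[m]: a neighbour must exceed E[m] by alpha to set a bit."""
    half = P // 2
    if m < half or m + half >= len(E):
        raise IndexError("frame m needs P/2 neighbours on both sides")
    code = 0
    for r in range(half):
        code += (1 if E[m + r - half] - E[m] - alpha >= 0 else 0) << r
        code += (1 if E[m + r + 1] - E[m] - alpha >= 0 else 0) << (r + half)
    return code


E = [1.0, 1.01, 1.02, 1.0]
print(lbp_energy_code(E, 1, alpha=0.0))   # 2: the right neighbour is slightly larger
print(lbp_energy_code(E, 1, alpha=0.03))  # 0: the small rise is absorbed by the offset
```

Under these assumptions, small energy fluctuations fall into bin 0 once α exceeds their size, which is consistent with the slide's focus on H_0 for different α.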

1-D LBP of energy based VAD: system block diagram. [Figure: VAD block diagram.]

VAD performance. Experimental background: The test speech is sampled at 16 kHz, and the total length of the test set is 73 seconds. The speech is mixed with babble noise at SNRs from 0 to 20 dB, and α is set to 0.03. VAD 1: 1-D LBP of energy based VAD. VAD 0: VAD proposed by Navin Chatlani. G.729: G.729 B standard VAD. HR0, the speech absence hit-rate:
$$HR_0 = \frac{N_{0,0}}{N_0^{ref}}$$
FAR0, the speech absence false alarm rate:
$$FAR_0 = 1 - \frac{N_{1,1}}{N_1^{ref}}$$
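
The two metrics can be computed from frame-level labels as sketched below. The slides do not define the counts, so this example assumes the usual reading: N_{0,0} and N_{1,1} are the non-speech and speech frames classified correctly, and N_0^ref and N_1^ref are the numbers of non-speech and speech frames in the reference labelling. The function name `vad_rates` is hypothetical.

```python
from typing import Sequence, Tuple


def vad_rates(ref: Sequence[int], dec: Sequence[int]) -> Tuple[float, float]:
    """Return (HR0, FAR0) from reference and decided labels (0 = non-speech, 1 = speech)."""
    if len(ref) != len(dec):
        raise ValueError("ref and dec must have the same number of frames")
    n0_ref = sum(1 for r in ref if r == 0)
    n1_ref = sum(1 for r in ref if r == 1)
    if n0_ref == 0 or n1_ref == 0:
        raise ValueError("reference must contain both speech and non-speech frames")
    # Frames where the decision agrees with the reference, per class.
    n00 = sum(1 for r, d in zip(ref, dec) if r == 0 and d == 0)
    n11 = sum(1 for r, d in zip(ref, dec) if r == 1 and d == 1)
    hr0 = n00 / n0_ref          # speech absence hit-rate
    far0 = 1.0 - n11 / n1_ref   # speech absence false alarm rate
    return hr0, far0


ref = [0, 0, 0, 0, 1, 1, 1, 1]
dec = [0, 0, 0, 1, 1, 1, 0, 0]
print(vad_rates(ref, dec))  # (0.75, 0.5)
```

A good detector pushes HR0 towards 1 while keeping FAR0 near 0, so the two values are best read together.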

VAD performance. [Figures: VAD performance results.]

Improved IMCRA. Experimental background: 198 samples from the VoxForge database, covering 9 speakers (6 male and 3 female). The sampling frequency is 16 kHz. Babble noise from the NOISEX-92 database is added at SNRs from -10 dB to 10 dB. The energy window size is set to 5 ms, P = 2, and the histogram size is set to 30 ms. Segmental SNR and weighted spectral slope (WSS) are used to compare the performance. *Klatt, ‘Prediction of perceived phonetic distance from critical band spectra’, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1982.
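
For orientation, here is a minimal sketch of a segmental SNR measure. The slides do not give the exact formula used, so this follows a common definition: the per-frame SNR in dB between the clean and the enhanced signal, clamped to a fixed range and averaged over frames. The clamp limits of -10 dB and 35 dB and the name `segmental_snr` are assumptions.

```python
import math
from typing import Sequence


def segmental_snr(clean: Sequence[float], enhanced: Sequence[float],
                  frame_len: int, lo: float = -10.0, hi: float = 35.0) -> float:
    """Mean per-frame SNR in dB, with each frame clamped to [lo, hi]."""
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    frame_snrs = []
    for k in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[k:k + frame_len]
        e = enhanced[k:k + frame_len]
        sig = sum(s * s for s in c)
        err = sum((s - t) ** 2 for s, t in zip(c, e))
        if err == 0.0:
            snr = hi            # perfect reconstruction of this frame
        elif sig == 0.0:
            snr = lo            # silent clean frame with residual error
        else:
            snr = 10.0 * math.log10(sig / err)
        frame_snrs.append(min(max(snr, lo), hi))
    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


print(round(segmental_snr([1.0] * 4, [0.9] * 4, frame_len=2), 3))  # 20.0
```

Clamping keeps silent or perfectly reconstructed frames from dominating the average, which is why many evaluations use it.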

Improved IMCRA with babble noise: performance. [Figure panels: clean signal; noisy signal (SNR at 0 dB); IMCRA; improved IMCRA.]

Improved IMCRA with babble noise: performance. [Figure: segmental SNR.]

Improved IMCRA with babble noise: performance. [Figure: weighted spectral slope.]

Discussion. Conclusions from the results: 1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals. LBP in the energy domain is shown to be superior to the G.729 VAD and Navin’s LBP VAD. Improved IMCRA is superior to IMCRA, with enhanced segmental SNR and higher likelihood. Future work: apply this algorithm as the pre-processing stage of an ASR system.

Acknowledgements. Thanks to Prof. John Soraghan for the idea of babble noise reduction. Thanks to Paul and Navin for their previous work on 1-D LBP.

Thank you! Any questions?