Presentation on theme: "Advanced Speech Enhancement in Noisy Environments"— Presentation transcript:
1 Advanced Speech Enhancement in Noisy Environments
Qiming Zhu
Supervisor: Prof. John Soraghan
Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering
2 Presentation structure
Introduction
Speech Enhancement
Improved Minima Controlled Recursive Averaging (IMCRA)
Robust Voice Activity Detection (VAD)
1-D Local Binary Pattern (LBP)
1-D LBP of energy based VAD
Performance Evaluation
Improved IMCRA
Discussion & Conclusion
3 Introduction
Automatic speech recognition (ASR): a speech recognition system aims to create intelligent machines that can ‘hear’, ‘understand’ and ‘comply with’ speech input.
Speech enhancement and VAD are integral parts of an ASR system.
Aim of current research: improve recognition system performance in babble-noise backgrounds.
4 IMCRA
IMCRA processing
* Israel Cohen, ‘Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging’, IEEE Transactions on Speech and Audio Processing, 2003.
5 IMCRA with babble noise
IMCRA performance. Panels: clean signal; noisy signal at 0 dB; signal enhanced by IMCRA.
6 1-D LBP
2-D LBP: extensively used in 2-D image processing.
1-D LBP: used for 1-D signal processing (Navin Chatlani, EUSIPCO 2010; Qiming Zhu, EUSIPCO 2012).
LBP code calculation:
$$LBP_P(x[i]) = \sum_{r=0}^{P/2-1} \left\{ S\big[x[i+r-P/2] - x[i]\big]\, 2^{r} + S\big[x[i+r+1] - x[i]\big]\, 2^{r+P/2} \right\}$$
where P is the number of neighbouring samples used. The sign function S[·] is:
$$S[x] = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Also applied to onset detection of myoelectric signals (Paul McCool, EUSIPCO 2012).
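The LBP code formula on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `lbp_1d` is hypothetical, P is assumed to be even, and codes are only computed for samples that have P/2 neighbours on each side.

```python
def lbp_1d(x, P=8):
    """Return the 1-D LBP code of every sample with P/2 neighbours on each side.

    Hypothetical helper for illustration; P is assumed to be even.
    """
    half = P // 2
    codes = []
    for i in range(half, len(x) - half):
        code = 0
        for r in range(half):
            # Left neighbours x[i + r - P/2] carry the weights 2^r.
            if x[i + r - half] - x[i] >= 0:
                code += 2 ** r
            # Right neighbours x[i + r + 1] carry the weights 2^(r + P/2).
            if x[i + r + 1] - x[i] >= 0:
                code += 2 ** (r + half)
        codes.append(code)
    return codes


print(lbp_1d([3, 1, 2, 2], P=2))  # [3, 2]
```

With P=8, a flat segment yields code 255 (every neighbour satisfies the >= 0 test) and an isolated peak yields code 0 (every neighbour lies below the centre sample).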
7 1-D LBP code
1-D LBP calculates the LBP code after thresholding the neighbouring samples against the centre sample.
Figure: LBP code calculation for P=8.
* Navin Chatlani et al., ‘Local binary patterns for 1-D signal processing’, EUSIPCO 2010.
8 1-D LBP histogram
The distribution of the LBP codes over a window of N samples of the signal x[i] is described by a histogram:
$$H_b = \sum_{P/2 \le i \le N - P/2} \delta\big(LBP_P(x[i]),\, b\big)$$
where $b = 0, 1, \cdots, B$, B is the number of histogram bins, and δ(i, j) is the Kronecker delta function.
Figure: overview of the 1-D LBP procedure, from the windowed data to the histogram.
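The histogram amounts to counting how often each code value occurs within the window. A minimal sketch follows, assuming the codes have already been computed and that bins are indexed from 0 to 2^P - 1 (the range of a P-neighbour code). The function name `lbp_histogram` is hypothetical.

```python
def lbp_histogram(codes, n_bins):
    """Count occurrences of each LBP code value in one analysis window.

    hist[b] is the number of codes equal to b, which is what the
    Kronecker delta sum computes. Codes outside 0..n_bins-1 are ignored.
    """
    hist = [0] * n_bins
    for c in codes:
        if 0 <= c < n_bins:
            hist[c] += 1
    return hist


# Four codes from a P=2 analysis (code values 0..3, so 4 bins).
print(lbp_histogram([3, 2, 2, 0], n_bins=4))  # [1, 0, 2, 1]
```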
9 1-D LBP of energy
Short-time energy and the histogram.
Figure: speech signals and the short-time energy. a) energy of clean speech signal, b) energy of noisy speech signal, c) histogram of clean speech energy, d) histogram of noisy speech energy.
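The slides do not define the short-time energy explicitly. A common definition, assumed here, is the sum of squared samples over consecutive non-overlapping frames with no window weighting. The function name `short_time_energy` is hypothetical.

```python
def short_time_energy(x, frame_len):
    """Sum of squared samples in each non-overlapping frame of frame_len samples.

    Assumption: rectangular, non-overlapping frames; trailing samples that
    do not fill a whole frame are dropped.
    """
    n_frames = len(x) // frame_len
    return [
        sum(s * s for s in x[m * frame_len:(m + 1) * frame_len])
        for m in range(n_frames)
    ]


print(short_time_energy([1, 2, 3, 4, 5], frame_len=2))  # [5, 25]
```

At a 16 kHz sampling rate, a 5 ms energy window corresponds to `frame_len = 80` samples.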
10 1-D LBP of energy with offset value
LBP code with offset value α:
$$LBP'_P(E[m]) = \sum_{r=0}^{P/2-1} \left\{ S\big[E[m+r-P/2] - E[m] - \alpha\big]\, 2^{r} + S\big[E[m+r+1] - E[m] - \alpha\big]\, 2^{r+P/2} \right\}$$
Figure: $H_0$ of the energy with different offset values α. a) E[m] of noisy signal, b) $H_0$ with α=0.01, c) $H_0$ with α=0.02, d) $H_0$ with α=0.03, e) $H_0$ with α=0.04, f) $H_0$ with α=0.05.
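The effect of the offset can be seen in a small example: without it, a perfectly flat energy segment produces the all-ones code, while with a positive α the same segment produces code 0 and so contributes to the bin H_0. The sketch below is illustrative only. The names `lbp_1d_offset` and `h0` are hypothetical, P is assumed even, and `h0` simply counts zero codes in whatever window of codes it is given.

```python
def lbp_1d_offset(E, P=2, alpha=0.03):
    """1-D LBP code of the energy sequence E, with offset alpha subtracted
    inside the sign function. Hypothetical helper; P is assumed to be even."""
    half = P // 2
    codes = []
    for m in range(half, len(E) - half):
        code = 0
        for r in range(half):
            # A neighbour sets its bit only if it exceeds E[m] by at least alpha.
            if E[m + r - half] - E[m] - alpha >= 0:
                code += 2 ** r
            if E[m + r + 1] - E[m] - alpha >= 0:
                code += 2 ** (r + half)
        codes.append(code)
    return codes


def h0(codes):
    """Histogram bin 0: number of codes equal to zero in the window."""
    return codes.count(0)


E = [0.0, 0.0, 0.0, 1.0, 0.0]
print(lbp_1d_offset(E, P=2, alpha=0.0))   # [3, 3, 0]
print(lbp_1d_offset(E, P=2, alpha=0.03))  # [0, 2, 0]
print(h0(lbp_1d_offset(E, P=2, alpha=0.03)))  # 2
```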
11 1-D LBP of energy based VAD
Figure: VAD system block diagram.
12 VAD performance
Experimental background:
Test speech sampling frequency is 16 kHz.
The total length of the test set is 73 seconds.
Speech is mixed with babble noise at SNRs from 0 to 20 dB.
α is set to 0.03.
VAD 1: 1-D LBP of energy based VAD.
VAD 0: VAD proposed by Navin Chatlani.
G.729: G.729 B standard VAD.
HR0: speech absence hit rate, $HR_0 = N_{0,0} / N_0^{ref}$
FAR0: speech absence false alarm rate, $FAR_0 = 1 - N_{1,1} / N_1^{ref}$
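The two rates can be computed from frame-level labels. The slide does not spell out the counts, so the sketch below assumes the usual reading: label 0 means speech absent and 1 means speech present, N_{0,0} and N_{1,1} are the numbers of frames where the reference and the VAD decision agree on 0 and on 1 respectively, and N_0^ref and N_1^ref are the numbers of reference frames of each class. The function name `vad_rates` is hypothetical.

```python
def vad_rates(ref, dec):
    """Return (HR0, FAR0) from reference and decided frame labels (0 or 1).

    Assumed reading of the slide's symbols:
      HR0  = N00 / N0_ref       (speech-absent frames correctly detected)
      FAR0 = 1 - N11 / N1_ref   (speech frames wrongly marked as absent)
    """
    if len(ref) != len(dec):
        raise ValueError("ref and dec must have the same length")
    n00 = sum(1 for r, d in zip(ref, dec) if r == 0 and d == 0)
    n11 = sum(1 for r, d in zip(ref, dec) if r == 1 and d == 1)
    n0_ref = sum(1 for r in ref if r == 0)
    n1_ref = sum(1 for r in ref if r == 1)
    if n0_ref == 0 or n1_ref == 0:
        raise ValueError("reference must contain both classes")
    return n00 / n0_ref, 1 - n11 / n1_ref


ref = [0, 0, 0, 0, 1, 1, 1, 1]
dec = [0, 0, 0, 1, 1, 1, 0, 0]
print(vad_rates(ref, dec))  # (0.75, 0.5)
```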
14 Improved IMCRA
Experimental background:
198 samples from the VoxForge database, from 9 speakers (6 male, 3 female), sampled at 16 kHz.
Babble noise from the NOISEX-92 database added at SNRs from -10 dB to 10 dB.
Energy window size set to 5 ms, P=2, histogram size set to 30 ms.
Segmental SNR and weighted spectral slope (WSS)* are used to compare the performance.
* Klatt, ‘Prediction of perceived phonetic distance from critical-band spectra: a first step’, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1982.
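The slides name segmental SNR as a metric but do not give its formula. A widely used form, assumed here, averages the per-frame SNR in dB between the clean signal and the enhanced signal, with each frame's value clamped to a fixed range (-10 dB to 35 dB is a common choice, not one stated in the slides). The function name `segmental_snr` and its parameters are hypothetical.

```python
import math


def segmental_snr(clean, enhanced, frame_len, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB between a clean and an enhanced signal.

    Assumptions: non-overlapping frames, frames with zero clean energy are
    skipped, and each frame's SNR is clamped to [lo, hi] dB.
    """
    n_frames = min(len(clean), len(enhanced)) // frame_len
    total, used = 0.0, 0
    for m in range(n_frames):
        s = clean[m * frame_len:(m + 1) * frame_len]
        y = enhanced[m * frame_len:(m + 1) * frame_len]
        sig = sum(a * a for a in s)
        err = sum((a - b) ** 2 for a, b in zip(s, y))
        if sig == 0:
            continue  # silent frame: SNR undefined, skip it
        snr = hi if err == 0 else 10.0 * math.log10(sig / err)
        total += max(lo, min(hi, snr))
        used += 1
    if used == 0:
        raise ValueError("no frames with non-zero clean energy")
    return total / used


# Frame 1: 200 / 2 -> 20 dB. Frame 2: 200 / 200 -> 0 dB. Mean: 10 dB.
print(segmental_snr([10, 10, 10, 10], [9, 9, 0, 0], frame_len=2))
```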
15 Improved IMCRA with babble noise
Performance. Panels: clean signal; noisy signal (SNR at 0 dB); IMCRA; Improved IMCRA.
16 Improved IMCRA with babble noise
Performance: segmental SNR.
17 Improved IMCRA with babble noise
Performance: weighted spectral slope (WSS).
18 Discussion
Conclusions from the results:
1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals.
LBP in the energy domain is shown to be superior to the G.729 VAD and Navin Chatlani’s LBP VAD.
Improved IMCRA is superior to IMCRA, with enhanced segmental SNR and higher likelihood.
Future work:
Apply this algorithm as the pre-processing stage of an ASR system.
19 Acknowledgements
Thanks to Prof. John Soraghan for the idea of babble noise reduction.
Thanks to Paul McCool and Navin Chatlani for their previous work on 1-D LBP.