2  Speech Enhancement and Multi-band Frequency Compression for Suppression of Noise and Intraspeech Spectral Masking in Hearing Aids
Nitya Tiwari, Santosh K. Waddi, Prem C. Pandey
EE Dept., IIT Bombay
{nitya, pcpandey} @ ee.iitb.ac.in, santosh4b6 @ gmail.com
Indicon 2013, Mumbai, 13-15 Dec. 2013, Paper No. 524 (Track 4.1, Sat., 14 Dec., 1730-1900)

3  Overview
1. Introduction
2. Noise Suppression
3. Multi-band Frequency Compression
4. Implementation for Real-time Processing
5. Test Results
6. Summary & Conclusion

4  1. Introduction
Sensorineural hearing loss
- Increased hearing thresholds and high-frequency loss
- Decreased dynamic range & abnormal loudness growth
- Increased spectral & temporal masking
→ Degraded speech perception, particularly in noisy environments
Signal processing in hearing aids
- Frequency-selective amplification
- Automatic volume control
- Multichannel dynamic range compression (settable attack time, release time, and compression ratios)

5  Single-input speech enhancement for reducing the background noise (Boll 1979, Berouti et al. 1979, Martin 1994, Loizou 2007, Paliwal et al. 2010)
- Dynamic estimation of the non-stationary noise spectrum: during non-speech segments using voice activity detection, or continuously using statistical techniques
- Estimation of the noise-free speech spectrum: spectral noise subtraction, or multiplication by a noise suppression function
- Speech resynthesis using the enhanced magnitude and the noisy phase

6  Multi-band frequency compression for reducing the effect of increased spectral masking (Arai et al. 2004, Kulkarni et al. 2012)
Splitting the short-time spectrum into analysis bands and compressing the spectral samples towards the band center, to present the speech energy in relatively narrow bands and avoid masking by adjacent spectral components.
- Segmentation and spectral analysis
  o Analysis-synthesis: fixed-frame or pitch-synchronous
  o Analysis bands: constant bandwidth or auditory critical bandwidth
- Spectral modification
  o Modifying the magnitude spectrum with the original phase (Arai et al. 2004)
  o Modifying the complex spectrum to reduce computation & processing-related artifacts (Kulkarni et al. 2012)
- Speech resynthesis using the overlap-add method

7  Research objective
Real-time single-input speech enhancement and multi-band frequency compression for improving speech perception by persons with moderate sensorineural loss.
Main challenges
- Noise estimation without voice activity detection
- Multi-band frequency compression with low processing artifacts
- Low signal delay (algorithmic + computational) for real-time application
- Low computational complexity & memory requirement for implementation on a low-power processor

8  Proposed technique
- Spectral subtraction using cascaded-median based continuous updating of the noise spectrum, without using voice activity detection
- Multi-band frequency compression based on least-square-error estimation (LSEE) of the modified spectrum
Investigations using offline implementation
- Selection of processing parameters
Real-time implementation
- 16-bit fixed-point DSP with on-chip FFT hardware
Evaluation of the implementations
- Informal listening, PESQ measure

9  2. Noise Suppression
Power subtraction
- Windowed speech spectrum: X_n(k); estimated noise magnitude spectrum: D_n(k)
- Estimated speech spectrum: Y_n(k) = [ |X_n(k)|^2 − (D_n(k))^2 ]^0.5 e^{j∠X_n(k)}
- Problems: residual noise due to under-subtraction; distortion in the form of musical noise & clipping due to over-subtraction
Generalized spectral subtraction (Berouti et al. 1979)
|Y_n(k)| = β^{1/γ} D_n(k),                          if |X_n(k)| < (α + β)^{1/γ} D_n(k)
         = [ |X_n(k)|^γ − α (D_n(k))^γ ]^{1/γ},     otherwise
γ: exponent factor, α: over-subtraction factor, β: floor factor
Re-synthesis with the noisy phase, without explicit phase calculation:
Y_n(k) = |Y_n(k)| X_n(k) / |X_n(k)|
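
A minimal per-bin C sketch of the generalized spectral subtraction rule above. The exponent γ = 2 (power subtraction), the toy spectra in main(), and the function name are illustrative assumptions; α = 1 and β = 0.01 follow the offline results on slide 11. Compile with `cc subtract.c -lm`.

```c
#include <math.h>
#include <stdio.h>

/* |Y(k)| from |X(k)| and the noise estimate D(k), with exponent g,
 * over-subtraction factor alpha, and floor factor beta. */
static float spectral_subtract(float Xmag, float D, float g, float alpha, float beta)
{
    /* Spectral floor when the frame magnitude is too close to the noise estimate. */
    if (Xmag < powf(alpha + beta, 1.0f / g) * D)
        return powf(beta, 1.0f / g) * D;
    /* Generalized subtraction: [ |X|^g - alpha * D^g ]^(1/g). */
    return powf(powf(Xmag, g) - alpha * powf(D, g), 1.0f / g);
}

int main(void)
{
    /* Toy magnitude spectrum |X(k)| and noise estimate D(k) for a few bins (assumed values). */
    float Xmag[] = {1.00f, 0.30f, 2.50f, 0.05f};
    float D[]    = {0.20f, 0.25f, 0.30f, 0.10f};
    float g = 2.0f, alpha = 1.0f, beta = 0.01f;   /* gamma = 2 assumed here */

    for (int k = 0; k < 4; k++) {
        float Ymag = spectral_subtract(Xmag[k], D[k], g, alpha, beta);
        /* Re-synthesis would keep the noisy phase: Y(k) = |Y(k)| X(k) / |X(k)|. */
        printf("bin %d: |X| = %.2f, D = %.2f -> |Y| = %.3f\n", k, Xmag[k], D[k], Ymag);
    }
    return 0;
}
```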

10  Dynamic estimation of the noise magnitude spectrum
Pseudo-median based estimation (Basha & Pandey 2012)
- Moving median approximated by a p-point, q-stage cascaded median, with a saving in memory & computation for real-time implementation
- Estimation improved by a weighted average of the medians from different stages
- Condition for reducing sorting operations and storage: low p, q ≈ ln(M)

Median type                                  Storage per freq. bin    Sortings per frame per freq. bin
M-point moving median                        2M                       (M − 1)/2
p-point q-stage cascaded median (M = p^q)    pq                       p(p − 1)/2
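
A C sketch, for one frequency bin, of a p-point q-stage cascaded median with p = 3, q = 5 (approximating an M = 243 frame moving median). The update rule, the toy input in main(), and the way the six slide-11 weights are associated with the current input and the five stage medians are assumptions for illustration, not the exact scheme of Basha & Pandey.

```c
#include <stdio.h>
#include <string.h>

#define P 3    /* points per stage  */
#define Q 5    /* number of stages: approximates an M = P^Q = 243 frame moving median */

typedef struct {
    float buf[Q][P];   /* the P most recent inputs of each stage   */
    int   count[Q];    /* number of inputs collected by each stage */
    float med[Q];      /* latest median produced by each stage     */
} CascadedMedian;

/* Median of three values with at most three compare-swaps. */
static float median3(const float v[P])
{
    float a = v[0], b = v[1], c = v[2], t;
    if (a > b) { t = a; a = b; b = t; }
    if (b > c) { t = b; b = c; c = t; }
    if (a > b) { t = a; a = b; b = t; }
    return b;
}

/* Feed one new spectral magnitude; whenever a stage has collected P values,
 * its median becomes an input to the next stage. */
static void cm_update(CascadedMedian *cm, float x)
{
    for (int s = 0; s < Q; s++) {
        cm->buf[s][cm->count[s]++] = x;
        if (cm->count[s] < P) return;
        cm->count[s] = 0;
        x = cm->med[s] = median3(cm->buf[s]);
    }
}

/* Noise estimate as a weighted average of the current input and the stage
 * medians; mapping the weights 0, 0, 0, 0.2, 0.6, 0.2 to (input, stage 1..5)
 * is an assumption made for this sketch. */
static float cm_estimate(const CascadedMedian *cm, float x)
{
    const float w[Q + 1] = {0.0f, 0.0f, 0.0f, 0.2f, 0.6f, 0.2f};
    float est = w[0] * x;
    for (int s = 0; s < Q; s++)
        est += w[s + 1] * cm->med[s];
    return est;
}

int main(void)
{
    CascadedMedian cm;
    memset(&cm, 0, sizeof cm);
    /* Toy data: a constant noise floor with occasional speech-like bursts. */
    for (int i = 0; i < 300; i++) {
        float x = 0.2f + ((i % 50) < 5 ? 2.0f : 0.0f);
        cm_update(&cm, x);
    }
    printf("noise estimate for this bin: %.3f\n", cm_estimate(&cm, 0.2f));
    return 0;
}
```

Per bin, each stage keeps only P values and one median, giving the pq storage and p(p − 1)/2 sortings per frame listed in the table above.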

11  Investigations using offline implementation of spectral subtraction
- f_s: 10 kHz, frame length: 25.6 ms, overlap: 75%, FFT size N: 512
- Dynamic estimation of the noise spectrum: 3-frame, 5-stage weighted-average cascaded median (M = 243, p = 3, q = 5), equivalent to a moving median over 1.55 s
  o Reduction in storage requirement: 486 to 15 samples per freq. bin
  o Reduction in sorting operations: 121 to 3 per frame per freq. bin
  o Empirically determined weights for averaging: 0, 0, 0, 0.2, 0.6, 0.2
- Best combination of processing parameters: β = 0.01, α = 1; speech clipping at larger α

12  3. Multi-band Frequency Compression
Kulkarni et al. 2012: compression on the complex spectrum to reduce computation & processing-related artifacts
- Segmentation & spectral analysis
- Spectral modification with spectral segment mapping
- Re-synthesis using overlap-add
Spectral segment mapping (edges of the input spectral segment: a, b; compression factor: c; k_ic: center of band i):
a = k_ic − (k_ic − (k′ − 0.5)) / c,   b = a + 1/c
Y_c(k′) = (m − a) Y(m) + Σ_j Y(j) + (b − n) Y(n), with the sum over the bins j lying between the edge bins m and n
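
A simplified C sketch of this segment mapping for one analysis band of a complex spectrum. The bin-coverage convention (bin j spanning [j − 0.5, j + 0.5)), the zeroing of output bins whose input segment falls outside the band, and all names and the toy band in main() are assumptions; in the actual scheme the band edges come from auditory critical bands.

```c
#include <complex.h>
#include <math.h>
#include <stdio.h>

/* Sum the complex spectrum Y over the real-valued bin interval [a, b]:
 * full bins contribute Y(j), the two edge bins contribute fractional
 * weights, as in (m - a) Y(m) + sum Y(j) + (b - n) Y(n). */
static float complex segment_sum(const float complex *Y, int nbins, float a, float b)
{
    float complex s = 0;
    for (int j = (int)floorf(a + 0.5f); j <= (int)floorf(b + 0.5f); j++) {
        if (j < 0 || j >= nbins) continue;
        float lo = fmaxf(a, j - 0.5f);
        float hi = fminf(b, j + 0.5f);
        if (hi > lo) s += (hi - lo) * Y[j];
    }
    return s;
}

/* Compress one analysis band [kl, kh] with centre kc by factor c (< 1):
 * output bin k draws from the input segment [a, a + 1/c] with
 * a = kc - (kc - (k - 0.5)) / c, pulling the band's energy towards kc. */
static void compress_band(const float complex *Y, float complex *Yc, int nbins,
                          int kl, int kh, float kc, float c)
{
    for (int k = kl; k <= kh && k < nbins; k++) {
        float a = kc - (kc - (k - 0.5f)) / c;
        float b = a + 1.0f / c;
        /* Only output bins whose input segment stays inside the band receive
         * energy; the remaining bins of the band are assumed to be zeroed. */
        Yc[k] = (a >= kl - 0.5f && b <= kh + 0.5f) ? segment_sum(Y, nbins, a, b) : 0;
    }
}

int main(void)
{
    enum { N = 32 };
    float complex Y[N], Yc[N] = {0};
    for (int k = 0; k < N; k++) Y[k] = 1.0f + 0.0f * I;   /* flat toy spectrum */

    compress_band(Y, Yc, N, 8, 23, 15.5f, 0.6f);          /* one toy band      */
    for (int k = 8; k <= 23; k++)
        printf("k = %2d  |Yc| = %.3f\n", k, cabsf(Yc[k]));
    return 0;
}
```

With the flat toy spectrum, the printout shows the band's energy concentrated in the bins around the band center and zeros towards the band edges, which is the intended compression behavior.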

13  Problems in implementation for real-time processing
- Fixed-frame analysis-synthesis: perceptible distortions
- Pitch-synchronous analysis-synthesis: delay and computational complexity incompatible with real-time processing
Results from listening tests (Kulkarni et al. 2012)
- Processing for maximum improvement in speech perception
  o Pitch-synchronous analysis-synthesis
  o Auditory critical band based compression
  o c = 0.6
- Evaluation using the Modified Rhyme Test on 8 hearing-impaired subjects with moderate loss
  o Increase of 16.5% in recognition score
  o Decrease of 0.89 s in response time

14  Proposed solution for integrating multi-band frequency compression and suppression of background noise
- Common FFT-based analysis-synthesis platform for computational efficiency
- Modified fixed-frame multi-band frequency compression using the Griffin-Lim method of least-square-error signal estimation from the modified STFT (Griffin & Lim 1984) to avoid processing artifacts
  o Windowing, multiplication with the analysis window, & FFT
  o Spectral modification
  o IFFT, multiplication with the analysis window, overlap-add
- Window requirement: the sum of squares of all overlapped window samples should be unity
- Modified Hamming window, window length L & shift S = L/4:
  w(n) = [1 / √(4d^2 + 2e^2)] [d + e cos(2π(n + 0.5)/L)], where d = 0.54, e = −0.46
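
A short C check of the window requirement above: because the same window is applied before the FFT and again after the IFFT, it is the squared overlapped windows that must sum to one for the overlap-add to reconstruct the signal exactly. L = 256 matches the 25.6 ms frame at 10 kHz used later; otherwise the check is generic.

```c
#include <math.h>
#include <stdio.h>

#define L  256        /* window length: 25.6 ms at 10 kHz */
#define S  (L / 4)    /* frame shift: 75% overlap         */
#define PI 3.14159265358979323846

/* Modified Hamming window from this slide, scaled so that the squares of the
 * four overlapping windows sum to one (the weighted overlap-add requirement). */
static double win(int n)
{
    const double d = 0.54, e = -0.46;
    return (d + e * cos(2.0 * PI * (n + 0.5) / L)) / sqrt(4.0 * d * d + 2.0 * e * e);
}

int main(void)
{
    double max_err = 0.0;
    for (int n = 0; n < S; n++) {         /* one shift period is enough        */
        double sum = 0.0;
        for (int m = 0; m < L / S; m++)   /* the four frames overlapping at n  */
            sum += win(n + m * S) * win(n + m * S);
        if (fabs(sum - 1.0) > max_err)
            max_err = fabs(sum - 1.0);
    }
    printf("max |sum of squared overlapped windows - 1| = %.2e\n", max_err);
    return 0;
}
```

The printed error is at the level of double-precision rounding, confirming the unity constraint for this window and S = L/4.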

15  Processing steps
- Spectral subtraction → enhanced magnitude spectrum
- Enhanced magnitude spectrum & original phase spectrum → complex spectrum
- Multi-band frequency compression → compressed complex spectrum
- Resynthesis using IFFT and overlap-add
Investigations using offline implementation of the modified multi-band frequency compression (f_s = 10 kHz, frame length = 25.6 ms, FFT length N = 512)
- No perceptible distortions: output of the modified fixed-frame processing similar to that from the pitch-synchronous processing used by Kulkarni et al. (2012)
- Modified fixed-frame processing also suitable for non-speech audio

16  4. Implementation for Real-time Processing
16-bit fixed-point DSP: TI TMS320C5515
- 16 MB memory space: 320 KB on-chip RAM (with 64 KB dual-access RAM), 128 KB on-chip ROM
- Three 32-bit programmable timers, 4 DMA controllers each with 4 channels
- FFT hardware accelerator (8- to 1024-point FFT)
- Max. clock speed: 120 MHz
DSP board: eZdsp
- 4 MB on-board NOR flash for user program
- Codec TLV320AIC3204: stereo ADC & DAC, 16/20/24/32-bit quantization, 8–192 kHz sampling
Development environment for C: TI's "CCStudio", ver. 4.0

17  Implementation
- One codec channel (ADC and DAC) with 16-bit quantization
- Sampling frequency: 10 kHz
- Window length of 25.6 ms (L = 256) with 75% overlap, FFT length N = 512
- Storage of input samples, spectral values, processed samples: 16-bit real & 16-bit imaginary parts

18  Data transfers and buffering operations (S = L/4)
DMA cyclic buffers
- 5-block input buffer
- 2-block output buffer (each block with S samples)
Pointers (incremented cyclically on DMA interrupt)
- current input block
- just-filled input block
- current output block
- write-to output block
Signal delay
- Algorithmic: 1 frame (25.6 ms)
- Computational: ≤ frame shift (6.4 ms)
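
A host-side C sketch of the cyclic index bookkeeping described above, with the DMA interrupt simulated by a plain function call. The pointer names mirror the slide; the use of array indices instead of pointers, the update order, and the comment about which blocks form a frame are assumptions, and no TI DMA or codec API is involved.

```c
#include <stdio.h>

#define IN_BLKS   5    /* 5-block input cyclic buffer  (each block: S samples) */
#define OUT_BLKS  2    /* 2-block output cyclic buffer (each block: S samples) */

/* Block indices, incremented cyclically on every DMA interrupt,
 * i.e. once per frame shift of S samples (6.4 ms at 10 kHz). */
static int cur_in;        /* input block currently being filled by DMA          */
static int just_filled;   /* input block that was just completed                */
static int cur_out;       /* output block currently being played out by DMA     */
static int write_to;      /* output block the processed samples are written to  */

static void on_dma_interrupt(void)
{
    just_filled = cur_in;
    cur_in      = (cur_in + 1) % IN_BLKS;
    cur_out     = (cur_out + 1) % OUT_BLKS;
    write_to    = (cur_out + 1) % OUT_BLKS;   /* the other block of the pair */

    /* At this point the frame formed by the most recently filled input blocks
     * (four blocks of S samples for 75% overlap) would be windowed, processed,
     * and overlap-added into the write-to output block for later playback. */
}

int main(void)
{
    for (int i = 0; i < 8; i++) {     /* simulate a few interrupts */
        on_dma_interrupt();
        printf("intr %d: just_filled=%d cur_in=%d cur_out=%d write_to=%d\n",
               i, just_filled, cur_in, cur_out, write_to);
    }
    return 0;
}
```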

19  5. Test Results
Test material
- Speech: "Where were you a year ago?" from a male speaker
- Noise: white, pink, babble, car, and train noises (AURORA)
- SNR: ∞, 15, 12, 9, 6, 3, 0, −3, −6 dB
Evaluation methods
- Informal listening
- Objective evaluation using the PESQ measure (scale: 0 – 4.5, acceptable: 2.5)

20  Results from offline processing (f_s = 10 kHz, frame length = 25.6 ms, FFT size N = 512, β = 0.01, α = 1, c = 0.6)
Informal listening
- No audible roughness or distortion in the enhanced and compressed speech
Spectral subtraction
- PESQ improvement: 0.37 – 0.86 for input with 0 dB SNR
- Equivalent SNR improvement: 4 – 13 dB for PESQ of 2.5
Multi-band frequency compression
- PESQ of modified fixed-frame processing, with pitch-synchronous processing as reference: 3.7

21  Example of spectral subtraction (spectrograms: clean speech, noisy speech, output after noise suppression)
Speech: "Where were you a year ago"; noise: white; input SNR: 3 dB

22  Example of multi-band frequency compression (spectrograms: clean speech, compression on clean speech, compression after noise suppression)
Speech: "Where were you a year ago"; noise: white; input SNR: 3 dB; c = 0.6

23  Results of real-time processing
- Informal listening: real-time output perceptually similar to the offline output
- PESQ for real-time output w.r.t. offline output: 2.5 – 3.4
- Signal delay = 36 ms
- Lowest processor clock for satisfactory operation = 39 MHz → processing capacity used ≈ 1/3 of the capacity at the highest clock of 120 MHz
Processing example (spectrograms: clean speech, noisy speech, output after noise suppression, compression on clean speech, compression after noise suppression)
- Speech: "Where were you a year ago"; noise: white; input SNR: 3 dB
- Parameters: β = 0.02, α = 1, c = 0.6

24  6. Summary & Conclusions
- Integration of processing techniques to reduce the effects of background noise and of the increased intraspeech spectral masking associated with sensorineural hearing loss
- Cascaded-median, weighted-average approximation of the moving median for dynamic estimation of the noise spectrum for suppression of background noise
- Modified fixed-frame analysis-synthesis for multi-band frequency compression with low computational complexity and without perceptible distortions; processing suitable for speech and non-speech audio
- Processing implemented on a 16-bit fixed-point DSP chip and tested for satisfactory operation

25  Further work
- Implementation along with automatic gain control, multi-band amplitude compression, and frequency-selective amplification
- Listening tests for evaluating the improvement in speech perception

27  Abstract
Sensorineural hearing impairment is associated with increased intraspeech spectral masking and results in degraded speech perception in noisy environments. Speech enhancement using spectral subtraction can be used for suppressing the external noise. Multi-band frequency compression of the complex spectral samples has been reported to reduce the effects of increased intraspeech masking. A combination of these techniques is implemented for real-time processing to improve speech perception by persons with moderate sensorineural loss. For reducing computational complexity and memory requirement, spectral subtraction is carried out using a cascaded-median based estimation of the noise spectrum without voice activity detection. Multi-band frequency compression, based on auditory critical bandwidths, is carried out using fixed-frame processing along with least-squares error based signal estimation to reduce the processing delay. To reduce computational complexity, the two processing stages share the FFT-based analysis-synthesis. The processing is implemented, with a sampling frequency of 10 kHz and a 25.6 ms window with 75% overlap, on a 16-bit fixed-point DSP processor and tested for satisfactory operation. Real-time operation is achieved with a signal delay of approximately 36 ms, using about one-third of the computing capacity of the processor.

28  References
[1] H. Levitt, J. M. Pickett, and R. A. Houde, Eds., Sensory Aids for the Hearing Impaired. New York: IEEE Press, 1980.
[2] B. C. J. Moore, An Introduction to the Psychology of Hearing. London, UK: Academic, 1997, pp. 66–107.
[3] J. M. Pickett, The Acoustics of Speech Communication: Fundamentals, Speech Perception Theory, and Technology. Boston, Mass.: Allyn and Bacon, 1999, pp. 289–323.
[4] H. Dillon, Hearing Aids. New York: Thieme Medical, 2001.
[5] T. Baer, B. C. J. Moore, and S. Gatehouse, "Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality, and response times," Int. J. Rehab. Res., vol. 30, no. 1, pp. 49–72, 1993.
[6] J. Yang, F. Luo, and A. Nehorai, "Spectral contrast enhancement: Algorithms and comparisons," Speech Commun., vol. 39, no. 1–2, pp. 33–46, 2003.
[7] T. Arai, K. Yasu, and N. Hodoshima, "Effective speech processing for various impaired listeners," in Proc. 18th Int. Cong. Acoust. (ICA 2004), Kyoto, Japan, 2004, pp. 1389–1392.
[8] K. Yasu, M. Hishitani, T. Arai, and Y. Murahara, "Critical-band based frequency compression for digital hearing aids," Acoustical Science and Technology, vol. 25, no. 1, pp. 61–63, 2004.
[9] P. N. Kulkarni, P. C. Pandey, and D. S. Jangamashetti, "Multi-band frequency compression for improving speech perception by listeners with moderate sensorineural hearing loss," Speech Commun., vol. 54, no. 3, pp. 341–350, 2012.
[10] P. C. Loizou, Speech Enhancement: Theory and Practice. New York: CRC, 2007.
[11] R. Martin, "Spectral subtraction based on minimum statistics," in Proc. 7th Eur. Signal Processing Conf. (EUSIPCO'94), Edinburgh, U.K., 1994, pp. 1182–1185.
[12] I. Cohen, "Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466–475, 2003.
[13] H. Hirsch and C. Ehrlicher, "Noise estimation techniques for robust speech recognition," in Proc. IEEE ICASSP 1995, Detroit, MI, pp. 153–156.
[14] V. Stahl, A. Fisher, and R. Bipus, "Quantile based noise estimation for spectral subtraction and Wiener filtering," in Proc. IEEE ICASSP 2000, Istanbul, Turkey, pp. 1875–1878.

29  References (contd.)
[15] M. Berouti, R. Schwartz, and J. Makhoul, "Enhancement of speech corrupted by acoustic noise," in Proc. IEEE ICASSP 1979, Washington, DC, pp. 208–211.
[16] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, pp. 113–120, 1979.
[17] Y. Lu and P. C. Loizou, "A geometric approach to spectral subtraction," Speech Commun., vol. 50, no. 6, pp. 453–466, 2008.
[18] K. Paliwal, K. Wójcicki, and B. Schwerin, "Single-channel speech enhancement using spectral subtraction in the short-time modulation domain," Speech Commun., vol. 52, no. 5, pp. 450–475, 2010.
[19] S. K. Waddi, P. C. Pandey, and N. Tiwari, "Speech enhancement using spectral subtraction and cascaded-median based noise estimation for hearing impaired listeners," in Proc. Nat. Conf. Commun. (NCC 2013), Delhi, India, 2013, paper no. 1569696063.
[20] N. Tiwari, P. C. Pandey, and P. N. Kulkarni, "Real-time implementation of multi-band frequency compression for listeners with moderate sensorineural impairment," in Proc. 13th Annual Conf. of the Int. Speech Commun. Assoc. (Interspeech 2012), Portland, Oregon, 2012, paper no. 689.
[21] D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform," IEEE Trans. Acoustics, Speech, Signal Proc., vol. 32, no. 2, pp. 236–243, 1984.
[22] Texas Instruments, Inc. (2011) TMS320C5515 Fixed-Point Digital Signal Processor. [Online]. Available: focus.ti.com/lit/ds/symlink/tms320c5515.pdf
[23] Spectrum Digital, Inc. (2010) TMS320C5515 eZdsp USB Stick Technical Reference. [Online]. Available: support.spectrumdigital.com/boards/usbstk5515/reva/files/usbstk5515_TechRef_RevA.pdf
[24] Texas Instruments, Inc. (2008) TLV320AIC3204 Ultra Low Power Stereo Audio Codec. [Online]. Available: focus.ti.com/lit/ds/symlink/tlv320aic3204.pdf
[25] ITU, "Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs," ITU-T Rec. P.862, 2001.

