Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis Hiroshi SARUWATARI, Tsuyoki NISHIKAWA, and Kiyohiro SHIKANO.


2 Blind Source Separation of Acoustic Signals Based on Multistage Independent Component Analysis
Hiroshi SARUWATARI, Tsuyoki NISHIKAWA, and Kiyohiro SHIKANO
Graduate School of Information Science, Nara Institute of Science and Technology

3 Contents
・ Background of blind source separation (BSS)
・ Overview of Independent Component Analysis (ICA)
・ Time-domain ICA (TDICA)
・ Frequency-domain ICA (FDICA)
・ Disadvantages of TDICA and FDICA
・ Proposal of Multistage ICA
・ Experimental results under real acoustic conditions
・ Conclusion

4 Background
Hands-free speech recognition system
[Figure: target speech ("Is it fine tomorrow?") and interference both arrive at the microphone that feeds the speech recognition system.]
・ Interference is also observed at the microphone.
・ Speech recognition performance degrades significantly.

5 Background (Cont’d)
・ Goal
– Realization of a high-quality hands-free speech interface system
・ Microphone array
– Receiver which consists of multiple elements
– To enhance target speech or reduce interference
・ Problem of microphone array processing
– A priori information is required: directions of arrival of the sound sources, and breaks in the target speech for filter adaptation.

6 Problem of Microphone Array
・ Delay and Sum: to produce a narrow beam pattern toward the target direction θ
– High sidelobes pick up noise.
・ Adaptive: to update filters so that a null reduces the noise
– The target direction θ must be set.
– It is necessary to observe only noise for filter learning.

7 Blind Source Separation (BSS)
・ Approach taken to estimate source signals only from the observed mixed signals.
・ No information about source directions or acoustic conditions is required.
・ Independent Component Analysis (ICA) is mainly used.
・ Previous works on ICA
– J. Cardoso, 1989
– C. Jutten, 1990 (higher-order decorrelation)
– P. Comon, 1994 (defined the term “ICA”)
– A. Bell et al., 1995 (infomax)

8 ICA-Based BSS
[Figure: two speakers ("Good Morning!", "Hello!") produce Source 1 and Source 2, which are mutually independent; Microphone 1 and Microphone 2 record Observed signal 1 and Observed signal 2, which are known.]
・ To estimate source signals
・ No a priori information (unsupervised adaptive filtering)

9 BSS for Instantaneous Mixture
・ Linear mixing process: source → mixing matrix → observed
・ Separation process: observed → unmixing matrix → separated
・ Cost function: are the separated signals independent? The unmixing matrix is optimized against this criterion.
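The mixing and separation processes on this slide can be sketched numerically. This is a minimal illustration, not the authors' code: the symbols `s`, `A`, `x`, `W`, `y`, the matrix values, and the Laplacian stand-in for speech are all assumptions. It uses the true inverse of `A`, which is exactly what a blind method does not have and must estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent sources (rows); Laplacian noise stands in for speech (assumption).
s = rng.laplace(size=(2, 1000))

# Hypothetical real-valued mixing matrix A (unknown in a real BSS problem).
A = np.array([[1.0, 0.6],
              [0.5, 1.0]])

x = A @ s                  # observed signals: x(t) = A s(t)

# If A were known, the ideal unmixing matrix would simply be its inverse.
W = np.linalg.inv(A)
y = W @ x                  # separated signals: y(t) = W x(t)

print(np.allclose(y, s))   # True
```

ICA replaces the `np.linalg.inv(A)` step with an iterative search for a `W` that makes the rows of `y` as independent as possible.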

10 Various Criteria for ICA
・ Decorrelation
– To minimize correlation among signals in multiple time durations
・ Nonlinear function 1
– To minimize higher-order correlation
・ Nonlinear function 2
– To assume the p.d.f. of the sources (nonlinearity applied to the separated signal: sigmoid function)

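11 Cost Function for Nonlinear Function 2
Kullback-Leibler (KL) divergence between the joint p.d.f. of the separated signals and the product of their marginal p.d.f.s, expressed with:
1. Joint entropy of the separated signals
2. Sum of the marginal entropies of the separated signals
・ Minimized when the separated signals are mutually independent
・ This can be achieved by maximizing the joint entropy, because the sum of the marginal entropies hardly changes.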

12 Derivation for Nonlinear Function 2 Nonlinear Function 2 ⇒ To be diagonalized where This can be approximated by Sigmoid Function in speech signal. To update along the negative gradient of
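The idea of driving the correlation between the nonlinearly transformed output and the output itself toward a diagonal matrix can be sketched as below. The slide's own equations did not survive in this transcript, so this sketch makes several assumptions: it uses the natural-gradient form of the update rather than the plain gradient, `tanh` as the sigmoid-shaped nonlinearity, and synthetic Laplacian sources with a hypothetical mixing matrix `A`.

```python
import numpy as np

def natural_gradient_ica(x, step=0.1, n_iter=500):
    """Estimate an unmixing matrix W for an instantaneous mixture x (channels x samples)."""
    n_ch, n_samples = x.shape
    W = np.eye(n_ch)
    for _ in range(n_iter):
        y = W @ x
        phi = np.tanh(y)                        # sigmoid-shaped nonlinearity (assumption)
        R = (phi @ y.T) / n_samples             # <phi(y) y^T>, driven toward the identity
        W = W + step * (np.eye(n_ch) - R) @ W   # natural-gradient update
    return W

def crosstalk(G):
    """Largest ratio of weaker to stronger path over the rows of the global matrix G = W A."""
    G = np.abs(G)
    return float(np.max(np.min(G, axis=1) / np.max(G, axis=1)))

rng = np.random.default_rng(0)
s = rng.laplace(scale=1 / np.sqrt(2), size=(2, 20000))  # unit-variance super-Gaussian sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])                  # hypothetical mixing matrix
x = A @ s
W = natural_gradient_ica(x)
print(crosstalk(A), crosstalk(W @ A))  # crosstalk before and after separation
```

The product `W @ A` should approach a scaled permutation matrix, which is the most that any ICA method can recover.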

13 Why Instantaneous Mixture Model?
ICA for the instantaneous mixture model
→ The mixing matrix is real-valued.
→ Time delays among microphones and room reflections are not modeled.
It is a useful example to show how ICA can work, but it is only a mathematical “toy model”.
Is it applicable to sound signals in a real acoustic environment? NO!

14 ICA for Convolutive Mixture Model (1)
In application to a microphone array,
– each received signal has a time delay which corresponds to the sound direction and the position of each element.
– the mixing matrix A is not a set of simple scalar coefficients, but is represented as convolution filters.
Mixing process in a real environment: source → mixing matrix (FIR filters, convolution) → observed
⇒ We should use FIR filters for the separation process.

15 ICA for Convolutive Mixture Model (2)
Mixing process in the frequency domain: source → complex-valued mixing matrix → observed
・ To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform
⇒ We only need to solve a complex-valued instantaneous mixture in each subband independently.
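The reduction from a convolutive mixture to one complex-valued instantaneous mixture per frequency bin can be checked numerically. This is a sketch under assumptions: the filter taps, sizes, and array names are hypothetical, and circular convolution over a single frame is used so that the DFT relation holds exactly. With real windowed frames it is only an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_mic, n_taps, n_fft = 2, 2, 8, 64

s = rng.standard_normal((n_src, n_fft))            # one frame of each source
a = rng.standard_normal((n_mic, n_src, n_taps))    # hypothetical FIR mixing filters a_ji(k)

# Time domain: x_j(t) = sum_i sum_k a_ji(k) s_i(t - k), with circular delays.
x = np.zeros((n_mic, n_fft))
for j in range(n_mic):
    for i in range(n_src):
        for k in range(n_taps):
            x[j] += a[j, i, k] * np.roll(s[i], k)

# Frequency domain: one complex mixing matrix A(f) per bin, X(f) = A(f) S(f).
A_f = np.fft.fft(a, n=n_fft, axis=-1)              # shape (n_mic, n_src, n_fft)
S_f = np.fft.fft(s, axis=-1)                       # shape (n_src, n_fft)
X_f = np.einsum('jif,if->jf', A_f, S_f)            # per-bin matrix-vector product

print(np.allclose(np.fft.fft(x, axis=-1), X_f))    # True
```

Each bin `f` can then be handed to an instantaneous ICA routine on its own, which is why the ordering and scaling of the outputs can differ from bin to bin.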

16 Permutation Problem in FDICA
Permutation and Gain Indeterminacy
・ ICA is conducted in each subband independently.
・ Ordering and scaling of outputs are arbitrary in ICA.
[Figure: FDICA runs a separate ICA at 1 kHz and at 2 kHz on sources S1 and S2; one bin outputs S1×0.5 and S2×1, the other outputs S2×0.4 and S1×1.1.]

17 Solutions for Permutation Problem
・ To use correlation among output waveforms (Murata et al., 1998)
・ To use the directivity pattern of W (Kurita and Saruwatari, 2000)
・ To use correlation among unmixing matrices in neighboring frequency bins (Parra et al., 2000; Asano et al., 2001)
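The first idea in the list, aligning bins by the correlation of output waveforms, can be illustrated for two sources. This is a simplified sketch and not the cited authors' algorithm: the helper `align_permutation`, the synthetic envelopes, and the gains are hypothetical, and only the ordering is resolved, not the scaling.

```python
import numpy as np

def align_permutation(env_ref, env_bin):
    """Return the output order for one bin whose envelopes best match a reference bin.

    env_ref, env_bin: arrays of shape (2, n_frames) with amplitude envelopes |Y_i(f, t)|.
    """
    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]
    keep = corr(env_ref[0], env_bin[0]) + corr(env_ref[1], env_bin[1])
    swap = corr(env_ref[0], env_bin[1]) + corr(env_ref[1], env_bin[0])
    return [0, 1] if keep >= swap else [1, 0]

rng = np.random.default_rng(0)
# Hypothetical envelopes of two sources that are active at different times.
e1 = np.abs(rng.standard_normal(200)) * (np.arange(200) < 100)
e2 = np.abs(rng.standard_normal(200)) * (np.arange(200) >= 100)
env_ref = np.stack([e1, e2])
# In another bin the outputs come out swapped and rescaled.
env_bin = np.stack([0.4 * e2, 1.1 * e1])

order = align_permutation(env_ref, env_bin)
print(order)  # [1, 0]
```

Indexing the bin's outputs with `order` restores a consistent source order across bins. Correlation is insensitive to the gain factors, so the scaling ambiguity has to be handled separately.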

18 Problem in ICA-Based BSS
・ Separation performance under reverberant conditions significantly degrades. Why?
– Reverberation in a typical room = 2400-tap FIR filter at 8 kHz sampling
⇒ The number of parameters is too large.
It is necessary to achieve a BSS method that is robust against reverberation.
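The 2400-tap figure follows from simple arithmetic. The sketch below assumes a reverberation time of 300 ms (the value quoted for the experiments later in the deck) and a 2-source, 2-microphone system; both are assumptions for illustration.

```python
sampling_rate_hz = 8000      # from the slide
reverb_time_s = 0.3          # assumption: 300 ms reverberation time

taps_per_filter = round(reverb_time_s * sampling_rate_hz)

# A 2-source, 2-microphone separation system needs one FIR filter per path (assumption).
n_sources, n_mics = 2, 2
total_coefficients = n_sources * n_mics * taps_per_filter

print(taps_per_filter, total_coefficients)  # 2400 9600
```

Several thousand coupled filter coefficients have to be learned blindly, which is what makes convergence difficult.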

19 Conventional Approaches
・ Frequency-Domain ICA (FDICA)
– To estimate the separation coefficients in every frequency bin in the frequency domain
・ Time-Domain ICA (TDICA)
– To estimate the separation FIR filter in the time domain

20 FDICA-Based BSS
<Advantages>
・ To simplify the convolutive mixture down to instantaneous mixtures by the frequency transform
・ Easy to converge the separation filter in iterative ICA learning with high stability

21 Performance of FDICA
・ In conventional dereverberation processing, dereverberation performance improves as the number of subbands (filter length) is increased.
In FDICA, does source-separation performance also improve as the number of subbands is increased?
To investigate the relation between the number of subbands and source-separation performance

22 Experimental Setup
・ Sound source: speech from 2 male and 2 female speakers in the ASJ corpus (12 combinations)
・ Evaluation: Noise Reduction Rate = Output SNR [dB] – Input SNR [dB]
Interelement spacing is 4 cm. Reverberation time is 300 ms.
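The evaluation score can be computed as in the sketch below. It assumes the target and interference components of each signal are available separately, which is only possible in simulation. The function names and the synthetic signals are hypothetical.

```python
import numpy as np

def snr_db(target, interference):
    """SNR in dB from the target and interference components of one signal."""
    return 10 * np.log10(np.sum(target ** 2) / np.sum(interference ** 2))

def noise_reduction_rate(in_target, in_interf, out_target, out_interf):
    """NRR [dB] = output SNR [dB] - input SNR [dB]."""
    return snr_db(out_target, out_interf) - snr_db(in_target, in_interf)

rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
interf = rng.standard_normal(8000)

# Hypothetical separator that leaves the target untouched and attenuates the
# interference amplitude by a factor of 10, i.e. by 20 dB.
nrr = noise_reduction_rate(target, interf, target, 0.1 * interf)
print(round(nrr, 1))  # 20.0
```

Because the score is a difference of SNRs, it measures the improvement the separator adds and does not depend on how strong the interference was at the input.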

23 FDICA Algorithm
・ Iterative learning algorithm (Amari, 1996), where η is the step-size parameter

24 Relation between Number of Subbands and Separation Performance
[Figure annotation: separation performance significantly degrades]
<Disadvantages>
・ When the number of subbands becomes too large, the independence assumption of narrow-band signals collapses (Nishikawa, Araki, et al., 2001).
・ Separation performance in FDICA is saturated before reaching a sufficient performance.

25 Narrow-Band Signals
[Figure: real part of the 1 kHz narrow-band signals, Signal 1 and Signal 2, for 32 subbands and for 2048 subbands]

26 Relation between Number of Subbands and Correlation
[Figure annotation: higher correlation]
If we increase the number of subbands too much, the correlation between narrow-band signals increases.

27 Trade-off Relation between Independence and Robustness against Reverberation
[Figure axes: number of subbands vs. noise reduction rate]
・ Large number of subbands: low independence, robust against reverberation
・ Small number of subbands: high independence, poor against reverberation

28 TDICA-Based BSS
・ To treat the fullband signals where the independence assumption of sources usually holds
・ High-convergence possibility near the optimal point

29 Performance of TDICA
・ In conventional dereverberation processing, dereverberation performance improves as the filter length is increased.
In TDICA, does source-separation performance also improve as the filter length is increased?
To investigate the relation between the filter length and source-separation performance

30 TDICA Algorithm
・ Cost function to be minimized

31 TDICA Algorithm (Cont’d)
・ Iterative equation of separation filter
・ Natural gradient of Q(w(z)), where α is the step-size parameter

32 Proposed Iterative Equation of TDICA
・ Iterative equation of separation filter (TDICA1)
– Evaluates only the correlation at the same time instant
・ Iterative equation of separation filter (TDICA2)
– Expanded to evaluate the correlation between different time instants
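The difference between evaluating correlation at the same time only and at different times can be illustrated with a lagged correlation matrix. The slide's equations are not present in this transcript, so this is an assumed form, not the authors' update rule: `R[tau] = <phi(y(t)) y(t - tau)^T>` with `tanh` as the nonlinearity, on synthetic signals with a hypothetical delayed crosstalk path.

```python
import numpy as np

def lagged_correlation(y, max_lag):
    """Return R[tau] = <phi(y(t)) y(t - tau)^T> for tau = 0 .. max_lag - 1.

    y: separated signals, shape (channels, samples). phi is tanh here (assumption).
    """
    n_ch, n = y.shape
    phi = np.tanh(y)
    R = np.zeros((max_lag, n_ch, n_ch))
    for tau in range(max_lag):
        # Pair phi(y(t)) with y(t - tau) over the samples where both exist.
        R[tau] = phi[:, tau:] @ y[:, :n - tau].T / (n - tau)
    return R

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))
# Hypothetical residual crosstalk: channel 1 contains source 0 delayed by 3 samples.
y = s.copy()
y[1, 3:] += 0.5 * s[0, :-3]

R = lagged_correlation(y, max_lag=5)
print(np.round(R[:, 1, 0], 2))   # cross term <phi(y_1(t)) y_0(t - tau)> for each lag
```

A rule that looks only at `R[0]` would see almost no cross term in this example, while the delayed crosstalk shows up clearly at lag 3. This is the kind of dependence that a rule evaluating different time lags can act on.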

33 Results of TDICA1 and TDICA2
[Figure panels: TDICA1 (only correlation of same time), TDICA2 (correlation of different time)]
・ It is necessary to evaluate correlation of different times to achieve a superior performance.
・ Source separation using a long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.

34 Problems and Solution
FDICA (first stage)
・ Advantages
– To simplify the convolutive mixture down to instantaneous mixtures
– Easy to converge the separation filter with high stability
・ Disadvantages
– Independence assumption collapses in each narrow band.
– Separation performance is saturated before reaching a sufficient performance.
TDICA (second stage)
・ Advantages
– To treat the fullband speech signals where the independence assumption of sources usually holds
– High-convergence possibility near the optimal point
・ Disadvantages
– Iterative rule for filter learning is complicated.
– Convergence degrades under reverberant conditions.
The two methods complement each other. To use the advantages of FDICA and TDICA together: Multistage ICA (MSICA) combining FDICA and TDICA

35 Separation Procedure of MSICA
[Figure: mixing system → FDICA → TDICA]
・ Separated signals of FDICA are regarded as the input signals for TDICA.
・ Residual cross-talk components of FDICA can be removed by TDICA.
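The two-stage structure amounts to feeding the output of the first stage into the second. The sketch below shows only that wiring. The function `msica` and both stage functions are hypothetical placeholders: fixed 2x2 matrices stand in for the real FDICA and TDICA stages, which would be learned and convolutive.

```python
import numpy as np

def msica(x, fdica_stage, tdica_stage):
    """Run the two stages in sequence: FDICA first, then TDICA on its output."""
    y_fd = fdica_stage(x)       # first stage: coarse separation
    y_td = tdica_stage(y_fd)    # second stage: remove residual crosstalk
    return y_td

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))
A = np.array([[1.0, 0.6], [0.5, 1.0]])     # hypothetical mixing matrix
x = A @ s

# Placeholder first stage: an imperfect inverse that leaves some crosstalk.
W1 = np.linalg.inv(A) + np.array([[0.0, 0.1], [0.1, 0.0]])
# Placeholder second stage: the ideal correction for the residual mixture W1 A.
W2 = np.linalg.inv(W1 @ A)

y = msica(x, lambda v: W1 @ v, lambda v: W2 @ v)
print(np.allclose(W1 @ x, s), np.allclose(y, s))  # False True
```

In this toy setting the second stage starts from a mixture that is already close to separated, which is the situation in which the deck describes TDICA as still useful.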

36 Comparison among Each ICA
・ To investigate the relation between the filter length and source-separation performance of the TDICA part in MSICA
・ Comparison among separation performances of TDICA, FDICA, and MSICA

37 Relation between Filter Length and Separation Performance
[Figure: separation performance versus filter length; annotated value 9.4]
・ Separation performance of MSICA is improved even with the long filter.
・ TDICA is still useful near the optimal point.

38 Comparison among Each ICA
・ To investigate the relation between the filter length and source-separation performance of the TDICA part in MSICA
・ Comparison among separation performances of TDICA, FDICA, and MSICA

39 Comparison Results
Average: TDICA: 5.9 dB, FDICA: 9.4 dB, MSICA: 12.1 dB
・ Separation performance of MSICA is superior to those of TDICA and FDICA.
・ Combination of FDICA and TDICA is inherently effective for improving the separation performance.

40 Spectral Distortion (TDICA) 10 taps: No whitening, 1000 taps: whitening

41 Spectral Distortion (FDICA, MSICA) FDICA: No whitening, MSICA: No whitening

42 Sound Demonstration of MSICA
・ Reverberation time: 300 ms
・ Source directions: Female 40°, Male -30°
・ Mixed speech (Female+Male)
・ Separated speech (Female)
・ Separated speech (Male)

43 Conclusion
・ Disadvantages of FDICA and TDICA
– Separation performance of FDICA is saturated before reaching a sufficient performance.
– Source separation using a long filter fails in TDICA because the iterative rule for FIR-filter learning is complicated.
・ MSICA combining FDICA and TDICA
– In the TDICA part in MSICA, separation performance is improved even with the long filter.
– Separation performance of MSICA is superior to those of TDICA and FDICA.

44 Future Work
・ Further evaluation in real environments
– Robustness under reverberant conditions
– Larger array with more than 2 elements
– To apply BSS to a speech recognition system
・ Improvement of convergence speed
– On-line and real-time algorithm

45 ICA and BSS: Where do we go? We should go to NARA-city! Why?

46 4th International Symposium on ICA and BSS (ICA2003), in NARA
・ Date: April 1-4, 2003
・ Place: Nara, JAPAN
・ Scientific Areas:
– ICA and Factor Analysis, PCA etc.
– Blind source separation
– Blind and semi-blind equalization and deconvolution
– Blind identification
– Any signal processing application related with ICA
URL: http://ica2003.jp/

47 Analysis Conditions
TDICA, TDICA part in MSICA
– Filter length: 10 ~ 2000 [taps]
– Number of blocks B (block length): 1 ~ 10 (data length/B)
– ICA iterations: 500
FDICA part in MSICA
– Number of subbands: 1024 points
– Frame shift: 16 points
– ICA iterations: 30

48 Analysis Conditions
TDICA, TDICA part in MSICA
– Filter length: 10 (TDICA), 1000 (MSICA)
– Number of blocks B: 3 (TDICA), 9 (MSICA)
– ICA iterations: 500
FDICA, FDICA part in MSICA
– Number of subbands: 1024 points
– Frame shift: 16 points
– ICA iterations: 30

