Implicit Speaker Separation DaimlerChrysler Research and Technology.

1 Implicit Speaker Separation DaimlerChrysler Research and Technology

2 Problem Context
Diagram: driver and codriver; speaker separation; speech recognition; "text".

3 Algorithm Architecture
Diagram: driver and codriver; spatial filter; subtraction (+ / -); minimum power criterion.
Adaptation during driver silences.

4 Reminder on Least-Mean Square (LMS)
Diagram: $x_1$ (signal reference) and $x_2$ (noise reference); $x_2$ is filtered by $w_2$ and added to $x_1$:
$y_1 = x_1 + x_2 * w_2$
The filter $w_2$ is adapted with the normalized Least-Mean Square (NLMS) algorithm:
$w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{x_2}$
Converges if the target is not active: speaker activity detection is required.
Convergence is assured (in the mean) if $0 < \mu < 2$.
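The NLMS update on this slide can be tried on toy data. The following is a minimal sketch, not the presenters' implementation: the function name `nlms_cancel`, the tap count, the leakage path `h` and the regularizer `eps` are all hypothetical, and the energy of the current tap buffer is assumed to stand in for $\sigma^2_{x_2}$. The target is kept silent, which is the case in which the slide says NLMS converges.

```python
import random

def nlms_cancel(x1, x2, num_taps=4, mu=0.5, eps=1e-8):
    """Return (y1, w2) for y1(t) = x1(t) + sum_k w2[k] * x2(t-k)."""
    w2 = [0.0] * num_taps
    buf = [0.0] * num_taps              # buf[k] holds x2(t-k)
    y1 = []
    for s1, s2 in zip(x1, x2):
        buf = [s2] + buf[:-1]           # shift the newest noise sample in
        out = s1 + sum(w * b for w, b in zip(w2, buf))
        # Assumption: tap-buffer energy plays the role of sigma^2_x2.
        energy = sum(b * b for b in buf) + eps
        # w2(k) <- w2(k) - mu * y1(t) * x2(t-k) / sigma^2_x2
        w2 = [w - mu * out * b / energy for w, b in zip(w2, buf)]
        y1.append(out)
    return y1, w2

# Toy data (hypothetical): target silent, interferer leaks into x1 through h.
random.seed(0)
h = [0.6, -0.3, 0.1]
x2 = [random.gauss(0.0, 1.0) for _ in range(4000)]
x1 = [sum(h[k] * x2[t - k] for k in range(len(h)) if t - k >= 0)
      for t in range(len(x2))]
y1, w2 = nlms_cancel(x1, x2)
```

Because the output is defined as a sum, the filter should settle near the negative of the leakage path, so that the interferer cancels in `y1`.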

5 Reminder on Least-Mean Square (LMS)
Diagram: $x_1$ (signal reference), $x_2$ (noise reference), filter $w_2$, sum.
$y_1 = x_1 + x_2 * w_2$
NLMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{x_2}$
Adapts slower when the interferer is loud (for stability).

6 From LMS to "Implicit" LMS
Diagram: $x_1$ (signal reference), $x_2$ (noise reference), filter $w_2$, sum.
$y_1 = x_1 + x_2 * w_2$
NLMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{x_2}$
Adapts slower when the interferer is loud (for stability).
Implicit LMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{y_1}$
Adapts slower when the target is loud: less target cancellation.
Adapts faster when the output (= target) is weak. Adapts... maybe too fast.
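The effect of normalizing by the output power can be seen with a single update step. This is a sketch under assumptions: `ilms_update` and `smooth_power` are hypothetical helper names, and exponential smoothing is only one possible way to estimate $\sigma^2_{y_1}$, which the slides do not specify.

```python
def ilms_update(w2, x2_buf, y1_t, power_y1, mu=0.1, eps=1e-8):
    """One ILMS step: w2(k) <- w2(k) - mu * y1(t) * x2(t-k) / sigma^2_y1."""
    return [w - mu * y1_t * x / (power_y1 + eps) for w, x in zip(w2, x2_buf)]

def smooth_power(prev_power, sample, forget=0.99):
    """Recursive power estimate (exponential smoothing is an assumption)."""
    return forget * prev_power + (1.0 - forget) * sample * sample

w2 = [0.0, 0.0]
x2_buf = [1.0, -0.5]     # x2(t), x2(t-1)
y1_t = 0.2               # same instantaneous output in both cases

# Target active: large output power, so the step is small.
loud = ilms_update(w2, x2_buf, y1_t, power_y1=4.0)
# Output weak: small output power, so the step is large.
weak = ilms_update(w2, x2_buf, y1_t, power_y1=0.04)
```

With these numbers the step taken for the weak output is about 100 times larger than for the loud one, which illustrates both the reduced target cancellation and the risk of adapting too fast.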

7 "Implicit" LMS stability condition
Diagram: $x_1$ (signal reference), $x_2$ (noise reference), filter $w_2$, sum.
$y_1 = x_1 + x_2 * w_2$
NLMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{x_2}$
ILMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{y_1}$
ILMS = NLMS with time-varying step size $\mu_k = \mu \, \sigma^2_{x_2} / \sigma^2_{y_1}$
ILMS stability condition: $0 < \mu_k < 2$, that is, $0 < \mu \, \sigma^2_{x_2} / \sigma^2_{y_1} < 2$
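The equivalence stated on this slide follows from multiplying and dividing the ILMS update by $\sigma^2_{x_2}$. A short worked version, assuming $\mu > 0$ and strictly positive power estimates:

```latex
\[
w_2^{(n+1)}(k)
  = w_2^{(n)}(k) - \mu \,\frac{y_1(t)\,x_2(t-k)}{\sigma^2_{y_1}}
  = w_2^{(n)}(k)
    - \underbrace{\mu \,\frac{\sigma^2_{x_2}}{\sigma^2_{y_1}}}_{\mu_k}
      \,\frac{y_1(t)\,x_2(t-k)}{\sigma^2_{x_2}}
\]
\[
0 < \mu_k < 2
  \quad\Longleftrightarrow\quad
  \sigma^2_{y_1} > \frac{\mu}{2}\,\sigma^2_{x_2}
\]
```

Read this way, the condition fails only when the output power drops below a fixed fraction of the noise-reference power.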

8 "Implicit" LMS stability condition
Diagram: $x_1$ (signal reference), $x_2$ (noise reference), filter $w_2$, sum.
$y_1 = x_1 + x_2 * w_2$
ILMS stability condition: $0 < \mu \, \sigma^2_{x_2} / \sigma^2_{y_1} < 2$. Fulfilled?
If yes, then ILMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{y_1}$
If not, then NLMS with step size $\mu$: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{x_2}$

9 "Implicit" LMS stability condition
Diagram: $x_1$ (signal reference), $x_2$ (noise reference), filter $w_2$, sum.
$y_1 = x_1 + x_2 * w_2$
ILMS stability condition: $0 < \mu \, \sigma^2_{x_2} / \sigma^2_{y_1} < 2$. Fulfilled?
If yes, then ILMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{y_1}$
If not, then NLMS with step size $\mu$: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{x_2}$
When does it happen?
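The yes/no branch on this slide can be sketched as a single guarded update. The function name `guarded_update`, the regularizer `eps`, and the choice to reuse the same `mu` in the NLMS fallback are assumptions made for illustration; the power estimates are taken as given inputs.

```python
def guarded_update(w2, x2_buf, y1_t, power_x2, power_y1, mu=0.1, eps=1e-8):
    """Use ILMS when its equivalent step size is stable, otherwise NLMS."""
    mu_k = mu * power_x2 / (power_y1 + eps)    # equivalent NLMS step size
    if 0.0 < mu_k < 2.0:
        norm, rule = power_y1 + eps, "ILMS"    # normalize by output power
    else:
        norm, rule = power_x2 + eps, "NLMS"    # fall back to noise-reference power
    new_w2 = [w - mu * y1_t * x / norm for w, x in zip(w2, x2_buf)]
    return new_w2, rule

w2 = [0.0, 0.0]
x2_buf = [1.0, -0.5]

# Output power comparable to the reference: mu_k is about 0.1, ILMS is used.
w_a, rule_a = guarded_update(w2, x2_buf, 0.2, power_x2=1.0, power_y1=1.0)
# Output power very small: mu_k is about 10, so the update falls back to NLMS.
w_b, rule_b = guarded_update(w2, x2_buf, 0.2, power_x2=1.0, power_y1=0.01)
```

In the second call the fallback keeps the step bounded by the reference power instead of dividing by a very small output power.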

10 "Implicit" LMS stability condition
We describe the system with:
mismatch = "how far is $w_2$ from optimum"
leakage = "how much driver speech is received in the codriver microphone"
The ILMS stability condition $0 < \mu \, \sigma^2_{x_2} / \sigma^2_{y_1} < 2$ is not fulfilled if and only if conditions (i) and (ii) hold:
(i) means: "we are close to optimum"
(ii) means: "the driver is weak with respect to the codriver"
=> NLMS normalization is convenient.

11 From ILMS to BSS (Blind Source Separation)
Diagram: inputs $x_1$, $x_2$; filters $w_1$, $w_2$; two sums; outputs $y_1$, $y_2$; dependence measure.
$w_1$ and $w_2$ are jointly optimized such that the outputs are independent.
Replace the noise reference $x_2$ with the best available reference $y_2$.
No adaptation control needed (blind).
High complexity with respect to NLMS or ILMS.
ILMS (reminder): $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma^2_{y_1}$
BSS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, y_2(t-k) / \sigma^2_{y_2}$
$w_1^{(n+1)}(k) = w_1^{(n)}(k) - \mu \, y_2(t) \, y_1(t-k) / \sigma^2_{y_1}$
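The two symmetric update equations on this slide can be written as one step function. This sketch covers only the coefficient update as transcribed, not a complete separation algorithm: how the outputs and their power estimates are computed is left open, and the name `bss_step` and all numeric values are hypothetical.

```python
def bss_step(w1, w2, y1_buf, y2_buf, power_y1, power_y2, mu=0.1, eps=1e-8):
    """One symmetric update; y1_buf[k] holds y1(t-k), y2_buf[k] holds y2(t-k)."""
    y1_t, y2_t = y1_buf[0], y2_buf[0]
    # w2(k) <- w2(k) - mu * y1(t) * y2(t-k) / sigma^2_y2
    new_w2 = [w - mu * y1_t * y2 / (power_y2 + eps) for w, y2 in zip(w2, y2_buf)]
    # w1(k) <- w1(k) - mu * y2(t) * y1(t-k) / sigma^2_y1
    new_w1 = [w - mu * y2_t * y1 / (power_y1 + eps) for w, y1 in zip(w1, y1_buf)]
    return new_w1, new_w2

w1, w2 = bss_step(
    w1=[0.0, 0.0], w2=[0.0, 0.0],
    y1_buf=[0.5, 0.1], y2_buf=[-0.2, 0.4],
    power_y1=1.0, power_y2=1.0,
)
```

Each filter is driven by the cross-products of the two outputs, so the updates stop on average once the outputs are uncorrelated at the lags covered by the filters.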

12 How does it sound?
Microphone signals
Blockwise adaptation: unsupervised NLMS, supervised NLMS, implicit LMS (ILMS), BSS
Samplewise adaptation: unsupervised NLMS, supervised NLMS, implicit LMS (ILMS)

13 Conclusion
NLMS: converges fastest (target silent) and diverges fastest (double talk); 15 dB SIR improvement with perfect double-talk detection.
ILMS: very robust, no explicit speaker detection; 10-12 dB SIR improvement; low complexity.
BSS: robust and converges fast; 15 dB SIR improvement; high complexity.
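To put the quoted figures in perspective, the sketch below computes an SIR improvement from target and interferer powers. The slides do not define the measure, so the usual definition (power ratio in dB, output minus input) is assumed here; the helper names and the example powers are hypothetical.

```python
import math

def sir_db(target_power, interferer_power):
    """Signal-to-interference ratio in dB (assumed definition)."""
    return 10.0 * math.log10(target_power / interferer_power)

def sir_improvement_db(t_in, i_in, t_out, i_out):
    """SIR at the output minus SIR at the input, in dB."""
    return sir_db(t_out, i_out) - sir_db(t_in, i_in)

# Target untouched, interferer power reduced by a factor of 10**1.5 (about 31.6).
gain = sir_improvement_db(1.0, 1.0, 1.0, 10 ** -1.5)
```

Under this definition, a 15 dB improvement corresponds to the interferer power dropping by a factor of roughly 32 relative to the target.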

14 SIR Improvement

15 Microphone power ratio
Panels: clean signals; SNR at $x_1$ = 15 dB.

