Published by Melina Hayne. Modified about 1 year ago.

1
Implicit Speaker Separation
DaimlerChrysler Research and Technology

2
Problem Context
[Diagram: driver and codriver; speaker separation; speech recognition; output "text"]

3
Algorithm Architecture
[Diagram: driver and codriver; spatial filter; summing node (+, -); minimum-power adaptation during driver silences]

4
Reminder on Least-Mean Square (LMS)
[Diagram: $x_1$ (signal ref) summed with $x_2$ (noise ref) filtered by $w_2$; output $y_1 = x_1 + x_2 * w_2$]
The filter $w_2$ is adapted with the normalized Least-Mean Square (NLMS) algorithm:
$w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{x_2}^2$
Converges if the target is not active: speaker activity detection required.
Convergence is assured (in the mean) if $0 < \mu < 2$.

5
Reminder on Least-Mean Square (LMS)
[Diagram: $x_1$ (signal ref) summed with $x_2$ (noise ref) filtered by $w_2$; output $y_1 = x_1 + x_2 * w_2$]
NLMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{x_2}^2$
Adapts slower when the interferer is loud (for stability).
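The NLMS recursion on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `nlms_cancel`, the tap count, the step size `mu`, the regulariser `eps`, and the toy leakage coefficients are all assumptions, and $\sigma_{x_2}^2$ is taken here as the energy of the current tap window so that a step size between 0 and 2 stays stable.

```python
import random

def nlms_cancel(x1, x2, num_taps=4, mu=0.5, eps=1e-8):
    """Sample-wise NLMS canceller: y1(t) = x1(t) + sum_k w2(k) * x2(t-k)."""
    w2 = [0.0] * num_taps
    buf = [0.0] * num_taps            # buf[k] holds x2(t-k)
    y1 = []
    for s1, s2 in zip(x1, x2):
        buf = [s2] + buf[:-1]         # shift in the newest noise-reference sample
        out = s1 + sum(w * b for w, b in zip(w2, buf))
        # Assumption: sigma_x2^2 is estimated as the energy of the tap window.
        p_x2 = sum(b * b for b in buf) + eps
        w2 = [w - mu * out * b / p_x2 for w, b in zip(w2, buf)]
        y1.append(out)
    return y1, w2

# Toy check with a silent target: x1 contains only interferer leakage.
random.seed(0)
x2 = [random.gauss(0.0, 1.0) for _ in range(2000)]
x1 = [0.8 * x2[t] - 0.3 * (x2[t - 1] if t > 0 else 0.0) for t in range(2000)]
y1, w2 = nlms_cancel(x1, x2)
# w2 should approach [-0.8, 0.3, 0, 0], which cancels the leakage in y1.
```

The toy signals keep the target silent, which is the case in which the slide says the filter converges; with an active target the same loop would start cancelling the target as well.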

6
From LMS to “Implicit” LMS
[Diagram: $x_1$ (signal ref) summed with $x_2$ (noise ref) filtered by $w_2$; output $y_1 = x_1 + x_2 * w_2$]
NLMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{x_2}^2$
Adapts slower when the interferer is loud (for stability).
Implicit LMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{y_1}^2$
Adapts slower when the target is loud: less target cancellation.
Adapts faster when the output (= target) is weak.
Adapts... maybe too fast.

7
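“Implicit” LMS stability condition
[Diagram: $x_1$ (signal ref) summed with $x_2$ (noise ref) filtered by $w_2$; output $y_1 = x_1 + x_2 * w_2$]
NLMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{x_2}^2$
ILMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{y_1}^2$
ILMS = NLMS with time-varying step size $\mu_k = \mu \, \sigma_{x_2}^2 / \sigma_{y_1}^2$
ILMS stability condition: $0 < \mu_k < 2 \iff 0 < \mu \, \sigma_{x_2}^2 / \sigma_{y_1}^2 < 2$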
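The equivalence stated on this slide follows in one step. As a sketch, write $\mu$ for the NLMS step size (the symbol is an assumption, since the slide text does not preserve it) and $\mu_k$ for the time-varying step size, then multiply and divide the ILMS correction term by $\sigma_{x_2}^2$:

```latex
\begin{aligned}
w_2^{(n+1)}(k)
  &= w_2^{(n)}(k) - \mu \, \frac{y_1(t)\, x_2(t-k)}{\sigma_{y_1}^2} \\
  &= w_2^{(n)}(k)
     - \underbrace{\mu \, \frac{\sigma_{x_2}^2}{\sigma_{y_1}^2}}_{\mu_k}
       \, \frac{y_1(t)\, x_2(t-k)}{\sigma_{x_2}^2}
\end{aligned}
```

The second line is an NLMS update with step size $\mu_k$, so the NLMS bound $0 < \mu_k < 2$ becomes a bound on the power ratio $\sigma_{x_2}^2 / \sigma_{y_1}^2$.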

8
“Implicit” LMS stability condition
[Diagram: $x_1$ (signal ref) summed with $x_2$ (noise ref) filtered by $w_2$; output $y_1 = x_1 + x_2 * w_2$]
ILMS stability condition: $0 < \mu \, \sigma_{x_2}^2 / \sigma_{y_1}^2 < 2$. Fulfilled?
If yes, then ILMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{y_1}^2$
If not, then NLMS with step size $\mu$: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{x_2}^2$

9
“Implicit” LMS stability condition
[Diagram: $x_1$ (signal ref) summed with $x_2$ (noise ref) filtered by $w_2$; output $y_1 = x_1 + x_2 * w_2$]
ILMS stability condition: $0 < \mu \, \sigma_{x_2}^2 / \sigma_{y_1}^2 < 2$. Fulfilled?
If yes, then ILMS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{y_1}^2$
If not, then NLMS with step size $\mu$: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{x_2}^2$
When does it happen?
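The decision rule on this slide (use the implicit normalisation while the stability condition holds, otherwise fall back to NLMS) can be sketched as follows. This is an illustrative sketch only: the function name `ilms_cancel`, the step size `mu`, the regulariser `eps`, the toy signals, and in particular the way both powers are estimated (energy over the last `num_taps` samples) are assumptions that the slides do not specify.

```python
import random
from collections import deque

def ilms_cancel(x1, x2, num_taps=4, mu=0.5, eps=1e-8):
    """Implicit LMS with NLMS fallback; returns outputs, filter, fallback count."""
    w2 = [0.0] * num_taps
    buf = [0.0] * num_taps                        # buf[k] holds x2(t-k)
    y_hist = deque([0.0] * num_taps, maxlen=num_taps)
    y1, fallbacks = [], 0
    for s1, s2 in zip(x1, x2):
        buf = [s2] + buf[:-1]
        out = s1 + sum(w * b for w, b in zip(w2, buf))
        y_hist.append(out)
        # Assumption: both powers are window energies over num_taps samples.
        p_x2 = sum(b * b for b in buf) + eps
        p_y1 = sum(v * v for v in y_hist) + eps
        mu_k = mu * p_x2 / p_y1                   # equivalent NLMS step size
        if 0.0 < mu_k < 2.0:
            norm = p_y1                           # ILMS: normalise by output power
        else:
            norm = p_x2                           # fallback: plain NLMS
            fallbacks += 1
        w2 = [w - mu * out * b / norm for w, b in zip(w2, buf)]
        y1.append(out)
    return y1, w2, fallbacks

# Toy check with a silent target: x1 contains only interferer leakage.
random.seed(1)
x2 = [random.gauss(0.0, 1.0) for _ in range(3000)]
x1 = [0.8 * x2[t] - 0.3 * (x2[t - 1] if t > 0 else 0.0) for t in range(3000)]
y1, w2, fallbacks = ilms_cancel(x1, x2)
```

In this toy run the fallback branch takes over once the output power has dropped, that is, once the filter is close to its optimum and no target is present.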

10
“Implicit” LMS stability condition
We describe the system with:
mismatch = “how far is $w_2$ from optimum”
leakage = “how much driver speech is received in codriver microphone”
ILMS stability condition: $0 < \mu \, \sigma_{x_2}^2 / \sigma_{y_1}^2 < 2$ is not fulfilled if and only if [conditions (i) and (ii); formulas not preserved]
(i) means: “we are close to optimum”
(ii) means: “the driver is weak with respect to codriver”
=> NLMS normalization is convenient.

11
From ILMS to BSS (Blind Source Separation)
[Diagram: inputs $x_1$, $x_2$; filters $w_1$, $w_2$; two summing nodes; outputs $y_1$, $y_2$; dependence measure]
$w_1$ and $w_2$ are jointly optimized such that the outputs are independent.
Replace the noise reference $x_2$ with the best available reference $y_2$.
No adaptation control needed (blind).
High complexity w.r.t. NLMS or ILMS.
ILMS (reminder): $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, x_2(t-k) / \sigma_{y_1}^2$
BSS: $w_2^{(n+1)}(k) = w_2^{(n)}(k) - \mu \, y_1(t) \, y_2(t-k) / \sigma_{y_2}^2$
BSS: $w_1^{(n+1)}(k) = w_1^{(n)}(k) - \mu \, y_2(t) \, y_1(t-k) / \sigma_{y_1}^2$
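A single joint update of the two BSS filters, in the form written on this slide, might look like the sketch below. It covers only the coefficient update, not the full separation structure or the dependence measure; the function name `bss_update`, the step size `mu`, the regulariser `eps`, and the window-energy power estimates are assumptions made for illustration.

```python
def bss_update(w1, w2, y1_buf, y2_buf, mu=0.1, eps=1e-8):
    """One joint update; y1_buf[k] = y1(t-k) and y2_buf[k] = y2(t-k)."""
    # Assumption: output powers are estimated as energies of the buffers.
    p_y1 = sum(v * v for v in y1_buf) + eps
    p_y2 = sum(v * v for v in y2_buf) + eps
    # w2 uses the other output y2 as its reference instead of x2.
    new_w2 = [w - mu * y1_buf[0] * y2_buf[k] / p_y2 for k, w in enumerate(w2)]
    # w1 is updated symmetrically with y1 as the reference.
    new_w1 = [w - mu * y2_buf[0] * y1_buf[k] / p_y1 for k, w in enumerate(w1)]
    return new_w1, new_w2

# Hypothetical buffers: current outputs y1(t) = 1, y2(t) = 2, silent history.
new_w1, new_w2 = bss_update([0.0, 0.0], [0.0, 0.0],
                            [1.0, 0.0], [2.0, 0.0], mu=0.5, eps=0.0)
```

Both filters move whenever the current outputs are correlated, and neither update needs a speaker activity decision, which is the sense in which the slide calls the scheme blind.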

12
How does it sound?
Microphone signals
Blockwise adaptation: Unsupervised NLMS, Supervised NLMS, Implicit LMS (ILMS), BSS
Samplewise adaptation: Unsupervised NLMS, Supervised NLMS, Implicit LMS (ILMS)

13
Conclusion
NLMS: converges fastest (target silent) and... diverges fastest (double talk); 15 dB SIR improvement with perfect double-talk detection.
ILMS: very robust, no explicit speaker detection; 10-12 dB SIR improvement; low complexity.
BSS: robust and converges fast; 15 dB SIR improvement; high complexity.

14
SIR Improvement
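SIR (signal-to-interference ratio) improvement is commonly computed as the output SIR minus the input SIR, both in dB. The sketch below assumes the target and interferer components are available separately before and after processing, as they would be in a simulation; the helper names and the toy signals are hypothetical and are not taken from the slides.

```python
import math

def power(signal):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in signal) / len(signal)

def sir_db(target, interferer):
    """Signal-to-interference ratio in dB."""
    return 10.0 * math.log10(power(target) / power(interferer))

def sir_improvement_db(target_in, interf_in, target_out, interf_out):
    """Output SIR minus input SIR, in dB."""
    return sir_db(target_out, interf_out) - sir_db(target_in, interf_in)

# Toy example: the target is untouched, the interferer amplitude drops by 10x.
target = [1.0, -1.0, 1.0, -1.0]
interf_in = [1.0, 1.0, -1.0, -1.0]
interf_out = [0.1 * v for v in interf_in]
gain = sir_improvement_db(target, interf_in, target, interf_out)
```

A tenfold amplitude reduction of the interferer with an unchanged target corresponds to a 20 dB improvement in this toy case.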

15
Microphone power ratio
[Figure panels: clean signals; SNR at $x_1$ = 15 dB]
