Study on Frequency Domain Primary-Ambient Extraction (PAE)
HE Jianjun, PhD Candidate, DSP Lab, School of EEE, Nanyang Technological University, Singapore


1 Study on Frequency Domain Primary-Ambient Extraction (PAE). HE Jianjun, PhD Candidate, DSP Lab, School of EEE, Nanyang Technological University, Singapore. Email: JHE007@e.ntu.edu.sg

2 Introduction – PAE-based Spatial Audio System

3 Stereo Signal Model
Signal = Primary + Ambient
Assumptions:
Primary components are highly correlated.
Ambient components are uncorrelated.
Primary and ambient components are uncorrelated.
Ambient power is balanced.
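The model and its assumptions can be illustrated with a small synthetic mixer. This is a minimal sketch, assuming the two-channel form x0 = p0 + a0, x1 = k*p0 + a1 and reading the primary power ratio as primary power over total power; the slide gives neither formula. The names `make_stereo` and `ppr` are hypothetical.

```python
import numpy as np

def make_stereo(primary, k, ppr, rng):
    """Mix a mono primary signal with synthetic ambience.

    Assumed model (not spelled out on the slide):
    x0 = p0 + a0, x1 = k * p0 + a1, where a0 and a1 are independent white
    Gaussian noise of equal power and ppr is the fraction of the total power
    carried by the primary components.
    """
    p0 = np.asarray(primary, dtype=float)
    p1 = k * p0
    primary_power = np.mean(p0 ** 2) + np.mean(p1 ** 2)
    # From ppr = P_primary / (P_primary + P_ambient).
    ambient_power = primary_power * (1.0 - ppr) / ppr
    sigma = np.sqrt(ambient_power / 2.0)  # balanced: half per channel
    a0 = sigma * rng.standard_normal(p0.shape)
    a1 = sigma * rng.standard_normal(p0.shape)
    return p0 + a0, p1 + a1, (p0, p1), (a0, a1)

rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 0.01 * np.arange(48000))
x0, x1, (p0, p1), (a0, a1) = make_stereo(tone, k=3.0, ppr=0.9, rng=rng)

# Check the assumptions on the generated data.
measured_ppr = (np.mean(p0 ** 2) + np.mean(p1 ** 2)) / (np.mean(x0 ** 2) + np.mean(x1 ** 2))
ambient_corr = np.corrcoef(a0, a1)[0, 1]
```

Here `measured_ppr` should land close to the requested 0.9 and `ambient_corr` close to zero, which are the "ambient uncorrelated" and "ambient power balanced" assumptions in numeric form.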

4 Stereo Signal Model
[Figure: axis of the primary panning factor k, labelled Left, Center (k = 1) and Right, with tick values 1/10, 1 and 10.]

5 PAE in full band, time domain
Compute parameters: k, γ
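One way to compute k and γ in the time domain is the PCA-based approach tested later in these slides. The sketch below is an illustration only: it assumes the model x0 = p0 + a0, x1 = k*p0 + a1 with uncorrelated, equal-power ambience, and reads γ as the primary power ratio. The closed forms follow from that assumed model, not from the slide, and `pca_pae` is a hypothetical name.

```python
import numpy as np

def pca_pae(x0, x1):
    """Full-band, time-domain PCA-style estimate of k, gamma and the primary parts.

    Under the assumed model: r00 = Pp + Pa, r11 = k^2 * Pp + Pa, r01 = k * Pp,
    so (r11 - r00) / r01 = k - 1/k, which gives the quadratic solved below.
    Requires r01 > 0 (positively correlated channels).
    """
    r00 = np.dot(x0, x0)
    r11 = np.dot(x1, x1)
    r01 = np.dot(x0, x1)
    c = (r11 - r00) / (2.0 * r01)
    k = c + np.sqrt(c * c + 1.0)
    # Primary power ratio: (1 + k^2) * Pp / (r00 + r11), with Pp = r01 / k.
    gamma = (1.0 + k * k) * r01 / (k * (r00 + r11))
    # Project the stereo pair onto the principal direction [1, k].
    p0_hat = (x0 + k * x1) / (1.0 + k * k)
    return k, gamma, p0_hat, k * p0_hat

rng = np.random.default_rng(1)
n = 50000
p0 = rng.standard_normal(n)
sigma = np.sqrt(10.0 * (0.1 / 0.9) / 2.0)  # ambient std for k = 3, ratio 0.9
x0 = p0 + sigma * rng.standard_normal(n)
x1 = 3.0 * p0 + sigma * rng.standard_normal(n)
k_hat, gamma_hat, p0_hat, p1_hat = pca_pae(x0, x1)
```

With enough samples, `k_hat` and `gamma_hat` should come out near 3 and 0.9, the values used to build the test signal.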

6 PAE in full band, frequency domain
Block diagram labels: FFT; compute parameters k, γ

7 PAE in subband, frequency domain
[Figure: L and R spectra X(f) over frequency f, divided into bands (1) to (8), each band with its own k.]
k represents the panning of the source.
Assumption: in each band, only one source is dominant. The overlap among the spectra of different sources should be minimal.
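The per-band idea can be sketched by estimating one panning factor k per band from the FFT bins of a frame. This is a hedged illustration: it reuses a PCA-style closed form for k (an assumption, since the slide does not give the estimator), uses uniform bands, and `subband_panning` is a hypothetical name.

```python
import numpy as np

def subband_panning(x0, x1, n_bands):
    """Estimate one panning factor k per uniform band of a single frame."""
    X0, X1 = np.fft.rfft(x0), np.fft.rfft(x1)
    ks = []
    for idx in np.array_split(np.arange(len(X0)), n_bands):
        # Band-limited auto- and cross-correlations (up to a common scale).
        r00 = np.sum(np.abs(X0[idx]) ** 2)
        r11 = np.sum(np.abs(X1[idx]) ** 2)
        r01 = np.sum((X0[idx] * np.conj(X1[idx])).real)
        if abs(r01) < 1e-12:
            ks.append(float("nan"))  # no usable correlation in this band
            continue
        c = (r11 - r00) / (2.0 * r01)
        ks.append(float(c + np.sqrt(c * c + 1.0)))
    return ks

# Two sources with non-overlapping spectra, panned to opposite sides.
n = np.arange(4096)
s_low = np.sin(2 * np.pi * 100 * n / 4096)
s_high = np.sin(2 * np.pi * 1500 * n / 4096)
x0 = s_low + s_high
x1 = 3.0 * s_low + s_high / 3.0
k_bands = subband_panning(x0, x1, n_bands=2)
```

Because each band here holds exactly one source, the two estimates recover the two panning factors separately; a full-band estimate would blend them.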

8 Correlation computation and time shifting
Correlation computation: in time domain; in frequency domain.
Find the inter-channel time difference (ICTD).
Apply the ICTD in the frequency domain.
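The two steps on this slide can be sketched as follows: find the ICTD as the lag of the cross-correlation peak (computed through the FFT), then undo it with a linear phase term. This is a minimal sketch under assumptions: integer lags only, a circular shift, and hypothetical function names `estimate_ictd` and `shift_in_frequency`.

```python
import numpy as np

def estimate_ictd(x0, x1, max_lag):
    """Lag in samples by which x1 trails x0, from the cross-correlation peak."""
    nfft = 2 * len(x0)  # zero-pad so circular correlation equals linear
    X0 = np.fft.rfft(x0, nfft)
    X1 = np.fft.rfft(x1, nfft)
    # xcorr[m] = sum_n x0[n] * x1[n + m]; negative lags wrap to the end.
    xcorr = np.fft.irfft(np.conj(X0) * X1, nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    return int(lags[np.argmax(xcorr[lags])])

def shift_in_frequency(x, lag):
    """Advance x by `lag` samples (circularly) using a linear phase term."""
    n = len(x)
    X = np.fft.rfft(x)
    w = 2.0 * np.pi * np.fft.rfftfreq(n)  # radians per sample for each bin
    return np.fft.irfft(X * np.exp(1j * w * lag), n)

rng = np.random.default_rng(2)
s = rng.standard_normal(4096)
x0 = s
x1 = np.roll(s, 20)  # channel 1 is delayed by 20 samples
ictd = estimate_ictd(x0, x1, max_lag=64)
aligned = shift_in_frequency(x1, ictd)
```

After the shift, `aligned` matches `x0` again, so a correlation-based method can be applied to the time-aligned pair.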

9 How to partition the bands?
Ideally, the number of partitions = number of sources (unknown).
Fixed partitioning (independent of input): uniform (2, 4, 8, etc.) or non-uniform (e.g. ERB).
Adaptive partitioning (dependent on input): top-down, bottom-up, …
Based on the inter-channel cross-correlation coefficient (ICC) φ, with two thresholds: φ_L, φ_H.
Conditions for partition:
φ_0 < φ_H
max(φ_1, φ_2) > φ_0
min(φ_1, φ_2) > φ_L
[Figure: L and R spectra X(f) over frequency f, with ICC labels φ_0, φ_1 and φ_2.]
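A top-down partitioning driven by the three conditions above could look like the sketch below. Several things are assumptions rather than the author's method: φ_0 is taken as the ICC of the current band and φ_1, φ_2 as the ICCs of its two halves, the ICC is the zero-lag coherence magnitude, the split point is the midpoint, and `min_bins` is a hypothetical stopping parameter.

```python
import numpy as np

def band_icc(X0, X1, lo, hi):
    """Zero-lag ICC magnitude over bins lo..hi-1 (an assumed ICC definition)."""
    a, b = X0[lo:hi], X1[lo:hi]
    denom = np.sqrt(np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2))
    if denom == 0.0:
        return 0.0
    return float(np.abs(np.sum(a * np.conj(b))) / denom)

def top_down_partition(X0, X1, lo, hi, phi_l=0.1, phi_h=0.8, min_bins=8):
    """Recursively halve [lo, hi) while the three split conditions hold."""
    if hi - lo < 2 * min_bins:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    phi0 = band_icc(X0, X1, lo, hi)
    phi1 = band_icc(X0, X1, lo, mid)
    phi2 = band_icc(X0, X1, mid, hi)
    if phi0 < phi_h and max(phi1, phi2) > phi0 and min(phi1, phi2) > phi_l:
        left = top_down_partition(X0, X1, lo, mid, phi_l, phi_h, min_bins)
        right = top_down_partition(X0, X1, mid, hi, phi_l, phi_h, min_bins)
        return left + right
    return [(lo, hi)]

# Toy spectra: each half is fully coherent, but with opposite sign,
# so the full-band ICC is 0 while each half-band ICC is 1.
X0 = np.ones(64, dtype=complex)
X1 = np.concatenate([np.ones(32), -np.ones(32)]).astype(complex)
bands = top_down_partition(X0, X1, 0, 64)
single = top_down_partition(X0, X0, 0, 64)
```

In the toy case the band is split once and each half is then left alone because its ICC already exceeds φ_H; identical channels are never split.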

10 Multiple (2) sources
Three cases for the directions of two sources:
1. At different sides (DS)
2. One at the center (C)
3. At the same side (SS)
Four ways to synthesize the source direction:
1. Amplitude panning (AP)
2. Time shifting (TS)
3. Amplitude panning and time shifting (APTS)
4. HRTF filtering (HRTF)
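The first three synthesis methods can be illustrated with a few lines of code (HRTF filtering needs measured filters and is left out). This is a sketch under assumptions: channel 1 is scaled by k and delayed by whole samples relative to channel 0, the shift is circular for brevity, and `pan_source` is a hypothetical helper.

```python
import numpy as np

def pan_source(s, k=1.0, delay=0):
    """Place a mono source in a stereo pair.

    k scales channel 1 relative to channel 0 (amplitude panning) and delay
    shifts channel 1 by whole samples (time shifting). np.roll is circular,
    which keeps the sketch short; a real renderer would pad instead.
    """
    s = np.asarray(s, dtype=float)
    return s.copy(), k * np.roll(s, delay)

s = np.array([1.0, 2.0, 3.0, 4.0])
ap = pan_source(s, k=3.0)             # AP: level difference only
ts = pan_source(s, delay=1)           # TS: time difference only
apts = pan_source(s, k=3.0, delay=1)  # APTS: both

# Two sources at different sides (DS): one with k > 1, the other with k < 1.
s2 = np.array([0.5, -0.5, 0.5, -0.5])
a0, a1 = pan_source(s, k=3.0)
b0, b1 = pan_source(s2, k=1.0 / 3.0)
x0, x1 = a0 + b0, a1 + b1
```

The DS mixture at the end simply sums the two panned sources per channel; the C and SS cases differ only in the k (or delay) values chosen.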

11 Simulation testing: setup
Primary components: speech, music
Ambient components: white Gaussian noise
Primary power ratio = 0.9
Frame length: 4096; Hanning window, 50% overlap
We test PCA and SPCA with different frequency partitioning:
Time domain, full band (reference)
Uniform partitioning with [1, 2, 4, 8, 16, 32, 64] partitions
Non-uniform partitioning with 20 partitions (Faller, BCC [6])
Top-down (TD) partitioning, with φ_L = 0.1, φ_H = 0.8
[6] C. Faller and F. Baumgarte, “Binaural cue coding-part II: schemes and applications,” IEEE Trans. Speech and Audio Processing, vol. 11, no. 6, Nov. 2003.
Performance measure: Error-to-Signal Ratio
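The performance measure can be sketched as the energy of the extraction error relative to the energy of the true component. The exact definition used in the study is not given on the slide; the decibel form below, and the name `esr_db`, are assumptions.

```python
import numpy as np

def esr_db(estimate, reference):
    """Error-to-signal ratio in dB: lower (more negative) means a better estimate.

    Assumed definition: 10 * log10(||estimate - reference||^2 / ||reference||^2).
    """
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    error_energy = np.sum((estimate - reference) ** 2)
    return 10.0 * np.log10(error_energy / np.sum(reference ** 2))

reference = np.ones(100)
esr_small_error = esr_db(1.1 * reference, reference)  # 10% amplitude error
esr_large_error = esr_db(2.0 * reference, reference)  # 100% amplitude error
```

A 10% amplitude error gives an error energy of 1% of the signal energy, so the first value is about -20 dB; doubling the signal gives 0 dB.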

12 Simulation Results: 1 source
Primary component: speech shifted by 20 lags (ICTD), panned by k = 3 (primary panning factor).
Columns: T = time domain, full band; 1 to 64 = uniform partitions; 20non = 20 non-uniform partitions; TD = top-down.
Method       T       1       2       4       8      16      32      64   20non      TD
PCA      -3.69   -3.72   -3.38   -3.45   -3.34   -3.32   -3.16   -3.19   -3.33   -3.72
SPCA    -14.78  -14.85  -12.34  -12.05  -11.52  -11.35  -10.63   -9.30  -10.34  -14.38
1. Generally, SPCA performs better than PCA.
2. The time domain PCA (SPCA) is very close to the frequency domain PCA (SPCA) when there is only one partition.
3. Significantly worse performance is found in the frequency domain approaches with fixed partitioning.
4. The performance of the top-down partitioning is acceptable.

13 Simulation Results: 2 sources
Four ways to synthesize the source direction:
1. Amplitude panning (AP)
2. Time shifting (TS)
3. Amplitude panning and time shifting (APTS)
4. HRTF filtering (HRTF)
Three cases for the directions of two sources:
1. At different sides (DS)
2. One at the center (C)
3. At the same side (SS)

14 Simulation Results: 2 sources-AP
DS           T       1       2       4       8      16      32      64   20non      TD
PCA (8 of 10 values recovered; column positions uncertain): -7.95, -8.10, -8.13, -8.22, -8.34, -8.56, -9.80, -10.86
SPCA     -7.94   -8.15   -8.18   -8.26   -8.36   -8.61   -8.39   -9.33   -9.95   -8.36
C            T       1       2       4       8      16      32      64   20non      TD
PCA (9 of 10 values recovered; column positions uncertain): -10.15, -10.25, -10.14, -10.22, -10.27, -10.38, -10.34, -11.16, -11.99
SPCA    -10.14  -10.33  -10.22  -10.24  -10.30  -10.43  -10.04  -10.29  -10.12  -10.38
SS           T       1       2       4       8      16      32      64   20non      TD
PCA (9 of 10 values recovered; column positions uncertain): -13.04, -13.10, -11.82, -11.88, -11.75, -11.53, -11.31, -11.40, -11.93
SPCA    -13.02  -13.23  -12.04  -11.81  -11.65  -11.56  -10.30  -10.24  -10.52  -13.21
1. Generally, the performance of SPCA and PCA is similar.
2. The performance is better when the two directions become closer.
3. The frequency domain approaches with fixed partitioning show some advantage when the primary components are not on the same side.
4. The frequency domain approach with top-down partitioning yields good performance.

15 Simulation Results: 2 sources-TS
DS           T       1       2       4       8      16      32      64   20non      TD
PCA (9 of 10 values recovered; column positions uncertain): -5.16, -5.21, -5.23, -5.24, -5.23, -5.26, -5.38, -5.47, -5.21
SPCA     -7.98   -8.44   -8.43   -8.59   -8.69   -8.73   -8.62   -9.12   -8.58   -8.91
C            T       1       2       4       8      16      32      64   20non      TD
PCA      -9.10   -9.14   -9.18   -9.20   -9.18   -9.17   -9.14   -9.18   -9.27   -9.14
SPCA     -9.13   -9.60   -9.70   -9.85   -9.97   -9.85   -9.54   -9.91   -9.35  -10.05
SS           T       1       2       4       8      16      32      64   20non      TD
PCA      -5.37   -5.38   -5.40   -5.42   -5.41   -5.42   -5.41   -5.44   -5.49   -5.38
SPCA    -11.15  -11.65  -11.71  -11.78  -11.94  -11.86  -11.02  -11.20   -9.42  -11.68
1. Clearly, SPCA performs better than PCA.
2. The performance of SPCA is better when neither direction is at the center.
3. The frequency domain approaches with fixed partitioning show a slight advantage, and their performance does not vary much across partitionings.
4. The frequency domain approach with top-down partitioning yields the best overall performance.

16 Simulation Results: 2 sources-APTS
DS           T       1       2       4       8      16      32      64   20non      TD
PCA (6 of 10 values recovered; column positions uncertain): -5.16, -5.21, -5.23, -5.26, -5.38, -5.47
SPCA     -7.99   -8.44   -8.43   -8.59   -8.69   -8.73   -8.62   -9.12   -8.58   -8.91
C            T       1       2       4       8      16      32      64   20non      TD
PCA      -8.06   -8.28   -8.19   -8.27   -8.34   -8.46   -8.44   -9.04   -9.55   -8.28
SPCA     -8.07   -8.43   -8.38   -8.40   -8.57   -8.63   -8.44   -8.70   -9.07   -8.68
SS           T       1       2       4       8      16      32      64   20non      TD
PCA (9 of 10 values recovered; column positions uncertain): -4.18, -3.95, -3.97, -3.91, -3.92, -3.89, -3.87, -3.98, -4.19
SPCA    -10.16  -10.60   -9.89   -9.75   -9.80   -9.77   -9.07   -8.68   -7.29  -10.82
1. Clearly, SPCA performs better than PCA.
2. The performance is better when the two directions are closer.
3. The frequency domain approaches with fixed partitioning perform better when the two directions are not on the same side.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

17 Simulation Results: 2 sources-APTS
Columns as extracted: T, 1, 2, 8, 32, 20non, TD18, followed by what appear to be (φ_L, φ_H) threshold pairs for top-down partitioning: (0.05, 0.8), (0.2, 0.8), (0.2, 0.7), (0.1, 0.7), (0.05, 0.7), (0.05, 0.9), (0.1, 0.9), (0.2, 0.9). Several cells are missing from every row, so the values are listed in source order without column alignment.
DS PCA: -4.74, -5.04, -5.22, -5.48, -6.85, -5.03
DS SPCA: -6.45, -6.85, -7.11, -7.25, -7.73, -6.85, -8.36, -6.85, -7.93, -8.58, -6.85
C PCA: -8.06, -8.28, -8.19, -8.34, -8.44, -9.55, -8.28, -8.44, -8.28, -8.44, -8.27
C SPCA: -8.07, -8.43, -8.38, -8.57, -8.44, -9.07, -8.68, -9.06, -8.44, -8.68, -8.58, -9.93, -8.6, -8.44
SS PCA: -4.18, -3.95, -3.91, -3.89, -3.98, -4.19
SS SPCA: -10.16, -10.60, -9.89, -9.80, -9.07, -7.29, -10.82, -10.11, -10.6, -10.41, -8.53, -10.27, -10.6
1. Clearly, SPCA performs better than PCA.
2. The performance is better when the two directions are closer.
3. The frequency domain approaches with fixed partitioning perform better when the two directions are not on the same side.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

18 Simulation Results: 2 sources-APTS
Columns as extracted: T, 2, 8, 32, 20non, (0.05, 0.7), TD18, followed by what appear to be (φ_L, φ_H) threshold pairs for top-down partitioning: (0.05, 0.8), (0.2, 0.8), (0.2, 0.7), (0.1, 0.7), (0.05, 0.7), (0.05, 0.9), (0.1, 0.9), (0.2, 0.9). Several cells are missing from every row, so the values are listed in source order without column alignment.
DS PCA: -4.74, -5.04, -5.22, -5.48, -6.85, -5.03
DS SPCA: -6.45, -6.85, -7.11, -7.25, -7.73, -7.93, -6.85, -8.36, -6.85, -7.93, -8.58, -6.85
C PCA: -8.06, -8.19, -8.34, -8.44, -9.55, -8.44, -8.28, -8.44, -8.28, -8.44, -8.27
C SPCA: -8.07, -8.38, -8.57, -8.44, -9.07, -8.58, -8.68, -9.06, -8.44, -8.68, -8.58, -9.93, -8.6, -8.44
SS PCA: -4.18, -3.95, -3.91, -3.89, -3.98, -4.19
SS SPCA: -10.16, -9.89, -9.80, -9.07, -7.29, -10.41, -10.82, -10.11, -10.6, -10.41, -8.53, -10.27, -10.6
1. Clearly, SPCA performs better than PCA.
2. The performance is better when the two directions are closer.
3. The frequency domain approaches with fixed partitioning perform better when the two directions are not on the same side.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

19 Simulation Results: 2 sources-HRTF
DS           T       1       2       4       8      16      32      64   20non      TD
PCA (9 of 10 values recovered; column positions uncertain): -3.28, -3.46, -3.44, -3.50, -3.74, -3.77, -3.89, -3.85, -2.47
SPCA     -6.49   -6.07   -6.08   -6.13   -6.42   -7.33   -5.70   -5.56   -5.71   -6.52
C            T       1       2       4       8      16      32      64   20non      TD
PCA      -6.96   -7.16   -7.09   -7.12   -7.21   -7.37   -7.50   -7.60   -7.65   -7.16
SPCA     -7.41   -7.97   -7.89   -7.92   -7.87   -8.33   -6.97   -6.15   -7.17      -8
SS           T       1       2       4       8      16      32      64   20non      TD
PCA      -1.14   -1.26   -1.14   -1.15   -1.12   -1.06   -1.04   -1.07   -1.19   -1.88
SPCA (9 of 10 values recovered; column positions uncertain): -6.58, -6.81, -6.48, -6.70, -7.44, -2.51, -2.44, -2.90, -6.98
1. Clearly, SPCA performs better than PCA.
2. The performance of SPCA is better when one direction is at the center.
3. The frequency domain approaches with fixed partitioning show better performance only for some partitionings.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

20 Summary of Simulation Results: 2 sources
1. The overall performance of PAE: AP > TS > APTS > HRTF.
2. Generally, SPCA performs better than PCA in almost all cases.
3. The performance varies as the directions of the sources change.
4. The frequency domain approaches with fixed partitioning do not always give better performance.
5. The frequency domain approach with top-down partitioning yields the best overall performance in most cases.
How about perceptual testing? Usually only one source (speech) is better extracted, because the spectrum of speech (as compared to music) is more focused in certain bands.

21 Conclusions and thoughts
1. A study of frequency domain PAE with different partitioning is conducted. It targets multiple primary components that come from different directions concurrently. The two PAE approaches tested are PCA and shifted PCA.
2. The partitioning of the frequency bands requires careful consideration.
3. Generally, SPCA outperforms PCA in almost all cases.
4. The performance of frequency domain PAE with fixed partitioning is not consistent as the directions of the sources and the number of partitions change.
5. The frequency domain approach with top-down partitioning yields promising results in most cases.
Thoughts:
A more robust partitioning is required.
How should the thresholds be determined for other input signals?
More accurate estimation of the primary panning factor and of ICTD/ICC is needed.
How about other PAE approaches, such as least squares?

