Multi-Shift Principal Component Analysis based Primary Component Extraction for Spatial Audio Reproduction. Jianjun HE and Woon-Seng Gan, 23rd April 2015.

1 Multi-Shift Principal Component Analysis based Primary Component Extraction for Spatial Audio Reproduction
Jianjun HE and Woon-Seng Gan
23rd April 2015
Jhe007@e.ntu.edu.sg
Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

2 WHY
Existing sound scene representations:
- Channel-based: conventional, for a specific playback system; lacks the flexibility to support different playback configurations.
- Object-based: emerging, for any playback system; lacks efficiency (large storage and high transmission bandwidth).
Primary-ambient based representation: inspired by the human auditory system; facilitates flexible and efficient rendering.
- Primary-ambient extraction (PAE) from channel-based audio (e.g., stereo).
- Existing approaches: mainly for one dominant source in the primary components; subband approaches are problematic when spectra overlap.
- PAE with multiple sources (different directions) is not well studied.
Goal: to obtain a new representation of sound scenes in digital media that is both flexible and efficient in spatial audio reproduction for any playback system.

3 Stereo Signal Model
Signal = Primary + Ambient
Assumptions:
- Primary components are highly correlated.
- Ambient components are uncorrelated.
- Primary and ambient components are uncorrelated.
- Ambient power is balanced between the two channels.
J. He, E. L. Tan, and W. S. Gan, “Linear estimation based primary-ambient extraction for stereo audio signals,” IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 505-517, Feb. 2014.
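The signal model and its assumptions can be illustrated with a toy mixture. This is only a sketch: the white-noise primary source, the panning factor k = 3, the ambient gain, and the function names `make_stereo` and `correlation` are assumptions made for illustration and do not come from the slides.

```python
import math
import random

def correlation(a, b):
    """Zero-lag normalized cross-correlation of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

def make_stereo(n=4000, k=3.0, ambient_gain=0.5, seed=0):
    """Toy stereo mixture: each channel = primary + ambient (illustrative only)."""
    rng = random.Random(seed)
    p0 = [rng.gauss(0.0, 1.0) for _ in range(n)]        # primary, channel 0
    p1 = [k * s for s in p0]                             # amplitude-panned copy, channel 1
    a0 = [ambient_gain * rng.gauss(0.0, 1.0) for _ in range(n)]  # ambient, channel 0
    a1 = [ambient_gain * rng.gauss(0.0, 1.0) for _ in range(n)]  # ambient, channel 1
    x0 = [p + a for p, a in zip(p0, a0)]                 # signal = primary + ambient
    x1 = [p + a for p, a in zip(p1, a1)]
    return x0, x1, p0, p1, a0, a1

x0, x1, p0, p1, a0, a1 = make_stereo()
print("primary vs primary:", round(correlation(p0, p1), 3))   # close to 1
print("ambient vs ambient:", round(correlation(a0, a1), 3))   # close to 0
print("primary vs ambient:", round(correlation(p0, a0), 3))   # close to 0
```

With amplitude panning only, the two primary components are perfectly correlated at zero lag; an inter-channel time difference lowers that zero-lag correlation, which is what the shifted variants on the later slides address.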

4 PCA for primary extraction
Objective: [equation not preserved in the transcript]
M. M. Goodwin and J. M. Jot, “Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement,” in Proc. ICASSP, Hawaii, 2007, pp. 9-12.
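Since the objective on this slide is not preserved, the following is a generic sketch of PCA-based primary extraction for a stereo pair, not necessarily the exact formulation used by the authors. It projects the two channels onto the principal eigenvector of their 2x2 covariance matrix; the function name `pca_primary` and the toy data are assumptions.

```python
import math
import random

def pca_primary(x0, x1):
    """Project a stereo pair onto the principal eigenvector of its 2x2 covariance."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    # Largest eigenvalue of [[r00, r01], [r01, r11]] in closed form.
    lam = 0.5 * (r00 + r11 + math.sqrt((r00 - r11) ** 2 + 4.0 * r01 ** 2))
    if r01 == 0.0:
        # Channels are uncorrelated: the principal axis is the stronger channel.
        v0, v1 = (1.0, 0.0) if r00 >= r11 else (0.0, 1.0)
    else:
        # Eigenvector (r01, lam - r00), normalized to unit length.
        norm = math.hypot(r01, lam - r00)
        v0, v1 = r01 / norm, (lam - r00) / norm
    u = [v0 * a + v1 * b for a, b in zip(x0, x1)]   # principal component
    p0_hat = [v0 * s for s in u]                    # extracted primary, channel 0
    p1_hat = [v1 * s for s in u]                    # extracted primary, channel 1
    return p0_hat, p1_hat, (v0, v1)

# Toy data (assumed): one white-noise source panned by k = 3 plus weak ambience.
rng = random.Random(1)
n, k = 4000, 3.0
p0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
p1 = [k * s for s in p0]
x0 = [p + 0.3 * rng.gauss(0.0, 1.0) for p in p0]
x1 = [p + 0.3 * rng.gauss(0.0, 1.0) for p in p1]

p0_hat, p1_hat, (v0, v1) = pca_primary(x0, x1)
print("estimated panning factor:", round(v1 / v0, 2))
```

The ratio of the eigenvector entries estimates the panning factor, and some ambient leakage remains in the extracted primary because both channels contribute to the projection.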

5 Shifted PCA for primary extraction
To account for the partial (zero-lag) correlation of the primary components caused by the inter-channel time difference, we proposed shifted PCA (SPCA). Let d be the inter-channel time difference.
Shifted signal and shifted primary: [equations not preserved in the transcript]
J. He, E. L. Tan, and W. S. Gan, “Time-shifted principal component analysis based cue extraction for stereo audio signals,” in Proc. ICASSP, Vancouver, Canada, 2013, pp. 266-270.
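The idea of shifting before PCA can be sketched as follows: estimate d as the lag that maximizes the inter-channel cross-correlation, align one channel by d, apply PCA, and restore the delay afterwards. This is a simplified illustration under stated assumptions (circular shifts, a single white-noise source, hypothetical helper names), not the authors' implementation.

```python
import math
import random

def circ_shift(x, d):
    """Circularly delay list x by d samples (negative d advances it)."""
    d %= len(x)
    return x[-d:] + x[:-d] if d else list(x)

def estimate_ictd(x0, x1, max_lag):
    """Lag in [-max_lag, max_lag] maximizing sum_n x0[n] * x1[n + lag]."""
    n = len(x0)
    def xcorr(lag):
        return sum(x0[i] * x1[(i + lag) % n] for i in range(n))
    return max(range(-max_lag, max_lag + 1), key=xcorr)

def pca_primary(x0, x1):
    """PCA projection of a stereo pair onto its principal eigenvector."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    lam = 0.5 * (r00 + r11 + math.sqrt((r00 - r11) ** 2 + 4.0 * r01 ** 2))
    if r01 == 0.0:
        v0, v1 = (1.0, 0.0) if r00 >= r11 else (0.0, 1.0)
    else:
        norm = math.hypot(r01, lam - r00)
        v0, v1 = r01 / norm, (lam - r00) / norm
    u = [v0 * a + v1 * b for a, b in zip(x0, x1)]
    return [v0 * s for s in u], [v1 * s for s in u]

def spca_primary(x0, x1, max_lag):
    """Shifted PCA sketch: align channel 1, run PCA, then restore the delay."""
    d = estimate_ictd(x0, x1, max_lag)
    x1_aligned = circ_shift(x1, -d)               # undo the inter-channel delay
    p0_hat, p1_aligned = pca_primary(x0, x1_aligned)
    return p0_hat, circ_shift(p1_aligned, d), d   # put the delay back in channel 1

def esr(est, ref):
    """Error power divided by reference power."""
    return sum((a - b) ** 2 for a, b in zip(est, ref)) / sum(b * b for b in ref)

# Toy data (assumed): source panned by k = 3, channel 1 delayed by 20 samples.
rng = random.Random(2)
n = 2000
p0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
p1 = [3.0 * s for s in circ_shift(p0, 20)]
x0 = [p + 0.3 * rng.gauss(0.0, 1.0) for p in p0]
x1 = [p + 0.3 * rng.gauss(0.0, 1.0) for p in p1]

s0, s1, d = spca_primary(x0, x1, max_lag=50)
q0, q1 = pca_primary(x0, x1)                      # plain PCA, no alignment
print("estimated lag:", d)
print("channel-0 ESR, SPCA vs PCA:", round(esr(s0, p0), 3), round(esr(q0, p0), 3))
```

In this toy case plain PCA sees almost no zero-lag correlation between the channels and keeps mostly the louder channel, while the aligned version recovers the primary component in both channels.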

6 Multi-Shift PCA for primary extraction
To account for concurrent directional sound sources (from different directions) in the primary components, we consider a few selected shifts.
Typical structure of MSPCA (MSPCA-T)

7 Multi-Shift PCA: consecutive structure
Shifting is performed consecutively, lag by lag, and a different weight is applied to each shifted version. The weights are derived from the inter-channel cross-correlation coefficient (ICC).
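The transcript does not preserve the exact weighting formula, so the sketch below assumes one plausible rule consistent with the later slides: the weight at each lag is the magnitude of the ICC at that lag raised to an exponent a, normalized so that the weights sum to one. The normalization, the circular correlation, the toy two-source mixture, and the names `icc_by_lag` and `icc_weights` are all assumptions.

```python
import math
import random

def icc_by_lag(x0, x1, max_lag):
    """Normalized circular cross-correlation of x0[n] and x1[n + lag] for each lag."""
    n = len(x0)
    norm = math.sqrt(sum(a * a for a in x0) * sum(b * b for b in x1))
    return {lag: sum(x0[i] * x1[(i + lag) % n] for i in range(n)) / norm
            for lag in range(-max_lag, max_lag + 1)}

def icc_weights(icc, a):
    """ASSUMED rule: weight(lag) = |ICC(lag)| ** a, normalized to sum to 1."""
    raw = {lag: abs(c) ** a for lag, c in icc.items()}
    total = sum(raw.values())
    return {lag: w / total for lag, w in raw.items()}

# Toy mixture (assumed): two white-noise sources of equal total power,
# one panned by k = 3 with lag +20, the other panned by k = 1/3 with lag -20.
rng = random.Random(3)
n = 2000
s = [rng.gauss(0.0, 1.0) for _ in range(n)]
m = [rng.gauss(0.0, 1.0) for _ in range(n)]
x0 = [s[i] + 3.0 * m[i] + 0.3 * rng.gauss(0.0, 1.0) for i in range(n)]
x1 = [3.0 * s[(i - 20) % n] + m[(i + 20) % n] + 0.3 * rng.gauss(0.0, 1.0)
      for i in range(n)]

icc = icc_by_lag(x0, x1, max_lag=50)
for a in (2, 10):
    w = icc_weights(icc, a)
    top = sorted(sorted(w, key=w.get, reverse=True)[:2])
    share = w[-20] + w[20]
    print("a =", a, "top lags:", top, "weight at +/-20:", round(share, 3))
```

Under this assumed rule, a small exponent spreads weight over all lags, while a large exponent concentrates it on the lags with the highest ICC.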

8 Experiment setup
- Primary components: one speech clip, amplitude panned by k = 3 and time shifted by 20 lags; one music clip, amplitude panned by k = 1/3 and time shifted by -20 lags.
- Ambient components: uncorrelated white Gaussian noise.
- The overall powers of speech, music, and ambience are set equal.
- Approaches evaluated: PCA, SPCA, MSPCA-T, MSPCA (a = 2), MSPCA (a = 10).
- ICTD search range: ±50 lags (a span of about 2 ms at fs = 44.1 kHz).
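The lag values in the setup can be converted to time with simple arithmetic. The helper name `lags_to_ms` is hypothetical; the sampling rate and lag values are those given on the slide.

```python
def lags_to_ms(lags, fs_hz):
    """Convert a lag in samples to milliseconds at sampling rate fs_hz."""
    return 1000.0 * lags / fs_hz

fs = 44100
print("20 lags  =", round(lags_to_ms(20, fs), 3), "ms")    # source time shift
print("50 lags  =", round(lags_to_ms(50, fs), 3), "ms")    # one side of the search range
print("100 lags =", round(lags_to_ms(100, fs), 3), "ms")   # full span from -50 to +50
```

At 44.1 kHz, 50 lags is about 1.13 ms, so the search range from -50 to +50 lags covers a little over 2 ms in total.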

9 Comparison of weighting methods
- PCA and SPCA: only one nonzero weight, at different lags.
- MSPCA-T: two weights at two lags, though the positive ICTD for the music is not as accurate.
- Consecutive MSPCA: nonzero weights at all lags, with higher weights given to the lags closer to the directions of the primary components.
- As the exponent a increases, the differences among the weights at various lags become more significant.
- When a is high (e.g., a = 10), the weighting in consecutive MSPCA becomes similar to SPCA.

10 Objective performance: extraction accuracy
Error-to-signal ratio
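The slide names the metric but the transcript does not preserve its formula. A common definition, assumed here, is the power of the extraction error divided by the power of the true primary component, averaged over the two channels and optionally expressed in dB. The function names are hypothetical.

```python
import math

def esr(p_hat, p_true):
    """Error-to-signal ratio for one channel: error power over true-signal power."""
    err = sum((a - b) ** 2 for a, b in zip(p_hat, p_true))
    sig = sum(b * b for b in p_true)
    return err / sig

def stereo_esr(p0_hat, p1_hat, p0_true, p1_true):
    """ASSUMED convention: average the per-channel ESR over both channels."""
    return 0.5 * (esr(p0_hat, p0_true) + esr(p1_hat, p1_true))

def to_db(ratio):
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(ratio)

true0, true1 = [1.0, 1.0], [2.0, 2.0]
est0, est1 = [1.0, 2.0], [2.0, 2.0]
print("stereo ESR:", stereo_esr(est0, est1, true0, true1))
```

A lower ESR means the extracted primary component is closer to the true one; an ESR of 1 (0 dB) corresponds to an error as large as the signal itself.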

11 Subjective performance: localization accuracy
12 participants; scores from 0 to 10:
- 0: worst, the two directions are reversed;
- 10: correct directions compared to the reference;
- 5+: at least one direction is correct;
- 3-7: the perceived directions are neither very close to the reference nor very far off.

12 Conclusions
1. Investigated primary extraction from stereo signals when there are multiple concurrent distinct directions for the sources in the primary components.
2. Proposed multi-shift PCA to handle multiple directions:
   a) MSPCA with the typical structure involves a limited number of selected shifts, but its performance degrades when ICTD estimation is inaccurate;
   b) MSPCA with the consecutive structure is more robust, because it applies weights to every shifted version;
   c) The weighting method for different shifts is critical;
   d) In general, applying a proper exponent to the ICC yields good (objective and subjective) performance.
3. Future work: how to determine the best exponent value for ICC-based weighting, other weighting methods, and the relation between multi-shifting and optimal filtering in primary-ambient extraction.

13 References
[1] M. M. Goodwin and J. M. Jot, “Primary-ambient signal decomposition and vector-based localization for spatial audio coding and enhancement,” in Proc. ICASSP, Hawaii, 2007, pp. 9-12.
[7] C. Faller and F. Baumgarte, “Binaural cue coding-part II: schemes and applications,” IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 520-531, Nov. 2003.
[8] M. M. Goodwin and J. M. Jot, “Binaural 3-D audio rendering based on spatial audio scene coding,” in Proc. 123rd Audio Eng. Soc. Conv., New York, 2007.
[12] K. Sunder, J. He, E. L. Tan, and W. S. Gan, “Natural sound rendering for headphones,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 100-113, Mar. 2015.
[13] C. Avendano and J. M. Jot, “A frequency-domain approach to multichannel upmix,” J. Audio Eng. Soc., vol. 52, no. 7/8, pp. 740-749, Jul./Aug. 2004.
[14] C. Faller, “Multiple-loudspeaker playback of stereo signals,” J. Audio Eng. Soc., vol. 54, no. 11, pp. 1051-1064, Nov. 2006.
[17] J. He, W. S. Gan, and E. L. Tan, “Primary-ambient extraction using ambient phase estimation with a sparsity constraint,” IEEE Signal Process. Letters, vol. 22, no. 8, pp. 1127-1131, Aug. 2015.
[18] J. Merimaa, M. M. Goodwin, and J. M. Jot, “Correlation-based ambience extraction from stereo recordings,” in Proc. 123rd Audio Eng. Soc. Conv., New York, 2007.
[21] J. He, E. L. Tan, and W. S. Gan, “Time-shifted principal component analysis based cue extraction for stereo audio signals,” in Proc. ICASSP, Vancouver, Canada, 2013, pp. 266-270.
[22] J. He, E. L. Tan, and W. S. Gan, “Linear estimation based primary-ambient extraction for stereo audio signals,” IEEE Trans. Audio, Speech, Lang. Process., vol. 22, no. 2, pp. 505-517, Feb. 2014.
[24] J. He, W. S. Gan, and E. L. Tan, “A study on the frequency-domain primary-ambient extraction for stereo audio signals,” in Proc. ICASSP, Florence, Italy, 2014, pp. 2868-2872.

14 Acknowledgement
This work is supported by the Singapore Ministry of Education Academic Research Fund Tier-2, under research grant MOE2010-T2-2-040.

15 Multi-Shift Principal Component Analysis based Primary Component Extraction for Spatial Audio Reproduction Jianjun HE Jhe007@e.ntu.edu.sg Nanyang Technological University, Singapore Thank you!

