On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features Jianjun HE, Woon-Seng Gan, and Ee-Leng Tan

Presentation transcript:

On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features
Jianjun HE, Woon-Seng Gan, and Ee-Leng Tan
24th April 2015
Digital Signal Processing Lab, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

WHY
1. Head-related transfer functions (HRTFs) are highly individualized;
2. HRTFs are closely related to anthropometry (torso, head, pinna);
3. Anthropometry can be used for HRTF individualization.

[Diagram: the anthropometry of a new person, an anthropometry database, and an HRTF database feed the individualization step, which outputs the HRTF of the new person.]

In this paper, we aim to answer:
1. Do the preprocessing and postprocessing methods affect the performance of HRTF individualization?
2. If so, what is the best combination of preprocessing and postprocessing?
3. And how good is it?

CIPIC anthropometric data (35 subjects)
V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, Oct.

Methodology
[Diagram: the individualization step maps A1 to the unknown H1 using the databases A and H.]
Anthropometry database A: S subjects × 1 set of anthropometry (F features)
Anthropometry of a new person A1: 1 subject × 1 set of anthropometry
HRTF database H: S subjects × 1 set of HRTFs (D directions × K points)
HRTF of a new person H1: 1 subject × 1 set of HRTFs
P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy, May 2014.
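The slide names the quantities but does not spell out the optimization. As a rough sketch only, the block below assumes the usual l1-regularized least-squares formulation: find sparse weights w over the S database subjects so that the weighted database anthropometry approximates A1, then apply the same weights to the HRTF database to estimate H1. The function names, the ISTA solver, and the default parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sparse_weights(A, a_new, lam=0.1, n_iter=2000, nonnegative=False):
    """Minimize 0.5*||a_new - A.T @ w||^2 + lam*||w||_1 with proximal gradient (ISTA).

    A     : (S, F) anthropometry database, one row per subject
    a_new : (F,)   anthropometry of the new person
    Returns w : (S,) weights over the database subjects.
    """
    w = np.zeros(A.shape[0])
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = A @ (A.T @ w - a_new)          # gradient of the quadratic term
        z = w - step * grad
        if nonnegative:
            # Proximal step for the l1 penalty combined with the constraint w >= 0.
            w = np.maximum(z - step * lam, 0.0)
        else:
            # Soft-thresholding: proximal step for the l1 penalty alone.
            w = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return w

def individualize(A, H, a_new, **kwargs):
    """Estimate the new person's HRTF as the same weighted sum of database HRTFs.

    H : (S, D*K) HRTF database, each row a flattened (directions x points) set.
    """
    w = sparse_weights(A, a_new, **kwargs)
    return w @ H, w
```

Calling `individualize(A, H, a_new, nonnegative=True)` would correspond to the nonnegative variant discussed on the later slides; in practice a dedicated l1 solver would replace the plain ISTA loop used here for self-containment.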

Methodology
Preprocessing of anthropometry (PreA), index i:
1. Direct
2. Min-max normalization
3. Standard score
4. Standard deviation normalization
Preprocessing of HRTF (PreH), index m:
1. Magnitude
2. Log magnitude
3. Power
Sparse representation, index j:
1. Direct
2. Nonnegative
Postprocessing of anthropometry (PostA), index l:
1. Direct
2. Normalized
In total, we have 4 × 3 × 2 × 2 = 48 variants of methods!
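The options listed above can be sketched as follows. The slide gives only the names, so the formulas are assumptions based on the usual textbook definitions: min-max maps each feature to [0, 1], standard score subtracts the mean and divides by the standard deviation, and standard deviation normalization divides by the standard deviation only. The "Normalized" postprocessing is not defined in the transcript and is therefore left as a label. All function and variable names are hypothetical.

```python
import itertools
import numpy as np

def fit_pre_anthropometry(A, method):
    """Return a per-feature scaling function whose statistics come from database A (S, F).

    The same function is then applied to both the database and the new person.
    Assumes every feature varies across subjects (no zero range or zero std).
    """
    mean, std = A.mean(axis=0), A.std(axis=0)
    lo, hi = A.min(axis=0), A.max(axis=0)
    transforms = {
        "direct": lambda X: np.asarray(X, dtype=float),
        "min-max": lambda X: (X - lo) / (hi - lo),
        "standard score": lambda X: (X - mean) / std,
        "standard deviation": lambda X: X / std,
    }
    return transforms[method]

def pre_hrtf(H_mag, method):
    """Map HRTF magnitudes to the domain in which the weighted sum is formed."""
    if method == "magnitude":
        return H_mag
    if method == "log magnitude":
        return 20.0 * np.log10(H_mag)   # magnitude in dB
    if method == "power":
        return H_mag ** 2
    raise ValueError(f"unknown HRTF preprocessing: {method}")

PRE_A = ["direct", "min-max", "standard score", "standard deviation"]
PRE_H = ["magnitude", "log magnitude", "power"]
SPARSE = ["direct", "nonnegative"]
POST_A = ["direct", "normalized"]

# Every (PreA, PreH, sparse representation, PostA) combination.
VARIANTS = list(itertools.product(PRE_A, PRE_H, SPARSE, POST_A))
```

Enumerating `VARIANTS` gives one tuple per method variant, which is a convenient way to loop over all combinations in an evaluation script.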

Evaluation
CIPIC HRTF database;
Cross-validation technique to select the regularization parameter;
S_test = 35 test cases, all 1250 directions, and full frequency range.
Performance varies among different preprocessing and postprocessing methods!
[Table: results for each combination. Rows: sparse representation (Direct, Nonnegative), PostA (Direct, Normalized), PreH (Mag, Log mag, Power). Columns: PreA (Direct, Min-max, Standard score, Standard deviation). The values are not preserved in this transcript.]

Results
Direct sparse representation
1. PreA: standard deviation best, standard score worst;
2. PreH: log mag best, power worst;
3. PostA, PostH: minimal effect for good PreA, PreH.
Nonnegative sparse representation
1. Better than corresponding direct sparse representation (especially for standard score);
2. Trend in PreA/PreH not obvious;
3. Normalized PostA can improve the performance (especially for standard score).
[Figure panels: (a) Direct sparse, Direct PostA; (b) Direct sparse, Normalized PostA; (c) Nonnegative sparse, Direct PostA; (d) Nonnegative sparse, Normalized PostA]

Comparison
Method | Specification | SD (dB)
Single best | Select one single set of HRTF with the corresponding closest anthropometry | 8.11
Tashev et al. | Min-max PreA; Magnitude PreH; Direct sparse representation; No reported postprocessing | 6.57
Our best | Standard score PreA; Log magnitude PreH; Nonnegative sparse representation; Normalized PostA | 5.86
Lower bound | Linear regression based HRTF individualization | 5.12
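The comparison reports a single SD figure in dB per method. The transcript does not define it, so the sketch below assumes SD means spectral distortion, computed as the root-mean-square log-magnitude error over all directions and frequency points, which is one common definition. It also sketches the "single best" baseline (nearest anthropometry, here with an assumed Euclidean distance) and a leave-one-out loop of the kind implied by the 35 test cases. All names are hypothetical.

```python
import numpy as np

def spectral_distortion(H_true, H_est):
    """RMS log-magnitude error in dB over all entries (directions and frequency points)."""
    err_db = 20.0 * np.log10(np.abs(H_true) / np.abs(H_est))
    return np.sqrt(np.mean(err_db ** 2))

def single_best(A, H, a_new):
    """Pick the HRTF set of the database subject with the closest anthropometry."""
    idx = int(np.argmin(np.linalg.norm(A - a_new, axis=1)))
    return H[idx], idx

def leave_one_out_sd(A, H, predictor):
    """Average SD when each subject in turn is held out and predicted from the rest.

    predictor(A_train, H_train, a_new) must return an HRTF magnitude estimate.
    """
    n_subjects = A.shape[0]
    scores = []
    for s in range(n_subjects):
        keep = np.arange(n_subjects) != s       # all subjects except the test one
        H_est = predictor(A[keep], H[keep], A[s])
        scores.append(spectral_distortion(H[s], H_est))
    return float(np.mean(scores))
```

For example, `leave_one_out_sd(A, H, lambda A_, H_, a: single_best(A_, H_, a)[0])` would score the nearest-anthropometry baseline; any other individualization method with the same call signature can be passed as `predictor`.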

Conclusions
1. Introduced preprocessing and postprocessing in HRTF individualization based on sparse representation of anthropometric features.
2. Investigated 48 variants of preprocessing and postprocessing methods, and found that
a) Preprocessing and postprocessing methods do affect the performance of HRTF individualization, though the effects differ between combinations;
b) Adding nonnegative constraints in sparse representation improves the performance;
c) The best combination for HRTF individualization is standard score PreA, log magnitude PreH, nonnegative sparse representation, and normalized PostA.
3. Established the lower bound for this type of HRTF individualization and verified that “our best” combination outperforms existing approaches and is quite close to the lower bound.
4. Future work: subjective evaluation of HRTF individualization.

References
[1] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia, Cambridge, MA: Academic Press.
[2] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, The MIT Press, revised edition.
[3] H. Møller, “Fundamentals of Binaural Technology,” Applied Acoustics, vol. 36.
[4] W. G. Gardner and K. D. Martin, “HRTF Measurements of a KEMAR,” J. Acoust. Soc. Amer., vol. 97.
[5] H. Møller, M. F. Sørensen, D. Hammershøi, and C. B. Jensen, “Head-Related Transfer Functions of Human Subjects,” J. Aud. Eng. Soc., vol. 43.
[6] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct.
[7] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, “Localization Using Non-individualized Head-Related Transfer Functions,” J. Acoust. Soc. Amer., vol. 94.
[8] S. Xu, Z. Li, and G. Salvendy, “Individualization of Head-related transfer function for three-dimensional virtual auditory display: a review,” in R. Shumaker (Ed.): Virtual Reality, HCII 2007, LNCS 4563, pp. 397–407.
[9] K. Sunder, J. He, E. L. Tan, and W. S. Gan, “Natural sound rendering for headphones,” IEEE Signal Processing Magazine, vol. 32, no. 2, Mar.
[26] P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy, May.
[28] S. J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares,” J. Selected Topics in Signal Processing, vol. 1, no. 4, Dec.
[30] J. Breebaart, F. Nater, and A. Kohlrausch, “Spectral and spatial parameter resolution requirements for parametric, filter-bank-based HRTF processing,” J. Audio Eng. Soc., vol. 58, no. 3, Mar.

Acknowledgement
This work is supported by the Singapore Ministry of Education Academic Research Fund Tier-2, under research grant MOE2010-T

On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of Anthropometric Features Jianjun HE Nanyang Technological University, Singapore Thank you!