Gammachirp Auditory Filter

Gammachirp Auditory Filter
Alex Park
May 7th, 2003

Project Overview
- Goal: Investigate the use of (non-linear) auditory filters for speech analysis
- Background: Sound analysis in the auditory periphery resembles a wavelet transform
- Comparison: Traditional short-time Fourier analysis vs. Gammatone wavelet-based analysis (auditory filter)
- Extension: The Gammachirp filter has level-dependent parameters that can model non-linear characteristics of the auditory periphery
- Implementation: Specifics of the Gammachirp implementation, and how to incorporate level dependency

Auditory Physiology
- Sound pressure variation in the air is transduced through the outer and middle ears to the end of the cochlea.
- The basilar membrane, which runs the length of the cochlea, maps the place of maximal displacement to frequency.
[Figure: outer ear, middle ear, cochlea, and auditory nerve to cortex; basilar membrane frequency map from high frequencies (20 kHz) to low frequencies (200 Hz)]

Motivation: Why better auditory models?
Automatic Speech Recognition (ASR)
- ASR systems perform adequately in 'clean' conditions.
- Robustness is a major problem; degradation in low-SNR conditions is much worse than for human listeners.
Hearing research
- Build better hearing aids and cochlear implants.
- Hearing-impaired subjects with damaged cochleae have trouble understanding speech in noisy environments.
- Current hearing aids perform linear amplification, amplifying noise as well as the signal.
Is the lack of a compressive non-linearity in the front end a common link?

Non-stationary Nature of Speech
Why is speech a good candidate for local frequency analysis?
[Figure: waveform of the word "tapestry", showing the /t/ transient, the /ae/ tone, and the /s/ noise segments]

Time-Frequency Representation
The most common way of representing changing spectral content is the Short-Time Fourier Transform (STFT), which computes the power of the FFT over successive windowed frames.
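As a concrete sketch, the framing-and-FFT computation behind the STFT can be written in a few lines (NumPy; the frame length, hop size, and sampling rate chosen here are illustrative, not values from the slides):

```python
import numpy as np

def stft(x, frame_len=256, hop=128, fs=8000):
    """Short-Time Fourier Transform: window the signal, FFT each frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)          # one spectrum per frame
    power_db = 10 * np.log10(np.abs(spectra) ** 2 + 1e-12)
    freqs = np.fft.rfftfreq(frame_len, d=1 / fs)   # linearly spaced bins
    return power_db, freqs
```

Plotting `power_db` against time and `freqs` gives the spectrogram shown on the next slide; note that the frequency bins are linearly spaced with constant bandwidth, which is exactly the property the auditory filterbanks below depart from.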

Spectrogram from STFT
[Figure: STFT spectrogram of the word "tapestry"]

STFT Characteristics
- We can think of the STFT as filtering with a set of windowed-sinusoid basis functions.
- In the frequency domain, this corresponds to a filterbank of linearly spaced, constant-bandwidth filters.
[Figure: STFT basis functions and their frequency responses; frequency axis in Hz]

Auditory Filterbanks
Unlike the STFT, physiological data indicate that auditory filters:
- are spaced more closely at low frequencies than at high frequencies
- have narrower bandwidths at lower frequencies (constant-Q)
The Gammatone filterbank proposed by Patterson models these characteristics with a wavelet transform. The mother wavelet, or kernel function, is
    g(t) = a * t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t + phi)
i.e., a gamma envelope multiplied by a tone carrier.
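A minimal sketch of that kernel, generated directly from the formula (n = 4 and b = 1.019 are the commonly used fits from the gammatone literature; the duration is illustrative):

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz."""
    return 24.7 * (4.37 * f / 1000 + 1)

def gammatone_ir(fc, fs=16000, n=4, b=1.019, dur=0.05):
    """Gammatone impulse response: gamma envelope x tone carrier."""
    t = np.arange(1, int(dur * fs)) / fs                       # t > 0
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb(fc) * t)  # gamma envelope
    carrier = np.cos(2 * np.pi * fc * t)                       # tone carrier
    g = env * carrier
    return g / np.max(np.abs(g))
```

Because the bandwidth term scales with ERB(fc), higher-frequency kernels decay faster and are broader in frequency, giving the constant-Q behavior described above.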

Gammatone Characteristics
Unlike the STFT, the Gammatone filterbank uses gammatone kernels as its basis.
[Figure: gammatone basis functions and their corresponding frequency responses; frequency axis in Hz]
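The non-uniform spacing of center frequencies can be sketched by spacing filters equally on the ERB-rate scale (using the Glasberg & Moore ERB-rate formulas; the frequency range and filter count here are illustrative):

```python
import numpy as np

def erb_space(f_low, f_high, n_filters):
    """Center frequencies equally spaced on the ERB-rate scale
    (Glasberg & Moore, 1990): dense at low freq, sparse at high freq."""
    def hz_to_erbrate(f):
        return 21.4 * np.log10(4.37 * f / 1000 + 1)
    def erbrate_to_hz(e):
        return (10 ** (e / 21.4) - 1) * 1000 / 4.37
    e = np.linspace(hz_to_erbrate(f_low), hz_to_erbrate(f_high), n_filters)
    return erbrate_to_hz(e)
```

Successive center frequencies are separated by a fixed number of ERBs, so the gaps between them (in Hz) grow with frequency, matching the physiological spacing described on the previous slide.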

What are we missing?
- The Gammatone filterbank has constant-Q bandwidths and logarithmic spacing of center frequencies.
- The gamma envelope also guarantees compact support.
- But the filters are (1) symmetric and (2) linear.
Psychophysical experiments indicate that auditory filter shapes are:
1) Asymmetric: sharper drop-off on the high-frequency side
2) Non-linear: filter shape and gain change with input level (the compressive non-linearity of the cochlea), which is important for hearing in noise and for dynamic range

Gammachirp Characteristics
The Gammachirp filter developed by Irino & Patterson uses a modified version of the Gammatone kernel:
    g(t) = a * t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t + c*ln(t) + phi)
i.e., the gamma envelope and tone carrier with an additional chirp term, c*ln(t).
- The frequency response is asymmetric and can fit the passive filter.
- Level-dependent parameters can fit changes due to the stimulus.
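A minimal sketch of the gammachirp impulse response, which differs from the gammatone only by the c*ln(t) term in the carrier (the value of c below is illustrative; in the actual model it is fit to data and made level-dependent):

```python
import numpy as np

def gammachirp_ir(fc, fs=16000, n=4, b=1.019, c=-2.0, dur=0.05):
    """Gammachirp: gammatone with a c*ln(t) chirp term in the carrier.
    A negative c makes the instantaneous frequency glide, producing the
    asymmetric (sharper high-frequency side) magnitude response."""
    erb_fc = 24.7 * (4.37 * fc / 1000 + 1)            # ERB (Glasberg & Moore)
    t = np.arange(1, int(dur * fs)) / fs              # t > 0, so ln(t) is finite
    env = t ** (n - 1) * np.exp(-2 * np.pi * b * erb_fc * t)  # gamma envelope
    carrier = np.cos(2 * np.pi * fc * t + c * np.log(t))      # chirp term
    g = env * carrier
    return g / np.max(np.abs(g))
```

Setting c = 0 recovers the symmetric gammatone kernel, which makes the role of the chirp term easy to see by comparing the two impulse responses.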

Implementation
- In the frequency domain, the Gammachirp can be obtained by cascading a fixed Gammatone filter with an asymmetric filter.
- To fit psychophysical data, a fixed Gammachirp is cascaded with level-dependent asymmetric IIR filters.

Comparison: Tone vs. Passive Chirp Outputs
- The Gammatone output appears to have better frequency resolution.
- The passive Gammachirp output appears to have better time resolution.

Comparison: Tone vs. Active Chirp Outputs

Incorporating Level Dependency
As illustrated on the previous slide, the passive Gammachirp output offers little advantage on clean speech at fixed stimulus levels. We can incorporate parameter control via feedback:
1. Compute the passive Gammachirp spectrogram.
2. Segment it into time frames.
3. For each frame, estimate the stimulus level per channel (S1, S2, ..., SN).
4. Filter each channel with the level-specific filter.
5. Reconstruct the frames.
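The feedback loop above can be sketched as follows. Note the hedges: the per-level correction here is a placeholder scalar gain standing in for the model's level-dependent asymmetric IIR filters, and the frame length, level buckets, and dB reference are assumptions made for illustration:

```python
import numpy as np

def level_dependent_pass(channels, frame_len=400, n_levels=4):
    """Sketch of the feedback loop: for each frame of each passive-gammachirp
    channel, estimate the stimulus level, pick a level-specific correction,
    apply it, and write the result back (reconstruction).
    The per-level gains are placeholders, not the model's IIR filters."""
    out = np.zeros_like(channels)
    for ch in range(channels.shape[0]):
        for start in range(0, channels.shape[1] - frame_len + 1, frame_len):
            frame = channels[ch, start:start + frame_len]
            level_db = 10 * np.log10(np.mean(frame ** 2) + 1e-12)  # level/channel
            idx = int(np.clip((level_db + 60) // 20, 0, n_levels - 1))
            gain = 1.0 / (1.0 + 0.2 * idx)     # placeholder compressive gain
            out[ch, start:start + frame_len] = gain * frame        # reconstruct
    return out
```

The key structural point matches the slide: the level estimate is computed from the passive filterbank output itself, so the correction adapts per frame and per channel rather than being fixed in advance.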

Sample Outputs
[Figure: spectrograms for clean speech and for 40 dB, 30 dB, and 20 dB SNR conditions]

References
- Bleeck, S., Patterson, R. D., and Ives, T. (2003). "Auditory Image Model for Matlab," Centre for the Neural Basis of Hearing. http://www.mrc-cbu.cam.ac.uk/cnbh/aimmanual/Introduction/
- Irino, T. and Patterson, R. D. (2001). "A compressive gammachirp auditory filter for both physiological and psychophysical data," J. Acoust. Soc. Am. 109, 2008-2022.
- Pickles, J. O. (1988). An Introduction to the Physiology of Hearing (Academic, London).
- Slaney, M. (1993). "An efficient implementation of the Patterson-Holdsworth auditory filterbank," Apple Computer Technical Report #35.
- Slaney, M. (1998). "Auditory Toolbox for Matlab," Interval Research Technical Report #1998-010. http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/

Sidenote
[Figure: additional outputs for clean speech and for 40 dB and 30 dB SNR conditions]