Second International Conference on Intelligent Interactive Technologies and Multimedia (IITM 2013), 9-11 March 2013, Allahabad, India. Plenary talk, 9 March 2013: Speech Processing for Persons with Moderate Sensorineural Hearing Impairment.

Presentation transcript:

Second International Conference on Intelligent Interactive Technologies and Multimedia (IITM 2013), 9-11 March 2013, Allahabad, India. 9 March 2013. Speech Processing for Persons with Moderate Sensorineural Hearing Impairment. Prem C. Pandey, EE Dept., IIT Bombay. ee.iitb.ac.in

Outline
A. Speech & Hearing
B. Noise Suppression: S. K. Waddi, P. C. Pandey, N. Tiwari, "Speech Enhancement Using Spectral Subtraction and Cascaded Median Based Noise Estimation for Hearing Impaired Listeners" (Proc. NCC 2013, Delhi, Feb. 2013, Paper 3.2_2_)
C. Reducing the Effect of Increased Spectral Masking: N. Tiwari, P. C. Pandey, P. N. Kulkarni, "Real-time Implementation of Multi-band Frequency Compression for Listeners with Moderate Sensorineural Impairment" (Proc. Interspeech 2012, Portland, Oregon, 9-13 Sept. 2012, Paper 689)

Speech Production Mechanism
Excitation source & filter model
Excitation: voiced/unvoiced (glottal, frication)
Filtering: vocal tract filter
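The source-filter model can be sketched in a few lines of code. The snippet below (an illustrative sketch, not from the talk; the formant frequencies and bandwidths are assumed textbook-style values for an /a/-like vowel) synthesizes a voiced sound by passing a glottal impulse train through a cascade of second-order vocal-tract resonators:

```python
import numpy as np

def formant_filter_coeffs(formants, bandwidths, fs):
    """All-pole denominator coefficients: cascade of second-order
    resonators, one per formant."""
    a = np.array([1.0])
    for f, bw in zip(formants, bandwidths):
        r = np.exp(-np.pi * bw / fs)          # pole radius from bandwidth
        theta = 2 * np.pi * f / fs            # pole angle from formant freq.
        a = np.convolve(a, [1.0, -2 * r * np.cos(theta), r * r])
    return a

def synthesize_vowel(f0, formants, bandwidths, fs=8000, dur=0.1):
    """Source-filter synthesis: impulse train at F0 (voiced excitation)
    filtered by the vocal-tract all-pole filter."""
    n = int(fs * dur)
    excitation = np.zeros(n)
    excitation[:: int(fs / f0)] = 1.0         # glottal pulses every 1/F0 s
    a = formant_filter_coeffs(formants, bandwidths, fs)
    y = np.zeros(n)
    for i in range(n):                        # direct-form all-pole recursion
        y[i] = excitation[i] - sum(a[k] * y[i - k]
                                   for k in range(1, len(a)) if i - k >= 0)
    return y

# Assumed illustrative /a/-like formant values
y = synthesize_vowel(f0=100, formants=[730, 1090, 2440],
                     bandwidths=[80, 90, 120])
```

Changing the excitation to white noise instead of an impulse train models unvoiced (frication) excitation with the same filter.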

Speech segments: Words, Syllables, Phonemes, Sub-phonemic segments
Phonemes: basic speech units
Vowels: Pure vowels, Diphthongs
Consonants: Semivowels, Stops, Fricatives, Affricates, Nasals
/aba/ /apa/ /aga/ /ada/

Phonemic features
Modes of excitation
  Glottal: Unvoiced (aspiration, constriction at the glottis); Voiced (vibration of vocal cords)
  Frication: Unvoiced (constriction in vocal tract); Voiced (constriction in vocal tract & glottal vibration)
Movement of articulators
  Continuant (steady-state vocal tract configuration): vowels, nasal stops, fricatives
  Non-continuant (changing vocal tract): diphthongs, semivowels, oral stops (plosives)
Place of articulation (place of maximum constriction in vocal tract): Bilabial, Labio-dental, Linguo-dental, Alveolar, Palatal, Velar, Glottal
Changes in voicing frequency (F0)
Supra-segmental features: Intonation, Rhythm

Hearing Mechanism
Peripheral auditory system
  External ear (sound collection): Pinna, Auditory canal
  Middle ear (impedance matching): Ear drum, Middle ear bones
  Inner ear (analysis and transduction): cochlea
  Auditory nerve (transmission of neural impulses)
Central auditory system: Information processing & interpretation

[Figure slide: Auditory system; tonotopic map of the cochlea]

Hearing impairment
Types of hearing losses: Conductive loss, Sensorineural loss, Central loss, Functional loss
Sensorineural hearing loss
  Elevated hearing thresholds: reduced intelligibility as speech components are inaudible
  Reduced dynamic range & loudness recruitment (abnormal loudness growth): distortion of loudness relationship among speech components
  Increased temporal masking: poor detection of acoustic events
  Increased spectral masking (due to widening of auditory filters): reduced frequency selectivity, reduced ability to sense spectral shapes of speech sounds
>> Poor intelligibility and degraded perception of speech

Signal processing in hearing aids
Currently available
  Frequency selective amplification: improves audibility but may not improve intelligibility in presence of noise
  Automatic volume control
  Multichannel dynamic range compression (settable attack time, release time, and compression ratios): compresses the natural dynamic range into the reduced dynamic range
Under investigation
  Improvement of consonant-to-vowel ratio (CVR): for reducing the effects of increased temporal masking
  Techniques for reducing the effects of increased spectral masking: binaural dichotic presentation, spectral contrast enhancement, multi-band frequency compression
  Noise suppression
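The dynamic range compression mentioned above can be illustrated for a single band. This sketch (assumed parameter values; a real aid applies this per band, with the attack time, release time, and ratio set per listener) applies a static compression curve above a threshold, using a fast-attack, slow-release envelope follower:

```python
import numpy as np

def compress(x, fs, threshold_db=-40.0, ratio=3.0,
             attack_ms=5.0, release_ms=50.0):
    """Single-band dynamic range compression sketch: levels above the
    threshold are reduced by the compression ratio; the envelope follower
    reacts quickly to rising levels (attack) and slowly to falling ones
    (release)."""
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    env = 0.0
    out = np.empty_like(x)
    for i, s in enumerate(x):
        level = abs(s)
        coeff = a_att if level > env else a_rel  # fast attack, slow release
        env = coeff * env + (1.0 - coeff) * level
        level_db = 20.0 * np.log10(max(env, 1e-9))
        if level_db > threshold_db:              # static compression curve
            gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
        else:
            gain_db = 0.0                        # below threshold: unity gain
        out[i] = s * 10.0 ** (gain_db / 20.0)
    return out
```

A loud input is attenuated toward the threshold while a quiet input passes unchanged, which is how the natural dynamic range is squeezed into the listener's reduced dynamic range.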

• Analog Hearing Aids: Pre-amp → AVC → Selectable Freq. Response → Amp.
• Programmable Digital Hearing Aids: Pre-amp → AVC → Multi-band Amplitude Compression & Freq. Response → Amp.
• Major Problems: noisy environment & reverberation; distortions due to multiband amplitude compression; poor speech perception due to increased spectral & temporal masking; visit to audiologist for change of settings
• Proposed Hearing Aids (with user-selectable settings): Pre-amp → AVC → Noise Suppression → Processing for Reducing the Effects of Increased Spectral Masking → Processing for Reducing the Effects of Increased Temporal Masking → Multi-band Amplitude Compression & Freq. Response → Amp.

Our Research Objectives
• Developing techniques for improving speech perception by listeners with moderate-to-severe sensorineural loss
  Reduction of effects of increased spectral masking
    Binaural aids: binaural dichotic presentation using comb filters for spectral splitting
    Monaural aids: multiband frequency compression (reduction of spectral masking)
  Enhancement of transient parts (weak & short but perceptually important)
  Noise suppression
• Implementation of the techniques using a low-power DSP chip for real-time operation and with acceptable signal delay (< 60 ms)
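The idea behind binaural dichotic presentation can be sketched with an idealized FFT-mask version of spectral splitting (the scheme in the talk uses comb filters, typically with critical-band spacing; the uniform bands and mask-based implementation here are assumptions for illustration). Alternate frequency bands are routed to the left and right ears, so that components that would mask each other within one widened auditory filter reach different ears:

```python
import numpy as np

def dichotic_split(frame, fs, n_bands=8):
    """Idealized spectral splitting for dichotic presentation: even bands
    to the left ear, odd bands to the right. The two magnitude responses
    are complementary, so left + right reconstructs the input."""
    n = len(frame)
    spec = np.fft.rfft(frame)
    # Uniform band edges over the rfft bins (assumed; comb filters in the
    # actual scheme use auditory critical bandwidths)
    edges = np.linspace(0, n // 2 + 1, n_bands + 1).astype(int)
    left = np.zeros_like(spec)
    right = np.zeros_like(spec)
    for b in range(n_bands):
        sl = slice(edges[b], edges[b + 1])
        if b % 2 == 0:
            left[sl] = spec[sl]    # even-numbered bands -> left ear
        else:
            right[sl] = spec[sl]   # odd-numbered bands -> right ear
    return np.fft.irfft(left, n), np.fft.irfft(right, n)
```

Because the two masks are complementary, a normal-hearing listener fusing both ears perceives the unprocessed spectrum, while each impaired ear sees spectrally sparser input with less within-filter masking.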


P. C. Pandey (EE Dept., IIT Bombay): "Speech Processing for Persons with Moderate Sensorineural Hearing Impairment", Plenary talk, Second International Conference on Intelligent Interactive Technologies and Multimedia (IITM 2013), 9-11 March 2013, Allahabad, India

Abstract
Our objective is to develop techniques for improving speech perception by listeners with moderate-to-severe sensorineural loss and to implement these techniques using a low-power DSP chip for real-time operation and with acceptable signal delay (< 60 ms). Here we present two techniques to reduce the adverse effects of increased spectral masking associated with sensorineural loss. The first technique reduces the effects of noise in the listening environment and the second one reduces the effects of increased intra-speech spectral masking.

A spectral subtraction technique is presented for real-time speech enhancement in the aids used by hearing-impaired listeners. For reducing computational complexity and memory requirement, it uses a cascaded-median based estimation of the noise spectrum without voice activity detection. The technique is implemented and tested for satisfactory real-time operation, with a sampling frequency of 12 kHz, processing using a window length of 30 ms with 50% overlap, and noise estimation by a 3-frame 4-stage cascaded median, on a 16-bit fixed-point DSP processor with on-chip FFT hardware. Enhancement of speech with different types of additive stationary and non-stationary noise resulted in an SNR advantage of 4 - 13 dB.

Widening of auditory filters in persons with sensorineural hearing impairment leads to increased spectral masking and degraded speech perception. Multi-band frequency compression of the complex spectral samples using pitch-synchronous processing has been reported to increase speech perception by persons with moderate sensorineural loss. It is shown that implementation of multi-band frequency compression using fixed-frame processing along with least-squares error based signal estimation reduces the processing delay, and the speech output is perceptually similar to that from pitch-synchronous processing. The processing is implemented on a 16-bit fixed-point DSP processor and real-time operation is achieved using about one-tenth of its computing capacity.
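The cascaded-median noise estimation and spectral subtraction described in the abstract can be sketched as follows (a simplified illustration, not the reported DSP implementation; alpha and beta values are assumed). Each median stage buffers three outputs of the previous stage, so four 3-frame stages approximate a median over 81 frames while storing only a few frames per stage, and track the noise floor without voice activity detection:

```python
import numpy as np

class CascadedMedianNoiseEstimator:
    """3-frame, multi-stage cascaded median: stage k takes the median of
    the last 3 outputs of stage k-1. Four stages ~ median over 3**4 = 81
    frames, with only 3 frames of storage per stage."""
    def __init__(self, n_bins, n_stages=4, frames_per_stage=3):
        self.bufs = [[] for _ in range(n_stages)]
        self.frames_per_stage = frames_per_stage
        self.estimate = np.zeros(n_bins)

    def update(self, mag):
        x = mag
        for buf in self.bufs:
            buf.append(x)
            if len(buf) < self.frames_per_stage:
                return self.estimate          # stage not full yet
            x = np.median(np.stack(buf), axis=0)
            buf.clear()                       # pass median to next stage
        self.estimate = x                     # all stages produced output
        return self.estimate

def spectral_subtract(frame, noise_mag, alpha=2.0, beta=0.01):
    """Magnitude spectral subtraction with over-subtraction factor alpha
    and spectral floor beta (assumed illustrative values); the noisy
    phase is reused for resynthesis."""
    spec = np.fft.rfft(frame)
    mag = np.abs(spec)
    clean = np.maximum(mag - alpha * noise_mag, beta * mag)
    return np.fft.irfft(clean * np.exp(1j * np.angle(spec)), len(frame))
```

In a frame-based enhancer, each windowed frame's magnitude spectrum is first fed to `update` to refresh the noise estimate, then cleaned with `spectral_subtract` and overlap-added back to a time signal.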