
A. R. Jayan, P. C. Pandey, EE Dept., IIT Bombay

Abstract
Perception of speech under adverse listening conditions may be improved by processing it to incorporate properties of clear speech. This requires automated detection of stop landmarks and enhancement of burst and transition segments. A technique for accurate detection of stop landmarks in continuous speech, based on parameters derived from Gaussian mixture modeling (GMM) of the log magnitude spectrum, is presented. Applying the technique to sentences from the TIMIT database resulted in burst detection rates of 98, 97, 95, 90, and 73 % at temporal accuracies of 30, 20, 15, 10, and 5 ms, respectively.

1. INTRODUCTION
Acoustic landmarks: regions with a concentration of phonetic information, important for speech perception.
Stop landmarks:
▪ Closure
▪ Release burst
▪ Onset of voicing
Figure: /apa/ with the closure, release burst, and onset of voicing marked (▲).

Problems in Stop Perception
▪ Perception of low-intensity transient sounds is severely affected by noise and by hearing impairment.
Clear Speech
▪ A speaking style adopted by speakers under noisy conditions (~17 % more intelligible than conversational speech)
▪ Acoustic landmarks are modified in duration and intensity
Figure: 'the book tells a story' spoken in conversational (◄) and clear (▼) styles.

Speech Intelligibility Enhancement Using Properties of Clear Speech
▪ Automated detection of landmarks with
    ▪ good temporal accuracy
    ▪ high detection rate and few false detections
▪ Modification of speech characteristics around the stop landmarks

Some Earlier Landmark Detection Techniques
▪ Liu (1996): rate-of-rise measures of parameters from a set of fixed spectral bands. Detection rate: 84 % at ms, ~50 % at 5-10 ms.
▪ Niyogi & Sondhi (2002): optimal filtering approach using log energy, log energy in the band above 3 kHz, and Wiener entropy. Detection rate about 90 % at 20 ms.

Objective
Detection of stop landmarks using Gaussian mixture modeling (GMM) of the speech spectrum
▪ to improve the temporal accuracy of detection and reduce insertion errors
▪ with adaptation to speech variability
▪ to enable enhancement of burst and transition segments for improving speech intelligibility under adverse listening conditions

2. GAUSSIAN MIXTURE MODELING OF SHORT-TIME SPEECH SPECTRUM
Approximation of the spectrum by a weighted sum of Gaussian functions, parameterized by their
▪ means
▪ variances
▪ mixture weights
Properties:
▪ Good spectral approximation with 4 or 5 Gaussians (approximating the spectral resonances)
▪ Adaptive to speech variability
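Written out for frame n, a mixture of the kind described above can be sketched in standard notation as follows. The symbols w_g and G and the assumption that the (shifted) log magnitude spectrum is normalized to unit area are ours; the later figures label the parameter tracks A_g(n), μ_g(n), σ_g(n), and the slides do not state exactly how A_g relates to w_g.

```latex
\hat{S}_n(f) = \sum_{g=1}^{G} w_g(n)\,
  \frac{1}{\sqrt{2\pi}\,\sigma_g(n)}
  \exp\!\left(-\frac{\bigl(f-\mu_g(n)\bigr)^2}{2\sigma_g^2(n)}\right),
\qquad \sum_{g=1}^{G} w_g(n) = 1, \quad G \in \{4, 5\}
```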

Spectral Modeling
▪ Short-time log magnitude spectrum of the speech signal (sampling rate = 10 kHz)
▪ 6 ms Hanning-windowed frames (to suppress the harmonic structure)
▪ 1 frame per ms (to track abrupt variations)
▪ 512-point DFT
▪ Estimation of GMM parameters using the expectation-maximization (EM) algorithm
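The framing parameters above translate directly into sample counts at 10 kHz: a 6 ms window is 60 samples and a 1 ms hop is 10 samples, zero-padded to a 512-point DFT (257 non-negative frequency bins). The sketch below computes the short-time log magnitude spectra with a plain DFT so that it needs only the standard library; the function names and the dB floor are illustrative assumptions, not the authors' code.

```python
import cmath
import math

FS = 10_000            # sampling rate in Hz (from the slide)
WIN = FS * 6 // 1000   # 6 ms Hann(ing) window -> 60 samples
HOP = FS // 1000       # 1 frame per ms -> 10-sample hop
NFFT = 512             # 512-point DFT

def hann(n):
    """Symmetric Hann(ing) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def log_mag_spectrum(frame, nfft=NFFT, floor=1e-10):
    """Log magnitude (dB) of the zero-padded DFT for bins 0..nfft/2."""
    spec = []
    for k in range(nfft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / nfft)
                  for i, x in enumerate(frame))
        spec.append(20 * math.log10(max(abs(acc), floor)))  # floor avoids log(0)
    return spec

def short_time_log_spectra(signal):
    """One log spectrum per 1 ms hop, each from a 6 ms Hann-windowed frame."""
    w = hann(WIN)
    return [log_mag_spectrum([s * wi for s, wi in zip(signal[start:start + WIN], w)])
            for start in range(0, len(signal) - WIN + 1, HOP)]
```

For a 1 kHz tone the spectral peak falls near bin 1000 / 10000 × 512 ≈ 51. A 6 ms window is deliberately shorter than a pitch period for most voices, so the harmonics are not resolved and only the envelope (resonances) remains for the GMM to fit.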

Estimation of GMM Parameters
▪ Spectrum treated as a histogram, with rectangular bins placed at each frequency index
▪ Iterative computation of the parameters as maximum-likelihood estimates
▪ Initialization
    ▪ Means: average formant frequencies [600, 1200, 2400, 3600 Hz]
    ▪ Variances: extreme formant bandwidths [160, 200, 300, 400 Hz]
    ▪ Mixture weights: equal for all Gaussians
▪ Number of iterations: ≤ 12
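Treating the spectrum as a histogram means each frequency bin acts like a data point repeated in proportion to its spectral height, so the usual EM updates become height-weighted sums. The sketch below follows the listed initialization and iteration limit. Two points are assumptions: the bandwidths are used as standard deviations in Hz, and the dB spectrum is shifted by its minimum so that all heights are non-negative (the hypothetical `fit_log_spectrum` wrapper).

```python
import math

# Initialization from the slide; using the bandwidths as standard
# deviations (Hz) is an assumption.
INIT_MEANS = [600.0, 1200.0, 2400.0, 3600.0]
INIT_SIGMAS = [160.0, 200.0, 300.0, 400.0]

def em_gmm_histogram(freqs, heights, means=INIT_MEANS, sigmas=INIT_SIGMAS,
                     n_iter=12, min_sigma=1.0):
    """Fit a 1-D Gaussian mixture to a histogram by EM.

    freqs[k] is the centre (Hz) of bin k and heights[k] >= 0 its height.
    Returns (weights, means, sigmas).
    """
    G = len(means)
    mu, sd = list(means), list(sigmas)
    w = [1.0 / G] * G                      # equal initial mixture weights
    total = sum(heights)
    for _ in range(n_iter):
        # E-step: posterior probability of each Gaussian for each bin
        # (the common 1/sqrt(2*pi) factor cancels and is omitted).
        resp = []
        for f in freqs:
            p = [w[g] / sd[g] * math.exp(-0.5 * ((f - mu[g]) / sd[g]) ** 2)
                 for g in range(G)]
            s = sum(p)
            resp.append([pg / s if s > 0 else 0.0 for pg in p])
        # M-step: ML updates, with bin k counted heights[k] times.
        for g in range(G):
            n_g = sum(h * r[g] for h, r in zip(heights, resp))
            if n_g <= 0:
                continue
            mu[g] = sum(h * r[g] * f for h, r, f in zip(heights, resp, freqs)) / n_g
            var = sum(h * r[g] * (f - mu[g]) ** 2
                      for h, r, f in zip(heights, resp, freqs)) / n_g
            sd[g] = max(math.sqrt(var), min_sigma)
            w[g] = n_g / total
    return w, mu, sd

def fit_log_spectrum(log_spec_db, fs=10_000):
    """Hypothetical wrapper: shift a dB spectrum (bins 0..fs/2) so it is
    non-negative, use it as histogram heights, and run EM."""
    n = len(log_spec_db)
    freqs = [k * (fs / 2) / (n - 1) for k in range(n)]
    floor = min(log_spec_db)
    return em_gmm_histogram(freqs, [v - floor for v in log_spec_db])
```

Starting from average formant positions keeps each Gaussian near a plausible resonance, which is why a dozen iterations is usually enough for a smooth spectrum.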

Example: Modeling for a segment of vowel /a/
Figure: modeling of a segment of vowel /a/: (a) 6 ms windowed segment, (b) log magnitude spectrum (in dB), (c) smoothed spectrum (in dB), (d) GMM-approximated spectrum, with dotted lines indicating the individual Gaussian components.

3. DETECTION OF STOP LANDMARKS
Detection based on
▪ rate of change (ROC) of GMM parameters
▪ a voicing onset/offset detector
▪ spectral flatness measure (SFM)


GMM Rate of Change
▪ Parameter tracks A_g, μ_g, σ_g smoothed by a 30-point median filter
▪ ROC: first difference (time step = 2 ms)
▪ ROC peak → possible location of burst onset
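At 1 frame per ms, the 30-point median filter spans 30 ms and the 2 ms first difference is a difference over 2 frames. The sketch below applies both to parameter tracks and picks ROC peaks. How the ROCs of the different parameters are combined is not stated on the slide; `combined_roc` (sum of |ROC| with each track scaled by its own largest |ROC|) is a hypothetical choice, as is the peak threshold.

```python
import statistics

def median_filter(x, width=30):
    """Running median over a centred window of `width` frames (clipped at the ends)."""
    half = width // 2
    return [statistics.median(x[max(0, i - half): i + width - half])
            for i in range(len(x))]

def rate_of_change(x, step=2):
    """First difference x(n) - x(n - step); step = 2 frames = 2 ms at 1 frame/ms."""
    return [0.0] * step + [x[i] - x[i - step] for i in range(step, len(x))]

def combined_roc(tracks, width=30, step=2):
    """Hypothetical combination: sum over tracks of |ROC| of the median-filtered
    track, each scaled by its own largest |ROC| so Hz and amplitude tracks mix."""
    total = None
    for tr in tracks:
        roc = [abs(v) for v in rate_of_change(median_filter(tr, width), step)]
        peak = max(roc) or 1.0
        roc = [v / peak for v in roc]
        total = roc if total is None else [a + b for a, b in zip(total, roc)]
    return total

def peak_indices(y, threshold):
    """Local maxima of y at or above threshold (first index of a flat top)."""
    return [i for i in range(1, len(y) - 1)
            if y[i] >= threshold and y[i] > y[i - 1] and y[i] >= y[i + 1]]
```

The median filter removes isolated outliers in the EM estimates while preserving genuine step changes, which is what makes the first difference peak sharply at an abrupt spectral change such as a burst.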

Voicing Onset-Offset Detection [Liu, 1996]
▪ Energy variations E(n) in the 0-400 Hz band (6 ms Hanning-windowed segments, every 1 ms)
▪ Rate-of-rise r_e(n) with a 26 ms time step
▪ Voicing onset [+g]: r_e(n) ≥ +9 dB; voicing offset [-g]: r_e(n) ≤ -9 dB
Spectral Flatness Measure [Skowronski & Harris, 2006] (20 ms Hanning-windowed segments, every 1 ms)
▪ Fricative segments: SFM ≈ 1
▪ Voiced segments: SFM ≈ 0
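Both detectors reduce to short computations once the per-frame quantities exist. The sketch below assumes the 0-400 Hz band energy E(n) (in dB, one value per ms) and a per-frame power spectrum are already computed elsewhere. It forms r_e(n) = E(n) - E(n - 26) and reports one [+g] or [-g] landmark per peak beyond ±9 dB. SFM is taken as the usual ratio of geometric to arithmetic mean of the power spectrum. The function names and the "first index of a flat peak" rule are illustrative assumptions.

```python
import math

def rate_of_rise(energy_db, step=26):
    """r_e(n) = E(n) - E(n - step) in dB; step = 26 frames = 26 ms at 1 frame/ms."""
    return [0.0] * step + [energy_db[n] - energy_db[n - step]
                           for n in range(step, len(energy_db))]

def first_peaks(y, threshold):
    """Local maxima of y at or above threshold; on a flat top only the
    first index is reported."""
    out = []
    for n in range(1, len(y)):
        nxt = y[n + 1] if n + 1 < len(y) else float("-inf")
        if y[n] >= threshold and y[n] > y[n - 1] and y[n] >= nxt:
            out.append(n)
    return out

def voicing_landmarks(energy_db, threshold_db=9.0, step=26):
    """Sorted (frame, '+g' | '-g') pairs from the band energy track E(n) in dB."""
    r = rate_of_rise(energy_db, step)
    plus = [(n, "+g") for n in first_peaks(r, threshold_db)]
    minus = [(n, "-g") for n in first_peaks([-v for v in r], threshold_db)]
    return sorted(plus + minus)

def spectral_flatness(power_spectrum, floor=1e-12):
    """SFM = geometric mean / arithmetic mean of the power spectrum (0 < SFM <= 1)."""
    p = [max(v, floor) for v in power_spectrum]
    geo = math.exp(sum(math.log(v) for v in p) / len(p))
    return geo / (sum(p) / len(p))
```

A noise-like (fricative or burst) spectrum is nearly flat, so its geometric and arithmetic means are close and SFM approaches 1; a voiced spectrum with strong low-frequency resonances drives the geometric mean, and hence SFM, toward 0.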

Stop Landmark Detection
▪ For a voicing onset [+g] or voicing offset [-g] at t, locate the preceding [+g] or [-g]. If it is a [-g] at t_0, select the GMM ROC peak at t_b during (t_0 - 50, t) ms; otherwise, select the GMM ROC peak at t_b during (t - 50, t) ms as the burst candidate.
▪ A burst is declared if {SFM > 0.5 for 1 ms during (t_b - 15, t_b + 15) ms} and {each of the normalized amplitudes A_2, A_3, A_4 < 0.5 for at least 10 ms during (t_0, t_b)}.
▪ For a burst at t_b, the closure is located at t_0 and the voicing onset at t.
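The decision rule above can be sketched as a loop over the voicing landmarks, with all tracks sampled at 1 frame per ms. Several details are not fixed by the slide and are assumptions here, marked in the comments: the strongest ROC peak in the search window is chosen, t_0 is taken as the time of the preceding landmark whatever its sign, and "for at least 10 ms" is read as 10 consecutive frames. The input names are hypothetical.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for f in flags:
        run = run + 1 if f else 0
        best = max(best, run)
    return best

def detect_stops(landmarks, roc, roc_peaks, sfm, amps, sfm_thr=0.5, amp_thr=0.5):
    """Return sorted (t0, tb, t) triples in ms (1 frame = 1 ms).

    landmarks: sorted (time_ms, '+g' | '-g') voicing landmarks
    roc, roc_peaks: combined GMM ROC track and indices of its peaks
    sfm: SFM track; amps: {2: A_2, 3: A_3, 4: A_4} normalized amplitude tracks
    """
    stops = set()
    for i in range(1, len(landmarks)):
        t = landmarks[i][0]
        t0, prev_sign = landmarks[i - 1]   # assumption: t0 = preceding landmark
        lo = t0 - 50 if prev_sign == "-g" else t - 50
        cands = [p for p in roc_peaks if lo < p < t]
        if not cands:
            continue
        tb = max(cands, key=lambda p: roc[p])   # assumption: strongest ROC peak
        # SFM > 0.5 in at least one 1-ms frame inside the open interval (tb-15, tb+15)
        sfm_ok = any(v > sfm_thr for v in sfm[max(0, tb - 14): tb + 15])
        # each of A_2, A_3, A_4 < 0.5 for >= 10 consecutive frames inside (t0, tb)
        amp_ok = all(longest_run([v < amp_thr for v in amps[g][t0 + 1: tb]]) >= 10
                     for g in (2, 3, 4))
        if sfm_ok and amp_ok:
            stops.add((t0, tb, t))
    return sorted(stops)
```

The SFM test checks that the candidate is noise-like (a burst rather than a vowel transition), and the amplitude test checks for a low-energy closure interval before it, which together suppress many of the insertions a pure ROC detector would produce.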

Figure: /apa/: (a) waveform; (b)-(e) parameter tracks A_g(n), μ_g(n), σ_g(n) of the 1st, 2nd, 3rd, and 4th Gaussians.

Figure: /apa/: (a) waveform, (b) spectrogram, (c) GMM spectrogram, (d) Gaussian ROC.

Figure: /apa/: (a) waveform, (b) -g and +g peaks, (c) SFM, (d) GMM ROC, (e) normalized amplitudes A_2, A_3, A_4 of Gaussians 2, 3, and 4; markers show the closure t_0, the ROC peak (burst) t_b, and the voicing onset t.

4. TEST RESULTS
Comparison with manually labeled landmarks
▪ VCV utterances
    ▪ Stops /b/, /d/, /g/, /p/, /t/, /k/ and vowels /a/, /i/, /u/
    ▪ 10 speakers (5 F, 5 M)
▪ TIMIT sentences
    ▪ 50 sentences
    ▪ 5 speakers (3 F, 2 M)

Detection Rates for VCV Utterances

Detection Rates for TIMIT Sentences
Insertions: 13 % (clicks, glottal stops: 8 %; vowel-semivowel: 4 %; stop to /l/, /r/: 1 %)
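Detection rates "at a temporal accuracy of T ms" can be read as the fraction of manually labeled landmarks that have a detected landmark within ±T ms. The slides do not give the exact scoring procedure. The sketch below is a hypothetical scorer: greedy one-to-one matching, an inclusive tolerance, and insertions counted as unmatched detections relative to the number of reference landmarks.

```python
def score(reference_ms, detected_ms, tolerance_ms):
    """Hypothetical scorer: greedy one-to-one matching within +/- tolerance_ms.

    Returns (detection_rate, insertion_rate), both as fractions of
    len(reference_ms).
    """
    unused = sorted(detected_ms)
    hits = 0
    for r in sorted(reference_ms):
        best = min(unused, key=lambda d: abs(d - r), default=None)
        if best is not None and abs(best - r) <= tolerance_ms:
            unused.remove(best)   # each detection may match only one reference
            hits += 1
    n = len(reference_ms)
    return hits / n, len(unused) / n
```

For example, with references at 100, 300, 500 ms and detections at 104, 290, 520, 700 ms, the detection rate rises from 1/3 at 5 ms to 2/3 at 10 ms and 1 at 30 ms. This mirrors how the abstract reports rates that grow as the temporal accuracy requirement is relaxed.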

5. CONCLUSION
The detection rate obtained with the GMM-based technique is comparable to that of other methods at ms temporal accuracy, and better at ms.