In-car Speech Recognition Using Distributed Microphones. Tetsuya Shinde, Kazuya Takeda, Fumitada Itakura. Center for Integrated Acoustic Information Research.


In-car Speech Recognition Using Distributed Microphones
Tetsuya Shinde, Kazuya Takeda, Fumitada Itakura
Center for Integrated Acoustic Information Research, Nagoya University

Background
In-car speech recognition using multiple microphones
– Since the positions of the speaker and the noise sources are not fixed, many sophisticated algorithms are difficult to apply.
– A robust criterion for parameter optimization is necessary.
Multiple Regression of Log Spectra (MRLS)
– Minimize the log spectral distance between the reference speech and the multiple-regression estimate computed from the signals captured by the distributed microphones.
Filter parameter optimization for microphone arrays (M. L. Seltzer, 2002)
– Maximize the likelihood of a reference utterance by adjusting the filter parameters of a microphone-array system.

Sample utterances
[Audio examples: idling / city area / expressway]

Block diagram of MRLS
[Diagram: each distant-microphone signal passes through spectrum analysis; the resulting log MFB outputs are combined by multiple regression (MR) using the regression weights to approximate the log MFB output of the reference speech signal, which is then passed to the speech recognizer.]

Modified spectral subtraction
Let S be the clean speech, N the noise, X_1, ..., X_i, ..., X_N the signals captured at the microphone positions, and H_i and G_i the transfer functions from the speech and noise sources to microphone i. Assume that the power spectrum at each microphone position obeys the power-sum rule:
|X_i|^2 ≈ |H_i|^2 |S|^2 + |G_i|^2 |N|^2
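As a rough illustration of the power-sum rule, the following sketch (variable names and path gains are assumptions, not the authors' exact formulation) recovers a speech power estimate by subtracting the noise-path term from the observed power and undoing the speech-path gain:

```python
import numpy as np

# Illustrative spectral subtraction under the power-sum rule
# |X_i|^2 ≈ |H_i|^2 |S|^2 + |G_i|^2 |N|^2 (assumed from the slide).
rng = np.random.default_rng(0)
n_bins = 64
S2 = rng.uniform(1.0, 4.0, n_bins)   # clean speech power spectrum |S|^2
N2 = rng.uniform(0.5, 1.0, n_bins)   # noise power spectrum |N|^2
H2 = np.full(n_bins, 0.8)            # |H_i|^2, speech-path power gain
G2 = np.full(n_bins, 0.3)            # |G_i|^2, noise-path power gain

X2 = H2 * S2 + G2 * N2               # observed power at microphone i

# Subtract the noise term, floor to keep the estimate non-negative,
# then compensate for the speech-path gain.
S2_hat = np.maximum(X2 - G2 * N2, 1e-10) / H2
```

In practice N2, H2, and G2 would themselves be estimated from the data; here they are fixed so the subtraction is exact.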

Taylor expansion of log spectrum
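The slide's equation did not survive extraction; a plausible sketch of the expansion it refers to (notation assumed, not the authors' exact derivation) shows that the log of a power sum is locally a convex combination of the component log powers, which motivates linear regression in the log-spectral domain:

```latex
% Let u = \log a and v = \log b for power components a, b > 0.
\log(a + b) = \log\!\left(e^{u} + e^{v}\right)
% First-order Taylor expansion around an operating point (u_0, v_0):
\log\!\left(e^{u} + e^{v}\right)
  \approx \log\!\left(e^{u_0} + e^{v_0}\right)
  + \alpha\,(u - u_0) + (1 - \alpha)\,(v - v_0),
\qquad
\alpha = \frac{e^{u_0}}{e^{u_0} + e^{v_0}} \in (0, 1).
```

Since the weights α and 1 − α are non-negative and sum to one, the log power spectrum of a power sum is locally a weighted linear combination of the component log spectra, justifying a multiple-regression model with linear weights.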

Multiple regression of the log spectrum
The regression weights are chosen to minimize the squared error between the reference log spectrum and the weighted sum of the distant-microphone log spectra; the minimum error is attained when the weights satisfy the least-squares normal equations.
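A minimal sketch of estimating such regression weights by least squares (the variable names, shapes, and the single filter-bank channel are assumptions for illustration, not the authors' implementation):

```python
import numpy as np

# log_mfb_mics: (n_frames, n_mics) log mel filter-bank outputs, one column
# per distant microphone (one filter-bank channel for simplicity).
# log_mfb_ref: (n_frames,) log MFB output of the close-talking reference.
rng = np.random.default_rng(0)
n_frames, n_mics = 200, 6
true_w = np.array([0.4, 0.2, 0.1, 0.1, 0.1, 0.1])
log_mfb_mics = rng.normal(size=(n_frames, n_mics))
log_mfb_ref = log_mfb_mics @ true_w + 0.5  # synthetic reference with bias

# Append a constant column so the regression includes a bias term,
# then solve the least-squares normal equations.
X = np.hstack([log_mfb_mics, np.ones((n_frames, 1))])
w, *_ = np.linalg.lstsq(X, log_mfb_ref, rcond=None)

approx = X @ w  # regression estimate of the reference log spectrum
```

Here the synthetic reference is an exact linear combination, so `lstsq` recovers the weights exactly; with real speech the residual is the log spectral distance being minimized.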

Optimal regression weights
Reduction of the degrees of freedom in the optimization.

Experimental Setup for Evaluation
Speech recorded with 6 distributed microphones.
Training data
– Phonetically balanced sentences
– 6,000 sentences while idling
– 2,000 sentences while driving
– 200 speakers
Test data
– 50 isolated-word utterances
– 15 driving conditions: road (idling / city area / expressway) × in-car (normal / fan low / fan high / CD play / window open)
– 18 speakers
[Figure: top and side views of the distributed microphone positions.]

Recognition experiments
HMMs trained on:
– Close-talking: close-talking microphone speech
– Distant-mic.: nearest distant microphone (mic. #6) speech
– MLLR: nearest distant-microphone speech after MLLR adaptation
– MRLS: MRLS outputs obtained with the optimal regression weights for each training utterance
Test utterances:
– Close-talking speech (CLS-TALK)
– Distant-microphone speech (DIST)
– Distant-microphone speech after MLLR adaptation (MLLR)
– MRLS outputs with weights optimized for: each utterance (OPT), each speaker (SPKER), each driving condition (DR), the whole training corpus (ALL)

Performance Comparison (average over 15 different conditions)
[Chart: recognition performance of each method averaged over the 15 driving conditions, with the MRLS results highlighted.]

Clustering the in-car sound environment
The in-car sound environment is clustered using a spectral feature formed by concatenating the distributed-microphone signals.
[Table: clustering results, showing how the four classes map to the in-car conditions (normal / CD / fan low / fan high / window open).]

Adapting weights to the sound environment
Vary the regression weights in accordance with the environment classification results.
This achieves the same performance as speaker- and condition-dependent weights.

Summary
Results
– Log-spectral multiple regression is effective for in-car speech recognition using distributed multiple microphones.
– In particular, when the regression weights are trained for a particular driving condition, very high performance can be obtained.
– Adapting the weights to the driving condition improves performance.
Future work
– Combining MRLS with a microphone array.