Yi-zhang Cai, Jeih-weih Hung 2012/08/17 報告者:汪逸婷 1.

Slides:



Advertisements
Similar presentations
Advances in WP1 Trento Meeting January
Advertisements

1 A Spectral-Temporal Method for Pitch Tracking Stephen A. Zahorian*, Princy Dikshit, Hongbing Hu* Department of Electrical and Computer Engineering Old.
Modulation Spectrum Factorization for Robust Speech Recognition Wen-Yi Chu 1, Jeih-weih Hung 2 and Berlin Chen 1 Presenter : 張庭豪.
Online PLCA for Real-Time Semi-supervised Source Separation Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis 1. EECS Department, Northwestern University.
Advanced Speech Enhancement in Noisy Environments
Pitch Prediction From MFCC Vectors for Speech Reconstruction Xu shao and Ben Milner School of Computing Sciences, University of East Anglia, UK Presented.
Distribution-Based Feature Normalization for Robust Speech Recognition Leveraging Context and Dynamics Cues Yu-Chen Kao and Berlin Chen Presenter : 張庭豪.
An Energy Search Approach to Variable Frame Rate Front-End Processing for Robust ASR Julien Epps and Eric H. C. Choi National ICT Australia Presenter:
An Alternative Approach of Finding Competing Hypotheses for Better Minimum Classification Error Training Mr. Yik-Cheung Tam Dr. Brian Mak.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis 1. EECS Department, Northwestern University 2. Advanced Technology Labs, Adobe Systems Inc. 3. University.
Advances in WP1 Turin Meeting – 9-10 March
Model-Based Fusion of Bone and Air Sensors for Speech Enhancement and Robust Speech Recognition John Hershey, Trausti Kristjansson, Zhengyou Zhang, Alex.
MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: Automatic Speech Recognition & Understanding, ASRU. IEEE Workshop on Author.
Feature vs. Model Based Vocal Tract Length Normalization for a Speech Recognition-based Interactive Toy Jacky CHAU Department of Computer Science and Engineering.
Communications & Multimedia Signal Processing Formant Track Restoration in Train Noisy Speech Qin Yan Communication & Multimedia Signal Processing Group.
Speech Recognition in Noise
HIWIRE Progress Report Trento, January 2007 Presenter: Prof. Alex Potamianos Technical University of Crete Presenter: Prof. Alex Potamianos Technical University.
Advances in WP1 and WP2 Paris Meeting – 11 febr
Normalization of the Speech Modulation Spectra for Robust Speech Recognition Xiong Xiao, Eng Siong Chng, and Haizhou Li Wen-Yi Chu Department of Computer.
SINGLE CHANNEL SPEECH MUSIC SEPARATION USING NONNEGATIVE MATRIXFACTORIZATION AND SPECTRAL MASKS Jain-De,Lee Emad M. GraisHakan Erdogan 17 th International.
HMM-BASED PSEUDO-CLEAN SPEECH SYNTHESIS FOR SPLICE ALGORITHM Jun Du, Yu Hu, Li-Rong Dai, Ren-Hua Wang Wen-Yi Chu Department of Computer Science & Information.
A VOICE ACTIVITY DETECTOR USING THE CHI-SQUARE TEST
INTRODUCTION  Sibilant speech is aperiodic.  the fricatives /s/, / ʃ /, /z/ and / Ʒ / and the affricatives /t ʃ / and /d Ʒ /  we present a sibilant.
-1- ICA Based Blind Adaptive MAI Suppression in DS-CDMA Systems Malay Gupta and Balu Santhanam SPCOM Laboratory Department of E.C.E. The University of.
SPECTRO-TEMPORAL POST-SMOOTHING IN NMF BASED SINGLE-CHANNEL SOURCE SEPARATION Emad M. Grais and Hakan Erdogan Sabanci University, Istanbul, Turkey  Single-channel.
Cepstral Vector Normalization based On Stereo Data for Robust Speech Recognition Presenter: Shih-Hsiang Lin Luis Buera, Eduardo Lleida, Antonio Miguel,
International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.
Reporter: Shih-Hsiang( 士翔 ). Introduction Speech signal carries information from many sources –Not all information is relevant or important for speech.
Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model Mark Skowronski and John Harris Computational Neuro-Engineering.
REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION Javier Ram’ırez, Jos’e C. Segura and J.M. G’orriz Dept. of Signal Theory Networking and Communications.
Reduction of Additive Noise in the Digital Processing of Speech Avner Halevy AMSC 663 Mid Year Progress Report December 2008 Professor Radu Balan 1.
Ekapol Chuangsuwanich and James Glass MIT Computer Science and Artificial Intelligence Laboratory,Cambridge, Massachusetts 02139,USA 2012/07/2 汪逸婷.
Local Non-Negative Matrix Factorization as a Visual Representation Tao Feng, Stan Z. Li, Heung-Yeung Shum, HongJiang Zhang 2002 IEEE Presenter : 張庭豪.
Classification and Ranking Approaches to Discriminative Language Modeling for ASR Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın 報告者:郝柏翰 2013/01/28.
Noise Compensation for Speech Recognition with Arbitrary Additive Noise Ji Ming School of Computer Science Queen’s University Belfast, Belfast BT7 1NN,
LOG-ENERGY DYNAMIC RANGE NORMALIZATON FOR ROBUST SPEECH RECOGNITION Weizhong Zhu and Douglas O’Shaughnessy INRS-EMT, University of Quebec Montreal, Quebec,
Ali Al-Saihati ID# Ghassan Linjawi
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Definitions Random Signal Analysis (Review) Discrete Random Signals Random.
Authors: Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky Temporal envelope compensation for robust phoneme recognition using modulation spectrum.
Feature Vector Selection and Use With Hidden Markov Models to Identify Frequency-Modulated Bioacoustic Signals Amidst Noise T. Scott Brandes IEEE Transactions.
Hsin-Ju Hsieh 謝欣汝, Wen-hsiang Tu 杜文祥, Jeih-weih Hung 洪志偉 暨南國際大學電機工程學系 報告者:汪逸婷 2012/03/20.
USE OF IMPROVED FEATURE VECTORS IN SPECTRAL SUBTRACTION METHOD Emrah Besci, Semih Ergin, M.Bilginer Gülmezoğlu, Atalay Barkana Osmangazi University, Electrical.
Outline Transmitters (Chapters 3 and 4, Source Coding and Modulation) (week 1 and 2) Receivers (Chapter 5) (week 3 and 4) Received Signal Synchronization.
Speaker Identification by Combining MFCC and Phase Information Longbiao Wang (Nagaoka University of Technologyh, Japan) Seiichi Nakagawa (Toyohashi University.
Subproject II: Robustness in Speech Recognition. Members (1/2) Hsiao-Chuan Wang (PI) National Tsing Hua University Jeih-Weih Hung (Co-PI) National Chi.
Voice Activity Detection based on OptimallyWeighted Combination of Multiple Features Yusuke Kida and Tatsuya Kawahara School of Informatics, Kyoto University,
Performance Comparison of Speaker and Emotion Recognition
Subband Feature Statistics Normalization Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition Jeih-weih Hung, Member, IEEE, and.
ICASSP 2006 Robustness Techniques Survey ShihHsiang 2006.
RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.
2D-LDA: A statistical linear discriminant analysis for image matrix
Automatic speech recognition using an echo state network Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering University.
January 2001RESPITE workshop - Martigny Multiband With Contaminated Training Data Results on AURORA 2 TCTS Faculté Polytechnique de Mons Belgium.
Motorola presents in collaboration with CNEL Introduction  Motivation: The limitation of traditional narrowband transmission channel  Advantage: Phone.
Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs C.G. Puntonet and A. Prieto (Eds.): ICA 2004 Presenter.
语音与音频信号处理研究室 Speech and Audio Signal Processing Lab Multiplicative Update of AR gains in Codebook- driven Speech.
A Study on Speaker Adaptation of Continuous Density HMM Parameters By Chin-Hui Lee, Chih-Heng Lin, and Biing-Hwang Juang Presented by: 陳亮宇 1990 ICASSP/IEEE.
Voice Activity Detection Based on Sequential Gaussian Mixture Model Zhan Shen, Jianguo Wei, Wenhuan Lu, Jianwu Dang Tianjin Key Laboratory of Cognitive.
Spectral and Temporal Modulation Features for Phonetic Recognition Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu Department of Electrical.
Speech Enhancement with Binaural Cues Derived from a Priori Codebook
Two-Stage Mel-Warped Wiener Filter SNR-Dependent Waveform Processing
Missing feature theory
Speech / Non-speech Detection
Presented by Chen-Wei Liu
Emad M. Grais Hakan Erdogan
NON-NEGATIVE COMPONENT PARTS OF SOUND FOR CLASSIFICATION Yong-Choon Cho, Seungjin Choi, Sung-Yang Bang Wen-Yi Chu Department of Computer Science &
Presenter: Shih-Hsiang(士翔)
Combination of Feature and Channel Compensation (1/2)
Speech Enhancement Based on Nonparametric Factor Analysis
Presentation transcript:

Yi-zhang Cai, Jeih-weih Hung 2012/08/17 報告者:汪逸婷 1

 Introduction  Background methods  Presented methods with NMF  Experimental Results  Conclusion 2

 The effect of noise in speech signals ◦ This thesis focuses on developing new algorithms to reduce the noise effect in speech recognition (Convolutional Noise) Channel Effect Background Noise (Additive Noise) Noisy Speech Clean Speech 3

 Noise causes a serious mismatch in the modulation spectrum of speech feature streams ◦ We try to normalize the modulation spectra under different SNR cases. 4

 The nonnegative matrix factorization(NMF) is a novel method in processing the modulation spectrum.  By using the following criterion 5

 In general, the two nonnegative matrix W and H is obtained in an iterative manner ◦ With an initial guess of W and H, the following multiplicative updating rule is employed to achieve a local minimum of the cost function and 6

 Conventional NMF for dealing with the modulation spectrum is relatively complicated: ◦ Using an iterative approach to find the best possible encoding vector h(given the basis matrix W is fixed) :  Iteration :  Termination : ◦ Processing the entire modulation frequency band of speech features  Only the first-half low frequency part is important for speech recognition 7

 Two ways to reduce the complexity ◦ Orthogonal projection  Find the orthogonal basis B for the basis matrix W, which can be done off-line  Obtain the new modulation spectrum by projecting the old modulation spectrum onto the column space of B ◦ Updating the low-frequency modulation spectrum  Reducing the computation while keeping the effect of enhancement in modulation spectrum 8

9

10

 Experimental setup ◦ Aurora-2 database  Clean condition trainng  Noise environment: Test set A: subway, babble, car, and exhibition noises Test set B:restaurant, street, airport, and train station noises Test set C: MIRS subway and MIRS street noises SNRs: clean, 20dB, 15dB, 10dB, 5dB, 0dB, -5dB ◦ HMM for each digit: 16 states and 20 Gaussian mixtures per state 11

 The iteration function :  The projection function :  The computational complexity(for a feature sequence) 12

13

 The negative spectra magnitude appeared (averaged for a feature sequence) ◦ The probability of producing negative magnitudes in NMF(p,f) and PCA(p,f) is very small even though they have no nonnegative constraints 14

 The recognition accuracy (%) of each method for different numbers of frequency points forming the low sub-band 15

16

 The proposed two schemes reduce the computational complexity of NMF a lot  Using the orthogonal projection in NMF further improves the recognition accuracy  Normalizing the low sub-band provides very similar results when compared with the full-band normalization 17