Unsupervised learning of speech patterns and phone classification using Convolutive Non-negative Matrix Factorization


2012/07/27 (Fri.)
Wataru Hariya (37-106488), M2 in Chikayama Lab.
T-05-MU

Introduction

It is known that the consonant /s/ has a transit state similar to that of /t/ [1]. A better inference might be possible by considering models with independent transit states. Our goal is to extract speech patterns corresponding to the states in the discrete model and to improve the accuracy of speech recognition.

[Figure labels: Phoneme; Phoneme model; Discrete model with transit states; Observable signal; A transit state; Inference; Reduction; /sa/, /ta/]

Convolutive Non-negative Matrix Factorization

Convolutive Non-negative Matrix Factorization (CNMF) [2] can extract transitional patterns.

$$A \approx \sum_{t=0}^{T-1} X_t \, (Y)^{t\rightarrow} \qquad (1)$$

Here $A$ is the input matrix (spectrogram), $Y$ is the weight matrix, $X_t$ is the pattern matrix at time $t$, and the right-hand side is the approximate matrix.

Operator (example): the operators $(\cdot)^{t\rightarrow}$ and $(\cdot)^{\leftarrow t}$ shift the columns of a matrix by $t$ positions to the right or to the left and fill the vacated columns with zeros. When

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{pmatrix},$$

$$(A)^{1\rightarrow} = \begin{pmatrix} 0 & 1 & 2 & 3 \\ 0 & 5 & 6 & 7 \end{pmatrix}, \qquad (A)^{\leftarrow 2} = \begin{pmatrix} 3 & 4 & 0 & 0 \\ 7 & 8 & 0 & 0 \end{pmatrix}.$$

Decomposition (example, R=2, T=2):

$$A = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{pmatrix}, \quad X_0 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}, \quad X_1 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

so that $A = X_0 \, Y + X_1 \, (Y)^{1\rightarrow}$.

Proposed Objective Function

The Kullback-Leibler divergence with some constraints is used as the objective function in this experiment. Under this function, Y becomes as sparse as possible (ideally, only one speech pattern exists at a time) and the elements of the pattern matrices become smaller.

[Equation (2): the objective function; not preserved in this transcript]

Proposed Max-Pooling for CNMF

A problem is that the elements of the weight matrix become large at the columns where the corresponding pattern occurs (the occurring point) rather than at every column where the pattern exists (the existing points). To address this, our paper proposed a modified version of the max-pooling used in Convolutional Deep Belief Networks [3].

Max-pooling (example):

$$Y = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} \ \text{(occurring points)} \quad \longrightarrow \quad Y_{mp} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{pmatrix} \ \text{(existing points)}$$

Experiments

Phone classification using the extracted patterns is performed with a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel.

Corpus: TIMIT Acoustic-Phonetic Continuous Speech Corpus
Training set: 100 utterances (20 speakers × 5 utterances)
Test set: 100 utterances (20 speakers × 5 utterances)

[Table: CNMF parameters; values not preserved in this transcript]

Results

Classification was successful to some extent, which indicates that speech patterns related to phones could be extracted.

Fig. 1: Accuracy of our method (R: number of patterns)
Tab. 1: Accuracy of other methods
[Annotation on the results: Improvement]

Future Work

Stochastic Gradient Descent (R. G. J. Wijnhoven & P. H. N. de With, 2010) will be applied to our method in order to learn from larger training sets. A method for estimating the parameters R and T (R. J. Weiss & J. P. Bello, 2010) would also be a useful reference.

References

[1] K. Yabu (藪謙一郎) et al., "A speech generator for supporting people with speech disorders: its research approach and design concept" (in Japanese), IEICE Technical Report, 106(613):25-30, 2007.
[2] P. Smaragdis, "Non-negative matrix factor deconvolution; extraction of multiple sound sources from monophonic inputs", in Lecture Notes in Computer Science 3195, Springer, pp. 494-499, 2004.
[3] D. Scherer et al., "Evaluation of pooling operations in convolutional architectures for object recognition", in Proc. of the International Conference on Artificial Neural Networks, pp. 92-101, 2010.
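
The shift operators and the R=2, T=2 decomposition example on the poster can be checked numerically. The following is a minimal NumPy sketch, not the author's implementation: the function names (`shift_right`, `shift_left`, `cnmf_reconstruct`, `spread_occurrences`) are hypothetical, and the matrices are the ones read off the poster's examples. The last function is only an assumption about how the occurring points could be spread over the existing points; it reproduces the poster's max-pooling example, but the modified max-pooling actually proposed is not specified in the transcript.

```python
import numpy as np


def shift_right(m, t):
    """Shift the columns of m right by t positions, zero-filling on the left."""
    out = np.zeros_like(m)
    if t == 0:
        out[:] = m
    elif t < m.shape[1]:
        out[:, t:] = m[:, :-t]
    return out


def shift_left(m, t):
    """Shift the columns of m left by t positions, zero-filling on the right."""
    out = np.zeros_like(m)
    if t == 0:
        out[:] = m
    elif t < m.shape[1]:
        out[:, :-t] = m[:, t:]
    return out


def cnmf_reconstruct(patterns, weights):
    """Approximate matrix: sum over t of X_t @ shift_right(Y, t).

    patterns: list of T arrays, each of shape (M, R)  (X_0 ... X_{T-1})
    weights:  array of shape (R, N)                   (Y)
    """
    return sum(x_t @ shift_right(weights, t) for t, x_t in enumerate(patterns))


def spread_occurrences(weights, T):
    """Assumed pooling step: element-wise max over the T right-shifted copies
    of Y, so each occurring point also marks the following T-1 columns."""
    return np.max([shift_right(weights, t) for t in range(T)], axis=0)


# Matrices from the poster's decomposition example (R=2, T=2).
X0 = np.array([[1, 1], [1, 0], [0, 0], [0, 1]])
X1 = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
Y = np.array([[1, 0, 0, 0, 0],
              [0, 0, 0, 1, 0]])

A_hat = cnmf_reconstruct([X0, X1], Y)  # approximate matrix
Y_mp = spread_occurrences(Y, T=2)      # occurring points -> existing points
```

In this toy case `A_hat` equals the input matrix of the example exactly, because the two patterns never overlap in time. On a real spectrogram the factors would be estimated iteratively and the match would only be approximate.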

