
1 Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
Advisor: Dr. Hsu
Presenter: Chien Shing Chen
Authors: Wei-Hao Lin and Hsin-Hsi Chen
Backward Machine Transliteration by Learning Phonetic Similarity
Presented at the Sixth Conference on Natural Language Learning, Taipei, Taiwan, 2002

2 Outline
Motivation, Objective, Introduction, Grapheme-to-Phoneme Transformation, Similarity Measurement, Learning Phonetic Similarity, Experimental Results, Conclusions, Personal Opinion

3 Motivation
A similarity-based framework to model the task of backward transliteration.
A learning algorithm that automatically acquires phonetic similarities from a corpus.
Backward transliteration maps a transliteration back to the word in its original language, e.g. 本拉登 (Ben-la-deng) => Bin Laden.

4 Objective
Backward machine transliteration by learning phonetic similarity, e.g. 雨果 (Yu-guo) => Hugo.

5 Introduction

6 Introduction
Both names are transcribed in the IPA (International Phonetic Alphabet): Hugo => h j u g oU; 雨果 (Yu-guo) => v k uo.
The two phoneme strings are then compared by similarity measurement.

7 Introduction
CMU Pronouncing Dictionary, version 0.6: ftp://ftp.cs.cmu.edu/project/fgdata/dict
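A pronouncing dictionary of this kind can be loaded into a word-to-phonemes lookup table. The sketch below assumes the plain-text CMUdict layout: a word followed by its phonemes separated by whitespace, comment lines beginning with ";;;", and alternate pronunciations marked like WORD(2). The function names are hypothetical. CMUdict lists ARPAbet symbols with stress digits, so you would still need a separate mapping to the IPA-style symbols used on these slides.

```python
from typing import Dict, List, Optional


def parse_cmudict(text: str) -> Dict[str, List[str]]:
    """Parse CMUdict-style lines 'WORD  PH1 PH2 ...' into {WORD: [PH1, PH2, ...]}."""
    entries: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        if "(" in word and word.endswith(")"):
            continue  # skip alternate pronunciations such as HUGO(2)
        entries.setdefault(word, phones)  # keep the first pronunciation seen
    return entries


def lookup(entries: Dict[str, List[str]], name: str) -> Optional[List[str]]:
    """Return the phonemes for `name`, or None if the dictionary has no entry."""
    return entries.get(name.upper())


# Illustrative entries in CMUdict style (not copied from the real file).
SAMPLE = """;;; sample
HUGO  HH Y UW1 G OW0
ROBINSON  R AA1 B IH0 N S AH0 N
"""

entries = parse_cmudict(SAMPLE)
print(lookup(entries, "Hugo"))   # ['HH', 'Y', 'UW1', 'G', 'OW0']
print(lookup(entries, "Laden"))  # None: the name has no dictionary entry
```

Names without an entry come back as None, so they can be counted or routed to another grapheme-to-phoneme step.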

8 Similarity Measurement: Alignment
The alphabet set consists of the symbols of the two strings S1 and S2 plus '_', which stands for a space.
Spaces can be inserted into S1 and S2 to produce S1' and S2' of equal length; S1' and S2' are then aligned position by position.
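The alignment conditions can be checked mechanically. This sketch uses hypothetical names and represents strings as lists of phoneme symbols, so multi-character phonemes such as oU stay whole. It also assumes the usual convention that a space is never aligned with another space, which the slide does not state.

```python
from typing import List, Sequence

GAP = "_"  # the space symbol '_'


def strip_gaps(aligned: Sequence[str]) -> List[str]:
    """Remove inserted spaces, recovering the original string."""
    return [p for p in aligned if p != GAP]


def is_alignment(s1: Sequence[str], s2: Sequence[str],
                 a1: Sequence[str], a2: Sequence[str]) -> bool:
    """True if a1/a2 are s1/s2 with spaces inserted, of equal length,
    and no position pairs a space with a space (assumed convention)."""
    if len(a1) != len(a2):
        return False
    if any(p == GAP and q == GAP for p, q in zip(a1, a2)):
        return False
    return strip_gaps(a1) == list(s1) and strip_gaps(a2) == list(s2)


s1 = "h j u g oU".split()
s2 = "v k uo".split()
print(is_alignment(s1, s2, s1, ["v", "_", "k", "_", "uo"]))  # True
print(is_alignment(s1, s2, s1, ["v", "k", "uo"]))            # False: lengths differ
```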

9 Similarity Measurement: Score
For the phoneme pair (v k uo, h j u g oU), the alphabet set is {h, j, u, v, g, k, oU, uo, _}.

10 Similarity Measurement: Score
Alphabet set: {h, j, u, v, g, k, oU, uo, _}
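The scoring formula itself is not in this transcript. A common formulation scores an alignment by summing the scoring-matrix entries M(a, b) over its aligned positions, and this fits the linear model on slide 15. The sketch below assumes that formulation. The matrix values are made up for illustration, and treating missing entries as 0 is a hypothetical convenience.

```python
from typing import Dict, Sequence, Tuple

ScoringMatrix = Dict[Tuple[str, str], float]


def alignment_score(a1: Sequence[str], a2: Sequence[str], M: ScoringMatrix) -> float:
    """Sum M[(p, q)] over the aligned positions of S1' and S2' (missing entries count as 0)."""
    assert len(a1) == len(a2), "aligned strings must have equal length"
    return sum(M.get((p, q), 0.0) for p, q in zip(a1, a2))


# Made-up scores for a few pairs from the alphabet {h, j, u, v, g, k, oU, uo, _}.
M: ScoringMatrix = {
    ("h", "v"): 1.0, ("u", "k"): 0.5, ("oU", "uo"): 2.0,
    ("j", "_"): -0.5, ("g", "_"): -0.5,
}
a1 = ["h", "j", "u", "g", "oU"]
a2 = ["v", "_", "k", "_", "uo"]
print(alignment_score(a1, a2, M))  # 2.5
```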

11 Similarity Measurement: Dynamic Programming
Dynamic programming searches over the possible alignments of S1 and S2 to find the optimal one, i.e. the alignment with the highest similarity score under the scoring matrix M. Example: S1 = (h j u g oU), S2 = (v k uo).

12 Dynamic Programming
T is an (n+1) × (m+1) table, where n is the length of S1 and m is the length of S2.
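The recurrence used to fill T is not reproduced in this transcript. The sketch below fills T with the standard global-alignment recurrence (Needleman-Wunsch style), maximizing the summed score, and then traces back one optimal alignment. It is an assumed formulation, not necessarily the paper's exact one. Here `score` is any function returning M(a, b), with '_' for a space. The toy scorer in the example (2 for identical symbols, -1 otherwise) is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

GAP = "_"


def align(s1: Sequence[str], s2: Sequence[str],
          score: Callable[[str, str], float]) -> Tuple[float, List[str], List[str]]:
    """Return (best score, S1', S2') using an (n+1) x (m+1) table T."""
    n, m = len(s1), len(s2)
    # T[i][j] = best score for aligning s1[:i] with s2[:j]
    T = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        T[i][0] = T[i - 1][0] + score(s1[i - 1], GAP)
    for j in range(1, m + 1):
        T[0][j] = T[0][j - 1] + score(GAP, s2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(
                T[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # pair two symbols
                T[i - 1][j] + score(s1[i - 1], GAP),            # space in S2'
                T[i][j - 1] + score(GAP, s2[j - 1]),            # space in S1'
            )
    # Trace back from T[n][m] to recover one optimal alignment.
    a1: List[str] = []
    a2: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and T[i][j] == T[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1]); a2.append(s2[j - 1]); i -= 1; j -= 1
        elif i > 0 and T[i][j] == T[i - 1][j] + score(s1[i - 1], GAP):
            a1.append(s1[i - 1]); a2.append(GAP); i -= 1
        else:
            a1.append(GAP); a2.append(s2[j - 1]); j -= 1
    return T[n][m], a1[::-1], a2[::-1]


def toy_score(p: str, q: str) -> float:
    """Hypothetical scorer: 2 for identical symbols, -1 otherwise (including spaces)."""
    return 2.0 if p == q else -1.0


best, a1, a2 = align("h j u g oU".split(), "h u g oU".split(), toy_score)
print(best)          # 7.0
print(" ".join(a1))  # h j u g oU
print(" ".join(a2))  # h _ u g oU
```

In the learned setting, `toy_score` would be replaced by a lookup into the learned scoring matrix.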

13 Learning Phonetic Similarity
A learning algorithm is developed to remove the effort of assigning the scores in the matrix by hand and to capture subtle phonetic differences.
The following slides first describe how to prepare a training corpus and then present the learning algorithm.

14 Learning Phonetic Similarity
Positive pairs: each original word is matched with its own transliteration.
Negative pairs: an original word is mismatched with the transliteration of a different word.
E_i: original English word; C_i: transliterated Chinese word.
Example corpus: 克林頓 (Clinton), 本拉登 (Bin Laden), 魯賓遜 (Robinson).
A corpus with n pairs yields n positive pairs and n(n-1) negative pairs.
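Generating the two kinds of training pairs from a matched corpus is a nested loop. This sketch uses a hypothetical function name and keeps the pairs as raw (E_i, C_j) name pairs. Presumably each pair would then be converted to phonemes and aligned before training.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (original English word E_i, transliterated Chinese word C_j)


def build_training_pairs(corpus: Sequence[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Return (positive, negative) pairs for a corpus of n matched (E_i, C_i) pairs."""
    positives = list(corpus)  # E_i with its own transliteration C_i: n pairs
    negatives = [
        (e_i, c_j)            # E_i with another word's transliteration C_j
        for i, (e_i, _) in enumerate(corpus)
        for j, (_, c_j) in enumerate(corpus)
        if i != j
    ]                         # n(n-1) pairs
    return positives, negatives


corpus = [("Clinton", "克林頓"), ("Bin Laden", "本拉登"), ("Robinson", "魯賓遜")]
pos, neg = build_training_pairs(corpus)
print(len(pos), len(neg))  # 3 6
```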

15 Learning Algorithm
Each training sample is treated as a linear equation:
$y = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{i,j}\, x_{i,j}$
where m is the size of the phoneme set (m = 9 in the running example), $w_{i,j}$ is the entry in row i and column j of the scoring matrix, $x_{i,j}$ is a binary value indicating whether $w_{i,j}$ is used in the alignment, and y is the similarity score.

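16 Learning Algorithm
The linear equations for the whole corpus can be written conveniently in matrix form, $\mathbf{X}\mathbf{w} = \mathbf{y}$, where R is the number of pairs in the corpus and i stands for the i-th sample pair. Row i of the $R \times m^2$ matrix $\mathbf{X}$ holds the binary values $x_{j,k}$ of that pair, $\mathbf{w}$ holds the $m^2$ scoring-matrix entries $w_{j,k}$, and $\mathbf{y}$ holds the R similarity scores. The scoring-matrix indices are written j, k here because i indexes sample pairs.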
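The rows of X can be built directly from alignments. This sketch assumes that scoring-matrix rows are indexed by the symbol from S1' and columns by the symbol from S2'. The m × m grid is flattened row-major, so the 0-based entry (j, k) sits at position j·m + k. Following slide 15, the values are binary. If the same symbol pair occurred twice in one alignment, a count would be needed for Xw to equal the summed alignment score. All names are hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np

# Phoneme set of the running example (slide 9); m = 9.
ALPHABET = ["h", "j", "u", "v", "g", "k", "oU", "uo", "_"]


def feature_vector(a1: Sequence[str], a2: Sequence[str],
                   alphabet: Sequence[str] = ALPHABET) -> np.ndarray:
    """Binary x: entry (j, k) is 1 if pair (alphabet[j], alphabet[k]) occurs in the alignment."""
    index = {p: i for i, p in enumerate(alphabet)}
    m = len(alphabet)
    x = np.zeros((m, m))
    for p, q in zip(a1, a2):
        x[index[p], index[q]] = 1.0
    return x.ravel()  # row-major: entry (j, k) -> position j * m + k


def design_matrix(alignments: Sequence[Tuple[Sequence[str], Sequence[str]]],
                  alphabet: Sequence[str] = ALPHABET) -> np.ndarray:
    """Stack one feature row per sample pair into the R x m^2 matrix X."""
    return np.vstack([feature_vector(a1, a2, alphabet) for a1, a2 in alignments])


a1 = ["h", "j", "u", "g", "oU"]
a2 = ["v", "_", "k", "_", "uo"]
X = design_matrix([(a1, a2)])
print(X.shape)       # (1, 81)
print(int(X.sum()))  # 5 distinct aligned pairs
```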

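17 Learning Algorithm
The criterion is to minimize the sum-of-squared error, $J(\mathbf{w}) = \lVert \mathbf{X}\mathbf{w} - \mathbf{y} \rVert^2 = \sum_{i=1}^{R} (\mathbf{x}_i^{\top}\mathbf{w} - y_i)^2$, where $\mathbf{x}_i^{\top}$ is row i of $\mathbf{X}$.
The classical solution takes the pseudo-inverse of $\mathbf{X}$, $\mathbf{w} = \mathbf{X}^{+}\mathbf{y}$ (equal to $(\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$ when $\mathbf{X}^{\top}\mathbf{X}$ is nonsingular), to obtain the w that minimizes the SSE.
Alternatively, the Widrow-Hoff rule is adopted to solve for w iteratively.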
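The pseudo-inverse solution takes one line with NumPy. The sketch below applies `numpy.linalg.pinv` to a small made-up system (not data from the paper). It then checks the result against `numpy.linalg.lstsq`, which solves the same least-squares problem.

```python
import numpy as np


def sse(X: np.ndarray, w: np.ndarray, y: np.ndarray) -> float:
    """Sum-of-squared error ||Xw - y||^2."""
    r = X @ w - y
    return float(r @ r)


def fit_pseudo_inverse(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """w = X^+ y, the SSE minimizer (the minimum-norm one if X is rank-deficient)."""
    return np.linalg.pinv(X) @ y


# Made-up, consistent toy system: 4 sample pairs, 3 weights.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true

w = fit_pseudo_inverse(X, y)
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, w_true), np.allclose(w, w_ls))  # True True
```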

18 Learning Algorithm
In the Widrow-Hoff update, k stands for the k-th row of the matrix X and i for the iteration number. The update is controlled by a learning rate and a momentum coefficient, and the learning rate is set empirically.

19 Learning Algorithm
w(i) is updated iteratively until the learned w begins to overfit. The iterations ensure that w converges to a vector satisfying $\mathbf{X}^{\top}(\mathbf{X}\mathbf{w} - \mathbf{y}) = \mathbf{0}$, i.e. the least-squares solution.
Two speed-up techniques are used. First, w(i) is updated immediately after each new training sample instead of after accumulating the errors of all training samples. Second, a momentum term damps the oscillations.
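The update rule itself did not survive in this transcript. The sketch below applies the textbook Widrow-Hoff (LMS) rule one sample at a time, w ← w + η (y_k − w·x_k) x_k, and adds a momentum term equal to α times the previous update. Several choices are assumptions, not the paper's empirically chosen settings: the constant learning rate η, the momentum value α, and the validation-based early stopping used as one way to stop "when w appears to overfit". The function name is hypothetical.

```python
from typing import Optional

import numpy as np


def widrow_hoff_momentum(X: np.ndarray, y: np.ndarray,
                         eta: float = 0.05, alpha: float = 0.5,
                         max_epochs: int = 500,
                         X_val: Optional[np.ndarray] = None,
                         y_val: Optional[np.ndarray] = None,
                         patience: int = 5) -> np.ndarray:
    """Online LMS with momentum; optional early stopping on a validation set."""
    use_val = X_val is not None and y_val is not None
    w = np.zeros(X.shape[1])
    delta = np.zeros_like(w)              # previous update, reused for momentum
    best_w, best_val, bad_epochs = w.copy(), np.inf, 0
    for _ in range(max_epochs):
        for k in range(X.shape[0]):       # k-th row of X: one training pair
            err = y[k] - X[k] @ w         # y_k - w . x_k
            delta = eta * err * X[k] + alpha * delta
            w = w + delta                 # update right after each sample
        if use_val:
            val_sse = float(np.sum((X_val @ w - y_val) ** 2))
            if val_sse < best_val:
                best_w, best_val, bad_epochs = w.copy(), val_sse, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:  # validation error stopped improving
                    break
    return best_w if use_val else w


# Made-up consistent toy system (not data from the paper).
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true

w = widrow_hoff_momentum(X, y)
print(np.allclose(w, w_true, atol=1e-6))  # True
```

On this noise-free system, the iterative rule reaches the same least-squares solution that the pseudo-inverse gives.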

20 Experiments
The corpus consists of 1574 pairs of names, 313 of which have no entry in the pronouncing dictionary.
97 phonemes are used to represent these names: 59 for the Chinese names and 51 for the English names, so 13 phonemes occur in both.
Rank is the position of the correct original word in the list of candidate words sorted by similarity.
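Rank can be computed from a table of similarity scores for the candidate words. This sketch assumes that a higher score means more similar. As a hypothetical tie-breaking choice, it places the correct word ahead of candidates with an equal score. The scores are made up.

```python
from typing import Dict


def rank_of(correct: str, scores: Dict[str, float]) -> int:
    """1-based position of `correct` among candidates sorted by descending score."""
    target = scores[correct]
    # Count candidates strictly more similar than the correct word (ties favour it).
    return 1 + sum(1 for word, s in scores.items() if word != correct and s > target)


# Made-up similarity scores between the transliteration 雨果 and three candidates.
scores = {"Hugo": 3.0, "Hume": 2.0, "Hogan": 1.0}
print(rank_of("Hugo", scores))   # 1
print(rank_of("Hogan", scores))  # 3
```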

21 Experiments

22 Experiments

23 Conclusions
Without any phonological analysis, the learning algorithm acquires phonetic similarities automatically, without human intervention.

24 Personal Opinion
Drawbacks: Obtaining the scoring matrix depends on a few empirically set rules. Are the experimental results tied to the particular test samples?
Application: A different method for computing the similarity between words.
Future work: The Widrow-Hoff rule could be used to estimate these parameters, replacing blind manual intervention. Combining speech recognition with this method could yield a new, more objective method.

