Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Presenter : Chien Shing Chen Author: Wei-Hao.

Presentation transcript:

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology. Advisor: Dr. Hsu. Presenter: Chien Shing Chen. Authors: Wei-Hao Lin and Hsin-Hsi Chen. Backward Machine Transliteration by Learning Phonetic Similarity. Presented at the Sixth Conference on Natural Language Learning, Taipei, Taiwan, 2002.

Intelligent Database Systems Lab N.Y.U.S.T. I. M. Outline: Motivation; Objective; Introduction; Grapheme-to-Phoneme Transformation; Similarity Measurement; Learning Phonetic Similarity; Experimental Result; Conclusions; Personal Opinion.

Motivation: a similarity-based framework to model the task of backward transliteration, and a learning algorithm to automatically acquire phonetic similarities from a corpus. Backward transliteration maps a transliterated word back to its original language, e.g. “本拉登” => Bin Laden.

Objective: backward machine transliteration by learning phonetic similarity, e.g. 雨果 (Yu-guo) => Hugo.

Introduction: both names are written as phoneme strings in the IPA (International Phonetic Alphabet) and then compared by similarity measurement. Hugo => h j u g oU; 雨果 (Yu-guo) => v k uo.

Introduction: CMU pronunciation dictionary, version 0.6: ftp://ftp.cs.cmu.edu/project/fgdata/dict

Similarity Measurement, alignment: let Σ be the alphabet set of two strings S1 and S2, and let Σ' = Σ ∪ {_}, where ‘_’ stands for a space. Spaces can be inserted into S1 and S2 to give S1’ and S2’ of equal length; S1’ and S2’ are then aligned position by position.

Similarity Measurement, score: for the phoneme pair (v k uo, h j u g oU), the alphabet set including the space symbol is {h, j, u, v, g, k, oU, uo, _}.
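A minimal sketch of how an alignment could be scored, assuming (as the later Learning Algorithm slide suggests) that the score of an alignment is the sum of the scoring-matrix entries of its aligned phoneme pairs. The matrix values, the default penalty and the function name are hypothetical and chosen only for illustration.

```python
GAP = "_"  # the space symbol from the alphabet set

def alignment_score(aligned_1, aligned_2, matrix, default=-1.0):
    """Sum the matrix entries of the aligned phoneme pairs.

    `matrix` is a dict keyed by (phoneme_1, phoneme_2); pairs missing
    from it receive `default` (an assumption made for this sketch).
    """
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have equal length")
    return sum(matrix.get(pair, default) for pair in zip(aligned_1, aligned_2))

# Made-up scores, not taken from the paper.
toy_matrix = {("j", "v"): 2.0, ("g", "k"): 3.0, ("oU", "uo"): 4.0,
              ("h", GAP): -1.0, ("u", GAP): -1.0}
hugo = ["h", "j", "u", "g", "oU"]
yu_guo = [GAP, "v", GAP, "k", "uo"]   # spaces inserted to align with Hugo
score = alignment_score(hugo, yu_guo, toy_matrix)
```

Different placements of the spaces give different scores, which is why an optimal alignment has to be searched for.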

Similarity Measurement, score (continued): alphabet set {h, j, u, v, g, k, oU, uo, _}.

Similarity Measurement, dynamic programming: dynamic programming finds the optimal alignment, i.e. the one with the highest similarity under the scoring matrix M, for S1 (h j u g oU) and S2 (v k uo).

Dynamic programming: T is an (n+1) by (m+1) table, where n is the length of S1 and m is the length of S2.
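The table T can be filled with the standard global-alignment recurrence; the sketch below assumes that recurrence (the slide's own formula is not preserved in this transcript) and uses a hypothetical scoring function with made-up values.

```python
GAP = "_"

def best_alignment_score(s1, s2, score):
    """Fill an (n+1) x (m+1) table T; T[n][m] is the best alignment score.

    `score(a, b)` plays the role of the scoring matrix M, with GAP
    standing for an inserted space.
    """
    n, m = len(s1), len(s2)
    T = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # prefix of s1 aligned to spaces only
        T[i][0] = T[i - 1][0] + score(s1[i - 1], GAP)
    for j in range(1, m + 1):          # prefix of s2 aligned to spaces only
        T[0][j] = T[0][j - 1] + score(GAP, s2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(
                T[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # pair two phonemes
                T[i - 1][j] + score(s1[i - 1], GAP),            # space in s2
                T[i][j - 1] + score(GAP, s2[j - 1]),            # space in s1
            )
    return T[n][m]

def toy_score(a, b):
    """Made-up scores for illustration only."""
    if a == GAP or b == GAP:
        return -1.0
    return {("j", "v"): 2.0, ("g", "k"): 3.0, ("oU", "uo"): 4.0}.get((a, b), -2.0)

best = best_alignment_score(["h", "j", "u", "g", "oU"], ["v", "k", "uo"], toy_score)
```

The table has (n+1)(m+1) cells and each cell looks at three neighbours, so the cost is proportional to n times m.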

Learning Phonetic Similarity: develop a learning algorithm that removes the effort of assigning scores in the matrix by hand and that captures subtle differences. The next slides describe how to prepare a training corpus, followed by the learning algorithm.

Learning Phonetic Similarity: positive pairs match each original word with its own transliteration; negative pairs mismatch original words and transliterated words. E_i is the original English word and C_i the transliterated Chinese word. Example corpus: Clinton / 克林頓, Bin Laden / 本拉登, Robinson / 魯賓遜. A corpus with n pairs yields n positive pairs and n(n-1) negative pairs.
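The pair counts can be checked with a short sketch. The function name and the tuple layout are hypothetical; the three name pairs are the ones listed on the slide.

```python
def build_training_pairs(corpus):
    """corpus: list of (english, chinese) matched name pairs."""
    # Each original word paired with its own transliteration.
    positives = [(e, c, True) for e, c in corpus]
    # Each original word paired with every other word's transliteration.
    negatives = [
        (e_i, c_j, False)
        for i, (e_i, _) in enumerate(corpus)
        for j, (_, c_j) in enumerate(corpus)
        if i != j
    ]
    return positives, negatives

corpus = [("Clinton", "克林頓"), ("Bin Laden", "本拉登"), ("Robinson", "魯賓遜")]
positives, negatives = build_training_pairs(corpus)
```

With n = 3 this gives 3 positive and 3 × 2 = 6 negative pairs.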

Learning Algorithm: treat each training sample as a linear equation, where m is the size of the phoneme set (m = 9 in the example), w_{i,j} is the entry in row i and column j of the scoring matrix, x_{i,j} is a binary value indicating the presence of w_{i,j} in the alignment, and y is the similarity score.
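Read literally, these definitions suggest the following form for the equation of one sample. The index ranges on the original slide are not preserved, so the full double sum over rows and columns is an assumption.

```latex
y = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{i,j}\, x_{i,j}
```

Only the entries whose phoneme pair occurs in the alignment have x_{i,j} = 1, so y adds up exactly the scores used by that alignment.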

Learning Algorithm: the linear equations of the corpus can be conveniently represented in matrix form, Xw = y, where R is the number of pairs in the corpus, i stands for the i-th sample pair in the corpus, w holds the entries w_{i,j} of the scoring matrix, X holds the binary values x_{i,j}, and y holds the similarity scores.

Learning Algorithm: the criterion to be minimized is the sum-of-squared error (SSE). The classical solution takes the pseudo-inverse of X to obtain the w that minimizes the SSE; here the Widrow-Hoff rule is adopted to solve for w.
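For reference, the standard least-squares formulation can be written as below. The notation is assumed here rather than taken from the slide: X is the matrix whose i-th row x_i holds the binary indicators of the i-th sample, w is the scoring matrix flattened into a vector, y is the vector of similarity scores, E is the error and X^† is the pseudo-inverse.

```latex
E(w) = \lVert Xw - y \rVert^{2} = \sum_{i=1}^{R} \left( x_i^{\top} w - y_i \right)^{2},
\qquad
w = X^{\dagger} y
```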

Learning Algorithm: in the update rule, k stands for the k-th row of the matrix X and i for the number of iterations; the two remaining parameters are the learning rate and the momentum coefficient, with the setting chosen empirically.

Learning Algorithm: w(i) is updated iteratively until the learned w appears to overfit. The iterations are intended to make w converge to a solution vector. Two speed-up techniques are used: w(i) is updated immediately after each new training sample is encountered, instead of accumulating the errors of all training samples, and momentum is used to damp the oscillations.
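A minimal sketch of a Widrow-Hoff (least-mean-squares) update with per-sample updates and a momentum term. The learning rate, momentum value, fixed epoch count and toy data are assumptions for illustration; the slide instead stops when w appears to overfit and sets its parameters empirically.

```python
def widrow_hoff(X, y, lr=0.05, momentum=0.5, epochs=500):
    """Online LMS: adjust w right after each sample, with momentum."""
    w = [0.0] * len(X[0])
    prev_step = [0.0] * len(w)
    for _ in range(epochs):
        for x_k, y_k in zip(X, y):
            # Error of the current w on the k-th row of X.
            err = y_k - sum(w_j * x_j for w_j, x_j in zip(w, x_k))
            # Gradient step plus a fraction of the previous step (momentum).
            step = [lr * err * x_j + momentum * p
                    for x_j, p in zip(x_k, prev_step)]
            w = [w_j + s for w_j, s in zip(w, step)]
            prev_step = step
    return w

# Toy system with binary indicator rows; its exact solution is w = [2, 3, 1].
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
y = [3.0, 4.0, 5.0]
w = widrow_hoff(X, y)
```

Updating after every sample avoids a full pass over the corpus before each change to w, and the momentum term reuses part of the previous step to smooth the trajectory.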

Experiments: the corpus consists of 1574 pairs of names, of which 313 have no entries in the pronouncing dictionary. 97 phonemes are used to represent these names, of which 59 are used for Chinese names and 51 for English names. Rank is the position of the correct original word in a sorted list of candidate words.
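The rank measure can be sketched as follows, assuming candidates are sorted by descending similarity to the transliterated name. The candidate names and scores are made up for illustration.

```python
def rank_of_correct(candidates, correct, similarity):
    """1-based position of `correct` after sorting by descending similarity.

    Ties keep their original order because Python's sort is stable.
    """
    ordered = sorted(candidates, key=similarity, reverse=True)
    return ordered.index(correct) + 1

# Hypothetical similarity scores of candidate words against 雨果.
toy_scores = {"Hogan": 3.0, "Hugo": 7.0, "Yoko": 5.5}
rank = rank_of_correct(list(toy_scores), "Hugo", toy_scores.get)
```

A rank of 1 means the correct original word is the top candidate; averaging the rank over test names gives one summary of retrieval quality.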

Conclusions: without any phonological analysis, the learning algorithm acquires phonetic similarities without human intervention.

Personal Opinion. Drawback: obtaining the score matrix depends on a few empirical rules; are the experimental results tied to the testing samples? Application: a different method to compute the similarity between words. Future work: the Widrow-Hoff rule may estimate the parameters in place of manual intervention; speech recognition could be combined with this method to produce a new objective method.