
1 Machine Transliteration. Bhargava Reddy, 110050078, B.Tech 4th year UG

2 Contents
1. Fundamental definition of Machine Transliteration
2. History of Machine Transliteration
3. Modelling of the transliteration
4. Bridge Transliteration System
5. Syllabification and the use of CRFs
6. Substring alignment and re-ranking methods
7. Using hybrid models

3 Definition of Machine Transliteration
Conversion of a given name in the source language to a name in the target language such that the target name:
1. Is phonemically equivalent to the source name
2. Conforms to the phonology of the target language
3. Matches the user's intuition of the equivalent of the source name in the target language, considering the culture and orthographic character usage in the target language
Note that each criterion is a distinct requirement in its own right.
Ref: Report of NEWS 2012 Machine Transliteration Shared Task

4 Brief History of the Work Carried Out
Early models for machine transliteration:
1. Grapheme-based transliteration model (ψ_G)
2. Phoneme-based transliteration model (ψ_P)
3. Hybrid transliteration model (ψ_H)
4. Correspondence-based transliteration model (ψ_C)
ψ_G is known as the direct method because it transforms source language graphemes directly into target language graphemes.
ψ_P is called the pivot method because it uses source language phonemes as a pivot when producing target language graphemes.
Ref: A Comparison of Different Machine Transliteration Models, 2006
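As an illustration of the direct (grapheme-based) idea, here is a minimal sketch in which target graphemes are chosen purely from counted source-to-target grapheme alignments. The toy alignment pairs and the greedy decoder are assumptions for illustration, not the models of the 2006 paper.

```python
# Minimal sketch of a grapheme-based (direct) model psi_G: source graphemes are
# mapped straight to target graphemes using counts over aligned pairs.
# The alignment pairs and greedy decoding are toy assumptions.
from collections import Counter, defaultdict

aligned_pairs = [                      # (source grapheme chunk, target grapheme)
    ("ra", "रा"), ("m", "म"), ("si", "सी"), ("ta", "ता"), ("ra", "र"),
]

counts = defaultdict(Counter)
for src, tgt in aligned_pairs:
    counts[src][tgt] += 1

def p_target_given_source(src, tgt):
    """Maximum-likelihood estimate of P(target grapheme | source grapheme)."""
    total = sum(counts[src].values())
    return counts[src][tgt] / total if total else 0.0

def transliterate_greedy(chunks):
    """Map each source chunk to its most frequent target grapheme."""
    return "".join(counts[c].most_common(1)[0][0] for c in chunks if counts[c])

print(p_target_given_source("ra", "रा"))   # 0.5 with the toy counts above
print(transliterate_greedy(["ra", "m"]))   # 'राम' with the toy counts above
```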

5 Hybrid and Correspondence Models
The grapheme-based and phoneme-based models were combined, resulting in ψ_H and ψ_C.
ψ_H directly combines the phoneme-based transliteration probability Pr(ψ_P) and the grapheme-based transliteration probability Pr(ψ_G) using linear interpolation (the dependence between them is not considered).
ψ_C makes use of the correspondence between a source grapheme and a source phoneme when it produces target language graphemes.
Ref: 1. Improving back-transliteration by combining information sources; 2. An English-Korean transliteration model using pronunciation and contextual rules
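A minimal sketch of the linear interpolation in ψ_H, assuming toy candidate probabilities and an interpolation weight that would in practice be tuned on held-out data:

```python
# Sketch of the hybrid model's linear interpolation:
#   Pr_H(T|S) = lam * Pr_P(T|S) + (1 - lam) * Pr_G(T|S)
# The candidate probabilities and lam below are invented for illustration.
def hybrid_score(p_phoneme, p_grapheme, lam=0.6):
    return lam * p_phoneme + (1.0 - lam) * p_grapheme

candidates = {            # target candidate -> (Pr_P, Pr_G), toy values
    "sita":  (0.40, 0.55),
    "seeta": (0.35, 0.25),
}
best = max(candidates, key=lambda t: hybrid_score(*candidates[t]))
print(best)               # 'sita' under these toy probabilities
```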

6 Graphical Representation. Ref: A Comparison of Different Machine Transliteration Models, 2006

7 Modelling the Components
The maximum entropy model (MEM) is a widely used probability model that can incorporate heterogeneous information effectively, and is therefore used in the hybrid model.
Decision-tree learning is used for creating the training set for the models.
Memory-based learning (MBL), also known as "instance-based learning" and "case-based learning", is an example-based learning method, useful for computing φ_(S,P)→T.
Ref: A Comparison of Different Machine Transliteration Models, 2006
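A maximum entropy model over heterogeneous features can be realised as multinomial logistic regression; the sketch below combines grapheme-context and phoneme features to predict a target grapheme. The feature set and toy labels are assumptions, not the feature design of the 2006 paper.

```python
# MaxEnt-style classifier (logistic regression) over mixed grapheme/phoneme
# features for one source unit. Features and labels are toy assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

X_dicts = [
    {"cur_grapheme": "ph", "prev_grapheme": "<s>", "phoneme": "F"},
    {"cur_grapheme": "f",  "prev_grapheme": "<s>", "phoneme": "F"},
    {"cur_grapheme": "ti", "prev_grapheme": "a",   "phoneme": "SH"},
]
y = ["फ", "फ", "श"]        # toy target graphemes

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)
clf = LogisticRegression(max_iter=1000).fit(X, y)

test = vec.transform([{"cur_grapheme": "f", "prev_grapheme": "<s>", "phoneme": "F"}])
print(clf.predict(test))   # likely ['फ'] on this toy data
```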

8 Study of MT through Bridging Languages
Parallel names data is available for a language pair for one of the following three reasons:
1. Politically related languages: Due to the political dominance of English, it is easy to obtain parallel names data between English and most languages.
2. Genealogically related languages: Languages sharing the same origin, which may have significant overlap between their phonemes and graphemes.
3. Demographically related languages: e.g. Hindi and Telugu, which might not share the same origin, but shared culture and demographics lead to similarities.
Ref: Everybody loves a rich cousin: An empirical study of transliteration through bridge languages. NAACL 2010

9 Bridge Transliteration Methodology Ref: Everybody loves a rich cousin: An empirical study of transliteration through bridge languages. NAACL 2010

10 Results for the Bridge System
We must remember that machine transliteration is a lossy conversion, so in the bridge system some information is lost at each step and the accuracy drops.
The results show a drop in accuracy of about 8-9% (ACC-1) and about 1-3% (mean F-score).
The NEWS 2009 dataset was used for training and evaluation.
Ref: Everybody loves a rich cousin: An empirical study of transliteration through bridge languages. NAACL 2010
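The composition through a bridge language can be pictured as chaining two k-best lists and multiplying path probabilities, which is where the loss comes from. The candidate lists, probabilities, and language triple below are invented for illustration, not taken from the paper.

```python
# Sketch of chaining source->bridge and bridge->target k-best lists.
# Pruning to the top k at each step is what makes the pipeline lossy.
from collections import defaultdict

def compose(src_to_bridge, bridge_to_target, k=2):
    """Combine two k-best lists of (candidate, prob) into target scores."""
    scores = defaultdict(float)
    for bridge_cand, p1 in src_to_bridge[:k]:
        for target_cand, p2 in bridge_to_target[bridge_cand][:k]:
            scores[target_cand] += p1 * p2      # sum over bridge paths
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy example: Hindi -> English (bridge) -> Kannada
hi_to_en = [("bhaskar", 0.7), ("bhasker", 0.2)]
en_to_kn = {
    "bhaskar": [("ಭಾಸ್ಕರ್", 0.6), ("ಭಾಸ್ಕರ", 0.3)],
    "bhasker": [("ಭಾಸ್ಕರ್", 0.5), ("ಭಸ್ಕರ್", 0.4)],
}
print(compose(hi_to_en, en_to_kn))              # best candidate first
```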

11 Stepping through an intermediate language. Ref: Everybody loves a rich cousin: An empirical study of transliteration through bridge languages. NAACL 2010

12 Syllabification?
There is no gold-standard syllable segmentation.
Yang et al. (2009) applied an n-gram joint source-channel model and the EM algorithm.
Aramaki and Abekawa (2009) made use of the word alignment tool GIZA++ to obtain a syllable segmentation and alignment corpus from the given training data.
Yang et al. (2010) proposed a joint optimization method to reduce the propagation of alignment errors.
The paper performed syllabification of Chinese words.
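Since there is no gold standard, even a naive vowel-boundary heuristic yields usable chunks for illustration. The rule below is an assumption for demonstration only, not one of the methods cited above.

```python
# Naive syllabification baseline: each chunk is a run of consonants followed
# by a run of vowels (any trailing consonants form the last chunk).
import re

def naive_syllabify(name):
    """E.g. 'bhargava' -> ['bha', 'rga', 'va']."""
    return re.findall(r"[^aeiou]*[aeiou]+|[^aeiou]+$", name.lower())

print(naive_syllabify("bhargava"))   # ['bha', 'rga', 'va']
print(naive_syllabify("reddy"))      # ['re', 'ddy']
```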

13 Forward-Backward Machine Transliteration between English and Chinese based on Combined CRFs
The transliteration is implemented as a two-phase CRF:
The first CRF splits the word into chunks (similar to syllabification).
The second CRF labels which target characters the chunks are transliterated to.
The final transliteration is the sequence of all the target characters.
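A minimal sketch of the first-phase CRF, framing chunking as B/I tagging of letters with the sklearn_crfsuite wrapper. The tiny training set and features are assumptions; the second-phase CRF (chunk to target character) would be trained analogously.

```python
# First-phase CRF sketch: tag each letter B (begins a chunk) or I (inside a
# chunk). Training data and features are toy assumptions.
import sklearn_crfsuite

def letter_features(word, i):
    return {
        "letter": word[i],
        "prev": word[i - 1] if i > 0 else "<s>",
        "next": word[i + 1] if i < len(word) - 1 else "</s>",
    }

train_words = ["tina", "bob"]
train_tags  = [["B", "I", "B", "I"], ["B", "I", "I"]]   # ti-na, bob

X_train = [[letter_features(w, i) for i in range(len(w))] for w in train_words]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X_train, train_tags)

test = "nina"
print(crf.predict([[letter_features(test, i) for i in range(len(test))]]))
```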

14 Using CRF Models for MT
A Hindi to English machine transliteration model using CRFs was proposed by Manikrao, Shantanu and Tushar in the paper "Hindi to English Machine Transliteration of Named Entities using CRFs", International Journal of Computer Applications (0975-8887), June 2012.
The description is shown as follows:

15 Model Flow

16 Results of the proposed CRF model

17 English-Korean Transliteration using Substring Alignment and Re-ranking Methods
Chun-Kai Wu, Yu-Chun Wang and Richard Tzong-Han Tsai described their approach in their paper. It consists of four parts:
1. Pre-processing
2. Letter-to-phoneme alignment
3. DirecTL-p training
4. Re-ranking results (see the sketch below)
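For the last step, re-ranking an n-best list can be sketched as reordering candidates by a weighted combination of the base model score and additional evidence. The extra feature, the weights, and the toy Korean candidates are assumptions, not the paper's actual re-ranking features.

```python
# Generic re-ranking sketch: reorder an n-best list from the base system
# (DirecTL-p in the paper) with a weighted sum of scores. All numbers are toy.
def rerank(nbest, weights=(1.0, 0.5)):
    """nbest: list of (candidate, base_model_score, extra_feature_score)."""
    w_model, w_extra = weights
    rescored = [(cand, w_model * m + w_extra * e) for cand, m, e in nbest]
    return sorted(rescored, key=lambda x: -x[1])

nbest = [
    ("클린턴", -2.1, 0.9),    # toy Korean candidates for "Clinton"
    ("클린톤", -1.8, 0.2),
]
print(rerank(nbest))          # the extra feature promotes the first candidate
```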

18 Hybrid Models
Dong Yang, Paul Dixon, Yi-Cheng Pan, Tasuku Oonishi, Masanobu Nakamura and Sadaoki Furui of the Computer Science Department at the Tokyo Institute of Technology combined the two-step CRF model with a joint source-channel model for machine transliteration.
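One simple way to picture such a combination is rescoring the CRF's n-best candidates with a joint source-channel (JSC) log-probability in a log-linear mix. The scores, the weight alpha, and the Chinese candidates below are toy assumptions, not the paper's exact combination.

```python
# Combine two-step CRF candidates with a joint source-channel model by a
# log-linear mix of their log-scores. All values are toy assumptions.
import math

def combine(crf_nbest, jsc_logprob, alpha=0.7):
    """Score = alpha * CRF log-score + (1 - alpha) * JSC log-probability."""
    combined = {
        cand: alpha * crf_score + (1.0 - alpha) * jsc_logprob[cand]
        for cand, crf_score in crf_nbest
    }
    return max(combined, key=combined.get)

crf_nbest = [("奥巴马", math.log(0.5)), ("欧巴马", math.log(0.3))]   # "Obama"
jsc_logprob = {"奥巴马": math.log(0.2), "欧巴马": math.log(0.6)}
print(combine(crf_nbest, jsc_logprob))   # '奥巴马' with these toy numbers
```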

19 References
Report of NEWS 2012 Machine Transliteration Shared Task (2012). Min Zhang, Haizhou Li, A Kumaran and Ming Liu. ACL 2012.
A Comparison of Different Machine Transliteration Models (2006).
Improving back-transliteration by combining information sources (2004). Bilac, S., & Tanaka, H. In Proceedings of IJCNLP 2004, pp. 542–547.
An English-Korean transliteration model using pronunciation and contextual rules (2002). Oh, J. H., & Choi, K. S. In Proceedings of COLING 2002, pp. 758–764.
Everybody loves a rich cousin: An empirical study of transliteration through bridge languages (2010). Mitesh M. Khapra, A Kumaran, Pushpak Bhattacharyya. NAACL 2010.

20 References
Forward-Backward Machine Transliteration between English and Chinese based on Combined CRFs (2011). Ying Qin, Guohua Chen.
Hindi to English Machine Transliteration of Named Entities using CRFs (2012). Manikrao, Shantanu and Tushar. International Journal of Computer Applications (0975-8887), June 2012.
English-Korean named entity transliteration using substring alignment and re-ranking methods (2012). Chun-Kai Wu, Yu-Chun Wang, and Richard Tzong-Han Tsai. In Proc. Named Entities Workshop at ACL 2012.
Combining a two-step CRF and a joint source channel model for machine transliteration (2009). D. Yang, P. Dixon, Y.-C. Pan, T. Oonishi, M. Nakamura, S. Furui. In NEWS '09, Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration.

