A Phrase-to-Phrase Alignment Method: Integrating Monolingual and Bilingual Information in Sub-sentential Phrase Alignment
Ying Zhang (Joy)
ISL, Carnegie Mellon University
July 08, 2002

Overview
The advantage of phrase-to-phrase alignment
Existing methods
Algorithm
Integrating bilingual information with monolingual information
Experiments and results
Discussion and future work

SMT and sub-sentential alignment
A Statistical Machine Translation (SMT) system is based on the noisy channel model, which decomposes translation into two components:
– Translation Model
– Language Model
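For reference, the standard noisy-channel decomposition (picking the target sentence t for a source sentence s):

    \hat{t} = \arg\max_t P(t \mid s) = \arg\max_t P(s \mid t)\, P(t)

where P(s|t) is the Translation Model and P(t) is the Language Model.
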
SMT and sub-sentential alignment (Cont.)
Through sub-sentential alignment we train the Translation Model (TM)
In our system, the TM contains word-to-word and phrase-to-phrase transducers. E.g.

Why phrases?
Mismatch between languages

Why phrases? (Cont.)
Phrases encapsulate the context of words
– Tense, e.g.:
[Figure: the same example under word-to-word alignment vs. phrase-to-phrase alignment]

Why phrases? (Cont.)
Local reordering
– E.g. relative clauses in Chinese, which still need global reordering; that is our future work

Why phrases? (Cont.)
For languages that need word segmentation, such as Chinese:
– The word segmenter cannot segment a sentence perfectly, due to the incomplete coverage of the word list and segmentation ambiguity
– Previous work (Zhang 2001) tried to identify phrases in the corpus using only monolingual information and to augment the word list with the new phrases found
Precision: hard to decide on phrase boundaries
Prediction: identified phrases may not occur in future test data

Why phrases? (Cont.)
Example of using phrases to mitigate word segmentation failures

Some alignment algorithms
IBM models (Brown 93)
HMM alignment: phrase to phrase (Vogel 96)
Competitive linking: word to word (Melamed 97)
Flow network (Gaussier 98)
Bitext map (Melamed 01)

Algorithm
Given a sentence pair (S,T), S = s_1 ... s_m and T = t_1 ... t_n, where s_i / t_j are src/tgt words.
Given an m*n matrix B, where B(i,j) = co-occurrence(s_i, t_j), computed from the 2x2 contingency counts of s_i and t_j over the corpus:

              t_j    ~t_j
    s_i        a       b
    ~s_i       c       d

    N = a + b + c + d
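The association measure itself is not specified on the slide; as a sketch in Python, the chi-square score over these counts is one common choice (the function below is hypothetical, not the author's confirmed measure):

    def cooccurrence(a, b, c, d):
        # 2x2 contingency counts for source word s_i and target word t_j:
        # a = units where both occur, b = s_i only, c = t_j only, d = neither.
        N = a + b + c + d
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            return 0.0
        # chi-square association score: higher = stronger co-occurrence
        return N * (a * d - b * c) ** 2 / denom
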
Algorithm (Cont.)
Goal: find a partition over matrix B, under the constraint that one src/tgt word can align to only one tgt/src word or one tgt/src phrase (an adjacent word sequence)
[Figures: a legal segmentation with an imperfect alignment vs. an illegal segmentation with a perfect alignment]

Algorithm (Cont.)
while (some row or column is not yet aligned) {
    find cell[i,j] where B(i,j) is the maximum among all available (not yet aligned) cells;
    expand cell[i,j] with similarity sim_thresh to region[RowStart, RowEnd; ColStart, ColEnd];
    mark all the cells in the region as aligned;
}
output the aligned regions as phrases

Algorithm (Cont.)
Expand cell[i,j] with sim_thresh
current aligned region: region[RowStart=i, RowEnd=i; ColStart=j, ColEnd=j]
while (still ok to expand) {
    if for all cells[m,n] with m = RowStart-1 and ColStart <= n <= ColEnd, B(m,n) is similar to B(i,j)
        then RowStart = RowStart - 1;   // expand to north
    if for all cells[m,n] with m = RowEnd+1 and ColStart <= n <= ColEnd, B(m,n) is similar to B(i,j)
        then RowEnd = RowEnd + 1;       // expand to south
    ...                                  // expand to east
    ...                                  // expand to west
}
Define similar(x,y) = true iff abs((x-y)/y) < 1 - similarity_thresh
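A minimal runnable sketch of both routines in Python/NumPy, assuming the similarity test defined above; requiring expansion cells to be unaligned and the zero-score guard are my additions:

    import numpy as np

    def similar(x, y, thresh):
        # the slide's test: abs((x - y) / y) < 1 - similarity_thresh
        return y != 0 and abs((x - y) / y) < 1.0 - thresh

    def expand(B, i, j, aligned, thresh):
        # Grow a rectangle around seed (i, j) while every candidate row or
        # column of cells is unaligned and similar to the seed score B[i, j].
        r0 = r1 = i
        c0 = c1 = j
        grew = True
        while grew:
            grew = False
            if r0 > 0 and all(not aligned[r0 - 1, c] and similar(B[r0 - 1, c], B[i, j], thresh)
                              for c in range(c0, c1 + 1)):
                r0 -= 1; grew = True   # expand to north
            if r1 < B.shape[0] - 1 and all(not aligned[r1 + 1, c] and similar(B[r1 + 1, c], B[i, j], thresh)
                                           for c in range(c0, c1 + 1)):
                r1 += 1; grew = True   # expand to south
            if c0 > 0 and all(not aligned[r, c0 - 1] and similar(B[r, c0 - 1], B[i, j], thresh)
                              for r in range(r0, r1 + 1)):
                c0 -= 1; grew = True   # expand to west
            if c1 < B.shape[1] - 1 and all(not aligned[r, c1 + 1] and similar(B[r, c1 + 1], B[i, j], thresh)
                                           for r in range(r0, r1 + 1)):
                c1 += 1; grew = True   # expand to east
        return r0, r1, c0, c1

    def segment(B, thresh):
        # Greedy segmentation: repeatedly seed at the largest unaligned
        # score and expand, until every row and column is covered.
        m, n = B.shape
        aligned = np.zeros((m, n), dtype=bool)
        regions = []
        while not (aligned.any(axis=1).all() and aligned.any(axis=0).all()):
            masked = np.where(aligned, -np.inf, B)
            i, j = np.unravel_index(np.argmax(masked), (m, n))
            region = expand(B, i, j, aligned, thresh)
            r0, r1, c0, c1 = region
            aligned[r0:r1 + 1, c0:c1 + 1] = True
            regions.append(region)
        return regions
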
Algorithm (Cont.)
[Figure: expanding an aligned region to the north, south, east, and west]

Find the best similarity threshold
similarity_thresh is critical in this algorithm
The algorithm described above uses ONE similarity_thresh value for all region expansions in the matrix, and the same ONE value for all sentence pairs
Ideally, it would be better to use a different threshold value for each region and find the globally best segmentation for one matrix
– That is a search tree: combinatorial explosion

Find the best similarity threshold (Cont.)
One practical solution, for one matrix B:
for (st = 0.1; st <= 0.9; st += 0.1) {
    find the segmentation of B given similarity_thresh = st;
}
select the solution with the highest performance(solution)
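A sketch of this loop in Python, reusing segment() from above; performance(solution) is not defined on the slide, so score_fn is a hypothetical stand-in:

    def best_segmentation(B, score_fn):
        # Try thresholds 0.1, 0.2, ..., 0.9 and keep the segmentation
        # that score_fn rates highest. score_fn(regions, B) stands in
        # for performance(solution), which the slide leaves open.
        best_score, best_regions = None, None
        for step in range(1, 10):
            st = step / 10.0
            regions = segment(B, st)
            s = score_fn(regions, B)
            if best_score is None or s > best_score:
                best_score, best_regions = s, regions
        return best_regions
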
Integrating monolingual information
Motivation:
– Use more information in the alignment
– Easier for aligning phrases
– There is much more monolingual data than bilingual data
[Example: collocated place names – Pittsburgh, Somerset, Uniontown vs. Los Angeles, Santa Monica, Santa Clarita, Corona]

Integrating monolingual information (Cont.)
Given a sentence pair (S,T), S = s_1 ... s_m and T = t_1 ... t_n, where s_i / t_j are src/tgt words:
– Construct an m*m matrix A, where A(i,j) = collocation(s_i, s_j); only A(i,i-1) and A(i,i+1) have values
– Construct an n*n matrix C, where C(i,j) = collocation(t_i, t_j); only C(j-1,j) and C(j+1,j) have values
– Construct an m*n matrix B, where B(i,j) = co-occurrence(s_i, t_j)

Integrating monolingual information (Cont.)
Normalization:
– Assign self2self value α(s_i) to A(i,i), with 0 <= α(s_i) <= 1
– Assign self2self value β(t_j) to C(j,j), with 0 <= β(t_j) <= 1
– Normalize A so that each row sums to one: A(i,i-1) + A(i,i) + A(i,i+1) = 1
– Normalize C so that each column sums to one: C(j-1,j) + C(j,j) + C(j+1,j) = 1
– Normalize B so that: [equation lost from slide]

Integrating monolingual information (Cont.)
Calculating the new src-tgt matrix: B' = A * B * C
OK, that's it! Yes, that's the whole story!
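A minimal sketch of the whole construction in Python; the slides specify only the tri-diagonal shape, the self2self diagonal, and the normalization, so splitting the off-diagonal mass in proportion to the collocation scores is an assumption:

    import numpy as np

    def neighbour_matrix(self2self, colloc):
        # Tri-diagonal matrix: row i keeps weight self2self[i] on the word
        # itself and spreads 1 - self2self[i] over its left/right neighbours
        # in proportion to their collocation scores, so each row sums to one.
        # colloc[i] = collocation score between word i and word i+1.
        m = len(self2self)
        M = np.diag(np.asarray(self2self, dtype=float))
        for i in range(m):
            left = colloc[i - 1] if i > 0 else 0.0
            right = colloc[i] if i + 1 < m else 0.0
            total = left + right
            if total > 0.0:
                if i > 0:
                    M[i, i - 1] = (1.0 - self2self[i]) * left / total
                if i + 1 < m:
                    M[i, i + 1] = (1.0 - self2self[i]) * right / total
            else:
                M[i, i] = 1.0  # no scored neighbours: keep all mass on the word
        return M

    def integrate(B, alpha, beta, src_colloc, tgt_colloc):
        # B' = A * B * C: each source word borrows evidence from its
        # neighbours through the rows of A, each target word through the
        # columns of C (the transposed construction, column-stochastic).
        A = neighbour_matrix(alpha, src_colloc)      # m x m, row-stochastic
        C = neighbour_matrix(beta, tgt_colloc).T     # n x n, column-stochastic
        return A @ B @ C
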
Example
[Figures: alignment matrix with pure bilingual information vs. after integration with monolingual information]

Visualization
Left: using pure bilingual information
Right: integrated with monolingual information

What Is the Self2self Value?
Take a look at B' = A * B * C: each entry B'(i,j) mixes B(i,j) with the scores of the neighbouring cells. The weight 1 - α(s_i), spread over A(i,i-1) and A(i,i+1), stands for how much word s_i should "make use of" its neighbours' relations with the target words.
For content words, the self2self value should be higher; for function words, it should be lower.
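Written out (plain algebra from B' = A * B * C and the tri-diagonal structure of A):

    B'(i,j) = \sum_k \sum_l A(i,k)\, B(k,l)\, C(l,j)
            = \alpha(s_i)\,(BC)(i,j) + A(i,i-1)\,(BC)(i-1,j) + A(i,i+1)\,(BC)(i+1,j)

With α(s_i) close to 1 the source side keeps only s_i's own row; smaller α(s_i) blends in the rows of its neighbours. The target-side mixing through C is analogous, weighted by β(t_j).
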
How To Set the Self2self Values
Well, this is tricky
Before the June evaluation I set α = 0.6 for all src words and β = 0.48 for all tgt words
– Not good: "the" should have a lower self2self value and "Pittsburgh" a higher one

Calculating Self2self Values
Observation: source-language content words tend to align to a few target words with high scores, while function words tend to align to many target words with low scores
[Figure: co-occurrence score distributions for "has", "the", "in" (flat) vs. "beijing", "computer", "bus" (peaked)]

Calculating Self2self Values (Cont.)
Calculate the entropy of a word over the distribution of its normalized co-occurrence scores:
– Given word s_i, for every possibly co-occurring word t_j, the co-occurrence score is B(i,j)
– Let p(t_j | s_i) = B(i,j) / Σ_j' B(i,j')
– Define H(s_i) = -Σ_j p(t_j | s_i) log p(t_j | s_i)
Map the entropy linearly to a value between 0 and 1 (low entropy -> high self2self)
Better: map the scores to a range narrower than 0~1, e.g. 0.45~0.85 – why?
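A sketch in Python; the slide gives the 0.45~0.85 range but not the exact mapping, so the inverted linear map from normalized entropy is an assumption:

    import numpy as np

    def self2self(cooc_row, lo=0.45, hi=0.85):
        # cooc_row: co-occurrence scores of one source word with every
        # target word. A peaked distribution (low entropy, content-like)
        # maps near hi; a flat one (high entropy, function-like) near lo.
        p = np.asarray(cooc_row, dtype=float)
        p = p / p.sum()
        nz = p[p > 0]
        h = -np.sum(nz * np.log(nz))          # entropy of the distribution
        h_max = np.log(len(p))                # maximum possible entropy
        t = h / h_max if h_max > 0 else 0.0   # 0 = peaked ... 1 = flat
        return hi - t * (hi - lo)
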
A Modification to the Segmentation Algorithm
The original algorithm calculates A*B*C only once
In the modified version:
– Set B[i,j] to 0 for all aligned cells whenever a new aligned region is found
– Re-calculate A*B*C
Motivation:
– Once a region is aligned, the boundary of that phrase is known; it should no longer affect the unaligned neighbours
More computationally expensive
Experiments showed better performance
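A sketch of the modified loop in Python, reusing expand() from the earlier sketch:

    import numpy as np

    def segment_with_update(B, A, C, thresh):
        # Modified version: after each aligned region is found, zero its
        # cells in the raw matrix B and re-calculate B' = A * B * C, so
        # the finished phrase no longer leaks evidence to its neighbours.
        B = B.copy()
        aligned = np.zeros(B.shape, dtype=bool)
        regions = []
        while not (aligned.any(axis=1).all() and aligned.any(axis=0).all()):
            Bp = A @ B @ C                            # re-calculate A*B*C
            masked = np.where(aligned, -np.inf, Bp)
            i, j = np.unravel_index(np.argmax(masked), Bp.shape)
            r0, r1, c0, c1 = expand(Bp, i, j, aligned, thresh)
            aligned[r0:r1 + 1, c0:c1 + 1] = True
            B[r0:r1 + 1, c0:c1 + 1] = 0.0             # zero the aligned cells
            regions.append((r0, r1, c0, c1))
        return regions
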
Updating Bilingual Information by Iteration
Using EM to update the bilingual co-occurrence scores
– Does not help much

Results
Dev-test on the small data track (3,540 sentence pairs of training data + 10K glossary)

                            NIST               Bleu
Baseline (IBM1+Gloss)       6.0097             0.1231
Original Algorithm          - (+5.9%)          - (+20.0%)
Modified Algorithm          6.4310 (+7.0%)     0.1507 (+22.4%)

After LM-fill               NIST               Bleu
Baseline (IBM1+Gloss)+LM    -                  -
Original Algorithm+LM       6.6754 (+4.7%)     0.1611 (+13.7%)
Modified Algorithm+LM       6.7987 (+6.6%)     0.1712 (+20.8%)

Results (Cont.)
[Table: NIST and Bleu scores, without and with LM-fill, for Baseline (IBM1+Gloss), HMM+IBM1+Gloss, ARV+IBM1+Gloss, JOY+IBM1+Gloss, and ARV+JOY+IBM1+Gloss; the scores were lost from the slide]

Conclusion
Simple
Efficient
– Unlike stochastic bracketing (Wu 95), which is O(m^3 n^3), the matrix segmentation algorithm is linear, O(min(m,n)); the construction of A*B*C is O(m*n)
Effective
– Improved the translation quality from the baseline (NIST=6.0097, Bleu=0.1231) to (NIST=6.4310, Bleu=0.1507) on the small data track dev-test

Future work
Find a better segmentation algorithm (dynamic thresholds)
Find a mathematically more sound method for setting the self2self values
Investigate the possibility of using trigram or distance-bigram monolingual information

References
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2).
E. Gaussier. 1998. Flow network models for word alignment and terminology extraction from bilingual corpora. In Proceedings of COLING-ACL-98, Montreal.
I. Dan Melamed. 1997. A word-to-word model of translational equivalence. In Proceedings of ACL-97, Madrid, Spain.
I. Dan Melamed. 2001. Empirical Methods for Exploiting Parallel Texts. MIT Press.
Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. HMM-based word alignment in statistical translation. In COLING '96: The 16th International Conference on Computational Linguistics, Copenhagen, August.
Dekai Wu. 1995. An algorithm for simultaneously bracketing parallel texts by aligning words. In Proceedings of ACL-95, June.
Ying Zhang, Ralf D. Brown, Robert E. Frederking, and Alon Lavie. 2001. Pre-processing of bilingual corpora for Mandarin-English EBMT. In Proceedings of MT Summit VIII, September.

Acknowledgement
I would like to thank Stephan Vogel, Jian Zhang, Jie Yang, Jerry Zhu, Ashish, and others for their valuable advice and suggestions during this work.

Questions and Comments