Multi-Layer Filtering Algorithm: Bilingual Chunk Alignment in Statistical Machine Translation. An introduction to the Multi-Layer Filtering (MLF) algorithm, by Dawei Hou.


Multi-Layer Filtering algorithm: Bilingual Chunk Alignment in Statistical Machine Translation. An introduction to the Multi-Layer Filtering (MLF) algorithm. Dawei Hou, LING 575 MT, WIN07

Multi-Layer Filtering algorithm 2 What is the "chunk" here? In this paper, a "chunk" does not rely on information from tagging, parsing, syntax analysis, or segmentation. A "chunk" is simply a contiguous sequence of words.

Multi-Layer Filtering algorithm 3 Why do we use "chunks" in translation? Chunk-based translation can lead to more fluent translations, since it captures local reordering phenomena. It successfully makes long sentences shorter, which benefits the SMT algorithm's performance. It obtains an accurate one-to-one alignment for each pair of bilingual chunks. It greatly decreases the search space and time complexity during translation.

Multi-Layer Filtering algorithm 4 What about other approaches? What about word-based translations?

Multi-Layer Filtering algorithm 5 Some background. SMT systems employ word-based alignment models built on the five word-based statistical models proposed by IBM. Problem: they still suffer from poor performance on language pairs with large structural differences, since these models fundamentally rely on word-level translation.

Multi-Layer Filtering algorithm 6 Some background. Other alignment algorithms are based on phrases, chunks, or structures, and most of them rely on complex syntactic information. Problems: they have proven to yield poor performance when dealing with long sentences, and they depend heavily on the performance of associated tools such as parsers and POS taggers.

Multi-Layer Filtering algorithm 7 How does chunk-based translation improve on those problems?

Multi-Layer Filtering algorithm 8 The goal is to discover one-to-one pairs of bilingual chunks in untagged, well-formed bilingual sentence pairs. Multiple layers are used to extract bilingual chunks according to different features of the chunks in the bilingual corpus.

Multi-Layer Filtering algorithm 9 Summary of procedures: filtering the most frequent chunks; clustering the similar words and filtering the most frequent structures; dealing with the remnant fragments; keeping one-to-one alignment. A sketch of this pipeline follows below.
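
As a roadmap, here is a minimal Python skeleton of that four-stage flow. The stage names are mine, not the paper's; each stage is filled in on the following slides:

```python
# Hypothetical skeleton of the MLF pipeline; function names are mine.

def filter_frequent_chunks(sentences):
    """Layer 1: merge frequently co-occurring adjacent words into chunks."""
    ...

def cluster_and_refilter(chunked_sentences):
    """Layer 2: cluster similar words via anchor-word position vectors,
    then filter the most frequent structures again."""
    ...

def combine_remnants(chunked_sentence):
    """Combine leftover individual or sequential words into chunks."""
    ...

def align_one_to_one(source_chunks, target_chunks):
    """Find one-to-one bilingual chunk alignments by co-occurrence."""
    ...
```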

Multi-Layer Filtering algorithm 10 Filtering the most frequent chunks -- Step 1. Assumption: the most frequently co-occurring word sequences are potential chunks. Applying formula-1, we filter those word sequences as initial monolingual chunks. [formula-1 and formula-2 are not recoverable from the transcript]
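
Since the slide's formula-1 and formula-2 are lost, the sketch below uses pointwise mutual information between adjacent words as a stand-in cohesion measure; it is an assumption about the flavor of the formula, not the paper's exact definition:

```python
import math
from collections import Counter

def cohesion_degrees(sentences):
    """PMI-style cohesion degree for every adjacent word pair; a stand-in
    for the paper's formula-1/formula-2, which are lost in the transcript."""
    unigrams, bigrams, total = Counter(), Counter(), 0
    for sent in sentences:
        unigrams.update(sent)
        bigrams.update(zip(sent, sent[1:]))
        total += len(sent)
    def degree(w1, w2):
        # P(w1 w2) over adjacent positions, against independent occurrence.
        p12 = bigrams[(w1, w2)] / max(total - len(sentences), 1)
        p1, p2 = unigrams[w1] / total, unigrams[w2] / total
        return math.log(p12 / (p1 * p2))
    return {(w1, w2): degree(w1, w2) for (w1, w2) in bigrams}

corpus = [
    ["what", "kind", "of", "room", "do", "you", "want", "to", "reserve"],
    ["do", "you", "want", "to", "check", "out"],
]
print(cohesion_degrees(corpus)[("do", "you")])
```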

Multi-Layer Filtering algorithm 11 The result of Filtering Step 1. An example:
What || kind || of || room || do || you || want || to || reserve
你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间

Multi-Layer Filtering algorithm 12 Filtering the most frequent chunks -- Step 2. Now we have all the cohesion degrees between any two adjacent words in the source and target sentences. Applying formula-3, we find the entire set of initial monolingual chunks. [formula-3 is not recoverable from the transcript]

Multi-Layer Filtering algorithm 13 The result of Filtering Step 2-1.
What || kind || of || room || do || you || want || to || reserve
你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
In this case: n = int{ 10/4 } = 2.
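
Formula-3 is also lost. One plausible reading of this step, treating n = int(sentence length / 4) as the number of greedy merge rounds, is sketched below; both the merge rule and the interpretation of n are my assumptions:

```python
def grow_chunks(sentence, degree, rounds=None):
    """Greedily merge the adjacent units joined by the highest cohesion
    degree.  `degree` maps (word, word) -> score, e.g. from
    cohesion_degrees() above.  Treating n = int(len/4) as the number of
    merge rounds is an assumption; formula-3 itself is lost."""
    units = [[w] for w in sentence]
    if rounds is None:
        rounds = len(sentence) // 4
    for _ in range(rounds):
        if len(units) < 2:
            break
        # Score each boundary by the cohesion of the two words touching it.
        scores = [degree.get((left[-1], right[0]), float("-inf"))
                  for left, right in zip(units, units[1:])]
        best = max(range(len(scores)), key=scores.__getitem__)
        units[best:best + 2] = [units[best] + units[best + 1]]
    return units

sent = ["what", "kind", "of", "room", "do", "you", "want", "to", "reserve"]
# e.g. grow_chunks(sent, cohesion_degrees(corpus)) merges n = 2 boundaries.
```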

Multi-Layer Filtering algorithm 14 The result of Filtering Step 2-(1)-EN. Now we get a table of the initial monolingual chunks, scored by formula-4 (only some D_k* values survive in this transcript; the D_k column is not recoverable):

Initial chunk    D_k*
What kind        1.36
Kind of          1.31
Do you           10.07
Want to          2.11
You want         0.61
To reserve       0.077
What kind of, You want to, You want to reserve, Do you want, Want to reserve, Do you want to (values not recoverable)

Multi-Layer Filtering algorithm 15 The result of Filtering Step 2-(2)-EN. Setting the threshold D_k* > 1.0, we keep:

Initial chunk    D_k*
What kind        1.36
Kind of          1.31
Do you           10.07
Want to          2.11
What kind of     (value not recoverable)

We still need more steps to do maximum matching and overlap discarding.

Multi-Layer Filtering algorithm 16 The result of Filtering Step 2-(3)-EN. The kept candidates are: What kind (1.36), Kind of (1.31), What kind of, Do you (10.07), Want to (2.11). Following the maximum matching principle, and to prevent overlapping, we apply formula-4 and formula-5. [formula-4 and formula-5 are not recoverable from the transcript; see the sketch below]
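
The effect described here, keeping the best-scoring candidates while discarding overlaps, can be sketched as a greedy maximum matching over scored spans. The span indices and the use of the surviving D_k* values are my own construction, not the paper's exact formula-4/formula-5:

```python
def maximum_matching(candidates):
    """Greedy maximum matching with overlap discarding: visit candidate
    chunk spans from highest score down and keep a span only if none of
    its token positions is already covered.  A reconstruction of the
    step's effect, not the paper's exact rule."""
    covered, kept = set(), []
    for span, score in sorted(candidates.items(), key=lambda kv: -kv[1]):
        positions = set(range(*span))
        if positions & covered:
            continue  # overlaps an already-kept chunk: discard
        covered |= positions
        kept.append((span, score))
    return sorted(kept)

# (start, end) token spans over "What kind of room do you want to reserve",
# scored with the D_k* values that survive in the transcript.
candidates = {(0, 2): 1.36, (1, 3): 1.31, (4, 6): 10.07, (6, 8): 2.11}
print(maximum_matching(candidates))
# [((0, 2), 1.36), ((4, 6), 10.07), ((6, 8), 2.11)]
```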

Multi-Layer Filtering algorithm 17 The result of Filtering Step 2-(4)-EN. Dealing with the remnant fragments: we simply combine the remaining individual or sequential words into chunks. So we get a much shorter sentence, as listed below:
What & kind & of || room || do & you || want & to || reserve

Multi-Layer Filtering algorithm 18 The result of Filtering Step 2-(1)-CN.
What || kind || of || room || do || you || want || to || reserve
你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
In this case: n = int{ 10/4 } = 2.

Multi-Layer Filtering algorithm 19 The result of Filtering Step 2-(2)-CN. Now we get a table of the initial monolingual chunks, scored by formula-4 (only some D_k* values survive in this transcript):

Initial chunk    D_k*
你想             0.69
预定             2.39
什么             7.80
么样             0.87
样的             0.30
的房             1.27
房间             4.52
什么样, 什么样的, 么样的, 么样的房, 样的房, 样的房间, 的房间 (values not recoverable)

Multi-Layer Filtering algorithm 20 The result of Filtering Step 2-(3)-CN. Setting the threshold D_k* > 1.0, we keep:

Initial chunk    D_k*
预定             2.39
什么             7.80
的房             1.27
房间             4.52
什么样的         (value not recoverable)
的房间           (value not recoverable)

We still need more steps to do maximum matching and overlap discarding.

Multi-Layer Filtering algorithm 21 The result of Filtering Step 2-(4)-CN. The kept candidates are: 预定 (2.39), 的房 (1.27), 什么 (7.80), 的房间, 什么样的, 房间 (4.52). According to the maximum matching principle, the character 的 is contested by 什么样的 and 的房间. Applying formula-4: max( D_什么样的 / D_什么样, D_的房间 / D_房间 ) = max(2.44, 1.30) = 2.44, so 的 is assigned to 什么样的, and the remaining chunk is 房间.

Multi-Layer Filtering algorithm 22 The result of Filtering Step 2-(5)-CN. Dealing with the remnant fragments: we simply combine the remaining individual or sequential words into chunks. So we get a much shorter sentence, as listed below:
你 || 想 || 预 & 定 || 什 & 么 & 样 & 的 || 房 & 间

Multi-Layer Filtering algorithm 23 Some problems. After the first filtering pass, suppose we found an aligned chunk pair: || 在 & 五 & 点 || and || at & five & o'clock ||. But a potentially good chunk like || at & six & o'clock || might have been broken into several fragments, || at || six || o'clock ||, since this structure includes word sequences with a low frequency of occurrence (we suppose "six" is less frequent than "five" here).

Multi-Layer Filtering algorithm 24 Clustering the similar words and filtering the most frequent structures. Many frequent chunks have similar structures but differ in detail. We can cluster similar words according to the position vectors of their behavior relative to anchor words. For all the words in the same class, we suppose they form good chunks, then filter the most frequent structures by the method introduced before.

Multi-Layer Filtering algorithm 25 Clustering the similar words and filtering the most frequent structures – Step 1. In the corpus resulting from the first filtering pass, find the most frequent words as anchor words, for example (ranks 1-10): the, a, to, this, for, in, on, of, at, room. Why use the most frequent words? Because anchor words are the most common words, a great deal of information can be obtained from them. Words with similar position vectors relative to the anchor words can be assumed to belong to similar word classes.
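
Anchor selection itself is simple; a minimal sketch, assuming plain frequency counting over the filtered corpus:

```python
from collections import Counter

def pick_anchor_words(sentences, k=10):
    """Return the k most frequent words; on the slide these come out as
    the, a, to, this, for, in, on, of, at, room."""
    counts = Counter(word for sent in sentences for word in sent)
    return [word for word, _ in counts.most_common(k)]
```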

Multi-Layer Filtering algorithm 26 Clustering the similar words and filtering the most frequent structures – Step 2. Build word vectors and define the size of the observation window (in this case, window size = 5, positions w-2, w-1, w, w+1, w+2). For instance, we build a word vector whose anchor word is "in", and we observe a candidate word "the" to be clustered that falls within the window. [the example's cell values and formula-7/formula-8 are not recoverable from the transcript]
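
Since formula-7 and formula-8 are lost, the sketch below builds the position vector from raw window counts, which stands in for whatever weighting the paper applied:

```python
from collections import Counter

def position_vector(sentences, anchor, candidate, half_window=2):
    """Count where `candidate` appears relative to each occurrence of
    `anchor` inside a window of size 2*half_window + 1 (size 5 on the
    slide).  Raw counts stand in for the lost formula-7/formula-8."""
    vec = Counter()
    for sent in sentences:
        for i, word in enumerate(sent):
            if word != anchor:
                continue
            for off in range(-half_window, half_window + 1):
                j = i + off
                if off != 0 and 0 <= j < len(sent) and sent[j] == candidate:
                    vec[off] += 1
    return [vec[off] for off in range(-half_window, half_window + 1)]

corpus = [["a", "room", "in", "the", "hotel"],
          ["the", "key", "is", "in", "the", "room"]]
print(position_vector(corpus, "in", "the"))  # counts at w-2, w-1, w, w+1, w+2
```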

Multi-Layer Filtering algorithm 27 Clustering the similar words and filtering the most frequent structures – Step 3. In order to compare vectors fairly, they must be normalized by formula-9. [formula-9 is not recoverable from the transcript] Example: "in/that" and "in/this".
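
With formula-9 lost, normalizing counts to relative frequencies is one standard choice, so that frequent and rare candidates become comparable:

```python
def normalize(vec):
    """Turn raw window counts into relative frequencies; a stand-in for
    the lost formula-9."""
    total = sum(vec)
    return [v / total for v in vec] if total else vec

print(normalize([0, 0, 0, 2, 0]))  # [0.0, 0.0, 0.0, 1.0, 0.0]
```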

Multi-Layer Filtering algorithm 28 Clustering the similar words and filtering the most frequent structures – Step 4. Measure the similarities of the vectors with Euclidean distance and cluster the words that have similar distributions relative to the anchor words. Example result:

Word class                                               Anchor words
single, double, twin, standard, suite, different, quiet  (a, room)
the, my, your, this, that, our                           (in, room)
America, all, fact, Japan, English                       (in, )
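
The slide names Euclidean distance but not the clustering rule; the greedy single-link grouping and the threshold below are my assumptions:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_by_distance(vectors, threshold):
    """Group words whose normalized position vectors lie within
    `threshold` of an existing cluster member (greedy single-link).
    The paper only states that Euclidean distance is used."""
    clusters = []
    for word, vec in vectors.items():
        for cluster in clusters:
            if any(euclidean(vec, vectors[w]) <= threshold for w in cluster):
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters

vecs = {"this": [0.0, 0.5, 0.5, 0.0], "that": [0.0, 0.45, 0.55, 0.0],
        "room": [0.6, 0.2, 0.1, 0.1]}
print(cluster_by_distance(vecs, threshold=0.2))  # [['this', 'that'], ['room']]
```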

Multi-Layer Filtering algorithm 29 Clustering the similar words and filtering the most frequent structures – Step 5. For all the words in the same class, replace them with a particular symbol, and then treat this symbol as an ordinary word. Then filter the most frequent structures by the Multi-Layer Filtering algorithm again. For instance, if we have || 在 & 五 & 点 || and || at & five & o'clock ||, with the parallel word classes { one, two, ..., five, ..., twelve } and { 一, 二, ..., 五, ..., 十二 }, we will get: || 在 & 一 & 点 || and || at & one & o'clock ||; || 在 & 两 & 点 || and || at & two & o'clock ||; ...
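
The substitution step itself is mechanical; a small sketch (the class symbol <NUM> is my invention):

```python
def substitute_classes(sentence, classes):
    """Replace every member of a word class with its class symbol so the
    frequency filter sees "at <NUM> o'clock" instead of twelve sparse
    variants; the symbol is then treated as an ordinary word."""
    word2symbol = {w: sym for sym, words in classes.items() for w in words}
    return [word2symbol.get(w, w) for w in sentence]

classes = {"<NUM>": {"one", "two", "three", "five", "six", "twelve"}}
print(substitute_classes(["at", "six", "o'clock"], classes))
# ['at', '<NUM>', "o'clock"]
```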

Multi-Layer Filtering algorithm 30 Next step: keeping one-to-one alignment.

Multi-Layer Filtering algorithm 31 Keeping one-to-one alignment. Now we have a pair of new parallel sentences with chunks:
你 || 想 || 预 & 定 || 什 & 么 & 样 & 的 || 房 & 间
What & kind & of || room || do & you || want & to || reserve
Our purpose is to find a one-to-one chunk alignment, on the assumption that the chunks to be aligned occur almost equally often in the corresponding parallel texts.

Multi-Layer Filtering algorithm 32 Keeping one-to-one alignment. By applying formula-11, we can get an alignment table of θ scores between the English chunks (What kind of, room, do you, want to, reserve) and the Chinese chunks (你, 想, 预定, 什么样的, 房间). [formula-11 and the θ values are not recoverable from the transcript]
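
With formula-11 lost, the Dice coefficient over chunk co-occurrence counts captures the stated assumption (aligned chunks occur almost equally often, and almost always together), so the sketch below uses it as a stand-in θ, followed by a greedy one-to-one assignment:

```python
from collections import Counter

def align_chunks(chunked_pairs):
    """Greedy one-to-one chunk alignment.  The Dice coefficient stands in
    for the lost formula-11's theta: chunks that occur almost equally
    often, and almost always together, score close to 1."""
    co, src_freq, tgt_freq = Counter(), Counter(), Counter()
    for src, tgt in chunked_pairs:
        src_freq.update(src)
        tgt_freq.update(tgt)
        for s in src:
            for t in tgt:
                co[(s, t)] += 1
    theta = {(s, t): 2 * c / (src_freq[s] + tgt_freq[t])
             for (s, t), c in co.items()}
    used_src, used_tgt, alignment = set(), set(), []
    for (s, t), score in sorted(theta.items(), key=lambda kv: -kv[1]):
        if s not in used_src and t not in used_tgt:
            alignment.append((s, t, round(score, 2)))
            used_src.add(s)
            used_tgt.add(t)
    return alignment

pairs = [(["do you", "want to", "reserve"], ["你", "想", "预定"]),
         (["do you", "reserve"], ["你", "预定"]),
         (["want to", "pay"], ["想", "付款"]),
         (["do you", "pay"], ["你", "付款"])]
print(align_chunks(pairs))
```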

Multi-Layer Filtering algorithm 33 Experiments. Training data: 55,000 pairs of Chinese-English spoken parallel sentences. Test data: 400 pairs of Chinese-English spoken parallel sentences, chosen randomly from the same corpus. These 400 sentence pairs were manually partitioned to obtain monolingual chunks, and the corresponding bilingual chunks were then manually aligned, for computing the chunking and alignment accuracy.

Multi-Layer Filtering algorithm 34 Experiments. Evaluation: comparing the automatically obtained monolingual chunks and aligned bilingual chunks to the chunks discovered manually, we compute precision, recall, and F-measure. [the formulas on the slide are not recoverable from the transcript]
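
Precision, recall, and F-measure have standard definitions, presumably what the slide showed: P = |auto ∩ manual| / |auto|, R = |auto ∩ manual| / |manual|, F = 2PR / (P + R). In code:

```python
def prf(auto_chunks, gold_chunks):
    """Standard precision/recall/F-measure over chunk sets; presumably
    what the slide's lost formulas computed."""
    auto, gold = set(auto_chunks), set(gold_chunks)
    correct = len(auto & gold)
    p = correct / len(auto) if auto else 0.0
    r = correct / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(prf({"what kind of", "do you", "want to"},
          {"what kind of", "room", "do you", "want to", "reserve"}))
# (1.0, 0.6, 0.75)
```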

Multi-Layer Filtering algorithm 35 Experiments. Results: two tables report precision (%), recall (%), and F-measure, one for the accuracy of chunking and one for the accuracy of alignment. [the numeric values are not recoverable from the transcript]

Multi-Layer Filtering algorithm 36 Experiments. Comparison of chunk-based translation to word-based translation: a table reports BLEU and NIST scores for the word-based system, the chunk-based system, and the improvement. [the numeric scores are not recoverable from the transcript] The improvement is about 10%.

Multi-Layer Filtering algorithm 37 Conclusions. This chunking and alignment algorithm does not rely on information from tagging, parsing, or syntax analysis, and does not even require word segmentation. It obtains an accurate one-to-one alignment of chunks. It greatly decreases the search space and time complexity during translation. Its performance is better than the baseline word-alignment system (on some tasks).

Multi-Layer Filtering algorithm 38 Problems / Weaknesses. The authors did not discuss any weaknesses. Maybe we can make some improvements at: the maximum matching step; the step of building position vectors.