Motivations for transfer-based translation
lexical ambiguity
structural differences
See further Ingo 91.



Example 1
Sv. Fyll på olja i växellådan.
En. Fill gearbox with oil.
(from the Scania corpus)
fyll på → fill
obj → adv
adv → obj

Example 2
Sv. I oljefilterhållaren sitter en överströmningsventil.
En. The oil filter retainer has an overflow valve.
(from the Scania corpus)
sitter → has
adv → subj
subj → obj

Transfer-based translation
intermediary sentence structure
basic processes
  –analysis
  –transfer
  –generation (synthesis)
language modules
  –dictionary and grammar of SL
  –transfer dictionary and transfer rules
  –dictionary and grammar of TL
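The three basic processes can be pictured as a small pipeline. The following is a minimal sketch for the single sentence pattern of Example 1, not the architecture of any real system; the lexicon entries, the role names and the functions analyse, transfer and generate are assumptions made for illustration.

```python
# Toy analysis -> transfer -> generation pipeline for one sentence pattern.
# Lexicon entries, role names and function names are illustrative assumptions.

TRANSFER_LEXICON = {"fyll på": "fill", "olja": "oil", "växellådan": "gearbox"}
# Structural transfer for this verb: the SL object becomes a TL adverbial
# and the SL adverbial becomes the TL object.
ROLE_MAP = {"verb": "verb", "obj": "adv", "adv": "obj"}


def analyse(sentence):
    """Analysis: map 'Fyll på X i Y.' to a small role structure."""
    words = sentence.rstrip(".").lower().split()
    if len(words) != 5 or words[:2] != ["fyll", "på"] or words[3] != "i":
        raise ValueError("sentence pattern not covered by this toy grammar")
    return {"verb": "fyll på", "obj": words[2], "adv": words[4]}


def transfer(source):
    """Transfer: translate each lexical unit and remap its grammatical role."""
    return {ROLE_MAP[role]: TRANSFER_LEXICON[unit] for role, unit in source.items()}


def generate(target):
    """Generation: verb, object, then the adverbial as a 'with' phrase."""
    clause = f"{target['verb']} {target['obj']} with {target['adv']}."
    return clause[0].upper() + clause[1:]


def translate(sentence):
    return generate(transfer(analyse(sentence)))
```

Calling translate("Fyll på olja i växellådan.") walks through all three stages; the intermediary structure is what lets the obj/adv swap be stated once, in the transfer step.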

[Figure: diagram from SL to TL with the labels Interlingua, Direct translation, Transfer, Multra, Metal]

Levels of intermediary structure
cf. J&M, Chapter 21
word order

Metal
See H&S

MULTRA
Multilingual Support for Translation and Writing
translation engine
transfer-based
  –shake-and-bake
modular
unification-based
preference machinery
traceable

Analysis
chart parser (Lisp → C)
  –procedural formalism
unification and other kinds of operations
sentence structure
  –feature structure
  –grammatical relations
  –surface order implicit via grammatical relations
See further Sågvall Hein & Starbäck (99), Weijnitz (02), Dahllöf (89)
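Unification over feature structures can be illustrated with nested dictionaries. This is a minimal sketch under simplifying assumptions (no reentrancy, no typed features, None used only as the failure signal); it is not the Multra parser's procedural formalism.

```python
# Minimal feature-structure unification over nested dicts.
# Atomic values unify only if they are equal; dicts unify feature by feature.
# Assumption: None never occurs as a feature value, so it can signal failure.

def unify(a, b):
    """Return the unification of a and b, or None if they clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None          # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value  # feature only present in b
        return result
    return a if a == b else None
```

For example, unifying {"PHR.CAT": "NP", "NUMB": "SING"} with {"PHR.CAT": "NP", "DEF": "DEF"} yields a structure carrying all three features, while unifying it with a structure that demands a different NUMB value fails.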

Transfer
unification-based
declarative formalism
  –Multra transfer formalism (Beskow 93)
lexical and structural rules
rules are partially ordered
a more specific rule takes precedence over a less specific one
  –specificity in terms of number of transfer equations
all applicable rules are applied
written in Prolog
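Ordering rules by specificity can be sketched as follows. The rule encoding (a label plus a dict of source equations), the flat feature paths VERB.LEX and PP.PREP, and the general rule tänka-think are assumptions for illustration; the actual Multra formalism is declarative and implemented in Prolog.

```python
# Ordering applicable transfer rules by specificity.
# A rule is modelled as a label plus a dict of source equations
# (feature path -> required value). This encoding is an assumption.

def applicable(rule, structure):
    """A rule applies if every one of its source equations holds."""
    return all(structure.get(path) == value
               for path, value in rule["source"].items())


def specificity(rule):
    """Specificity measured as the number of equations in the rule."""
    return len(rule["source"])


def ordered_applicable_rules(rules, structure):
    """All applicable rules, most specific first (ties keep their input order)."""
    matches = [rule for rule in rules if applicable(rule, structure)]
    return sorted(matches, key=specificity, reverse=True)


RULES = [
    {"label": "tänka-think",  # hypothetical general rule
     "source": {"VERB.LEX": "tänka.vb.1"}},
    {"label": "tänka.på-think.about",
     "source": {"VERB.LEX": "tänka.vb.1", "PP.PREP": "på.pp.1"}},
]
```

When the verb occurs with a på phrase, both rules apply and the two-equation rule comes first; without the phrase only the general rule applies.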

Generation
syntactic generation
  –Multra syntactic generation formalism (Beskow 97a)
  –PATR-like style
    unification
    concatenation
    typed features
morphological generation (Beskow 97b)
  –lexical insertion rules
  –morphological realisation and phonological finish in Prolog
written in Prolog

An example: Tippa hytten.
Tippa hytten. :
(* = (PHR.CAT = CL
      MODE = IMP
      SUBJ = 2ND
      VERB = (WORD.CAT = VERB
              INFF = IMP
              DIAT = ACT
              LEX = TIPPA.VB.1
              VSURF = +)
      OBJ.DIR = (PHR.CAT = NP
                 NUMB = SING
                 GENDER = UTR
                 CASE = BASIC
                 DEF = DEF
                 HEAD = (LEX = HYTT.NN.1
                         WORD.CAT = NOUN)))
 REG = (V1.LEM = TIPPA.VB)
 SEP = (WORD.CAT = SEP
        LEX = STOP.SR.0)))

Transfer structure
[VERB : [WORD.CAT : VERB
         LEX : TILT.VB.0
         DIAT : ACT
         INFF : IMP]
 OBJ.DIR : [PHR.CAT : NP
            DEF : DEF
            NUMB : SING
            HEAD : [WORD.CAT : NOUN
                    LEX : CAB.NN.0]]
 MODE : IMP
 SUBJ: 2ND
 VSURF: +
 SEP : [WORD.CAT : SEP
        LEX : STOP.SR.0]
 PHR.CAT : CL]

Generation
Tilt the cab.

A grammar rule
defrule legal.obj { = 'np, not = 'gen, not = 'subj }

Transfer rules
copy feature
delete feature
transfer feature
assign feature

Copy feature
LABEL mode
SOURCE = ?x1
TARGET = ?x2
TRANSFER

Delete feature
LABEL REG
SOURCE = ANY
TARGET =
TRANSFER

Transfer feature
LABEL OBJ.DIR
SOURCE = ?x1
TARGET = ?x2
TRANSFER ?x1 ?x2

Define feature
LABEL trycka.in-press
SOURCE =trycka.vb+in.ab.1 =VERB
TARGET =press.vb.1 =VERB
TRANSFER
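The effect of these rule types can be sketched on a cut-down version of the "Tippa hytten." analysis. The sketch below is an illustration only: the Python encoding, the choice of which features are copied, and the treatment of lexical rules as a lookup table are assumptions, and the real rules are written in the Multra transfer formalism rather than as Python sets.

```python
# Toy counterparts of the rule types: copy, delete, transfer and lexical rules.
# Which features fall into which set is an assumption made for this example.

COPY = {"PHR.CAT", "MODE", "WORD.CAT", "DEF"}   # value carried over unchanged
DELETE = {"REG"}                                # no counterpart in the target
TRANSFER = {"VERB", "OBJ.DIR", "HEAD"}          # substructure transferred recursively
LEXICAL_RULES = {"TIPPA.VB.1": "TILT.VB.0", "HYTT.NN.1": "CAB.NN.0"}


def transfer(structure):
    """Build the target structure feature by feature."""
    target = {}
    for feature, value in structure.items():
        if feature in DELETE:
            continue
        elif feature in COPY:
            target[feature] = value
        elif feature in TRANSFER:
            target[feature] = transfer(value)
        elif feature == "LEX":
            target[feature] = LEXICAL_RULES[value]
        else:
            raise KeyError(f"no transfer rule for feature {feature}")
    return target


SOURCE = {
    "PHR.CAT": "CL",
    "MODE": "IMP",
    "VERB": {"WORD.CAT": "VERB", "LEX": "TIPPA.VB.1"},
    "OBJ.DIR": {"PHR.CAT": "NP", "DEF": "DEF",
                "HEAD": {"WORD.CAT": "NOUN", "LEX": "HYTT.NN.1"}},
    "REG": {"V1.LEM": "TIPPA.VB"},
}
```

Applying transfer(SOURCE) drops REG, copies the clause-level features, and replaces the two lexemes, giving a structure of the same shape as the transfer structure shown for "Tilt the cab."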

A generation rule
LABEL CL.IMP
X1 ---> X2 X3 X4 :
= CL = = IMP =

A contextual lexical rule
LABEL tänka.på-think.about
SOURCE = tänka.vb.1 = pp = ?prep = på.pp.1 = ?rect1
TARGET = pp = PREP = about.pp.1 = ?rect2
TRANSFER ?rect1 ?rect2

A generation trace
1-Applying Rule cl-sep
1-Applying Rule cl.imp
1-Applying Rule subj2nd-verb-obj.dir
1-Applying Rule verb.main.act
1-Applying Rule np.the-df
1-Applying Rule ng.noun-def
1-Success!

Language resources in the MATS system
dictionary in a database with different views
analysis grammar
transfer grammar
  –incl. contextually defined lexical rules
generation grammar

sv-en_LinkLexicon

en_Inflections

en_LemmaLexicon

en_LexemeLexicon

en_Lexicon

en_StemLexicon

sv_Inflections

sv_LemmaLexicon

sv_LexemeLexicon

sv_Lexicon

sv_StemLexicon

The MATS system Frozen demo…

Assignment 2: Working with MATS ml

Lexicalistic translation
Identify (lexical) translation units in the source sentence
Translate each unit separately (considering the context)
Order the result in agreement with a model of the target language
Formulation due to Lars Ahrenberg; see further AH (reading list); see also Beaven, John L., Shake-and-Bake Machine Translation. Coling-92, Nantes, August 1992.
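The three steps can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the sentence "Nu tippar vi hytten.", the unit lexicon, and the target-language model, which is reduced to a set of accepted bigrams. The sketch also translates each unit without looking at its context, which a real system would not do.

```python
from itertools import permutations

# Hypothetical unit lexicon: SL unit (tuple of tokens) -> TL unit.
UNIT_LEXICON = {("nu",): "now", ("tippar",): "tilt",
                ("vi",): "we", ("hytten",): "the cab"}
# Hypothetical target-language model: the unit bigrams it accepts.
TL_BIGRAMS = {("<s>", "now"), ("now", "we"), ("we", "tilt"),
              ("tilt", "the cab"), ("the cab", "</s>")}
MAX_UNIT_LENGTH = max(len(unit) for unit in UNIT_LEXICON)


def identify_units(tokens):
    """Step 1: greedy longest-match segmentation into lexicon units."""
    units, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_UNIT_LENGTH, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in UNIT_LEXICON:
                units.append(candidate)
                i += n
                break
        else:
            raise KeyError(f"no translation unit starts at {tokens[i]!r}")
    return units


def tl_score(order):
    """Number of adjacent unit pairs accepted by the target-language model."""
    padded = ("<s>",) + tuple(order) + ("</s>",)
    return sum(pair in TL_BIGRAMS for pair in zip(padded, padded[1:]))


def translate(sentence):
    tokens = sentence.rstrip(".").lower().split()
    translated = [UNIT_LEXICON[unit] for unit in identify_units(tokens)]  # step 2
    best = max(permutations(translated), key=tl_score)                    # step 3
    return " ".join(best).capitalize() + "."
```

Trying every permutation is only feasible for very short sentences; it is used here to keep the ordering step visible.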

T4F – a lexicalistic system
processes in T4F
  –tokenisation
  –tagging
  –transfer
  –transposition
  –filtering
See further AH (in the reading list)

Interlingua translation
See SN

Applications of alignment
translation memories
translation dictionaries
lexicalistic translation
statistical machine translation
example-based translation

Translation memories
based on sentence links
optionally, sub-sentence links
See further Macklovitch, E. (2000)

Translation dictionaries
based on word links
refinement of word links

Refinement of word alignment data
neutralise capital letters where appropriate
lemmatise or tag source and target units
identify ambiguities
  –search for criteria to resolve them
identify partial links
  –compounds?
  –remove or complete them
manual revision?
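The first three refinement steps can be sketched as below. The link data and the lemma tables are invented for the example, case is neutralised everywhere rather than only "where appropriate", and partial links and manual revision are not covered.

```python
from collections import Counter, defaultdict

def refine(links, source_lemmas, target_lemmas):
    """Lowercase and lemmatise (source, target) word links, counting each pair."""
    counts = Counter()
    for source, target in links:
        source, target = source.lower(), target.lower()
        source = source_lemmas.get(source, source)   # unknown forms stay as they are
        target = target_lemmas.get(target, target)
        counts[(source, target)] += 1
    return counts


def ambiguous_sources(counts):
    """Source units linked to more than one target unit."""
    targets = defaultdict(set)
    for source, target in counts:
        targets[source].add(target)
    return {s: sorted(t) for s, t in targets.items() if len(t) > 1}


# Hypothetical word links and lemma table.
LINKS = [("Motorn", "motor"), ("motorn", "engine"),
         ("Olja", "oil"), ("oljan", "oil")]
SOURCE_LEMMAS = {"motorn": "motor", "oljan": "olja"}
```

After refinement the two olja links collapse into one entry with frequency 2, and motor is flagged as ambiguous between engine and motor, which is where criteria for resolving the ambiguity would be needed.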

Informally about statistical MT
build a translation dictionary based on word alignment
aim for as big fragments as possible
keep information on link frequency
build an n-gram model of the target language
implement a direct translation strategy
  –including alternatives ordered by length and frequency
process the output by the n-gram model, filtering out the best alternatives, and adjust the translation accordingly
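The recipe above can be sketched on toy data. This is a simplified illustration, not a statistical MT system: source fragments are single tokens, alternatives are ordered by frequency only, the n-gram model is a table of raw bigram counts, and all data are invented.

```python
from collections import Counter, defaultdict
from itertools import product

def build_dictionary(links):
    """Translation dictionary from (source, target) links, alternatives by frequency."""
    table = defaultdict(Counter)
    for source, target in links:
        table[source][target] += 1
    return {s: [t for t, _ in c.most_common()] for s, c in table.items()}


def train_bigrams(sentences):
    """Bigram counts over target-language sentences, with boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def translate(source_tokens, dictionary, bigrams):
    """Direct translation with alternatives; the bigram model picks the best one."""
    alternatives = [dictionary[token] for token in source_tokens]

    def score(candidate):
        tokens = ["<s>"] + " ".join(candidate).split() + ["</s>"]
        return sum(bigrams[pair] for pair in zip(tokens, tokens[1:]))

    return " ".join(max(product(*alternatives), key=score))


# Hypothetical word links and target-language corpus.
LINKS = [("motorn", "the engine"), ("motorn", "the motor"),
         ("motorn", "the motor"), ("stannar", "stops")]
TL_CORPUS = ["the engine stops", "the engine starts", "the motor runs"]
```

With these data the dictionary prefers "the motor" on frequency alone, but the bigram counts favour "the engine stops", so the language model overrides the most frequent link.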

Example-based MT
HS (in the reading list)

Some current research topics
intersentential dependencies
hybrid systems: data-driven and rule-driven
improved alignment techniques
improved language modeling in ST
automatic learning from post-editing
translation by structural correspondences
translation of spoken language
improved preference strategies
ambiguity-preserving translation

Intersentential dependencies
pronoun resolution
lexical ambiguity resolution, such as
  –(torkar)motorn → the motor
  –(förbrännings)motorn → the engine
fluency
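The motorn case can be sketched as a lookup in the preceding sentences. This is a toy heuristic under stated assumptions: the function name, the default translation and the idea of taking the most recent compound as antecedent are illustrative, not a description of how any system resolves the ambiguity.

```python
# Choosing the translation of "motorn" from an earlier, unambiguous compound.
COMPOUND_SENSES = {"torkarmotorn": "the motor",
                   "förbränningsmotorn": "the engine"}


def translate_motorn(previous_sentences, default="the motor"):
    """Use the most recent compound antecedent; fall back to an assumed default."""
    for sentence in reversed(previous_sentences):
        cleaned = sentence.lower().replace(",", " ").replace(".", " ")
        for token in cleaned.split():
            if token in COMPOUND_SENSES:
                return COMPOUND_SENSES[token]
    return default
```

Given a preceding sentence that mentions Torkarmotorn, a later bare motorn is rendered as "the motor"; after Förbränningsmotorn it becomes "the engine".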

Preserving the information structure
information structure is expressed in different ways in the source and the target
syntactic clues are exploited in the analysis to compute the information structure (topic-focus articulation)
information structure is used to guide the generation

An example
Torkarmotorn M2 är sammankopplad med omkopplare S24 och intervallrelä R22. För att inte motorn skall överbelastas, t.ex. om torkarbladen fastnat, finns en inbyggd termovakt som bryter strömmen till motorn när …
Wiper motor M2 is connected to switch S24 and intermittent relay R22. To prevent motor overload, e.g. if the wiper blade gets stuck, there is an integral thermal sensor which breaks the current to the motor when …

Preferences
syntactic preferences
  –the principle of right association
  –the principle of minimal attachment
  –two-stage processing
semantic preferences
  –lexical selectional restrictions
  –lexical contextual rules
  –conceptual taxonomies
  –likelihood of occurrence
See further Bennett, P. & Paggio, P., 1993, Preference in Eurotra.

Preferences in Multra
parsing
  –a formalism for expressing syntactic preferences in the parse
    not fully developed
transfer
  –contextual lexical rules
  –rule specificity
generation
  –rule specificity

Hybrid systems
aims
components
problems
architecture
scores

Aims of a hybrid system
simple techniques for simple tasks
complex techniques for complex tasks

Components of a hybrid system
component strategies
  –translation memory
    full sentences
    fragments
    direct translation
  –statistical translation
  –EBMT

Component strategies, cont’d
rule-based translation
  –simplistic analysis (cf. direct translation)
    word by word (S → sequence of words)
    phrase by phrase (S → sequence of phrases)
  –partial parsing
  –full parsing

Problems of a hybrid system
how does the system know when a simple technique is appropriate?
  –does the source tell?
  –does the target tell?

Architecture and scores
simple first?
concerting results?
scoring?
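One possible answer to "simple first?" and "scoring?" is a cascade in which each component returns a translation with a score and the first sufficiently confident result wins. The sketch below is a hypothetical design, not a description of an existing hybrid system; the component interface, the scores and the threshold are all assumptions.

```python
# A "simple first" cascade: components are tried in order of complexity.
# Each component maps a sentence to (translation or None, score in [0, 1]).

def translation_memory(sentence, memory):
    """Simplest component: exact match against stored sentence links."""
    hit = memory.get(sentence)
    return (hit, 1.0) if hit is not None else (None, 0.0)


def hybrid_translate(sentence, components, threshold=0.5):
    """Return (translation, score, component name) from the first component
    whose score reaches the threshold, else the best-scoring attempt."""
    best = (None, 0.0, None)
    for name, component in components:
        translation, score = component(sentence)
        if translation is None:
            continue
        if score >= threshold:
            return translation, score, name
        if score > best[1]:
            best = (translation, score, name)
    return best
```

A usage example would list the translation memory first and a rule-based or statistical component after it, so that the costlier component runs only when the memory has no match.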

Improved techniques for re-use of translation
combining clues for word alignment (Tiedemann 2003)
interactive word alignment (Ahrenberg et al. 2003)
parallel treebanks

Translation by structural correspondences
LFG
HPSG

Translation of spoken language
See Krauwer, Steven (ed.), 2000, Machine Translation, June, Volume 15, Issue 1-2, Special issue on Spoken Language Translation.