Download presentation
Presentation is loading. Please wait.
Published byArely Sudbury Modified over 9 years ago
1
Agreement matters: Challenges of translating into a MRL Yoav Goldberg Ben Gurion University
2
Why should we care about syntactic modeling of MRLs?
3
A brief summary
4
What Kevin said.
5
Example: English Hebrew I wash the car
6
Example: English Hebrew אני רוחץ את המכונית I wash the car (I wash the car)
7
Example: English Hebrew I wash the floor אני רוחץ את המכונית I wash the car (I wash the car)
8
Example: English Hebrew I wash the floor אני רוחץ את המכונית I wash the car (I wash the car) אני שוטפת את הרצפה (I wash the floor)
9
Example: English Hebrew I wash the floor אני רוחץ את המכוניתאני שוטפת את הרצפה I wash the car (I wash the floor) (I wash the car)
10
Example: English Hebrew I wash the floor אני רוחץ את המכוניתאני שוטפת את הרצפה I wash the car (I wash the floor) (I wash the car) FeminineMasculine
11
Hebrew Verbs are morphologically marked for Gender (and Number, and Person, and Tense..) Hebrew Fact 1
12
Example: English Hebrew I wash the floor אני רוחץ את המכוניתאני שוטפת את הרצפה I wash the car (I wash the floor) (I wash the car) FeminineMasculine Are these bad translations?
13
Example: English Hebrew I wash the floor אני רוחץ את המכוניתאני שוטפת את הרצפה I wash the car (I wash the floor) (I wash the car) FeminineMasculine Are these bad translations? No. These are actually quite good. No gender information in source. Target must indicate gender. translator uses world knowledge.
14
Let’s have some fun
15
Language Models as Social Indicators I love her I love him
16
Language Models as Social Indicators I love her I love him אני אוהב אותה אני אוהבת אותו
17
Language Models as Social Indicators I love her I love him I love meat I love vegetables אני אוהב אותה אני אוהבת אותו
18
Language Models as Social Indicators I love her I love him I love meat I love vegetables אני אוהב אותה אני אוהבת אותו אני אוהב בשר אני אוהבת ירקות
19
Language Models as Social Indicators I love her I love him I love meat I love vegetables I love to eat I love to cook אני אוהב אותה אני אוהבת אותו אני אוהב בשר אני אוהבת ירקות
20
Language Models as Social Indicators I love her I love him I love meat I love vegetables I love to eat I love to cook אני אוהב אותה אני אוהבת אותו אני אוהב בשר אני אוהבת ירקות אני אוהב לאכול אני אוהבת לבשל
21
Language Models as Social Indicators I love her I love him I love meat I love vegetables I love to eat I love to cook I love hash I love marijuana אני אוהב אותה אני אוהבת אותו אני אוהב בשר אני אוהבת ירקות אני אוהב לאכול אני אוהבת לבשל
22
Language Models as Social Indicators I love her I love him I love meat I love vegetables I love to eat I love to cook I love hash I love marijuana אני אוהב אותה אני אוהבת אותו אני אוהב בשר אני אוהבת ירקות אני אוהב לאכול אני אוהבת לבשל אני אוהב חשיש אני אוהבת מריחואנה
23
Language Models as Social Indicators I hate him. אני שונאת אותו.
24
Language Models as Social Indicators I hate him. I hate her. אני שונאת אותו.
25
Language Models as Social Indicators I hate him. I hate her. אני שונאת אותו. אני שונאת אותה.
26
Language Models as Social Indicators I hate him. I hate her. I hate him אני שונאת אותו. אני שונאת אותה. אני שונא אותו
27
Language Models as Social Indicators I hate him. I hate her. I hate him אני שונאת אותו. אני שונאת אותה. אני שונא אותו Really? A dot?! Not very stable…
28
Language Models as Social Indicators I hate him. I hate her. I hate him I hate her אני שונאת אותו. אני שונאת אותה. אני שונא אותו אני שונאת אותה Really? A dot?! Not very stable…
29
Language Models as Social Indicators I hate him. I hate her. I hate him I hate her אני שונאת אותו. אני שונאת אותה. אני שונא אותו אני שונאת אותה Really? A dot?! Not very stable… Hmm… is there a message here after all?
30
I love I hate Language Models as Social Indicators
31
I love I hate Language Models as Social Indicators אני אוהב אני שונאת
32
Back to Machine Translation
33
One English Many Hebrew love אוהב אוהבת אוהבים אוהבות Need to acquire more knowledge … use larger parallel corpora … use dictionaries … use FSAs to model inflections Let’s assume this is solved
34
One English Many Hebrew love אוהב אוהבת אוהבים אוהבות Need to acquire more knowledge … use larger parallel corpora … use dictionaries … use FSAs to model inflections Let’s assume this is solved
35
One English Many Hebrew love אוהב אוהבת אוהבים אוהבות Need to acquire more knowledge … use larger parallel corpora … use dictionaries … use FSAs to model inflections Let’s assume this is solved
36
One English Many Hebrew love אוהב אוהבת אוהבים אוהבות Need to acquire more knowledge … use larger parallel corpora … use dictionaries … use FSAs to model inflections Let’s assume this is solved
37
One English Many Hebrew love אוהב אוהבת אוהבים אוהבות Need to acquire more knowledge … use larger parallel corpora … use dictionaries … use FSAs to model inflections Let’s assume this is solved
38
One English Many Hebrew love אוהב אוהבת אוהבים אוהבות Need to acquire more knowledge … use larger parallel corpora … use dictionaries … use FSAs to model inflections Let’s assume this is solved
39
One English Many Hebrew Hebrew English: easy אוהבת אני I love אוהב אני I love אוהבים אנו We love אוהבות אנו We love English Hebrew: hard I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Don’t worry about it. Just say “love”. The reader will decide. Which form to choose? Translator must decide. HOW??
40
One English Many Hebrew Hebrew English: easy אוהבת אני I love אוהב אני I love אוהבים אנו We love אוהבות אנו We love English Hebrew: hard I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Don’t worry about it. Just say “love”. The reader will decide. Which form to choose? Translator must decide. HOW??
41
One English Many Hebrew Hebrew English: easy אוהבת אני I love אוהב אני I love אוהבים אנו We love אוהבות אנו We love English Hebrew: hard I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Don’t worry about it. Just say “love”. The reader will decide. Which form to choose? Translator must decide. HOW??
42
One English Many Hebrew Hebrew English: easy אוהבת אני I love אוהב אני I love אוהבים אנו We love אוהבות אנו We love English Hebrew: hard I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Don’t worry about it. Just say “love”. The reader will decide.
43
One English Many Hebrew Hebrew English: easy אוהבת אני I love אוהב אני I love אוהבים אנו We love אוהבות אנו We love English Hebrew: hard I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Don’t worry about it. Just say “love”. The reader will decide. Which form to choose? Translator must decide. HOW??
44
When translating into an MRL: Many possible word forms – Hard to acquire [but assume its solved] Need to choose correct inflection
45
One English Many Hebrew Hebrew English: easy אוהבת אני I love אוהב אני I love אוהבים אנו We love אוהבות אנו We love English Hebrew: hard I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Don’t worry about it. Just say “love”. The reader will decide. Which form to choose? Translator must decide. HOW??
46
One English Many Hebrew Hebrew English: easy אוהבת אני I love אוהב אני I love אוהבים אנו We love אוהבות אנו We love English Hebrew: hard I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Don’t worry about it. Just say “love”. The reader will decide. Which form to choose? Translator must decide. HOW?? Just choose one at random? In the worst case we’ll insult someone..
47
Hebrew Verbs agree with Subject on gender and number Hebrew Fact 2
48
I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות Agreement dictates form
49
I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות singular Agreement dictates form
50
I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות singular plural Agreement dictates form
51
I love? אני אוהב ? אני אוהבת ? אני אוהבים ? אני אוהבות singular plural ✔ ✔ ✘ ✘ Agreement dictates form
52
The girls love? הבחורות אוהב ? הבחורות אוהבת ? הבחורות אוהבים ? הבחורות אוהבות plural / fem no missing information
53
The girls love? הבחורות אוהב ? הבחורות אוהבת ? הבחורות אוהבים ? הבחורות אוהבות plural / fem no missing information plural / fem sing / masc plural / masc sing / fem plural / fem
54
The girls love? הבחורות אוהב ? הבחורות אוהבת ? הבחורות אוהבים ? הבחורות אוהבות plural / fem ✘ no missing information ✘ ✔ ✘ plural / fem sing / masc plural / masc sing / fem plural / fem
55
When translating into an MRL: Many possible word forms – Hard to acquire [but assume its solved] Need to choose correct inflection Inflection is determined based on information which is external to the word
56
Back to Google Translate The boy washes the car The girl washes the car
57
Back to Google Translate The boy washes the car הנער רוחץ את המכונית The girl washes the car הילדה רוחצת את המכונית
58
Back to Google Translate The boy washes the car הנער רוחץ את המכונית The girl washes the car הילדה רוחצת את המכונית ✔
59
Back to Google Translate The boy washes the car הנער רוחץ את המכונית The girl washes the car הילדה רוחצת את המכונית Good job Franz! ✔
60
Back to Google Translate The boy washes the car הנער רוחץ את המכונית The girl washes the car הילדה רוחצת את המכונית Good job Franz?
61
Back to Google Translate The boy washes the car הנער רוחץ את המכונית The girl washes the car הילדה רוחצת את המכונית Good job Franz? childyoung-man
62
Back to Google Translate The boy with the sunglasses washes the floor הנער עם משקפי השמש שוטפת את הרצפה Good job Franz? The girl with the sunglasses washes the car הבחורה עם משקפי השמש שוטף את המכונית chick
63
Back to Google Translate The boy with the sunglasses washes the floor הנער עם משקפי השמש שוטפת את הרצפה Good job Franz? The girl with the sunglasses washes the car הבחורה עם משקפי השמש שוטף את המכונית
64
Back to Google Translate The boy with the sunglasses washes the floor הנער עם משקפי השמש שוטפת את הרצפה Good job Franz? The girl with the sunglasses washes the car הבחורה עם משקפי השמש שוטף את המכונית ✘ ✘
65
Back to Google Translate The boy with the sunglasses washes the floor הנער עם משקפי השמש שוטפת את הרצפה Good job Franz? The girl with the sunglasses washes the car הבחורה עם משקפי השמש שוטף את המכונית young-man chick ✘ ✘
66
What happened? Long distance agreement Can’t be represented in phrase-table Can’t be represented in n-gram LM Local “semantic” information from LM/Phrase Bad translation (ungrammatical)
67
What happened? Long distance agreement Can’t be represented in phrase-table Can’t be represented in n-gram LM Local “semantic” information from LM/Phrase Bad translation (ungrammatical) It’s not Franz’s fault, but the system’s
68
When translating into an MRL: Many possible word forms – Hard to acquire [but I assume its solved] Need to choose correct inflection Inflection is determined based on information which is external to the word and frequently far away from it
69
When translating into an MRL: Many possible word forms – Hard to acquire [but I assume its solved] Need to choose correct inflection Inflection is determined based on information which is external to the word and frequently far away from it Distance from Verb to Subject in the Hebrew Dependency Treebank (news domain) S-V Dep-LengthCountPercent 1321842% 2150419% 391412% 44055% 52974% >5132217%
70
When translating into an MRL: Many possible word forms – Hard to acquire [but I assume its solved] Need to choose correct inflection Inflection is determined based on information which is external to the word and frequently far away from it Distance from Verb to Subject in the Hebrew Dependency Treebank (news domain) S-V Dep-LengthCountPercent 1321842% 2150419% 391412% 44055% 52974% >5132217% 2 words apart is already though for reliably estimating in an n-gram based system!
71
When translating into an MRL: Many possible word forms – Hard to acquire [but I assume its solved] Need to choose correct inflection Inflection is determined based on information which is external to the word and frequently far away from it
72
When translating into an MRL: Many possible word forms – Hard to acquire [but I assume its solved] Need to choose correct inflection Inflection is determined based on information which is external to the word and frequently far away from it Phrase based + N-gram LM can’t do it
73
What if both languages are MRLs? Gender/number marked on both sides No need for word-external information We can translate word word again MRL MRL is easy!
74
What if both languages are MRLs? Gender/number marked on both sides No need for word-external information We can translate word word again MRL MRL is easy! Wrong!
75
What if both languages are MRLs? Gender/number marked on both sides But: – agreement patterns differ between languages – gender information differ between languages
76
What if both languages are MRLs? Example: – Spanish and Hebrew have adjective-noun agreement
77
What if both languages are MRLs? Example: – Spanish and Hebrew have adjective-noun agreement – new shirt חולצה חדשה nueva camisa – new car מכונית חדשה nuevo automovil
78
What if both languages are MRLs? Example: – Spanish and Hebrew have adjective-noun agreement – new shirt חולצה חדשה nueva camisa – new car מכונית חדשה nuevo automovil
79
– new computer מחשב חדש nueva computadora What if both languages are MRLs? Example: – Spanish and Hebrew have adjective-noun agreement – new shirt חולצה חדשה nueva camisa – new car מכונית חדשה nuevo automovil
80
– new computer מחשב חדש nueva computadora What if both languages are MRLs? Example: – Spanish and Hebrew have adjective-noun agreement – new shirt חולצה חדשה nueva camisa – new car מכונית חדשה nuevo automovil חדש nuevo חדשה nueva
81
– new computer מחשב חדש nueva computadora What if both languages are MRLs? Example: – Spanish and Hebrew have adjective-noun agreement – new shirt חולצה חדשה nueva camisa – new car מכונית חדשה nuevo automovil חדש nuevo חדשה nueva - Many-to-many mapping - Correct form still depends on external information - More chances for error - Acquiring all the pairs from parallel corpora is harder
82
Must consider (at least) syntax Phrase-tables and n-grams still can’t do it
83
When Translating into an MRL: MT systems must be aware of gender/number Should have a notion of agreement Use syntax to enforce agreement
84
Source-side Syntax washes floor the The boy with the sunglasses subjobj
85
Source-side Syntax washes floor the The boy with the sunglasses subjobj נער ילד בחור masc/sing
86
Source-side Syntax washes floor the The boy with the sunglasses subjobj נער ילד בחור שוטף שוטפת שוטפים שוטפות masc/sing fem /sing masc/plural fem /plural
87
Source-side Syntax washes floor the The boy with the sunglasses subjobj נער ילד בחור שוטף שוטפת שוטפים שוטפות masc/sing fem /sing masc/plural fem /plural Agree!
88
Source-side Syntax washes floor the The boy with the sunglasses subjobj נער ילד בחור שוטף שוטפת שוטפים שוטפות masc/sing fem /sing masc/plural fem /plural Agree!
89
Source-side Syntax washes floor the The boy with the sunglasses subjobj נער ילד בחור שוטף שוטפת שוטפים שוטפות masc/sing fem /sing masc/plural fem /plural Problems: - How to obtain gender/number information? - How to decode efficiently? - Agreement behavior is not always that simple
90
Target-side Syntax xLNT transducers can model agreement
91
Target-side Syntax NN masc/sing ( ילד ) boy VB masc/sing ( שוטף ) washes VB fem/sing ( שוטפת ) washes NP masc/sing (x0:DT x1:NN masc/sing x2:PP) x0 x1 x2 VP masc/sing (x0:VB masc/sing x1:NP) x0 x1 S masc/sing (x0:NP masc/sing x1:VP masc/sing ) x0 x1 xLNT transducers can model agreement
92
Target-side Syntax NN masc/sing ( ילד ) boy VB masc/sing ( שוטף ) washes VB fem/sing ( שוטפת ) washes NP masc/sing (x0:DT x1:NN masc/sing x2:PP) x0 x1 x2 VP masc/sing (x0:VB masc/sing x1:NP) x0 x1 S masc/sing (x0:NP masc/sing x1:VP masc/sing ) x0 x1 Gender and number information encoded in the lexical rules xLNT transducers can model agreement
93
Target-side Syntax NN masc/sing ( ילד ) boy VB masc/sing ( שוטף ) washes VB fem/sing ( שוטפת ) washes NP masc/sing (x0:DT x1:NN masc/sing x2:PP) x0 x1 x2 VP masc/sing (x0:VB masc/sing x1:NP) x0 x1 S masc/sing (x0:NP masc/sing x1:VP masc/sing ) x0 x1 Gender and number information encoded in the lexical rules Agreement information encoded in the grammar xLNT transducers can model agreement
94
Target-side Syntax NN masc/sing ( ילד ) boy VB masc/sing ( שוטף ) washes VB fem/sing ( שוטפת ) washes NP masc/sing (x0:DT x1:NN masc/sing x2:PP) x0 x1 x2 VP masc/sing (x0:VB masc/sing x1:NP) x0 x1 S masc/sing (x0:NP masc/sing x1:VP masc/sing ) x0 x1 Gender and number information encoded in the lexical rules Agreement information encoded in the grammar xLNT transducers can model agreement Problems: - How to obtain gender/number information? - Grammar is going to be huge (can we make it smaller?) - How are we going to obtain the grammar? efficiently encoding morphological processes in a treebank grammar an open research question
95
On The Parsing Side of Things Most work on parsing MRLs: – consider morphology to be a lexicon-level issue Many inflections high OOV rate – Ignoring morphology at syntax-level – PCFGLA works frustratingly well Recently: – Smarter modeling of morphology at syntax level – Using morphological agreement to improve parsing Modest benefits to parsing accuracy PCFGLA still better
96
On The Parsing Side of Things Most work on parsing MRLs: – consider morphology to be a lexicon-level issue Many inflections high OOV rate – Ignoring morphology at syntax-level – PCFGLA works frustratingly well Recently: – Smarter modeling of morphology at syntax level – Using morphological agreement to improve parsing Modest benefits to parsing accuracy PCFGLA still better
97
On The Parsing Side of Things Most work on parsing MRLs: – consider morphology to be a lexicon-level issue Many inflections high OOV rate – Ignoring morphology at syntax-level – PCFGLA works frustratingly well Recently: – Smarter modeling of morphology at syntax level – Using morphological agreement to improve parsing Modest benefits to parsing accuracy PCFGLA still better
98
On The Parsing Side of Things Most work on parsing MRLs: – consider morphology to be a lexicon-level issue Many inflections high OOV rate – Ignoring morphology at syntax-level – PCFGLA works frustratingly well Recently: – Smarter modeling of morphology at syntax level – Using morphological agreement to improve parsing Modest benefits to parsing accuracy PCFGLA still better
99
On The Parsing Side of Things Most work on parsing MRLs: – consider morphology to be a lexicon-level issue Many inflections high OOV rate – Ignoring morphology at syntax-level – PCFGLA works frustratingly well Recently: – Smarter modeling of morphology at syntax level – Using morphological agreement to improve parsing Modest benefits to parsing accuracy PCFGLA still better
100
On The Parsing Side of Things Most work on parsing MRLs: – consider morphology to be a lexicon-level issue Many inflections high OOV rate – Ignoring morphology at syntax-level – PCFGLA works frustratingly well Recently: – Smarter modeling of morphology at syntax level – Using morphological agreement to improve parsing Modest benefits to parsing accuracy PCFGLA still better
101
On The Parsing Side of Things Most work on parsing MRLs: – consider morphology to be a lexicon-level issue Many inflections high OOV rate – Ignoring morphology at syntax-level – PCFGLA works frustratingly well Recently: – Smarter modeling of morphology at syntax level – Using morphological agreement to improve parsing Modest benefits to parsing accuracy PCFGLA still better PCFGLA is not modeling agreement
102
Rebels without a cause? Syntax-based MT: – Neat! – Only marginally better than phrase-based
103
Rebels without a cause? Syntax-based MT: – Neat! – Only marginally better than phrase-based English grammaticality is relatively easy to capture using local information
104
Rebels without a cause? Syntax-based MT: – Neat! – Only marginally better than phrase-based Syntax-level morphology in parsing: – Neat! – Only marginally better than ignoring it English grammaticality is relatively easy to capture using local information
105
Rebels without a cause? Syntax-based MT: – Neat! – Only marginally better than phrase-based Syntax-level morphology in parsing: – Neat! – Only marginally better than ignoring it English grammaticality is relatively easy to capture using local information I should work harder. Not many agreement mistakes to begin with. Agreement is a generation issue more than a parsing one.
106
Rebels without a cause? Syntax-based MT: – Neat! – Only marginally better than phrase-based Syntax-level morphology in parsing: – Neat! – Only marginally better than ignoring it Both are crucial for translating into MRLs!
107
To Conclude Translating into MRLs brings new challenges Syntax is crucial – If you are not looking into syntax, you should! – If you are looking into syntax – look deeper! Plenty of interesting work to be done
108
To Conclude Translating into MRLs brings new challenges Syntax is crucial – If you are not looking into syntax, you should! – If you are looking into syntax – look deeper! Plenty of interesting work to be done – Finishing up my phd on parsing Looking for a postdoc position for next year
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.