AVENUE: Machine Translation for Resource-Poor Languages NSF ITR 2001-2005.


1 AVENUE: Machine Translation for Resource-Poor Languages NSF ITR 2001-2005

2 Project Members: Automated Rule Learning Faculty –Jaime Carbonell –Ralf Brown –Alon Lavie –Lori Levin Coordinator of Latin American Projects –Rodolfo Vega Graduate Students –Ariadna Font Llitjos –Katharina Probst –Christian Monson –Erik Peterson

3 Resource-Poor Languages Not enough linguists to write a human-engineered system. Not enough corpora to build a corpus-based system. No standard orthography. May be spoken by hundreds of thousands of people (Mapudungun, Chile) or by only a few elderly people (Siona, Colombia).

4 AVENUE languages AVENUE is currently working with: –Mapudungun [Chile] –Inupiaq [Alaska] –Aymara, Quechua and Aguaruna [Peru] –Siona [Colombia]

5 Mapudungun for the Mapuche
Chile
–Official Language: Spanish
–Population: ~15 million
~1/2 million Mapuche people
–Language: Mapudungun

6 Where can Avenue make a difference for indigenous communities? To contribute to the development of the indigenous people at the local and national level

7 There are two possible ways to do this: A traditional way, from experts on development –Outcome: To translate government policy documents, on health care, law, agriculture, etc. An alternative way, from local experts, grounded in the community’s experience and needs –Outcome: To contribute to language education in the form of literacy and second language acquisition

8 Inter- and multi-cultural bilingual education An educational strategy contributing to the development of the indigenous culture beyond the point of subsistence. Helping each individual and their communities to achieve excellence in a multicultural national and global context. Increasing the use of information and communication technologies, in a life-long learning environment.

9 In exchange for the language data, we agree to contribute to the creation of the following products:
–Plug-in orthographic corrector for word processors
–Electronic dictionary
–Web-based translator
–Intelligent tutor for literacy and second language acquisition

10 Our last meeting in Temuco, May 2002

11 Automatic Learning of a Transfer-based MTS
[Architecture diagram. Component labels: Elicitation corpus; SVS algorithm; Transfer module; tentative Transfer rules; Rule Refinement module; SL sentences; (tentative) TL sentences; Morphology learning; Morphological analyzer. People named in the diagram: Katharina Probst, Erik Peterson, Ariadna Font, Christian Monson.]

12 Morphology Analyzer for Rule Based Machine Translation

13

14 Example and Motivation

15 Results
Language: English Corpus: Brown Corpus Set Accuracy: 88.3%
Example Clusters:
–NULL:s navigator, discourse, peptide, …
–NULL:’s smith, china, cook, …
–NULL:ed slim, reappeared, munch, …
–NULL:ing reappear, respond, grunt, …
–NULL:ly peaceful, remote, superb, …
–…
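Each cluster above pairs a suffix alternation (for example NULL vs. "s") with stems that occur in both forms. The slides do not describe the learning algorithm itself, so the following is only a minimal sketch of what such a cluster looks like, assuming a plain word list and a hand-supplied list of candidate suffixes; the function name `null_suffix_clusters` and the sample words are hypothetical.

```python
from collections import defaultdict


def null_suffix_clusters(words, suffixes):
    """Group stems that occur both bare (NULL) and with a given suffix.

    Illustration only: the candidate suffixes are supplied by hand here,
    whereas a real morphology learner would have to induce them.
    """
    vocab = set(words)
    clusters = defaultdict(set)
    for word in vocab:
        for suffix in suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: -len(suffix)]
                if stem in vocab:  # the bare stem is also attested
                    clusters[("NULL", suffix)].add(stem)
    return {pair: sorted(stems) for pair, stems in clusters.items()}


words = ["navigator", "navigators", "respond", "responding",
         "grunt", "grunting", "remote", "remotely"]
clusters = null_suffix_clusters(words, ["s", "ing", "ly"])
for (bare, suffix), stems in sorted(clusters.items()):
    print(f"{bare}:{suffix}", ", ".join(stems))
```

A stem is kept only when both the bare and the suffixed form are attested, which is what makes the pair an alternation rather than a coincidental word ending.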

16 Future Directions More languages –Spanish –Mapudungun More types of morphology –Prefixes –Infixes Employ a human informant –Small amount of knowledge might help a lot

17 AVENUE Transfer Engine Written specifically for automatically learned rules –Integrated with rule learner –Can also be augmented with hand-written rules Currently researching constructions –Constructions are non-compositional structures –Many translation problems associated with constructions

18 Translation Example
总统会辞职吗?
president will resign QUEST
Transfer English Output: Will the president resign?
During translation:
–Question particle 吗 is deleted
–Auxiliary “will” is reordered before subject
–“the” is added before “president”
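The three edits listed for this example (delete the question particle, front the auxiliary, insert a determiner) can be sketched as plain list operations. This is not the AVENUE transfer engine; it assumes the input is already glossed and tagged, and the tag names (`n`, `aux`, `v`, `qpart`) and the function name are hypothetical.

```python
def transfer_yes_no_question(tagged):
    """Apply the slide's three edits to a glossed, POS-tagged sentence."""
    # 1. Delete the question particle.
    words = [(w, pos) for w, pos in tagged if pos != "qpart"]
    # 2. Reorder the auxiliary before the subject.
    aux = [item for item in words if item[1] == "aux"]
    rest = [item for item in words if item[1] != "aux"]
    # 3. Insert "the" before each noun (a simplification for this example).
    out = []
    for word, pos in aux + rest:
        if pos == "n":
            out.append("the")
        out.append(word)
    sentence = " ".join(out)
    return sentence[0].upper() + sentence[1:] + "?"


gloss = [("president", "n"), ("will", "aux"), ("resign", "v"), ("QUEST", "qpart")]
print(transfer_yes_no_question(gloss))
```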

19 New approach to MT Fully automatic (no human intervention) Very little electronic data available → elicitation corpus Machine learning techniques –Seeded version space algorithm to automatically learn transfer rules –Interactive and automatic refinement of transfer rules

20 Elicitation Tool

21 Rule Learning – Overview
Goal: learn transfer rules for a language pair where one language is resource-rich, the other is resource-poor
Learning proceeds in three steps:
1. Flat Seed Generation: “informed guessing” of transfer rules
2. Compositionality: adding structure to rules, using previously learned rules
3. Seeded Version Space Learning: generalizing rules to make them scale to more unseen examples

22 Flat Seed Generation - Example
The highly qualified applicant did not accept the offer.
Der äußerst qualifizierte Bewerber nahm das Angebot nicht an.
((1,1),(2,2),(3,3),(4,4),(6,8),(7,5),(7,9),(8,6),(9,7))
S::S [det adv adj n aux neg v det n] → [det adv adj n v det n neg vpart]
(;;alignments:
(x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
;;constraints:
((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
((y1 def) = *+) ((y3 case) = *nom) ((y4 agr) = *3-sing) …. )
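The flat seed rule in this example can be read directly off the two part-of-speech sequences and the word alignment. A minimal sketch, assuming the POS tags and alignment pairs are already given (as they are on the slide); the function name `flat_seed_rule` and the output layout are hypothetical, and the feature constraints are left out.

```python
def flat_seed_rule(sl_pos, tl_pos, alignments, top="S"):
    """Build a flat transfer rule from POS sequences and 1-based alignments."""
    for i, j in alignments:
        if not (1 <= i <= len(sl_pos) and 1 <= j <= len(tl_pos)):
            raise ValueError(f"alignment ({i},{j}) is out of range")
    aligned_x = {i for i, _ in alignments}
    return {
        "rule": f"{top}::{top} [{' '.join(sl_pos)}] -> [{' '.join(tl_pos)}]",
        "alignments": "".join(f"(x{i}::y{j})" for i, j in alignments),
        # Source positions with no target counterpart (here the auxiliary "did").
        "unaligned_x": [i for i in range(1, len(sl_pos) + 1) if i not in aligned_x],
    }


sl_pos = "det adv adj n aux neg v det n".split()
tl_pos = "det adv adj n v det n neg vpart".split()
alignments = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)]
seed = flat_seed_rule(sl_pos, tl_pos, alignments)
print(seed["rule"])
print(seed["alignments"])
```

Note that x7 ("accept") aligns to both y5 ("nahm") and y9 ("an"), so the alignment list is kept as pairs rather than as a one-to-one mapping.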

23 Compositionality - Example
S::S [det adv adj n aux neg v det n] → [det adv adj n v det n neg vpart]
(;;alignments:
(x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
;;constraints:
((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
((y1 def) = *+) ((y3 case) = *nom) ((y4 agr) = *3-sing) …. )
S::S [NP aux neg v det n] → [NP v det n neg vpart]
(;;alignments:
(x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
;;constraints:
((x2 tense) = *past) ….
((y1 def) = *+) ((y1 case) = *nom) …. )
NP::NP [det ADJP n] → [det ADJP n]
((x1::y1)… ((y3 agr) = *3-sing) ((x3 agr) = *3-sing) ….)
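The step from the flat S rule to the structured one amounts to collapsing an aligned span on each side into a single constituent and renumbering the alignments. A minimal sketch under that reading, assuming the span to collapse (positions 1 to 4 on both sides) has already been identified by a previously learned NP rule; the helper names `remap` and `compose` are hypothetical and constraints are ignored.

```python
def remap(index, start, end):
    """Map a 1-based position to its position after start..end collapses to one slot."""
    if index < start:
        return index
    if index <= end:
        return start
    return index - (end - start)


def compose(sl, tl, alignments, sl_span, tl_span, label):
    """Replace an aligned span on both sides with one constituent label."""
    (s1, s2), (t1, t2) = sl_span, tl_span
    for i, j in alignments:
        # The two spans must align only with each other.
        if (s1 <= i <= s2) != (t1 <= j <= t2):
            raise ValueError(f"alignment ({i},{j}) crosses the span boundary")
    new_sl = sl[: s1 - 1] + [label] + sl[s2:]
    new_tl = tl[: t1 - 1] + [label] + tl[t2:]
    new_align = []
    for i, j in alignments:
        pair = (remap(i, s1, s2), remap(j, t1, t2))
        if pair not in new_align:  # span-internal links merge into one
            new_align.append(pair)
    return new_sl, new_tl, new_align


sl = "det adv adj n aux neg v det n".split()
tl = "det adv adj n v det n neg vpart".split()
flat = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)]
new_sl, new_tl, new_align = compose(sl, tl, flat, (1, 4), (1, 4), "NP")
print(new_sl, new_tl, new_align)
```

On the slide's data this reproduces the alignment list of the structured rule, (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4).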

24 Seeded Version Space Learning - Example
S::S [NP aux neg v det n] → [NP v det n neg vpart]
(;;alignments:
(x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
;;constraints:
((x2 tense) = *past) ….
((y1 def) = *+) ((y1 case) = *nom) ((y1 agr) = *3-sing) … ((y3 agr) = *3-sing) ((y4 agr) = *3-sing) … )
S::S [NP aux neg v det n] → [NP v det n neg vpart]
(;;alignments:
(x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
;;constraints:
((x2 tense) = *past) …
((y1 def) = *+) ((y1 case) = *nom) ((y1 agr) = *3-plu) … ((y3 agr) = *3-plu) ((y4 agr) = *3-plu) … )
S::S [NP aux neg v det n] → [NP v det n neg vpart]
(;;alignments:
(x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
;;constraints:
((x2 tense) = *past) …
((y1 def) = *+) ((y1 case) = *nom) ((y4 agr) = (y3 agr)) … )

25 Remaining Research Issues
–Improvement of existing algorithms
–Reversal of translation direction
–Learning with less information on the resource-poor language
–Learning from an unstructured corpus

26 Interactive and Automatic rule refinement 1. Given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification) 2. Determine blame assignment 3. Structure learning, as opposed to binary feedback, to automatically refine the existing rules

27 Interactive Learning Translation Correction Tool, web application Bilingual informants (no knowledge of linguistics assumed) User-friendly and intuitive interface Can naïve users reliably pinpoint the source of errors? Is MT error classification realistic? Need for user studies: –Spanish - English –English - Spanish –English - Chinese

28

29

30 Structure learning Given user feedback (correction + error classification) and blame assignment, modify the appropriate transfer rule(s) to obtain the correct translation. Need to evaluate based on cross-validation and on the number of sentences it can translate correctly (elicitation corpus). Learn mapping between incorrect structures and correct structures: She saw high woman → She saw the tall woman

31 A simple example
Spanish SLS: Ella vio a la mujer alta
English TLS: She saw high woman
Corrected TLS: She saw the tall woman
MT error classification: missing determiner + wrong lexical selection
Blame assignment (NP rule that generated the direct object + selectional restrictions)
Rule refinement: the Noun Phrase (NP) rule that generated the error, NP -> Adj N, needs to be refined into 2 different cases:
NP -> Det Adj N[sg] (the tall woman)
NP -> (Det) Adj N[pl] ((the)? tall women)
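The refinement in this example splits one rule into a singular case with an obligatory determiner and a plural case with an optional one. A minimal sketch of that split, assuming a rule is stored as a left-hand side plus a list of right-hand-side symbols; the function name `refine_missing_determiner` and the representation are hypothetical, and the wrong lexical selection (high vs. tall) is not handled here.

```python
def refine_missing_determiner(rule):
    """Split an NP rule into a singular and a plural case, as on the slide."""
    lhs, rhs = rule
    # Singular nouns require a determiner; plural nouns take it optionally.
    singular = (lhs, ["Det"] + [s + "[sg]" if s == "N" else s for s in rhs])
    plural = (lhs, ["(Det)"] + [s + "[pl]" if s == "N" else s for s in rhs])
    return [singular, plural]


def show(rule):
    lhs, rhs = rule
    return f"{lhs} -> {' '.join(rhs)}"


refined = refine_missing_determiner(("NP", ["Adj", "N"]))
for rule in refined:
    print(show(rule))
```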

32 Remaining research issues
–Refine MT error classification
–Blame assignment
–Structure learning algorithm
–Expand elicitation corpus with more verb subcategorization patterns

