AVENUE: Automatic Machine Translation for low-density languages. Ariadna Font Llitjós, Language Technologies Institute, SCS, Carnegie Mellon University.

1 AVENUE: Automatic Machine Translation for low-density languages
Ariadna Font Llitjós, Language Technologies Institute, SCS, Carnegie Mellon University

2 HCI project proposals
– Interface to Online Bilingual and Multilingual Dictionaries
– Translation Correction Tool interface: design, implementation and user studies

3 Online Bilingual and Multilingual Dictionaries
Bilingual and multilingual dictionaries for indigenous languages (Mapudungun [Chile], Inupiaq [Alaska], Aymara, Quechua and Aguaruna [Peru]).
For each bilingual/multilingual dictionary, we have (or will have) an Excel database created by the local teams (for Mapudungun, from a spoken corpus transcribed and translated into Spanish).

5 Online Bilingual and Multilingual Dictionaries (cont.)
For each entry, we give the translation in Spanish, some other linguistic information (POS), and a link to the actual sentence where it appears in the corpus. For example:
Püñpüñkünuukey: se manifiesta en forma de ronchas [English: it appears in the form of welts]
nmlch-nmfhp1_x_0031_nmfhp_00
Mapu: Fey itrofillpüle kuerpu, ta pichike püñpüñkünuukey ta kalül may, peñi.
Sp: Así es en todas partes del cuerpo, pequeñas ronchas se forman en el cuerpo pues, hermano [English: That is how it is all over the body; small welts form on the body, brother]

7 Online Bilingual and Multilingual Dictionaries (cont.)
Currently, users can search for:
– Mapudungun words
– Spanish words
– all the words starting with a given letter
– all the words containing a given word or string of characters
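The search modes listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the `Entry` record, its field names, the `search` function and its `mode` values are all hypothetical, and the second dictionary entry is an illustrative pair read off the example sentence on slide 5 with a made-up corpus reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str    # Mapudungun word
    spanish: str     # Spanish translation
    corpus_ref: str  # identifier of the corpus sentence the word comes from


ENTRIES = [
    # Entry taken from the example on slide 5.
    Entry("Püñpüñkünuukey", "se manifiesta en forma de ronchas",
          "nmlch-nmfhp1_x_0031_nmfhp_00"),
    # Illustrative entry only; the corpus reference is made up.
    Entry("peñi", "hermano", "hypothetical-ref-0001"),
]


def search(entries, query, field="headword", mode="word"):
    """Return the entries whose `field` matches `query`.

    mode="word":     the query is one of the whitespace-separated words
    mode="prefix":   the field starts with the query
    mode="contains": the query occurs anywhere in the field
    Matching ignores case.
    """
    q = query.casefold()
    results = []
    for entry in entries:
        text = getattr(entry, field).casefold()
        if mode == "word":
            hit = q in text.split()
        elif mode == "prefix":
            hit = text.startswith(q)
        elif mode == "contains":
            hit = q in text
        else:
            raise ValueError(f"unknown mode: {mode}")
        if hit:
            results.append(entry)
    return results


# Spanish word lookup, then "all words starting with p".
print([e.headword for e in search(ENTRIES, "ronchas", field="spanish")])
print([e.headword for e in search(ENTRIES, "p", mode="prefix")])
```

A real deployment would read the entries from the Excel database mentioned on slide 3 and would probably normalize Unicode before comparing, since characters such as ü and ñ can be encoded in more than one way.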

8 Online Bilingual and Multilingual Dictionaries (cont.)
Primary users:
– people in the indigenous communities
– researchers in these countries, inside and outside the indigenous communities
Chilean case:
– product of the Ministry of Education
– students and teachers, mostly Mapuche, but maybe some Spanish users as well

9 Online Bilingual and Multilingual Dictionaries (cont.)
Secondary users:
– Linguistics, Lexicography and Anthropology researchers from all over the world
– members of the general public browsing the web

10 Online Dictionaries: Tasks for HCII project
– analyze the design of the basic web interface: given a query for a word in either language, it presents the information for that entry to the user in the other language
– how to incorporate an audio file with the word as it was pronounced in the spoken corpus
– how to make it interactive, i.e. have bilingual users comment on the entries and possibly add new entries (needs user profile information)

11 Translation Correction Tool (TCTool)
AVENUE is a project that develops automatic machine translation systems (MTS) for low-density languages.
Since the translations are automatic, i.e. not perfect, we need to refine them.
Instead of relying on a professional translator, we want to find an automatic way to refine the output of the MTS -> TCTool

12 TCTool
We can use the TCTool to automatically learn a refinement of the transfer rules in our MTS from users' input.
Challenges:
– users are most likely not familiar with computers -> user-friendly and intuitive interface
– bilingual informants can't be assumed to have any linguistic knowledge

13 Automatic Machine Translation
[Diagram; labels: Interlingua, Transfer rules, Corpus-based methods, analysis, interpretation, generation]

14 Automatic Learning of a Transfer-based MTS
[Diagram; labels: Elicitation corpus, SVS algorithm, Transfer module, tentative Transfer rules, Rule Refinement module, SL sentences, (tentative) TL sentences]

15 Interactive and Automatic rule refinement
Interactive step (TCTool): given an MTS, translate sentences and present them to the users for minimal correction (interface design, MT error classification)
Automatic step: machine learning DS and algorithms to map user input to refined transfer rules
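One way to picture what the automatic step receives from the interactive step is a word-level comparison between the MT output and the user's minimally corrected sentence. The sketch below uses Python's standard `difflib` for that comparison; this is a stand-in chosen for illustration, not the AVENUE method, and the function name `correction_edits` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def correction_edits(mt_output, corrected):
    """List the word-level edits that turn the MT output into the
    user's corrected sentence.

    Each edit is a tuple (operation, original_words, corrected_words),
    where operation is "replace", "delete" or "insert".
    """
    src = mt_output.split()
    tgt = corrected.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the spans the user changed
            edits.append((tag, src[i1:i2], tgt[j1:j2]))
    return edits


# Hypothetical MT output and the user's minimal correction of it.
print(correction_edits("she see the house", "she sees the house"))
# [('replace', ['see'], ['sees'])]
```

Edits of this kind only say what changed; the error classification supplied by the user would still be needed to say why it changed, which is the information a rule refinement step would act on.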

16 User studies snapshot

19 TCTool: Tasks for HCII project
– analyze the design of the basic web interface: given a translated sentence, it asks the user to minimally correct it, if incorrect, and to classify the error(s)
– how to explain what minimal correction is
– what is the right error classification for non-expert and non-linguist users
– can naïve users reliably pinpoint the source of errors?
– design user studies to show the reliability of user input (Spanish – English, English – Spanish, English – Chinese)

20 AVENUE project members
LTI team:
Researchers: Jaime Carbonell, Lori Levin, Alon Lavie, Ralf Brown
Ph.D. students: Ariadna Font Llitjós, Christian Monson, Erik Peterson, Katharina Probst
Avenue External Project Coordinator: Rodolfo M Vega
Chilean team: Eliseo Cañulef, Luis Caniupil Huaiquiñir, Hugo Carrasco, Marcela Collio Calfunao, Rosendo Huisca, Cristian Carrillan Anton, Hector Painequeo, Salvador Cañulef, Flor Caniupil, Claudio Millacura

21 Questions? For more information: http://www.cs.cmu.edu/~aria/avenue/

