Language Technologies Institute School of Computer Science Carnegie Mellon University NSF August 6, 2001 NICE: Native language Interpretation and Communication.


1 Language Technologies Institute School of Computer Science Carnegie Mellon University NSF August 6, 2001 NICE: Native language Interpretation and Communication Environment Jaime Carbonell, Lori Levin, Alon Lavie, Language Technologies Institute Carnegie Mellon University {jgc, lsl, alavie}@cs.cmu.edu

2 Machine Translation of Indigenous Languages
Policy makers have access to information about indigenous people.
–Epidemics, crop failures, etc.
Indigenous people can participate in
–Health care
–Education
–Government
–Internet
without giving up their languages.

3 History of NICE
Arose from a series of joint workshops of NSF and OAS.
Workshop recommendations:
–Create multinational projects using information technology to:
  provide immediate benefits to governments and citizens
  develop critical infrastructure for communication and collaborative research
–training researchers and engineers
–advancing science and technology

4 Architecture Diagram
[Figure: architecture diagram. Labels, in the order they appear: User; Learning Module; Elicitation Process; Learning Process; Transfer Rules; Run-Time Module; SL Input; SL Parser; Transfer Engine; TL Generator; EBMT Engine; Unifier Module; TL Output.]

5 EBMT Example
English: I would like to meet her.
Mapudungun: Ayükefun trawüael fey engu.
English: The tallest man is my father.
Mapudungun: Chi doy fütra chi wentru fey ta inche ñi chaw.
English: I would like to meet the tallest man
Mapudungun (new): Ayükefun trawüael Chi doy fütra chi wentru
Mapudungun (correct): Ayüken ñi trawüael chi doy fütra wentruengu.
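
The recombination in the EBMT example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NICE EBMT engine: the table `FRAGMENTS` and the function `translate_by_fragments` are hypothetical names, and the fragment alignments are read off the slide's two example pairs by hand. The sketch uses greedy longest-match lookup, which is one simple way to pick fragments and is not claimed to be the project's method.

```python
# Hypothetical fragment table. The alignments are inferred by hand from the
# two example pairs on the slide; they are an assumption for illustration,
# not data from the NICE project.
FRAGMENTS = {
    "i would like to meet": "Ayükefun trawüael",
    "the tallest man": "Chi doy fütra chi wentru",
}


def translate_by_fragments(sentence, fragments):
    """Translate by greedy longest-match fragment lookup and concatenation."""
    words = sentence.lower().rstrip(".").split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest remaining span first, then shrink it.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in fragments:
                output.append(fragments[chunk])
                i = j
                break
        else:
            # No stored fragment starts here: mark the word as untranslated.
            output.append(f"<{words[i]}>")
            i += 1
    return " ".join(output)


result = translate_by_fragments("I would like to meet the tallest man", FRAGMENTS)
print(result)
```

The sketch produces the line the slide labels "Mapudungun (new)", which differs from the line labeled "Mapudungun (correct)". Concatenating matched fragments on its own does not yield the correct sentence.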

6 NICE Partners
Language | Country | Institutions
Mapudungun (in place) | Chile | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Iñupiaq (advanced discussion) | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion) | Colombia | OAS-CICAD, Plante, Department of the Interior

7 Agreement Between LTI and Institute of Indigenous Studies (IEI), Universidad de la Frontera, Chile
Contributions of IEI
–Native language knowledge and linguistic expertise in Mapudungun
–Experience in bicultural, bilingual education
–Data collection: recording, transcribing, translating
–Orthographic normalization of Mapudungun

8 Agreement Between LTI and Institute of Indigenous Studies (IEI), Universidad de la Frontera, Chile
Contributions of LTI
–Develop MT technology for indigenous languages
–Training for data collection and transcription
–Partial support for data collection effort pending funding from Chilean Ministry of Education
–International coordination, technical and project management

9 LTI/IEI Agreement
Continue collaboration on data collection and machine translation technology.
Pursue focused areas of mutual interest, such as bilingual education.
Seek additional funding sources in Chile and the US.

10 The IEI Team
Coordinator (leader of a bilingual and multicultural education project):
–Eliseo Canulef
Distinguished native speaker:
–Rosendo Huisca
Linguists (one native speaker, one near-native)
–Juan Hector Painequeo
–Hugo Carrasco
Typists/Transcribers
Recording assistants
Translators
Native speaker linguistic informants

11 MINEDUC/IEI Agreement
Highlights:
Based on the LTI/IEI agreement, the Chilean Ministry of Education agreed to fund the data collection and processing team for the year 2001.
This agreement will be renewed each year, as needed.

12 MINEDUC/IEI Agreement: Objectives
–To evaluate the NICE/Mapudungun proposal for orthography and spelling
–To collect an oral corpus that represents the four Mapudungun dialects spoken in Chile. The main domain is primary health, both traditional and western.

13 MINEDUC/IEI Agreement: Deliverables
–An oral corpus of 800 hours recorded, proportional to the demography of each currently spoken dialect
–120 hours transcribed and translated from Mapudungun to Spanish
–A refined proposal for writing Mapudungun

14 NICE/Mapudungun: Database
Writing conventions (Grafemario)
Glossary Mapudungun/Spanish
Bilingual newspaper, 4 issues
Ultimas Familias
–memoirs
Memorias de Pascual Coña
–Publishable product with new Spanish translation
35 hours transcribed speech
80 hours recorded speech

15 NICE/Mapudungun: Other Products
Standardization of orthography: Linguists at UFRO have evaluated the competing orthographies for Mapudungun and written a report detailing their recommendations for a standardized orthography for NICE.
Training for spoken language collection: In January 2001 native speakers of Mapudungun were trained in the recording and transcription of spoken data.

16 Underfunded Activities
Data collection
–Colombia (unfunded)
–Chile (partially funded)
Travel
–More contact between CMU and Chile (UFRO) and Colombia.
Training
–Train Mapuche linguists in language technologies at CMU.
–Extend training to Colombia
Refine MT system for Mapudungun and Siona
–Current funding covers research on the MT engine and data collection, but not detailed linguistic analysis

