A roadmap for MT: four "keys" to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand)

Presentation transcript:

A roadmap for MT: four "keys" to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand)
International Conference on Universal Knowledge and Language (ICUKL2002), Goa, November 2002
Christian Boitet, GETA, CLIPS, IMAG, 385 av. de la bibliothèque, BP 53 F Grenoble cedex 9, France

Ch. Boitet ICUKL2002, Goa, 25-29/11/2002 2/30
Outline
- Basic concepts
  - What is MT?
  - Goals: quality / user
  - Architectures: Vauquois' triangle
- State of the art
  - MT of texts: examples, problems
  - MT of spoken dialogs
- The future of MT
  - Goals
  - 4 keys

What is M(a)T?
- At least 3 types of automation:
  - MT = Machine Translation
  - MAT = Machine Assisted Translation
  - MAHT = Machine Aided Human Translation
- A scientific technology:
  - Informatics (computer science)
  - Linguistics
  - Mathematics

Goals: Quality / User

Architectures: Vauquois' triangle

Architectures: Vauquois' triangle (larger)

Formal intermediate structures

How to produce an MT system
- Choose an architecture
- Program the "tools"
  - Specialized languages for linguistic programming (SLLP)
  - Development environment (MT shell)
- Build the "lingware"
  - Lexical data / rules / weights
  - Grammatical data / rules / weights
  - Possible specialization to a typology ("sublanguage")
- How?
  - Human work ± computer help / support
  - Automatic learning (weights, likelihoods…)
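The "weights" in the lexical data can be learned automatically rather than set by hand. A minimal sketch, assuming the learning step is plain relative-frequency estimation of P(target | source) over word pairs taken from an aligned corpus; the slide does not specify a method, and the function name and word pairs below are hypothetical:

```python
from collections import Counter, defaultdict

def estimate_weights(aligned_pairs):
    """Relative-frequency estimate of P(target | source) from (source, target) word pairs."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[src][tgt] += 1
    weights = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        weights[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return weights

# Hypothetical word alignments from a French-English corpus.
pairs = [("carte", "card"), ("carte", "map"), ("carte", "card"), ("carte", "menu")]
w = estimate_weights(pairs)
print(w["carte"])  # {'card': 0.5, 'map': 0.25, 'menu': 0.25}
```

The weights for each source word sum to 1, so they can be read directly as translation preferences in a weighted dictionary.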

State of affairs
- Only a small number of language pairs are covered by MT systems designed for information access
  - Systran at the EC (2000): 19 of 110 language pairs, 8 of them OK for the intended use
  - See also the examples by Ronaldo Martins
- Even fewer are capable of quality translation or speech translation
Now a few examples…
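The figure of 110 language pairs is consistent with 11 languages, each translated into the 10 others (the EU had 11 official languages in 2000). Assuming that reading, which the slide does not state explicitly, the coverage works out as follows:

```python
# Assumption: "110 language pairs" = 11 languages, each translated into the 10 others.
n_languages = 11
n_pairs = n_languages * (n_languages - 1)  # ordered pairs: FE and EF count separately

covered = 19  # pairs with a Systran system (figure from the slide)
usable = 8    # pairs judged OK for their intended use (figure from the slide)

print(n_pairs)                               # 110
print(f"covered: {covered / n_pairs:.1%}")   # covered: 17.3%
print(f"usable:  {usable / n_pairs:.1%}")    # usable:  7.3%
```

In other words, fewer than one pair in five was covered at all, and fewer than one in ten was usable.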

Examples: MT for access, Web (1)

Examples: MT for access, Web (2)
French-English (FE) is quite "easy" compared with English-German (EG) and especially French-German (FG)

Comparison: raw vs rough MT

Examples: MT for revisers…

…with BV-aero/FE (2)

MT of spoken dialogs
- Specialized systems are already usable
  - e.g. ATR/Matsushita, IBM, CSTAR/Nespole!…
  - Much "noise" and many "ungrammaticalities"
  - But specialization is very helpful!
- General systems are also possible
  - e.g. NEC/Xroad, Linguatec/Talk&Translate
  - Speech recognition is already good enough
  - Rough may be good enough (e.g. for chatting)
- Interpretation is different from translation…
  - …and participants are intelligent!
  - Similarity with access-oriented MT

French-Korean through IF (Interchange Format) (1)

French-Korean through IF (2)

French-Korean through IF (3)

A road map… to which goals?
- MT of adequate quality
- Not only for access
- For all languages

Four keys
- 2 on the technical side
  - Compromise: a far wider coverage, a somewhat lower asymptotic quality
  - Automatic learning techniques
  - Using non-textual pivots (intermediate formal descriptors)
- 2 on the organizational side
  - Democratization, cooperation
  - Cooperative development of open source linguistic resources on the Web
  - Towards systems where quality can be improved "on demand" by users

Learning techniques
- Extend the use of hybrid techniques (symbolic, numerical, or mixed)
  ==> they have demonstrated their potential at the research level
  - stochastic grammars
  - weighted (or "neural") dictionaries
- or build new tools, intrinsically numerical
  - inspiration from voice recognition
- 2 examples
  - learning analyzers: text → semantic tree (IBM)
  - learning implicit, very detailed dependency grammars (DG) from a tree bank (NAIST)
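As an illustration of a "stochastic grammar" learned from a tree bank, here is a sketch of the textbook maximum-likelihood estimate of probabilistic context-free grammar rule probabilities, P(A → β) = count(A → β) / count(A). This is a generic technique, not the IBM or NAIST method cited on the slide; the tuple-based tree format and the two toy trees are assumptions:

```python
from collections import Counter

def rules(tree):
    """Yield (lhs, rhs) productions from a tree written as (label, child, ...); words are str."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def estimate_pcfg(treebank):
    """Maximum-likelihood rule probabilities: count(A -> rhs) / count(A)."""
    rule_counts = Counter(r for t in treebank for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

# A hypothetical two-sentence tree bank.
treebank = [
    ("S", ("NP", "Ronaldo"), ("VP", ("V", "heads"), ("NP", "the", "ball"))),
    ("S", ("NP", "Ronaldo"), ("VP", ("V", "scores"))),
]
pcfg = estimate_pcfg(treebank)
print(pcfg[("VP", ("V", "NP"))])  # 0.5
```

A real system would add smoothing for unseen rules; the point here is only that the weights come from counts in annotated data rather than from hand-written rules.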

Using non-textual pivots
- Semantico-pragmatic (ontological) pivots
  - task & domain oriented ==> limited applicability
- Abstract linguistic descriptors
  - the most precise, but often too sophisticated
  - depend on each language
- Anglo-semantic pivot: UNL, "the HTML of linguistic content"
  - in UNL, a hypergraph represents the abstract structure of a (supposedly) equivalent English utterance
  - less precise but "robust"
  - symbols constructed from English ==> usable by all developers
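A standard argument for pivot architectures (the top of Vauquois' triangle) is combinatorial. A rough sketch, assuming idealized counts: a direct design needs one system per ordered language pair, a transfer design needs one analyzer and one generator per language plus one transfer module per ordered pair, and a pivot design needs only one analyzer and one generator per language. Real systems share components, so these are orders of magnitude, not exact costs. The language counts are the ones quoted on the slide about lexical resources (11, 20, 180):

```python
def direct_modules(n):
    """Direct MT: one self-contained system per ordered language pair."""
    return n * (n - 1)

def transfer_modules(n):
    """Transfer MT: n analyzers + n generators + one transfer module per ordered pair."""
    return 2 * n + n * (n - 1)

def pivot_modules(n):
    """Pivot MT: n analyzers (language -> pivot) + n generators (pivot -> language)."""
    return 2 * n

for n in (11, 20, 180):
    print(n, direct_modules(n), transfer_modules(n), pivot_modules(n))
# 11 110 132 22
# 20 380 420 40
# 180 32220 32580 360
```

The quadratic term is what makes transfer impractical at 180 languages, while the pivot cost stays linear.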

A simple UNL graph for "Ronaldo has headed the ball into the left corner of the goal"
[Figure. Surviving labels: relations agt, obj, pos, mod; universal words Ronaldo(icl>proper noun), goal(icl>abstract thing), goal(icl>concrete thing), left(aoj<thing)]
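A UNL graph can be handled as a set of labeled arcs between universal words (UWs), written textually as relation(source, target). The sketch below is an illustrative partial reconstruction, not a copy of the slide's graph: the arcs, the plain UWs "head", "ball" and "corner", and the use of the place-to relation "plt" are assumptions, and the notation is simplified (no attributes such as @entry):

```python
from typing import NamedTuple

class Relation(NamedTuple):
    label: str   # UNL relation label, e.g. "agt" (agent) or "obj" (object)
    source: str  # universal word at the origin of the arc
    target: str  # universal word the arc points to

def to_unl(relations):
    """Serialize arcs in the simplified textual form label(source, target), one per line."""
    return "\n".join(f"{r.label}({r.source}, {r.target})" for r in relations)

def entry_nodes(relations):
    """Nodes that are never a target: candidates for the graph's entry (main) node."""
    targets = {r.target for r in relations}
    return sorted({r.source for r in relations} - targets)

# Hypothetical arcs for the example sentence.
graph = [
    Relation("agt", "head", "Ronaldo(icl>proper noun)"),
    Relation("obj", "head", "ball"),
    Relation("plt", "head", "corner"),
    Relation("mod", "corner", "left(aoj<thing)"),
    Relation("pos", "corner", "goal(icl>concrete thing)"),
]
print(to_unl(graph))
print(entry_nodes(graph))  # ['head']
```

Because each language's generator reads the same arcs, the graph is the only thing that has to be shared between analyzers and generators.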

Cooperative development of open source linguistic resources on the Web
- Pooling (mutualization) is necessary, at least for lexical knowledge
  - too costly even for the leaders
  - size (number of entries) has to grow for each language (300K, 3M?)
  - the number of languages has to increase dramatically (11 → 20 → 180?)
- Integration of human- and machine-oriented knowledge is useful
  - e.g. to produce mixed MT/MAHT systems

A contribution: the Papillon project
- Goal: produce many open source dictionaries from a central lexical database
- Means:
  - build rich (DiCo) monolingual dictionaries of lexies (word senses)
  - interlink lexies by interlingual links (axies)
  - use XML & associated tools as a basis to generate many formats for humans and for machines
  - start from (free) digital resources
  - induce "consumers" to become "producers" (contributors)
- Quality control:
  - private accounts
  - central validating/integrating group
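The idea of storing entries once in XML and generating several output formats from them can be sketched with Python's standard xml.etree.ElementTree. The element and attribute names below (vocable, lexie, headword, gloss) are a hypothetical simplification, not the actual Papillon/DiCo schema; the French entry is taken from the Papillon diagram:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry; the real Papillon schema is richer.
ENTRY = """
<vocable headword="carte" lang="fra" pos="n.f.">
  <lexie id="carte.1" gloss="carte à jouer"/>
  <lexie id="carte.2" gloss="carte géographique"/>
</vocable>
"""

def to_human(vocable):
    """Render a vocable as a readable dictionary entry."""
    lines = [f"{vocable.get('headword')} ({vocable.get('pos')})"]
    for lexie in vocable.iter("lexie"):
        lines.append(f"  {lexie.get('id')}: {lexie.get('gloss')}")
    return "\n".join(lines)

def to_machine(vocable):
    """Render the same data as tab-separated records for an MT lexicon."""
    return [f"{vocable.get('lang')}\t{lexie.get('id')}\t{vocable.get('pos')}"
            for lexie in vocable.iter("lexie")]

root = ET.fromstring(ENTRY.strip())
print(to_human(root))
print(to_machine(root))
```

Both renderings are derived from the same source entry, so a correction made once in the database propagates to every generated format.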

Papillon database macrostructure
[Figure. Components: Lexical Database, User Dictionary Resource, Interaction with the Dictionaries, Extraction of Dictionaries, Integration of existing resources, Human Contributors]

Interlingual links based on translations = "AXIEs"
- Possibility to link 1 lexie with >1 acceptions
- References to other semantic systems: one AXIE can point to n UWs (AXIE 1 → n UW)
[Figure: PAPILLON diagram. French DiCo: vocable carte n.f., with lexie carte.1 (carte à jouer, "playing card") and lexie carte.2 (carte géographique, "map"). English DiCo: vocable card N, with lexie card.1 (playing card) and lexie card.2 (money card); vocable = lexie map. Japanese DiCo: 地図 ("map"), カード ("card"). A Thai DiCo is also shown. Interlingual links: Acception 343 (UNL: card(icl>play), card(icl>thing)…), Acception 345 (UNL: map(fld>geography)), Acception 1002 (UNL: card(fld>money)).]
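The lexie/axie structure can be sketched as a small data model: monolingual lexies, and axies (interlingual acceptions) that group lexies which translate each other and may carry UNL universal words. The class and function names, the ISO 639-3 language codes, and the exact membership of each acception are assumptions for illustration; in particular, linking the Japanese カード to both acception 343 and acception 1002 is a hypothetical case of one lexie linked to more than one acception:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Lexie:
    lang: str       # ISO 639-3 code (assumption)
    lexie_id: str   # e.g. "carte.1"

@dataclass
class Axie:
    """Interlingual acception: links lexies that translate each other, plus optional UWs."""
    axie_id: int
    lexies: set = field(default_factory=set)
    uws: list = field(default_factory=list)

def translations(axies, lexie, target_lang):
    """All target-language lexies sharing at least one axie with `lexie`."""
    return sorted({other.lexie_id
                   for ax in axies if lexie in ax.lexies
                   for other in ax.lexies if other.lang == target_lang})

carte1, carte2 = Lexie("fra", "carte.1"), Lexie("fra", "carte.2")
card1, card2, map_ = Lexie("eng", "card.1"), Lexie("eng", "card.2"), Lexie("eng", "map")
chizu, kaado = Lexie("jpn", "地図"), Lexie("jpn", "カード")

# Hypothetical links loosely following the Papillon diagram.
axies = [
    Axie(343, {carte1, card1, kaado}, ["card(icl>play)", "card(icl>thing)"]),
    Axie(345, {carte2, map_, chizu}, ["map(fld>geography)"]),
    Axie(1002, {card2, kaado}, ["card(fld>money)"]),
]
print(translations(axies, kaado, "eng"))   # ['card.1', 'card.2']
print(translations(axies, carte2, "jpn"))  # ['地図']
```

Since translations go through axies rather than through direct bilingual links, adding a new language only requires linking its lexies to existing axies.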

Construct systems where quality can be improved "on demand" by users
- a priori, through interactive disambiguation in the source language
- or a posteriori, by correcting the pivot representation (UNL or other) through any language (as in MultiMeteo)
==> In both cases, all versions (in all languages) are improved
- possibility to merge MT, multilingual generation, and computer-aided authoring
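Why a single correction improves every language can be shown with a toy pivot system: generation for each language reads the same pivot, so fixing the pivot once fixes all outputs. The lexicons, language codes, and the representation of the pivot as a list of universal words are hypothetical simplifications, not a real UNL generator:

```python
# Hypothetical per-language lexicons keyed by universal word (UW).
LEXICONS = {
    "fra": {"card(icl>play)": "carte à jouer", "map(fld>geography)": "carte géographique"},
    "eng": {"card(icl>play)": "playing card", "map(fld>geography)": "map"},
    "jpn": {"card(icl>play)": "カード", "map(fld>geography)": "地図"},
}

def generate_all(pivot):
    """Generate every target language from the same pivot (here, a list of UWs)."""
    return {lang: " ".join(lex[uw] for uw in pivot) for lang, lex in LEXICONS.items()}

# Analysis of French "carte" picked the wrong sense...
pivot = ["card(icl>play)"]
before = generate_all(pivot)
# ...a user corrects the pivot once, through whichever language they read...
pivot[0] = "map(fld>geography)"
after = generate_all(pivot)
print(before)
print(after)  # every language now says "map"
```

The a priori route works the same way: interactive disambiguation simply chooses the right UW before the pivot is ever built.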

Conclusion
4 keys to open the door to MT of adequate quality for all languages
- On the technical side:
  - dramatically increase the use of learning techniques
  - use pivot architectures, the most universally usable pivot being UNL
- On the organizational side:
  - cooperatively develop open source linguistic resources on the Web
  - construct systems where quality can be improved "on demand" by users
- On the practical side:
  - seek keys to unlock private investment, public funding, and voluntary cooperation
Could this conference become a decisive turning point?