A roadmap for MT: four « keys » to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand)

1 A roadmap for MT: four « keys » to handle more languages, for all kinds of tasks, while making it possible to improve quality (on demand). International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002. Christian Boitet, GETA, CLIPS, IMAG, 385 av. de la bibliothèque, BP 53, F-38041 Grenoble cedex 9, France. Christian.Boitet@imag.fr, http://clips.imag.fr/geta

2 Ch. Boitet ICUKL2002, Goa, 25-29/11/2002 2/30 Outline. Basic concepts: what is MT?; goals (quality / user); architectures (Vauquois' triangle). State of the art: MT of texts (examples, problems); MT of spoken dialogs. The future of MT: goals; 4 keys.

3 What is M(a)T? At least 3 types of automation: MT = Machine Translation; MAT = Machine Assisted Translation; MAHT = Machine Aided Human Translation. A scientific technology: informatics (computer science), linguistics, mathematics.

4 Goals: Quality / User

5 Architectures: Vauquois' triangle

6 Architectures: Vauquois' triangle (larger)

7 Formal intermediate structures

8 How to produce an MT system. Choose an architecture. Program the "tools": specialized languages for linguistic programming (SSLP); development environment (MT shell). Build the "lingware": lexical data / rules / weights; grammatical data / rules / weights; possible specialization to a typology ("sublanguage"). How? Human work ± computer help / support; automatic learning (weights, likelihoods…).

9 State of affairs. Only a small number of language pairs is covered by MT systems designed for information access: Systran EC (2000): 19/110 language pairs, 8 OK for intended use; see also examples by Ronaldo Martins. Even fewer are capable of quality translation or speech translation. Now a few examples…

10 Examples: MT for access, Web (1)

11 Examples: MT for access, Web (2). FE is quite "easy" compared with EG, and especially with FG.

12 Comparison: raw vs rough MT

13 Examples: MT for revisors…

14 …with BV-aero/FE (2)

15 MT of spoken dialogs. Specialized systems are already usable (e.g. ATR/Matsushita, IBM, CSTAR/Nespole!…): much "noise" and many "ungrammaticalities", but specializing is very helpful! General systems are also possible (e.g. NEC/Xroad, Linguatec/Talk&Translate): speech recognition is already good enough; rough may be good enough (e.g. for chatting); interpretation is different from translation… and participants are intelligent! Similarity with access-oriented MT.

16 French-Korean through IF (1)

17 French-Korean through IF (2)

18 French-Korean through IF (3)

19 A road map… to which goals? MT of adequate quality: not only for access; for all languages.

20 Four keys. 2 on the technical side (compromise: a far wider coverage, a somewhat smaller asymptotic quality): automatic learning techniques; using non-textual pivots (intermediate formal descriptors). 2 on the organizational side (democratization, cooperation): cooperative development of open source linguistic resources on the Web; towards systems where quality can be improved "on demand" by users.

21 Learning techniques. Extend the use of hybrid techniques (symbolic, numerical, or mixed) ==> they have demonstrated their potential at the research level: stochastic grammars; weighted (or "neural") dictionaries. Or build new tools, intrinsically numerical: inspiration from voice recognition. 2 examples: learning analyzers, text -> semantic tree (IBM); learning implicit, very detailed DG from tree bank (NAIST).

22 Using non-textual pivots. Semantico-pragmatic (ontological) pivots: task & domain oriented ==> limited applicability. Abstract linguistic descriptors: the most precise, but often too sophisticated; depend on each language. Anglo-semantic pivot: UNL, "the HTML of linguistic content": in UNL, a hypergraph represents the abstract structure of a (supposedly) equivalent English utterance; less precise but "robust"; symbols constructed from English ==> usable by all developers.
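The coverage argument for pivots can be made concrete with a small counting sketch. It is not taken from the slides: it assumes the usual counting argument that a transfer architecture needs one module per directed language pair, while a pivot architecture needs one analyzer and one generator per language. The function names are illustrative only. Note that 11 languages give the 110 directed pairs quoted for Systran EC.

```python
def directed_pairs(n_languages: int) -> int:
    """Directed language pairs among n languages (source != target)."""
    return n_languages * (n_languages - 1)


def pivot_modules(n_languages: int) -> int:
    """Modules for a pivot architecture, assuming one analyzer
    (language -> pivot) and one generator (pivot -> language) each."""
    return 2 * n_languages


if __name__ == "__main__":
    # 11, 20 and 180 are the language counts mentioned in the talk.
    for n in (11, 20, 180):
        print(n, directed_pairs(n), pivot_modules(n))
```

Under these assumptions the number of pair-specific modules grows quadratically, while the pivot count grows linearly with the number of languages.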

23 A simple UNL graph for "Ronaldo has headed the ball into the left corner of the goal". [Diagram. Nodes: score(icl>event,agt>human,fld>sport).@entry.@past.@complete; head(pof>body).@def; Ronaldo(icl>proper noun); goal(icl>abstract thing); left(aoj<thing); corner(icl>thing).@def; goal(icl>concrete thing). Arc labels: agt, obj, ins, plt, mod, pos (twice).]
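One way to hold such a graph in memory is as a list of labeled arcs between node strings. The sketch below is only an illustration: the node strings and relation labels come from the slide, but the transcript does not preserve which nodes each arc connects, so the endpoints in `ARCS` are an assumed reading of the example sentence. `Arc`, `split_node` and `entry_node` are hypothetical helper names, not part of any UNL tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    relation: str  # UNL relation label, e.g. "agt"
    source: str    # node the arc leaves
    target: str    # node the arc points to


SCORE = "score(icl>event,agt>human,fld>sport).@entry.@past.@complete"
HEAD = "head(pof>body).@def"
RONALDO = "Ronaldo(icl>proper noun)"
GOAL_ABSTRACT = "goal(icl>abstract thing)"
CORNER = "corner(icl>thing).@def"
LEFT = "left(aoj<thing)"
GOAL_CONCRETE = "goal(icl>concrete thing)"

# ASSUMPTION: arc endpoints are reconstructed from the sentence,
# not read from the original diagram.
ARCS = [
    Arc("agt", SCORE, RONALDO),
    Arc("obj", SCORE, GOAL_ABSTRACT),
    Arc("ins", SCORE, HEAD),
    Arc("pos", HEAD, RONALDO),
    Arc("plt", SCORE, CORNER),
    Arc("mod", CORNER, LEFT),
    Arc("pos", CORNER, GOAL_CONCRETE),
]


def split_node(node: str):
    """Split a node into its universal word and its '.@' attributes."""
    uw, *attributes = node.split(".@")
    return uw, attributes


def entry_node(arcs):
    """Return the single node carrying the 'entry' attribute."""
    nodes = {n for arc in arcs for n in (arc.source, arc.target)}
    entries = [n for n in nodes if "entry" in split_node(n)[1]]
    if len(entries) != 1:
        raise ValueError("expected exactly one entry node")
    return entries[0]
```

Calling `split_node(HEAD)` separates the universal word `head(pof>body)` from its attribute list `["def"]`, and `entry_node(ARCS)` returns the `score` node.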

24 Cooperative development of open source linguistic resources on the Web. Mutualization is necessary, at least for lexical knowledge: too costly even for the leaders; size (#entries) has to grow for each language (300K, 3M?); #languages has to increase dramatically (11 -> 20 -> 180?). Integration of human- and machine-oriented knowledge is useful, e.g. to produce mixed MT/MAHT systems.

25 A contribution: the Papillon project. Goal: produce many open source dictionaries from a central lexical database. Means: build rich (DiCo) monolingual dictionaries of lexies (senses); interlink lexies by interlingual links (axies); use XML & associated tools as a basis to generate many formats, for humans and for machines; start from (free) digital resources; induce "consumers" to become "producers" (contributors). Quality control: private accounts; central validating/integrating group.

26 Papillon database macrostructure. [Diagram labels: Lexical Database; User; Dictionary; Resource; Human Contributors; Interaction with the Dictionaries; Extraction of Dictionaries; Integration of existing resources.]

27 PAPILLON diagram. Interlingual links based on translations = "AXIEs". Possibility to link 1 lexie with >1 acceptions. References to other semantic systems: AXIE (1) -> (n) UW. [Diagram. French DiCo: vocable carte (n.f.); lexie carte.1 (carte à jouer); lexie carte.2 (carte géographique). Japanese DiCo: 地図; カード. English DiCo: vocable card (N); lexie card.1 (playing card); lexie card.2 (money card); vocable = lexie map. A Thai DiCo. Interlingual links: acception 343, UNL: card(icl>play), card(icl>thing)…; acception 345, UNL: map(fld>geography); acception 1002, UNL: card(fld>money).]
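The linking scheme can be sketched as a small lookup structure in which each axie (interlingual acception) groups the lexies it connects, per language. This is an illustration only, not the Papillon schema: the acception numbers, lexies and UNL references come from the slide, but which lexie attaches to which acception is assumed from the glosses, and the language codes and the `translations` helper are hypothetical.

```python
# Axie id -> UNL references and linked lexies per language.
# ASSUMPTION: lexie-to-axie attachments are inferred from the glosses.
AXIES = {
    343: {
        # The slide elides further UWs after these two ("…").
        "unl": ["card(icl>play)", "card(icl>thing)"],
        "lexies": {"fra": ["carte.1"], "eng": ["card.1"], "jpn": ["カード"]},
    },
    345: {
        "unl": ["map(fld>geography)"],
        "lexies": {"fra": ["carte.2"], "eng": ["map"], "jpn": ["地図"]},
    },
    1002: {
        "unl": ["card(fld>money)"],
        "lexies": {"eng": ["card.2"], "jpn": ["カード"]},
    },
}


def translations(lexie: str, source: str, target: str) -> list:
    """Target-language lexies reachable from a source lexie via any axie."""
    found = []
    for axie in AXIES.values():
        if lexie in axie["lexies"].get(source, []):
            for candidate in axie["lexies"].get(target, []):
                if candidate not in found:
                    found.append(candidate)
    return found
```

Because a lexie may attach to more than one acception, a lookup such as `translations("カード", "jpn", "eng")` can return several candidate lexies, while a lexie with no counterpart in the target dictionary returns an empty list.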

28 Construct systems where quality can be improved "on demand" by users: a priori, through interactive disambiguation in the source language; or a posteriori, by correcting the pivot representation (UNL or other) through any language (as in MultiMeteo). ==> In both cases, all versions (in all languages) are improved. Possibility to merge MT, multilingual generation, and computer-aided authoring.

29 Conclusion. 4 keys to open the door to MT of adequate quality for all languages. On the technical side: dramatically increase the use of learning techniques; use pivot architectures, the most universally usable pivot being UNL. On the organizational side: cooperatively develop open source linguistic resources on the Web; construct systems where quality can be improved "on demand" by users. On the practical side: seek keys to unlock private investment, public funding, voluntary cooperation. Could this conference become a decisive turning point?

