
Proposals for solving some problems in UNL encoding
© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS), ICUKL-2002, Goa, 25-29/11/02


1 Proposals for solving some problems in UNL encoding
International Conference on Universal Knowledge and Language (ICUKL2002), Goa, 25-29 November 2002
Christian BOITET, GETA, CLIPS, IMAG, Grenoble
Christian.Boitet@imag.fr
© Ch. Boitet & Wang-Ju Tsai (GETA, CLIPS), ICUKL-2002, Goa, 25-29/11/02

2 Which problems?
What Igor said "remains to be done":
1. representation of multi-word concepts ("long UWs");
2. elliptical expressions;
3. treatment of arguments, both in the UW dictionary and in the UNL expressions;
and
1. conventions about attributes;
2. XML formats for UNL documents.

3 Representation of multi-word concepts (long UWs), part 1
Problematic examples of "UNKNOWN LONG UWs":
"Institute of Advanced studies (UNU/IAS)"(icl>…)
"East-Asia cooperation office"
East-Asia cooperation office
east-asia cooperation office(icl>…)
"Tokyo University"
"University of Kyoto"
"World Bank(icl>…)"

4 Representation of long UWs, part 2
What are the problems?
1. No hope of including all these long UWs in our UNL-LLL dictionaries
   - because of the potentially immense, unbounded number of such UWs;
   - maybe never more than 5% or 10% of them in open domains.
2. Necessity to include an analyzer of English compounds in order to translate "unknown long UWs" piece by piece
   - but such compounds are extremely ambiguous.

5 Let us think a bit more
- Proper nouns CAN be decomposed.
- This is NOT to say that their translation is always compositional.
  Compositional: World Bank ==> Banque du Monde (false)
  Idiomatic: World Bank ==> Banque mondiale (correct)
- So we should have a solution allowing BOTH
  compositional deconversion if the long UW is unknown, and
  idiomatic deconversion after it has been put in the UNL-LLL dictionary.

6 Proposal of a solution
Origin
- Proposed by H. Uchida at a meeting in Tokyo (1999?)
- Not yet included, but still needed and still the best
Principle
- The headword encodes a UNL representation of the compound
Possible syntax
"(mod(bank(icl>entity).@entry,world):01)"(icl>entity)
"(mod(bank(icl>entity).@entry,world))"(icl>entity)
… or a better one!

7 How to deconvert
- Case 1: "(mod(bank(icl>institution).@entry,world))"(icl>institution) is not in the UNL-FR dictionary
  ==> the French deconverter "unwraps" mod(bank(icl>institution).@entry,world) into a scope of the UNL graph
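The lookup-then-unwrap behaviour described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `LONG_UW`, `split_relation` and `deconvert` are hypothetical, the dictionary is a plain Python dict standing in for a UNL-FR dictionary, and the long-UW shape is the "possible syntax" shown earlier (with an optional `:01`-style scope id), not a settled standard. The sketch stops at returning the unwrapped relation and does not try to generate French.

```python
import re

# Assumed shape of a long UW: "(<UNL expression>[:scope])"(<restrictions>)
LONG_UW = re.compile(r'^"\((?P<expr>.*?)(?::(?P<scope>\d+))?\)"\((?P<restr>.*)\)$')


def split_relation(expr):
    """Split 'rel(arg1,arg2)' at its top-level comma, respecting nested parentheses."""
    rel, _, rest = expr.partition("(")
    body = rest[:-1]  # drop the closing parenthesis of the relation
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return rel, body[:i], body[i + 1:]
    raise ValueError(f"no top-level comma in {expr!r}")


def deconvert(uw, dictionary):
    """Return an idiomatic entry if the long UW is known, else its unwrapped relation."""
    if uw in dictionary:
        return ("idiomatic", dictionary[uw])
    match = LONG_UW.match(uw)
    if match is None:
        raise ValueError(f"not a long UW: {uw!r}")
    # Unknown long UW: hand the embedded relation back for compositional deconversion.
    return ("compositional", split_relation(match.group("expr")))


world_bank = '"(mod(bank(icl>institution).@entry,world))"(icl>institution)'
unknown = deconvert(world_bank, {})                             # compositional path
known = deconvert(world_bank, {world_bank: "Banque mondiale"})  # idiomatic path
```

With an empty dictionary the call falls back to the embedded `mod` relation; once the same string is entered in the dictionary, the idiomatic entry wins, which is the "BOTH" behaviour asked for on slide 5.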

8 Another example
"(mod(university.@entry,Tokyo(icl>town)):01)"(icl>entity)
Compositional deconversion
- Université de Tokyo
- University of Tokyo
- Universität von Tokyo
- Tokyo no daigaku (or Tokyo ni daigaku)
Idiomatic deconversion
- Université de Tokyo (or Todai!)
- Tokyo University / University of Tokyo
- Universität Tokyo
- Tokyo daigaku / Todai

9 Elliptical expressions
Example
Do you prefer the first or the second solution? I prefer the first.
- Je préfère le premier?
- Je préfère la première?
==> A bad deconversion will be very misleading.
Possible solution
Encode the elided element and put .@eld on it.
That is equivalent to "preediting" the input text:
- I prefer the first solution.
… and it is in the spirit of the new idea by H. Uchida of preediting for semantic relations.
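One way to read the .@eld proposal is that the elided noun stays in the graph so that it can drive agreement, but is not printed. The toy sketch below illustrates that reading only; the lexicon, the `render_first` function and the set-of-strings representation of attributes are all hypothetical, and a real deconverter would of course work on the full UNL graph.

```python
# Hypothetical toy French lexicon: UW headword -> (lemma, grammatical gender)
NOUNS = {"solution": ("solution", "f"), "project": ("projet", "m")}

# Article and ordinal "first", chosen by the gender of the noun they modify
FIRST = {"m": ("le", "premier"), "f": ("la", "première")}


def render_first(noun_uw, attributes):
    """Render 'the first <noun>' in French, omitting the noun if it carries @eld."""
    lemma, gender = NOUNS[noun_uw]
    article, ordinal = FIRST[gender]
    words = [article, ordinal]
    if "@eld" not in attributes:
        words.append(lemma)  # the noun is only printed when it is not elided
    return " ".join(words)


elided = render_first("solution", {"@eld"})  # agreement with the hidden feminine noun
full = render_first("solution", set())       # the "preedited" variant
```

Because the elided node is still present, the generator can pick "la première" rather than guessing between the two candidates on the slide.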

10 Treatment of arguments in the UW dictionary and in the UNL expressions
See the talk by I. Boguslavskij.
The solution proposed entails:
1. a very small change in the UNL syntax
   - allow attributes .@A, .@B, .@C, .@D on arcs, hence also on restrictions by semantic relations;
2. a discipline in UW creation
   - all arguments should appear as restrictions.

11 "Argument-full" + "readable" UW
Argument-full
look(icl>do,agt.@A>person,obj.@B>thing);
look(icl>do,agt.@A>person,gol.@B>thing);
look(icl>do,agt.@A>person,dst.@B>thing);
Readable
look(icl>do, agt.@A>person, obj.@B>thing); look for something
Even more readable
look for(icl>do,agt.@A>person, obj.@B>thing); look for something

12 Continuing that list…
look for(icl>do,agt.@A>person, obj.@B>thing); look for something
look at(icl>do,agt.@A>person, plt.@B>thing); look at something
or look at(icl>do,agt.@A>person, obj.@B>thing); look at something
look like(icl>do,agt.@A>person, cmp.@B>thing); look like something
look like(icl>do,agt.@A>person, obj.@B>thing); look like something
   might also cover "look as" in "he looks as a good man"
or look as if(icl>do,agt.@A>person, obj.@B>thing); it looks as if…
look(icl>do,agt.@A>person, obj.@B>thing); look for something
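To make the argument-full notation concrete, here is a small sketch of how such a UW could be read into a headword plus a list of restrictions, each optionally carrying an argument label .@A to .@D. The function names `parse_uw` and `argument_frame` and the regular expression are assumptions made for illustration; they only cover the flat restriction lists shown on these two slides, not nested restrictions.

```python
import re

# One restriction: relation label, optional argument attribute (.@A ... .@D), '>', target
RESTRICTION = re.compile(r'^(?P<rel>[a-z]+)(?:\.@(?P<arg>[A-D]))?>(?P<target>.+)$')


def parse_uw(uw):
    """Split 'headword(r1>t1, r2.@A>t2, ...)' into (headword, [(rel, arg, target), ...])."""
    headword, _, rest = uw.strip().rstrip(";").partition("(")
    restrictions = []
    for item in rest.rstrip(")").split(","):
        match = RESTRICTION.match(item.strip())
        if match is None:
            raise ValueError(f"unreadable restriction: {item!r}")
        restrictions.append((match["rel"], match["arg"], match["target"]))
    return headword, restrictions


def argument_frame(uw):
    """Keep only the restrictions that are labelled as arguments."""
    _, restrictions = parse_uw(uw)
    return {arg: (rel, target) for rel, arg, target in restrictions if arg}


headword, restrictions = parse_uw("look for(icl>do,agt.@A>person, obj.@B>thing);")
frame = argument_frame("look at(icl>do,agt.@A>person, plt.@B>thing);")
```

The multi-word headword ("look for", "look at") is kept whole, and the frame records which semantic relation each labelled argument uses, which is what distinguishes the competing entries in the list above.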

13 Attributes
The problem
lion(icl>mammal).@plur ==> un lion, les lions, lions?
We don't know whether definiteness has been computed
- if it has ==> it is .@undef ==> use it;
- or not ==> it is UNKNOWN ==> compute a default.
Solution: for every attribute XXXX, put
.@XXXX for +XXXX (1 or true)
.@unXXXX for -XXXX (0 or false)
nothing for XXXX unknown (? or undefined)
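The three-valued convention proposed here can be sketched as a lookup that returns true, false or "unknown". The sketch assumes attributes are written as a `.@name` suffix on a simple node, as in the lion example; `node_attributes` and `attribute_value` are hypothetical helper names, and `None` is used to stand for "unknown, compute a default".

```python
def node_attributes(node):
    """Collect the attribute names written as '.@name' after a UW."""
    return set(node.split(".@")[1:])


def attribute_value(attributes, name):
    """Three-valued reading: True for .@name, False for .@unname, None if absent."""
    if name in attributes:
        return True
    if "un" + name in attributes:
        return False
    return None  # unknown: the deconverter has to compute a default


plain = node_attributes("lion(icl>mammal).@plur")
marked = node_attributes("lion(icl>mammal).@plur.@undef")

unknown_def = attribute_value(plain, "def")    # nothing stated about definiteness
negative_def = attribute_value(marked, "def")  # explicitly not definite
```

The point of the convention is that the two cases no longer collapse: an enconverter that has computed "not definite" says so with .@undef, and silence means the value was never computed.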

14 XML formats for UNL documents
A minimal UNL-xml format, strictly equivalent to UNL-html
- proposed and used by Tsai W.J. for the SWIIVRE-UNL web site and his Ph.D.
Methodology for defining and using other, more detailed UNL-xml-xyz formats:
- xyz is an application (e.g. a graphical editor, a statistics-gathering tool, etc.);
- automatic parsing of the basic UNL-xml format introduces new tags;
- an object document model (DOM) suitable for application xyz can then be defined and used.

