
An Intuitive Representation of Human Languages for Translation
Gábor Prószéky
MorphoLogic & Faculty of Information Technology, Pázmány University
Kalmár Workshop, Szeged, October 1-2, 2003

Contents
• Some words on Prof. Kalmár’s activity in computational linguistics
• Problems of human language description with formal tools
• A new representation with patterns
• Introduction to machine translation methods
• Application of patterns to translation
Kalmár Workshop 2003 Gábor Prószéky: An Intuitive Representation of Human Languages for Translation

Kalmár & languages
• Kalmár’s paper in formal language theory: “An Intuitive Representation of Context-Free Languages”
• Kalmár’s activity in machine translation (conference in 1962): “Representation of Languages with the Help of Mathematical Structures”

Linguistic representation problems of the 60’s
• Dependency structure
• Constituent structure
• X-bar theory: X’ (P) X (Q)
• Related structures
• Using transformations

Structured symbols
• Linguistic categories: atomic symbols
• Not enough: subcategorization
• Semantic features: ± alive, ...
• Syntactic features: ± countable, ...
• Rule sets instead of rules
• ID/LP

Feature structures
• DAGs
• Unification problems
• Feature geometry, typed features
• LFG, GPSG, HPSG
• Parsing: CF-skeleton + features or feature structures only?
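As a minimal sketch of the unification operation mentioned on this slide, feature structures can be modelled as nested Python dictionaries. The function name `unify` and the sample features are illustrative assumptions, not taken from the talk, and the sketch ignores reentrancy (shared substructure), which full DAG unification has to handle.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns the merged structure, or None if two atomic values clash.
    Assumption: no reentrancy (shared substructure) is modelled here.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for key, b_value in b.items():
            if key in result:
                merged = unify(result[key], b_value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[key] = merged
            else:
                result[key] = b_value
        return result
    # Atomic values unify only if they are equal.
    return a if a == b else None


noun = {"cat": "N", "agr": {"num": "sg"}}
context = {"agr": {"num": "sg", "per": "3"}}
print(unify(noun, context))
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))
```

The first call merges compatible information; the second fails because the two number values contradict each other.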

Complexity of NL grammars
• RG/FSA: not enough
• CF/RTN: not enough
• CS?
• 0/ATN: Turing Machine
• Transformations and metarules
• Arguments for and against

NL grammar formalisms
• Competence and performance?
• Kornai number (left-recursion, center-embedding, “respectively” construction)
• Gradually from unrestricted to regular
• (i) a^n b^n -> a*b* (n is lost!)
• (ii) a^n b^n -> {ε, ab, aabb, aaabbb}
• “Finitization” by length
• No structure in FSA; finite systems, however, can produce structural output
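The two ways of making a^n b^n regular can be illustrated with a short sketch. The function names and the bound `max_n` are assumptions chosen for this example; the slide's set corresponds to a bound of 3.

```python
import re


def in_anbn(s):
    """Exact membership test for the context-free language a^n b^n."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


# (i) Regular over-approximation: the link between the two counts is lost.
REGULAR_APPROX = re.compile(r"a*b*")


def finitize(max_n):
    """(ii) Finitization by length: list a^n b^n for n = 0..max_n."""
    return {"a" * n + "b" * n for n in range(max_n + 1)}


print(bool(REGULAR_APPROX.fullmatch("aab")), in_anbn("aab"))
print(sorted(finitize(3), key=len))
```

The regular approximation accepts strings such as "aab" that the original language rejects, while the finitized set stays exact but stops at the chosen length.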

Syntax and semantics
• Logical representations (e.g. λx.dog(x), λx.run(x))
• World-knowledge representations (e.g. IS-A, PART-OF, INSTANCE-OF)
• Categorial grammar: early logical representations of syntax (Kalmár)
• DCG: interpretation & representation
• Rule-to-rule hypothesis
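The rule-to-rule hypothesis pairs every syntactic rule with a semantic composition rule. A toy sketch, assuming a two-word sentence and a hypothetical lexicon and rule table (none of these entries come from the talk), could look like this:

```python
# Hypothetical lexicon: word -> (category, meaning).
# Verb meanings are functions, mirroring lambda terms such as λx.run(x).
LEXICON = {
    "Fido": ("NP", "fido"),
    "runs": ("VP", lambda x: f"run({x})"),
    "barks": ("VP", lambda x: f"bark({x})"),
}

# Each syntactic rule carries its own semantic rule (rule-to-rule pairing).
RULES = {
    ("NP", "VP"): ("S", lambda np, vp: vp(np)),
}


def parse(words):
    """Parse a two-word sentence and build its logical form in the same step."""
    (cat1, sem1), (cat2, sem2) = (LEXICON[w] for w in words)
    category, compose = RULES[(cat1, cat2)]
    return category, compose(sem1, sem2)


print(parse(["Fido", "runs"]))
```

The logical form is produced by the rule that licenses the structure, so no separate interpretation pass is needed in this sketch.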

Conflict handling
• Lexicon meets syntax: who is right?
• Lexicon: off-line info coming from past experiences
• Which is more important in a specific situation?

Open classes
• Open vs. closed classes: that is, features can or cannot be overridden
• Proper names, jabbers, folk etymology, loanwords, ...
• Grammar of closed classes: minimal grammar

Finite morphology
• Finite patterns
• Finite number of entries
• Descriptions assigned to entries
• Finite & open vs. infinite & closed
• Underspecified entries for guessing

Finite syntax
• “Item and arrangement” (as in morphology)
• “Arrangement” describes a rather free constituent order
• Metawords in a meta-dictionary, e.g. ‘(Det (Adj (N)))’ ‘DAN’
• Cascades without loops
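One way to read the metaword idea is as a dictionary lookup over sequences of category letters. In the sketch below only the pairing of ‘DAN’ with ‘(Det (Adj (N)))’ comes from the slide; the letter codes, the other dictionary entries, and the function names are assumptions made for illustration.

```python
# Assumed one-letter codes for part-of-speech tags.
TAG_LETTER = {"Det": "D", "Adj": "A", "N": "N"}

# A finite meta-dictionary: metaword -> bracketed structure.
# Only the 'DAN' entry is taken from the slide; the rest are hypothetical.
META_DICTIONARY = {
    "DAN": "(Det (Adj (N)))",
    "DN": "(Det (N))",
    "AN": "(Adj (N))",
    "N": "(N)",
}


def metaword(tags):
    """Spell a tag sequence as a metaword, e.g. Det Adj N -> 'DAN'."""
    return "".join(TAG_LETTER[tag] for tag in tags)


def lookup(tags):
    """Return the stored structure for a tag sequence, or None if unlisted."""
    return META_DICTIONARY.get(metaword(tags))


print(lookup(["Det", "Adj", "N"]))
print(lookup(["Adj", "Det"]))
```

Because the dictionary is finite and the lookup never recurses, this corresponds to a cascade step without loops: structure is read off a stored entry rather than derived by a recursive grammar.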

The “plastic box”
• John is a boy.
• “John” is a noun.
• Go is a verb.
• “Go” is a verb.
• ___ is a sign.
• “___” is a sign.
• ___ is a ___. (where ___ is a “plastic box”)

Real examples
(a) Unusual use: Go is a verb. POS [np] POS [v]
(b) Metaphor: My car drinks a lot. ANIMATE [+] ANIMATE [-]
(c) Unknown entry: Kalmár is a family name. POS [np]

Linguistic frames
• Psychology: “Gestalt”
• Morphological complex structures treated as frames by humans
• Frames in AI: ‘shopping’, ‘walking’, ...
• As ‘high-level parsing’ relates to ‘detailed on-line analysis’

Translation of human languages
• old problems (50’s)
• direct (60’s)
• interlingual (70’s)
• transfer (80’s)
• examples (90’s)

Patterns: general linguistic information in lexicalized form
• Short, fully specified patterns are lexical entries
• Longer, fully specified patterns are multi-word expressions
• Partially underspecified patterns are collocations, phrasal verbs, idioms
• Totally underspecified patterns are linguistic rules
• Pattern/interpretation pairs: Translation Description Language
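The four levels of specification can be sketched as a simple classification over patterns. Here a pattern is assumed to be a tuple of tokens in which `SLOT` marks an underspecified position; this encoding, the length threshold for "short", and the sample patterns are illustrative assumptions, not the Translation Description Language itself.

```python
SLOT = None  # assumed marker for an underspecified position in a pattern


def classify(pattern):
    """Classify a pattern by how many of its positions are underspecified."""
    slots = sum(1 for item in pattern if item is SLOT)
    if slots == 0:
        # Assumption: "short" means a single token.
        return "lexical entry" if len(pattern) == 1 else "multi-word expression"
    if slots == len(pattern):
        return "linguistic rule"
    return "collocation, phrasal verb or idiom"


for sample in [("dog",), ("by", "and", "large"), ("give", SLOT, "up"), (SLOT, SLOT)]:
    print(sample, "->", classify(sample))
```

In a fuller model the slots of a totally underspecified pattern would still carry category and feature constraints; the sketch only counts them.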

The MetaMorpho principles
• No single words but contextual expressions (in form of patterns) only
• Pattern pairs: input/interpretation structure pairs
• Single pass: no separate transfer steps
• Target structure generation: by-product of parsing

Jabberwocky
‘Twas brillig, and the slighty toves
Did gyre and gimble in the wabe:
All mimsy were the borogroves,
And the mone raths outgrabe.

‘Twas ___, and the ___s
Did ___ and ___ in the ___:
All ___ were the ___s,
And the ___s ___.

Translation rules for Jabberwocky
• ‘twas ___ → ___ volt
• , and → , és
• the ___s did ___ → a ___ok ___tak
• and → és
• in the ___ → a ___ban
• all → teljesen
• ___ were the ___s → ___k voltak az ___ok
• the ___s ___ → a ___ok ___tek

• ‘Twas ___, and the ___s Did ___ and ___ in the ___: All ___ were the ___s, And the ___s ___.
• ___ volt, és a ___ok ___tak és ___tek a ___ben: teljesen ___ voltak a ___ok és a ___ok ___tek.
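The pairing of source and target patterns can be sketched with regular expressions whose named groups play the role of the open slots. The two rules below are hypothetical reconstructions in the spirit of the rule list on the preceding slides, not the original MetaMorpho rules, and the sketch neither applies Hungarian vowel harmony nor transliterates the unknown words phonetically.

```python
import re

# Hypothetical pattern pairs: (source pattern with slots, target template).
PATTERN_PAIRS = [
    (re.compile(r"'Twas (?P<x>\w+)"), "{x} volt"),
    (re.compile(r"in the (?P<n>\w+)"), "a {n}ben"),
]


def translate(clause):
    """Return the target string for the first matching pattern, else None.

    The target is filled from the same match that recognizes the source,
    so there is no separate transfer step.
    """
    for source, target in PATTERN_PAIRS:
        match = source.fullmatch(clause)
        if match:
            return target.format(**match.groupdict())
    return None


print(translate("'Twas brillig"))
print(translate("in the wabe"))
```

Unknown words such as "brillig" pass through the slots untouched, which is what allows nonsense text to be translated from its closed-class frame alone.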

Translation of Jabberwocky
Dzsebervoki
Brillig volt, és a szlájti tóvok
gájertak és gimbeltek a vébben:
teljesen mimszik voltak a borogróvok
és a món rátok autgrébtek.

An intuitive representation
1. X-bar based structures
2. Feature-based descriptions
3. Metarules (used off-line)
4. Rule-to-rule principle
5. Lexicon should be finite but open
6. Closed classes belong to the minimal grammar
7. Minimal grammar describes “basically” linguistic elements

An intuitive representation... (cont’d)
8. Linguistic constructions can be described by finite patterns
9. A huge & finite description set is used rather than a limited & infinite grammar
10. In case of conflict, lexical information is either redundant or contradicts the actual description
11. Known constructions need no real-time analysis (Gestalt, frame)

An intuitive representation... (cont’d)
12. “Broken” frames are analyzed in real time
13. A structural (source/target) pattern pair is assigned to every frame to be translated
14. Target structure is computed while parsing the source structure


