
An Intuitive Representation of Human Languages for Translation
Gábor Prószéky
MorphoLogic & Faculty of Information Technology, Pázmány University
Kalmár Workshop, Szeged, October 1-2, 2003

Contents
- Some words on Prof. Kalmár’s activity in computational linguistics
- Problems of human language description with formal tools
- A new representation with patterns
- Introduction to machine translation methods
- Application of patterns to translation
Kalmár Workshop 2003 Gábor Prószéky: An Intuitive Representation of Human Languages for Translation

Kalmár & languages
- Kalmár’s paper in formal language theory: „An Intuitive Representation of Context-Free Languages”
- Kalmár’s activity in machine translation (conference in 1962): „Representation of Languages with the Help of Mathematical Structures”

Linguistic representation problems of the 60’s
- Dependency structure
- Constituent structure
- X-bar theory: X’ → (P) X (Q)
- Related structures
- Using transformations

Structured symbols
- Linguistic categories: atomic symbols
- Not enough: subcategorization
- Semantic features: ± alive, ...
- Syntactic features: ± countable, ...
- Rule sets instead of rules
- ID/LP

Feature structures
- DAGs
- Unification problems
- Feature geometry, typed features
- LFG, GPSG, HPSG
- Parsing: CF-skeleton + features or feature structures only?
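
A minimal sketch of feature-structure unification may help make the "unification problems" bullet concrete. It assumes feature structures are plain nested dictionaries with atomic string values; it does not model reentrancy (shared substructures), so it covers trees rather than full DAGs. The function name `unify` is chosen here for illustration and is not taken from the slides.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Feature structures are nested dicts; leaves are atomic values.
    Simplification: no structure sharing, so this is not a full DAG unifier.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy so the inputs are left unchanged
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # incompatible values under the same feature
                result[key] = merged
            else:
                result[key] = b_val  # feature only present in b
        return result
    # Atomic values (or atom vs. structure) unify only if they are equal.
    return a if a == b else None


noun_phrase = {"cat": "np", "agr": {"num": "sg"}}
verb_demand = {"agr": {"per": "3"}}
print(unify(noun_phrase, verb_demand))
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))
```

The first call merges compatible agreement features; the second fails because the two structures disagree on `num`.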

Complexity of NL grammars
- RG/FSA: not enough
- CF/RTN: not enough
- CS ?
- 0/ATN: Turing Machine
- Transformations and metarules
- Arguments for and against

NL grammar formalisms
- Competence and performance?
- Kornai number (left-recursion, center-embedding, “respectively” construction)
- Gradually from unrestricted to regular
- (i) a^n b^n -> a*b* (n is lost!)
- (ii) a^n b^n -> {ε, ab, aabb, aaabbb}
- “Finitization” by length
- No structure in FSA; finite systems, however, can produce structural output
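
The two ways of making the context-free language a^n b^n regular can be illustrated with a short sketch. It assumes a length bound of n ≤ 3, which matches the four strings listed on the slide; the names `finitize_by_length` and `accepted_by_regular` are hypothetical.

```python
import re


def finitize_by_length(max_n):
    """Return the finite sublanguage {a^n b^n : 0 <= n <= max_n}."""
    return {"a" * n + "b" * n for n in range(max_n + 1)}


# Option (i): the regular superset a*b*, which forgets that the counts must agree.
REGULAR_APPROXIMATION = re.compile(r"a*b*")

# Option (ii): a finite list that keeps the counts but bounds the length.
FINITE_LANGUAGE = finitize_by_length(3)


def accepted_by_regular(s):
    """True if s belongs to a*b*."""
    return REGULAR_APPROXIMATION.fullmatch(s) is not None


print(sorted(FINITE_LANGUAGE, key=len))
print(accepted_by_regular("aab"), "aab" in FINITE_LANGUAGE)
```

The string `aab` shows the trade-off: the regular approximation over-generates and accepts it, while the finite list rejects it but also rejects every well-formed string longer than the bound.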

Syntax and semantics
- Logical representations (e.g. λx.dog(x), λx.run(x))
- World-knowledge representations (e.g. IS-A, PART-OF, INSTANCE-OF)
- Categorial grammar: early logical representations of syntax (Kalmár)
- DCG: interpretation & representation
- Rule-to-rule hypothesis

Conflict handling
- Lexicon meets syntax: who is right?
- Lexicon: off-line info coming from past experiences
- Which is more important in a specific situation?

Open classes
- Open vs. closed classes: that is, features can or cannot be overridden
- Proper names, jabbers, folk etymology, loanwords, ...
- Grammar of closed classes: minimal grammar

Finite morphology
- Finite patterns
- Finite number of entries
- Descriptions assigned to entries
- Finite & open vs. infinite & closed
- Underspecified entries for guessing

Finite syntax
- “Item and arrangement” (as in morphology)
- “Arrangement” describes a rather free constituent-order
- Metawords in a meta-dictionary, e.g. ‘(Det (Adj (N)))’ → ‘DAN’
- Cascades without loop

The „plastic box”
- John is a boy.
- ”John” is a noun.
- Go is a verb.
- ”Go” is a verb.
- is a sign.
- ” ” is a sign.
- □ is a □. (where □ is a ”plastic box”)

Real examples
(a) Unusual use: Go is a verb. POS [np]  POS [v]
(b) Metaphor: My car drinks a lot. ANIMATE [+]  ANIMATE [-]
(c) Unknown entry: Kalmár is a family name. POS [np]
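
The three cases can be mimicked with a small sketch in which the features expected by the context are compared with the features stored in the lexicon. The resolution policy used here (the context overrides the lexicon on conflict, and an unknown word simply takes the expected features) is an assumption made for illustration, as are the function name `describe`, the toy `LEXICON` and the status labels.

```python
def describe(word, lexicon, expected):
    """Combine lexical features with the features the context expects.

    Assumption of this sketch: on conflict the contextual value wins.
    Returns the resulting description and a label for what happened.
    """
    lexical = lexicon.get(word)
    if lexical is None:
        # Unknown entry: guess its description from the context alone.
        return dict(expected), "unknown entry"
    conflict = any(k in lexical and lexical[k] != v for k, v in expected.items())
    merged = {**lexical, **expected}  # contextual values override lexical ones
    return merged, ("overridden" if conflict else "redundant")


# Toy lexicon, invented for this example.
LEXICON = {"go": {"POS": "v"}, "car": {"POS": "n", "ANIMATE": "-"}}

print(describe("go", LEXICON, {"POS": "np"}))       # unusual use
print(describe("car", LEXICON, {"ANIMATE": "+"}))   # metaphor
print(describe("Kalmár", LEXICON, {"POS": "np"}))   # unknown entry
```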

Linguistic frames
- Psychology: ”Gestalt”
- Morphological complex structures treated as frames by humans
- Frames in AI: ‘shopping’, ‘walking’, ...
- As ‘high-level parsing’ relates to ‘detailed on-line analysis’

Translation of human languages
- old problems (50’s)
- direct (60’s)
- interlingual (70’s)
- transfer (80’s)
- examples (90’s)

Patterns: general linguistic information in lexicalized form
- Short, fully specified patterns are: lexical entries
- Longer, fully specified entries are: multi-word expressions
- Partially underspecified patterns are: collocations, phrasal verbs, idioms
- Totally underspecified patterns are: linguistic rules
- Pattern/interpretation pairs: Translation Description Language

The MetaMorpho principles
- No single words but contextual expressions (in form of patterns) only
- Pattern pairs: input/interpretation structure pairs
- Single pass: no separate transfer steps
- Target structure generation: by-product of parsing

Jabberwocky
‘Twas brillig, and the slighty toves
Did gyre and gimble in the wabe:
All mimsy were the borogroves,
And the mone raths outgrabe.

 ‘Twas , and the  s Did  and  in the : All  were the s, And the  s . Kalmár Workshop 2003 Gábor Prószéky: An Intuitive Representation of Human Languages for Translation

Translation rules for Jabberwocky
- ‘twas □ → □ volt
- , and □ → , és □
- the □s did □ → a □ok □tak
- □ and □ → □ és □
- in the □ → a □ban
- all □ → teljesen □
- □ were the □s → □k voltak az □ok
- the □s □ → a □ok □tek
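
How such source/target pattern pairs might be applied can be sketched in a few lines. This is not the MetaMorpho implementation: it assumes that each open slot matches exactly one word, writes the slots as numbered placeholders `{1}`, `{2}` (a notation introduced here), tries the rules in order on one clause at a time, and leaves the unknown word untransliterated. The names `compile_pattern`, `translate` and `RULES` are hypothetical, and only four of the rules are included.

```python
import re

# A few source/target pattern pairs; {n} marks an open slot.
RULES = [
    ("'twas {1}", "{1} volt"),
    ("the {1}s did {2}", "a {1}ok {2}tak"),
    ("in the {1}", "a {1}ban"),
    ("all {1}", "teljesen {1}"),
]


def compile_pattern(source):
    """Turn a source pattern into a regex with one named group per slot."""
    parts = re.split(r"\{(\d+)\}", source)  # literal, slot, literal, slot, ...
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<s{part}>\\w+)"
        for i, part in enumerate(parts)
    )
    return re.compile(regex, re.IGNORECASE)


def translate(clause, rules=RULES):
    """Return the target side of the first rule matching the whole clause."""
    for source, target in rules:
        match = compile_pattern(source).fullmatch(clause)
        if match:
            # Copy each slot filler into the corresponding target slot.
            return re.sub(r"\{(\d+)\}", lambda m: match.group("s" + m.group(1)), target)
    return None  # no pattern covers this clause


for clause in ["'Twas brillig", "the toves did gyre", "in the wabe", "all mimsy"]:
    print(clause, "->", translate(clause))
```

Because the target string is filled in as soon as the source pattern matches, no separate transfer step is needed in this sketch, which is the point the pattern pairs are meant to make.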

 t ‘Twas , and the  s Did  and  in the : All  were the s, And the  s . t  volt, és a  ok tak és tek a ben: teljesen  voltak a ok és a  ok tek. Kalmár Workshop 2003 Gábor Prószéky: An Intuitive Representation of Human Languages for Translation

Translation of Jabberwocky
Dzsebervoki
Brillig volt, és a szlájti tóvok
gájertak és gimbeltek a vébben:
teljesen mimszik voltak a borogróvok
és a món rátok autgrébtek.

An intuitive representation
1. X-bar based structures
2. Feature-based descriptions
3. Metarules (used off-line)
4. Rule-to-rule principle
5. Lexicon should be finite but open
6. Closed classes belong to the minimal grammar
7. Minimal grammar describes ”basically” linguistic elements

An intuitive representation... (cont’d)
8. Linguistic constructions can be described by finite patterns
9. A huge & finite description set is used rather than a limited & infinite grammar
10. In case of conflict, lexical information is either redundant or contradicts the actual description
11. Known constructions need no real-time analysis (Gestalt, frame)

An intuitive representation... (cont’d)
12. ”Broken” frames are analyzed in real time
13. A structural (source/target) pattern pair is assigned to every frame to be translated
14. The target structure is computed while parsing the source structure
