Minimalist Parsing Scott Drellishak CompLing Lab Meeting 2/1/2006

Overview
Four parts:
1. Whirlwind tour of Minimalism
2. Formal definition of a Minimalist Grammar
3. Algorithms for parsing MGs
4. Software and web sites

Four parts:
1. Whirlwind tour of Minimalism
2. Formal definition of a Minimalist Grammar
3. Algorithms for parsing MGs
4. Software and web sites

Minimalism
Recent version of transformational generative grammar: Chomsky's (1995) The Minimalist Program.
Updates and supersedes the earlier GB/P&P framework.
Explores "the extent to which previous empirical coverage can be maintained with fewer grammatical devices." (Stabler 1999: 299)

Minimalism
Sentence derivations proceed according to this (famous) diagram:
[Diagram: the derivation starts from the Lexicon and branches to Phonetic Form (PF) and Logical Form (LF)]

Minimalism
Items come out of the lexicon fully inflected and with features: interpretable and uninterpretable.
Uninterpretable features must cancel out before the derivation reaches LF.
The branch to PF allows the surface form to "peek" into the middle of the derivation.
Cross-linguistic differences are accounted for by variations in the lexicon.

Minimalism
Trees are derived by starting with singleton trees (lexical items) and combining them.
Only two operations:
On two trees: merge them together into a single tree (with one "projecting over" the other).
On a single tree: move a node in the tree up to the root to cancel a feature.
(We'll see a couple of derivations later.)

Four parts:
1. Whirlwind tour of Minimalism
2. Formal definition of a Minimalist Grammar
3. Algorithms for parsing MGs
4. Software and web sites

Minimalist Grammars
For parsing, Minimalism needs formalization. Stabler (1997) defines an MG as:
V = phonetic and interpretable features
Cat = categories, selectors, licensors, licensees
Lex = expressions (trees) built from V and Cat
F = { merge, move }
(Based on an earlier grammar formalism, so the names don't mean what you think.)

V = Lexicon
Lexical entries like:
=n d -case every (category D, selects an N, needs case)
n language (category N)
=d +case =d v speaks (category V, takes 2 DPs, assigns case to 1)
This is a DP analysis.
"speaks" stands for /speaks/(speaks).

Cat = Features
Base: c, t, v, d, n, ... (parts of speech)
Select: =x, =X, X= (selects arguments)
Select features trigger merge.
Upper case moves phonetic content to the merged node; the position of "=" determines prefix or postfix.
Licensees: -case, -wh, ... (needs ...)
Licensors: +case, +wh, ... (provides ...)
Licensor and licensee features trigger move; upper case = "strong".
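To make the notation concrete, the feature strings and the example lexical entries above can be written down as plain data. The Python fragment below is a minimal sketch under that assumption; the list-of-strings encoding is for exposition only, not Stabler's representation:

    # Minimal sketch of the example lexicon as (feature list, phonetic form) pairs.
    # "=n" selects an N, "d" is the base category, "-case" is a licensee,
    # "+case" is a licensor.  This encoding is assumed for illustration only.

    LEXICON = [
        (["=n", "d", "-case"], "every"),           # D: selects an N, needs case
        (["n"], "language"),                       # N
        (["=d", "+case", "=d", "v"], "speaks"),    # V: takes 2 DPs, assigns case to 1
    ]

    def base_category(features):
        """Return the first feature that is neither a selector (=x) nor a
        licensor/licensee (+x / -x): the item's part of speech."""
        for f in features:
            if not f.startswith(("=", "+", "-")):
                return f
        return None

    for feats, phon in LEXICON:
        print(phon, "->", base_category(feats))

Running it just lists each entry with its base category (every -> d, language -> n, speaks -> v).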

Lex = Trees
A set of nodes and three relations:
Dominance (x ⊳ y = x is y's parent): who's higher in the tree?
Precedence (x ≺ y = x precedes y): who's before who in the tree?
Projection (x < y = x projects over y): whose features percolate up to the parent?
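As a rough illustration of how the three relations fit into one structure, the sketch below encodes Stabler-style trees as binary trees whose internal nodes record which daughter projects ("<" for the left daughter, ">" for the right, as in the merge and move slides that follow). The class-based encoding is assumed for exposition:

    # Sketch: binary trees whose internal nodes record which daughter projects.
    # This encoding is assumed for illustration only.

    class Leaf:
        def __init__(self, features, phon):
            self.features = features    # unchecked features, e.g. ["=d", "v"]
            self.phon = phon            # phonetic form, e.g. "make"

    class Node:
        def __init__(self, direction, left, right):
            assert direction in ("<", ">")
            self.direction = direction  # projection: which daughter is the head side
            self.left = left            # left precedes right (precedence)
            self.right = right          # both daughters are dominated by this node

        def head(self):
            """Follow projection down to the leaf whose features label the root."""
            child = self.left if self.direction == "<" else self.right
            return child if isinstance(child, Leaf) else child.head()

    # [< make lunch] after merge: "make" projects over "lunch" (whose d is checked).
    tree = Node("<", Leaf(["=d", "v"], "make"), Leaf([], "lunch"))
    print(tree.head().phon)    # -> make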

F = Operations
merge: Combines two trees. A head selects and combines with a phrase to its right:
[=d =d v make] + [d lunch] ⇒ [< [=d v make] [lunch]]
(The "<" at the new root records that the left daughter, make, projects; the matching =d and d features cancel.)

F = Operations
If the selector feature is upper case, only the phonetic features combine:
[D= =d v make] + [d lunch] ⇒ [< [=d v /lunch make/ (make)(lunch)] [ ]]
(The phonetic content of lunch is pulled onto the head, giving /lunch make/; lunch's original position is left phonetically empty.)

F = Operations
move: One tree's head's +x feature attracts the nearest -x feature to the root of the tree:
[< [+case v speak] [-case Nahuatl]] ⇒ [> [(Nahuatl)] [< [v speak] [/Nahuatl/]]]
(+case cancels against -case; the licensor here is weak (lower case), so the phonetic form /Nahuatl/ stays in place and the interpretable content (Nahuatl) moves to the root.)
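Setting aside tree geometry, linearization details, and the strong/weak distinction, the feature-checking core of merge and move can be sketched over flat feature lists. The fragment below is a deliberately simplified illustration, not Stabler's definition: every move is treated as overt, and the demo lexical items at the bottom are hypothetical.

    # Simplified sketch of the feature-checking core of merge and move.
    # An "expression" is (features, phon, movers): the head's unchecked
    # features, its phonetic string, and a list of (features, phon) pairs
    # still waiting to check a licensee.  This flattening is an expository
    # assumption; real MG expressions are trees.

    def merge(selector, selectee):
        """Cancel a selector =x on one head against category x on the other."""
        (f1, p1, m1), (f2, p2, m2) = selector, selectee
        assert f1[0] == "=" + f2[0], "selectee's category must match the =x feature"
        movers = m1 + m2
        if len(f2) > 1:                       # selectee still has licensees (-x):
            return (f1[1:], p1, movers + [(f2[1:], p2)])   # it becomes a mover
        return (f1[1:], p1 + " " + p2, movers)             # else: head precedes it

    def move(expr):
        """Cancel a licensor +x on the head against some mover's licensee -x."""
        feats, phon, movers = expr
        assert feats[0].startswith("+")
        target = "-" + feats[0][1:]
        for i, (mf, mp) in enumerate(movers):
            if mf[0] == target:
                rest = movers[:i] + movers[i + 1:]
                if len(mf) > 1:               # mover has further licensees to check
                    return (feats[1:], phon, rest + [(mf[1:], mp)])
                return (feats[1:], mp + " " + phon, rest)  # mover lands to the left
        raise ValueError("no mover carries " + target)

    # Two hypothetical, simplified entries just to exercise the operations:
    cp = merge((["=d", "+wh", "c"], "speaks", []), (["d", "-wh"], "who", []))
    print(move(cp))    # -> (['c'], 'who speaks', [])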

A Sample Derivation
Let's take a look at the derivation of a simple sentence from Stabler (1997)…

Four parts:
1. Whirlwind tour of Minimalism
2. Formal definition of a Minimalist Grammar
3. Algorithms for parsing MGs
4. Software and web sites

Parsing MGs
Stabler (2000 and 2001) describes a CYK-like algorithm for parsing MGs.
It defines a set of operations on strings of features that are arranged in "chains" (forests of incomplete trees).
Each of these operations works on contiguous spans of the input, so they can be chart-parsed to recognize input sentences.
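Concretely, each item such a parser manipulates pairs spans of the input with the feature strings still to be checked: one chain for the head and one for each pending mover (this is the format used in the sample derivation a few slides below). A minimal sketch of that representation in Python, with the type names assumed for illustration:

    # Sketch of a chart item: input spans paired with unchecked feature strings,
    # one chain for the head plus zero or more movers.  Names are assumed here.

    from collections import namedtuple

    Chain = namedtuple("Chain", "start end features")
    Item = namedtuple("Item", "head movers")

    # Item 5 from the sample derivation below:  (0,1): v -w , (1,2): -case
    item5 = Item(head=Chain(0, 1, ("v", "-w")),
                 movers=(Chain(1, 2, ("-case",)),))
    print(item5)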

MG Operations

CYK?
Somewhat different from the version of CYK used to parse CFGs, but it's still the same idea.
Each operation transforms a string of features, canceling out selection and licensing features and producing more strings, which are stored in the chart.
Then, look for further operations that take them as input, building a hierarchy.

Another Recognizer
Stabler refers to Harkema (2000), which defines an MG recognizer that works more like an Earley parser.
It has an agenda and a chart. As operations are applied to make new items, those go into the agenda. Stop when a "goal item" appears in the chart.
Overall time complexity is O(n^(4k+4)).
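The agenda-and-chart control structure is the standard skeleton of deductive chart parsing. The sketch below shows only that skeleton, in Python, with the MG-specific pieces (the lexical axiom items and the merge/move inference rules) left abstract as assumed callback functions; it is not Harkema's recognizer itself.

    # Skeleton of an agenda-driven chart recognizer (deductive parsing).
    # The MG-specific parts -- the axiom items built from the lexicon and the
    # merge/move rules -- are passed in as functions and left abstract here.

    def recognize(axioms, unary_rules, binary_rules, is_goal):
        """axioms: iterable of initial items (e.g. lexical items with spans);
        unary_rules(item) and binary_rules(a, b): iterables of derived items;
        is_goal(item): True for an item covering the whole input."""
        chart = set()
        agenda = list(axioms)
        while agenda:
            item = agenda.pop()
            if item in chart:
                continue
            chart.add(item)
            if is_goal(item):
                return True
            for new in unary_rules(item):          # e.g. the move rules
                agenda.append(new)
            for other in list(chart):              # e.g. the merge rules
                for new in binary_rules(item, other):
                    agenda.append(new)
                for new in binary_rules(other, item):
                    agenda.append(new)
        return False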

Another Sample Derivation
Here's a derivation from Stabler (2000), in a slightly different format; note the indices:
1. (0,1) :: =d v -w                            lexical
2. (1,2) :: d -case                            lexical
3. (x,x) :: =v +case acc                       lexical
4. (x,x) :: =acc +w w                          lexical
5. (0,1): v -w, (1,2): -case                   merge3(1,2)
6. (x,x): +case acc, (0,1): -w, (1,2): -case   merge3(3,5)
7. (1,2): acc, (0,1): -w                       move1(6)
8. (1,2): +w w, (0,1): -w                      merge1(4,7)
9. (0,2): w                                    move1(8)

Four parts:
1. Whirlwind tour of Minimalism
2. Formal definition of a Minimalist Grammar
3. Algorithms for parsing MGs
4. Software and web sites

Parsers
Stabler's parsers: MG parsers in OCaml and two flavors of Prolog. (Also requires Tcl/Tk.)
Sourabh Niyogi: Stabler-based MG parser in Scheme; does verb subcategorization.
Willemijn Vermaat: Stabler-based MG parser with a web interface (that I couldn't figure out).
Dekang Lin: MINIPAR. Executable only, based on PRINCIPAR; not clear what the internals are like.

References
Chomsky (1995). The Minimalist Program.
Harkema (2000). A Recognizer for Minimalist Grammars.
Stabler (1997). Derivational Minimalism.
Stabler (1999). Remnant Movement and Structural Complexity.
Stabler (2000). Minimalist Grammars and Recognition.