Grammar Engineering: Coordination and Macros, METARULEMACRO, Interfacing Finite-State Morphology
Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions)
Colombo 2014

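Coordination
Every attribute can have only one value, so what do we do with coordinated constituents?
The f-structures of the conjuncts become members of a set: the annotation ! $ ^ says that the daughter's f-structure (!) is an element ($) of the mother's f-structure (^).
Example: gorillas sleep and eat
VP --> { … | VP: ! $ ^; CONJ VP: ! $ ^ }.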

Coordination (cont’d)
Coordination can occur at essentially any level of the c-structure.
Example: the gorillas peel and eat the bananas
V --> { … | V: ! $ ^; CONJ V: ! $ ^ }.

Coordination (cont’d)
Essentially any category can be coordinated.
Example: the gorillas eat the bananas in the cage and in the garden
PP --> { … | PP: ! $ ^; CONJ PP: ! $ ^ }.

Coordination (cont’d)
How can we capture these generalizations? Via regular-expression macros!
SC-COORD(CAT) = CAT: ! $ ^; CONJ CAT: ! $ ^.
PP --> { ... | @(SC-COORD PP) }.
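
The idea of a parameterized rule macro can be illustrated outside XLE. The following is a minimal Python sketch, not XLE's actual macro mechanism: it treats a macro as a text template whose parameter (here CAT) is replaced by the argument. The names MACROS and expand are hypothetical and exist only for this illustration.

```python
import re

# Toy macro table (hypothetical): name -> (parameter names, body template).
# The body of SC-COORD is copied from the slide.
MACROS = {
    "SC-COORD": (("CAT",), "CAT: ! $ ^; CONJ CAT: ! $ ^"),
}


def expand(name, *args):
    """Replace every whole-word occurrence of a parameter by its argument."""
    params, body = MACROS[name]
    if len(params) != len(args):
        raise ValueError(f"{name} expects {len(params)} argument(s)")
    binding = dict(zip(params, args))
    # One pass over the body, so an argument is never re-substituted.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, params)) + r")\b")
    return pattern.sub(lambda m: binding[m.group(1)], body)


print(expand("SC-COORD", "PP"))
print(expand("SC-COORD", "VP"))
```

Calling expand with PP, VP, or V yields the coordination alternative that earlier had to be written by hand in each rule.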

Nominal coordination
Coordination of NP, N, etc. is special because the NUM attribute of the coordinated whole should typically have the value pl, even when the individual set members are sg.
Examples:
Mary and the gorilla like bananas.
The boys and girls like bananas.

Nominal coordination (cont’d)
NP-COORD(CAT) = CAT: ! $ ^; CONJ: ^ = ! (^ NUM) = pl; CAT: ! $ ^.
NP --> { ... | @(NP-COORD NP) }.
N --> { ... | @(NP-COORD N) }.
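
What the annotations build can be pictured with plain data structures. The sketch below is a simplified Python model, not XLE output: an f-structure is a dict, and a coordinated f-structure holds its conjuncts under a "members" key (a hypothetical name chosen for this illustration). For nominal coordination the set as a whole additionally gets NUM = pl, while each member keeps its own NUM.

```python
def coordinate(conjuncts, nominal=False):
    """Build a toy coordinated f-structure from the conjuncts' f-structures.

    Each conjunct becomes a member of the set (the effect of ! $ ^).
    For nominal coordination the set itself gets NUM = pl,
    mirroring (^ NUM) = pl on the conjunction.
    """
    fstructure = {"members": list(conjuncts)}
    if nominal:
        fstructure["NUM"] = "pl"
    return fstructure


mary = {"PRED": "Mary", "NUM": "sg"}
gorilla = {"PRED": "gorilla", "NUM": "sg"}

subject = coordinate([mary, gorilla], nominal=True)
print(subject["NUM"])                          # number of the whole subject
print([m["NUM"] for m in subject["members"]])  # number of each conjunct
```

The verb in "Mary and the gorilla like bananas" agrees with the NUM of the whole set, not with the NUM of either conjunct.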

METARULEMACRO
Macros are nice, but can't we do better? After all, it is tedious to go into almost every rule and invoke either the SC-COORD or the NP-COORD macro.
XLE has a special macro called METARULEMACRO.
Every rule goes through METARULEMACRO unless specified otherwise.

METARULEMACRO (cont’d)
Takes three arguments: _CAT, _BASECAT, and _RHS.
- _CAT is the category on the left-hand side of the rule.
- _BASECAT is the same as _CAT unless the rule is a complex-category rule.
- _RHS is the right-hand side of the rule.

METARULEMACRO (cont’d)
METARULEMACRO(_CAT _BASECAT _RHS) =
  { _RHS
  | e: _CAT $ { N NP }; @(NP-COORD _CAT)
  | e: _CAT ~$ { N NP }; @(SC-COORD _CAT) }.
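
The effect of passing every rule through one macro can be sketched as follows. This is a rough Python model under simplifying assumptions, not XLE's implementation: a rule's right-hand side is kept as an opaque string, _BASECAT is ignored, and the function name metarule_alternatives is hypothetical. Macro calls are written as @(NAME CAT), XLE's macro-call notation.

```python
NOMINAL_CATEGORIES = {"N", "NP"}


def metarule_alternatives(cat, rhs):
    """Return the alternatives a rule for `cat` ends up with.

    The original right-hand side is always kept; in addition, nominal
    categories get NP-COORD and all other categories get SC-COORD.
    """
    macro = "NP-COORD" if cat in NOMINAL_CATEGORIES else "SC-COORD"
    return [rhs, f"@({macro} {cat})"]


for category in ("NP", "PP", "V"):
    # "..." stands for whatever the rule's own right-hand side is.
    print(category, "-->", " | ".join(metarule_alternatives(category, "...")))
```

No individual rule has to mention coordination any more; the choice between the two coordination macros is made once, based on the left-hand-side category.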

Interfacing finite-state transducers
Maintaining a full-form lexicon is tedious, and many lexicon entries look alike.
Is there a way to get the category of a word from somewhere else, ideally along with morphosyntactic information such as tense, mood, case, number, and person?
Finite-state morphologies!

Interfacing finite-state transducers
The cascade of finite-state transducers to be used is specified in the MORPHOLOGY section.
It has at least two subsections:
- TOKENIZE
- ANALYZE
By default, the transducers listed are used for both parsing and generation.
This behavior can be altered by prefixing the name of a transducer file with P! (parsing only) or G! (generation only).
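
The effect of the P! and G! prefixes can be spelled out in a few lines. The Python sketch below only illustrates the selection logic described above; it does not read a real MORPHOLOGY section, and the file names and the function transducers_for are made up for the example.

```python
PREFIX_FOR_MODE = {"parse": "P!", "generate": "G!"}


def transducers_for(names, mode):
    """Select the transducer files that apply in the given mode.

    Unprefixed names are used in both modes; a P! or G! prefix
    restricts a file to parsing or to generation respectively.
    """
    wanted = PREFIX_FOR_MODE[mode]
    selected = []
    for name in names:
        prefix = name[:2]
        if prefix in PREFIX_FOR_MODE.values():
            if prefix == wanted:
                selected.append(name[2:])  # strip the prefix
        else:
            selected.append(name)
    return selected


# Hypothetical file names, for illustration only.
listed = ["P!tok-parse.fst", "G!tok-gen.fst", "morph.fst"]
print(transducers_for(listed, "parse"))
print(transducers_for(listed, "generate"))
```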

Tokenization
So far, only whitespace has been treated as a token boundary.
However, real-world text has other kinds of token boundaries:
- Punctuation has to be split off the preceding token.
- Some whitespace should not be treated as a token boundary, e.g. “Sri Lanka”.
- Upper-case letters at the beginning of a sentence should optionally be lower-cased.
A finite-state tokenizer takes care of these things.
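
The three requirements can be made concrete with a small stand-in. The Python sketch below uses a regular expression and a lookup set instead of a finite-state transducer, so it shows the intended input/output behavior rather than how the tokenizer is actually built. The multiword list is hypothetical, and only the first token of the input is treated as sentence-initial.

```python
import re

# Hypothetical list of multiword expressions that form a single token.
MULTIWORDS = {("Sri", "Lanka")}


def tokenize(text):
    """Return the alternative tokenizations of `text` as lists of tokens."""
    # 1. Split punctuation off the preceding token.
    raw = re.findall(r"\w+|[^\w\s]", text)

    # 2. Re-join multiword expressions across whitespace.
    tokens = []
    i = 0
    while i < len(raw):
        if tuple(raw[i:i + 2]) in MULTIWORDS:
            tokens.append(" ".join(raw[i:i + 2]))
            i += 2
        else:
            tokens.append(raw[i])
            i += 1

    # 3. Lower-casing the sentence-initial letter is optional,
    #    so both variants are returned.
    alternatives = [tokens]
    if tokens and tokens[0][:1].isupper():
        lowered = tokens[0][0].lower() + tokens[0][1:]
        alternatives.append([lowered] + tokens[1:])
    return alternatives


for alternative in tokenize("Gorillas live in Sri Lanka."):
    print(alternative)
```

Returning both variants in step 3 reflects the word "optionally": the later stages decide which reading of the first token is usable.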

Finite-state morphologies
Map surface forms to a canonical form (lemma) and a series of morphological tags.
Examples:
rode      ride +Verb +PastTense +123P
rides     ride +Verb +Pres +3sg
          ride +Noun +Pl
children  child +Noun +Pl
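
A transducer relates surface forms and analyses in both directions. As a minimal illustration, the Python sketch below replaces the transducer with a plain lookup table holding only the examples from the slide; a real finite-state morphology covers the whole lexicon and is not implemented as a dictionary. The names ANALYSES, analyze, and generate are hypothetical.

```python
# Surface form -> analyses, copied from the examples on the slide.
ANALYSES = {
    "rode": ["ride +Verb +PastTense +123P"],
    "rides": ["ride +Verb +Pres +3sg", "ride +Noun +Pl"],
    "children": ["child +Noun +Pl"],
}


def analyze(surface):
    """Parsing direction: surface form -> list of [lemma, tag, ...]."""
    return [analysis.split() for analysis in ANALYSES.get(surface, [])]


def generate(lemma, tags):
    """Generation direction: lemma plus tags -> matching surface forms."""
    target = " ".join([lemma] + list(tags))
    return [surface for surface, analyses in ANALYSES.items()
            if target in analyses]


print(analyze("rides"))                     # ambiguous: verb or noun
print(generate("child", ["+Noun", "+Pl"]))  # irregular plural
```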

Interfacing Finite-state Morphology
Morphological tags need to be listed in the lexicon.
- Sublexical lexicon entries look like regular lexicon entries.
- Difference: the morphcode is XLE instead of *.
Lemmas with non-predictable subcategorization frames must be listed in the lexicon.
Other lemmas can be handled by the -unknown entry.

Lexicon entries for morphology output
+Verb V-POS XLE.
+Pres TNS
+3sg PERS
wait V-S XLE (^ PRED)= ‘wait ’.
-unknown A-S %stem); N-S %stem).

Interfacing Finite-state Morphology
Morphology output needs to be parsed by sublexical rules.
- They look like regular rules.
- They have f-annotations like regular rules.
- Difference: sublexical categories are marked with the suffix _BASE.

Interfacing Finite-state Morphology
V --> V-S_BASE V-POS_BASE { TNS_BASE PERS_BASE | ASP_BASE }.
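
How the sublexical rule consumes morphology output can be traced step by step. The Python sketch below is a simplified model, not XLE's parser: each stem or tag is mapped to its sublexical category, the suffix _BASE is added, and the resulting sequence is checked against the V rule written as a regular expression. The category assignments for wait, +Verb, +Pres, and +3sg follow the lexicon entries on the earlier slide; the tag +Prog for ASP is a hypothetical addition so that the second alternative can be exercised.

```python
import re

# Stem/tag -> sublexical category. "+Prog" is hypothetical.
SUBLEXICAL = {
    "wait": "V-S",
    "+Verb": "V-POS",
    "+Pres": "TNS",
    "+3sg": "PERS",
    "+Prog": "ASP",
}

# V --> V-S_BASE V-POS_BASE { TNS_BASE PERS_BASE | ASP_BASE }.
V_RULE = re.compile(r"V-S_BASE V-POS_BASE (?:TNS_BASE PERS_BASE|ASP_BASE)")


def is_verb(analysis):
    """Check whether a morphological analysis is accepted by the V rule."""
    try:
        categories = [SUBLEXICAL[item] + "_BASE" for item in analysis]
    except KeyError:
        return False  # a stem or tag without a lexicon entry
    return V_RULE.fullmatch(" ".join(categories)) is not None


print(is_verb(["wait", "+Verb", "+Pres", "+3sg"]))
print(is_verb(["wait", "+Verb", "+Pres"]))
```

The second call is rejected because the first alternative of the rule requires both a TNS_BASE and a PERS_BASE daughter.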

XLE Lookup Model
Only one entry per headword is allowed per lexicon section.
The same headword may be covered both by an explicit entry and by the -unknown entry.
To allow this, the explicit entry must be marked with ; ETC.
sleep sleep); ETC.
-unknown %stem).
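
The interaction between explicit entries and the -unknown entry can be pictured as follows. This Python sketch models only the behavior described on this slide and should not be read as a full account of XLE's lexicon lookup. The categories assigned to sleep and wait, and the absence of ETC on wait, are assumptions made for the illustration; the -unknown categories A-S and N-S are taken from the earlier lexicon slide.

```python
# Explicit entries (categories and ETC marking are illustrative assumptions).
EXPLICIT = {
    "sleep": {"categories": ["V-S"], "etc": True},
    "wait": {"categories": ["V-S"], "etc": False},
}

# Categories offered by the -unknown entry.
UNKNOWN = ["A-S", "N-S"]


def lookup(stem):
    """Return the sublexical categories available for a stem."""
    entry = EXPLICIT.get(stem)
    if entry is None:
        # No explicit entry: only -unknown applies.
        return list(UNKNOWN)
    categories = list(entry["categories"])
    if entry["etc"]:
        # ETC: the -unknown entry applies in addition.
        categories += UNKNOWN
    return categories


print(lookup("sleep"))    # explicit entry plus -unknown
print(lookup("wait"))     # explicit entry only
print(lookup("gorilla"))  # -unknown only
```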