Linguistic Generation and Statistical Generation (some slides adapted from other sources)



 Conceptual
 ◦ What to say
 ◦ How to organize
 Linguistic: how to say it
 ◦ Words?
 ◦ Syntactic structure

[Pipeline diagram: Data → Content Planner → Micro-Planner → Sentence Generator → Sentences, with the Ontology, Presentation Plan, Lexicon, and Grammar as resources.]

 Parsing
 ◦ Input = sentence
 ◦ Output = parse tree
 Generation
 ◦ Output = sentence
 ◦ Input = parse tree?

 Syntactic input
 ◦ Agent = the President
 ◦ Pred = pass
 ◦ Patient = tax bailout plan
 ◦ When = yesterday
 Possible realizations
 ◦ The President passed the tax bailout plan.
 ◦ The tax bailout plan was passed by the President.
 ◦ The tax bailout plan was passed.
 ◦ It was the President who passed the tax bailout plan.
 ◦ It was the tax bailout plan the President passed.
 Constraints?

 Bought vs. sold
 ◦ Kathy bought the book from Joshua.
 ◦ Joshua sold the book to Kathy.
 Erudite vs. wise
 ◦ The erudite old man taught us ancient history.
 ◦ The wise old man taught us ancient history.
 Polarity vs. "plus/minus"
 ◦ Insert the battery and check the polarity.
 ◦ Insert the battery and make sure the plus lines up with the plus.
 Edged out vs. beat
 ◦ The Denver Nuggets edged out the Boston Celtics.
 ◦ The Denver Nuggets beat the Boston Celtics with a narrow margin.
 Constraints?

 Syntax
 ◦ "Allow one to select" vs. "allow the selection"
 Semantics
 ◦ Rebound vs. point in basketball
 Lexical
 ◦ "grab a rebound" vs. "score a point", and not vice versa
 Domain
 ◦ IBM rebounded from a 3-day loss.
 ◦ Magic grabbed 20 rebounds.
 Pragmatics
 ◦ A glass half-full
 ◦ A glass half-empty

 Inter-lexical constraints (e.g., collocations)
 Cross-ranking constraints (content units are not isomorphic with linguistic units)

 Wall Street indexes opened strongly. (time in verb, manner as adverb)
 Stock indexes surged at the start of the trading day. (time as PP, manner in verb)
 The Denver Nuggets beat the Boston Celtics with a narrow margin. (game result in verb, manner in PP)
 The Denver Nuggets edged out the Boston Celtics. (game result and manner in verb)

[Pipeline diagram, as before: Data → Content Planner → Micro-Planner → Sentence Generator → Sentences, with the Ontology, Presentation Plan, Lexicon, and Grammar as resources; lexical choice is highlighted.]

 Function plays as important a role as syntax
 Pragmatics and semantics are represented on a par with syntactic features and constituents
 Unification is used to enrich the input with constraints from the grammar
 Input is recursively unified with the grammar
 Top-down process

 Functional Descriptions (FDs) as feature structures
 ◦ A data structure that is partial and structured
 Input and grammar are both specified as functional descriptions
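A minimal sketch of unification over feature structures, using nested Python dicts (this ignores FUF's path pointers like {agent number} and alt branches; all names are illustrative):

```python
def unify(fd1, fd2):
    """Unify two feature structures represented as nested dicts.

    Returns the merged structure, or None on a feature clash.
    """
    if not isinstance(fd1, dict) or not isinstance(fd2, dict):
        return fd1 if fd1 == fd2 else None  # atomic values must match
    result = dict(fd1)
    for feat, val in fd2.items():
        if feat in result:
            merged = unify(result[feat], val)
            if merged is None:
                return None  # a clash anywhere fails the whole unification
            result[feat] = merged
        else:
            result[feat] = val  # grammar enriches the input
    return result

# Enriching an input NP with a grammar branch:
inp = {"cat": "np", "lex": "system", "proper": "no"}
grammar_np = {"cat": "np", "det": {"cat": "article", "lex": "the"}}
print(unify(inp, grammar_np))
print(unify({"proper": "yes"}, {"proper": "no"}))  # None (clash)
```

Because unification is partial, the input needs to mention only the features the application cares about; the grammar fills in the rest.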

((alt GSIMPLE (
  ;; A grammar always has the same form: an alternative
  ;; with one branch for each constituent category.

  ;; First branch: the category clause
  ((cat clause)
   (agent ((cat np)))
   (patient ((cat np)))
   (pred ((cat verb-group)
          (number {agent number})))
   (cset (pred agent patient))
   (pattern (agent pred patient)))

  ;; Second branch: NP
  ((cat np)
   (head ((cat noun) (lex {^ ^ lex})))
   (number ((alt np-number (singular plural))))
   (alt (;; Proper names don't need an article
         ((proper yes)
          (pattern (head)))
         ;; Common nouns do
         ((proper no)
          (pattern (det head))
          (det ((cat article) (lex "the")))))))

  ;; Third branch: verb group
  ((cat verb-group)
   (pattern (v))
   (aux none)
   (v ((cat verb) (lex {^ ^ lex})))))))

 Input to generate: "The system advises John."

I1 = ((cat clause)
      (tense present)
      (pred ((lex "advise")))
      (agent ((lex "system") (proper no)))
      (patient ((lex "John"))))

((cat clause)
 (tense present)
 (pred ((lex "advise")
        (cat verb-group)
        (number {agent number})
        (PATTERN (V))
        (AUX NONE)
        (V ((CAT VERB) (LEX {^ ^ LEX})))))
 (agent ((lex "system") (proper no)
         (cat np)
         (HEAD ((CAT NOUN) (LEX {^ ^ LEX})))
         (NUMBER SINGULAR)
         (PATTERN (DET HEAD))
         (DET ((CAT ARTICLE) (LEX "the")))))
 (patient ((lex "John")
           (cat np)
           (HEAD ((CAT NOUN) (LEX {^ ^ LEX})))
           (NUMBER SINGULAR)
           (PROPER YES)
           (CSET (HEAD))
           (PATTERN (HEAD))))
 (cset (pred agent patient))
 (pattern (agent pred patient)))

 Identify the pattern feature at the top level: for I1, it is (pattern (agent pred patient)).
 If a pattern is found:
 ◦ For each constituent of the pattern, recursively linearize the constituent (i.e., linearize agent, pred, and patient).
 ◦ The linearization of the FD is the concatenation of the linearizations of the constituents in the order prescribed by the pattern feature.
 If no pattern is found:
 ◦ Find the lex feature of the FD and, depending on the category of the constituent, the morphological features needed. For example, if the FD is of (cat verb), the features needed are person, number, and tense.
 ◦ Send the lexical item and the appropriate morphological features to the morphology module. The linearization of the FD is the resulting string. For example, for (lex "advise") with default feature values (as in I1), the result is "advises". When the FD does not contain a morphological feature, the morphology module provides reasonable defaults.
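The recursion above can be sketched in a few lines of Python, with the FD as a nested dict and morphology stubbed out (a real system would inflect "advise" to "advises" from person/number/tense; the dict encoding is my assumption):

```python
def linearize(fd):
    """Linearize an FD following the algorithm on the slide:
    if a 'pattern' feature exists, concatenate the linearizations
    of its constituents in order; otherwise emit the 'lex' value."""
    if "pattern" in fd:
        return " ".join(linearize(fd[c]) for c in fd["pattern"])
    # Leaf: a real system sends lex + morphological features
    # to the morphology module here.
    return fd["lex"]

i1 = {
    "pattern": ["agent", "pred", "patient"],
    "agent": {"pattern": ["det", "head"],
              "det": {"lex": "the"}, "head": {"lex": "system"}},
    "pred": {"pattern": ["v"], "v": {"lex": "advises"}},
    "patient": {"pattern": ["head"], "head": {"lex": "John"}},
}
print(linearize(i1))  # the system advises John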

((cat clause)
 (agent ((cat np)))
 (patient ((cat np)))
 (alt (
   ;; Active voice
   ((focus {agent})
    (voice active)
    (pred ((cat verb-group)
           (number {agent number})))
    (cset (pred agent patient))
    (pattern (agent pred patient)))
   ;; Passive voice
   ((focus {patient})
    (voice passive)
    (verbs ((cat verb-group)
            (aux ((lex "be")
                  (number {patient number})))
            (pastp ({pred}
                    (tense pastp)))
            (pattern (aux pastp))))
    (by-agent {agent})
    (pattern (patient verbs by-agent))))))

 Problem: what does the input to realization look like?
 Wouldn't it be easier to automatically learn the output?
 What does it take to scale up linguistic grammars?

 Subject-verb agreement: "I am" vs. "I are" vs. "I is"
 Corpus counts (Langkilde-Geary, 2002):
   I am    2797
   I are     47
   I is      14

 Choice of determiner: "a trust" vs. "an trust" vs. "the trust"
 Corpus counts (Langkilde-Geary, 2002):
   a trust      394
   an trust       0
   the trust   1356
   a trusts       2
   an trusts      0
   the trusts   115
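The idea behind these tables can be reduced to a one-liner: score each candidate phrase by its raw corpus frequency and keep the most frequent (counts are the slide's; the dict encoding is illustrative):

```python
# Corpus counts from the slide (Langkilde-Geary, 2002)
counts = {"a trust": 394, "an trust": 0, "the trust": 1356,
          "a trusts": 2, "an trusts": 0, "the trusts": 115}

# Pick the candidate with the highest count
best = max(counts, key=counts.get)
print(best)  # the trust
```

Real systems use smoothed n-gram probabilities rather than raw phrase counts, but the selection principle is the same.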

 Over-generate and prune
 Automatically acquire the grammar from a corpus (if a phrase-structure grammar is needed)
 Exploit general-purpose tools and resources when possible and appropriate
 ◦ Tokenizers
 ◦ Part-of-speech taggers
 ◦ Parsers, Penn Treebank
 ◦ WordNet, VerbNet

 General strategy:
 ◦ Generate multiple candidate sentences with some permissive strategy
    Some sentences may be very ungrammatical!
    Very many sentences (millions) may be generated
 ◦ Assign scores to the candidate sentences using a corpus-based language model
 ◦ Output the highest-ranking sentence(s)
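The scoring step can be sketched with a toy add-alpha-smoothed bigram model (the tiny corpus and smoothing choice are illustrative assumptions, not what any particular system used):

```python
import math
from collections import Counter

# Toy training corpus for the bigram model
corpus = "i am happy . i am sure . you are happy .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def score(sentence, alpha=0.1):
    """Add-alpha smoothed bigram log-probability of a sentence."""
    words = sentence.lower().split()
    vocab = len(unigrams)
    lp = 0.0
    for w1, w2 in zip(words, words[1:]):
        lp += math.log((bigrams[(w1, w2)] + alpha) /
                       (unigrams[w1] + alpha * vocab))
    return lp

# Overgenerated candidates, ranked by the model
candidates = ["i am happy", "i are happy", "i is happy"]
print(max(candidates, key=score))  # i am happy
```

Even this crude model prefers the grammatical candidate, because "i am" is attested in the corpus while "i are" and "i is" are not.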

 I is not able to betray their trust.
 I cannot betray trust of them.
 I cannot betray the trust of them.
 I am not able to betray their trust.
 I will not be able to betray the trust of them.
 I will not be able to betray their trust.
 I cannot betray their trust.
 I cannot betray trusts of them.
 I are not able to betray their trust.
 I cannot betray a trust of them.

1. I cannot betray their trust.
2. I will not be able to betray their trust.
3. I am not able to betray their trust.
4. I are not able to betray their trust.
5. I is not able to betray their trust.
6. I cannot betray the trust of them.
7. I cannot betray trust of them.
8. I cannot betray a trust of them.
9. I cannot betray trusts of them.
10. I will not be able to betray the trust


 Nitrogen: an early, influential statistical realization system
 ◦ Langkilde & Knight (1998)
 ◦ Hatzivassiloglou & Knight (1995)
 Uses an overgenerate-and-prune strategy

 Input: Abstract Meaning Representation (AMR)
 Based on the Penman Sentence Plan Language (see Kasper 1989, Langkilde & Knight 1998)
 Example AMR: (m1 / |dog<canid|)
 ◦ m1 is an instance of |dog<canid| -- a concept derived from WordNet
 ◦ Might be realized as "the dog", "the dogs", "a dog", "dog", ...
 Another example AMR:
 ◦ (m3 / |eat, take in|
      :agent (m4 / |dog<canid| :quant plural)
      :patient (m5 / |os,bone|))
 ◦ Might be realized as "the dogs ate the bone", "Dogs will eat a bone", "The dogs eat bones", "Dogs eat bone", ...
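The second AMR above can be encoded as a nested Python structure for experimentation (the tuple/dict encoding and the helper are my assumptions, not Nitrogen's internal format):

```python
# (instance-var, concept, {role: child or atomic value})
amr = ("m3", "eat, take in", {
    "agent": ("m4", "dog<canid", {"quant": "plural"}),
    "patient": ("m5", "os,bone", {}),
})

def concepts(node):
    """Collect all concept names in an AMR, depth-first."""
    _, concept, roles = node
    out = [concept]
    for child in roles.values():
        if isinstance(child, tuple):  # skip atomic values like "plural"
            out.extend(concepts(child))
    return out

print(concepts(amr))  # ['eat, take in', 'dog<canid', 'os,bone']
```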

 In practice, overgeneration can produce millions of sentences for a single input
 ◦ So we need very compact representations, or must prune aggressively
 Nitrogen uses a lattice representation
 ◦ A lattice is an acyclic graph where each arc is labeled with a word
 ◦ A complete path from the left-most node to the right-most node through the lattice represents a possible expression/sentence
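A lattice of this kind is easy to represent as a DAG mapping nodes to outgoing word-labeled arcs; enumerating complete paths recovers the sentences it encodes (a toy sketch; the adjacency-dict encoding is my assumption):

```python
def paths(lattice, node, end):
    """Enumerate all word sequences from `node` to `end`.
    `lattice` maps a node to a list of (word, next_node) arcs."""
    if node == end:
        yield []
    for word, nxt in lattice.get(node, []):
        for rest in paths(lattice, nxt, end):
            yield [word] + rest

# Toy lattice: optional determiner, then singular or plural noun
lattice = {
    0: [("the", 1), ("a", 1), ("", 1)],  # "" = empty arc (no determiner)
    1: [("dog", 2), ("dogs", 2)],
}
for p in paths(lattice, 0, 2):
    print(" ".join(w for w in p if w))
```

This tiny lattice already encodes six alternatives ("the dog", "a dogs", "dogs", ...), which is why lattices stay compact where explicit sentence lists explode.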

 Suppose the realizer, looking at an AMR input, is uncertain about definiteness and number. It can generate a lattice fragment (figure not shown) whose paths include:
 ◦ The large Federal deficit fell.
 ◦ A large Federal deficit fell.
 ◦ An large Federal deficit fell.
 ◦ Large Federal deficit fell.
 ◦ A large Federal deficits fell.

 A set of hand-built rules links AMR patterns to lattice fragments
 Each AMR pattern is deliberately mapped to many different realizations (overgeneration)
 A lexicon describes alternative words that can express AMR concepts

 A lexicon of 110,000 entries connects concepts to alternative English words (entry format not shown)
 Important note: no features like transitivity, subcategorization, gradability (for adjectives), or countability (for nouns)
 ◦ This is a substantial advantage for development

 Algorithm sketch: traverse the input AMR bottom-up, building lattices for the leaves (the innermost nested levels of the input) first, then combining them at outer levels according to the relations between the leaves
 (see Langkilde & Knight 1998 for details)
 The result is a large lattice (figure not shown)
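The outer-level combination step can be sketched very coarsely by concatenating alternative realizations of sub-AMRs (a real lattice shares structure rather than enumerating strings; all names here are illustrative):

```python
def concat(frag_a, frag_b):
    """Combine two lattice fragments, here represented naively as
    lists of alternative word strings, by pairwise concatenation."""
    return [a + " " + b for a in frag_a for b in frag_b]

# Fragments built bottom-up for two parts of an AMR (illustrative)
subject = ["the dogs", "dogs"]
verb_object = ["ate the bone", "eat bones"]
print(concat(subject, verb_object))  # 4 alternatives
```

Alternatives multiply at every combination step, which is exactly why explicit enumeration is replaced by a shared lattice in practice.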

The resulting lattice represents 576 different sentences.

 Nitrogen uses a bigram/trigram language model built from 46 million words of Wall Street Journal text from 1987 and
 As we visit each state s, we maintain a list of the most probable sequences of words from the start state to s
 ◦ Extend the word sequences from the predecessors of s, recompute scores, and prune down to the 1000 most probable sequences per state
 At the end state, emit the most probable sequence
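The per-state beam pruning described above can be sketched as follows (a toy model: the node numbering convention, `logp` table, and default beam width are illustrative assumptions, not Nitrogen's actual code):

```python
def beam_decode(lattice, n_nodes, logp, beam=1000):
    """Lattice decoding with per-state beam pruning.

    Nodes 0..n_nodes-1 are assumed topologically ordered; at each
    node we keep the `beam` best (score, sequence) pairs, scored by
    a bigram log-probability function logp(prev_word, word)."""
    best = {0: [(0.0, ("<s>",))]}
    for node in range(n_nodes):
        # prune to the most probable sequences reaching this state
        best[node] = sorted(best.get(node, []), reverse=True)[:beam]
        for word, nxt in lattice.get(node, []):
            for score, seq in best[node]:
                best.setdefault(nxt, []).append(
                    (score + logp(seq[-1], word), seq + (word,)))
    return max(best[n_nodes - 1])[1]  # most probable sequence at the end

# Toy bigram scores favouring "i am" (illustrative numbers)
table = {("<s>", "i"): -1.0, ("i", "am"): -1.0, ("i", "are"): -5.0,
         ("i", "is"): -6.0, ("am", "happy"): -1.0,
         ("are", "happy"): -1.0, ("is", "happy"): -1.0}
logp = lambda w1, w2: table.get((w1, w2), -10.0)

lattice = {0: [("i", 1)], 1: [("am", 2), ("are", 2), ("is", 2)],
           2: [("happy", 3)]}
print(beam_decode(lattice, 4, logp))  # ('<s>', 'i', 'am', 'happy')
```

Because the lattice is acyclic and processed in topological order, every sequence reaching a state is complete before that state is pruned.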

 Do the two approaches handle the same phenomena?
 Could they be integrated?

 Kasper (1989). A flexible interface for linking applications to Penman's sentence generator.
 Hatzivassiloglou & Knight (1995). Unification-based glossing.
 Knight & Hatzivassiloglou (1995). Two-level, many-paths generation.
 Langkilde & Knight (1998). Generation that exploits corpus-based statistical knowledge.
 Langkilde & Knight (1998). The practical value of n-grams in generation.
 Langkilde (2000). Forest-based statistical sentence generation.
 Ratnaparkhi (2000). Trainable methods for surface natural language generation.
 Langkilde-Geary (2002). An empirical verification of coverage and correctness for a general-purpose sentence generator.
 Langkilde-Geary (2002). A foundation for general-purpose natural language generation: sentence realization using probabilistic models of language.
 Oh & Rudnicky (2002). Stochastic natural language generation for spoken dialog systems.