Generative Models of Discourse

Eugene Charniak
Brown Laboratory for Linguistic Information Processing (BLLIP)

Joint Work With

Micha Elsner (PhD student, Brown)
Joseph Austerweil (ex-undergraduate, Brown)

Abstract

Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization, and document generation.

Revised Abstract

We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. Given a document, randomly permute the order of its sentences and then attempt to distinguish the original from the permuted version. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. In this talk we consider the following abstract problem in discourse. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization, and document generation. Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics.

NOTICE! This example is doctored to illustrate the program. You can ask me about the real randomized abstract if you like.
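To make the task concrete, here is a minimal sketch of the discrimination set-up in Python. The overlap_score function is a stand-in of our own devising, not one of the talk's models: it simply counts word overlap between adjacent sentences, exploiting the observation below that nouns tend to repeat.

```python
import random

def overlap_score(document):
    # Toy stand-in for a real model's log-probability: count word
    # overlap between adjacent sentences (repeated nouns suggest
    # coherence). Purely illustrative.
    score = 0
    for prev, cur in zip(document, document[1:]):
        score += len(set(prev.lower().split()) & set(cur.lower().split()))
    return score

def prefers_original(document, score=overlap_score):
    """Shuffle the sentences and ask whether the scorer ranks the
    original ordering above the permuted one."""
    permuted = document[:]
    while permuted == document:   # make sure the order really changed
        random.shuffle(permuted)
    return score(document) > score(permuted)
```

Discrimination accuracy is then just the fraction of test documents for which prefers_original returns True.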

A Note on "Generative"

When we talk about a "generative" model we do NOT mean a model that actually generates language. (If we do mean that, we will say "literally generate.") Rather, "generative" is used in machine learning for a model that assigns probability to the input. So "generate" = "assign a probability to".

Our Three Models

Each of our three models assigns a probability to some aspect of the input (head-nouns, pronouns, and noun-phrase syntax, respectively). The idea is that the probability assigned to the original document should be higher than that assigned to the random one. One advantage of such generative models is that, done correctly, they can be combined by just multiplying their probabilities together. This is, in fact, exactly what we do.

More Formally

We generate each sentence conditioned on the previous sentences. For each sentence we compute three probabilities: head-nouns, pronouns, and NP syntax.
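As a sketch of how this factorization composes, assuming three hypothetical per-sentence scoring functions (the names and stub values are ours, not the talk's):

```python
import math

# Placeholder component models; each would return the probability of
# one aspect of the sentence given the preceding sentences.
def p_head_nouns(sentence, history): return 0.9   # stub value
def p_pronouns(sentence, history):   return 0.9   # stub value
def p_np_syntax(sentence, history):  return 0.9   # stub value

def log_p_document(sentences):
    # log P(doc) = sum over sentences of the logs of the three
    # component probabilities, each conditioned on the history.
    total = 0.0
    for i, sent in enumerate(sentences):
        hist = sentences[:i]
        for model in (p_head_nouns, p_pronouns, p_np_syntax):
            total += math.log(model(sent, hist))
    return total
```

Working in log space, combining the models is just addition, which is why independently built generative components compose so easily.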

Generative Models of Discourse

I.   Introduction
II.  Model 1 – Head Nouns (Entity Grids)
III. Model 2 – Pronominal Reference
IV.  Model 3 – Noun-Phrase Syntax
V.   Real Problems (Future Work)

Nouns Tend to Repeat

Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization, and document generation.

Entity Grids

Following Barzilay, Lapata, and Lee, an entity grid is an array with the "entities" (really just the head nouns) of the document on one axis, the sentence ordering on the other, and at each point the role the entity plays in that sentence. As in previous work we limit the roles to subject (S), object (O), other (X), and not mentioned (-).

A (Partial) Entity Grid

            s1  s2  s3  s4  s5  s6  s7
Discourse    S   X   -   -   -   -   -
Meaning      X   -   -   -   -   -   -
Document     X   -   X   -   X   -   -
Sentences    X   -   X   -   -   -   -
Talk         -   X   -   -   -   -   -
Problem      -   O   -   O   -   -   -
Order        -   -   O   -   -   -   -
Original     -   -   X   -   -   -   -
Version      -   -   X   -   -   -   -
Models       -   -   -   X   -   S   -
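A sketch of how such a grid could be assembled, assuming each sentence has already been reduced to a head-noun-to-role mapping (the parsing and role extraction steps are not shown):

```python
def build_entity_grid(sentences):
    """sentences: one dict per sentence mapping head noun -> role
    ('S', 'O', or 'X'). Returns {noun: list of roles, one per
    sentence}, with '-' where the noun is not mentioned."""
    nouns = {n for sent in sentences for n in sent}
    return {n: [sent.get(n, '-') for sent in sentences] for n in nouns}

# The first two sentences of the abstract, reduced by hand to match
# the grid above:
doc = [{'discourse': 'S', 'meaning': 'X', 'document': 'X', 'sentences': 'X'},
       {'discourse': 'X', 'talk': 'X', 'problem': 'O'}]
print(build_entity_grid(doc)['discourse'])   # ['S', 'X']
```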

The Grid for the Randomized Document

            s1  s2  s3  s4  s5  s6  s7
Discourse    -   -   -   -   X   -   S
Meaning      -   -   -   -   -   -   X
Document     -   X   X   -   -   -   X
Sentences    -   -   X   -   -   -   X
Talk         -   -   -   -   X   -   -
Problem      O   -   -   -   O   -   -
Order        -   -   O   -   -   -   -
Original     -   -   X   -   -   -   -
Version      -   -   X   -   -   -   -
Models       X   -   -   S   -   -   -

The Basic E-grid Probability

For head-noun probabilities we look at each head noun's probability given its two-sentence history, i.e., the roles (S, O, X, -) it filled in the two previous sentences:

P(\text{heads}(s_i)) = \prod_{n \in s_i} P(r_{i,n} \mid r_{i-1,n},\, r_{i-2,n})

where the product is over each noun n in the sentence and r_{i-1,n} is the role n plays in the (i-1)th sentence.
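In code, the scoring loop might look like the following sketch. Here trans_prob is a hypothetical role-transition table that a real system would estimate from training documents, and for simplicity we score every cell of each noun's row rather than only the nouns present in each sentence:

```python
import math
from collections import defaultdict

# Hypothetical transition table P(role | role two back, role one back);
# the values are invented for illustration.
trans_prob = defaultdict(lambda: 0.05)
trans_prob[('-', 'S', 'X')] = 0.4   # e.g. recent subjects tend to recur

def egrid_log_prob(grid):
    """grid: {noun: [role per sentence]} as built above. Scores each
    role given the noun's roles in the two previous sentences,
    padding the start of the history with '-'."""
    total = 0.0
    for roles in grid.values():
        padded = ['-', '-'] + roles
        for i in range(2, len(padded)):
            total += math.log(trans_prob[(padded[i - 2], padded[i - 1],
                                          padded[i])])
    return total
```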

Model 1 Results

Baseline    50%
Model 1     82.2%

Trained on 10,000 automatically parsed documents from the NTC corpus, tested on 1323 other documents from the same corpus.

Generative Models of Discourse

I.   Introduction
II.  Model 1 – Entity Grids
III. Model 2 – Pronominal Reference
IV.  Model 3 – Noun-Phrase Syntax
V.   Real Problems (Future Work)

Can Pronouns Help?

In our abstract the only important pronouns have intra-sentential antecedents. Furthermore, when the document is out of order, there will almost always be something for a pronoun to point back to. As we will see, pronouns are the weakest of our models, but they do help.

Adding Pronouns to the Mix

To handle pronouns we need to consider the various pronoun-resolution possibilities:

P(\text{pron}(s_i)) = \sum_{A} P(\text{pron}(s_i), A),

where A ranges over the possible antecedent assignments. Unfortunately this sum is intractable, so we approximate it with

\max_{A} P(\text{pron}(s_i), A).

This is reasonable because most documents have only one set of reference assignments that makes sense.
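If each pronoun is scored independently given its antecedent, the max over whole assignments factorizes into a per-pronoun max. A sketch under that assumption (the independence assumption and the function names are ours, not the talk's):

```python
def max_assignment_prob(pronouns, candidates, p_pair):
    """Approximate the intractable sum over antecedent assignments by
    the single best assignment.
    pronouns:   list of pronoun tokens
    candidates: {pronoun: [possible antecedents]}
    p_pair(a, p): probability of antecedent a together with pronoun p
    """
    prob = 1.0
    for p in pronouns:
        # Best antecedent for each pronoun, chosen independently.
        prob *= max(p_pair(a, p) for a in candidates[p])
    return prob
```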

The Probability of an Antecedent and the Pronoun Given the Antecedent

P(a, p) = P(a) \cdot P(\text{gender}(p) \mid a) \cdot P(\text{number}(p) \mid a)

- P(a): the probability that the antecedent is a, given how far away a is and how often it has been mentioned.
- P(gender(p) | a): the probability of the pronoun's gender given the antecedent.
- P(number(p) | a): the probability of the pronoun's number given the antecedent.

Example Pronoun Probabilities

P(ref = x | x is 1 back and appeared 1 time) = 0.25
P(ref = x | x is 1 back and appeared > 4 times) = 0.86
P("asbestos" is neuter) = …
P("alice" is feminine) = 0.84
P("it" has a plural antecedent) = 0.04
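Plugging the example numbers above into the factorization from the previous slide gives a sketch like this; the 0.95 singular probability for "she" is a made-up value for illustration, not from the talk:

```python
def p_antecedent_and_pronoun(p_a, p_gender, p_number):
    # P(a, p) = P(a) * P(gender(p) | a) * P(number(p) | a)
    return p_a * p_gender * p_number

# "Alice ... she": antecedent one sentence back, mentioned once (0.25),
# P("alice" is feminine) = 0.84, and an assumed singular probability.
print(p_antecedent_and_pronoun(0.25, 0.84, 0.95))   # ~0.1995
```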

Model 2 Results

Model 1      82.2%
Model 2      71.3%
Model 1+2    …

Pronoun Reference vs. Discourse Modeling

                              Best Gender Model   Weak Gender Model
Model 2 Discourse             71.3%               66.7%
Pronoun Reference Accuracy    79.1%               75.5%

Generative Models of Discourse

I.   Introduction
II.  Model 1 – Entity Grids
III. Model 2 – Pronominal Reference
IV.  Model 3 – Noun-Phrase Syntax
V.   Real Problems (Future Work)

Abstract

Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization, and document generation.

Distinctions Between First and Non-First Mentions

The first mention of an entity tends to:
- have more deeply embedded syntax,
- be longer at every level of embedding,
- use the determiner "a" more often,
- use certain key words more or less often; e.g., most newspapers seem to follow the convention that "John Doe" will later be referred to as "Mr. Doe".

Using This Information

We assume that the first time a particular head noun occurs is the first mention, and all subsequent uses are non-first. We have a generative model of the noun-phrase syntax/key words that should pick out the correct ordering.
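A sketch of that first-mention assumption, with a hypothetical input format (each sentence reduced to its head nouns):

```python
def label_mentions(sentence_head_nouns):
    """Label each head-noun occurrence 'first' or 'nonfirst', taking
    the first occurrence of a head noun as the first mention."""
    seen = set()
    labels = []
    for sent in sentence_head_nouns:
        sent_labels = []
        for noun in sent:
            sent_labels.append((noun,
                                'first' if noun not in seen else 'nonfirst'))
            seen.add(noun)
        labels.append(sent_labels)
    return labels

doc = [['document', 'sentences'], ['document', 'problem']]
print(label_mentions(doc))
# [[('document', 'first'), ('sentences', 'first')],
#  [('document', 'nonfirst'), ('problem', 'first')]]
```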

Generative NP Syntax

The model generates the height and the symbols of a noun phrase conditioned on its mention label, via terms P(h | l) and P(s | previous symbol, h, l), where:
- l ∈ {first, nonfirst},
- h = height in the NP's tree (the probability of larger h will be higher for l = first),
- s is either a non-terminal or a key word.

A Simple Example

"the document":

      NP
     /  \
   DET  NOUN
   the  document

P(h = 1 | l) is high for l = nonfirst
P(the | start, h = 1, l) is high for l = nonfirst
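The example can be scored with a sketch like the following; the probability tables are invented to reflect the slide's claims (low height and an initial "the" favour l = nonfirst), not the talk's trained values:

```python
import math

# Invented conditional probabilities for the two labels.
P_HEIGHT = {(1, 'first'): 0.2, (1, 'nonfirst'): 0.6}
P_WORD   = {('the', 'start', 1, 'first'): 0.1,
            ('the', 'start', 1, 'nonfirst'): 0.5}

def np_log_prob(height, word_events, label):
    """Score one NP: P(h | l) times P(word | previous, h, l) for each
    generated word (only the determiner is modelled in this sketch)."""
    lp = math.log(P_HEIGHT[(height, label)])
    for word, prev, h in word_events:
        lp += math.log(P_WORD[(word, prev, h, label)])
    return lp

# "the document" as NP -> DET NOUN, height 1:
for label in ('first', 'nonfirst'):
    print(label, np_log_prob(1, [('the', 'start', 1)], label))
# nonfirst scores higher, as the slide predicts.
```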

Model 3 Results

Model 1        82.2%
Model 2        71.3%
Model 3        86.2%
Model 1+2+3    …
Model 1+3      89.1%

Generative Models of Discourse

I.   Introduction
II.  Model 1 – Entity Grids
III. Model 2 – Pronominal Reference
IV.  Model 3 – Noun-Phrase Syntax
V.   Real Problems (Future Work)

Future Models

Next week: probabilistic choice of pronoun vs. full NP.
Next month: inserting quotations, which (almost) never appear in the first sentence and are usually clustered together.
Next year: temporal relations between sentences, relations between verbs, different kinds of descriptions.

Real Problems

Given an abstract representation of what we know about the entities in a document, (really) generate the words for those entities.

Given the sentences of two documents, and the first sentence of one of them, pick out the rest of the sentences of that document.

The same, but with 10 documents on (roughly) the same topic.