
1 Generative Models of Discourse Eugene Charniak Brown Laboratory for Linguistic Information Processing (BLLIP)

2 Joint Work With Micha Elsner (PhD student, Brown) Joseph Osterwile (Ex Undergraduate, Brown)

3 Abstract Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.
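The permute-and-distinguish task from the abstract can be sketched in a few lines of Python. The scoring models come later in the talk; this only shows the evaluation setup itself (sentence strings and the seed are invented for illustration):

```python
import random

# A sketch of the evaluation task: permute a document's sentences and
# keep the permuted copy for a model to compare against the original.
def make_permuted(sentences, seed=0):
    rng = random.Random(seed)
    permuted = sentences[:]
    while permuted == sentences:  # make sure the copy actually differs
        rng.shuffle(permuted)
    return permuted

doc = ["s1", "s2", "s3", "s4"]
scrambled = make_permuted(doc)
assert sorted(scrambled) == sorted(doc)  # same sentences...
assert scrambled != doc                  # ...different order
```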

4 Revised Abstract We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. Given a document, randomly permute the order of its sentences and then attempt to distinguish the original from the permuted version. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. In this talk we consider the following abstract problem in discourse. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation. Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. NOTICE! This example is doctored to illustrate the program. You can ask me about the real randomized abstract if you like.

5 A Note on “Generative” When we talk about a “generative” model we do NOT mean a model that actually generates language. (If we do mean that, we will say “literally generate.”) Rather, “generative” is used in machine learning for a model that assigns probability to the input. So “generate” = “assign a probability to”.

6 Our Three Models So each of our three models assigns a probability to some aspect of the input (head-nouns, pronouns, and noun-phrase syntax, respectively). The idea is that the probability assigned to the original document should be higher than that assigned to the random one. One advantage of such generative models is that if done correctly, they can be combined by just multiplying their probabilities together. This is, in fact, exactly what we do.

7 More Formally We generate each sentence conditioned on the previous sentences. For each sentence we compute three probabilities: head-noun, pronoun, and NP-syntax.
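The combination step described above amounts to summing log-probabilities across sentences and models. A minimal sketch, with all log-probability values invented purely for illustration (real values would come from the trained models in this talk):

```python
# Combine per-sentence, per-model log-probabilities by summing
# (i.e., multiplying the underlying probabilities).
def document_logprob(sentence_scores):
    return sum(sum(models) for models in sentence_scores)

# Each inner tuple: (log P_headnoun, log P_pronoun, log P_npsyntax)
# for one sentence. These numbers are made up for illustration.
original = [(-2.1, -0.5, -1.0), (-1.8, -0.4, -0.9)]
permuted = [(-3.5, -0.7, -1.4), (-2.9, -0.6, -1.2)]

# The version with the higher total log-probability is judged original.
assert document_logprob(original) > document_logprob(permuted)
```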

8 Generative Models of Discourse
I   Introduction
II  Model 1 – Head Nouns (Entity Grids)
III Model 2 – Pronominal Reference
IV  Model 3 – Noun-Phrase Syntax
V   Real Problems (Future Work)

9 Nouns Tend to Repeat Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.

10 Entity Grids Following Barzilay, Lapata, and Lee, an entity grid is an array with the “entities” (really just the head nouns) of the document on one axis, the sentence ordering on the other, and at each point the role the entity plays in that sentence. As in previous work we limit the roles to subject (S), object (O), other (X), and not mentioned (-).
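Building such a grid is straightforward once each sentence's head nouns and their roles are known. A minimal sketch, assuming a parser has already supplied the noun/role pairs (the example sentences below are toy data drawn from the abstract):

```python
# Build an entity grid: rows are entities (head nouns), columns are
# sentences, cells are roles S, O, X, or '-' (not mentioned).
def build_entity_grid(sentences):
    entities = []
    for sent in sentences:
        for noun in sent:
            if noun not in entities:
                entities.append(noun)
    grid = {e: ['-'] * len(sentences) for e in entities}
    for i, sent in enumerate(sentences):
        for noun, role in sent.items():
            grid[noun][i] = role
    return grid

# Toy input: each sentence maps its head nouns to their roles.
sentences = [
    {'discourse': 'S', 'meaning': 'X', 'document': 'X'},
    {'talk': 'X', 'problem': 'O', 'discourse': 'X'},
    {'document': 'X', 'order': 'O', 'original': 'X'},
]
grid = build_entity_grid(sentences)
print(grid['discourse'])  # ['S', 'X', '-']
print(grid['order'])      # ['-', '-', 'O']
```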

11 A (Partial) Entity Grid

            1 2 3 4 5 6 7
Discourse   S X - - - - -
Meaning     X - - - - - -
Document    X - X - X - -
Sentences   X - X - - - -
Talk        - X - - - - -
Problem     - O - O - - -
Order       - - O - - - -
Original    - - X - - - -
Version     - - X - - - -
Models      - - - X - S -

12 The Grid for the Randomized Document

            1 2 3 4 5 6 7
Discourse   - - - - X - S
Meaning     - - - - - - X
Document    - X X - - - X
Sentences   - - X - - - X
Talk        - - - - X - -
Problem     O - - - O - -
Order       - - O - - - -
Original    - - X - - - -
Version     - - X - - - -
Models      X - - S - - -

13 The Basic E-grid Probability For head-noun probabilities we look at each head noun's probability given its two-sentence history (the roles, drawn from {S, O, X, -}, that it filled in the two previous sentences):

P(S_i) = Π_{n ∈ S_i} P(r_{i,n} | r_{i-1,n}, r_{i-2,n})

where n ranges over the nouns in the sentence and r_{i-1,n} is the role n plays in the (i-1)th sentence.
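Estimating these role-transition probabilities reduces to counting trigrams over each entity's role row. A toy sketch (the role rows below are invented; real counts come from the training corpus):

```python
from collections import Counter

# Estimate P(r_i | r_{i-2}, r_{i-1}) by relative frequency over
# entity role rows, padding with '-' at the start of each document.
def transition_probs(role_sequences):
    trigrams = Counter()
    bigrams = Counter()
    for seq in role_sequences:
        padded = ['-', '-'] + seq
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigrams[history + (padded[i],)] += 1
            bigrams[history] += 1
    return {t: trigrams[t] / bigrams[t[:2]] for t in trigrams}

# Toy role rows, one per entity (as read off an entity grid).
rows = [['S', 'X', '-', '-'], ['S', 'X', 'X', '-'], ['X', '-', '-', '-']]
probs = transition_probs(rows)
print(probs[('S', 'X', 'X')])  # fraction of (S, X) histories followed by X
```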

14 Model 1 Results

Baseline   50%
Model 1    82.2%

Trained on 10,000 automatically parsed documents from the NTC corpus, tested on 1323 other documents from the same corpus.

15 Generative Models of Discourse
I   Introduction
II  Model 1 – Entity Grids
III Model 2 – Pronominal Reference
IV  Model 3 – Noun-Phrase Syntax
V   Real Problems (Future Work)

16 Can Pronouns Help? In our abstract the only important pronouns have intra-sentential antecedents. Furthermore, when the document is out of order, there will almost always still be something for the pronoun to point back to. As we will see, pronouns are the weakest of our models, but they do help.

17 Adding Pronouns to the Mix To handle pronouns we need to sum over the various pronoun-resolution possibilities:

P(pronouns) = Σ_a P(a) P(pronouns | a)

Unfortunately this sum is intractable, so we approximate it with the single best assignment:

P(pronouns) ≈ max_a P(a) P(pronouns | a)

This is reasonable because most documents have only one set of reference assignments that makes sense.
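The max approximation above can be illustrated in a few lines. All probabilities here are invented; the point is only that when one antecedent dominates (as the slide argues it usually does), the max is close to the full sum:

```python
# Score a pronoun under the exact sum and the max approximation.
# candidates: list of (P(antecedent), P(pronoun | antecedent)) pairs.
def pronoun_score(candidates):
    exact = sum(pa * pp for pa, pp in candidates)   # intractable in general
    approx = max(pa * pp for pa, pp in candidates)  # the approximation used
    return exact, approx

# One plausible antecedent dominates, so max is close to the sum.
candidates = [(0.7, 0.9), (0.2, 0.05), (0.1, 0.01)]
exact, approx = pronoun_score(candidates)
print(round(exact, 3), round(approx, 3))  # 0.641 0.63
```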

18 The Probability of an Antecedent, and of the Pronoun Given the Antecedent
- Probability that the antecedent is a, given how far away a is and how often it has been mentioned.
- Probability of the pronoun's gender given the antecedent.
- Probability of the pronoun's number given the antecedent.

19 Example Pronoun Probabilities
P(ref = x | x is 1 back and appeared 1 time) = 0.25
P(ref = x | x is 1 back and appeared > 4 times) = 0.86
P(“asbestos” is neuter) = 0.998
P(“alice” is feminine) = 0.84
P(“it” has a plural antecedent) = 0.04

20 Model 2 Results

Model 1     82.2%
Model 2     71.3%
Model 1+2   85.3%

21 Pronoun Reference vs. Discourse Modeling

                             Best Gender Model   Weak Gender Model
Model 2 Discourse            71.3%               66.7%
Pronoun Reference Accuracy   79.1%               75.5%

22 Generative Models of Discourse
I   Introduction
II  Model 1 – Entity Grids
III Model 2 – Pronominal Reference
IV  Model 3 – Noun-Phrase Syntax
V   Real Problems (Future Work)

23 Abstract Discourse, the study of how the meaning of a document is built out of the meanings of its sentences, is the inter-sentential analogue of semantics. In this talk we consider the following abstract problem in discourse. Given a document, randomly permute the order of the sentences and then attempt to distinguish the original from the permuted version. We present a sequence of generative models that can handle the problem with increasing accuracy. Each model accounts for some aspect of the document, and assigns a probability to the document's contents. In the standard generative way the subsequent models simply multiply individual probabilities to get their results. We also discuss the linkage of this abstract task to more realistic ones such as essay grading, document summarization and document generation.

24 Distinctions Between First and Non-First Mentions
- The first mention of an entity tends to have more deeply embedded syntax,
- it is longer at every level of embedding,
- it uses the determiner “a” more often,
- and it uses certain key words more or less often. E.g., most newspapers seem to follow the convention that “John Doe” is followed by “Mr. Doe” thereafter.

25 Using This Information We assume that the first time a particular head noun occurs is the first mention, and all subsequent uses are non-first. We have a generative model of the noun-phrase syntax/key words that should pick out the correct ordering.

26 Generative NP Syntax l ∈ {first, nonfirst}; h = height of the NP tree. The probability of a larger h will be higher for l = first. Each symbol s (a non-terminal or key word) is generated conditioned on its context, h, and l:

P(s | previous symbol, h, l)
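A toy version of scoring an NP under the two mention labels can be sketched as follows. The conditional tables are entirely invented for illustration; the real model is trained from parsed text:

```python
import math

# Invented toy conditionals: P(height | label) and P(word | label).
# They encode the tendencies from the slides: first mentions favor
# taller NPs and "a"; later mentions favor flat NPs and "the".
P_height = {('first', 2): 0.6, ('first', 1): 0.4,
            ('nonfirst', 2): 0.2, ('nonfirst', 1): 0.8}
P_word = {('the', 'first'): 0.2, ('the', 'nonfirst'): 0.6,
          ('a', 'first'): 0.5, ('a', 'nonfirst'): 0.1}

def np_logprob(height, determiner, label):
    return (math.log(P_height[(label, height)]) +
            math.log(P_word[(determiner, label)]))

# "the document" (height 1, determiner "the") scores higher as a
# non-first mention, matching the example on the next slide.
assert np_logprob(1, 'the', 'nonfirst') > np_logprob(1, 'the', 'first')
```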

27 A Simple Example

    NP  (h = 1)
   /  \
 DET   NOUN
 the   document

P(h=1 | l) is high for l = nonfirst
P(the | start, h=1, l) is high for l = nonfirst

28 Model 3 Results

Model 1       82.2%
Model 1+2     85.3%
Model 3       86.2%
Model 1+2+3   90.3%
Model 1+3     89.1%

29 Generative Models of Discourse
I   Introduction
II  Model 1 – Entity Grids
III Model 2 – Pronominal Reference
IV  Model 3 – Noun-Phrase Syntax
V   Real Problems (Future Work)

30 Future Models
- Next week: probabilistic choice of pronoun vs. full NP.
- Next month: insert quotations. (Almost) never in the first sentence; usually clustered together.
- Next year: temporal relations between sentences, relations between verbs, different kinds of descriptions.

31 Real Problems
- Given an abstract representation of what we know about the entities in a document, (really) generate the words for those entities.
- Given the sentences of two documents, and the first sentence of one of them, pick out the rest of the sentences of that document.
- The same, but with 10 documents on (roughly) the same topic.
