Unsupervised Learning of Narrative Schemas and their Participants

Slides:

Advertisements

Similar presentations

A Database of Nate Chambers and Dan Jurafsky Stanford University Narrative Schemas.

Advertisements

Helpful Hints to Conduct and Write a Literature Review October 2006.

Computational Lexical Semantics Lecture 9: Learning Narrative Frames Linguistic Institute 2005 University of Chicago.

Learning Narrative Schemas Nate Chambers, Dan Jurafsky Stanford University IBM Watson Research Center Visit.

Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.

CS 4705 Lecture 19 Word Sense Disambiguation. Overview Selectional restriction based approaches Robust techniques –Machine Learning Supervised Unsupervised.

Erasmus University Rotterdam Introduction Content-based news recommendation is traditionally performed using the cosine similarity and TF-IDF weighting.

Minimally Supervised Event Causality Identification Quang Do, Yee Seng, and Dan Roth University of Illinois at Urbana-Champaign 1 EMNLP-2011.

1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:

Finding document topics for improving topic segmentation Source: ACL2007 Authors: Olivier Ferret (18 route du Panorama, BP6) Reporter:Yong-Xiang Chen.

Discovering Relations among Named Entities from Large Corpora Takaaki Hasegawa *, Satoshi Sekine 1, Ralph Grishman 1 ACL 2004 * Cyberspace Laboratories.

A Database of Narrative Schemas A 2010 paper by Nathaniel Chambers and Dan Jurafsky Presentation by Julia Kelly.

Relation Extraction (RE) via Supervised Classification See: Jurafsky & Martin SLP book, Chapter 22 Exploring Various Knowledge in Relation Extraction.

SAMPLING Purposes Representativeness “Sampling error”

An Effective Statistical Approach to Blog Post Opinion Retrieval Ben He, Craig Macdonald, Jiyin He, Iadh Ounis (CIKM 2008)

Trends in NL Analysis Jim Critz University of New York in Prague EurOpen.CZ 12 December 2008.

WP4 Models and Contents Quality Assessment

Topic Modeling for Short Texts with Auxiliary Word Embeddings

Automatically Labeled Data Generation for Large Scale Event Extraction

Dependent-Samples t-Test

REPORT WRITING.

Chapter 16: Sample Size “See what kind of love the Father has given to us, that we should be called children of God; and so we are. The reason why the.

Working with Scholarly Articles

Statistical NLP: Lecture 7

How to Write a Constructed and Visual Response for the EOC

A Brief Introduction to Distant Supervision

MENTOR TEXTS QUESTIONS

Web News Sentence Searching Using Linguistic Graph Similarity

Developing Student Writing

SAMPLING Purposes Representativeness “Sampling error”

Testing Hypotheses About Proportions

SAT Notes: Please get out your notebook and turn to the writing section. We are taking notes today.

Content analysis, thematic analysis and grounded theory

By Dan Roth and Wen-tau Yih PowerPoint by: Reno Kriz CIS

Software Documentation

Schema Theory (Paper – 1.1.4:Unit – 5)

Thesis writing Session 2017

Two Discourse Driven Language Models for Semantics

Statistical NLP: Lecture 13

Transitions between paragraphs Conclusions Consultation Time

CSC 594 Topics in AI – Natural Language Processing

Observations on assignment 3 - Reviews

Inferential statistics,

Compare and contrast essay

CHAPTER 1 – OBSERVATION SKILLS

Social Knowledge Mining

AP World History Riverside High School Mr. Sakole

Writing the Document Based Question (DBQ) Essay

Reading and Writing about Literature

Number and String Operations

Statistical NLP: Lecture 9

LANGUAGE TEACHING MODELS

Video 6: Practice Papers for

Rhetorical Forms Definition Comparison-Contrast Classification

Thesis & Evidence Guidelines

Scholastic Aptitude Test Developing Critical Reading Skills

Testing Hypotheses About Proportions

Personal Narratives Notes.

INF 141: Information Retrieval

Unit 2: Research Lesson 04 and 05

How to Write a Summary Text Read Annotate Write

Thesis & Evidence Guidelines

Rachit Saluja 03/20/2019 Relation Extraction with Matrix Factorization and Universal Schemas Sebastian Riedel, Limin Yao, Andrew.

The Winograd Schema Challenge Hector J. Levesque AAAI, 2011

Information Retrieval

Meta-analysis, systematic reviews and research syntheses

Statistical NLP : Lecture 9 Word Sense Disambiguation

Statistical NLP: Lecture 10

Professor Junghoo “John” Cho UCLA

Presentation transcript:

Halley Young (halleyy@seas.upenn.edu) 02/18/2019 Unsupervised Learning of Narrative Schemas and their Participants Nathanael Chambers and Dan Jurafsky ACL, 2009 Halley Young (halleyy@seas.upenn.edu) 02/18/2019

Problem & Motivation Problem: Discover narrative schemas in an unsupervised manner Narrative schemas are partial orderings of a set of events which all share one of a finite, small set of arguments, or roles Text 1: “The police searched the suspect before they arrested him. He plead innocent and the jury acquitted him” … Text N: “The cops searched the suspect. He plead guilty, and the jury convicted him and sentenced him to 3 years in prison.”

Why do we care? Knowing such schema can help with coreference resolution and sense disambiguation recent (2018) paper uses for improving scores on Winograd schema Generative models that produce entire narratives should know to generate temporally and thematically appropriate narratives It’s kind of cool to know that a linguistic hypothesis from the 1950s- 70s (Minsky, Filmore) about how meaning arises can be discovered statistically Can be used for summarization (if P always follows Q, maybe only need to say Q) Understanding steps to a process that a robot might carry out

Contents: Previous approaches to this paper This paper’s approach to this problem Results from this paper Approaches since this paper has been published Conclusions and final questions

Previous approaches prior to this paper Mostly hand-crafted “scripts” or “frames” (slightly different formulations of a very similar KR problem) In previous paper by same author, used unsupervised approach, but didn’t learn roles or have event slots with both participants

Contributions of this work Unsupervised learning of what was before generally an exhausting manual task Improvements in accuracy and expressivity over previous methods for creating narrative schema which contained only partial schema (i.e., some subject/object positions unfilled) and did not provide role types

How does it work? Jointly learning chains of events (verbs + argument slot) and roles that fit in argument slots Assume existence of coreference disambiguation, then try to maximize the similarity (by frequency of co-occurrence) of events and roles connected in a narrative schema

How does it work? For a given tuple (f :: verb, g :: argument type), and for an existing chain of (tuple, argument pairs), the score , where similarity is defined by point wise mutual information (proportional to probability of being seen together) chainsim({(arrest, object), (plead, subject)}, (convict, object)) should be high, since in many texts the object of arrest, subject of plead, and object of convict appear together

How does it work? Specific to a given argument, (again measuring frequency of co-occurence, but this time with a given role value instead of simply argument slots. Then chainsim’(C, <f,g>) = Essentially, a verb-event slot fits a narrative schema if there is an actual role type that it is associated with that it shares between many elements of the schema This means that, even though there aren’t a lot of instances of {(arrest, object), (plead, subject), (convict, object)} where the role is played by the police, because there exists a role (the suspect) for which there is a high co-occurrence, (convict, object) fits well with the existing narrative chain

How does it work? Define another function to maximize: N = a set of events and mappings from events to argument Notice that we’re maximizing for a given verb over all possible argument structures β is a hyperparameter that controls whether to include a verb in a narrative schema or start a new one As seen before, (convict, object) fits well into the chain containing {(arrest, object), (plead, subject)}, but if we evaluate “chop” before creating any chains relating to cooking, it makes sense to start a new narrative schema

Example Schema Found

Results and Analysis Compare results of schema generation on a New York Times dataset to the hand-crafted FrameNet schemas Able to easily map 13/20 narrative schemas to FrameNet frames Unclear if the other 7 were poorly chosen or a sign that FrameNet is not wide-coverage 78 total verbs, of which 52 occurred in FrameNet 67% of the 52 were grouped similarly in FrameNet 156 argument slots, out of which 96.8% were grouped correctly relative to the other verbs sharing that slot

Results and Analysis - Cloze Test Task: given human defined narrative chain with one event slot missing, rank position of that event slot on the list of potential new event slots to add to the chain 10.1% improvement using the argument roles

Work since then “NASTEA: Investigating Narrative Schemas through Annotated Entities” by Simonson and Davis, 2016 use document categories and topic modeling Authors of this paper (Jurafsky and Chambers, 2011) recently wrote a new paper which gives richer roles than subject/object (e.g., an “embassy” is the “target” of a “bombing”)

Conclusions Tasks that before we assumed required human annotation may be possible to learn in an unsupervised fashion Joint learning of different types of variables can improve accuracy over creating two distinct learning tasks There are a fairly large number of sets of events with roles which reoccur together with high frequency, given a certain (newspaper) corpus

Questions to think about Moving past 2009 - word vectors? LSTMs? Whatever you think about whether FrameNet has been a success or not, would you assume that a set of frames based solely on verbs that actually co-occur with similar arguments often in practice would be more or less useful? Arguments for more usefulness: relate to the texts the machine will actually see, has the possibility of scaling beyond human annotation Arguments for less usefulness: Consider the frame “Activity_ready_state”

(FrameNet Example) Could this approach be extended to take “to do a u-turn on passport ID cards” as an argument?” Notice the difference between “for the bathtub” and “for him”

Questions? Thank you!