Introduction to “Event Extraction” Jan 18, 2007

What is “Information Extraction” Filling slots in a database from sub-segments of text. As a task:

October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…

IE fills the slots:

NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Soft..
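As an aside, the slot-filling view can be illustrated with a toy, pattern-based extractor. This is my own sketch, not a method from the lecture; the two patterns and the small title list are assumptions made for illustration:

```python
import re

# A toy, hypothetical slot filler: two hand-written patterns that pair a
# person's NAME with a TITLE and an ORGANIZATION, as in the table above.
TITLES = r"(CEO|VP|founder)"
NAME = r"([A-Z][a-z]+ [A-Z][a-z]+)"                 # two capitalized words
ORG = r"([A-Z][\w.]*(?: [A-Z][\w.]*)*)"             # run of capitalized words

def fill_slots(text):
    rows = []
    # Pattern 1: "<Org> <Title> <Name>", e.g. "Microsoft Corporation CEO Bill Gates"
    for m in re.finditer(ORG + " " + TITLES + " " + NAME, text):
        rows.append((m.group(3), m.group(2), m.group(1)))
    # Pattern 2: "<Name>, <title> of (the) <Org>"
    for m in re.finditer(NAME + ", " + TITLES + r" of (?:the )?" + ORG, text):
        rows.append((m.group(1), m.group(2), m.group(3)))
    return rows
```

Real IE systems replace such brittle patterns with the learned extractors discussed in the lecture; the sketch only shows what "filling slots from sub-segments of text" means operationally.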

What is “Information Extraction” As a family of techniques: Information Extraction = segmentation + classification + association + clustering. Also known as “named entity extraction”. Applied to the same story, segmentation and classification first pick out the entity mentions:

Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation

Association and clustering then group the mentions into records:

NAME              TITLE    ORGANIZATION
Bill Gates        CEO      Microsoft
Bill Veghte       VP       Microsoft
Richard Stallman  founder  Free Soft..

Some things to think about We’ve seen sliding windows, non-sequential token tagging, and sequential token tagging.
–Which of these are likely to work best, and when?
–Are there other ways to formulate NER as a learning task?
–Is there a benefit from using more complex graphical models? What potentially useful information does a linear-chain CRF not capture?
–Can you combine sliding windows with a sequential model?
Next lecture will survey IE of sets of related entities (e.g., a person and his/her affiliation).
–How can you formalize that as a learning task?
–Some case studies…

ACE: Automatic Content Extraction A case study, or: yet another NIST bake-off

About ACE The five-year mission: “develop technology to extract and characterize meaning in human language”… in newswire text, speech, and images.
–EDT (Entity Detection and Tracking): develop NER for people, organizations, geo-political entities (GPE), locations, facilities, vehicles, weapons, times, values … plus subtypes (e.g., educational organizations)
–RDC (Relation Detection and Characterization): identify relations between entities: located, near, part-whole, membership, citizenship, …
–EDC (Event Detection and Characterization): identify events like interaction, movement, transfer, creation, destruction, and their arguments

… and their arguments (entities)

Events, entities and mentions In ACE there is a distinction between an entity—a thing that exists in the Real World—and an entity mention—something that exists in the text (a substring). Likewise, an event is something that (will, might, or did) happen in the Real World, and an event mention is some text that refers to that event.
–An event mention lives inside a sentence (the “extent”) with a “trigger” (or anchor)
–An event mention is defined by its type and subtype (e.g., Life:Marry, Transaction:TransferMoney) and its arguments
–Every argument is an entity mention that has been assigned a role.
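These distinctions are easy to write down as data structures. The sketch below is my own rendering, not ACE's annotation format, and the field names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class EntityMention:
    """A substring of the text referring to a real-world entity."""
    text: str
    entity_id: str        # which real-world entity the mention refers to

@dataclass
class Argument:
    """An entity mention that has been assigned a role in an event."""
    mention: EntityMention
    role: str             # e.g. "Recipient"

@dataclass
class EventMention:
    """Text that refers to a real-world event."""
    type: str             # e.g. "Transaction"
    subtype: str          # e.g. "TransferMoney"
    extent: str           # the sentence containing the mention
    trigger: str          # the anchor word
    arguments: list = field(default_factory=list)
```

Under this scheme, coreference (discussed below) is the question of which event mentions share the same underlying real-world event.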

Events, entities and mentions
–An event mention lives inside a sentence (the “extent”) with a “trigger” (or anchor)
–An event mention is defined by its type and subtype (e.g., Life:Marry, Transaction:TransferMoney) and its arguments
–Every argument is an entity mention that has been assigned a role.
Example text: ITHACA, N.Y. -- The John D. and Catherine T. MacArthur Foundation today (Sept. 20) named Jon Kleinberg, Cornell professor of computer science, among the 25 new MacArthur Fellows -- the so-called "Genius Awards" -- for … Dr. Kleinberg will receive $500,000 in no-strings-attached support over the next five years.
The event, paraphrased: Jon Kleinberg will receive $500,000 from the MacArthur Foundation over the next five years.

How to find events? The simple approach in Ahn, “Stages of event extraction”: Find all the entities and sentences.
–Ahn uses ground-truth labels for entities (an ACE thing)
–Entities = candidate event arguments; sentences = candidate event extents; these will be classified and paired up
Find the event mentions:
–Classify words as anchors (for event type T:S) or not: 35 classes, mostly None
–Classify (event-mention, entity-mention) pairs as arguments (with role R) or not: 36 classes, mostly None. Q: Why not just classify entity mentions by role alone?
–Classify event mentions by modality, polarity, …
–Classify (event-mention_i, event-mention_j) pairs as co-referent or not.
Treat all of these tasks as separate classification problems. POS-tag and parse everything, convert parse trees to dependency relations, and use all of these as features.
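Structurally, the first two stages look like the sketch below. The lookup tables stand in for Ahn's learned MaxEnt/TIMBL classifiers and their parse-derived features; the anchor lexicon and role table are invented for illustration:

```python
# Toy staged event extraction in the spirit of Ahn: stage 1 classifies words
# as anchors (event type:subtype, or None); stage 2 classifies
# (event mention, entity mention) pairs as arguments (role, or None).

ANCHOR_TYPE = {"married": "Life:Marry", "paid": "Transaction:TransferMoney"}

ARG_ROLE = {  # (event type, coarse entity type) -> role
    ("Life:Marry", "PER"): "Person",
    ("Transaction:TransferMoney", "PER"): "Recipient",
    ("Transaction:TransferMoney", "MONEY"): "Money",
}

def extract_events(tokens, entities):
    """entities: list of (surface string, coarse type) pairs in the sentence."""
    events = []
    for tok in tokens:                      # stage 1: anchor classification
        etype = ANCHOR_TYPE.get(tok.lower())
        if etype is None:
            continue
        args = []
        for surface, coarse in entities:    # stage 2: argument classification
            role = ARG_ROLE.get((etype, coarse))
            if role is not None:
                args.append((surface, role))
        events.append({"type": etype, "anchor": tok, "args": args})
    return events
```

Because each stage is a separate classifier, errors in anchor detection propagate directly into argument classification, a point the lecture returns to at the end.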

Event Anchors - Features

Event Anchors – Results (MaxEnt and TIMBL)

Argument identification: a function of both the anchor and the entity mention.

Event co-reference: …using greedy left-to-right clustering, where you repeatedly decide “should I link the new event mention M_new with a previous mention M_1, M_2, …?” based on Pr(M_new co-referent with M_j).
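That greedy procedure is straightforward to write down. In this sketch the learned pairwise model Pr(coreferent) is replaced by a hypothetical scorer; Ahn learned one from anchor and argument overlap features:

```python
def greedy_coref(mentions, pair_prob, threshold=0.5):
    """Greedy left-to-right clustering: each new mention joins the cluster of
    the best-scoring earlier mention if the score passes the threshold,
    otherwise it starts a new cluster."""
    clusters = []                      # each cluster is a list of mention indices
    for j in range(len(mentions)):
        best, best_p = None, threshold
        for c in clusters:
            for i in c:                # compare against every previous mention
                p = pair_prob(mentions[i], mentions[j])
                if p > best_p:
                    best, best_p = c, p
        if best is not None:
            best.append(j)
        else:
            clusters.append([j])
    return clusters

def same_type_prob(m1, m2):
    """Toy stand-in scorer: mentions of the same event type look coreferent."""
    return 0.9 if m1["type"] == m2["type"] else 0.1
```

Note the design choice hidden in the loop: linking decisions are made once, left to right, and never revisited, which is cheap but lets early mistakes cascade.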

Ahn: The punchline

The Webmaster Project: Yet Another Case Study with Einat Minkov (LTI), Anthony Tomasic (ISRI) See IJCAI-2005 paper

Overview and Motivations What’s new:
–Adaptive NLP components
–Learn to adapt to changes in the domain of discourse
–Deep analysis in a limited but evolving domain
Compared to past NLP systems:
–Deep analysis in a narrow domain (Chat-80, SHRDLU, ...)
–Shallow analysis in a broad domain (POS taggers, NE recognizers, NP-chunkers, ...)
–Learning used as a tool to develop non-adaptive NLP components
...this system is something in between...
Details:
–Assume a DB-backed website, where the schema changes over time; no other changes allowed (yet)
–Interaction: the user requests (via NL email) changes in the factual content of the website (assume an update of one tuple); the system analyzes the request; the system presents a preview page and an editable form version of the request
Key points:
–partial correctness is useful
–the user can verify correctness (vs. the case for DB queries, Q/A, ...) => a source of training data

[System diagram: msg → Shallow NLP (POS tags, NP chunks, words, ...) → Feature Building (features) → NER (entity1, entity2, ...) → Classification (requestType, targetRelation, targetAttrib; newEntity1, ..., oldEntity1, ..., keyEntity1, ..., otherEntity1, ...) → Update Request Construction (using the database and web page templates) → preview page and user-editable form version of the request → confirm? → User. Confirmed requests become offline training data for the LEARNER.]

Outline Training data/corpus –look at feasibility of learning the components that need to be adaptive, using a static corpus Analysis steps: –request type –entity recognition –role-based entity classification –target relation finding –target attribute finding –[request building] Conclusions/summary

Training data User1 User2 User3.... Mike Roborts should be Micheal Roberts in the staff listing, pls fix it. Thanks - W On the staff page, change Mike to Michael in the listing for “Mike Roberts”.

Training data User1 User2 User3.... Add this as Greg Johnson’s phone number: Please add “ ” to greg johnson’s listing on the staff page.

Training data – entity names are made distinct User1 User2 User3.... Add this as Greg Johnson’s phone number: Please add “ ” to fred flintstone’s listing on the staff page. Modification: to make entity-extraction reasonable, remove duplicate entities by replacing them with alternatives (preserving case, typos, etc)

Training data
User1 User2 User3 ....
Request1 Request2 Request3 ....
message(user1, req1), message(user2, req1), ....
message(user1, req2), message(user2, req2), ....
message(user1, req3), message(user2, req3), ....

Training data – always test on a novel user? Hold out all messages message(useri, reqj) from some users for testing and train on the rest: simulate a distribution of many users (harder to learn).

Training data – always test on a novel request? Hold out all messages message(useri, reqj) about some requests for testing and train on the rest: simulate a distribution of many requests (much harder to learn). 617 emails total + 96 similar ones.

Training data – limitations
–One DB schema, one off-line dataset. May differ from data collected on-line, so no claims made for tasks where data will be substantially different (i.e., entity recognition). No claims made about incremental learning/transfer.
–All learning problems considered separate.
–One step of request-building is trivial for the schema considered: given entity E and relation R, to which attribute of R does E correspond? So, we assume this mapping is trivial (the general case requires another entity classifier).


Entity Extraction Results We assume a fixed set of entity types – no adaptivity needed (unclear if such data could be collected).
Evaluated:
–hand-coded rules (approximately a cascaded FST, in the “Mixup” language)
–learned classifiers with a standard feature set and also a “tuned” feature set, which Einat tweaked
–results are in F1 (the harmonic mean of recall and precision)
–two learning methods, both based on “token tagging”: Conditional Random Fields (CRF) and voted-perceptron discriminative training for an HMM (VP-HMM)

Entity Extraction Results – v2 (CV on users)


Entity Classification Results Entity “roles”:
–keyEntity: value used to retrieve a tuple that will be updated (“delete greg’s phone number”)
–newEntity: value to be added to the database (“William’s new office # is 5307 WH”)
–oldEntity: value to be overwritten or deleted (“change mike to Michael in the listing for...”)
–irrelevantEntity: not needed to build the request (“please add.... – thanks, William”)
Features:
–closest preceding preposition
–closest preceding “action verb” (add, change, delete, remove, ...)
–closest preceding word which is a preposition, action verb, or determiner (in a “determined” NP)
–whether the entity is followed by ’s
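The feature set above is small enough to sketch directly. The word lists, tokenization, and span representation below are my own simplifications of the actual implementation:

```python
# Role-classification features for an entity mention tokens[start:end],
# following the slide's feature list (word lists are illustrative).

PREPOSITIONS = {"to", "for", "in", "on", "from", "of"}
ACTION_VERBS = {"add", "change", "delete", "remove", "update"}
DETERMINERS = {"the", "a", "an", "this"}

def role_features(tokens, start, end):
    before = [t.lower() for t in tokens[:start]]
    feats = {}
    for name, vocab in [("prep", PREPOSITIONS),
                        ("verb", ACTION_VERBS),
                        ("prep_verb_det", PREPOSITIONS | ACTION_VERBS | DETERMINERS)]:
        preceding = [w for w in before if w in vocab]
        feats[f"closest_{name}"] = preceding[-1] if preceding else None
    # is the entity followed by 's?
    feats["followed_by_'s"] = end < len(tokens) and tokens[end] == "'s"
    return feats
```

These feature values would then feed whatever classifier assigns the keyEntity/newEntity/oldEntity/irrelevantEntity roles.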


Reasonable results with “bag of words” features.


Request type classification: addTuple, alterValue, deleteTuple, or deleteValue? Can be determined from entity roles, except for deleteTuple vs. deleteValue:
–“Delete the phone # for Scott” vs. “Delete the row for Scott”
Features:
–counts of each entity role
–action verbs
–nouns in NPs which are (probably) objects of the action verb
–(optionally) the same nouns, tagged with a dictionary
Target attributes are handled similarly.
Comments: very little data is available. Only twelve words of schema-specific knowledge are needed: a dictionary of terms like phone, extension, room, office, ...
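The observation that role counts mostly determine the request type, with a small schema dictionary breaking the deleteTuple/deleteValue tie, can be sketched as a toy rule. The specific rules and word list here are my own guesses at plausible behavior, not the learned classifier's:

```python
# Toy request-type rules: role counts decide most cases; lexical evidence
# (attribute words like "phone" vs. row words like "row") disambiguates
# deleteValue from deleteTuple, as in the slide's Scott examples.

ATTRIBUTE_WORDS = {"phone", "extension", "room", "office", "number", "#"}

def request_type(role_counts, nouns):
    if role_counts.get("oldEntity", 0) and role_counts.get("newEntity", 0):
        return "alterValue"        # something is replaced by something else
    if role_counts.get("newEntity", 0):
        return "addTuple"          # only new values: an addition
    # a pure deletion: disambiguate with the schema dictionary
    if any(n in ATTRIBUTE_WORDS for n in nouns):
        return "deleteValue"
    return "deleteTuple"
```

The learned system used these signals as features rather than hard rules, which degrades more gracefully when a message fits none of the patterns.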




Other issues: a large pool of users and/or requests

Webmaster: the punchline

Conclusions?
–The system architecture allows all schema-dependent knowledge to be learned: potential to adapt to changes in the schema, and the data needed for learning can be collected from the user.
–Learning appears to be possible on reasonable time-scales: 10s or 100s of relevant examples, not thousands. Schema-independent linguistic knowledge is useful.
–F1 in the eighties is possible on almost all subtasks. Counter-examples are rarely-changed relations (budget) and distinctions for which little data is available.
–There is substantial redundancy in the different subtasks: an opportunity for learning suites of probabilistic classifiers, etc.
–Even an imperfect IE system can be useful… with the right interface…

[Pipeline diagram variant: the system now produces a micro-form and a query against the DB.]

Webmaster: the Epilogue (VIO)
–Faster for the request-submitter; zero time for the webmaster; zero latency; more reliable (!)
–Tomasic et al., IUI 2006
–Entity F1 ≈ 84; micro-form selection accuracy ≈ 80
–Used the UI for experiments on real people (human-human, human-VIO)

Conclusions and comments Two case studies of non-trivial IE pipelines illustrate:
–In any pipeline, errors propagate. What’s the right way of training components in a pipeline? Independently? How can (and when should) we make decisions using some flavor of joint inference?
Some practical questions for pipeline components:
–What’s downstream? What do errors cost? Often we can’t see the end of the pipeline…
–How robust is the method? New users, new newswire sources, new upstream components… Do different learning methods/feature sets differ in robustness?
Some concrete questions for learning relations between entities:
–(When) is classifying pairs of things the right approach? How do you represent pairs of objects? How do you represent structure, like dependency parses? Kernels? Special features?