Information Extraction Niranjan Balasubramanian Many Slides From: Rion Snow, Luke Zettlemoyer, Mausam, Raphael Hoffman, Alan Ritter

What is Information Extraction? Turning un-structured (or semi-structured) documents into structured databases (aka knowledge bases).

Why is it useful? Clear factual information is helpful. –Answer questions. –Analytics. Organize and present information. –Info boxes in Wikipedia. Obtain new knowledge via inference. –Works-for(x, y) AND located-in(y, z) ⇒ lives-in(x, z)
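To make the inference step concrete, here is a minimal sketch that joins two extracted relation tables to derive a new fact; the facts themselves are toy examples, not output from a real extractor.

```python
# Toy extracted facts (illustrative only, not real system output).
works_for = {("Obama", "White House")}
located_in = {("White House", "Washington DC")}

# Works-for(x, y) AND located-in(y, z) => lives-in(x, z)
lives_in = {(x, z)
            for (x, y1) in works_for
            for (y2, z) in located_in
            if y1 == y2}

print(lives_in)  # {('Obama', 'Washington DC')}
```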

Information Extraction Tasks Entity Recognition Relation Extraction Event Extraction

Named Entity Recognition At the heart of the Giuliani-led critique of the president’s patriotism is the suggestion that Barack Obama has never expressed love for the United States. Rudolph W. Giuliani, the former mayor of New York City, has even challenged the media to find examples of Mr. Obama expressing such affection. Has the president done so? Yes, he has. A review of his public remarks provides multiple examples. In 2008, when he was still a presidential candidate, Mr. Obama uttered the magic words in Berlin, during a speech to thousands. Mr. Obama used a similar construction, as president, in 2011, during a town hall meeting in Illinois, when he recalled “why I love this country so much.” Mr. Giuliani told Fox News that “I don’t hear from him what I heard from Harry Truman, what I heard from Bill Clinton, what I heard from Jimmy Carter, which is these wonderful words about what a great country we are, what an exceptional country we are.”

Relation Extraction
Located-in(Person, Place): He was in Tennessee
Subsidiary(Organization, Organization): XYZ, the parent company of ABC
Related-to(Person, Person): John’s wife Yoko
Founder(Person, Organization): Steve Jobs, co-founder of Apple
...

Event Extraction

Relation Extraction

Outline What is relation extraction? Why is it hard? How is it done? What are the big challenges?

Types of Relations News Medical Geographical Lexical

News Domain ROLE: relates a person to an organization or a geopolitical entity –subtypes: member, owner, affiliate, client, citizen PART: generalized containment –subtypes: subsidiary, physical part-of, set membership AT: permanent and transient locations –subtypes: located, based-in, residence SOCIAL: social relations among persons –subtypes: parent, sibling, spouse, grandparent, associate

Freebase Relations Thousands of relations and millions of instances! Manually created from multiple sources including Wikipedia InfoBoxes

Geographical Relations

Lexical Relations Synonym, Antonym, Hyponym, Hypernym, Meronym, Similar-to, … WordNet – a lexical resource that specifies relationships between words.

Medical Relations UMLS Resource

Why is relation extraction difficult? Linguistic variability President Barack Obama Barack Obama, president of the United States, … President of the United States, Mr. Obama, … Entity Ambiguity Apple produces seeds vs. Apple produces iPhones.

Why is relation extraction difficult? Implicit Relations Obama met with Putin in Moscow => Obama traveled to Moscow. Complex language with many clauses, long list of qualifiers, negations etc. Pentoxifylline (PTX) affects many processes that may contribute to the pathogenesis of severe malaria and it has been shown to reduce the duration of coma in children with cerebral malaria.

How is relation extraction done? Pattern-based + Bootstrapping Supervised Relation Extraction Distantly Supervised Relation Extraction Open Information Extraction

Pattern-based Extraction Inspect sentences that express the relation. Write lexical patterns that suggest the relation.

Pattern-based IS-A relations Suppose you want to find IS-A relations. You can look for sentences that contain “x is a y”. It is an OK start, but you can do better. Inspect some sentences: Agar is a substance prepared from a mixture of red algae, such as Gelidium. This includes temples, treasuries, and other important civic buildings. Insurance does not cover bruises, wounds, broken bones or other injuries. The bow lute, such as the Bambara ndang, is widely used here.
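As a rough illustration, the sketch below matches a single “Y, such as X” surface pattern with a regular expression; real pattern matchers work over POS-tagged or parsed text, and the pattern and sentence here are only examples.

```python
import re

# "Y, such as X" -> IS-A(X, Y); a crude surface approximation of a Hearst pattern.
PATTERN = re.compile(r"(\w+(?: \w+)?) ?, such as (\w+(?: \w+)?)")

sentence = "Agar is a substance prepared from a mixture of red algae, such as Gelidium."
for match in PATTERN.finditer(sentence):
    hypernym, hyponym = match.group(1), match.group(2)
    print(f"IS-A({hyponym}, {hypernym})")   # IS-A(Gelidium, red algae)
```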

Hearst Hyponym Patterns 66% accurate. What about coverage?

Meronym (part-whole) patterns Berland and Charniak patterns. Find all sentences in a corpus containing basement and building.

Sentence fragment → Pattern
… building’s basement … → whole NN[-PL] ’s POS part NN[-PL]
… basement of building … → parts NN-PL of PREP wholes NN-PL
… basement in building … → parts NN-PL in PREP wholes NN-PL
… basement in the big building … → part NN in PREP {the|a} DET mods [JJ|NN]* whole NN
… basements of a building … → part NN[-PL] of PREP {the|a} DET mods [JJ|NN]* whole NN

For each pattern: 1. Find occurrences of the pattern. 2. Filter those ending with -ing, -ness, -ity. [Why?] 3. Apply a likelihood metric. The first two patterns are reliable; the rest are noisy in practice (~55% accuracy).
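A minimal sketch of the filtering step (step 2 above), assuming candidate part words have already been collected by the patterns; the word list is made up for illustration.

```python
candidates = ["basement", "ceiling", "wiring", "happiness", "electricity", "door"]

# Drop candidates ending in -ing, -ness, -ity: these suffixes usually mark
# qualities or events rather than physical parts, so they tend to be noise.
filtered = [w for w in candidates if not w.endswith(("ing", "ness", "ity"))]
print(filtered)  # ['basement', 'door']
```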

Bootstrapping for Relation Extraction: Automate Pattern Extraction Take some seed relations. e.g., Buried-in(Mark Twain, Elmira) Find some sentences that contain the seed entities and extract patterns. Mark Twain is buried in Elmira, NY. → X is buried in Y The grave of Mark Twain is in Elmira → The grave of X is in Y Elmira is Mark Twain’s final resting place →Y is X’s final resting place Use these patterns to extract new relations. The grave of Bruce Lee is in Seattle. →Buried-in(Bruce Lee, Seattle)
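A minimal sketch of one bootstrapping round, using exact string matching and a crude proper-noun regex instead of real named-entity recognition or confidence scoring; the corpus and seed pair are toy examples.

```python
import re

ENTITY = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"   # crude proper-noun matcher

def learn_patterns(corpus, pairs):
    """Collect the text between two seed entities as a relation pattern."""
    patterns = set()
    for sent in corpus:
        for x, y in pairs:
            m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), sent)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(corpus, patterns):
    """Find new entity pairs connected by a learned pattern."""
    pairs = set()
    for sent in corpus:
        for middle in patterns:
            for m in re.finditer(f"({ENTITY}){re.escape(middle)}({ENTITY})", sent):
                pairs.add((m.group(1), m.group(2)))
    return pairs

corpus = [
    "Mark Twain is buried in Elmira, NY.",
    "The grave of Bruce Lee is in Seattle.",
    "Bruce Lee is buried in Seattle.",
]
seeds = {("Mark Twain", "Elmira")}
patterns = learn_patterns(corpus, seeds)   # {' is buried in '}
print(apply_patterns(corpus, patterns))    # includes ('Bruce Lee', 'Seattle')
# In practice the new pairs feed back into pattern learning for further rounds.
```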

Authors of Books: DIPRE [Brin 1998] Extract (author, book) pairs Start with these 5 seeds: Learn patterns Iterate: Use patterns to get more instances Use instances to get more patterns Extracted 15,000 author-book pairs with 95% accuracy with just three iterations.

Snowball: Improved Bootstrapping [Agichtein and Gravano, 2000] Add constraints on X and Y, e.g., they have to be named entities. Add heuristics to score extractions, and select the best ones at each iteration.

Issues with Bootstrapping Requires seeds for each relation. –Sensitive to the original set of seeds. Semantic drift at each iteration. –Some patterns may extract noisy or different relations, e.g., US Presidents: “presidents such as...” → Company presidents. Precision tends not to be very high. No probabilistic interpretation. –Hard to know how confident to be in each result.

How is relation extraction done? Pattern-based + Bootstrapping Supervised Relation Extraction Distantly Supervised Relation Extraction Open Information Extraction Event Extraction

Supervised Relation Extraction [Zhou et al, 2005] Define the relation vocabulary, i.e., the relations you want. –Relation detection: true/false –Relation classification: located-in, employee-of, inventor-of, … Collect labeled training data. –MUC, ACE,... Define a feature representation. –words, entity types,... Build a classifier. –Naïve Bayes, MaxEnt, SVM, … Evaluate.
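A minimal sketch of such a supervised pipeline, assuming labeled mention pairs are already available; the toy data, the feature set, and the choice of logistic regression are illustrative, not the setup of Zhou et al.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def featurize(sent, e1, e2):
    """Light-weight features: mention head words plus words between the mentions."""
    words = sent.split()
    i, j = words.index(e1.split()[-1]), words.index(e2.split()[0])
    return {
        "head1": e1.split()[-1],
        "head2": e2.split()[-1],
        **{f"between={w}": 1 for w in words[i + 1 : j]},
    }

# Toy labeled data: (sentence, entity1, entity2, relation label).
train = [
    ("American Airlines , a unit of AMR , matched the move , spokesman Tim Wagner said .",
     "American Airlines", "Tim Wagner", "employer"),
    ("John lives in Tennessee .", "John", "Tennessee", "located-in"),
]

X = [featurize(s, e1, e2) for s, e1, e2, _ in train]
y = [label for *_, label in train]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([featurize("Mary lives in Ohio .", "Mary", "Ohio")]))
```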

ACE 2008 Relations

Features Light-weight –BOW, bigrams between, before and after –Stemmed versions –Entity types –Distance between entities Medium-weight –Base-phrase chunk paths –Head words of chunks Heavy-weight –Dependency, constituency tree paths –Tree distance –Patterns over trees

Features Example American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said. employer(American Airlines, Tim Wagner)
Bag-of-words features: WM1 = {American, Airlines}, WM2 = {Tim, Wagner}
Head-word features: HM1 = Airlines, HM2 = Wagner, HM12 = Airlines+Wagner
Words in between: WB-NULL = false, WBF-L = NULL, WBF = a, WBL = spokesman, WBO = {unit, of, AMR, immediately, matched, the, move}
Words before and after: BM1F = NULL, BM1L = NULL, AM2F = said, AM2L = NULL
Good precision (69%) but poor recall (24%).

Features: Base Phrase Chunking [NP American Airlines], [NP a unit] [PP of] [NP AMR], [ADVP immediately] [VP matched] [NP the move], [NP spokesman Tim Wagner] [VP said].
Phrase heads before and after: CPHBM1F = NULL, CPHBM1L = NULL, CPHAM2F = said, CPHAM2L = NULL
Phrase heads in between: CPHBNULL = false, CPHBFL = NULL, CPHBF = unit, CPHBL = move, CPHBO = {of, AMR, immediately, matched}
Phrase label paths: CPP = [NP, PP, NP, ADVP, VP, NP], CPPH = NULL [A way to generalize!]
Increased both precision & recall by 4-6%.

Features: Syntactic Parse
Mention dependencies: ET1DW1 = ORG:Airlines, H1DW1 = matched:Airlines, ET2DW2 = PER:Wagner, H2DW2 = said:Wagner
Entity types and dependency tree: ET12SameNP = ORG-PER-false, ET12SamePP = ORG-PER-false, ET12SameVP = ORG-PER-false
Minor gain in terms of results. Why? 1) Many relations are local. 2) Parse features are useful for long-distance connections, but parsers fail on long sentences.

Evaluation Relation detection performance is reasonable. Relation classification is decent but not great. Engineering features is better than letting ML figure out features (for this task).

How is relation extraction done? Pattern-based + Bootstrapping Supervised Relation Extraction Distantly Supervised Relation Extraction Open Information Extraction Event Extraction

Distant Supervision Motivated by lack of training data. –Bootstrapping gets some additional “training” data at each round. Use a large database to get a huge number of seed relations. Find sentences that express these seed relations. –Assume that any sentence that contains both entities in a relation is expressing the relation. Train a supervised classifier.

Hypernyms via Distant Supervision [Snow 2005]

Lexico-syntactic Dependency Patterns (Shakespeare, author): “Shakespeare was the author of several plays...” Extract the shortest path on the dependency tree. The path is an ordered list of edge tuples, with the entities generalized into their POS category.
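A minimal sketch of extracting that path, using a hand-specified dependency parse for the example sentence (a real system would obtain the parse from a dependency parser); networkx is used only to compute the shortest path.

```python
import networkx as nx

# Hand-specified dependency edges (child, head, relation) for
# "Shakespeare was the author of several plays".
# The parse is illustrative, not the output of a real parser.
edges = [
    ("Shakespeare", "was", "nsubj"),
    ("author", "was", "attr"),
    ("the", "author", "det"),
    ("of", "author", "prep"),
    ("plays", "of", "pobj"),
    ("several", "plays", "amod"),
]

graph = nx.Graph()
for child, head, rel in edges:
    graph.add_edge(child, head, rel=rel)

# Shortest dependency path between the candidate (hyponym, hypernym) pair.
path = nx.shortest_path(graph, "Shakespeare", "author")
print(path)  # ['Shakespeare', 'was', 'author']

# The pattern is the ordered list of edge tuples along that path.
# In the real feature the entity tokens are replaced by their POS category,
# so the pattern generalizes to other noun pairs.
pattern = [(a, graph[a][b]["rel"], b) for a, b in zip(path, path[1:])]
print(pattern)  # [('Shakespeare', 'nsubj', 'was'), ('was', 'attr', 'author')]
```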

Evaluation

Distant Supervision for Freebase Relations [Mintz 2009] Premise: ACE relations are only a handful. At large scale, training data is hard to create. Large-ish databases of relations are available. Use them.

Distant Supervision for Freebase Relations

Training: For each relation in Freebase: Find sentences that contain the entities in the relation. Extract features from the sentence Aggregate features, append relation name and we have an instance. Learn a classifier over the training instances. Relation Extractor: For all sentences that contain any pair of (named) entities: Extract features. For every unique pair of entities: Aggregate features and make a prediction.
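A minimal sketch of the distant labeling and feature-aggregation steps, under the slide’s assumption that any sentence mentioning both entities of a known pair expresses the relation; the knowledge base, corpus, and featurizer below are toy placeholders.

```python
from collections import defaultdict

# Toy "Freebase": relation name -> set of entity pairs (illustrative only).
kb = {"film-director": {("Steven Spielberg", "Saving Private Ryan")}}

corpus = [
    "Steven Spielberg's film Saving Private Ryan is loosely based on the book.",
    "Award winning Saving Private Ryan, directed by Steven Spielberg, opened in 1998.",
]

def features(sentence, e1, e2):
    """Placeholder featurizer: the words between the two mentions."""
    i, j = sentence.index(e1), sentence.index(e2)
    (start, first), (end, _) = sorted([(i, e1), (j, e2)])
    return set(sentence[start + len(first) : end].split())

# Distant labeling: every sentence containing both entities counts as evidence,
# and features are aggregated across ALL such sentences into one instance.
training = defaultdict(set)
for relation, pairs in kb.items():
    for e1, e2 in pairs:
        for sent in corpus:
            if e1 in sent and e2 in sent:
                training[(e1, e2, relation)] |= features(sent, e1, e2)

for key, feats in training.items():
    print(key, sorted(feats))
```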

Partial Information from Multiple Sentences
[Steven Spielberg]’s film [Saving Private Ryan] is loosely based on …
… Award winning [Saving Private Ryan], directed by [Steven Spielberg]...
Sentence 1: evidence for [Saving Private Ryan] as a film, but only ambiguous evidence for [Steven Spielberg] as a director (he could be a CEO).
Sentence 2: evidence for [Steven Spielberg] as a director, but no evidence for [Saving Private Ryan] as a film.

Negative Training Data? If you only have positive data, you have to assume that anything not in your data is negative. But then how do you know which features are bad? Suppose you saw this sentence: “Google is Bill Gates' worst fear,” said its CEO. And learned this pattern: Y is X’s worst fear => CEO-of(X, Y). Solution? Sample 1% of unrelated pairs of entities as negative examples.
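A minimal sketch of that solution: form candidate pairs of co-occurring entities, treat the ones absent from the KB as unrelated, and keep only a small random sample as negatives (toy data; the 1% rate follows the slide).

```python
import random
from itertools import combinations

kb_pairs = {("Steve Jobs", "Apple")}                     # related pairs from the KB
sentence_entities = ["Steve Jobs", "Apple", "Bill Gates", "Google"]

negatives = []
for e1, e2 in combinations(sentence_entities, 2):
    if (e1, e2) not in kb_pairs and (e2, e1) not in kb_pairs:
        # Keep roughly 1% of the unrelated pairs as negative training examples.
        if random.random() < 0.01:
            negatives.append((e1, e2, "NO_RELATION"))

print(negatives)  # usually empty on toy data; at corpus scale 1% is still plenty
```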

Features ‘Astronomer Edwin Hubble was born in Marshfield, Missouri’

Evaluation Select the top 102 relations with the most instances. Use half the instances for training and the other half for testing. Combining syntactic and surface features helps over either alone.

Top Weighted Features

Issues with Distant Supervision False positives –Some entities may have multiple relations. lives-in(Obama, Washington DC) works-in(Obama, Washington DC) –Presence of the entities alone doesn’t guarantee the relation is expressed: “Microsoft is catching up.”, Bill Gates said. False negatives –Knowledge bases are incomplete. –The system may correctly predict a relation that is not currently in the KB.

MultiR [Hoffmann et al., 2011] Addresses the overlapping relations problem.

Missing Data [Ritter et al, 2013] Addresses incomplete KB by treating facts as soft constraints.

MultiR + Missing Data

Open Information Extraction

Relation Extraction Bottlenecks Traditional relation extraction assumes a relation vocabulary. –Need to anticipate the knowledge needs in advance. Typically need a few seed examples or instances per relation. –Distant supervision mitigates this somewhat but still assumes the relations are given via a database. Doesn’t easily scale to large collections such as the web. –Need to run each relation classifier on each sentence!

Open Information Extraction Identify relation phrases directly from text. Avoids lexical patterns. –Extractors are specified via POS tags and closed-class words. Focus on generic ways in which relations are expressed. –Not domain specific.

TextRunner

Two Issues Incoherent Extractions Uninformative Extractions

Relation Frequency A large proportion of relations appear in a handful of ways. Let’s focus on how relation phrases are specified!

ReVerb: Relation Extraction from Verbs

ReVerb 1) Use syntactic constraints to specify relation phrases. Three simple patterns: find the longest phrase matching one of the syntactic constraints. 2) Find the nearest noun phrases to the left and right of the relation phrase. – Not a relative pronoun, WH-adverb, or existential “there”.
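A minimal sketch of the relation-phrase constraint as a regular expression over a POS-tag string; the tag sequence is hand-specified and the V/W/P categories are collapsed to single letters, which simplifies the real ReVerb patterns considerably.

```python
import re

# Simplified ReVerb syntactic constraint over POS tags:
# V = verb, W = noun/adj/adv/det/pron, P = preposition or particle.
# A relation phrase is one or more verbs, optionally followed by W* and a final P.
RELATION = re.compile(r"V+(W*P)?")

# Hand-specified tags for "Hubble was born in Marshfield" (illustrative only).
tokens = ["Hubble", "was", "born", "in", "Marshfield"]
tags = ["N", "V", "V", "P", "N"]

tag_string = "".join(tags)
longest = max(RELATION.finditer(tag_string), key=lambda m: len(m.group()))
start, end = longest.start(), longest.end()

# Step 2: nearest noun phrase to the left and right of the relation phrase.
arg1 = tokens[start - 1] if start > 0 and tags[start - 1] == "N" else None
arg2 = tokens[end] if end < len(tags) and tags[end] == "N" else None
print(arg1, tokens[start:end], arg2)  # Hubble ['was', 'born', 'in'] Marshfield
```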

ReVerb Lexical constraints:

How good is ReVerb?

Key Issues Argument detection heuristic not adequate. Lexical constraint too restrictive.

Arg Learning [Christensen et al, 2012] Arg1 Structure

Arg Learning Arg2 Structure

Arg Learner using CRFs 1) Build three classifiers 2) Each with its own feature set based on the syntactic analysis

Arg Learner

Ollie: Bootstrapping from ReVerb [Mausam et al., 2012]

Ollie: Bootstrapping from ReVerb

Supervised Learning of Patterns Features – Frequency of pattern in training set – Lexical/POS features – Length/coverage features – …

Issues with Open IE Semantics? –Not tied to any ontology. Can’t assign specific meaning. –Tie relation phrases back to an existing ontology [Soderland, 2012] –Learn inference rules over Open IE relations directly! Redundancy –Many distinct relation phrases convey the same conceptual “relation” –Solution: Cluster relations

Summary Relation extraction aims to identify relations between entities. –Used primarily to construct knowledge bases. –Can be used for QA as well. A well-studied task, with many approaches: –Hand-built patterns + bootstrapping –Supervised learning –Distant supervision –Open Information Extraction