Relation Extraction Slides from Dan Jurafsky, Rion Snow, Jim Martin, Chris Manning and William Cohen.

Slides:



Advertisements
Similar presentations
Distant Supervision for Relation Extraction without Labeled Data CSE 5539.
Advertisements

Information Extraction Lecture 7 – Relation Extraction CIS, LMU München Winter Semester Dr. Alexander Fraser.
Learning Semantic Information Extraction Rules from News The Dutch-Belgian Database Day 2013 (DBDBD 2013) Frederik Hogenboom Erasmus.
NYU ANLP-00 1 Automatic Discovery of Scenario-Level Patterns for Information Extraction Roman Yangarber Ralph Grishman Pasi Tapanainen Silja Huttunen.
Multi-Task Transfer Learning for Weakly- Supervised Relation Extraction Jing Jiang Singapore Management University ACL-IJCNLP 2009.
The Unreasonable Effectiveness of Data Alon Halevy, Peter Norvig, and Fernando Pereira Kristine Monteith May 1, 2009 CS 652.
Information Extraction Shallow Processing Techniques for NLP Ling570 December 5, 2011.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
Building a Semantic Taxonomy Using this classifier we may now extend and construct semantic taxonomies. We assume that the semantic taxonomy is a directed.
Information Extraction and Ontology Learning Guided by Web Directory Authors:Martin Kavalec Vojtěch Svátek Presenter: Mark Vickers.
Article by: Feiyu Xu, Daniela Kurz, Jakub Piskorski, Sven Schmeier Article Summary by Mark Vickers.
Learning syntactic patterns for automatic hypernym discovery Rion Snow, Daniel Jurafsky and Andrew Y. Ng Prepared by Ang Sun
1 Natural Language Processing for the Web Prof. Kathleen McKeown 722 CEPSR, Office Hours: Wed, 1-2; Tues 4-5 TA: Yves Petinot 719 CEPSR,
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Huimin Ye.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.
Introduction to Text Mining
What is relation extraction?
Finding Advertising Keywords on Web Pages Scott Wen-tau YihJoshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University.
School of Engineering and Computer Science Victoria University of Wellington COMP423 Intelligent agents.
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Get Another Label? Using Multiple, Noisy Labelers Joint work with Victor Sheng and Foster Provost Panos Ipeirotis Stern School of Business New York University.
1 Modeling Query-Based Access to Text Databases Eugene Agichtein Panagiotis Ipeirotis Luis Gravano Computer Science Department Columbia University.
ISMB 2003 presentation Extracting Synonymous Gene and Protein Terms from Biological Literature Hong Yu and Eugene Agichtein Dept. Computer Science, Columbia.
Information Extraction Yunyao Li EECS /SI /29/2006.
Learning Thesauruses and Knowledge Bases
SI485i : NLP Set 13 Information Extraction. 2 “Yesterday GM released third quarter results showing a 10% in profit over the same period last year. “John.
Learning Object Metadata Mining Masoud Makrehchi Supervisor: Prof. Mohamed Kamel.
Authors: Ting Wang, Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Ji Wang Presented by: Khalifeh Al-Jadda Automatic Extraction of Hierarchical Relations.
Using Text Mining and Natural Language Processing for Health Care Claims Processing Cihan ÜNAL
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
PAUL ALEXANDRU CHIRITA STEFANIA COSTACHE SIEGFRIED HANDSCHUH WOLFGANG NEJDL 1* L3S RESEARCH CENTER 2* NATIONAL UNIVERSITY OF IRELAND PROCEEDINGS OF THE.
Jennie Ning Zheng Linda Melchor Ferhat Omur. Contents Introduction WordNet Application – WordNet Data Structure - WordNet FrameNet Application – FrameNet.
Named Entity Tagging Thanks to Dan Jurafsky, Jim Martin, Ray Mooney, Tom Mitchell for slides.
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number.
Database Design Part of the design process is deciding how data will be stored in the system –Conventional files (sequential, indexed,..) –Databases (database.
Some Work on Information Extraction at IRL Ganesh Ramakrishnan IBM India Research Lab.
Querying Text Databases for Efficient Information Extraction Eugene Agichtein Luis Gravano Columbia University.
Text Feature Extraction. Text Classification Text classification has many applications –Spam detection –Automated tagging of streams of news articles,
Lecture 13 Information Extraction Topics Name Entity Recognition Relation detection Temporal and Event Processing Template Filling Readings: Chapter 22.
Relevance Detection Approach to Gene Annotation Aid to automatic annotation of databases Annotation flow –Extraction of molecular function of a gene from.
Mining Topic-Specific Concepts and Definitions on the Web Bing Liu, etc KDD03 CS591CXZ CS591CXZ Web mining: Lexical relationship mining.
Indirect Supervision Protocols for Learning in Natural Language Processing II. Learning by Inventing Binary Labels This work is supported by DARPA funding.
Lecture 14 Relation Extraction
India Research Lab © Copyright IBM Corporation 2006 Entity Annotation using operations on the Inverted Index Ganesh Ramakrishnan, with Sreeram Balakrishnan.
Facilitating Document Annotation using Content and Querying Value.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
Number Sense Disambiguation Stuart Moore Supervised by: Anna Korhonen (Computer Lab)‏ Sabine Buchholz (Toshiba CRL)‏
Classifying Semantic Relations in Bioscience Texts Barbara Rosario Marti Hearst SIMS, UC Berkeley Supported by NSF DBI
Information Extraction Lecture 3 – Rule-based Named Entity Recognition Dr. Alexander Fraser, U. Munich September 3rd, 2014 ISSALE: University of Colombo.
4. Relationship Extraction Part 4 of Information Extraction Sunita Sarawagi 9/7/2012CS 652, Peter Lindes1.
Relation Extraction Using patterns to extract relations.
FILTERED RANKING FOR BOOTSTRAPPING IN EVENT EXTRACTION Shasha Liao Ralph York University.
©2012 Paula Matuszek CSC 9010: Information Extraction Overview Dr. Paula Matuszek (610) Spring, 2012.
Feature Assignment LBSC 878 February 22, 1999 Douglas W. Oard and Dagobert Soergel.
Facilitating Document Annotation Using Content and Querying Value.
Relation Extraction (RE) via Supervised Classification See: Jurafsky & Martin SLP book, Chapter 22 Exploring Various Knowledge in Relation Extraction.
Natural Language Processing Information Extraction Jim Martin (slightly modified by Jason Baldridge)
Automatically Labeled Data Generation for Large Scale Event Extraction
A Brief Introduction to Distant Supervision
Text Based Information Retrieval
Relation Extraction CSCI-GA.2591
Distant supervision for relation extraction without labeled data
CSCE 590 Web Scraping – Information Retrieval
CSCI 5832 Natural Language Processing
Social Knowledge Mining
Introduction Task: extracting relational facts from text
Lecture 13 Information Extraction
Rachit Saluja 03/20/2019 Relation Extraction with Matrix Factorization and Universal Schemas Sebastian Riedel, Limin Yao, Andrew.
Information Retrieval
Presentation transcript:

Relation Extraction Slides from Dan Jurafsky, Rion Snow, Jim Martin, Chris Manning and William Cohen

Background: Information Extraction Extract information from text Extract information from text  Sometimes called text analytics commercially  Extract entities (the people, organizations, locations, times, dates, genes, diseases, medicines, etc. in a text)  Extract the relations between entities  Figure out the larger events that are taking place

Information Extraction Creating knowledge bases and ontologies Creating knowledge bases and ontologies Implications for cognitive modeling Implications for cognitive modeling Digital Libaries Digital Libaries  Google scholar, Citeseer need to extract the title, author and references Bioinformatics Bioinformatics Patent analysis Patent analysis Specific market segments for stock analysis Specific market segments for stock analysis SEC filings SEC filings Intelligence analysis Intelligence analysis

Paradigms: Data Mining vs Text Mining Data Mining is a set of techniques that aims to extract knowledge by big data analysis, e.g. from databases, using simple patterns and statistical methods Data Mining is a set of techniques that aims to extract knowledge by big data analysis, e.g. from databases, using simple patterns and statistical methods In contrast, the Text Mining's goal (or Machine Reading), In contrast, the Text Mining's goal (or Machine Reading), is to extract relationship between entities that just existing in examinated text, semantically deeper than data mining does is to extract relationship between entities that just existing in examinated text, semantically deeper than data mining does The difference is in the source information: The difference is in the source information:  relationship in text just exist in our source,  we don't need to try to guess it but only discover it!

Outline Reminder: Named Entity Tagging Reminder: Named Entity Tagging Relation Extraction Relation Extraction  Hand-built patterns  Seed (bootstrap) methods  Supervised classification  Distant supervision

What is “Information Extraction” Slide from William Cohen Filling slots in a database from sub-segments of text. As a task: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access.“ Richard Stallman, founder of the Free Software Foundation, countered saying… NAME TITLE ORGANIZATION

What is “Information Extraction” Slide from William Cohen Filling slots in a database from sub-segments of text. As a task: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access.“ Richard Stallman, founder of the Free Software Foundation, countered saying… NAME TITLE ORGANIZATION Bill Gates CEO Microsoft Bill Veghte VP Microsoft Richard Stallman founder Free Soft.. IE

What is “Information Extraction” Slide from William Cohen Information Extraction = segmentation + classification + clustering + association As a family of techniques: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access.“ Richard Stallman, founder of the Free Software Foundation, countered saying… Microsoft Corporation CEO Bill Gates Microsoft Gates Microsoft Bill Veghte Microsoft VP Richard Stallman founder Free Software Foundation “named entity extraction”

What is “Information Extraction” Slide from William Cohen Information Extraction = segmentation + classification + association + clustering As a family of techniques: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access.“ Richard Stallman, founder of the Free Software Foundation, countered saying… Microsoft Corporation CEO Bill Gates Microsoft Gates Microsoft Bill Veghte Microsoft VP Richard Stallman founder Free Software Foundation

What is “Information Extraction” Slide from William Cohen Information Extraction = segmentation + classification + association + clustering As a family of techniques: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access.“ Richard Stallman, founder of the Free Software Foundation, countered saying… Microsoft Corporation CEO Bill Gates Microsoft Gates Microsoft Bill Veghte Microsoft VP Richard Stallman founder Free Software Foundation

What is “Information Extraction” Slide from William Cohen Information Extraction = segmentation + classification + association + clustering As a family of techniques: October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open- source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access.“ Richard Stallman, founder of the Free Software Foundation, countered saying… Microsoft Corporation CEO Bill Gates Microsoft Gates Microsoft Bill Veghte Microsoft VP Richard Stallman founder Free Software Foundation NAME TITLE ORGANIZATION Bill Gates CEO Microsoft Bill Veghte VPMicrosoft Richard Stallmanfounder Free Soft.. * * * *

Extracting Structured Knowledge LLNL EQ Lawrence Livermore National Laboratory LLNL LOC-IN California Livermore LOC-IN California LLNL IS-A scientific research laboratory LLNL FOUNDED-BY University of California LLNL FOUNDED-IN 1952 Each article can contain hundreds or thousands of items of knowledge... “The Lawrence Livermore National Laboratory (LLNL) in Livermore, California is a scientific research laboratory founded by the University of California in 1952.”

Goal: Machine-readable summaries SubjectRelationObject p53is_aprotein Baxis_aprotein p53has_functionapoptosis Baxhas_functioninduction apoptosisinvolved_incell_death Baxis_in mitochondrial outer membrane Baxis_incytoplasm apoptosisrelated_to caspase activation... Textual abstract: Summary for human Structured knowledge extraction: Summary for machine

From Unstructured Text to Structured Knowledge Unstructured Text News articles... slide from Rion Snow

From Unstructured Text to Structured Knowledge Unstructured Text Blog posts.... slide from Rion Snow

From Unstructured Text to Structured Knowledge Unstructured Text Scientific journal articles... slide from Rion Snow

From Unstructured Text to Structured Knowledge Unstructured Text Tweets, instant messages, chat logs... slide from Rion Snow

From Unstructured Text to Structured Knowledge Unstructured Text slide from Rion Snow

From Unstructured Text to Structured Knowledge Structured Knowledge Unstructured Text slide from Rion Snow

From Unstructured Text to Structured Knowledge Structured Knowledge Unstructured Text slide from Rion Snow

From Unstructured Text to Structured Knowledge Structured Knowledge Unstructured Text slide from Rion Snow

From Unstructured Text to Structured Knowledge Structured Knowledge Unstructured Text slide from Rion Snow

From Unstructured Text to Structured Knowledge Structured Knowledge Unstructured Text slide from Rion Snow

From Unstructured Text to Structured Knowledge Structured Knowledge Unstructured Text slide from Rion Snow

Relation Extraction What is relation extraction? What is relation extraction? Founded in 1801 as South Carolina College, USC is the flagship institution of the University of South Carolina System and offers more than 350 programs of study leading to bachelor's, master's, and doctoral degrees from fourteen degree-granting colleges and schools to an enrollment of approximately 45,251 students, 30,967 on the main Columbia campus. … [wiki] Founded in 1801 as South Carolina College, USC is the flagship institution of the University of South Carolina System and offers more than 350 programs of study leading to bachelor's, master's, and doctoral degrees from fourteen degree-granting colleges and schools to an enrollment of approximately 45,251 students, 30,967 on the main Columbia campus. … [wiki]flagshipUniversity of South Carolina Systembachelor's master'sdoctoralflagshipUniversity of South Carolina Systembachelor's master'sdoctoral complex relation = summarization complex relation = summarization focus on binary relation predicate(subject, object) or triples focus on binary relation predicate(subject, object) or triples

Wiki Info Box – structured data template template standard things about Universities standard things about Universities Established Established type type faculty faculty students students location location mascot mascot

Focus on extracting binary relations predicate(subject, object) from predicate logic predicate(subject, object) from predicate logic triples triples Directed graphs Directed graphs

Why relation extraction? create new structured KB create new structured KB Augmenting existing: words -> wordnet, facts -> FreeBase or DBPedia Augmenting existing: words -> wordnet, facts -> FreeBase or DBPedia Support question answering: Jeopardy Support question answering: Jeopardy Which relations Which relations Automated Content Extraction (ACE) Automated Content Extraction (ACE) 17 relations 17 relations ACE examples ACE examples

Unified Medical Language System (UMLS) UMLS: Unified Medical 134 entities, 54 relations UMLS: Unified Medical 134 entities, 54 relations

UMLS semantic network

Current Relations in the UMLS Semantic Network isa associated_with physically_related_to part_of consists_of contains connected_to interconnects branch_of tributary_of ingredient_of spatially_related_to location_of adjacent_to surrounds traverses functionally_related_to affects … isa associated_with physically_related_to part_of consists_of contains connected_to interconnects branch_of tributary_of ingredient_of spatially_related_to location_of adjacent_to surrounds traverses functionally_related_to affects … … temporally_related_to co-occurs_with precedes temporally_related_to co-occurs_with precedes conceptually_related_to conceptually_related_to evaluation_of degree_of analyzes assesses_effect_of measurement_of measures diagnoses property_of derivative_of developmental_form_of method_of … evaluation_of degree_of analyzes assesses_effect_of measurement_of measures diagnoses property_of derivative_of developmental_form_of method_of …

Databases of Wikipedia Relations DBpedia is a crowd-sourced community effort DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia to extract structured information from Wikipedia and to make this information readily available and to make this information readily available DBpedia allows you to make sophisticated queries DBpedia allows you to make sophisticated queries

English version of the DBpedia knowledge base 3.77 million things 3.77 million things 2.35 million are classified in an ontology 2.35 million are classified in an ontology including: including: including 764,000 persons, 573,000 places (including 387,000 populated places), 333,000 creative works (including 112,000 music albums, 72,000 films and 18,000 video games), 192,000 organizations (including 45,000 companies and 42,000 educational institutions), 202,000 species and 5,500 diseases.

freebase google (freebase wiki) google (freebase wiki)

Ontological relations Ontological relations Ontological relations IS-A hypernym IS-A hypernym Instance-of Instance-of has-Part has-Part hyponym (opposite of hypernym) hyponym (opposite of hypernym)

Reminder: Task 1: Named Entity Tagging Slide from Chris Manning General NER or Biomedical NER John Hennessy is a professor at Stanford University, in Palo Alto. TAR independent transactivation by Tat in cells derived from the CNS - a novel mechanism of HIV-1 gene regulation.

Reminder: Maximum Entropy Markov Model Slide from Chris Manning DNA O of HIV−1 gene regulation O

Task II: Relation Extraction

Relations between words Language Understanding Applications needs word meaning! Language Understanding Applications needs word meaning!  Question answering  Conversational agents  Summarization One key meaning component: word relations One key meaning component: word relations  Hierarchical (ontological) relations “San Francisco” ISA “city”  Other relations between words “alternator” is a part of a “car”

Relation Prediction How can we capture the variability of expression of a relation in natural text from a large, unannotated corpus? “...works by such authors as Herrick, Goldsmith, and Shakespeare.” “If you consider authors like Shakespeare...” “Shakespeare, author of The Tempest...” “Some authors (including Shakespeare)...” “Shakespeare was the author of several...” Shakespeare IS-A author (0.87)

Hyponymy One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other  car is a hyponym of vehicle  dog is a hyponym of animal  mango is a hyponym of fruit Conversely Conversely  vehicle is a hypernym/superordinate of car  animal is a hypernym of dog  fruit is a hypernym of mango superordinatevehiclefruitfurnituremammal hyponymcarmangochairdog

42 X is-a-part-of Y (meronym / holonym) X is-a-kind-of Y (hyponym / hypernym) WordNet relations

WordNet is incomplete In WordNet 2.1Not in WordNet “insulin” “progesterone” “leptin” “pregnenolone” “combustibility” “navigability “affordability” “reusability” “HTML”“XML” “Google”, “Yahoo”“Microsoft”, “IBM” Ontological relations are missing for many words Especially true for specific domains (restaurants, auto parts, finance)

Other kinds of Relations: Disease Outbreaks Extract structured information from text Extract structured information from text Slide from Eugene Agichtein May , Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis… DateDisease NameLocation Jan. 1995MalariaEthiopia July 1995Mad Cow DiseaseU.K. Feb. 1995PneumoniaU.S. May 1995EbolaZaire Information Extraction System (e.g., NYU’s Proteus) Disease Outbreaks in The New York Times

More relations: Protein Interactions „We show that CBF-A and CBF-C interact with each other to form a CBF-A-CBF-C complex and that CBF-B does not interact with CBF-A or CBF-C individually but that it associates with the CBF-A-CBF-C complex.“ CBF-A CBF-C CBF-B CBF-A-CBF-C complex interact complex associates Slide from Rosario and Hearst

Yet More Relations CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday it has increased fares by $6 per round trip on flights to some cities also served by lower-cost carriers. American Airlines, a unit AMR, immediately matched the move, spokesman Tim Wagner said. United, a unit of UAL, said the increase took effect Thursday night and applies to most routes where it competes against discount carriers, such as Chicago to Dallas and Atlanta and Denver to San Francisco, Los Angeles and New York Slide from Jim Martin

Relation Types For generic news texts... For generic news texts... Slide from Jim Martin

More relations: UMLS Unified Medical Language System Unified Medical Language System  integrates linguistic, terminological and semantic information  Semantic Network consists of 134 semantic types and 54 relations between types Pharmacologic Substance affects Pathologic Function Pharmacologic Substance causes Pathologic Function Pharmacologic Substance complicatesPathologic Function Pharmacologic Substance diagnoses Pathologic Function Pharmacologic Substance prevents Pathologic Function Pharmacologic Substance treats Pathologic Function Slide from Paul Buitelaar

Relations in Ontologies: GO (Gene Ontology) GO (Gene Ontology) GO (Gene Ontology)  Aligns descriptions of gene products in different databases, including plant, animal and microbial genomes  Organizing principles are molecular function, biological process and cellular component Accession:GO: Ontology:biological process Synonyms:broad: genetic exchange Definition:In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual. Term Lineageall : all (164142) GO: : biological process (115947) GO: : development (11892) GO: : genetic transfer (69) Slide from Paul Buitelaar

Relations in Ontologies: geographical OntologyF-Logic similar city Neckar Zugspitze Geographical Entity (GE) Natural GE Inhabited GE country rivermountain instance_of Germany BerlinStuttgart is-a flow_through located_in capital_of flow_through located_in capital_of 367 length (km) 2962 height (m) Design: Philipp Cimiano Slide from Paul Buitelaar

MeSH (Medical Subject Headings) Thesaurus 51 MeSH Descriptor Definition Synonym set Slide from Illhoi Yoo, Xiaohua (Tony) Hu,and Il-Yeol Song

MeSH Tree MeSH Ontology MeSH Ontology  Hierarchically arranged from most general to most specific.  Actually a graph rather than a tree normally appear in more than one place in the tree MeSH Tree Slide from Yoo, Hu, Song

Types of ACE Relations, 2003 ROLE - relates a person to an organization or a geopolitical entity ROLE - relates a person to an organization or a geopolitical entity  Subtypes: member, owner, affiliate, client, citizen PART - generalized containment PART - generalized containment  Subtypes: subsidiary, physical part-of, set membership AT - permanent and transient locations AT - permanent and transient locations  Subtypes: located, based-in, residence SOCIAL- social relations among persons SOCIAL- social relations among persons  Subtypes: parent, sibling, spouse, grandparent, associate Slide from Doug Appelt

Frequent Freebase Relations a

Predicting the “is-a” relation How can we capture the variability of expression of a relation in natural text from a large, unannotated corpus? “...works by such authors as Herrick, Goldsmith, and Shakespeare.” “If you consider authors like Shakespeare...” “Shakespeare, author of The Tempest...” “Some authors (including Shakespeare)...” “Shakespeare was the author of several...” Shakespeare IS-A author (0.87)

Why this is hard: Ambiguity! TreatmentDisease Cure? Prevent? Side Effect? Which relations hold between 2 entities?

Different relations between Disease (Hepatitis) and Treatment Cure Cure  These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135. Prevent Prevent  A two-dose combined hepatitis A and B vaccine would facilitate immunization programs Vague Vague  Effect of interferon on hepatitis B Slide from B. Rosario and M. Hearst

5 easy methods for relation extraction 1. Hand-built patterns 2. Bootstrapping (seed) methods 3. Supervised methods 4. Unsupervised methods 5. Distant supervision

5 easy methods for relation extraction 1. Hand-built patterns 2. Supervised methods 3. Bootstrapping (seed) methods 4. Unsupervised methods 5. Distant supervision

A complex hand-built extraction rule [NYU Proteus]

Goal: Add hyponyms to WordNet directly from text. Intuition from Hearst (1992) Intuition from Hearst (1992)  “ Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use” What does Gelidium mean? What does Gelidium mean? How do you know? How do you know?

Goal: Add hyponyms to WordNet directly from text. Intuition from Hearst (1992) Intuition from Hearst (1992)  “ Agar is a substance prepared from a mixture of red algae, such as Gelidium, for laboratory or industrial use” What does Gelidium mean? What does Gelidium mean? How do you know? How do you know?

Hearst’s Hand-Designed Lexico-Syntactic Patterns (Hearst, 1992): Automatic Acquisition of Hyponyms “Y such as X ((, X)* (, and/or) X)” “such Y as X…” “X… or other Y” “X… and other Y” “Y including X…” “Y, especially X…”

Hearst’s hand-built patterns for Relation Extraction Hearst patternExample occurrences X and other Y...temples, treasuries, and other important civic buildings. X or other YBruises, wounds, broken bones or other injuries... Y such as XThe bow lute, such as the Bambara ndang... Such Y as X...such authors as Herrick, Goldsmith, and Shakespeare. Y including X...common-law countries, including Canada and England... Y, especially XEuropean countries, especially France, England, and Spain...

Problem with hand-built patterns Requires that we hand-build patterns for each relation! Requires that we hand-build patterns for each relation!  don’t want to have to do this for all possible relations!  we’d like better a ccuracy

5 easy methods for relation extraction 1. Hand-built patterns 2. Supervised methods 3. Bootstrapping (seed) methods 4. Unsupervised methods

2. Supervised Relation Extraction Sometimes done in 3 steps Sometimes done in 3 steps 1.Find all pairs of named entities 2.Decide if 2 entities are related 3.If yes, classifying the relation Why the extra step? Why the extra step?  Cuts down on training time for classification by eliminating most pairs  Producing separate feature-sets that are appropriate for each task.

Relation Analysis Usually just run on named entities within the same sentence Usually just run on named entities within the same sentence Slide from Jim Martin

Relation Extraction Task definition: to label the semantic relation between a pair of entities in a sentence (fragment) Task definition: to label the semantic relation between a pair of entities in a sentence (fragment) Slide from Jing Jiang …[leader arg-1 ] of a minority [government arg-2 ]… PHYSPER-SOCEMP-ORGNIL PHYS: Physical PER-SOC: Personal / Social EMP-ORG: Employment / Membership / Subsidiary

Supervised Learning Supervised machine learning ( e.g. [Zhou et al. 2005], [Bunescu & Mooney 2005], [Zhang et al. 2006], [Surdeanu & Ciaramita 2007] ) Supervised machine learning ( e.g. [Zhou et al. 2005], [Bunescu & Mooney 2005], [Zhang et al. 2006], [Surdeanu & Ciaramita 2007] ) Training data is needed for each relation type Training data is needed for each relation type …[leader arg-1 ] of a minority [government arg-2 ]… arg-1 word: leaderarg-2 type: ORG dependency: arg-1  of  arg-2 EMP-ORGPHYSPER-SOCNIL Slide from Jing Jiang

ACE 2008 Six Relations

Features: Words Headwords of M1 and M2, and combination Headwords of M1 and M2, and combination George Washington Bridge Bag of words and bigrams in M1 and M2 Bag of words and bigrams in M1 and M2 Words or bigrams in particular positions to the left and right of the M1 and M2 Words or bigrams in particular positions to the left and right of the M1 and M2 +/- 1, 2, 3 Bag of words or bigrams between the two entities Bag of words or bigrams between the two entities

Features: Named Entity Type and Mention Level Named-entity types (ORG, LOC, etc) Named-entity types (ORG, LOC, etc) Concatenation of the types Concatenation of the types Entity Level of M1 and M2 Entity Level of M1 and M2  (NAME, NOMINAL, PRONOUN)

Features: Parse Tree and Base Phrases Syntactic environment Syntactic environment  Constituent path through the tree from one to the other  Base syntactic chunk sequence from one to the other  Dependency path Slide from Jim Martin

Features: Gazeteers and trigger words Personal relative trigger list Personal relative trigger list  from wordnet: parent, wife, husabnd, grandparent, etc Country name list Country name list

American Airlines, a unit of AMR, immediately matched the move, spokesman Tim Wagner said.

Classifiers for supervised methods Now you can use any classifier you like Now you can use any classifier you like  SVM  Logistic regression  Naïve Bayes  etc

Summary Can get high accuracies with enough hand- labeled training data Can get high accuracies with enough hand- labeled training data If test data looks exactly like the training data If test data looks exactly like the training data But But  labeling 5000 relations (and named entities) is expensive  the approach doesn’t generalize to different genres

5 easy methods for relation extraction 1. Hand-built patterns 2. Supervised methods 3. Bootstrapping (seed) methods 4. Unsupervised methods 5. Distant supervision

Bootstrapping Approaches What if you don’t have enough annotated text to train on. What if you don’t have enough annotated text to train on.  But you might have some seed tuples  Or you might have some patterns that work pretty well Can you use those seeds to do something useful? Can you use those seeds to do something useful?  Co-training and active learning use the seeds to train classifiers to tag more data to train better classifiers...  Bootstrapping tries to learn directly (populate a relation) through direct use of the seeds Slide from Jim Martin

Bootstrapping Example: Seed Tuple Seed tuple Seed tuple  Grep (google)  “Mark Twain is buried in Elmira, NY.” X is buried in Y  “The grave of Mark Twain is in Elmira” The grave of X is in Y  “Elmira is Mark Twain’s final resting place” Y is X’s final resting place. Use those patterns to grep for new tuples that you don’t already know Use those patterns to grep for new tuples that you don’t already know Slide from Jim Martin

Hearst (1992) proposal for bootstrapping Choose lexical relation R Choose lexical relation R Gather a set of pairs that have this relation Gather a set of pairs that have this relation Find places in the corpus where these expressions occur near each other and record the environment Find places in the corpus where these expressions occur near each other and record the environment Find the commonalities among these environments and hypothesize that common ones yield patterns that indicate the relation of interest Find the commonalities among these environments and hypothesize that common ones yield patterns that indicate the relation of interest

Bootstrapping Relations Slide from Jim Martin

Dipre (Brin 1998) Extract pairs Extract pairs Start with these 5 seeds Start with these 5 seeds Learn these patterns: Learn these patterns: Now iterate, using these patterns to get more instances and patterns… Now iterate, using these patterns to get more instances and patterns…

Snowball [Agichtein & Gravano 2000] Exploit duality between patterns and tuples Exploit duality between patterns and tuples - find tuples that match a set of patterns -find patterns that match a lot of tuples  bootstrapping approach Initial Seed TuplesOccurrences of Seed Tuples Tag Entities Generate Extraction Patterns Generate New Seed Tuples Augment Table

Snowball initial seed tuples Computer servers at Microsoft’s headquarters in Redmond… In mid-afternoon trading, share of Redmond-based Microsoft fell… The Armonk-based IBM introduced a new line… The combined company will operate from Boeing’s headquarters in Seattle. Intel, Santa Clara, cut prices of its Pentium processor. occurrences of seed tuples

Snowball Require that X and Y be named entities of particular types Require that X and Y be named entities of particular types { } ORGANIZATION LOCATION { } ORGANIZATION LOCATION

Patterns (extraction) pattern has format, where tag1, tag2 are named-entity tags and left, middle, and right are vectors of weighted terms patterns derived directly from occurrences are too specific patterns derived directly from occurrences are too specific ORGANIZATION 's central headquarters in LOCATION is home to... LOCATION ORGANIZATION {,, } {, }

Pattern Clusters cluster patterns, cluster centroids define patterns ORGANIZATION { } LOCATION ORGANIZATION { } LOCATION ORGANIZATION Cluster 1 { } LOCATION ORGANIZATION Cluster 2 LOCATION

5 easy methods for relation extraction 1. Hand-built patterns 2. Supervised methods 3. Bootstrapping (seed) methods 4. Unsupervised methods 5. Distant supervision

Distant supervision paradigm Instead of hand-creating 5 seed examples Instead of hand-creating 5 seed examples Use a large database to get our seed examples Use a large database to get our seed examples  lots of examples  supervision from a database, not a corpus! Not genre-dependent!  Create lots and lots of noisy features from all these examples  Combine in a classifier Snow, Jurafsky, Ng Learning syntactic patterns for automatic hypernym discovery. NIPS 17 Mintz, Bills, Snow, Jurafsky (2009) Distant supervision for relation extraction without labeled data. ACL-2009.

Distant supervision paradigm Has advantages of supervised classification: Has advantages of supervised classification:  use of rich of hand-created knowledge Has advantages of unsupervised classification: Has advantages of unsupervised classification:  infinite amounts of data  allows for very large number of weak features  not sensitive to training corpus

Relation Classification with “Distant Supervision” We construct a noisy training set consisting of occurrences from our corpus that contain an IS-A pair according to WordNet. Slide from Rion Snow Suppose scientists could erase memories with a single substance in the brain.

Relation Classification with “Distant Supervision” This leads to high-signal examples like: “...consider authors like Shakespeare...” “Shakespeare, author of The Tempest...” “Some authors (including Shakespeare)...” “Shakespeare was the author of several...” Construct a noisy training set consisting of occurrences from our corpus that contain an IS-A pair according to WordNet. Slide from Rion Snow

Relation Classification with “Distant Supervision” This leads to high-signal examples like: But noisy examples like: “...consider authors like Shakespeare...” “Shakespeare, author of The Tempest...” “Some authors (including Shakespeare)...” “Shakespeare was the author of several...” “The author of Shakespeare in Love...” “...authors at the Shakespeare Festival...” Training set (TREC and Wikipedia): 14,000 hypernym pairs, ~600,000 total pairs Slide from Rion Snow

How to learn patterns Take corpus sentences Collect noun pairs 752,311 pairs from 6M words of newswire Is pair an IS-A in WordNet? 14,387 yes, 737,924 no Parse the sentences Extract patterns 69,592 dependency paths >5 pairs) Train classifier on these patterns Logistic regression with 70K features (actually converted to 974,288 bucketed binary features) … doubly heavy hydrogen atom called deuterium… (Atom, deuterium) YES Snow, Jurafsky, Ng Learning syntactic patterns for automatic hypernym discovery. NIPS 17

One of 70,000 patterns “ ‘called’ ” Learned from cases such as: “sarcoma / cancer”: …an uncommon bone cancer called osteogenic sarcoma and to… “deuterium / atom” ….heavy water rich in the doubly heavy hydrogen atom called deuterium. New pairs discovered: “efflorescence / condition”: …and a condition called efflorescence are other reasons for… “’neal_inc / company” …The company, now called O'Neal Inc., was sole distributor of E-Ferol… “hat_creek_outfit / ranch” …run a small ranch called the Hat Creek Outfit. “hiv-1 / aids_virus” …infected by the AIDS virus, called HIV-1. “bateau_mouche / attraction” …local sightseeing attraction called the Bateau Mouche... “kibbutz_malkiyya / collective_farm” …an Israeli collective farm called Kibbutz Malkiyya…

Hypernym Precision / Recall for all Features Slide from Rion Snow

logistic regression Idea: use each pattern as a feature!!!! Precision/Recall for Hypernym Classification: 10-fold Cross Validation on 14,000 WordNet-Labeled Pairs Slide from Rion Snow

Extracting more relations with distant supervision 2700 relations > 10 instances 5.2 million instances 3.7 million entities Training Set Corpus 2.3 million articles 31.5 million sentences Mintz, Bills, Snow, Jurafsky (2009) Distant supervision for relation extraction without labeled data. ACL-2009.

Frequent Freebase Relations a

Algorithm: Distant Supervision A kind of weakly supervised learning A kind of weakly supervised learning  Use a large database to get seed examples  Create lots and lots of noisy pattern features from all these examples  Combine in a classifier

Extract parse and other features “Astronomer Edward Hubble was born in Marshfield, Missouri” “Named entities” Edwin Hubble is a PERSON Marshfield is a LOCATION Lexical items nearby…

New relations learned Montmartre IS-IN Paris Montmartre IS-IN Paris Fort Erie IS-IN Ontario Fort Erie IS-IN Ontario Fyoder Kamesnky DIED-IN Clearwater Fyoder Kamesnky DIED-IN Clearwater Utpon Sinclair WROTE Lanny Budd Utpon Sinclair WROTE Lanny Budd Vince McMahon FOUNDED WWE Vince McMahon FOUNDED WWE Thomas Mellon HAS-PROFESSION Judge Thomas Mellon HAS-PROFESSION Judge

Human evaluation: precision using Mechanical Turk labelers FeaturePrecision Syntactic.67 Lexical.66 Both.69

Where syntactic knowledge helps Back Street is a 1932 film made by Universal Pictures, directed by John M. Stahl, and produced by Carl Laemmle Jr. Back Street is a 1932 film made by Universal Pictures, directed by John M. Stahl, and produced by Carl Laemmle Jr. Back Street and John M. Stahl are very far apart in surface string Back Street and John M. Stahl are very far apart in surface string But are close together in dependency parse But are close together in dependency parse

Unsupervised relation extraction Banko et al 2007 “Open information extraction from the Web” Banko et al 2007 “Open information extraction from the Web” Extracting relations from the web with Extracting relations from the web with  no training data  no predetermined list of relations The Open Approach The Open Approach 1.Use parse data to train a “trust-worthy” classifier 2.Extract trustworthy relations among NPs 3.Rank relations based on text redundancy