INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio Unsupervised and Semi-Supervised Relation Extraction.


NON-SUPERVISED METHODS FOR RELATION EXTRACTION
Unsupervised relation extraction:
– Hearst
– Other work on extracting hyponymy relations
– Extracting other relations: Almuhareb and Poesio, Cimiano and Wenderoth
Semi-supervised methods:
– KNOW-IT-ALL

HEARST 1992, 1998: USING PATTERNS TO EXTRACT ISA LINKS
Intuition: certain constructions are typically used to express certain types of semantic relations
E.g., for ISA:
– The seabass IS A fish
– Swimming, running AND OTHER activities
– Vehicles such as cars, trucks and bikes

TEXT PATTERNS FOR HYPONYMY EXTRACTION
HEARST 1998: NP {, NP}* {,} or other NP
bruises …… broken bones, and other INJURIES
HYPONYM (bruise, injury)
EVALUATION: 55.46% precision with respect to WordNet
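
The pattern above can be approximated with a regular expression. This is only a rough sketch: the `NP` expression (one or two lowercase words) is a stand-in assumption for real noun-phrase chunking, the function name `extract_hyponyms` is hypothetical, and no lemmatization is done, so it returns "bruises" rather than "bruise".

```python
import re

# Crude stand-in for a noun phrase: one or two lowercase words.
# A real system would use a chunker or parser here (assumption for illustration).
NP = r"[a-z]+(?: [a-z]+)?"

# Approximates "NP {, NP}* {,} and/or other NP".
PATTERN = re.compile(rf"((?:{NP}, )*{NP}),? (?:and|or) other ({NP})")


def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'and/or other' pattern."""
    pairs = []
    for match in PATTERN.finditer(sentence.lower()):
        hypernym = match.group(2)
        # Every NP in the comma-separated list is a hyponym of the final NP.
        for hyponym in match.group(1).split(", "):
            pairs.append((hyponym, hypernym))
    return pairs


pairs = extract_hyponyms("bruises, broken bones, and other injuries")
```

Because the noun-phrase approximation is so loose, a sketch like this would over-generate on real text; that is one reason the extracted links are evaluated against a resource such as WordNet.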

THE PRECISION / RECALL TRADEOFF
– X and other Y: high precision, low recall
– X isa Y: low precision, high recall

HEARST’S REQUIREMENTS ON PATTERNS

OTHER WORK ON EXTRACTING HYPONYMY Caraballo ACL 1999 Widdows & Dorow 2002 Pantel & Ravichandran ACL 2004

Widdows & Dorow 2002

Text patterns for (other) relation extraction Intuition: Hearst’s ideas can apply to other types of relations as well Some proposals: – Almuhareb & Poesio 2004 (attributes and values) – Cimiano et al 2005 (Pustejovsky qualia)

USING PATTERNS TO EXTRACT ATTRIBUTES AND VALUES
– Woods (1975): “A is an attribute [feature] of C if we can say V is a/the A of C”
– ATTRIBUTE pattern: “the * of the C [is|was]”, e.g. “… the price of the car was …”
– To increase the precision of the patterns, we add some restrictions (is and was) to make sure that C stands for a concept
– VALUE pattern: “[a|an|the] * C [is|was]”, e.g. “… an expensive car is …”
– Both patterns are low precision, high recall: some filtering is needed
– WEIGHTING is used for this in this work
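
As an illustration, the two patterns can be written as regular expressions over plain text. This is a sketch under simplifying assumptions: the wildcard is restricted to a single word, and the helper names (`attribute_pattern`, `value_pattern`, `candidates`) are hypothetical rather than taken from the original work.

```python
import re


def attribute_pattern(concept):
    # "the * of the C [is|was]": the wildcard slot is a candidate attribute.
    return re.compile(rf"\bthe (\w+) of the {re.escape(concept)} (?:is|was)\b")


def value_pattern(concept):
    # "[a|an|the] * C [is|was]": the wildcard slot is a candidate value.
    return re.compile(rf"\b(?:a|an|the) (\w+) {re.escape(concept)} (?:is|was)\b")


def candidates(pattern, text):
    """Return every word that fills the wildcard slot of the pattern."""
    return pattern.findall(text.lower())


attributes = candidates(attribute_pattern("car"), "The price of the car was high.")
values = candidates(value_pattern("car"), "An expensive car is a luxury.")
```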

EXTRACTING ATTRIBUTES AND VALUES (2)
– The Web is an increasingly popular ‘corpus’ (Grefenstette, 1999; Keller and Lapata, 2003)
– Its size outweighs other problems (lack of balance, etc.)
– Almuhareb and Poesio (2004): use the Google API to extract information about concepts from the Web using text patterns
– Our search requests to Google take the general form “s1 * s2” (including the double quotes)
– We get frequencies of occurrence of a pattern; these frequencies are then weighted using the t-test
– We also tried MI, χ² and log-likelihood (Dunning 1993)
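
The slide does not give the exact weighting formula. The sketch below assumes the form of the t-test commonly used for collocations, t = (observed − expected) / sqrt(observed / N), where the expected rate is the product of the two marginal rates. The function name `t_score` and all counts are hypothetical.

```python
import math


def t_score(pair_count, concept_count, attribute_count, total):
    """t-test weight for a (concept, attribute) pair from raw frequencies.

    Assumed form (not stated on the slide): the observed co-occurrence rate
    is compared with the rate expected if concept and attribute were
    independent, and the sample variance is approximated by the observed rate.
    """
    observed = pair_count / total
    expected = (concept_count / total) * (attribute_count / total)
    if observed == 0:
        return 0.0
    return (observed - expected) / math.sqrt(observed / total)


# Hypothetical counts: the pair occurs 100 times in 1,000,000 pattern hits.
score = t_score(pair_count=100, concept_count=1000, attribute_count=1000, total=1_000_000)
```

A pair that co-occurs no more often than chance gets a score near zero, which is how a weighting of this kind can filter the low-precision pattern output.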

TOP CANDIDATE FEATURES OF NOSE
skin
side        9.56
inside      9.13
base        8.96
shape       8.56
bottom      8.00
end         7.59
root        6.89
colour      6.23
structure   5.25
width       5.23
function    4.86
interior    4.81
purpose     4.78
appearance  4.34
floor       4.00
point       3.54
center      3.15
length      3.09
framework   3.06
line        2.85
size        2.64
color       0.00
height      0.00
back        0.00

Cimiano and Wenderoth 2005 Extract from text the information about concepts specified by Pustejovsky’s Generative Lexicon theory Evaluation: human judgments

PUSTEJOVSKY’S GENERATIVE LEXICON
Pustejovsky (1991, 1995): lexical entries have a QUALIA STRUCTURE consisting of four ‘roles’
– FORMAL role: what type of object it is (shape, color, …)
– CONSTITUTIVE role: what it consists of (parts, stuff, etc.), e.g., for books: chapters, index, paper, …
– TELIC role: what is the purpose of the object (e.g., for books, READING)
– AGENTIVE role: how the object was created (e.g., for books, WRITING)
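
One simple way to hold a qualia structure in code is a record with one field per role. The class name `QualiaStructure` and the list-of-strings representation are assumptions made for illustration; the entry for "book" uses only the fillers given above, and the FORMAL role is left empty because no filler for it is given.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualiaStructure:
    """Four-role qualia structure; each role holds a list of filler words."""
    formal: List[str] = field(default_factory=list)        # what type of object it is
    constitutive: List[str] = field(default_factory=list)  # what it consists of
    telic: List[str] = field(default_factory=list)         # what it is for
    agentive: List[str] = field(default_factory=list)      # how it came about


book = QualiaStructure(
    constitutive=["chapters", "index", "paper"],
    telic=["reading"],
    agentive=["writing"],
)
```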

EXAMPLE: QS FOR “KNIFE”

CIMIANO AND WENDEROTH’S APPROACH

PATTERNS FOR THE CONSTITUTIVE ROLE

GOOD EXAMPLES

PROBLEMS

PATTERNS FOR THE FORMAL ROLE

PATTERNS FOR THE AGENTIVE ROLE

PATTERNS FOR THE TELIC ROLE

Parsing for relation extraction Patterns are rigid – The purpose of AXES and KNIVES is to cut … – The Sphinx is a RECENTLY DISCOVERED animal … Using a parser may increase recall

Almuhareb & Poesio 2005 Repeat experiments of Almuhareb and Poesio 2004, but using a parser to identify the attribute and value constructions Parser used: RASP (a dependency parser)

FINDING PATTERNS THROUGH PARSING

Attributes in lexical acquisition (Almuhareb and Poesio)
LOOKING ONLY FOR (POTENTIAL) ATTRIBUTES AND THEIR VALUES IS BETTER THAN USING ALL GRs (grammatical relations)

BUT: SOME CANDIDATE FEATURES OF DEER the first / last of the deer the rest / majority of the deer the picture / image / photos of the deer the cave / mountain / lake of the deer the meaning of the deer [in Western philosophy / … ]

Semi-supervised methods Hearst 1992: find new patterns by using initial examples as SEEDS This approach has been pursued in a number of ways – Espresso (Pantel and Pennacchiotti 2006) – OPEN INFORMATION EXTRACTION (Etzioni and colleagues)

THE GENERIC SEMI-SUPERVISED ALGORITHM
1. Start with SEED INSTANCES
   Depending on algorithm, seed may be hand-generated or automatically obtained
2. For each seed instance, extract patterns from corpus
   Choice of patterns depends on algorithm
3. Output the best patterns according to some metric
4. (Possibly) iterate steps 2-3
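
The loop above can be sketched in a few lines. Everything specific here is an assumption for illustration: a pattern is taken to be the text between the two seed words, the metric is raw frequency, and the function names and toy corpus are hypothetical. Real systems differ mainly in how they fill in these choices.

```python
import re
from collections import Counter


def extract_patterns(instances, corpus):
    """Step 2: count the strings that connect each known (x, y) pair."""
    counts = Counter()
    for x, y in instances:
        regex = re.compile(rf"\b{re.escape(x)}\b(.+?)\b{re.escape(y)}\b")
        for sentence in corpus:
            for match in regex.finditer(sentence):
                counts[match.group(1).strip()] += 1
    return counts


def match_instances(patterns, corpus):
    """Apply the learned patterns to find new (x, y) pairs."""
    found = set()
    for pattern in patterns:
        regex = re.compile(rf"(\w+) {re.escape(pattern)} (\w+)")
        for sentence in corpus:
            found.update(regex.findall(sentence))
    return found


def bootstrap(seeds, corpus, rounds=2, top_k=1):
    instances = set(seeds)                                   # step 1
    patterns = []
    for _ in range(rounds):                                  # step 4
        counts = extract_patterns(instances, corpus)         # step 2
        patterns = [p for p, _ in counts.most_common(top_k)] # step 3
        instances |= match_instances(patterns, corpus)
    return patterns, instances


corpus = [
    "seabass is a kind of fish",
    "trout is a kind of fish",
    "oak is a kind of tree",
]
patterns, instances = bootstrap({("seabass", "fish")}, corpus)
```

Starting from a single seed, the toy run learns the connecting string "is a kind of" and uses it to pick up the other two pairs in the corpus.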

THE ESPRESSO SEMI-SUPERVISED ALGORITHM
1. Start with SEED INSTANCES
   Hand-chosen
2. For each seed instance, extract patterns from corpus
   Generalization of whole sentence
3. Output the best patterns according to some metric
   A metric based on PMI
4. Iterate steps 2-3

STRUDEL
Extract from corpora candidate concept-property-CONNECTOR triples
– LICE in a number of DOGS
Rank concept-property pairs by the number of connectors
– Intuition: ‘important’ properties are expressed in a number of ways
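
The ranking idea can be sketched as counting distinct connectors per concept-property pair. This is a simplification that uses a plain count as the score; the function name and the example triples are hypothetical.

```python
from collections import defaultdict


def rank_by_connector_variety(triples):
    """Rank (concept, property) pairs by how many distinct connectors link them."""
    connectors = defaultdict(set)
    for concept, prop, connector in triples:
        connectors[(concept, prop)].add(connector)
    # Most varied pairs first; ties broken alphabetically for a stable order.
    return sorted(
        ((pair, len(found)) for pair, found in connectors.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical triples: (concept, property, connector)
triples = [
    ("dog", "tail", "has"),
    ("dog", "tail", "with"),
    ("dog", "tail", "wags"),
    ("dog", "park", "in"),
    ("dog", "park", "in"),  # a repeated connector adds no variety
]
ranking = rank_by_connector_variety(triples)
```

Counting distinct connectors rather than raw occurrences means a pair that is frequent but always expressed the same way does not outrank a pair expressed in several ways.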

STRUDEL: ranking

STRUDEL: methods

BOOK according to STRUDEL

STRUDEL: evaluation By comparison with McRae et al’s NORMS databases Categorization (aka clustering)

COMPARISON AGAINST NORMS

Clustering

KNOW-IT-ALL A system for ontology population developed by Oren Etzioni and collaborators at the University of Washington

KNOW-IT-ALL: ARCHITECTURE

INPUT

BOOTSTRAPPING This first step takes the input domain predicates and the generic extraction patterns and produces domain-specific extraction patterns

EXTRACTION PATTERNS

EXTRACTOR
Uses domain-specific extraction patterns + syntactic constraints
– In “Garth Brooks is a country singer”, country is NOT extracted as an instance of the pattern “X is a NP”
Produces EXTRACTIONS (= instances of the patterns that satisfy the syntactic constraints)

ASSESSOR Estimates the likelihood of an extraction using POINTWISE MUTUAL INFORMATION between the extracted INSTANCE and DISCRIMINATOR phrases E.g., INSTANCE: Liege DISCRIMINATOR PHRASES: “is a city”
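
The slide does not spell out the statistic. A minimal sketch, assuming the PMI-style score is the ratio between the hit count for the instance together with the discriminator phrase and the hit count for the instance alone. The `hits` lookup stands in for a search-engine query, and the counts below are invented for illustration.

```python
def pmi_score(instance, discriminator, hits):
    """Hit-count ratio used as a PMI-style score (assumed form).

    `hits` is any callable returning the number of pages matching a query;
    here it is a stand-in for a web search engine.
    """
    instance_hits = hits(instance)
    if instance_hits == 0:
        return 0.0
    return hits(f"{instance} {discriminator}") / instance_hits


# Invented hit counts, for illustration only.
fake_counts = {"Liege": 1000, "Liege is a city": 50}
score = pmi_score("Liege", "is a city", lambda query: fake_counts.get(query, 0))
```

A high ratio suggests the instance often appears with the discriminator phrase, which counts as evidence that the extraction is correct.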

ESTIMATING THE LIKELIHOOD OF A FACT
P(f | φ) and P(f | ¬φ) estimated using a set of positive and negative instances
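
One way such conditional probabilities can be turned into a single likelihood is a Naive Bayes combination, sketched below. The independence assumption, the boolean features (for example, whether a discriminator's PMI score is above a threshold), the function name and the numbers are all assumptions for illustration.

```python
def fact_probability(features, prior, p_given_true, p_given_false):
    """Naive Bayes estimate of the probability that a candidate fact is true.

    features[i]      -- whether feature i fired for this candidate
    p_given_true[i]  -- P(feature i fires | fact is true)
    p_given_false[i] -- P(feature i fires | fact is false)
    """
    true_score = prior
    false_score = 1.0 - prior
    for fired, p_true, p_false in zip(features, p_given_true, p_given_false):
        true_score *= p_true if fired else 1.0 - p_true
        false_score *= p_false if fired else 1.0 - p_false
    return true_score / (true_score + false_score)


# Hypothetical numbers: one feature that fires for 80% of true facts
# and for 20% of false ones.
probability = fact_probability([True], prior=0.5, p_given_true=[0.8], p_given_false=[0.2])
```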

TERMINATION CONDITION
KNOW-IT-ALL could continue searching for instances
– But, for instance, COUNTRY has only around 300 instances
Stop: Signal-to-Noise ratio
– Number of high probability facts / Number of low probability ones
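
A sketch of a stopping rule of this kind follows. The probability cut-off separating "high" from "low" and the minimum acceptable ratio are hypothetical parameters; the slide gives no values for them.

```python
def should_stop(probabilities, cutoff=0.5, min_ratio=1.0):
    """Stop when recent extractions are mostly low-probability facts.

    Both `cutoff` and `min_ratio` are illustrative defaults, not values
    taken from the system described here.
    """
    high = sum(1 for p in probabilities if p >= cutoff)
    low = len(probabilities) - high
    if low == 0:
        return False  # no noise observed yet, keep searching
    return high / low < min_ratio


keep_going = not should_stop([0.9, 0.8, 0.2])       # 2 high vs 1 low
give_up = should_stop([0.9, 0.1, 0.2, 0.3])         # 1 high vs 3 low
```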

OVERALL ALGORITHM

EVALUATION 5 classes: CITY, US STATE, COUNTRY, ACTOR, FILM

EXTENSION: LEARNING PATTERNS The specializations of generic patterns do not include many very useful domain-specific patterns E.g., – “ STARRING X” – “HEADQUARTERED IN ”

THE KNOW-IT-ALL SEMI-SUPERVISED STEP
1. Start with seed instances
   In this case, generated by domain-independent extractors
2. For each seed instance, query Web and extract patterns
   In this case, pattern = window of size n centered on class word
3. Output the best patterns
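
Window-based pattern extraction can be sketched as follows. The function takes whichever word or phrase the window is anchored on as its `target` argument and replaces it with a slot `X`; the whitespace tokenization, the function name and the example sentence are assumptions for illustration.

```python
def window_patterns(text, target, n=2):
    """Return context windows of up to n tokens on each side of `target`.

    The target itself is replaced by the slot marker "X", so each result
    is a candidate extraction pattern.
    """
    tokens = text.split()
    target_tokens = target.split()
    width = len(target_tokens)
    patterns = []
    for i in range(len(tokens) - width + 1):
        if tokens[i:i + width] == target_tokens:
            left = tokens[max(0, i - n):i]
            right = tokens[i + width:i + width + n]
            patterns.append(" ".join(left + ["X"] + right))
    return patterns


found = window_patterns("the film Casablanca starring Humphrey Bogart", "Casablanca", n=2)
```

Candidate windows collected this way over many pages would then be ranked, and only the best kept as new extraction patterns.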

EXAMPLES OF PATTERNS LEARNED

EXTENSION: LIST EXTRACTION

RESULTS: CITY

RESULTS: FILM

OVERALL JUDGMENT
Very solid engineering work combining lots of ideas developed by others
Main limitation: no attempt at discriminating between distinct instances with the same name