Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK. Improving Semi-Supervised Acquisition of Relation Extraction Patterns.


Improving Semi-Supervised Acquisition of Relation Extraction Patterns
Mark A. Greenwood and Mark Stevenson
Natural Language Processing Group, Department of Computer Science, University of Sheffield, UK

22/07/06 Information Extraction Beyond the Document

Introduction and Overview

Recently a number of semi-supervised approaches to acquiring Information Extraction (IE) patterns have been reported. Many of these approaches use iterative algorithms to learn new patterns from a small seed set.
- These approaches tend to be limited by their use of simplistic pattern representations, such as subject-verb-object (Yangarber et al., 2000).
Other approaches to IE have used pattern representations derived from dependency trees:
- Sudo et al. (2003) used patterns consisting of a path from a verb to any of its descendants (direct or indirect): the chain model.
- Bunescu and Mooney (2005) suggest the shortest path between the items being related.

Introduction and Overview

These more complex pattern models:
- Are capable of representing more of the information present in text.
- Require more complex methods of determining similarity between patterns, which limits their use.
We present a structural similarity measure inspired by kernel methods used in non-iterative learning algorithms (Culotta and Sorensen, 2004). This allows us to use more complex pattern models while retaining the semi-supervised iterative algorithm approach to acquiring new extraction patterns.

Learning Extraction Patterns

Iterative Learning Algorithm:
1. Begin with a set of seed patterns which are known to be good extraction patterns.
2. Compare every other pattern with the ones known to be good.
3. Choose the highest scoring of these and add them to the set of good patterns.
4. Stop if enough patterns have been learned; otherwise repeat from step 2.

(Diagram: candidate patterns are ranked against the seed set at each iteration.)
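The loop above can be sketched in a few lines. This is a minimal sketch in which `similarity`, the candidate pool, and the ranking policy (score each candidate by its best similarity to any accepted pattern) are placeholders for whichever pattern model and measure is plugged in:

```python
def learn_patterns(seeds, candidates, similarity, iterations=10, per_iteration=4):
    """Iteratively grow a set of extraction patterns from a seed set."""
    accepted = list(seeds)
    pool = [c for c in candidates if c not in accepted]
    for _ in range(iterations):
        if not pool:
            break  # nothing left to learn (step 4: stop)
        # Step 2: score each candidate against the current accepted set,
        # here taken as its best similarity to any accepted pattern.
        pool.sort(key=lambda c: max(similarity(c, a) for a in accepted),
                  reverse=True)
        # Step 3: accept the highest-scoring candidates.
        accepted.extend(pool[:per_iteration])
        del pool[:per_iteration]
    return accepted
```

For instance, with patterns represented as numbers and `similarity(a, b) = -abs(a - b)`, seeding with `[0]` pulls in the candidates closest to the seed first.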

Learning Extraction Patterns

Such an algorithm requires, for each IE task:
- Unannotated text from which to acquire patterns
- A small set of representative seed patterns
Independent of the IE task, this iterative algorithm requires:
- An extraction pattern model
- A measure of how similar two patterns are to each other

Extraction Patterns

The linked chain model (Greenwood et al., 2005) is used as the pattern representation.

Extraction Patterns

(Figure: example linked chain patterns.)

Structural Similarity Measure

This similarity measure is inspired by the tree kernel proposed by Culotta and Sorensen (2004). It compares patterns by following their structure from the root nodes through the patterns until they diverge too far to be considered similar.
Each node in a pattern has three features:
- The word: n_word
- The relation to its parent: n_reln
- The part-of-speech (POS) tag: n_pos
Nodes can be compared by examining these features and by the semantic similarity between words.

Structural Similarity Measure

A set of four functions F = {word, relation, pos, semantic} is used to compare nodes.
- The first three correspond to the node features of the same name and return 1 if the value of the feature is equal for the two nodes and 0 otherwise. For example, the pos function compares the values of the POS features of nodes n1 and n2.
- The semantic function returns a value between 0 and 1 to signify the semantic similarity of the lexical items represented by the two nodes. We compute this using the WordNet (Fellbaum, 1998) similarity function introduced by Lin (1998).
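These four functions can be sketched as follows. The `Node` class and the trivial equality-based stand-in for `semantic` (the talk uses Lin's WordNet measure there) are illustrative assumptions, not the authors' code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    word: str  # lexical item (n_word)
    reln: str  # dependency relation to the parent (n_reln)
    pos: str   # part-of-speech tag (n_pos)

# The first three functions return 1 if the feature values match, 0 otherwise.
def word(n1, n2):
    return int(n1.word == n2.word)

def relation(n1, n2):
    return int(n1.reln == n2.reln)

def pos(n1, n2):
    return int(n1.pos == n2.pos)

def semantic(n1, n2):
    # Placeholder for the WordNet-based Lin (1998) similarity;
    # must return a value in [0, 1].
    return 1.0 if n1.word == n2.word else 0.0

F = (word, relation, pos, semantic)
```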

Structural Similarity Measure

The similarity of two nodes is zero if their POS tags are different; otherwise it is the sum of the scores provided by the four functions from F, plus the similarity of their children:

    s(n1, n2) = 0                                           if n1_pos ≠ n2_pos
    s(n1, n2) = Σ_{f ∈ F} f(n1, n2) + sim(C_{n1}, C_{n2})   otherwise

The similarity of a pair of linked chains l1 and l2 is then given by:

    similarity(l1, l2) = s(r1, r2)

where r1 and r2 are the root nodes of patterns l1 and l2 and C_r is the set of children of node r.

Structural Similarity Measure

The final part of the measure calculates the similarity between the child nodes of n1 and n2. As only the root nodes of the patterns have multiple children, in all but the first application this formula simplifies to:

    s(n1, n2) = Σ_{f ∈ F} f(n1, n2) + s(c1, c2)

where c1 and c2 are the single children of n1 and n2. As the maximum similarity between two nodes is 4, we normalise by dividing the score by 4 times the size (in nodes) of the larger pattern, to remove any bias towards longer patterns.
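Putting the pieces together, the measure might look like the sketch below. The `PNode` class, the child-pairing strategy (each child of n1 matched to its best-scoring child of n2), and the equality-based `semantic` default are assumptions for illustration, not the authors' exact formulation:

```python
class PNode:
    """A pattern node: word, dependency relation to parent, POS tag, children."""
    def __init__(self, word, reln, pos, children=()):
        self.word, self.reln, self.pos = word, reln, pos
        self.children = list(children)

def node_sim(n1, n2, semantic):
    if n1.pos != n2.pos:
        return 0.0  # differing POS tags -> node similarity is zero
    # Sum of the four feature functions: word, relation, pos, semantic.
    score = ((n1.word == n2.word) + (n1.reln == n2.reln)
             + (n1.pos == n2.pos) + semantic(n1, n2))
    # Add the similarity of the children: pair each child of n1 with its
    # best-matching child of n2 (only the roots have several children).
    for c1 in n1.children:
        score += max((node_sim(c1, c2, semantic) for c2 in n2.children),
                     default=0.0)
    return score

def size(n):
    return 1 + sum(size(c) for c in n.children)

def pattern_sim(l1, l2, semantic=lambda a, b: float(a.word == b.word)):
    # The maximum node score is 4, so normalise by 4 x the size (in nodes)
    # of the larger pattern to remove the bias towards longer patterns.
    return node_sim(l1, l2, semantic) / (4 * max(size(l1), size(l2)))
```

With this normalisation two identical patterns score exactly 1.0, and any pair of patterns scores in [0, 1].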

Experiments: Overview

We used the similarity measure in the iterative algorithm described earlier.
- The four highest scoring patterns are accepted at each iteration,
- but only if their score is within 0.95 of the highest scoring pattern's score.
We compare this approach with our previous approach (Stevenson and Greenwood, 2005), based on the vector space model and cosine similarity. Three separate configurations:
- Cosine (SVO): uses the SVO model with the cosine similarity measure
- Cosine (Linked Chains): same as above, but uses linked chains
- Structural (Linked Chains): uses linked chain patterns with the new structural similarity measure
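The acceptance rule can be sketched as below. Reading "within 0.95 of the highest scoring pattern" as a multiplicative threshold (at least 0.95 times the best score) is an assumption of this sketch:

```python
def accept(scored, top_k=4, threshold=0.95):
    """Select patterns to accept in one iteration.

    scored: list of (pattern, score) pairs for all candidate patterns.
    """
    ranked = sorted(scored, key=lambda ps: ps[1], reverse=True)
    if not ranked:
        return []
    best = ranked[0][1]
    # Keep at most top_k patterns, and only those close enough to the best.
    return [p for p, s in ranked[:top_k] if s >= threshold * best]
```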

Experiments: IE Scenario

We use the data from the MUC-6 management succession task.
- We use a sentence-level version produced by Soderland (1999).
- This corpus contains four types of relation: Person-Person, Person-Post, Person-Organisation, and Post-Organisation.
At each iteration of the algorithm, related items recognised by the current set of acquired patterns are extracted and evaluated. The texts have been previously annotated with named entities, and MINIPAR is used to produce the dependency analysis.

Experiments: Seed Patterns

COMPANY ←subj— appoint —obj→ PERSON
COMPANY ←subj— elect —obj→ PERSON
COMPANY ←subj— promote —obj→ PERSON
PERSON ←subj— resign
PERSON ←subj— depart
PERSON ←subj— quit

These seeds were chosen due to their use in previously reported work.
- No tuning of this set was performed.
- It should be noted that they do not contain the Person-Post or Post-Organisation relations.
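For reference, the six seeds are easy to hold as simple subject-verb-object triples; this encoding, with `None` marking the empty object slot, is just an illustrative convenience, not the representation used in the experiments:

```python
SEED_PATTERNS = [
    ("COMPANY", "appoint", "PERSON"),
    ("COMPANY", "elect",   "PERSON"),
    ("COMPANY", "promote", "PERSON"),
    ("PERSON",  "resign",  None),   # subject-only patterns
    ("PERSON",  "depart",  None),
    ("PERSON",  "quit",    None),
]

# Note: no seed mentions a POST entity, so the Person-Post and
# Post-Organisation relations are not covered by the seed set.
assert not any("POST" in (subj, obj) for subj, _, obj in SEED_PATTERNS)
```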

Results and Analysis

The seed patterns alone achieve an F-measure of 0.043 (P=0.833, R=0.022).
Cosine similarity performs poorly irrespective of the pattern model.
- Linked chains perform better than SVO under this similarity measure, which suggests the model is inherently superior.
The best result is the combination of linked chains and the structural similarity measure: an F-measure of 0.329 (P=0.434, R=0.265) after 190 iterations.

Results and Analysis

(Table: precision, recall and F-measure at selected iterations for the three configurations Cosine (SVO), Cosine (Linked Chains) and Structural (Linked Chains).)

Conclusions

The results show that semi-supervised approaches to IE pattern acquisition benefit from the use of more expressive extraction pattern models.
- Using linked chains resulted in better performance than using SVO, even when using the same similarity measure.
Similarity measures (such as kernel methods) developed for supervised learning can be adapted and applied to semi-supervised approaches.
- Future work should look at other similarity functions used in supervised learning to see if they can also be adapted for use with semi-supervised approaches.
The structural similarity measure introduced here outperforms a previously proposed method based on cosine similarity and a vector space representation.

Any Questions?

Bibliography

Razvan Bunescu and Raymond Mooney. 2005. A Shortest Path Dependency Kernel for Relation Extraction. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Vancouver, B.C.
Aron Culotta and Jeffrey Sorensen. 2004. Dependency Tree Kernels for Relation Extraction. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database and some of its Applications. MIT Press, Cambridge, MA.
Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML-98), Madison, Wisconsin.
Dekang Lin. MINIPAR: A Minimalist Parser. In Maryland Linguistics Colloquium, University of Maryland, College Park.
Stephen Soderland. 1999. Learning Information Extraction Rules for Semi-Structured and Free Text. Machine Learning, 31(1-3).
Mark Stevenson and Mark A. Greenwood. 2005. A Semantic Approach to IE Pattern Induction. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI.
Kiyoshi Sudo, Satoshi Sekine, and Ralph Grishman. 2003. An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL-03), Sapporo, Japan.
Roman Yangarber, Ralph Grishman, Pasi Tapanainen, and Silja Huttunen. 2000. Automatic Acquisition of Domain Knowledge for Information Extraction. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), Saarbrücken, Germany.