Tree Kernel-based Semantic Relation Extraction using Unified Dynamic Relation Tree
Reporter: Longhua Qian
School of Computer Science and Technology, Soochow University, Suzhou, China
2008.07.23, ALPIT2008, Dalian, China

Good afternoon, everyone! It's my great pleasure to share my research experience with everyone here. My name is Longhua Qian, and I am from Soochow University in China. My topic is tree kernel-based semantic relation extraction using a unified dynamic relation tree.

Outline
1. Introduction
2. Dynamic Relation Tree
3. Unified Dynamic Relation Tree
4. Experimental Results
5. Conclusion and Future Work

This is the outline of my presentation. The first section is the introduction, the second is the Dynamic Relation Tree, the third is the Unified Dynamic Relation Tree, the fourth presents the experimental results, and the last is the conclusion and future work.

1. Introduction
Information extraction is an important research topic in NLP. It attempts to find relevant information in the large volume of text documents available in digital archives and on the WWW.
Information extraction as defined by NIST ACE comprises three subtasks:
- Entity Detection and Tracking (EDT)
- Relation Detection and Characterization (RDC)
- Event Detection and Characterization (EDC)

First, let's have a look at the introduction. According to the NIST ACE definition, information extraction subsumes the three subtasks above. Our focus is on RDC, that is, relation extraction in general.

RDC Function
RDC detects and classifies semantic relationships (usually of predefined types) between pairs of entities. Relation extraction is very useful for a wide range of advanced NLP applications, such as question answering and text summarization.
E.g. the sentence "Microsoft Corp. is based in Redmond, WA" conveys the relation "GPE-AFF.Based" between "Microsoft Corp." (ORG) and "Redmond" (GPE).
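As a toy illustration of this example (our own representation, not part of any ACE tooling), such a relation instance can be modeled as a simple record:

    from dataclasses import dataclass

    @dataclass
    class RelationInstance:
        sentence: str
        arg1: str          # first entity mention
        arg1_type: str     # ACE entity major type
        arg2: str
        arg2_type: str
        relation: str      # ACE relation type (major.subtype)

    example = RelationInstance(
        sentence="Microsoft Corp. is based in Redmond, WA",
        arg1="Microsoft Corp.", arg1_type="ORG",
        arg2="Redmond", arg2_type="GPE",
        relation="GPE-AFF.Based",
    )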

Two approaches
Typically, there exist two approaches to relation extraction: feature-based methods and kernel-based methods.
Feature-based methods have dominated the research in relation extraction over the past years. However, related research shows that it is difficult to extract new effective features and further improve the performance.
Kernel-based methods compute the similarity of two objects (e.g. parse trees) directly. The key problem is how to represent and capture the structured information in complex structures, such as the syntactic information in the parse tree, for relation extraction.

Kernel-based related work
Kernel-based methods for relation extraction include the following work.
Zelenko et al. (2003), Culotta and Sorensen (2004), and Bunescu and Mooney (2005) described several kernels over shallow parse trees or dependency trees to extract semantic relations.
Zhang et al. (2006) and Zhou et al. (2007) proposed composite kernels consisting of a linear kernel and a convolution parse tree kernel; the latter can effectively capture the structured syntactic information inherent in parse trees.
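To make the convolution parse tree kernel concrete, here is a minimal sketch of the subset-tree kernel of Collins and Duffy (2001) over NLTK trees. It is an illustrative re-implementation, not the toolkit actually used in the experiments; the decay factor lam corresponds to the λ = 0.4 setting reported later.

    from nltk.tree import Tree

    def production(node):
        """The CFG production at a node, e.g. NP -> NNP NNP."""
        return (node.label(),
                tuple(c if isinstance(c, str) else c.label() for c in node))

    def delta(n1, n2, lam):
        """Number of common subtree fragments rooted at n1 and n2."""
        if production(n1) != production(n2):
            return 0.0
        if all(isinstance(c, str) for c in n1):   # matching pre-terminals
            return lam
        prod = lam
        for c1, c2 in zip(n1, n2):                # same production, so aligned
            if isinstance(c1, Tree):
                prod *= 1.0 + delta(c1, c2, lam)
        return prod

    def tree_kernel(t1, t2, lam=0.4):
        """K(T1, T2): sum of delta over all node pairs (naive O(|T1||T2|))."""
        nodes1 = [t1[p] for p in t1.treepositions() if isinstance(t1[p], Tree)]
        nodes2 = [t2[p] for p in t2.treepositions() if isinstance(t2[p], Tree)]
        return sum(delta(n1, n2, lam) for n1 in nodes1 for n2 in nodes2)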

Structured syntactic information
A tree span for a relation instance is the part of a parse tree used to represent the structured syntactic information for relation extraction. There are two tree spans in current use:
- PT (Path-enclosed Tree): the sub-tree enclosed by the shortest path linking the two entities in the parse tree.
- CSPT (Context-Sensitive Path-enclosed Tree): dynamically determined by further extending PT with the necessary predicate-linked path information outside PT.
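A hedged sketch of PT extraction over an NLTK constituency tree follows; here the two entities are given by the leaf indices of their head words, and the helper names are ours, not from the literature.

    from nltk.tree import Tree

    def _prune(node, lo, hi, start):
        """Keep only children whose leaf span overlaps [lo, hi];
        `start` is the leaf index of node's first leaf."""
        if isinstance(node, str):
            return node
        kept, i = [], start
        for child in node:
            width = 1 if isinstance(child, str) else len(child.leaves())
            if i <= hi and i + width - 1 >= lo:   # child overlaps the span
                kept.append(_prune(child, lo, hi, i))
            i += width
        return Tree(node.label(), kept)

    def path_enclosed_tree(tree, left_leaf, right_leaf):
        """PT: the smallest subtree covering both entity heads, with
        leaves outside [left_leaf, right_leaf] pruned away."""
        lca = tree.treeposition_spanning_leaves(left_leaf, right_leaf + 1)
        offset = next(i for i, p in enumerate(tree.treepositions('leaves'))
                      if p[:len(lca)] == lca)
        return _prune(tree[lca], left_leaf - offset, right_leaf - offset, 0)

    t = Tree.fromstring(
        "(S (NP (NNP Microsoft) (NNP Corp.)) "
        "(VP (VBZ is) (VP (VBN based) (PP (IN in) (NP (NNP Redmond))))))")
    print(path_enclosed_tree(t, 1, 5))   # span from "Corp." to "Redmond"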

Current problems
However, there still exist several problems with these tree spans for relation extraction.
- Noisy information: both PT and CSPT may still contain noisy information. In other words, more noise should be pruned away from the tree span.
- Useful information: CSPT captures only the part of the context-sensitive information relating to the predicate-linked path. That is to say, more information outside PT/CSPT may need to be recovered so as to discern the entities' relationships.

Our solution
Our solution to these problems is to propose the Dynamic Relation Tree and the Unified Dynamic Relation Tree.
- Dynamic Relation Tree (DRT): based on PT, we apply a variety of linguistics-driven rules to dynamically prune noisy information out of the syntactic parse tree and to include the necessary contextual information.
- Unified Dynamic Relation Tree (UDRT): instead of constructing composite kernels, various kinds of entity-related semantic information, including entity types/sub-types/mention levels etc., are unified into the Dynamic Relation Tree.

2. Dynamic Relation Tree
Generation of DRT: starting from PT, we apply three kinds of operations (i.e. Remove, Compress, and Expansion) sequentially to reshape PT, giving rise to a Dynamic Relation Tree.
Remove operations:
- DEL_ENT2_PRE: removing all the constituents (except the headword) of the 2nd entity
- DEL_PATH_ADVP/PP: removing adverb or preposition phrases along the path

Now, let's turn to the second section, the Dynamic Relation Tree.

DRT (cont'd)
Compress operations:
- CMP_NP_CC_NP: compressing noun phrase coordination conjunctions
- CMP_VP_CC_VP: compressing verb phrase coordination conjunctions
- CMP_SINGLE_INOUT: compressing single in-and-out nodes
Expansion operations:
- EXP_ENT2_POS: expanding the possessive structure after the 2nd entity
- EXP_ENT2_COREF: expanding an entity coreferential mention before the 2nd entity
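As illustrative sketches only, two of these rules might look as follows on NLTK trees; the rightmost-child head heuristic and the exact collapsing behavior are our assumptions, not the paper's specification.

    from nltk.tree import Tree

    def del_ent2_pre(entity_np):
        """DEL_ENT2_PRE: drop all constituents of the 2nd entity's NP
        except its head (taken here as the rightmost child)."""
        return Tree(entity_np.label(), [entity_np[-1]])

    def cmp_single_inout(node):
        """CMP_SINGLE_INOUT: collapse chains of nodes that have exactly
        one non-leaf child (single-in, single-out nodes)."""
        if isinstance(node, str):
            return node
        while len(node) == 1 and isinstance(node[0], Tree):
            node = node[0]
        return Tree(node.label(), [cmp_single_inout(c) for c in node])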

Some examples of DRT
These are some examples of DRT: (a) shows how the constituents before the 2nd entity are removed; (b) shows how all the conjuncts other than the entity may be reduced to a single conjunct; (c) shows how the possessive structure should be kept along with the entity's headword to differentiate the instance from positive instances.

3. Unified Dynamic Relation Tree
T1: DRT
T2: UDRT-Bottom
T3: UDRT-Entity
T4: UDRT-Top

This illustration shows the original DRT (T1) and the three kinds of UDRT setups that incorporate entity major type information: T2 (UDRT-Bottom), T3 (UDRT-Entity), and T4 (UDRT-Top).

Four UDRT setups
- T1 (DRT): no entity-related information except the entity order (i.e. "E1" and "E2").
- T2 (UDRT-Bottom): the DRT with entity-related information attached at the bottom of the two entity nodes.
- T3 (UDRT-Entity): the DRT with entity-related information attached in the entity nodes.
- T4 (UDRT-Top): the DRT with entity-related features attached at the top node of the tree.
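A sketch of the three unification strategies, assuming the DRT is an NLTK Tree whose "E1"/"E2" entity nodes are addressed by tree positions; the exact node representation is our assumption.

    from nltk.tree import Tree

    def unify_bottom(drt, e_pos, etype):
        """T2 (UDRT-Bottom): hang the entity type as an extra child
        under the entity node (modifies the tree in place)."""
        drt[e_pos].append(Tree("TYPE", [etype]))

    def unify_entity(drt, e_pos, etype):
        """T3 (UDRT-Entity): fold the entity type into the entity
        node's own label."""
        drt[e_pos].set_label(drt[e_pos].label() + "-" + etype)

    def unify_top(drt, etype1, etype2):
        """T4 (UDRT-Top): attach both entity types at the root."""
        drt.set_label(drt.label() + "-" + etype1 + "-" + etype2)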

4. Experimental results
Corpus statistics: the ACE RDC 2004 data contains 451 documents and 5702 relation instances. It defines 7 entity major types, 7 major relation types and 23 relation subtypes. Evaluation is done on 347 (nwire/bnews) documents and 4307 relation instances using 5-fold cross-validation.
Entity major types: PER, ORG, GPE, LOC, FAC, VEH, WEA
Relation major types: PHY, PER-SOC, EMP-ORG, ART, OTHER-AFF, GPE-AFF, DISC
Corpus processing: the corpus is parsed using Charniak's parser (Charniak, 2001), and relation instances are generated by iterating over all pairs of entity mentions occurring in the same sentence.

The third section presents the experimental results. The corpus we used is the ACE RDC 2004 dataset; for comparison purposes, evaluation is done on the nwire/bnews subset with 5-fold cross-validation.
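A minimal sketch of the instance-generation step, assuming sentence objects that expose a list of entity mentions (an illustrative data structure, not the ACE tooling):

    from itertools import combinations

    def candidate_instances(sentences):
        """Yield one candidate relation instance per pair of entity
        mentions occurring in the same sentence."""
        for sent in sentences:
            for m1, m2 in combinations(sent.mentions, 2):
                yield sent, m1, m2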

Classifier
Tools: SVMLight (Joachims, 1998) and the Tree Kernel Toolkit (Moschitti, 2004).
The training parameters C (SVM) and λ (tree kernel) are set to 2.4 and 0.4 respectively.
One vs. others strategy: builds K basic binary classifiers so as to separate one class from all the others.

The tools we used are SVMLight by Joachims and the Tree Kernel Toolkit by Moschitti. For comparison purposes, the training parameters are set as above, and for efficiency considerations we also apply the one vs. others strategy.
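A minimal sketch of the one vs. others strategy over a precomputed tree-kernel Gram matrix, using scikit-learn as an illustrative stand-in for the SVMLight/Tree Kernel Toolkit pair actually used (the "no relation" class is omitted for brevity):

    import numpy as np
    from sklearn.svm import SVC

    def train_one_vs_others(K_train, labels, C=2.4):
        """Train one binary SVM per relation type on the
        (n_train x n_train) tree-kernel Gram matrix K_train."""
        classifiers = {}
        for rel in set(labels):
            y = np.where(np.array(labels) == rel, 1, -1)
            classifiers[rel] = SVC(C=C, kernel="precomputed").fit(K_train, y)
        return classifiers

    def predict(classifiers, K_test):
        """K_test is (n_test x n_train); pick the relation whose
        classifier returns the largest margin."""
        rels = list(classifiers)
        scores = np.stack(
            [classifiers[r].decision_function(K_test) for r in rels], axis=1)
        return [rels[i] for i in scores.argmax(axis=1)]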

Contribution of various operation rules
Each operation rule is incrementally applied on the previously derived tree span. The plus sign preceding a rule indicates that the rule is useful and is added automatically in the next round; otherwise the rule is not adopted and its performance is unavailable.

Operation rules      P     R     F
PT (baseline)       76.3  59.8  67.1
+DEL_ENT2_PRE        -    62.1  68.5
DEL_PATH_PP          -     -     -
DEL_PATH_ADVP        -     -     -
+CMP_SINGLE_INOUT   76.4  63.1  69.1
+CMP_NP_CC_NP       76.1  63.3   -
CMP_VP_CC_VP         -     -     -
+EXP_ENT2_POS       76.6  63.8  69.6
+EXP_ENT2_COREF     77.1  64.3  70.1

This table shows the contribution of each operation rule to performance on the major relation types in the ACE RDC 2004 corpus. The eventual Dynamic Relation Tree (DRT) achieves the best performance of 77.1/64.3/70.1 in P/R/F respectively after applying all the operation rules, an increase in F-measure of 3.0 units over the baseline PT with entity-type information attached in the entity nodes.
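The incremental procedure can be sketched as a greedy, in-order selection loop; evaluate is an assumed helper (ours, not the paper's) returning the cross-validated F-measure of a tree-span setup:

    def select_rules(rules, base_span, evaluate):
        """Greedy, in-order rule selection as described above: a rule is
        kept (the "+" rows) only if it improves F-measure on top of the
        previously adopted rules."""
        adopted, best_f = [], evaluate(base_span, [])
        for rule in rules:
            f = evaluate(base_span, adopted + [rule])
            if f > best_f:
                adopted, best_f = adopted + [rule], f
        return adopted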

Comparison of different UDRT setups

Tree setups    P     R     F
DRT           68.7  53.5  60.1
UDRT-Bottom   76.2  64.4  69.8
UDRT-Entity   77.1  64.3  70.1
UDRT-Top      76.4  65.2  70.4

This table compares the performance of the different UDRT setups. Compared with DRT, the Unified Dynamic Relation Trees (UDRTs) with only entity type information significantly improve the F-measure by 10 units on average, owing to increases in both precision and recall. Among the three UDRTs, UDRT-Top achieves slightly better performance than the other two.

Improvements of different tree setups over PT

Tree setups         P    R    F
CSPT over PT       1.5  1.1  1.3
DRT over PT        0.1  5.4  3.3
UDRT-Top over PT   3.9  9.4  7.2

This table compares the performance improvements of different tree setups over the original PT. The Dynamic Relation Tree (DRT) performs better than the CSPT/PT setups, and the Unified Dynamic Relation Tree with entity-related semantic features attached at the top node of the parse tree performs best.

Comparison with best-reported systems

System                           P     R     F
Zhou et al.: composite kernel   82.2  70.2  75.8
Ours: CTK with UDRT-Top         80.2  69.2  74.3
Zhang et al.: composite kernel  76.1  68.4  72.1
Zhou et al.: CS-CTK with CSPT   81.1  66.7  73.2
Zhao and Grishman                -    70.5  70.4
CTK with PT                     74.1  62.4  67.7

Finally, we compare our system with the best-reported relation extraction systems on the ACE RDC 2004 corpus. Our CTK (convolution tree kernel) with UDRT-Top performs best among the tree setups using one single kernel, and even better than two of the previous composite kernels.

5. Conclusion
From the previous experimental results, we can draw the following conclusions:
- The Dynamic Relation Tree (DRT), generated by applying various linguistics-driven rules, can significantly improve performance over the currently used tree spans for relation extraction.
- Integrating entity-related semantic information into the DRT can further improve performance, especially when it is attached at the top node of the tree.

Future Work
In future work, we will focus on semantic matching in computing the similarity between two parse trees, where the semantic similarity between content words (such as "hire" and "employ") would be considered to achieve better generalization.

References
Bunescu R. C. and Mooney R. J. 2005. A Shortest Path Dependency Kernel for Relation Extraction. EMNLP-2005.
Charniak E. 2001. Immediate-head Parsing for Language Models. ACL-2001.
Collins M. and Duffy N. 2001. Convolution Kernels for Natural Language. NIPS-2001.
Collins M. and Duffy N. 2002. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. ACL-2002.
Culotta A. and Sorensen J. 2004. Dependency Tree Kernels for Relation Extraction. ACL-2004.
Joachims T. 1998. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML-1998.
Moschitti A. 2004. A Study on Convolution Kernels for Shallow Semantic Parsing. ACL-2004.
Zelenko D., Aone C. and Richardella A. 2003. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 3: 1083-1106.
Zhang M., Zhang J., Su J. and Zhou G.D. 2006. A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features. COLING-ACL-2006.
Zhao S.B. and Grishman R. 2005. Extracting Relations with Integrated Information Using Kernel Methods. ACL-2005.
Zhou G.D., Su J., Zhang J. and Zhang M. 2005. Exploring Various Knowledge in Relation Extraction. ACL-2005.
Zhou G.D., Zhang M., Ji D.H. and Zhu Q.M. 2007. Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information. EMNLP-CoNLL-2007.

End
Thank You!