Longhua Qian School of Computer Science and Technology

Exploiting Constituent Dependencies for Tree Kernel-based Semantic Relation Extraction
Longhua Qian, School of Computer Science and Technology, Soochow University, Suzhou, China
19 Aug. 2008, COLING 2008, Manchester, UK
Good morning, everyone! It’s my great pleasure to share my research experience with everyone here. My name is .. and I am from Soochow University in China. My topic is …

Outline
1. Introduction
2. Related Work
3. Dynamic Syntactic Parse Tree
4. Entity-related Semantic Tree
5. Experimental results
6. Conclusion and Future Work
This is the outline of my presentation. The first section is … The second one is … The third one is … The fourth one is … The fifth one is … And the last one is …

1. Introduction
Information extraction is an important research topic in NLP. It attempts to find relevant information in the large amount of text documents available in digital archives and on the WWW.
Information extraction as defined by NIST ACE:
Entity Detection and Tracking (EDT)
Relation Detection and Characterization (RDC)
Event Detection and Characterization (EDC)
First, let’s have a look at the introduction. … According to the NIST ACE definition, information extraction subsumes the following three subtasks. EDT means .. RDC means … And EDC means … Our focus is on RDC, that is, relation extraction in general.

RDC
Function: RDC detects and classifies semantic relationships (usually of predefined types) between pairs of entities. Relation extraction is very useful for a wide range of advanced NLP applications, such as question answering and text summarization.
E.g. the sentence “Microsoft Corp. is based in Redmond, WA” conveys the relation “GPE-AFF.Based” between “Microsoft Corp.” (ORG) and “Redmond” (GPE).

2. Related work
Typically, there exist two approaches to relation extraction.
Feature-based methods have dominated the research in relation extraction over the past years. However, relevant research shows that it is difficult to extract new effective features and further improve the performance.
Kernel-based methods compute the similarity of two objects (e.g. parse trees) directly. The key problem is how to represent and capture structured information in complex structures, such as the syntactic information in the parse tree for relation extraction.

Kernel-based related work
Kernel-based methods for relation extraction include the following work.
Zelenko et al. (2003), Culotta and Sorensen (2004) and Bunescu and Mooney (2005) described several kernels between shallow parse trees or dependency trees to extract semantic relations.
Zhang et al. (2006) and Zhou et al. (2007) proposed composite kernels consisting of a linear kernel and a convolution parse tree kernel, with the latter effectively capturing the structured syntactic information inherent in parse trees.
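To make the idea of a convolution parse tree kernel concrete, here is a minimal sketch of the sub-tree counting kernel in the style of Collins and Duffy, with a decay factor. It is an illustration only: the tuple-based tree encoding and the function names are assumptions made for this example, not the toolkit used in the talk.

```python
from itertools import product

# Assumed encoding: a tree is a tuple (label, child, child, ...);
# a word (leaf) is a plain string.

def production(node):
    """CFG production at a node; child labels are wrapped so that a
    non-terminal label can never be confused with a word."""
    return (node[0],) + tuple(
        c if isinstance(c, str) else (c[0],) for c in node[1:]
    )

def internal_nodes(tree):
    """Yield every non-leaf node of the tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)

def delta(n1, n2, lam):
    """Decayed count of common sub-trees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # equal productions: c2 is a node too
            score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    """Sum delta over all pairs of internal nodes of the two trees."""
    return sum(
        delta(a, b, lam)
        for a, b in product(internal_nodes(t1), internal_nodes(t2))
    )

t1 = ("NP", ("DT", "the"), ("NN", "dog"))
t2 = ("NP", ("DT", "a"), ("NN", "dog"))
print(tree_kernel(t1, t1, lam=1.0))  # number of sub-trees of t1
print(tree_kernel(t1, t2, lam=0.4))
```

The kernel value grows with the number of sub-tree fragments two parse trees share, which is why the choice of tree span matters so much for relation extraction.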

Structured syntactic information
A tree span for a relation instance is the part of a parse tree used to represent the structured syntactic information, including the two involved entities.
Two currently used tree spans:
SPT (Shortest Path-enclosed Tree): the sub-tree enclosed by the shortest path linking the two entities in the parse tree (Zhang et al., 2006).
CS-SPT (Context-Sensitive Shortest Path-enclosed Tree): dynamically determined by further extending the necessary predicate-linked path information outside the SPT (Zhou et al., 2007).
A tree span …. Currently there are two… One is SPT, … The other one is CS-SPT …

Current problems
However, there still exist several problems relating to the tree span for relation extraction. One is … The other is …
Noisy information: both SPT and CS-SPT may still contain noisy information. In other words, more noise could be pruned away from these tree spans.
Useful information: CS-SPT captures only the part of the context-sensitive information that relates to the predicate-linked path. That is to say, more information outside SPT/CS-SPT may be recovered so as to discern the relationship between the two entities.

Our solution
Dynamic Syntactic Parse Tree (DSPT): based on the MCT (Minimum Complete Tree), we exploit constituent dependencies to dynamically prune out noisy information from a syntactic parse tree and include necessary contextual information.
Unified Parse and Semantic Tree (UPST): instead of constructing composite kernels, various kinds of entity-related semantic information are integrated into a single Unified Parse and Semantic Tree.
Our solution to these problems is to construct DSPT and UPST. DSPT is Dynamic Syntactic Parse Tree, … UPST is Unified Parse and Semantic Tree, …

3. Dynamic Syntactic Parse Tree
Motivation of DSPT: dependency plays a key role in relation extraction, e.g. the dependency tree (Culotta and Sorensen, 2004) or the shortest dependency path (Bunescu and Mooney, 2005).
Constituent dependencies: in a parse tree, each CFG rule has the following form:
P → Ln … L1 H R1 … Rm
where the parent node P depends on the head child H. This is what we call constituent dependency.
Our hypothesis stipulates that the contribution of the parse tree to establishing a relationship is almost exclusively concentrated in the path connecting the two entities, as well as in the head children of the constituent nodes along this path.
Now, let’s turn to the third section, DSPT. Dependency plays … On the other hand, … therefore, our hypothesis …

Generation of DSPT
Let’s look at the generation of DSPT.
Starting from the Minimum Complete Tree, along the path connecting the two entities, the head child of every node is found according to various constituent dependencies. Then the path nodes and their head children are kept while any other nodes are removed from the parse tree. Eventually we arrive at a tree span called the Dynamic Syntactic Parse Tree (DSPT).
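The pruning step can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `Node` class, the precomputed `head` index and the toy tree are all hypothetical, head children are followed recursively down to a word, and the finer per-dependency rules (for example those for base-NPs and coordination) are not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    head: int = 0  # index of the head child, assumed to be given

def on_path(node, entities):
    """True if the subtree rooted at node contains an entity node."""
    return any(node is e for e in entities) or any(
        on_path(c, entities) for c in node.children
    )

def prune(node, entities):
    """Keep path nodes and head children; drop every other child."""
    if any(node is e for e in entities):
        return node  # entity subtrees are kept unchanged
    kept, new_head = [], 0
    for i, child in enumerate(node.children):
        if on_path(child, entities) or i == node.head:
            if i == node.head:
                new_head = len(kept)
            kept.append(prune(child, entities))
    return Node(node.label, kept, new_head)

def show(node):
    """Render a tree in bracketed form."""
    if not node.children:
        return node.label
    return "(" + node.label + " " + " ".join(show(c) for c in node.children) + ")"

# Hypothetical toy tree for "they now 're here".
e1 = Node("NP", [Node("PRP", [Node("they")])])
e2 = Node("ADVP", [Node("RB", [Node("here")])])
mct = Node("S", [
    e1,
    Node("ADVP", [Node("RB", [Node("now")])]),      # adjunct off the path
    Node("VP", [Node("VBP", [Node("'re")]), e2], head=0),
], head=2)

print(show(prune(mct, [e1, e2])))
```

In this toy case the adverb "now" is neither on the path between the two entities nor a head child, so it is removed, while the verb survives as the head child of the VP.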

Constituent dependencies (1)
Constituent dependencies can be classified into 5 categories according to the constituent types of the CFG rules:
Modification within base-NPs: base-NPs do not directly dominate an NP themselves. Hence, all the constituents before the headword may be removed from the parse tree, while the headword and the constituents right after the headword remain unchanged.
Modification to NPs: contrary to the first category, these NPs are recursive, meaning that they contain another NP as their child. They usually appear as follows:
NP → NP SBAR [relative clause]
NP → NP VP [reduced relative]
NP → NP PP [PP attachment]
In this case, the right side (e.g. “NP VP”) can be reduced to the left-hand side, which is exactly a single NP.

Constituent dependencies (2)
Arguments/adjuncts to verbs: this type includes the CFG rules in which the left side contains S, SBAR or VP. Both arguments and adjuncts depend on the verb and could be removed if they are not included in the path connecting the two entities.
Coordination conjunctions: in coordination constructions, several peer conjuncts may be reduced into a single constituent, since we assume that all the conjuncts play an equal role in relation extraction.
Modification to other constituents: except for the above four types, other CFG rules fall into this type, such as modification to PP, ADVP, PRN, etc. These cases occur much less frequently than the others.

Some examples of DSPT
These are some examples of DSPT. Typically:
(a) shows how the constituents before the 2nd entity can be removed.
(c) shows how the modification to NP (“nominated for …”) can be removed.
(e) shows how all the conjuncts other than the one containing the entity may be reduced into a single NP.

4. Entity-related Semantic Tree (EST)
For the example sentence “they ’re here”, which is excerpted from the ACE RDC 2004 corpus, there exists a relationship “Physical.Located” between the entities “they” [PER] and “here” [GPE.Population-Center]. The features are encoded as “TP”, “ST”, “MT” and “PVB”, which denote the type, subtype and mention type of the two entities, and the base form of the predicate verb, if one exists (the one nearest to the 2nd entity along the path connecting the two entities), respectively.
The following section is about EST. This illustration shows three different kinds of EST setups incorporating entity types/subtypes, mention types and the predicate verb.

Three EST setups
(a) Bag of Features (BOF): all feature nodes hang uniformly under the root node, so the tree kernel simply counts the number of common features between two relation instances.
(b) Feature-Paired Tree (FPT): the features of the two entities are grouped into different types according to their feature names, e.g. “TP1” and “TP2” are grouped under “TP”. This tree setup aims to capture the additional similarity of a single feature combined from different entities, i.e. the first and the second entities.
(c) Entity-Paired Tree (EPT): all the features relating to an entity are grouped under the node “E1” or “E2”, so this tree kernel can further explore the equivalence, between two relation instances, of combined entity features relating to only one of the entities.
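The three setups differ only in how the same feature nodes are grouped, which a small sketch can make visible. Everything below is illustrative: the root label "ENT", the placement of "PVB" directly under the root, and the values given for ST1, MT1 and MT2 are assumptions, since the slides do not state them.

```python
NAMES = ("TP", "ST", "MT")  # type, subtype, mention type

def leaf(name, value):
    """One feature node in bracketed form, e.g. (TP1 PER)."""
    return f"({name} {value})"

def bof(f, root="ENT"):
    """Bag of Features: every feature hangs directly under the root."""
    nodes = [leaf(n + i, f[n + i]) for i in "12" for n in NAMES]
    nodes.append(leaf("PVB", f["PVB"]))
    return f"({root} {' '.join(nodes)})"

def fpt(f, root="ENT"):
    """Feature-Paired Tree: TP1/TP2 under TP, ST1/ST2 under ST, ..."""
    nodes = [
        "(" + n + " " + leaf(n + "1", f[n + "1"]) + " " + leaf(n + "2", f[n + "2"]) + ")"
        for n in NAMES
    ]
    nodes.append(leaf("PVB", f["PVB"]))
    return f"({root} {' '.join(nodes)})"

def ept(f, root="ENT"):
    """Entity-Paired Tree: all features of entity i under node Ei."""
    nodes = [
        "(E" + i + " " + " ".join(leaf(n + i, f[n + i]) for n in NAMES) + ")"
        for i in "12"
    ]
    nodes.append(leaf("PVB", f["PVB"]))
    return f"({root} {' '.join(nodes)})"

# "they 're here": TP1, TP2, ST2 and PVB follow the slide;
# ST1, MT1 and MT2 are hypothetical placeholder values.
features = {
    "TP1": "PER", "ST1": "null", "MT1": "PRO",
    "TP2": "GPE", "ST2": "Population-Center", "MT2": "PRO",
    "PVB": "be",
}
for build in (bof, fpt, ept):
    print(build(features))
```

Because a convolution tree kernel matches sub-tree fragments, grouping two feature nodes under a shared parent lets the kernel reward instances that agree on both features at once, which is the extra similarity FPT and EPT are meant to capture.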

Construction of UPST
Then, we look into the construction of UPST.
Motivation: we incorporate the EST into the DSPT to produce a Unified Parse and Semantic Tree (UPST) in order to investigate the contribution of the EST to relation extraction.
How: detailed evaluation (Qian et al., 2007) indicates that the kernel achieves the best performance when the feature nodes are attached under the top node. Therefore, we also attach the three kinds of entity-related semantic trees (i.e. BOF, FPT and EPT) under the top node of the DSPT, right after its original children.
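As a rough illustration of the attachment step, assuming both trees are available as bracketed strings, the EST can be appended as the last child of the DSPT's top node. The function name and the example trees are hypothetical.

```python
def attach_under_top(dspt, est):
    """Append the EST as the last child of the DSPT's top node.

    Both arguments are bracketed tree strings; this is a string-level
    sketch, not a full tree parser.
    """
    dspt = dspt.strip()
    if not (dspt.startswith("(") and dspt.endswith(")")):
        raise ValueError("expected a bracketed tree")
    return dspt[:-1] + " " + est.strip() + ")"

dspt = "(S (NP (PRP they)) (VP (VBP 're) (ADVP (RB here))))"
est = "(ENT (TP1 PER) (TP2 GPE) (PVB be))"  # abbreviated, hypothetical EST
print(attach_under_top(dspt, est))
```

The result is a single tree, so one convolution tree kernel can score syntactic and entity-related similarity together instead of combining two separate kernels.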

5. Experimental results
Corpus statistics:
The ACE RDC 2004 data contains 451 documents and 5702 relation instances. It defines 7 major entity types, 7 major relation types and 23 relation subtypes.
Entity major types: PER, ORG, GPE, LOC, FAC, VEH, WEA
Relation major types: PHY, PER-SOC, EMP-ORG, ART, OTHER-AFF, GPE-AFF, DISC
Evaluation is done on 347 (nwire/bnews) documents and 4307 relation instances using 5-fold cross-validation.
Corpus processing:
The corpus is parsed using Charniak’s parser (Charniak, 2001).
Relation instances are generated by iterating over all pairs of entity mentions occurring in the same sentence.
The fifth section is about experimental results. The corpus we used is the ACE RDC 2004 dataset, this dataset contains … For comparison purposes, evaluation is done…. First the corpus is parsed … Then relation instances …
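Generating candidate relation instances from sentence-level mention pairs can be sketched as below. The input layout (one list of mentions per sentence) and the use of unordered pairs in textual order are assumptions for this example; the slides only say that all pairs within a sentence are considered.

```python
from itertools import combinations

def relation_candidates(sentences):
    """Yield (sentence index, mention 1, mention 2) for every pair of
    entity mentions that occur in the same sentence."""
    for sent_id, mentions in enumerate(sentences):
        for m1, m2 in combinations(mentions, 2):
            yield sent_id, m1, m2

# Hypothetical input: entity mentions already detected per sentence.
sentences = [
    ["Microsoft Corp.", "Redmond", "WA"],
    ["they", "here"],
    ["alone"],  # a single mention yields no candidate
]
candidates = list(relation_candidates(sentences))
for c in candidates:
    print(c)
```

A sentence with n mentions yields n(n-1)/2 candidates, most of which carry no relation, so the classifier also has to learn to reject pairs.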

Classifier
Tools: SVMLight (Joachims 1998) and Tree Kernel Toolkits (Moschitti 2004). The training parameters C (SVM) and λ (tree kernel) are set to 2.4 and 0.4 respectively.
One vs. others strategy: builds K basic binary classifiers so as to separate each class from all the others.
The tools we used include .. by …and... by … For comparison purposes, the training … And for efficiency consideration, we also apply one vs. others strategy…
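The one vs. others strategy can be illustrated without any SVM library: each of the K classes gets its own binary labelling of the training data, and at prediction time one class is chosen from the K classifier outputs. Picking the class with the largest decision value is a common convention and an assumption here, as are the function names and the toy scores.

```python
def one_vs_others_labels(labels):
    """For each class, relabel the data as +1 (that class) or -1 (others)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}

def predict(scores):
    """Choose the class whose binary classifier gives the largest value."""
    return max(scores, key=scores.get)

# Toy training labels using relation major types from the corpus.
labels = ["PHY", "EMP-ORG", "PHY", "ART"]
binary_tasks = one_vs_others_labels(labels)
print(binary_tasks)

# Hypothetical decision values from the K binary classifiers for one instance.
print(predict({"PHY": -0.3, "EMP-ORG": 0.8, "ART": -1.2}))
```

Each entry of `binary_tasks` would be handed to a separate binary SVM, so K models are trained in total.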

Contributions of various dependencies
This table indicates the contribution of various dependencies on the major relation types in the ACE RDC 2004 corpus.
Two modes:
[M1] Respective: every constituent dependency is individually applied on the MCT.
[M2] Accumulative: every constituent dependency is incrementally applied on the previously derived tree span, which begins with the MCT and eventually gives rise to a Dynamic Syntactic Parse Tree (DSPT).
The values outside parentheses rise step by step to the final DSPT result, so they correspond to the accumulative mode [M2]; the values in parentheses correspond to the respective mode [M1].

Dependency types                P            R            F
MCT (baseline)                  75.1         53.8         62.7
Modification within base-NPs    76.5 (76.5)  59.8 (59.8)  67.1 (67.1)
Modification to NPs             77.0 (76.2)  63.2 (56.9)  69.4 (65.1)
Arguments/adjuncts to verb      77.1 (76.1)  63.9 (57.5)  69.9 (65.5)
Coordination conjunctions       77.3 (77.3)  65.2 (55.1)  70.8 (63.8)
Other modifications             77.4 (75.0)  65.4 (53.7)  70.9 (62.6)

Contributions of various dependencies
The table shows that the final DSPT achieves the best performance of 77.4%/65.4%/70.9% in precision/recall/F-measure respectively after applying all the dependencies, an increase in F-measure of 8.2 units over the baseline MCT. This indicates that reshaping the tree by exploiting constituent dependencies may significantly improve extraction accuracy, largely due to the increase in recall.
Modification within base-NPs contributes most to the performance improvement, increasing the F-measure by 4.4 units. This indicates the local characteristic of semantic relations, which can be effectively captured by the NPs around the two involved entities in the DSPT.
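The reported F-measures can be checked from precision and recall, assuming F is the balanced F1 score (the harmonic mean of P and R); the slides do not spell out the formula, so this is an assumption.

```python
def f_measure(p, r):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

baseline = round(f_measure(75.1, 53.8), 1)  # MCT
final = round(f_measure(77.4, 65.4), 1)     # DSPT, all dependencies applied
base_np = round(f_measure(76.5, 59.8), 1)   # modification within base-NPs

print(baseline, final, base_np)
print(round(final - baseline, 1))    # gain of the final DSPT over MCT
print(round(base_np - baseline, 1))  # gain from base-NP pruning alone
```

Under that assumption the computed values agree with the table, and the two differences reproduce the 8.2 and 4.4 unit gains quoted above.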

Comparison of different UPST setups

Tree setups   P     R     F
DSPT          77.4  65.4  70.9
UPST (BOF)    80.4  69.7  74.7
UPST (FPT)    80.1  70.7  75.1
UPST (EPT)    79.9  70.2  74.8

Compared with DSPT, Unified Parse and Semantic Trees (UPSTs) significantly improve the F-measure by ~4 units on average, due to the increase in both precision and recall.
Among the three UPSTs, UPST (FPT) achieves slightly better performance than the other two setups.
This table compares the performance of different UPST setups, i.e. …, … and … It shows that … This means that they can effectively capture both the structured syntactic information and the entity-related semantic features. And … This suggests that additional bi-gram entity features captured by FPT are more useful than tri-gram entity features captured by EPT.

Improvements of different tree setups over SPT
CS-SPT over SPT: 1.5 1.1 1.3
DSPT over SPT: 0.1 5.6 3.8
UPST (FPT) over SPT: 10.9 8.0
It shows that the Dynamic Syntactic Parse Tree (DSPT) outperforms both the SPT and CS-SPT setups.
The Unified Parse and Semantic Tree with Feature-Paired Tree performs best among all tree setups.
This table compares the performance improvements of different tree setups over the original SPT. It shows that … And … This implies that the entity-related semantic information is very useful and contributes much when it is incorporated into the parse tree for relation extraction.

Comparison with best-reported systems

Systems (composite)                P     R     F
Ours: Composite kernel             83.0  72.0  77.1
Zhou et al.                        82.2  70.2  75.8
Zhang et al.                       76.1  68.4  72.1
Zhao and Grishman                  69.2  70.5  70.4

Systems (single)                   P     R     F
Ours: CTK with UPST                80.1  70.7  75.1
Zhou et al.: CS-CTK with CS-SPT    81.1  66.7  73.2
Zhang et al.: CTK with SPT         74.1  62.4  67.7

Finally, we compare our system with the best-reported relation extraction systems on the ACE RDC 2004 corpus. … It shows that our composite kernel achieves the best performance so far. And our UPST performs best among tree setups using one single kernel, and even better than the two previous composite kernels.

6. Conclusion
The last section is the conclusion. From the previous experimental results, we can draw the following conclusions.
The Dynamic Syntactic Parse Tree (DSPT), which is generated by exploiting constituent dependencies, can significantly improve the performance over currently used tree spans for relation extraction.
In addition to individual entity features, combined entity features (especially bi-gram) contribute much when they are integrated with a DSPT into a Unified Parse and Semantic Tree.

Future Work
We will focus on improving the performance on complex structured parse trees, where the path connecting the two entities involved in a relationship is too long for current kernel methods to take effect. Our preliminary experiment of applying some discourse theory exhibits certain positive results.
As to further work, we will…

References
Bunescu R. C. and Mooney R. J. A Shortest Path Dependency Kernel for Relation Extraction. EMNLP-2005.
Charniak E. Immediate-head Parsing for Language Models. ACL-2001.
Collins M. and Duffy N. Convolution Kernels for Natural Language. NIPS-2001.
Collins M. and Duffy N. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. ACL-2002.
Culotta A. and Sorensen J. Dependency Tree Kernels for Relation Extraction. ACL-2004.
Joachims T. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML-1998.
Moschitti A. A Study on Convolution Kernels for Shallow Semantic Parsing. ACL-2004.
Qian L.H., Zhou G.D., Zhu Q.M. and Qian P.D. Relation Extraction using Convolution Tree Kernel Expanded with Entity Features. PACLIC21.
Zelenko D., Aone C. and Richardella A. Kernel Methods for Relation Extraction. Journal of Machine Learning Research. 2003(2):
Zhang M., Zhang J., Su J. and Zhou G.D. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. COLING-ACL-2006.
Zhao S.B. and Grishman R. Extracting Relations with Integrated Information Using Kernel Methods. ACL-2005.
Zhou G.D., Su J., Zhang J. and Zhang M. Exploring Various Knowledge in Relation Extraction. ACL-2005.
Zhou G.D., Zhang M., Ji D.H. and Zhu Q.M. Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information. EMNLP/CoNLL-2007.

End Thank You!
