
1 Learning to Extract Relations from the Web using Minimal Supervision
Razvan C. Bunescu and Raymond J. Mooney
Machine Learning Group, Department of Computer Sciences, University of Texas at Austin

2 Introduction: Relation Extraction
People are often interested in finding relations between entities:
– What proteins interact with IRAK1?
– Which companies were acquired by Google?
– In which city was Mozart born?
Relation Extraction (RE) is the task of automatically locating predefined types of relations in text documents.

3 Introduction: Relation Extraction – Examples
1) Protein Interactions: "The phosphorylation of Pellino2 by activated IRAK1 could trigger the translocation of IRAKs from complex I to II."
2) Company Acquisitions: "Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal."
3) People Birthplaces: "Wolfgang Amadeus Mozart was born to Leopold and Ana Maria Mozart, in the front room of Getreidegasse 9 in Salzburg."

4 Motivation: Minimal Supervision
Developing an RE system usually requires a significant amount of human effort:
– Extraction patterns designed by a human expert [Blaschke et al., 2002].
– Extraction patterns learned from a corpus of manually annotated examples [Zelenko et al., 2003; Culotta and Sorensen, 2004].
A different RE approach:
– Extraction patterns learned from weak supervision, derived from a significantly reduced amount of human supervision.

5 Relation Extraction with Minimal Supervision
Human supervision → a handful of pairs of entities known to exhibit (+) or not exhibit (–) a particular relation.
Weak supervision → bags of sentences containing the pairs, automatically extracted from a very large corpus.
Use the bags of sentences in a Multiple Instance Learning framework [Dietterich et al., 1997] to train a relation extraction model.

6 Types of Supervision for RE
Single Instance Learning (SIL):
– A corpus of positive and negative sentence examples, with the two entity names annotated.
– A sentence example is positive iff it explicitly asserts the target relationship between the two annotated entities.
Multiple Instance Learning (MIL):
– A corpus of positive and negative bags of sentences.
– A bag is positive iff it contains at least one positive sentence example (a two-line sketch of this labeling convention follows below).
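The MIL labeling convention can be stated directly in code; a minimal sketch with hypothetical boolean instance labels:

```python
# MIL labeling convention: a bag is positive iff at least one of its
# sentence-level (instance) labels is positive.
def bag_label(instance_labels):
    """instance_labels: iterable of booleans, one per sentence in the bag."""
    return any(instance_labels)

assert bag_label([False, True, False])   # one positive sentence -> positive bag
assert not bag_label([False, False])     # no positive sentence  -> negative bag
```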

7 RE from the Web with Minimal Supervision
Example pairs of named entities for R = Corporate Acquisitions:
+/–  Argument a1      Argument a2
 +   Google           YouTube
 +   Adobe Systems    Macromedia
 +   Viacom           DreamWorks
 +   Novartis         Eon Labs
 –   Yahoo            Microsoft
 –   Pfizer           Teva

8 Minimal Supervision: Positive Bags
Use a search engine to extract bags of sentences containing both entities in a pair. For the pair ⟨Google, YouTube⟩:
S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.
S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.
…
Sn: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.


11 Minimal Supervision: Negative Bags
Use a search engine to extract bags of sentences containing both entities in a pair. For the pair ⟨Yahoo, Microsoft⟩:
S1: Yahoo is starting to look more like Microsoft and less like the innovative, unified service that got my loyalty in the first place.
S2: Whatever it is, Yahoo is dashing in front, with Microsoft close behind.
…
Sn: Yahoo and Microsoft teamed up on October 12 to make their instant messaging software compatible.


13 MIL Background: Domains
Originally introduced to solve a drug activity prediction problem in biochemistry [Dietterich et al., 1997]:
– Each molecule has a limited set of low-energy conformations → bags of 3D conformations.
– A bag is positive if at least one of the conformations binds to a predefined target.
– MUSK dataset [Dietterich et al., 1997]: a bag is positive if the molecule smells "musky".
Content-Based Image Retrieval [Zhang et al., 2002].
Text categorization [Andrews et al., 2003; Ray et al., 2005].

14 MIL Background: Algorithms
– Axis Parallel Rectangles [Dietterich et al., 1997]
– Diverse Density [Maron, 1998]
– Multiple Instance Logistic Regression [Ray & Craven, 2005]
– Multi-instance SVM kernels of [Gartner et al., 2002]:
  – Normalized Set Kernel.
  – Statistic Kernel.

15 MIL for Relation Extraction
Focus on SVM approaches:
– Through kernels, they can work efficiently with instances that implicitly belong to a high-dimensional feature space.
– They can reuse existing relation extraction kernels.
The multi-instance kernels of [Gartner et al., 2002] are not appropriate when there are very few bags:
– Bags (not instances) are treated as training examples.
– The number of support vectors is therefore upper-bounded by the number of bags.
– Very few bags → very few support vectors → insufficient capacity.

16 MIL for Relation Extraction
A simple approach to MIL is to transform it into a standard supervised learning problem (a small sketch of this transformation follows below):
– Apply the bag label to all instances inside the bag.
– Train a standard supervised algorithm on the transformed dataset.
– Despite the resulting class noise, this obtains competitive results [Ray & Craven, 2005].
Example: every sentence S1 … Sn in the positive bag for ⟨Google, YouTube⟩ shown earlier is labeled positive.

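A minimal sketch of this transformation, with illustrative bag structures (label, list of sentences) rather than the authors' actual data format:

```python
# Apply each bag's label to all of its sentences, producing a standard
# single-instance training set (noisy on the positive side).
def bags_to_instances(bags):
    """bags: list of (label, sentences) pairs, label in {+1, -1}."""
    examples = []
    for label, sentences in bags:
        for sentence in sentences:
            examples.append((sentence, label))  # bag label -> instance label
    return examples

bags = [
    (+1, ["<e1> has bought <e2> in a controversial $1.6 billion deal.",
          "The companies will merge <e1>'s search expertise with <e2>'s ..."]),
    (-1, ["<e1> is starting to look more like <e2> ..."]),
]
train = bags_to_instances(bags)  # 3 labeled sentences
```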

18 SVM Framework with MIL Supervision
minimize:
    J(w, b, ξ) = ½‖w‖² + C (c_p Σ_{X∈X⁺} Σ_{x∈X} ξ_x + c_n Σ_{X∈X⁻} Σ_{x∈X} ξ_x)
subject to:
    w·φ(x) + b ≥ +1 − ξ_x, for every sentence x in a positive bag X ∈ X⁺
    w·φ(x) + b ≤ −1 + ξ_x, for every sentence x in a negative bag X ∈ X⁻
    ξ_x ≥ 0
– ½‖w‖² is the regularization term.
– The c_p sum is the error on positive bags; the c_n sum is the error on negative bags.
– c_p, c_n > 0 with c_p + c_n = 1 control the relative influence that false negatives vs. false positives have on the objective function. We want c_p < 0.5 (penalize false negatives less than false positives); the experiments used c_p = …

23 SVM Framework with MIL Supervision
Dual formulation → only a kernel between bag instances is needed: K(x₁, x₂) = φ(x₁)·φ(x₂).
Use SSK, a subsequence kernel customized for relation extraction [Bunescu & Mooney, 2005].
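A minimal sketch of how such an asymmetric-cost SVM could be set up with scikit-learn, assuming the SSK Gram matrix K over all bag instances has already been computed; the per-sample weights stand in for the c_p/c_n costs, and the constant C is a hypothetical hyperparameter:

```python
# Sketch: instance-level SVM over bag-labeled sentences with asymmetric
# costs (c_p for instances from positive bags, c_n = 1 - c_p for instances
# from negative bags), realized via scikit-learn's per-sample weights.
import numpy as np
from sklearn.svm import SVC

def train_mil_svm(K, y, c_p=0.25, C=1.0):
    """K: precomputed (SSK) Gram matrix over all bag instances.
    y: +1 for sentences from positive bags, -1 otherwise.
    c_p < 0.5 penalizes false negatives less than false positives."""
    y = np.asarray(y)
    weights = np.where(y == 1, c_p, 1.0 - c_p)  # c_p vs. c_n per instance
    clf = SVC(kernel="precomputed", C=C)
    clf.fit(K, y, sample_weight=weights)
    return clf
```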

24 The Subsequence Kernel for Relation Extraction [Bunescu & Mooney, 2005]
Implicit features are sequences of words anchored at the two entity names, e.g.:
    s = ⟨e1⟩ … bought … ⟨e2⟩ … billion … deal
An example sentence x containing s as a subsequence (s ≺ x):
    Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.
    (gaps: g1 = 1, g2 = 3, g3 = 4, g4 = 0)
φ_s(x) = the value of feature s in example x; φ_s(x) > 0 iff s ≺ x.

25 The Subsequence Kernel for Relation Extraction [Bunescu & Mooney, 2005]
K(x₁, x₂) = φ(x₁)·φ(x₂) = the number of common "anchored" subsequences between x₁ and x₂, weighted by their total gap.
Many relations require at least one content word → modify the kernel to optionally ignore sequences formed exclusively of stop words and punctuation signs.
The kernel is computed efficiently by a generalized version of the dynamic programming procedure from [Lodhi et al., 2002].
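A toy sketch (not the actual SSK implementation, which sums over all matchings via dynamic programming) of how one anchored-subsequence feature could be scored: the pattern words must occur in order, and the match is discounted by λ raised to the total gap. The greedy leftmost matching, tokenization, and λ value are illustrative assumptions.

```python
# Toy scorer for one anchored-subsequence feature: lambda ** (total gap).
# Greedy leftmost matching only -- the real SSK sums over all matchings
# with dynamic programming [Lodhi et al., 2002; Bunescu & Mooney, 2005].
LAMBDA = 0.75  # gap-decay factor; illustrative value, not from the talk

def feature_value(pattern, tokens, lam=LAMBDA):
    """Return lam ** total_gap for the leftmost in-order match of `pattern`
    in `tokens`, or 0.0 if `pattern` is not a subsequence of `tokens`."""
    gap, pos = 0, 0
    for word in pattern:
        try:
            nxt = tokens.index(word, pos)  # next occurrence at or after pos
        except ValueError:
            return 0.0                     # pattern word missing -> feature absent
        gap += nxt - pos                   # count the words we skipped
        pos = nxt + 1
    return lam ** gap

sent = ("<e1> has bought video-sharing website <e2> "
        "in a controversial $1.6 billion deal .").split()
print(feature_value(["<e1>", "bought", "<e2>", "billion", "deal"], sent))
```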

26 Two Types of Bias
The MIL approach to RE differs from other MIL problems in two respects:
– The training dataset contains very few bags.
– The bags can be very large.
These properties lead to two types of bias:
– [Type I] Combinations of words that are correlated with the two relation arguments are given too much weight in the learned model.
– [Type II] Words specific to a particular relation instance are given too much weight.

27 Type I Bias
For the bag ⟨Google, YouTube⟩:
S1: Search engine giant Google has bought video-sharing website YouTube in a controversial $1.6 billion deal.
S2: The companies will merge Google's search expertise with YouTube's video expertise, pushing what executives believe is a hot emerging market of video offered over the Internet.
Overweighted patterns:
– search … ⟨e1⟩ … video … ⟨e2⟩
– … ⟨e1⟩ … video … ⟨e2⟩
– ⟨e1⟩ … search … ⟨e2⟩
– ⟨e1⟩ … search … ⟨e2⟩ … video

28 Type II Bias
For the bag ⟨Google, YouTube⟩:
S1: Ever since Google paid $1.65 billion for YouTube in October, plenty of pundits – from Mark Cuban to yours truly – have been waiting for the other shoe to drop.
S2: Google Gobbles Up YouTube for $1.6 BILLION – October 9, 2006.
S3: Google has acquired social media company YouTube for $1.65 billion in a stock-for-stock transaction as announced by Google Inc. on October 9, 2006.
Overweighted patterns:
– … ⟨e1⟩ … for … ⟨e2⟩ … October
– … ⟨e1⟩ … has … ⟨e2⟩ … October

29 A Solution for Type I Bias
Use the SSK approach, with a new feature weight: modify the subsequence kernel computations to use word weights ν(w).
We want a small ν(w) for words w correlated with either of the two relation arguments.

30 A Solution for Type I Bias: Word Weights
Use a formula for word weights ν(w) that discounts the effect of correlations of w with either of the two arguments a1 and a2. The formula combines:
– the number of sentences in bag X;
– the number of sentences in bag X that contain the word w;
– the probability that the word w appears in a sentence due only to the presence of X.a1 or X.a2, assuming X.a1 and X.a2 are independent causes for w.
P(w|a) is the probability that w appears in a sentence due to the presence of a; estimate P(w|a) using counts from a separate bag of sentences containing a.
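One way these ingredients could fit together, shown as a hedged sketch: the noisy-or combination follows the independence assumption on the slide, but the exact discounting formula below is an assumption, not necessarily the one used in the talk.

```python
# Sketch: discount words whose frequency in the bag is already explained
# by the two arguments alone (Type I bias reduction). The combination
# formula is an assumption; the counts follow the slide's definitions.
def noisy_or(p_w_a1, p_w_a2):
    """P(w | a1 or a2), treating a1 and a2 as independent causes of w."""
    return 1.0 - (1.0 - p_w_a1) * (1.0 - p_w_a2)

def word_weight(count_w, bag_size, p_w_a1, p_w_a2):
    """count_w: # sentences in bag X containing w; bag_size: # sentences in X.
    p_w_a1, p_w_a2: P(w|a) estimated from separate bags for a1 / a2 alone."""
    observed = count_w / bag_size          # frequency of w in the bag
    if observed == 0.0:
        return 0.0
    expected = noisy_or(p_w_a1, p_w_a2)    # frequency explained by the arguments
    return max(0.0, 1.0 - expected / observed)
```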

34 MIL Relation Extraction Datasets
– Given two arguments a1 and a2, submit the query string "a1 * * * * * * * a2" to Google.
– Download the resulting documents (fewer than 1000).
– Split the text into sentences and tokenize using the OpenNLP package.
– Keep only sentences containing both a1 and a2.
– Replace the closest occurrences of a1 and a2 with the generic tags ⟨e1⟩ and ⟨e2⟩ (a rough sketch of this step follows below).
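A rough sketch of the final step, assuming sentence splitting and tokenization have already been done (with OpenNLP in the talk); this naive version tags the first occurrence of each argument rather than a true closest pair:

```python
# Replace one occurrence of each argument with the generic tags <e1>/<e2>.
def tag_arguments(sentence, a1, a2):
    """Return the sentence with a1 -> <e1> and a2 -> <e2>, or None if
    either argument is missing. Naive: uses first occurrences."""
    i, j = sentence.find(a1), sentence.find(a2)
    if i < 0 or j < 0:
        return None
    first, second = sorted([(i, a1, "<e1>"), (j, a2, "<e2>")])
    return (sentence[:first[0]] + first[2]
            + sentence[first[0] + len(first[1]):second[0]]
            + second[2] + sentence[second[0] + len(second[1]):])

print(tag_arguments("Google has bought YouTube in a $1.6 billion deal.",
                    "Google", "YouTube"))
# -> "<e1> has bought <e2> in a $1.6 billion deal."
```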

35 MIL Relation Extraction Datasets: Corporate Acquisitions
Training pairs:
+/–  Argument a1      Argument a2          Bag size
 +   Google           YouTube              1375
 +   Adobe Systems    Macromedia           622
 +   Viacom           DreamWorks           323
 +   Novartis         Eon Labs             311
 –   Yahoo            Microsoft            163
 –   Pfizer           Teva                 247
Testing pairs (all bag sentences manually labeled):
 +   Pfizer           Rinat Neuroscience   50 (41)
 +   Yahoo            Inktomi              433 (115)
 –   Google           Apple                281
 –   Viacom           NBC                  231

36 MIL Relation Extraction Datasets: Person–Birthplace
Training pairs:
+/–  Argument a1        Argument a2   Bag size
 +   Franz Kafka        Prague        522
 +   Andre Agassi       Las Vegas     386
 +   Charlie Chaplin    London        292
 +   George Gershwin    New York      260
 –   Luc Besson         New York      74
 –   W. A. Mozart       Vienna        288
Testing pairs (all bag sentences manually labeled):
 +   Luc Besson         Paris         126 (6)
 +   Marie Antoinette   Vienna        39 (10)
 –   Charlie Chaplin    Hollywood     266
 –   George Gershwin    London        104

37 Experimental Results: Systems
– [SSK-MIL] MIL formulation using the original SSK.
– [SSK-T1] MIL formulation with the SSK modified to use word weights in order to reduce Type I bias.
– [BW-MIL] MIL formulation using a bag-of-words kernel.
– [SSK-SIL] SIL formulation using the original subsequence kernel:
  – Uses manually labeled instances from the test bags.
  – Train on instances from one positive bag and one negative bag; test on instances from the other two bags.
  – Average results over all four combinations.

38 Experimental Results: Evaluation
1) Plot precision vs. recall (PR) graphs by varying a threshold on the extraction confidence.
2) Report the area under the PR curve (AUC).
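A minimal sketch of this protocol with scikit-learn, on illustrative confidences and gold labels (not the talk's data):

```python
# Sweep a threshold over extraction confidences to get the PR curve,
# then report the area under it (AUC).
from sklearn.metrics import precision_recall_curve, auc

y_true = [1, 0, 1, 1, 0, 0, 1]                  # gold labels (illustrative)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2]    # extraction confidences

precision, recall, _ = precision_recall_curve(y_true, scores)
print("AUC under the PR curve: %.3f" % auc(recall, precision))
```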

39 Company Acquisitions
[PR curve figure]

40 Person–Birthplace
[PR curve figure]

41 Experimental Results: AUC
Dataset                 SSK-MIL  SSK-T1  BW-MIL  SSK-SIL
Company Acquisitions    76.9%    81.1%   45.8%   80.4%
People Birthplace       72.5%    78.2%   69.2%   73.4%
SSK-T1 is significantly more accurate than SSK-MIL.
SSK-T1 is competitive with SSK-SIL; however:
– SSK-T1 supervision = only 6 pairs (4 positive).
– SSK-SIL average supervision: ~500 manually labeled sentences (78 positive) for Acquisitions, ~300 manually labeled sentences (22 positive) for Birthplaces.

42 Applications & Extensions
A "Google Sets"-style system for relation extraction:
– Ideally, the user provides only positive pairs.
– Likely negative examples are created by pairing an argument entity with other named entities in the same sentence (a sketch follows below).
– Any pair of entities different from the relation pair is likely to be negative → implicit negative evidence.
Input:                        Output:
Google – YouTube              Pfizer – Rinat Neuroscience
Adobe Systems – Macromedia    Yahoo – Inktomi
Viacom – DreamWorks
Novartis – Eon Labs
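A sketch of generating implicit negatives from a sentence's entity mentions; the NER step that finds the other entities is assumed to have run upstream, and the pairing scheme is illustrative:

```python
# Pair each known argument with the sentence's other named entities;
# such pairs are likely negatives for the target relation.
def implicit_negatives(entities_in_sentence, positive_pair):
    a1, a2 = positive_pair
    negatives = []
    for e in entities_in_sentence:
        if e not in positive_pair:
            negatives.append((a1, e))  # a1 with a non-argument entity
            negatives.append((e, a2))  # a non-argument entity with a2
    return negatives

print(implicit_negatives(["Google", "YouTube", "Mark Cuban"],
                         ("Google", "YouTube")))
# -> [('Google', 'Mark Cuban'), ('Mark Cuban', 'YouTube')]
```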

43 Future Work
– Investigate methods for reducing Type II bias.
– Experiment with other, more sophisticated MIL algorithms.
– Explore the effect of Type I and Type II bias when using dependency information in the relation extraction kernel.

44 Conclusion
– Presented a new approach to relation extraction, trained using only a handful of pairs of entities known to exhibit or not exhibit the target relationship.
– Extended an existing subsequence kernel to resolve problems caused by the minimal supervision provided.
– The new MIL approach is competitive with its SIL counterpart, which uses significantly more human supervision.

