Lecture 14 Relation Extraction


1 Lecture 14 Relation Extraction
CSCE Natural Language Processing
Lecture 14 Relation Extraction
Topics: Relation Extraction
Readings: Chapter 22, NLTK
March 4, 2013

2 Overview
Last Time
    NER
    NLTK chunking: Example 7.4 (code_chunker1.py); chinking: Example 7.5 (code_chinker.py)
    Evaluation: Example 7.8 (code_unigram_chunker.py), Example 7.9 (code_classifier_chunker.py)
Today
    Relation extraction
    ACE: Freebase, DBPedia
    Ontological relations
    Rules for IS-A extracting
    Supervised relation extraction
    Relation bootstrapping
    Unsupervised relation extraction
    NLTK 7.5 Named Entity Recognition
Readings
    NLTK Ch

3 Dear Dr. Mathews,
I have the following questions:
1. (c) Do you need the regular expression that will capture the link inside href="..."?
   (d) What kind of description you want? It is a python function with no argument. Do you want answer like that?
3. (f-g) Do you mean top 100 in terms of count?
4. (e-f) You did not show how to use nltk for HMM and Brill tagging. Can you please give an example?
-Thanks

4 Relation Extraction
What is relation extraction?
"Founded in 1801 as South Carolina College, USC is the flagship institution of the University of South Carolina System and offers more than 350 programs of study leading to bachelor's, master's, and doctoral degrees from fourteen degree-granting colleges and schools to an enrollment of approximately 45,251 students, 30,967 on the main Columbia campus. …" [wiki]
Complex relation = summarization
Focus on binary relations: predicate(subject, object), or triples <subj predicate obj>

5 Wiki Info Box – structured data
Template: standard things about universities: established, type, faculty, students, location, mascot

6 Focus on extracting binary relations
predicate(subject, object) from predicate logic
triples <subj relation object>
Directed graphs
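
A minimal sketch of the triple view, assuming a few facts read off the USC passage quoted earlier. The relation names (`founded_in`, `flagship_of`, `main_campus_in`) are made up for illustration and are not taken from ACE or UMLS. Each triple becomes a labeled edge in a directed graph:

```python
from collections import defaultdict

# Triples <subj, relation, obj>; relation names are illustrative only.
triples = [
    ("USC", "founded_in", "1801"),
    ("USC", "flagship_of", "University of South Carolina System"),
    ("USC", "main_campus_in", "Columbia"),
]

def build_graph(triples):
    """Store each triple as a labeled, directed edge subj -> obj."""
    graph = defaultdict(list)
    for subj, relation, obj in triples:
        graph[subj].append((relation, obj))
    return graph

def objects_of(graph, subj, relation):
    """Answer predicate(subj, ?) by following outgoing edges with that label."""
    return [obj for rel, obj in graph.get(subj, []) if rel == relation]

graph = build_graph(triples)
print(objects_of(graph, "USC", "founded_in"))
```

The edges are directed: `objects_of(graph, "Columbia", "main_campus_in")` returns an empty list because nothing was stored with Columbia as the subject.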

7 Why relation extraction?
Create new structured KBs
Augment existing ones: words -> WordNet, facts -> Freebase or DBPedia
Support question answering: Jeopardy
Which relations?
Automated Content Extraction (ACE): 17 relations
ACE examples

8 Unified Medical Language System (UMLS)
UMLS: Unified Medical Language System
134 entity types, 54 relations

9 UMLS semantic network

10 Current Relations in the UMLS Semantic Network
isa
associated_with
    physically_related_to
        part_of
        consists_of
        contains
        connected_to
        interconnects
        branch_of
        tributary_of
        ingredient_of
    spatially_related_to
        location_of
        adjacent_to
        surrounds
        traverses
    functionally_related_to
        affects
            …
            …
    temporally_related_to
        co-occurs_with
        precedes
    conceptually_related_to
        evaluation_of
        degree_of
        analyzes
            assesses_effect_of
        measurement_of
        measures
        diagnoses
        property_of
        derivative_of
        developmental_form_of
        method_of
        …

11 Databases of Wikipedia Relations
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and to make this information readily available.
DBpedia allows you to make sophisticated queries.

12 English version of the DBpedia knowledge base
3.77 million things, of which 2.35 million are classified in an ontology, including 764,000 persons, 573,000 places (including 387,000 populated places), 333,000 creative works (including 112,000 music albums, 72,000 films and 18,000 video games), 192,000 organizations (including 45,000 companies and 42,000 educational institutions), 202,000 species and 5,500 diseases.

13 freebase google (freebase wiki)

14 Ontological relations
IS-A (hypernym)
Instance-of
has-Part
hyponym (opposite of hypernym)

15 How to build extractors

16 Extracting IS_A relation
(Hearst 1992) Automatic acquisition of hypernyms
Naproxen sodium is a nonsteroidal anti-inflammatory drug (NSAID). [wiki]

17 Hearst's Patterns for IS-A extracting
Patterns for <X IS-A Y>:
“Y such as X”
“X or other Y”
“Y including X”
“Y, especially X”
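
The four patterns above can be tried out with plain regular expressions. This is only a sketch under a strong assumption: each noun phrase is approximated by a single word (`\w+`), whereas a real extractor would match over NP chunks. The example sentences and the names `HEARST_PATTERNS` and `extract_isa` are invented for illustration.

```python
import re

# One regex per Hearst pattern; an NP is crudely approximated by one word.
HEARST_PATTERNS = [
    re.compile(r"(?P<y>\w+) such as (?P<x>\w+)"),    # "Y such as X"
    re.compile(r"(?P<x>\w+) or other (?P<y>\w+)"),   # "X or other Y"
    re.compile(r"(?P<y>\w+),? including (?P<x>\w+)"),  # "Y including X"
    re.compile(r"(?P<y>\w+), especially (?P<x>\w+)"),  # "Y, especially X"
]

def extract_isa(text):
    """Return a sorted list of (X, Y) pairs, read as X IS-A Y."""
    pairs = set()
    for pattern in HEARST_PATTERNS:
        for match in pattern.finditer(text):
            pairs.add((match.group("x"), match.group("y")))
    return sorted(pairs)

text = ("He prescribed drugs such as naproxen. "
        "Avoid ibuprofen or other painkillers. "
        "She plays instruments, especially guitar.")
print(extract_isa(text))
```

Because only one word is captured on each side, multiword phrases such as "naproxen sodium" would be truncated; that is the main reason to run these patterns over chunker output instead of raw text.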

18 Extracting Richer Relations
Extracting richer relations using specific rules
Intuition: relations that commonly hold: located-in, cures, owns
What relations hold between two entities?

19 Fig 22.16 Pattern and Bootstrapping

20 Hand-built patterns for relations
Pros
Cons

21 Supervised Relation Extraction
How to do classification in supervised relation extraction:
1. Find all pairs of named entities
2. Decide if they are related
3.
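
The first two steps can be sketched as follows. The sentence, the mention tuples, and the trigger-word rule are all hypothetical; the rule is only a placeholder for the trained binary classifier that a supervised system would use at step 2.

```python
from itertools import combinations

# Hypothetical sentence; mentions are (text, entity type, token index).
tokens = ["Mary", "works", "for", "USC", "in", "Columbia"]
mentions = [("Mary", "PER", 0), ("USC", "ORG", 3), ("Columbia", "LOC", 5)]

# Toy stand-in for a trained classifier: a small set of trigger words.
TRIGGERS = {"works", "founded"}

def candidate_pairs(mentions):
    """Step 1: all pairs of named-entity mentions in the sentence."""
    return list(combinations(mentions, 2))

def looks_related(m1, m2, tokens):
    """Step 2 (toy rule): related if a trigger word lies between the mentions."""
    between = tokens[m1[2] + 1 : m2[2]]
    return any(tok in TRIGGERS for tok in between)

for m1, m2 in candidate_pairs(mentions):
    print(m1[0], m2[0], looks_related(m1, m2, tokens))
```

With three mentions there are three candidate pairs; the toy rule accepts the two pairs that span "works" and rejects (USC, Columbia).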

22 ACE- Automated Content Extraction
Linguistic Data Consortium
Entity Detection and Tracking (EDT)
Relation Detection and Characterization (RDC)
Event Detection and Characterization (EDC)
6 classes of relations, 17 overall

23 Word features for relation Extraction
Headwords of M1 and M2
Named entity type and mention level features for relation extraction
Mention levels: name, pronoun, nominal
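
A small sketch of how these features might be collected for one mention pair. The example sentence and the dictionary layout are assumptions, and the headword is approximated as the last token of the mention, which is a simplification (a parser-based head finder would be more accurate).

```python
def word_features(tokens, m1, m2):
    """Build a feature dict for the mention pair (M1, M2).

    Each mention is a dict with 'start' and 'end' (token span, end exclusive),
    'type' (named entity type) and 'level' (name, pronoun, or nominal).
    """
    head1 = tokens[m1["end"] - 1]  # assumption: head = last token of the mention
    head2 = tokens[m2["end"] - 1]
    return {
        "head_m1": head1,
        "head_m2": head2,
        "heads": head1 + "-" + head2,
        "type_m1": m1["type"],
        "type_m2": m2["type"],
        "types": m1["type"] + "-" + m2["type"],
        "level_m1": m1["level"],
        "level_m2": m2["level"],
    }

tokens = ("American Airlines , a unit of AMR , immediately matched "
          "the move , spokesman Tim Wagner said").split()
m1 = {"start": 0, "end": 2, "type": "ORG", "level": "name"}    # American Airlines
m2 = {"start": 14, "end": 16, "type": "PER", "level": "name"}  # Tim Wagner

features = word_features(tokens, m1, m2)
print(features)
```

A dictionary like this is the usual input shape for a feature-based classifier; the concatenated `heads` and `types` entries let a linear model pick up on the combination as well as the individual values.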

24 Parse Features for Relation Extraction
Base syntactic chunk sequence from one to the other
Constituent path
Dependency path

25 Gazetteer and trigger word features for relation extraction
Trigger list for kinship relations
Gazetteer: name list

26 Evaluation of Supervised Relation Extraction
P/R/F
Summary
+ high accuracies
- training set needed; models are brittle and don't generalize well
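
For reference, P/R/F over extracted relations can be computed by comparing the predicted triples with a gold-standard set. The triples below are invented for illustration, and the function name `precision_recall_f1` is an assumption.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and balanced F-measure over sets of relation triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold = {
    ("USC", "located_in", "Columbia"),
    ("USC", "founded_in", "1801"),
    ("Mary", "works_for", "USC"),
    ("Tim Wagner", "spokesman_for", "American Airlines"),
}
predicted = {
    ("USC", "located_in", "Columbia"),
    ("Mary", "works_for", "USC"),
    ("Mary", "located_in", "Columbia"),  # a false positive
}

p, r, f = precision_recall_f1(predicted, gold)
print("P=%.3f R=%.3f F=%.3f" % (p, r, f))
```

Here two of the three predictions are correct (P = 2/3) and two of the four gold triples are found (R = 1/2), which gives F = 4/7.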

27 Semi-Supervised Relation Extraction
Seed-based or bootstrapping approaches to RE
No training set
Can you … do anything?
Bootstrapping

28 Relation Bootstrapping
(Hearst 1992)
Gather seed pairs of relation R
Iterate:
    find sentences with the pairs
    look at the context...
    use patterns to search for more pairs

29 Bootstrapping Example

30 Extract <author, book> pairs
DIPRE: start with seeds
Find instances
Extract patterns
Now iterate
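
The loop above (seeds, instances, patterns, iterate) can be sketched in a few lines. This is a toy in the spirit of bootstrapping, not the DIPRE algorithm itself: a pattern is just the text between author and book, an entity is approximated by a run of capitalized words, and the four-sentence corpus is invented.

```python
import re

# Crude stand-in for an entity recognizer: a run of capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

def find_patterns(corpus, pairs):
    """Collect the text found between the two members of each known pair."""
    patterns = set()
    for sentence in corpus:
        for author, book in pairs:
            match = re.search(re.escape(author) + r"(.+?)" + re.escape(book), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns

def apply_patterns(corpus, patterns):
    """Use each collected context to find new (author, book) pairs."""
    found = set()
    for middle in patterns:
        regex = re.compile("(%s)%s(%s)" % (NAME, re.escape(middle), NAME))
        for sentence in corpus:
            for author, book in regex.findall(sentence):
                found.add((author, book))
    return found

def bootstrap(corpus, seeds, iterations=2):
    """Alternate between extracting patterns and extracting pairs."""
    pairs = set(seeds)
    for _ in range(iterations):
        pairs |= apply_patterns(corpus, find_patterns(corpus, pairs))
    return pairs

corpus = [
    "Mark Twain wrote Huckleberry Finn.",
    "Jane Austen wrote Emma.",
    "Jane Austen is the author of Emma.",
    "Herman Melville is the author of Moby Dick.",
]
seeds = {("Mark Twain", "Huckleberry Finn")}
print(sorted(bootstrap(corpus, seeds)))
```

The first iteration learns " wrote " from the seed and finds (Jane Austen, Emma); the second learns " is the author of " from that new pair and finds (Herman Melville, Moby Dick). Nothing here scores patterns, so one bad pattern would pull in bad pairs on every later iteration.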

31 Snowball Algorithm (Agichtein, Gravano 2000)
Distant supervision
Distant supervision paradigm
Like classified

32 Unsupervised relation extraction
Banko et al. 2007, “Open information extraction from the Web”
Extracting relations from the web with no training data and no predetermined list of relations
The Open Approach:
    Use parse data to train a “trust-worthy” classifier
    Extract trustworthy relations among NPs
    Rank relations based on text redundancy
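
The last step, ranking by redundancy, can be illustrated with a simple count of how often each triple was extracted. Raw counting is an assumption made here for brevity; it is a simplification of whatever scoring a real open IE system applies, and the extractions are invented.

```python
from collections import Counter

# Hypothetical extractions pooled from many sentences (duplicates are expected).
extractions = [
    ("Edison", "invented", "the phonograph"),
    ("Edison", "invented", "the phonograph"),
    ("Edison", "invented", "the phonograph"),
    ("Paris", "is the capital of", "France"),
    ("Paris", "is the capital of", "France"),
    ("Edison", "invented", "the telephone"),  # seen once, and wrong
]

# Triples seen more often across the text are ranked higher.
ranked = Counter(extractions).most_common()
for triple, count in ranked:
    print(count, triple)
```

The idea is that a relation asserted in many independent sentences is more likely to be correct than one seen once.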

33 Evaluation of Semi-supervised and Unsupervised RE
No gold standard ... the web is not tagged
No way to compute precision or recall
Instead, only estimate precision: draw a sample and check precision manually
Alternatively, choose several levels of recall and check the precision there
No way to check the recall? Randomly select a text sample and check manually
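
A sketch of both estimates, under stated assumptions: `judge` stands in for the manual check (a person deciding whether an extraction is correct), and precision over the top k ranked extractions is used as a rough proxy for "precision at a given level of recall". The data is invented, with each hand label stored next to its extraction.

```python
import random

def estimate_precision(extractions, judge, sample_size, seed=0):
    """Estimate precision from a random sample that is checked by hand."""
    if not extractions:
        return 0.0
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(list(extractions), min(sample_size, len(extractions)))
    return sum(1 for item in sample if judge(item)) / len(sample)

def precision_at_k(ranked_extractions, judge, k):
    """Precision over the k highest-ranked extractions."""
    top = ranked_extractions[:k]
    return sum(1 for item in top if judge(item)) / len(top) if top else 0.0

# Hypothetical ranked output; the second field is the manual judgment.
ranked = [("r1", True), ("r2", True), ("r3", False),
          ("r4", True), ("r5", False), ("r6", False)]

def judge(item):
    """Stand-in for a human annotator: read the stored label."""
    return item[1]

print(precision_at_k(ranked, judge, 2))
print(precision_at_k(ranked, judge, 4))
print(estimate_precision(ranked, judge, sample_size=3))
```

Precision usually drops as k grows, which is why reporting it at several cutoffs is more informative than a single number.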

34 NLTK Info. Extraction

35 NLTK Review
NLTK 7.1-7.3
Chunking: Example 7.4 (code_chunker1.py)
Chinking: Example 7.5 (code_chinker.py), simple re_chunker
Evaluation: Example 7.8 (code_unigram_chunker.py), Example 7.9 (code_classifier_chunker.py)

36 Review 7.4: Simple Noun Phrase Chunker
import nltk

grammar = r"""
  NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and nouns
      {<NNP>+}                # chunk sequences of proper nouns
"""
cp = nltk.RegexpParser(grammar)
sentence = [("Rapunzel", "NNP"), ("let", "VBD"), ("down", "RP"),
            ("her", "PP$"), ("long", "JJ"), ("golden", "JJ"), ("hair", "NN")]
print(cp.parse(sentence))

37 (S (NP Rapunzel/NNP) let/VBD down/RP (NP her/PP$ long/JJ golden/JJ hair/NN))

38 Review 7.5: Simple Noun Phrase Chinker
import nltk

grammar = r"""
  NP:
    {<.*>+}          # Chunk everything
    }<VBD|IN>+{      # Chink sequences of VBD and IN
  """
sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
cp = nltk.RegexpParser(grammar)
print(cp.parse(sentence))

39 >>> (S (NP the/DT little/JJ yellow/JJ dog/NN) barked/VBD at/IN (NP the/DT cat/NN))

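40 RegExp Chunker – conll2000
import nltk
from nltk.corpus import conll2000

# Baseline: an empty grammar creates no chunks at all
cp = nltk.RegexpParser("")
test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
print(cp.evaluate(test_sents))

# Chunk runs of tags that start with C, D, J, N or P (e.g. CD, DT, JJ, NN, PRP)
grammar = r"NP: {<[CDJNP].*>+}"
cp = nltk.RegexpParser(grammar)
print(cp.evaluate(test_sents))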

41 ChunkParse score
Empty grammar:
    IOB Accuracy: 43.4%
    Precision: 0.0%
    Recall: 0.0%
    F-Measure: 0.0%
Grammar NP: {<[CDJNP].*>+}
    IOB Accuracy: 87.7%
    Precision: 70.6%
    Recall: 67.8%
    F-Measure: 69.2%
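
Why a chunker that finds nothing can still score 43.4% IOB accuracy with 0.0% precision and recall: accuracy is counted per token tag, so every token correctly left outside a chunk (tag O) counts, while precision and recall are counted per whole chunk. The sketch below is a hand-rolled illustration on a made-up tag sequence, not NLTK's own scorer.

```python
def chunks(tags):
    """Return the set of (start, end) NP spans in an IOB tag sequence."""
    spans, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final chunk
        if start is not None and tag != "I-NP":
            spans.add((start, i))
            start = None
        if tag == "B-NP":
            start = i
    return spans

def evaluate(gold, predicted):
    """Per-token IOB accuracy plus per-chunk precision and recall."""
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    gold_chunks, pred_chunks = chunks(gold), chunks(predicted)
    correct = len(gold_chunks & pred_chunks)
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    return accuracy, precision, recall

gold = ["B-NP", "I-NP", "O", "O", "B-NP", "I-NP", "O"]
baseline = ["O"] * len(gold)  # what an empty grammar produces

print(evaluate(gold, baseline))
```

On this toy sequence the baseline gets 3 of 7 tags right but finds no chunks, mirroring the first block of scores on the slide.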

42 Conference on Computational Natural Language Learning
Conference on Computational Natural Language Learning (CoNLL-2000)
CoNLL 2013: Seventeenth Conference on Computational Natural Language Learning

43 Evaluation Example 7.8 (code_unigram_chunker.py)
AttributeError: 'module' object has no attribute 'conlltags2tree'

44 code_classifier_chunker.py
NLTK was unable to find the megam file! Use software specific configuration paramaters or set the MEGAM environment variable. For more information, on megam, see: <

45 7.4 Recursion in Linguistic Structure

46 code_cascaded_chunker
import nltk

grammar = r"""
  NP: {<DT|JJ|NN.*>+}          # Chunk sequences of DT, JJ, NN
  PP: {<IN><NP>}               # Chunk prepositions followed by NP
  VP: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
  CLAUSE: {<NP><VP>}           # Chunk NP, VP
  """
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(cp.parse(sentence))

47 >>> (S (NP Mary/NN) saw/VBD (CLAUSE (NP the/DT cat/NN) (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

48 A sentence having deeper nesting
sentence = [("John", "NNP"), ("thinks", "VBZ"), ("Mary", "NN"), ("saw", "VBD"),
            ("the", "DT"), ("cat", "NN"), ("sit", "VB"), ("on", "IN"),
            ("the", "DT"), ("mat", "NN")]
print(cp.parse(sentence))

(S (NP John/NNP) thinks/VBZ (NP Mary/NN) saw/VBD (CLAUSE (NP the/DT cat/NN) (VP sit/VB (PP on/IN (NP the/DT mat/NN)))))

49 Trees
>>> print(tree4[1])
(VP chased (NP the rabbit))
>>> tree4[1].label()    # tree4[1].node in NLTK 2
'VP'
>>> tree4.leaves()
['Alice', 'chased', 'the', 'rabbit']
>>> tree4[1][1][1]
'rabbit'
>>> tree4.draw()

50 Trees - code_traverse.py
import nltk

def traverse(t):
    try:
        t.label()                  # t.node in NLTK 2
    except AttributeError:
        print(t, end=" ")          # t is a leaf (a plain string)
    else:
        # Now we know that t is a Tree with a label
        print('(', t.label(), end=" ")
        for child in t:
            traverse(child)
        print(')', end=" ")

t = nltk.Tree.fromstring('(S (NP Alice) (VP chased (NP the rabbit)))')
traverse(t)

51 NLTK 7.5 Named Entity Recognition
import nltk

sent = nltk.corpus.treebank.tagged_sents()[22]
print(nltk.ne_chunk(sent, binary=True))
