
1 IE with Dictionaries Cohen & Sarawagi

2 Announcements
Current statistics:
– days with unscheduled student talks: 2
– students with unscheduled student talks: 0
– Projects are due 4/28 (last day of class)
– Additional requirement: a draft (for comments) no later than 4/21

3 Finding names you know about
Problem: given a dictionary of names, find them in email text.
– An important task beyond email (biology, link analysis, ...)
– Exact match is unlikely to work perfectly, due to nicknames (Will Cohen), abbreviations (William C), misspellings (Willaim Chen), polysemous words (June, Bill), etc.
– In informal text it sometimes works very poorly.
– The problem is similar to the record linkage (aka data cleaning, de-duping, merge-purge, ...) problem of finding duplicate records across heterogeneous databases.

4 Finding names you know about
Technical problem: it is hard to combine state-of-the-art similarity metrics (as used in record linkage) with a state-of-the-art NER system, due to a representational mismatch. Opening up the box, modern NER systems don't really know anything about names...

5 IE as Sequential Word Classification
Example: "Yesterday Pedro Domingos spoke this example sentence." → person name: Pedro Domingos
A trained IE system models the relative probability of labeled sequences of words. To classify, find the most likely state sequence for the given words; any words generated by the designated "person name" state are extracted as a person name. (States here: person name, location name, background.)

6 IE as Sequential Word Classification
Modern IE systems use a rich representation for words, and clever probabilistic models of how labels interact in a sequence, but do not explicitly represent the names extracted.
[Figure: words w_{t-1}, w_t, w_{t+1} with labels O_{t-1}, O_t, O_{t+1}; example features for w_t = "Wisniewski":]
– identity of word is "Wisniewski"
– ends in "-ski"
– is capitalized
– is part of a noun phrase
– is in a list of city names
– is under node X in WordNet
– is in bold font
– is indented
– is in hyperlink anchor
– last person name was female
– next two words are "and Associates"

7 Semi-Markov models for IE (with Sunita Sarawagi, IIT Bombay)
– Train on sequences of labeled segments, not labeled words: S = (start, end, label)
– Build a probability model of segment sequences, not word sequences
– Define features f of segments, e.g. f(S) = words x_t ... x_u, length, previous words, case information, ..., distance to a known name
– (Approximately) optimize the feature weights on training data: maximize the total score W · F(x, S) of the labeled segmentations, where F(x, S) sums f over the segments
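The segment representation above can be sketched in a few lines of Python. The specific feature names here (length, previous word, case, dictionary membership) are illustrative assumptions echoing the slide's list, not the paper's exact feature set.

```python
from collections import namedtuple

# A labeled segment, as on the slide: S = (start, end, label).
Segment = namedtuple("Segment", ["start", "end", "label"])

def segment_features(words, seg, known_names=frozenset()):
    """Toy segment-level feature map f(S): the segment's words, its
    length, the previous word, case information, and whether the whole
    segment matches a known name. Feature names are hypothetical."""
    text = " ".join(words[seg.start:seg.end + 1])
    feats = {
        "words=" + text: 1.0,
        "length=%d" % (seg.end - seg.start + 1): 1.0,
        "prev=" + (words[seg.start - 1] if seg.start > 0 else "<s>"): 1.0,
        "all_capitalized=%s" % all(
            w[:1].isupper() for w in words[seg.start:seg.end + 1]): 1.0,
        "in_dictionary=%s" % (text in known_names): 1.0,
    }
    # Conjoin each feature with the segment's label, as is standard.
    return {seg.label + ":" + k: v for k, v in feats.items()}

words = "Yesterday Pedro Domingos spoke this example sentence".split()
f = segment_features(words, Segment(1, 2, "person"), {"Pedro Domingos"})
```

Note that features like `in_dictionary` apply to the segment as a whole, which is exactly what a word-at-a-time tagger cannot express directly.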

8 Details: Semi-Markov model

9 Segments vs tagging
Sentence (positions 1-8): Fred please stop by my office this afternoon
Tagging view — one label y_t per word x_t, features f(x_t, y_t):
Fred/Person, please/other, stop/other, by/other, my/Loc, office/Loc, this/other, afternoon/Time
Segment view — labeled segments (t_j, u_j, y_j), features f(x_j, y_j):
t_1=u_1=1 (Person: "Fred"); t_2=2, u_2=4 (other: "please stop by"); t_3=5, u_3=6 (Loc: "my office"); t_4=u_4=7 (other: "this"); t_5=u_5=8 (Time: "afternoon")
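The correspondence between the two views can be made concrete with a small helper (hypothetical, not from the paper) that collapses the per-word tag row into the segment row, using 1-based positions as on the slide:

```python
def tags_to_segments(tags):
    """Collapse a per-word tag sequence into (t, u, label) segments:
    each maximal run of identical labels becomes one segment.
    Positions are 1-based to match the slide's t_j, u_j notation."""
    segments, start = [], 0
    for i in range(1, len(tags) + 1):
        if i == len(tags) or tags[i] != tags[start]:
            segments.append((start + 1, i, tags[start]))
            start = i
    return segments

tags = ["Person", "other", "other", "other", "Loc", "Loc", "other", "Time"]
segs = tags_to_segments(tags)
# → [(1, 1, 'Person'), (2, 4, 'other'), (5, 6, 'Loc'),
#    (7, 7, 'other'), (8, 8, 'Time')]
```

The semi-Markov model works with the right-hand representation directly, so a feature can look at a whole run ("please stop by") at once.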

10 Details: Semi-Markov model

11 Conditional Semi-Markov models
CMM: P(y | x) = Π_t P(y_t | y_{t-1}, x)
CSMM: P(S | x) = Π_j P(s_j | s_{j-1}, x)

12 A training algorithm for CSMMs (1)
Review: Collins' perceptron training algorithm — compare the correct tags against the Viterbi tags under the current weights.
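Collins' update can be sketched as follows: decode with the current weights, then move the weights toward the correct sequence's features and away from the predicted sequence's. The feature names below are invented for illustration.

```python
from collections import defaultdict

def perceptron_update(weights, gold_feats, viterbi_feats):
    """One step of Collins' perceptron: add the feature vector of the
    correct tag sequence, subtract that of the Viterbi prediction.
    Feature maps are sparse dicts; if the prediction is correct, the
    two vectors cancel and no update happens."""
    for k, v in gold_feats.items():
        weights[k] += v
    for k, v in viterbi_feats.items():
        weights[k] -= v
    return weights

w = defaultdict(float)
gold = {"tag=Person,word=Pedro": 1.0, "trans=other->Person": 1.0}
pred = {"tag=other,word=Pedro": 1.0, "trans=other->other": 1.0}
perceptron_update(w, gold, pred)
```

The "voted" variant of the following slides averages (or votes among) the weight vectors seen during training rather than keeping only the final one.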

13 A training algorithm for CSMMs (2)
Variant of Collins' perceptron training algorithm: a voted perceptron learner for T_TRANS (decoding is Viterbi-like).

14 A training algorithm for CSMMs (3)
Variant of Collins' perceptron training algorithm: a voted perceptron learner for T_TRANS (decoding is Viterbi-like).

15 A training algorithm for CSMMs (4)
Variant of Collins' perceptron training algorithm: a voted perceptron learner for T_SEGTRANS (decoding is Viterbi-like).

16 Viterbi for HMMs
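The slide's figure did not survive extraction, so here is a minimal sketch of word-level Viterbi for an HMM, with a toy two-state person-name model; the probabilities are invented for the example.

```python
import math

def viterbi(words, states, log_trans, log_emit, log_init):
    """V[t][y] = best log-probability of a state sequence ending in
    state y at position t; backpointers recover the argmax path."""
    V = [{y: log_init[y] + log_emit[y].get(words[0], -1e9) for y in states}]
    back = []
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for y in states:
            prev = max(states, key=lambda yp: V[t - 1][yp] + log_trans[yp][y])
            back[-1][y] = prev
            V[t][y] = (V[t - 1][prev] + log_trans[prev][y]
                       + log_emit[y].get(words[t], -1e9))
    y = max(states, key=lambda s: V[-1][s])
    path = [y]
    for bp in reversed(back):
        y = bp[y]
        path.append(y)
    return list(reversed(path))

lp = math.log
states = ["name", "bg"]
init = {"name": lp(0.2), "bg": lp(0.8)}
trans = {"name": {"name": lp(0.6), "bg": lp(0.4)},
         "bg": {"name": lp(0.3), "bg": lp(0.7)}}
emit = {"name": {"Pedro": lp(0.4), "Domingos": lp(0.4), "spoke": lp(0.01)},
        "bg": {"Pedro": lp(0.01), "Domingos": lp(0.01), "spoke": lp(0.5)}}
path = viterbi(["Pedro", "Domingos", "spoke"], states, trans, emit, init)
# → ['name', 'name', 'bg']
```

The dynamic program is O(n · |states|²): each position considers every predecessor state exactly once.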

17 Viterbi for SMM
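The semi-Markov version searches over segmentations instead of single-word labelings: V[u][y] is the best score of any segmentation of the first u words whose last segment has label y, and the inner loop additionally ranges over segment lengths. The toy `score` function below is an invented stand-in for the learned W · f(S).

```python
def semi_markov_viterbi(words, labels, score, max_len=4):
    """V[u][y]: best score of a segmentation of words[:u] whose last
    segment has label y. score(t, u, y, y_prev) scores the segment
    words[t:u] labeled y following a segment labeled y_prev. Word-level
    Viterbi is the special case max_len=1."""
    n = len(words)
    V = {0: {None: 0.0}}
    back = {}
    for u in range(1, n + 1):
        V[u] = {}
        for y in labels:
            best = None
            for d in range(1, min(max_len, u) + 1):  # segment length
                t = u - d
                for y_prev, prev_score in V[t].items():
                    s = prev_score + score(t, u, y, y_prev)
                    if best is None or s > best[0]:
                        best = (s, t, y_prev)
            V[u][y] = best[0]
            back[(u, y)] = (best[1], best[2])
    # Trace back the best segmentation as (start, end, label) triples.
    u, y = n, max(labels, key=lambda l: V[n][l])
    segs = []
    while u > 0:
        t, y_prev = back[(u, y)]
        segs.append((t, u, y))
        u, y = t, y_prev
    return list(reversed(segs))

words = "Yesterday Pedro Domingos spoke".split()

def score(t, u, y, y_prev):
    # Invented segment scores: reward the full name as one segment.
    text = " ".join(words[t:u])
    if y == "name":
        return 2.0 if text == "Pedro Domingos" else -2.0
    return 0.5 * (u - t)  # background: mild per-word reward

segs = semi_markov_viterbi(words, ["name", "bg"], score)
# → [(0, 1, 'bg'), (1, 3, 'name'), (3, 4, 'bg')]
```

Bounding segment length by max_len keeps decoding O(n · max_len · |labels|²), only a constant factor slower than word-level Viterbi.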

18 Sample CSMM features

19 Experimental results
Baseline algorithms:
– HMM-VP/1: tags are "in entity", "other"
– HMM-VP/4: tags are "begin entity", "end entity", "continue entity", "unique", "other"
– SMM-VP: every feature f(w) has versions for "f(w) true for some w in the segment that is the first (last, any) word of the segment"
– Dictionaries (like Borthwick): for HMM-VP/1, f_D(w) = "word w is in D"; for HMM-VP/4, f_D,begin(w) = "word w begins an entity in D", etc.
– Plain dictionary lookup
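The Borthwick-style word-level dictionary features used by these baselines can be sketched as follows; this is a simplified illustration that only checks one- and two-word dictionary entries, whereas the real feature set handles entries of any length.

```python
def dictionary_features(words, i, D):
    """Word-level dictionary features as in the HMM-VP baselines:
    does word i match, begin, or end an entry of dictionary D?
    Simplified sketch: only 1- and 2-word entries are checked."""
    w = words[i]
    feats = set()
    if w in D:
        feats.add("unique_in_D")          # HMM-VP/1's f_D(w)
    if i + 1 < len(words) and w + " " + words[i + 1] in D:
        feats.add("begins_entity_in_D")   # HMM-VP/4's f_D,begin(w)
    if i > 0 and words[i - 1] + " " + w in D:
        feats.add("ends_entity_in_D")
    return feats

words = ["Yesterday", "Pedro", "Domingos", "spoke"]
D = {"Pedro Domingos", "June"}
```

In contrast, the SMM can test a whole candidate segment against D in one feature, which is the representational advantage the talk is arguing for.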

20 Datasets used
Small training sets (10% of the available data) were used in the experiments.

21 Results

22

23 Results: varying history

24 Results: changing the dictionary

25 Results: vs CRF

26

27

