Named Entity Tagging with Conditional Random Fields


1 Named Entity Tagging with Conditional Random Fields
Ryan McDonald, Fernando Pereira and Fei Sha
Computer and Information Science, University of Pennsylvania

2 Goals
Improve on the results of the current NE tagger used by UPenn ACE
Accomplish this through a Conditional Random Field model (Lafferty et al. 2001)
Compare MaxEnt and CRFs in a controlled environment

3 ACE Definition
Find entities and classify them as Person, GPE, Organization, Location and/or Facility
“Bush took over the White House from the Clinton Administration”
Bush: Person
White House: Facility, GPE
The Clinton Administration: Organization
Clinton: Person
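The example above has a mention with two types (White House) and a mention nested inside another (Clinton inside The Clinton Administration), so a flat one-tag-per-token encoding would lose information. A minimal sketch of one way to hold such annotations as token spans follows; the `Mention` class, its field names, and the whitespace tokenization are assumptions made for illustration, not part of the ACE tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int    # index of the first token (inclusive)
    end: int      # index one past the last token (exclusive)
    types: tuple  # one or more entity types for this span


# Whitespace tokenization is an assumption; the slides do not specify one.
tokens = "Bush took over the White House from the Clinton Administration".split()

mentions = [
    Mention(0, 1, ("Person",)),          # Bush
    Mention(4, 6, ("Facility", "GPE")),  # White House: two types on one span
    Mention(7, 10, ("Organization",)),   # the Clinton Administration
    Mention(8, 9, ("Person",)),          # Clinton, nested in the span above
]


def mention_text(mention):
    """Return the surface string covered by a mention."""
    return " ".join(tokens[mention.start:mention.end])


def is_nested(inner, outer):
    """True if `inner` lies strictly inside `outer`."""
    return (outer.start <= inner.start and inner.end <= outer.end
            and (inner.start, inner.end) != (outer.start, outer.end))
```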

4 MaxEnt vs. CRFs
Ran an MEMM tagger and a CRF tagger with:
The exact same features
The exact same training algorithm (limited-memory quasi-Newton)
The exact same training data and test data
Have not used the Sept. test data yet, since more improvements are on the way
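With features, training algorithm and data held fixed, what remains different between the two taggers is how the model is normalized. A sketch in standard notation follows; the symbols (weight vector $\mathbf{w}$, feature vector $\mathbf{f}$, label sequence $\mathbf{y}$, input $\mathbf{x}$ of length $T$) are not taken from the slides.

```latex
% MEMM: every transition is normalized locally, given the previous label
p_{\mathrm{MEMM}}(\mathbf{y} \mid \mathbf{x})
  = \prod_{t=1}^{T}
    \frac{\exp\big(\mathbf{w} \cdot \mathbf{f}(y_{t-1}, y_t, \mathbf{x}, t)\big)}
         {\sum_{y'} \exp\big(\mathbf{w} \cdot \mathbf{f}(y_{t-1}, y', \mathbf{x}, t)\big)}

% Linear-chain CRF: one global normalizer over all label sequences
p_{\mathrm{CRF}}(\mathbf{y} \mid \mathbf{x})
  = \frac{\exp\big(\sum_{t=1}^{T} \mathbf{w} \cdot \mathbf{f}(y_{t-1}, y_t, \mathbf{x}, t)\big)}
         {\sum_{\mathbf{y}'} \exp\big(\sum_{t=1}^{T} \mathbf{w} \cdot \mathbf{f}(y'_{t-1}, y'_t, \mathbf{x}, t)\big)}
```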

5 Features
Word: Unigram*
1-suffix, 2-suffix, 3-suffix and 4-suffix: Unigram and bigram
Word length bins: Unigram and bigram
Word features defined by Tom's script: Caps, Numeric, etc.*
* used in original ACE system
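The observation side of this feature set can be sketched as a small extractor. This is an illustration only: the bin edges, the feature-string format, and the definitions of "caps" and "numeric" are assumptions, since the slides give neither the bins nor the contents of Tom's script. The "unigram" and "bigram" qualifiers are not modeled here.

```python
def length_bin(word, edges=(2, 4, 6, 10)):
    """Map a word length to a coarse bin label (the edges are hypothetical)."""
    for edge in edges:
        if len(word) <= edge:
            return f"len<={edge}"
    return f"len>{edges[-1]}"


def token_features(word):
    """Return the set of observation features that fire for one token."""
    feats = {f"word={word}"}
    # Suffixes of length 1 to 4, skipping lengths longer than the word.
    for n in range(1, 5):
        if len(word) >= n:
            feats.add(f"suffix{n}={word[-n:]}")
    feats.add(length_bin(word))
    # Stand-ins for the script-defined word features (assumed definitions).
    if word[:1].isupper():
        feats.add("caps")
    if any(ch.isdigit() for ch in word):
        feats.add("numeric")
    return feats
```

For instance, `token_features("Bush")` includes the word identity, four suffix features, a length bin and `caps`, but not `numeric`.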

6 MEMM vs. CRF
Same feature set
Same training algorithm

7 ACE vs. CRF
Different feature sets (CRF is richer)

8 Summary
These results and (Sha 2002) show that CRFs perform slightly better than MEMMs
A richer feature set leads to a larger improvement
Portable CRF, MEMM code
Conjugate Gradient, Limited-Memory Quasi-Newton, Perceptron

9 Future and Current Work
“Person” and “Organization” recall
Multilayer taggers
Name lists
Document class information

10 Multilayer Taggers
If entity information is known, it can lead to a 10-20% increase in F-Score
First layer of the tagger attempts to find generic entities
Can achieve an F-Score of around 0.87
Second layer uses the entity information as a feature for each category classifier
Leads to about a 2-5% increase in F-Score
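The hand-off between the two layers can be sketched as a feature-augmentation step: the first layer's generic entity decisions become extra input features for the second layer. The function name, the tag values (`"ENT"`, `"O"`) and the use of the previous token's tag are assumptions for illustration; the slides do not describe how the first-layer output is encoded.

```python
def second_layer_features(base_features, generic_tags):
    """Add first-layer output as extra features for each token.

    base_features: list of sets of feature strings, one set per token.
    generic_tags:  list of first-layer tags, one per token
                   (hypothetical tag set, e.g. "ENT" or "O").
    """
    if len(base_features) != len(generic_tags):
        raise ValueError("need exactly one first-layer tag per token")
    augmented = []
    for i, feats in enumerate(base_features):
        extended = set(feats)  # copy so the caller's sets are untouched
        extended.add(f"layer1={generic_tags[i]}")
        if i > 0:
            # Previous token's first-layer tag (an assumed extra feature).
            extended.add(f"layer1[-1]={generic_tags[i - 1]}")
        augmented.append(extended)
    return augmented
```

Each category classifier in the second layer would then be trained on the augmented feature sets instead of the base ones.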

11 Name Lists
Aim is to increase recall for the person and organization categories
Name list size: 80,000
Organization list size: 30,000
Binary feature: is token in name list?
Increase Person F-Score to (From 0.755)
Binary feature: is token in organization list?
Increase Person F-Score to (From 0.569)
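The two binary features amount to set-membership tests per token. A minimal sketch follows, assuming the lists are available as plain collections of strings and that matching is case-insensitive on single tokens; both are assumptions, as are the function and feature names.

```python
def gazetteer_features(tokens, name_list, org_list):
    """Return, for each token, the binary list-membership features that fire."""
    # Sets give constant-time lookup, which matters for lists of this size.
    names = {entry.lower() for entry in name_list}
    orgs = {entry.lower() for entry in org_list}
    features = []
    for token in tokens:
        fired = set()
        key = token.lower()  # case-insensitive match (an assumption)
        if key in names:
            fired.add("in_name_list")
        if key in orgs:
            fired.add("in_org_list")
        features.append(fired)
    return features
```

A call such as `gazetteer_features(["Clinton", "visited", "Ford"], ["Clinton", "Ford"], ["Ford"])` marks the first token as a name, the second as neither, and the third as both.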

12 Name Lists
Small name lists can lead to a substantial improvement in F-Score
Even though the features were simplistic
Investigating better name lists
MT name list of 500,000 names and 50,000 orgs
Investigating more sophisticated features, e.g. frequency

13 Document Class Features
“Atlanta defeated Florida in extra innings ...”
Atlanta and Florida should be tagged as organizations
Mistakenly tagged as GPE
If the document is classified as SPORTS, the NE classifier may recognize that things normally tagged GPE should be orgs
Currently beginning to look at state-of-the-art document classification algorithms
Could provide a richer source of knowledge
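One way the document class could reach the NE classifier is as a feature conjoined with each token feature, so that "Atlanta in a SPORTS document" can receive a different weight from "Atlanta" alone. This is a sketch under that assumption; the slides describe the idea but not an encoding, and the function name and feature-string format are hypothetical.

```python
def add_document_class(token_feature_sets, doc_class):
    """Conjoin every token feature with the document class label.

    token_feature_sets: list of sets of feature strings, one set per token.
    doc_class: label from a document classifier, e.g. "SPORTS".
    """
    augmented = []
    for feats in token_feature_sets:
        extended = set(feats)
        # The class on its own, plus one conjunction per existing feature.
        extended.add(f"doc={doc_class}")
        extended.update(f"{feat}&doc={doc_class}" for feat in feats)
        augmented.append(extended)
    return augmented
```

Conjoining every feature doubles the size of each feature set; restricting the conjunction to word-identity features would be a cheaper variant.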

