Named Entity Tagging with Conditional Random Fields

Ryan McDonald, Fernando Pereira and Fei Sha
Computer and Information Science, University of Pennsylvania

Goals
- Improve on the results of the current NE tagger used by UPenn ACE.
- Accomplish this through a Conditional Random Field model (Lafferty et al. 2001).
- Compare MaxEnt and CRFs in a controlled environment.

ACE Definition
- Find entities and classify them as Person, GPE (geo-political entity), Organization, Location and/or Facility.
- Example: “Bush took over the White House from the Clinton Administration”
  - Bush: Person
  - White House: Facility, GPE
  - The Clinton Administration: Organization
  - Clinton: Person
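The slides do not say how entity spans become per-token labels for a sequence tagger. A common choice, assumed here, is BIO encoding (B- begins an entity, I- continues it, O is outside). The hypothetical `spans_to_bio` below flattens the example sentence. One tag layer cannot hold both types of "White House" or the nested "Clinton", so this sketch keeps the outermost span and the first listed type.

```python
from typing import List, Tuple

# A span is (start_token, end_token_exclusive, label).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Flatten entity spans into one BIO tag per token.

    A single BIO layer cannot represent nested or multi-type entities, so this
    sketch keeps the outermost span (and, for identical spans, the first label
    listed); any span overlapping an already-tagged token is dropped.
    """
    tags = ["O"] * len(tokens)
    # Earlier start first; for equal starts, the longer (outer) span first.
    # sorted() is stable, so identical spans keep their listed order.
    for start, end, label in sorted(spans, key=lambda s: (s[0], -s[1])):
        if any(tag != "O" for tag in tags[start:end]):
            continue  # overlaps a span that was already kept
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


tokens = "Bush took over the White House from the Clinton Administration".split()
spans = [
    (0, 1, "Person"),
    (4, 6, "Facility"),
    (4, 6, "GPE"),            # same span, second type: dropped
    (7, 10, "Organization"),  # "the Clinton Administration"
    (8, 9, "Person"),         # nested "Clinton": dropped
]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```

The information lost here (second types, nested mentions) is exactly what a flat tagger cannot predict, whatever model sits on top of the encoding.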

MaxEnt vs. CRFs
Ran an MEMM tagger and a CRF tagger with:
- exactly the same features
- exactly the same training algorithm (limited-memory quasi-Newton)
- exactly the same training and test data
We have not used the Sept. test data yet, since more improvements are on the way.
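The comparison can be controlled because the two models can share one scoring function and differ only in normalization: an MEMM applies a softmax at each position given the previous tag, while a CRF normalizes once over all tag sequences (Lafferty et al. 2001). The sketch below makes that concrete with hypothetical toy scores standing in for weights times features; it is not the authors' code and does not show the quasi-Newton training step.

```python
import itertools
import math
from typing import Sequence

# Toy scores shared by BOTH models (hypothetical numbers, not from the slides):
#   start[y]    score of tag y at position 0
#   emit[t][y]  score of tag y at position t (stands in for w . f(x, t, y))
#   trans[a][b] score of moving from tag a to tag b


def logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def seq_score(tags: Sequence[int], emit, trans, start) -> float:
    """Unnormalized score of one tag sequence."""
    s = start[tags[0]] + emit[0][tags[0]]
    for t in range(1, len(tags)):
        s += trans[tags[t - 1]][tags[t]] + emit[t][tags[t]]
    return s


def memm_log_prob(tags, emit, trans, start) -> float:
    """MEMM: a separate softmax at each position, conditioned on the previous tag."""
    num_tags = len(start)
    total = 0.0
    for t, y in enumerate(tags):
        if t == 0:
            local = [start[k] + emit[0][k] for k in range(num_tags)]
        else:
            local = [trans[tags[t - 1]][k] + emit[t][k] for k in range(num_tags)]
        total += local[y] - logsumexp(local)
    return total


def crf_log_prob(tags, emit, trans, start) -> float:
    """CRF: one normalizer over all tag sequences, computed by the forward algorithm."""
    num_tags = len(start)
    alpha = [start[k] + emit[0][k] for k in range(num_tags)]
    for t in range(1, len(emit)):
        alpha = [logsumexp([alpha[a] + trans[a][b] for a in range(num_tags)]) + emit[t][b]
                 for b in range(num_tags)]
    return seq_score(tags, emit, trans, start) - logsumexp(alpha)


start = [0.0, 0.0]
emit = [[1.0, 0.0], [0.0, 2.0]]
trans = [[0.5, -1.0], [-0.5, 1.0]]
for tags in itertools.product(range(len(start)), repeat=len(emit)):
    p_memm = math.exp(memm_log_prob(tags, emit, trans, start))
    p_crf = math.exp(crf_log_prob(tags, emit, trans, start))
    print(tags, f"MEMM={p_memm:.3f}", f"CRF={p_crf:.3f}")
```

Both probability columns sum to 1 over the four sequences, but individual values differ, because the MEMM's per-position normalizers depend on the previous tag while the CRF's single normalizer does not.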

Features
- Word: unigram*
- 1-, 2-, 3- and 4-character suffixes: unigram and bigram
- Word length bins: unigram and bigram
- Word features defined by Tom's script: caps, numeric, etc.*
* Used in the original ACE system
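A per-token feature extractor along these lines is sketched below. Several details are assumptions, not taken from the slides: "bigram" is read as the conjunction of a feature's value at the previous and current token, suffixes are lower-cased, the length-bin boundaries are made up, and the `shape` function is only a rough stand-in for the caps/numeric flags of Tom's script, which is not available.

```python
from typing import Dict, List

# Hypothetical length bins; the slides do not give the actual boundaries.
LENGTH_BINS = [(1, 2), (3, 4), (5, 7), (8, 10), (11, 10**9)]


def length_bin(word: str) -> str:
    for lo, hi in LENGTH_BINS:
        if lo <= len(word) <= hi:
            return f"{lo}-{hi}"
    return "0"  # empty string


def shape(word: str) -> List[str]:
    """Rough stand-in for the caps/numeric flags of Tom's script (not available)."""
    feats = []
    if word[:1].isupper():
        feats.append("InitCap")
    if word.isupper():
        feats.append("AllCaps")
    if any(c.isdigit() for c in word):
        feats.append("HasDigit")
    if word.isdigit():
        feats.append("Numeric")
    return feats


def token_atoms(word: str) -> Dict[str, str]:
    """Atomic per-token values that are emitted as unigram and bigram features."""
    atoms = {"len": length_bin(word)}
    lowered = word.lower()
    for k in range(1, 5):
        atoms[f"suf{k}"] = lowered[-k:]  # 1- to 4-character suffixes
    return atoms


def features(tokens: List[str], i: int) -> List[str]:
    cur = token_atoms(tokens[i])
    prev = token_atoms(tokens[i - 1]) if i > 0 else {k: "<S>" for k in cur}
    feats = ["w=" + tokens[i].lower()]                      # word unigram
    for name, value in cur.items():
        feats.append(f"{name}={value}")                     # unigram
        feats.append(f"{name}[-1,0]={prev[name]}|{value}")  # bigram (assumed: prev & cur token)
    feats += ["shape=" + s for s in shape(tokens[i])]
    return feats


print(features("Bush took over the White House".split(), 0))
```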

MEMM vs. CRF
- Same feature set
- Same training algorithm

ACE vs. CRF
- Different feature sets (the CRF feature set is richer)

Summary
- These results and (Sha 2002) show that CRFs perform slightly better than MEMMs.
- A richer feature set leads to a larger improvement.
- Portable CRF and MEMM code, with conjugate gradient, limited-memory quasi-Newton and perceptron training.
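Of the three training methods listed, the perceptron is the easiest to show in a few lines. The sketch below is a plain (unaveraged) structured perceptron with Viterbi decoding; the feature set (word/tag and tag-bigram indicators), the function names and the two-sentence training set are all hypothetical, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, str, str]
Weights = Dict[Feature, float]


def local_feats(words: Sequence[str], i: int, prev_tag: str, tag: str) -> List[Feature]:
    # Hypothetical minimal feature set: word/tag and tag-bigram indicators.
    return [("w", words[i].lower(), tag), ("t", prev_tag, tag)]


def viterbi(words: Sequence[str], tags: Sequence[str], w: Weights) -> List[str]:
    """Highest-scoring tag sequence under a first-order linear model."""
    if not words:
        return []

    def score(i: int, prev_tag: str, tag: str) -> float:
        return sum(w.get(f, 0.0) for f in local_feats(words, i, prev_tag, tag))

    # best[tag] = (score of the best path ending in tag, that path)
    best = {t: (score(0, "<S>", t), [t]) for t in tags}
    for i in range(1, len(words)):
        best = {
            t: max(((s + score(i, p, t), path + [t]) for p, (s, path) in best.items()),
                   key=lambda cand: cand[0])
            for t in tags
        }
    return max(best.values(), key=lambda cand: cand[0])[1]


def global_feats(words: Sequence[str], seq: Sequence[str]) -> Dict[Feature, float]:
    """Feature counts summed over a whole tagged sentence."""
    counts: Dict[Feature, float] = defaultdict(float)
    prev = "<S>"
    for i, tag in enumerate(seq):
        for f in local_feats(words, i, prev, tag):
            counts[f] += 1.0
        prev = tag
    return counts


def train_perceptron(data, tags, epochs: int = 5) -> Weights:
    """On each mistake, add the gold features and subtract the predicted ones."""
    w: Weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, w)
            if pred != gold:
                for f, v in global_feats(words, gold).items():
                    w[f] += v
                for f, v in global_feats(words, pred).items():
                    w[f] -= v
    return w


TAGS = ["O", "PER"]
train = [("Bush took over".split(), ["PER", "O", "O"]),
         ("Clinton left".split(), ["PER", "O"])]
w = train_perceptron(train, TAGS)
print(viterbi("Bush left".split(), TAGS, w))
```

Averaging the weight vectors across updates usually generalizes better, but it is left out to keep the sketch short. The same `viterbi` decoder serves a CRF trained by conjugate gradient or quasi-Newton; only the way the weights are learned changes.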

Future and Current Work
- “Person” and “Organization” recall
- Multilayer taggers
- Name lists
- Document class information

Multilayer Taggers
- If entity information is known, it can lead to a 10-20% increase in F-score.
- The first layer of the tagger attempts to find generic entities.
  - It achieves an F-score of around 0.87.
- The second layer uses the entity information as a feature for each category classifier.
  - This leads to about a 2-5% increase in F-score.
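One way to wire up the two layers, sketched under assumptions: layer 1 is trained on labels collapsed to a single generic ENT type, and its predicted tags are added as features for the category-specific layer-2 classifiers. `collapse_to_generic`, `layer2_features` and `word_only` are hypothetical helpers; the slides do not specify the exact features used.

```python
from typing import Callable, List, Sequence

FeatureFn = Callable[[Sequence[str], int], List[str]]


def collapse_to_generic(tags: Sequence[str]) -> List[str]:
    """Map category BIO tags (B-Person, I-GPE, ...) to B-ENT / I-ENT / O for layer 1."""
    return [t if t == "O" else t[:2] + "ENT" for t in tags]


def layer2_features(tokens: Sequence[str], i: int, layer1_tags: Sequence[str],
                    base_features: FeatureFn) -> List[str]:
    """Base features plus layer-1 entity information for a layer-2 category classifier."""
    feats = list(base_features(tokens, i))
    tag = layer1_tags[i]
    prev = layer1_tags[i - 1] if i > 0 else "<S>"
    feats.append("L1=" + tag)               # is the token inside a generic entity?
    feats.append(f"L1[-1,0]={prev}|{tag}")  # entity boundary context
    if tag != "O":
        # Conjoin entity membership with the word ("inside an entity and word is 'house'").
        feats.append("L1&w=" + tokens[i].lower())
    return feats


def word_only(tokens: Sequence[str], i: int) -> List[str]:
    return ["w=" + tokens[i].lower()]


gold = ["O", "B-Facility", "I-Facility"]
layer1 = collapse_to_generic(gold)  # at test time these tags would come from the layer-1 tagger
print(layer2_features(["the", "White", "House"], 2, layer1, word_only))
```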

Name Lists
- Aim: increase recall for the Person and Organization categories.
- Name list size: 80,000
- Organization list size: 30,000
- Binary feature "is the token in the name list?": increases Person F-score to … (from 0.755)
- Binary feature "is the token in the organization list?": increases Organization F-score to … (from 0.569)
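The name-list features are binary indicators. The sketch below assumes a simple token-level match: a token fires a feature if it appears, lower-cased, in any entry of the list. The toy lists and helper names are hypothetical; the real lists held roughly 80,000 names and 30,000 organizations.

```python
from typing import Iterable, List, Sequence, Set


def build_token_set(entries: Iterable[str]) -> Set[str]:
    """Lower-cased set of every token occurring in some list entry (assumed matching rule)."""
    return {tok.lower() for entry in entries for tok in entry.split()}


def gazetteer_features(tokens: Sequence[str], i: int,
                       person_tokens: Set[str], org_tokens: Set[str]) -> List[str]:
    """Binary name-list features for token i."""
    word = tokens[i].lower()
    feats = []
    if word in person_tokens:
        feats.append("inPersonList")
    if word in org_tokens:
        feats.append("inOrgList")
    return feats


# Toy stand-ins for the real lists.
persons = build_token_set(["George Bush", "Bill Clinton"])
orgs = build_token_set(["United Nations", "Clinton Administration"])
sentence = "Bush took over the White House from the Clinton Administration".split()
for i, tok in enumerate(sentence):
    print(tok, gazetteer_features(sentence, i, persons, orgs))
```

Token-level matching is crude: with these toy lists "Clinton" fires both features, an ambiguity the tagger's other features must resolve.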

Name Lists
- Small name lists can lead to a substantial improvement in F-score, even though the features were simplistic.
- Investigating better name lists: an MT name list of 500,000 names and 50,000 organizations.
- Investigating more sophisticated features, such as frequency.

Document Class Features
- “Atlanta defeated Florida in extra innings ...”
  - Atlanta and Florida should be tagged as organizations but are mistakenly tagged as GPE.
  - If the document is classified as SPORTS, the NE classifier may recognize that names normally tagged GPE should be organizations.
- Currently beginning to look at state-of-the-art document classification algorithms.
- Document classes could provide a richer source of knowledge.
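A document class can be injected by conjoining it with the existing token features, so that weights for a capitalized place name can differ between SPORTS and other documents. The sketch assumes the document label comes from a separate classifier, which is not shown; `with_doc_class` is a hypothetical helper.

```python
from typing import List, Sequence


def with_doc_class(token_feats: Sequence[str], doc_class: str) -> List[str]:
    """Add the document class alone and conjoined with every token feature.

    doc_class is assumed to come from a separate document classifier (not shown).
    """
    prefix = "doc=" + doc_class
    return list(token_feats) + [prefix] + [f"{prefix}&{f}" for f in token_feats]


print(with_doc_class(["w=atlanta", "shape=InitCap"], "SPORTS"))
```

With these conjunctions, a learned weight on `doc=SPORTS&w=atlanta` can push "Atlanta" toward Organization in sports stories without changing how it is tagged elsewhere.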
