PRIS at Slot Filling in KBP 2012: An Enhanced Adaboost Pattern-Matching System Yan Li Beijing University of Posts and Telecommunications


1 PRIS at Slot Filling in KBP 2012: An Enhanced Adaboost Pattern-Matching System Yan Li Beijing University of Posts and Telecommunications buptliyan@gmail.com

2 Outline
- Introduction
- Preprocessing
- Entity expansion
- Pattern bootstrapping
- Post-processing
- Evaluation results
- Conclusion

3 Introduction: the framework

4 Preprocessing
NLP (the Stanford CoreNLP toolkit)
- POS tagger
- NER
- Date and time expression recognition
- Dependency parser
- Coreference resolution

5 Preprocessing (cont'd)
Example: Takeshi Watanabe, the first president of the ADB, died in his native Japan.
The categorization of slots

6 PER

Domain   Slots
PER      alternate_names; spouses; children; parents; siblings; other_family
ORG      member_of; employee_of
LOC      country/state/city_of_birth/death/residence
DATE     date_of_birth/death
NUM      age
ORI      origin
REL      religion
SCHOOL   schools_attended
CAUSE    cause_of_death
TITLE    titles
CHARGE   charges

ORG

Domain   Slots
PER      alternate_names; members; shareholders; founded_by; top_members/employees
ORG      parents; members; member_of; shareholders; subsidiaries
LOC      member_of; country/state/city_of_headquarters
DATE     founded; dissolved
NUM      number_of_employees/members
URL      website
REL      political/religious_affiliation


8 Entity Expansion
Relevant documents refer to an entity through coreferent mentions and alternate names. Expanding the query entity with them improves recall.
Scheme 1 (PER & ORG): coreference resolution
- Coreference chains produced by Stanford CoreNLP
- Example:

9 Entity Expansion (cont'd)
Scheme 2 (PER & ORG): identifying alternate names
- Rule-based information extraction
- Explanatory names in parentheses
- Example: Starr International Co., known as SICO, ...
Scheme 3 (ORG)
- Removing the corporate suffixes in queries
- Finding the acronyms or full expressions
- Example: Norwegian University of Science and Technology (NTNU)
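The rules in Schemes 2 and 3 could be approximated with a few regular expressions. The following is a minimal sketch, not the PRIS implementation: the suffix list, the function names and the exact "known as" wording are assumptions made for illustration.

```python
import re

# Hypothetical suffix list; the slides do not say which suffixes were removed.
CORPORATE_SUFFIXES = ("Co.", "Corp.", "Inc.", "Ltd.", "LLC")


def strip_corporate_suffix(name):
    """Drop one trailing corporate suffix, e.g. 'Starr International Co.'."""
    for suffix in CORPORATE_SUFFIXES:
        if name.endswith(" " + suffix):
            return name[: -len(suffix) - 1].rstrip(" ,")
    return name


def parenthetical_aliases(entity, text):
    """Return strings that appear in parentheses right after the entity."""
    pattern = re.compile(re.escape(entity) + r"\s*\(([^()]+)\)")
    return [m.group(1).strip() for m in pattern.finditer(text)]


def known_as_aliases(entity, text):
    """Return aliases introduced by ', known as X' right after the entity."""
    pattern = re.compile(
        re.escape(entity) + r",\s+(?:also\s+)?known as\s+([^,.;]+)"
    )
    return [m.group(1).strip() for m in pattern.finditer(text)]


def expand_entity(entity, text):
    """Collect the query name plus any alternate names found in the text."""
    names = {entity, strip_corporate_suffix(entity)}
    for name in list(names):
        names.update(parenthetical_aliases(name, text))
        names.update(known_as_aliases(name, text))
    return names
```

For the two examples on the slide, `expand_entity("Starr International Co.", "Starr International Co., known as SICO, sold shares.")` adds both the suffix-free name and "SICO", and `parenthetical_aliases` picks up "NTNU" from a sentence that contains "Norwegian University of Science and Technology (NTNU)".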


11 Pattern Bootstrapping: Workflow
Ralph Grishman and Bonan Min, “New York University KBP 2010 Slot-Filling System”, 2010.

12 Pattern Bootstrapping: Seed Pairs
The KBP English Monolingual Slot Filling evaluation data from the past three years
- 92 PER entities
- 106 ORG entities
- 1,627 entity-value pairs

13 Pattern Bootstrapping: Pattern Generation
Word sequence pattern
- The middle context between an entity-value pair
- Example: PER:countries_of_residence -> native
Dependency path pattern
- The shortest dependency path that connects an entity-value pair
- Examples:
  - PER:title -> appos
  - PER:member_of -> appos president prep_of
  - PER:country_of_death -> nsubj-1 died prep_in
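Both pattern types can be illustrated on the example sentence "Takeshi Watanabe, the first president of the ADB, died in his native Japan." The sketch below is an assumption-laden illustration rather than the system's code: the dependency edges are written by hand in collapsed Stanford style instead of coming from a parser, the function names are hypothetical, and the word sequence example assumes the entity mention is the coreferent pronoun "his".

```python
from collections import deque

# Hand-written collapsed dependencies (governor, relation, dependent) for the
# example sentence; a real system would take these from a parser.
EDGES = [
    ("died", "nsubj", "Watanabe"),
    ("Watanabe", "nn", "Takeshi"),
    ("Watanabe", "appos", "president"),
    ("president", "amod", "first"),
    ("president", "prep_of", "ADB"),
    ("died", "prep_in", "Japan"),
    ("Japan", "poss", "his"),
    ("Japan", "amod", "native"),
]


def build_graph(edges):
    """Index each edge in both directions; '-1' marks a dependent-to-governor step."""
    graph = {}
    for gov, rel, dep in edges:
        graph.setdefault(gov, []).append((rel, dep))
        graph.setdefault(dep, []).append((rel + "-1", gov))
    return graph


def dependency_path_pattern(edges, entity, value):
    """Breadth-first search for the shortest path from entity to value.

    The pattern keeps the edge labels and the intermediate words, and leaves
    out the entity and the value themselves.
    """
    graph = build_graph(edges)
    queue = deque([(entity, [])])
    seen = {entity}
    while queue:
        node, path = queue.popleft()
        if node == value:
            return " ".join(path)
        for label, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                step = [label] if nxt == value else [label, nxt]
                queue.append((nxt, path + step))
    return None


def word_sequence_pattern(tokens, entity_end, value_start):
    """Return the tokens strictly between the entity and the value."""
    return " ".join(tokens[entity_end:value_start])
```

With these edges, the path from "Watanabe" to "president" is `appos`, to "ADB" is `appos president prep_of`, and to "Japan" is `nsubj-1 died prep_in`, matching the slide's three examples.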

14 Pattern Bootstrapping: Pattern Evaluation
Purpose: improve precision
- Pattern frequency
- Trigger phrase
- High-confidence patterns
- New entity-value pairs
- Iteration
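The slide only lists the ingredients of the evaluation step, so the loop below is a generic bootstrapping sketch under stated assumptions, not the PRIS scoring method. It assumes a pattern counts as high-confidence once it connects at least `min_freq` known pairs; the function name, the `min_freq` and `max_iters` parameters and the triple format are hypothetical, and trigger phrases are not modeled.

```python
from collections import Counter


def bootstrap(seed_pairs, occurrences, min_freq=2, max_iters=5):
    """Grow patterns and pairs from seeds.

    occurrences is a list of (entity, value, pattern) triples seen in the corpus.
    """
    pairs = set(seed_pairs)
    patterns = set()
    for _ in range(max_iters):
        # Pattern frequency: count how many known pairs each pattern connects.
        freq = Counter(p for e, v, p in set(occurrences) if (e, v) in pairs)
        confident = {p for p, n in freq.items() if n >= min_freq}
        # High-confidence patterns propose new entity-value pairs.
        new_pairs = {(e, v) for e, v, p in occurrences if p in confident}
        if confident <= patterns and new_pairs <= pairs:
            break  # nothing new was learned in this round
        patterns |= confident
        pairs |= new_pairs
    return patterns, pairs
```

A frequency threshold alone is a weak filter: a loose pattern that happens to match several known pairs would pull in wrong pairs in the next round, which is presumably why the slide also lists trigger phrases as a check.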


16 Post-processing
Purpose: improve precision
DATE
- The SUTime module of CoreNLP
- TIMEX2 normalization
PER: spouses, children and parents
- Last name completion
- Example: John Doe’s first wife, Ruth
- “Ruth Doe” is a better filler than “Ruth”.
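Last name completion can be sketched as a one-step rule. This is an illustrative assumption about how the rule might work: it treats the final token of the query entity's name as the family name and assumes the relative shares it, which does not always hold. The function name is hypothetical.

```python
def complete_last_name(filler, entity_name):
    """Append the query entity's last name when the filler is a single token."""
    filler_tokens = filler.split()
    entity_tokens = entity_name.split()
    # Leave multi-token fillers and single-token entity names untouched.
    if len(filler_tokens) != 1 or len(entity_tokens) < 2:
        return filler
    return filler + " " + entity_tokens[-1]
```

For the slide's example, `complete_last_name("Ruth", "John Doe")` yields "Ruth Doe", while a filler that already has two tokens is returned unchanged.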

17 Post-processing (cont'd)
Identifying countries, states/provinces and cities for LOC slots
- A Wikipedia list containing all countries and their states or provinces
Adding modifiers to fillers of per:title
- Adjectival modifier: financial minister
- Noun compound modifier: police chief
- Prepositional modifier: chief of military operations

18 Evaluation Results
PRIS Summary Statistics

            LDC         Top-1       Top-2       Median
Precision   0.9278607   0.6757322   0.48955223  0.11392405
Recall      0.7252106   0.41866493  0.21257292  0.0874919
F1          0.8141142   0.5170068   0.2964302   0.0989736
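The F1 row is the harmonic mean of the precision and recall rows, which can be checked directly. The helper below is a small sketch for that arithmetic; the function name is an assumption, and the numbers are the LDC and Top-1 columns from the slide.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LDC column: precision 0.9278607, recall 0.7252106
ldc_f1 = f1(0.9278607, 0.7252106)
# Top-1 column: precision 0.6757322, recall 0.41866493
top1_f1 = f1(0.6757322, 0.41866493)
```

Both results agree with the reported F1 values (0.8141142 and 0.5170068) to at least four decimal places.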

19
Slot | non-NIL correct | redundant | inexact | wrong | missing
Alternate names 600023
Date of birth 164011
Date of death 171042
age 220022
Country of birth 10001
State or province of birth 80232
City of birth 131052
Country of death 10020
State or province of death 130212
City of death 170041
Country of residence 102273
State or province of residence 2214513
City of residence 3510148
origin 1620170
Cause of death 1800113
Schools attended 1970114
titles 85138244
Member of 26241710
Employee of 702520
religion 40013
spouses 1651310
Children 7303106
Parents 214014
Siblings 200183
Other family 20007
Charges 50042

20
Slot | non-NIL correct | redundant | inexact | wrong | missing
Alternate names 4645255
Political/religious affiliations 71063
Top members/employees 5912208
Number of employees/members 30008
Members 00004
Member of 00007
Subsidiaries 700310
Parents 41044
Founded by 50035
Founded 50013
Dissolved 10002
Country of headquarters 300120
State or province of headquarters 110711
City of headquarters 200310
Shareholders 30180
Website 70018

21 Conclusion
For the KBP 2012 slot filling task, we designed an enhanced pattern-matching system that consists of preprocessing, entity expansion, pattern bootstrapping and post-processing.
Precision and recall are relatively good for some specific slots.
Performance on the remaining slots still needs to be improved.

22 Tips
- Adequate preparation
- A harmonious team
- An active and disciplined environment
- Be passionate, patient and hardworking
- ...
