Named Entity Disambiguation: A Hybrid Statistical and Rule-based Incremental Approach Hien Nguyen * (Ton Duc Thang University, Vietnam) Tru Cao (Ho Chi.


1 Named Entity Disambiguation: A Hybrid Statistical and Rule-based Incremental Approach Hien Nguyen * (Ton Duc Thang University, Vietnam) Tru Cao (Ho Chi Minh City University of Technology, Vietnam) Semantic Web Group (VN-KIM) Faculty of Computer Science & Engineering Ho Chi Minh City University of Technology BK TP.HCM * Email: hien@tut.edu.vn

2 Outline Introduction Wikipedia Algorithm Experimental results Concluding remarks

3 Introduction: Named Entities Named Entities (NEs) include people, organizations, locations, dates, times, money, measures, percentages, etc. Example: “Ms. Washington's candidacy is being championed by several powerful lawmakers including her boss, Chairman John Dingell (D., Mich.) of the House Energy and Commerce Committee.”

4 Introduction: Problem Different NEs may have the same name. “John McCarthy has been a staple of the Ultimate Fighting Championship since its second event on March 11, 1994.” John McCarthy → John McCarthy (referee) “John McCarthy, professor of computer science at Stanford University, who developed LISP.” John McCarthy → John McCarthy (computer scientist) “John McCarthy, Britain's longest-held hostage in Lebanon, has been set free after more than five years in captivity.” John McCarthy → John McCarthy (journalist)

5 Introduction: Motivation Web searches: Queries about Named Entities (NEs) constitute a significant portion of popular web queries (Bunescu et al., EACL 2006). About 30% of search engine queries include person names (R. Guha et al., WWW 2004). Named entity disambiguation can improve the effectiveness of web search results for popular named entities. Web-based Information Extraction: Identifying NEs exactly in web pages can improve accuracy in IE tasks (e.g. extracting relationships between NEs). Question Answering: Identifying NEs exactly in questions can improve the accuracy of answers.

6 Introduction: NE Disambiguation Mapping entity names (in a text) to actual entities in a KB of discourse (e.g. Wikipedia). Three cases arise: (1) an ambiguous entity name is not in the KB; (2) an ambiguous entity name occurs in the KB but refers to a named entity outside the KB; (3) an ambiguous entity name refers to two or more named entities in the KB.

7 Introduction: NE disambiguation But much like the first presidential debate held two weeks ago in Oxford, Mississippi, a draw for Obama would be considered a win.

8 Introduction: NE disambiguation Gamsakhurdia is seen as a national hero by those who mourn him Zviad Gamsakhurdia, Georgia's first president after independence from the USSR, has been buried in the capital Tbilisi 14 years after his death.

9 NE disambiguation John McCarthy, 'great man' of computer science, wins major award

10 Introduction: Approach Disambiguation based on context: co-occurring entity names; co-occurring NE identifiers; tokens in a window context centered at the name under consideration. Disambiguation based on a KB: we view instances in the KB as carrying two kinds of information, attributes and relations, and we represent those instances by their attributes and relations.
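
The "tokens in a window context" feature can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name `window_tokens`, the tokenizer, and the window size are all assumptions, since the slides do not specify them.

```python
import re

def window_tokens(text, name, size=5):
    """Return up to `size` tokens on each side of the first occurrence of `name`.

    Hypothetical helper: the regex tokenizer and default window size are
    illustrative choices, not taken from the slides.
    """
    tokens = re.findall(r"\w+", text)
    name_tokens = re.findall(r"\w+", name)
    n = len(name_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == name_tokens:
            left = tokens[max(0, i - size):i]    # tokens before the name
            right = tokens[i + n:i + n + size]   # tokens after the name
            return left + right
    return []  # name not found in the text

sentence = ("John McCarthy, professor of computer science at Stanford "
            "University, who developed LISP.")
context = window_tokens(sentence, "John McCarthy", size=4)
print(context)
```

Here the name opens the sentence, so the window contains only the four tokens that follow it.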

11 Introduction: Approach Text containing ambiguous names: all keywords in the window text centred around the ambiguous name; the whole text is extended with page titles of the previously identified NEs enclosed. Wikipedia article: entity page titles, redirecting page titles, category labels, hyperlink labels. Matching: heuristics + TF-IDF vector similarity.

12 Wikipedia Wikipedia is a free encyclopedia written through the collaborative effort of a global community of more than 150,000 volunteers. These volunteers have contributed more than 11 million articles in 265 languages. More than 275 million people visit the Wikipedia site every month. 2,697,848 articles in the English version (as visited on Jan 14th, 2009).

13 Wikipedia – Pages & Titles Page Titles (ID)

14 Wikipedia – Pages & Titles Disambiguation text

15 Wikipedia – Category Category

16 Wikipedia – Redirect pages Redirect page titles

17 Wikipedia – Hyperlinks Hyperlinks

18 Wikipedia – Hyperlinks Hyperlinks

19 Algorithm Hybrid statistical and rule-based incremental algorithm: Rule-based NE disambiguation: utilizing Wikipedia disambiguation texts. E.g. in “… Rockville, Maryland …”, the disambiguation text “Maryland” helps identify that Rockville is an area in Maryland.
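
A rule of this kind might look like the sketch below. It is an assumption-laden illustration rather than the method from the slides: the helper names `disambiguation_text` and `resolve_by_rule`, the candidate list, and the "exactly one match" policy are all hypothetical.

```python
import re

def disambiguation_text(title):
    """Extract the qualifier from a page title such as
    'Rockville, Maryland' or 'John McCarthy (referee)'."""
    m = re.search(r"\(([^)]+)\)\s*$", title)
    if m:
        return m.group(1)
    if "," in title:
        return title.split(",", 1)[1].strip()
    return None  # title carries no disambiguation text

def resolve_by_rule(text, candidates):
    """Pick the candidate whose disambiguation text occurs in `text`.

    Returns None unless exactly one candidate matches (an assumed policy),
    leaving the name for a later, statistical stage.
    """
    matches = []
    for title in candidates:
        qualifier = disambiguation_text(title)
        if qualifier and re.search(r"\b" + re.escape(qualifier) + r"\b",
                                   text, re.IGNORECASE):
            matches.append(title)
    return matches[0] if len(matches) == 1 else None

# Illustrative candidate titles for the name "Rockville".
candidates = ["Rockville, Maryland", "Rockville, Connecticut", "Rockville, Indiana"]
resolved = resolve_by_rule("The office is in Rockville, Maryland.", candidates)
unresolved = resolve_by_rule("The office is in Rockville.", candidates)
print(resolved, unresolved)
```

Returning `None` when no qualifier (or more than one) appears in the text keeps the rule conservative, so uncertain names are not assigned prematurely.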

20 Algorithm Rule-based NE disambiguation (cont.) Exploiting coreference relationships between referents: propagation of the identified NE, if any, along its coreference chain. E.g. “On Thursday morning, Sen. Barack Obama warned supporters not to get "cocky," while a few hours later McCain pledged to Pennsylvania voters he would erase Obama's lead by Election Day.” Extension of the whole text with the Wikipedia entity page titles of the identified NEs.
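
Propagation along a coreference chain can be sketched as follows. The slides do not say how chains are built, so this example assumes they are supplied by an upstream coreference resolver; the function name `propagate` and the data layout are hypothetical.

```python
def propagate(chains, resolved):
    """Copy an identified entity to every mention in the same coreference chain.

    chains:   list of lists of mention strings (assumed to come from an
              upstream coreference resolver, not shown here).
    resolved: dict mapping already-identified mentions to entity page titles.
    """
    result = dict(resolved)
    for chain in chains:
        entities = {resolved[m] for m in chain if m in resolved}
        if len(entities) == 1:           # propagate only when unambiguous
            entity = entities.pop()
            for mention in chain:
                result.setdefault(mention, entity)
    return result

chains = [["Sen. Barack Obama", "Obama"], ["McCain"]]
resolved = {"Sen. Barack Obama": "Barack Obama"}
assignments = propagate(chains, resolved)
print(assignments)
```

In the example sentence, the later mention "Obama" inherits the entity already identified for "Sen. Barack Obama", while "McCain" stays unassigned because nothing in its chain has been identified yet.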

21 Algorithm After the rule-based stage, the remaining ambiguous names are resolved by matching the whole-text vector against the Wikipedia candidate entity pages. The extracted context surrounding ambiguous names: all keywords in the window text centred around the ambiguous name; the whole text is extended with page titles of the previously identified NEs enclosed. Wikipedia article: entity page titles, redirecting page titles, category labels, hyperlink labels. Matching: TF-IDF vector similarity.
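
The statistical stage can be illustrated with a small TF-IDF ranking sketch. Several details are assumptions: the slides do not give the weighting formula, so a smoothed IDF and cosine similarity are used here, and the candidate feature strings are toy data invented for illustration, not actual Wikipedia content.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF (assumed variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(context, candidates):
    """Return the best-scoring candidate title and all similarity scores."""
    titles = list(candidates)
    vecs = tfidf_vectors([context] + [candidates[t] for t in titles])
    scores = {t: cosine(vecs[0], v) for t, v in zip(titles, vecs[1:])}
    return max(scores, key=scores.get), scores

# Toy stand-ins for titles, redirects, category and hyperlink labels.
candidates = {
    "John McCarthy (computer scientist)":
        "computer scientist Stanford University LISP artificial intelligence",
    "John McCarthy (referee)":
        "referee Ultimate Fighting Championship mixed martial arts",
    "John McCarthy (journalist)":
        "journalist hostage Lebanon Britain",
}
context = ("John McCarthy professor of computer science at Stanford "
           "University who developed LISP")
best, scores = rank_candidates(context, candidates)
print(best)
```

Only the first candidate shares any terms with the context, so the other two score zero and the computer scientist page is selected.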

22 Algorithm

23 Experimental results Experiments: 10 news articles from CNN on Travel, Entertainment, World, World Business, and Americas

24 Experimental results D1: obtained after running GATE. D2: obtained after GATE’s errors were corrected.

25 Experimental results We measure accuracy as the number of correct assignments of an NE in the text to a Wikipedia NE, divided by the total number of assignments.
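
The accuracy measure above amounts to a simple ratio, sketched here with made-up assignments. The mention keys, entity titles, and the function name `accuracy` are hypothetical; they are not data from the experiments.

```python
def accuracy(predicted, gold):
    """Correct assignments divided by the total number of assignments made."""
    if not predicted:
        return 0.0
    correct = sum(1 for mention, entity in predicted.items()
                  if gold.get(mention) == entity)
    return correct / len(predicted)

# Illustrative assignments only (mention id -> Wikipedia page title).
predicted = {
    "m1": "Barack Obama",
    "m2": "John McCain",
    "m3": "Oxford, Mississippi",
    "m4": "Georgia (U.S. state)",
}
gold = {
    "m1": "Barack Obama",
    "m2": "John McCain",
    "m3": "Oxford, Mississippi",
    "m4": "Georgia (country)",
}
score = accuracy(predicted, gold)
print(score)
```

Three of the four assignments agree with the gold standard, giving 0.75.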

26 Experimental results Results:

27 Concluding remarks The proposed method is a hybrid and incremental process that utilizes previously identified NEs and related terms co-occurring with ambiguous names in a text for entity disambiguation. Work under investigation: disambiguating cases where ambiguous names occur in a KB but refer to named entities outside the KB.

28 Thanks for your attention VN-KIM Group http://www.cse.hcmut.edu.vn/vn-kim/ Contact author: hien@tut.edu.vn or nthien97@yahoo.com

