Name Disambiguation in Digital Libraries Tan Yee Fan 2005 October 19 WING Group Meeting.


1 Name Disambiguation in Digital Libraries Tan Yee Fan 2005 October 19 WING Group Meeting

2 Digital libraries
- DBLP, Citeseer, etc.
- Information is stored as metadata records to facilitate searching
  - Author names
  - Titles
  - Publication titles
- Inconsistency in metadata records hinders searching
  - Abbreviation of names and publication titles
  - Typographical errors

3 Are they the same author?
Danny Poo
- Danny C. C. Poo, Teck-Kang Toh, Christopher S. G. Khoo, Glenn Hong. Development of an Intelligent Web Interface to Online Library Catalog Databases. APSEC 1999: 64-7
- Danny Chiang Choon Poo, Isaac K. C. Tan. Design of an Automatic Annotation Framework for Corporate Web Content. APSEC 2004: 384-391
Hui Yang
- Maan A. Kousa, Ahmed K. Elhakeem, Hui Yang. Performance of ATM networks under hybrid ARQ/FEC error control scheme. IEEE/ACM Trans. Netw. 7(6): 917-925 (1999)
- Hui Yang, Tat-Seng Chua. QUALIFIER: Question Answering by Lexical Fabric and External Resources. EACL 2003: 363-370
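The question on this slide can be made concrete with a minimal sketch of a surface-level name check. The function names `name_tokens` and `names_compatible` are hypothetical and not from the talk; the rule (same surname, and each given name equal to, or an initial or prefix of, its counterpart) is an assumption chosen only for illustration.

```python
import re

def name_tokens(name):
    """Split a name on whitespace, periods and hyphens; lowercase the parts."""
    return [t.lower() for t in re.split(r"[\s.\-]+", name) if t]

def names_compatible(a, b):
    """Hypothetical check: True if the two name strings could denote one person.

    Assumes the last token is the surname. Given names are compared
    position by position; an initial is accepted as a prefix of a full name.
    """
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    return all(x.startswith(y) or y.startswith(x)
               for x, y in zip(ta[:-1], tb[:-1]))

print(names_compatible("Danny C. C. Poo", "Danny Chiang Choon Poo"))
print(names_compatible("Hui Yang", "Hui Yang"))
```

Both calls report the names as compatible, which is exactly the difficulty: the check cannot confirm that the two Danny Poo citations share an author, and it cannot tell the two Hui Yang citations apart. Evidence beyond the name string is needed.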

4 Who am I, I am who?
- Author name disambiguation
  - Given a large number of citations, how to determine which name is which author?
- Closely related problem: citation matching
  - Given a large number of citations, how to determine which citations refer to the same papers?
- Solutions must be scalable
  - DBLP has more than 660,000 citations
  - Citeseer has more than 730,000 documents

5 Ideas
- Idea 1: determine the research field
  - Unfortunately, paper titles contain few words, and some conferences are broad in scope
- Idea 2: use coauthor information
  - An author is likely to collaborate with a select group of people
  - This group is likely to publish a number of papers together
- Goal: measure the similarity of coauthor lists
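One simple way to score the similarity of two coauthor lists is the Jaccard coefficient, sketched below. The talk does not say which measure is used, so the choice of Jaccard, the function name `coauthor_jaccard`, and the sample names are all assumptions made for illustration.

```python
def coauthor_jaccard(list_a, list_b):
    """Jaccard similarity of two coauthor lists: |A & B| / |A | B|.

    Names are compared as lowercased strings; no variant matching is done.
    Two empty lists are given similarity 0.0 (an assumption: no evidence).
    """
    a = {name.lower() for name in list_a}
    b = {name.lower() for name in list_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical coauthor lists for two citations by the same ambiguous name.
paper_1 = ["Alice Tan", "Bob Lim", "Carol Ng"]
paper_2 = ["Bob Lim", "Carol Ng", "Dave Ong"]
print(coauthor_jaccard(paper_1, paper_2))  # 2 shared of 4 distinct -> 0.5
```

A score near 1 suggests the same group of collaborators; a score of 0 says nothing either way, since an author may simply work with different people on different papers.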

6 Forward direction: M. Kan = M.-Y. Kan = Min-Yen Kan
- Problem
  - Pairwise comparison of all the coauthor lists is very expensive (it does not finish even after a few days)
- Solution
  - Soft clustering on the coauthor lists using some cheap distance measure
  - Then perform pairwise comparison within the clusters
  - What is a good soft clustering algorithm?
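The two-stage idea can be sketched as follows. The slide leaves the clustering algorithm open, so this sketch substitutes a simple overlapping blocking scheme: an inverted index from coauthor name to citation, where each citation belongs to one cluster per coauthor. The record ids, function names, and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records):
    """Cheap stage: pair up citations that share at least one coauthor name.

    records maps a citation id to its list of coauthor names. Each name
    defines one (overlapping) cluster, so a citation can sit in several.
    """
    index = defaultdict(set)
    for rid, coauthors in records.items():
        for name in coauthors:
            index[name.lower()].add(rid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def jaccard(list_a, list_b):
    a = {n.lower() for n in list_a}
    b = {n.lower() for n in list_b}
    return len(a & b) / len(a | b) if (a or b) else 0.0

def match_within_clusters(records, threshold=0.5):
    """Expensive stage, run only on the candidate pairs."""
    return sorted((x, y) for x, y in candidate_pairs(records)
                  if jaccard(records[x], records[y]) >= threshold)

# Hypothetical coauthor lists for three citations of one ambiguous name.
records = {
    "r1": ["Alice Tan", "Bob Lim"],
    "r2": ["Alice Tan", "Bob Lim", "Carol Ng"],
    "r3": ["Dave Ong"],
}
print(match_within_clusters(records))  # [('r1', 'r2')]
```

Citations with no coauthor in common are never compared, which is where the saving comes from. The cost is that very prolific coauthors create large clusters, so a real system would likely need to cap or split them.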

7 Backward direction: This Hang Cui is not that Hang Cui
- Difficult to determine using the metadata alone without external resources
  - Many authors have several distinct research areas
  - Each research area with different collaborators
- Currently investigating what kind of external resource to use
  - Goooooooooogle for URLs?

8 The end
But the research has just begun…

