
Author linkage Vetle I. Torvik. PubMed/MEDLINE is topic-driven Articles in MEDLINE are assigned medical subject headings (MeSH) PubMed converts a free.


1 Author linkage Vetle I. Torvik

2 PubMed/MEDLINE is topic-driven
• Articles in MEDLINE are assigned Medical Subject Headings (MeSH)
• PubMed converts free text into a query that uses MeSH
• One can search for an author’s last name and initials, restricted by title words, MeSH, first author’s affiliation, etc.
• One CANNOT restrict by corresponding author’s affiliation, full first names, or sole, last, or first author
• This is NOT sufficient when searching for papers by a particular individual

3 Until 2002, MEDLINE author names were encoded as (last name, initials)
– 1200+ articles with the name JA Smith
– hard to find papers by an author with a common name

4 If we knew who published what, we could...
• Study the structure of the scientific enterprise (e.g., collaboration graphs)
• Improve citation analysis
• Link authors across disciplines A and C
  – to find a collaborator in another field, with expertise (A) complementary to yours (C)
  – to make a list of invitees for a cross-disciplinary meeting/workshop
• Etc.

5 Author-ity: A probabilistic model for author name disambiguation
• Does a pair of author names (sharing last name, first initial), on two different MEDLINE articles, refer to the same individual?
• Automatically generate two large reference sets of pairs of matching and non-matching papers, representing MEDLINE as a whole without bias
• Capture multiple aspects of similarity between a pair of articles (title, journal, co-authors, MeSH, language, affiliation, middle initial, suffix)
Example (C. Friedman):
  x = (2 0 0 1 1 1 2 0)
  Pr{x|Match} / Pr{x|Non-match} = 22.6
  Overall probability of match, Pr{Match} = 0.021
  By Bayes’ theorem, Pr{Match|x} = 0.32
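The Bayes step on this slide can be reproduced with a few lines of arithmetic. The sketch below is only an illustration, not the Author-ity implementation: the function name `posterior_match_probability` is hypothetical, and the only inputs taken from the slide are the likelihood ratio (22.6) and the prior (0.021). It converts the prior to odds, multiplies by the likelihood ratio, and converts back to a probability.

```python
def posterior_match_probability(likelihood_ratio, prior):
    """Posterior Pr{Match|x} from Pr{x|Match}/Pr{x|Non-match} and Pr{Match}.

    Hypothetical helper for illustration; uses the odds form of Bayes' theorem.
    """
    prior_odds = prior / (1.0 - prior)            # Pr{Match} / Pr{Non-match}
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Values quoted on the slide for the example pair.
p = posterior_match_probability(22.6, 0.021)
print(round(p, 2))
```

With the rounded inputs shown on the slide this gives roughly one chance in three, in line with the quoted Pr{Match|x} = 0.32; the small difference is presumably due to rounding of the inputs.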

6 AUTHOR-ITY INPUT: 1) enter a last name and first initial; 2) click on a MEDLINE article

7 OUTPUT: All articles with that name ranked by decreasing match probability

8 Turns out that...
• Even though matching papers tend to have much more in common than non-matching ones,
• Almost 40% of all matching papers have nothing in common other than last name, initials, and language, partly because
  – only 40% of MEDLINE records have affiliations (mostly older ones)
  – middle initial is often omitted
• That is,
  – The pairwise model is not sufficient
  – MEDLINE information alone is not sufficient

9 Can we partition all of MEDLINE by author-individuals?
• Using clustering algorithms
• Using supplemental information
  – from publishers (EBSCO, OVID, Elsevier, ...): full author names; affiliations for all authors
  – on the web: automatically recognizing scientists’ home pages; extracting information into database form (e.g., list of publications)

10 Improving accuracy by using clustering algorithms
• To create clusters of papers by individuals
• Takes into account higher-order interactions between papers
  – even though the pair of papers (P2, P3) has a low match probability, the two papers are likely to refer to the same individual because of paper P1
[Figure: papers P1, P2, P3 with pairwise match probabilities 0.9 and 0.2]
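The slide does not say which clustering algorithm is used, so the following is only a minimal sketch of the idea, using a swapped-in technique: threshold-based single-linkage clustering via union-find. The function name `cluster_papers`, the 0.5 threshold, and the assignment of 0.9 to both the (P1, P2) and (P1, P3) pairs are assumptions made for illustration; the slide only shows the values 0.9 and 0.2.

```python
def cluster_papers(papers, pair_probs, threshold=0.5):
    """Group papers whose pairwise match probability reaches the threshold.

    Illustrative single-linkage clustering; not the method from the slides.
    """
    parent = {p: p for p in papers}

    def find(p):
        # Follow parent links to the cluster representative (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), prob in pair_probs.items():
        if prob >= threshold:
            parent[find(a)] = find(b)   # merge the two clusters

    clusters = {}
    for p in papers:
        clusters.setdefault(find(p), set()).add(p)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Assumed edge values: P1 strongly matches both P2 and P3; P2 and P3 match weakly.
probs = {("P1", "P2"): 0.9, ("P1", "P3"): 0.9, ("P2", "P3"): 0.2}
print(cluster_papers(["P1", "P2", "P3"], probs))
```

Under these assumptions P2 and P3 end up in the same cluster through P1 even though their direct match probability is low, which is the higher-order effect the slide describes. Single linkage is the simplest choice that shows this; it can also chain unrelated papers together, so a real system would likely need something more conservative.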

11 Improving accuracy with supplemental information from publishers’ web sites and scientists’ home pages
• Supplemental information can be automatically extracted from the internet
• Original articles most often encode
  – full first names
  – all affiliations for all authors (by superscripts)
• Scientists’ home pages often include
  – their affiliation
  – a list of their publications

12 Author linkage: the story of how Professor Cohen found Professor Gould
• Professor Cohen, a vascular surgeon, has had a number of patients who presented with aortic aneurysm and retinal detachment
• He searches the literature and finds some articles describing similar cases but nothing directly explaining the connection between the two symptoms
• He then performs an Arrowsmith search and finds many potential connections among the B-terms, such as Marfan’s syndrome, Ehlers-Danlos syndrome, and amyloidosis
• He wants to find an expert in retinal detachment who would be interested in studying these potential connections with a vascular surgeon

13 Who would be a good candidate collaborator to study these connections?
• Professor Cohen then
  – defines the A-literature by narrowing down the retinal detachment literature to include some of the interesting B-terms
  – defines the C-literature by aortic aneurysm
  – performs an Arrowsmith author-mode search and finds that Professor Gould has published a number of articles on retinal detachment in relation to several of the interesting B-terms (but not to aortic aneurysm) and has co-authored papers with Dr. Williams, who has separately published articles on aortic aneurysms
• Turns out that Arrowsmith has a link to Professor Gould’s home page with contact information and everything.
• Professor Cohen then picks up the phone...

14 Four degrees of B-authors
• 0th: with papers in the direct A-C literature
• 1st: with papers in both A and C, but not in the direct A-C literature
• 2nd: with papers in either A or C, but not both, and have co-authored with individuals who have papers in the other literature (Professor Gould)
• 3rd: no papers in either A or C, but have co-authored papers with an A-author and a C-author
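The four degrees can be read as a small classification rule over papers and co-authorship. The sketch below is one possible reading, not Arrowsmith's actual author-mode search: the function name `b_author_degrees` and the paper identifiers are hypothetical, and the "direct" literature is assumed to mean papers that belong to both A and C.

```python
from collections import defaultdict


def b_author_degrees(papers, lit_a, lit_c):
    """Map each author to a B-author degree (0-3); authors with none are omitted.

    papers: dict of paper id -> set of author names (hypothetical data layout).
    lit_a, lit_c: sets of paper ids making up the A- and C-literatures.
    """
    direct = lit_a & lit_c  # assumption: direct literature = papers in both A and C
    by_author = defaultdict(set)
    coauthors = defaultdict(set)
    for pid, authors in papers.items():
        for name in authors:
            by_author[name].add(pid)
            coauthors[name] |= authors - {name}

    a_authors = {n for n, ps in by_author.items() if ps & lit_a}
    c_authors = {n for n, ps in by_author.items() if ps & lit_c}

    degrees = {}
    for name, ps in by_author.items():
        in_a, in_c = name in a_authors, name in c_authors
        if ps & direct:
            degrees[name] = 0
        elif in_a and in_c:
            degrees[name] = 1
        elif in_a or in_c:
            other = c_authors if in_a else a_authors
            if coauthors[name] & other:
                degrees[name] = 2
        elif coauthors[name] & a_authors and coauthors[name] & c_authors:
            degrees[name] = 3
    return degrees


# Toy data modeled on the Cohen/Gould story; paper ids are made up.
papers = {
    "p1": {"Gould"},               # retinal detachment (A)
    "p2": {"Gould", "Williams"},   # joint paper outside A and C
    "p3": {"Williams"},            # aortic aneurysm (C)
}
print(b_author_degrees(papers, lit_a={"p1"}, lit_c={"p3"}))
```

In this toy example both Gould and Williams come out as 2nd-degree B-authors, matching the slide's placement of Professor Gould.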


16 What type of research are the B-authors conducting?
• 0th: a research project that crosses A and C?
• 1st: somewhat disparate research in each of A and C?
• 2nd: research in A or C, and collaborating with individuals working in the other discipline?
• 3rd: research in a collaborative discipline (e.g., bioethics, statistics, or bioinformatics)?

17 Who would want to identify B-authors, and why?
• scientists looking for information that is related to A and C but is not in the public domain (e.g., raw data, failed experiments, personal research notes)
• scientists looking for collaborators who are specialists in a different discipline
• administrators (e.g., program directors for funding agencies) or meeting organizers looking for individuals who may facilitate research collaborations across two disciplines
• Etc.

18 Are you looking to branch into another discipline? Who ya gonna call?
