Who’s Sharing with Who? Acknowledgements-driven identification of resources David Eichmann School of Library and Information Science & Information Science.

1 Who’s Sharing with Who? Acknowledgements-driven identification of resources
  David Eichmann
  School of Library and Information Science & Information Science Track, Iowa Graduate Program in Informatics

2 Motivation
  Public information regarding collaboration networks is partial and post-hoc
    Grants and publications
  Research profiling systems (e.g., VIVO) primarily feed on the above data
  Institutional grant tracking systems carry data on attempts at collaboration, but are not open

3 Goals
  Extend the model to include informal interactions
  Explore the degree to which sharing of data, resources, etc. can be identified from the full text of papers

4 Melissa’s LinkedIn Map

5 Holly Falk-Krzesinski’s LinkedIn Map

6 Ferrets in CTSAsearch

7 PubMed Central Open Access
  886,172 papers (as of Thursday)
  423,764 with acknowledgements
  994,931 sentences
  4,329,972 parses

8 The Simple Cases
  PMCID: 3008610
  SeqNum: 2
  SentNum: 6
  Sentence: EK analysed the data.
  POS: [EK/NNP, analysed/VBD, the/DT, data/NNS, ./.]
  Parse: [S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]

9 And the Not So Simple…
  PMCID: 4159542
  Sentence: We thank Sheila Harvey, Clinical Trials Unit Manager at ICNARC, and Ruth Canter, Trials Administrator at ICNARC, for their assistance in chasing completed surveys; Dr Kevin Gunning for early advice and project development; Drs Neill K. J. Adhikari and Gordon D. Rubenfeld for feedback and discussion of analysis plan; Dr Chris AKY Chong for his valuable comments on the initial draft of this manuscript; and our Responders: Addenbrooke’s Hospital ( Dr Kevin Gunning ), Airedale General Hospital ( Dr John Scriven ), Alexandra Hospital ( Dr Tracey Leach ), Arrowe Park Hospital ( Dr Lawrence Wilson ), Barnet Hospital ( Dr AH Wolff ), …
  (the full sentence is 8,245 characters long)

10 Syntax Fragment Frequency Approach
  Walk the syntax trees and, for every interior node (basically phrases), generate a syntax fragment of depth 2
  [S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]
  [S [NP NNP ] [VP VBD [NP DT NNS ] ] . ]
  [NP EK/NNP ]
  [VP VBD [NP DT NNS ] ]
  [NP DT NNS ]
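
The tree walk on slide 10 can be sketched roughly as follows. This is an illustrative Python sketch, not the author's implementation: the helper names (parse, render, fragments) are hypothetical, and it assumes that phrases below the depth limit are emitted as an empty bracket such as [NP ], which is the truncated form that appears in the patterns on later slides.

```python
import re
from collections import Counter

def parse(text):
    """Parse a bracketed tree into (label, children); a leaf is (tag, None)."""
    tokens = re.findall(r"\[[^\s\[\]]+|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos][1:]          # drop the leading "["
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos].startswith("["):
                children.append(node())
            else:
                # leaf token "word/TAG": keep only the part-of-speech tag
                children.append((tokens[pos].rsplit("/", 1)[-1], None))
                pos += 1
        pos += 1                         # consume the closing "]"
        return (label, children)

    return node()

def render(tree, depth):
    """Render a subtree, truncating phrases below the given depth (assumed convention)."""
    label, children = tree
    if children is None:                 # part-of-speech leaf
        return label
    if depth == 0:                       # phrase beyond the depth limit
        return f"[{label} ]"
    inner = " ".join(render(child, depth - 1) for child in children)
    return f"[{label} {inner} ]"

def fragments(tree, depth=2):
    """Yield one fragment for every interior (phrase) node."""
    label, children = tree
    if children is None:
        return
    yield render(tree, depth)
    for child in children:
        yield from fragments(child, depth)

tree = parse("[S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]")
counts = Counter(fragments(tree))        # fragment frequencies for one sentence
for fragment, n in counts.items():
    print(n, fragment)
```

Summing such counters over all parsed acknowledgement sentences would give a corpus-wide fragment frequency table of the kind the following slides report.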

11 SFF Approach, cont.
  Frequency distribution (columns: Fragments / Document, Frequency)
  964758 1084604 844521 1024329

12 SFF Approach, cont.
  Frequency distribution (columns: Fragments / Document, Frequency)
  147001 110731 95751 85711

13 SFF Approach, cont.
  Prior to fragmentation, annotate nodes with entity classes
  This is domain-specific and run-time extensible
  [S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]
  [S [NP:Author NNP:Author ] [VP VBD [NP:Resource ] ] . ]
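
One way to picture the annotation step on slide 13 is sketched below. Everything here is an assumption made for illustration: the slides do not say how entity classes are assigned, so the sketch uses a hypothetical word lexicon (LEXICON) and a crude rule that a noun phrase inherits the class of its last child. The Node class and function names are likewise invented.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None      # set only on leaves
    entity: Optional[str] = None    # entity class, if any

def parse(text):
    """Parse a bracketed tree, keeping the words on the leaves."""
    tokens = re.findall(r"\[[^\s\[\]]+|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        current = Node(tokens[pos][1:])
        pos += 1
        while tokens[pos] != "]":
            if tokens[pos].startswith("["):
                current.children.append(node())
            else:
                word, tag = tokens[pos].rsplit("/", 1)
                current.children.append(Node(tag, word=word))
                pos += 1
        pos += 1
        return current

    return node()

# Hypothetical lexicon; being a plain dict, it can be extended at run time.
LEXICON = {"EK": "Author", "data": "Resource"}

def annotate(node, lexicon):
    """Attach entity classes bottom-up (assumed heuristic, not the author's)."""
    for child in node.children:
        annotate(child, lexicon)
    if node.word is not None:
        node.entity = lexicon.get(node.word)
    elif node.label == "NP" and node.children:
        node.entity = node.children[-1].entity   # NP takes its last child's class

def name(node):
    return f"{node.label}:{node.entity}" if node.entity else node.label

def render(node, depth):
    """Render a fragment, truncating phrases below the depth limit."""
    if node.word is not None:
        return name(node)
    if depth == 0:
        return f"[{name(node)} ]"
    inner = " ".join(render(child, depth - 1) for child in node.children)
    return f"[{name(node)} {inner} ]"

tree = parse("[S [NP EK/NNP ] [VP analysed/VBD [NP the/DT data/NNS ] ] ./. ]")
annotate(tree, LEXICON)
print(render(tree, 2))
```

Under these assumptions the depth-2 fragment for the example sentence has the same shape as the annotated pattern on the slide. In practice the author initials would presumably come from the paper's metadata rather than a fixed dictionary.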

14 Frequency Distribution of Fragments
  Total distinct patterns: 4,090,978
  1,768,966  [NP:Project DT NN:Project ]
  1,074,603  [NP NN ]
    725,626  [NP:Author NN:Author ]
    657,897  [NP:Author PRP ]
    654,904  [NP:Place NNP:Place ]
    654,565  [ADVP RB ]
    644,590  [NP:Person NNP NN ]

15 Filtering for Top Nodes (Sentences)
  Total distinct patterns: 523,602 (87% reduction)
  600,618  [S [VP TO [VP ] ] ]
  452,753  [S [NP:Project DT NN:Project ] [VP VBD [VP ] ] ]
  169,990  [S [NP:Project DT NN:Project ] [VP VBD [VP ] ] . ]
  115,543  [S [VP VBG [NP ] ] ]
   79,036  [S [NP:Author NN:Author ] [VP NN [NP ] ] ]

16 Filtering for Co-mentions of Authors and Persons
  Total distinct patterns: 7,870 (98% reduction)
  26,703  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] ] . ]
  20,395  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] ] ]
  16,588  [S [NP:Author PRP ] [VP VBP [NP:Person ] [PP ] ] . ]
  16,034  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] [PP ] ] . ]
   9,149  [S [NP:Author PRP ] [VP VBP [NP:Person ] [PP ] [PP ] ] . ]
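
The two filtering stages on slides 15 and 16 amount to successive predicates over the pattern frequency table. A minimal sketch follows, assuming patterns are stored as strings in a Counter; the function names are hypothetical and the three sample rows are copied from the slides. The reduction helper reproduces the quoted percentages if each stage is measured against the stage before it, which is an interpretation rather than something the slides state.

```python
from collections import Counter

def is_sentence(pattern):
    """Keep only fragments rooted at a sentence (S) node."""
    return pattern.startswith("[S ")

def co_mentions(pattern):
    """Keep only fragments that mention both an Author and a Person."""
    return ":Author" in pattern and ":Person" in pattern

def filter_counts(counts, *predicates):
    """Return the sub-table of patterns that satisfy every predicate."""
    return Counter({p: n for p, n in counts.items()
                    if all(pred(p) for pred in predicates)})

def reduction(before, after):
    """Fractional reduction in the number of distinct patterns."""
    return 1 - after / before

# Three rows taken from slides 14-16, used as a tiny sample table.
counts = Counter({
    "[NP NN ]": 1074603,
    "[S [VP TO [VP ] ] ]": 600618,
    "[S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ] ] . ]": 26703,
})

sentences = filter_counts(counts, is_sentence)
thanks = filter_counts(counts, is_sentence, co_mentions)
print(len(sentences), len(thanks))

# Distinct-pattern totals quoted on the slides.
print(round(reduction(4090978, 523602) * 100))
print(round(reduction(523602, 7870) * 100))
```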

17 Extract Entities/Relationships with Syntactic Queries
  [S [NP:Author NN:Author ] [VP NN [NP:Person ] [PP ], [PP ] ] ]
  S <1NP:Author <2[VP <1/thank/ <2(NP) <3(PP) ]
    For the sentence having this pattern, match the object noun phrase and the next prepositional phrase
  NP <#2 <1(NNP) <2(NNP)
    For the noun phrase, extract two proper nouns
  PP <#2 <1DT <2(NP)
    For the prepositional phrase, match the noun phrase
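
The queries on slide 17 read as positional child constraints: the first child of S is an Author noun phrase, the second is a verb phrase whose first child matches /thank/, and so on. The sketch below hand-codes that one query in Python over a nested-tuple tree. It does not implement the query language itself, the function names are hypothetical, and the example sentence is invented for illustration (only its prepositional phrase text is borrowed from the results on slide 19).

```python
import re

def parse(text):
    """Parse a bracketed tree: phrases are (label, [children]), leaves are (tag, word)."""
    tokens = re.findall(r"\[[^\s\[\]]+|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        label = tokens[pos][1:]
        pos += 1
        kids = []
        while tokens[pos] != "]":
            if tokens[pos].startswith("["):
                kids.append(node())
            else:
                word, tag = tokens[pos].rsplit("/", 1)
                kids.append((tag, word))
                pos += 1
        pos += 1
        return (label, kids)

    return node()

def is_leaf(node):
    return isinstance(node[1], str)

def base(label):
    """Strip an entity annotation such as ':Person' from a label."""
    return label.split(":")[0]

def words(node):
    """All words under a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for kid in node[1] for w in words(kid)]

def match_thanks(sentence):
    """Return the thanked person's names and the following PP text, or None."""
    label, kids = sentence
    if base(label) != "S" or len(kids) < 2:
        return None
    subject, vp = kids[0], kids[1]
    if subject[0] != "NP:Author" or base(vp[0]) != "VP" or is_leaf(vp):
        return None
    if len(vp[1]) < 3:
        return None
    verb, obj, pp = vp[1][:3]
    if not (is_leaf(verb) and re.search("thank", verb[1])):
        return None
    if is_leaf(obj) or is_leaf(pp) or base(obj[0]) != "NP" or base(pp[0]) != "PP":
        return None
    names = [kid[1] for kid in obj[1] if is_leaf(kid) and kid[0] == "NNP"]
    # Skip the preposition itself and keep the text of the rest of the PP.
    pp_text = " ".join(w for kid in pp[1][1:] for w in words(kid))
    return {"names": names, "pp": pp_text}

# Invented example sentence, already annotated with entity classes.
tree = parse(
    "[S [NP:Author We/PRP ] [VP thank/VBP [NP:Person Jeff/NNP Vieira/NNP ] "
    "[PP for/IN [NP [NP the/DT kind/JJ gift/NN ] [PP of/IN [NP rKSHV.219/NN ] ] ] ] ] ./. ]"
)
print(match_thanks(tree))
```

Hand-coding each pattern this way does not scale to thousands of distinct patterns, which is presumably why the slides express them as declarative queries instead.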

18 Person Results Snippet
  ID  Title  First Name  Middle Name  Last Name
  76         Hans                     Matrin
  77         Jeff                     Vieira
  78         P.                       ZAMORE
  79  Prof.  Eric                     Schon
  80         Carlos                   Lois
  81         Andrea                   Möll
  82         Elena                    Govorkova
  83         K.          M.           Pollard
  84  Dr.    Michael                  Berton

19 Relationships for Person 77
  PMCID    Category       PP
  4006053  Support        the kind gift of rKSHV.219
  4006053  Support        the kind gift of rKSHV.219 and for helpful discussions
  4006053  Collaboration  helpful discussions

20 Relationships for Person 79
  PMCID    Category       PP
  2801706  Resource       the rabbit polyclonal antibody
  2801706  Resource       the ECFP and EYFP plasmids
  4013013  Collaboration  his helpful advice and discussions

21 Category Frequencies
  Category               Count
  Collaboration          47,052
                         46,327
  Technique              33,598
  Resource                8,894
  Support                 6,836
  Event                   3,744
  Project                   854
  Place Name                229
  Publication Component     210
  Place                     186
  Organization               93

22 Next Steps
  Continue slogging through extraction pattern definition
  Define patterns for funding declarations (chairs, fellowships, etc.)
  Merge data into CTSAsearch visualizations
  Align current category scheme with Melissa Haendel’s current draft ontology for CASRAI taxonomy and then merge with VIVO-ISF

23 Questions?

