EBI is an Outstation of the European Molecular Biology Laboratory. Bibliography 2.0: A case study from the Wellcome Trust Genome Campus Dr. Duncan Hull.


1 EBI is an Outstation of the European Molecular Biology Laboratory. Bibliography 2.0: A case study from the Wellcome Trust Genome Campus Dr. Duncan Hull European Bioinformatics Institute, e-Science workshop: The influence and impact of Web 2.0 on various applications 11th-12th May 2010, Edinburgh

2 Overview. Introduction: Wellcome Trust Genome Campus (The European Bioinformatics Institute, The Wellcome Trust Sanger Institute, The Library). Problem: economics and freakonomics of publishing (the unintended consequences of publish or perish; burying data in publication silos; obscuring identities and obstructing social applications). Solution? Bibliography 2.0 with citeulike (incentives, disincentives). Case study: what we've learnt. Conclusions and future work.

3 Wellcome to the Genome Campus Home of The European Bioinformatics Institute The Sanger Institute Just outside Cambridge, UK

4 EBI: a data hub for bioinformatics in Europe. Literature; DNA + RNA sequences; Genomes; Transcriptomes, e.g. ArrayExpress; Protein structure; Protein domains, families; Pathways; Systems; Small molecules; Protein sequence; Protein-protein interactions. ~400 staff (research/services), publishing data on the web

5 e.g. Chemical Entities of Biological Interest (ChEBI): free database/ontology of 500,000 small molecules (many drugs)

6 The Wellcome Trust Sanger Institute. The Sanger Institute is a world-leading genome research institute using DNA sequencing to further understanding of gene function in health and disease, funded by charity (The Wellcome Trust). From THE human genome ten years ago to 1000 genomes today (2010). More Bio than Informatics (cf. EBI), with a progressive approach to Web 2.0, e.g. Daub, J., et al. (2008). The RNA WikiProject: community annotation of RNA families. RNA 14 (12), DOI: /rna Alex Bateman. ~900 Sanger staff (total)

7 Shared Library. Annual journal subscription budget £500,000 (modest compared to the multi-million-pound journal budgets of university libraries). More later

8 People respond to incentives, although not necessarily in ways that are predictable and manifest. Therefore, one of the most powerful laws in the universe is the law of unintended consequences. This applies to schoolteachers and Realtors and crack dealers as well as expectant mothers, sumo wrestlers, bible salesmen, and the Ku Klux Klan… …and scientists too…

9 Unintended consequences, an example. Incentive: publish or perish. Publications are rewarded with recognition, hiring, promotion, tenure, fame, funding, fortune, prizes, job satisfaction etc. Unintended consequences: valuable data gets damaged, destroyed or buried (see later); inaccessible to data and text mining on the Web; copyright and toll-access journals; Luddite scientists; minimal exploitation of social software for sharing data; minimal exploitation of Web 2.0 for sharing data

10 Gene names: e.g. Hexokinase, HK1, HK2, HK3. Protein names: e.g. Hexokinase, HK1, HK2, HK3. Chemical names: e.g. Glucose-6-phosphate, G6P, Glu, Gluc. Author names: e.g. Mark Baker (see next slide). Poor precision and recall. "Why bury it [data] first and then mine it again?" Barend Mons, Wikiproteins. Which gene did you mean? BMC Bioinformatics Jun 7;6:142 DOI: /
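To make "poor precision and recall" concrete, here is a minimal sketch of how the two measures are computed for a single name search. The paper identifiers and the query scenario are hypothetical and do not come from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: ids the search returned for an ambiguous name such as "HK1"
    relevant:  ids that are really about the entity the user meant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: the search returns four papers, two of which are about
# the intended gene, and it misses one relevant paper that used a synonym.
retrieved = ["p1", "p2", "p3", "p4"]
relevant = ["p1", "p2", "p5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Ambiguous names lower precision (papers about a different entity are returned), while synonyms lower recall (papers using another name are missed).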

11 Identity crisis: Mark Baker etc Until we have unique author identifiers, it is difficult or impossible to reliably find the papers published by a particular person Open Researcher and Contributor ID Tell me whenever Mark Baker publishes a paper
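The identity problem can be illustrated with a small sketch that groups the same records first by printed author name and then by a unique identifier. The records, the `author-00N` identifiers and the paper labels are all hypothetical placeholders, not real identifiers or publications.

```python
from collections import defaultdict

# Hypothetical records: (name as printed, hypothetical unique id, paper label)
records = [
    ("Mark Baker", "author-001", "Paper A"),
    ("M. Baker", "author-001", "Paper B"),
    ("Mark Baker", "author-002", "Paper C"),
]


def group_papers(records, key_index):
    """Group paper labels by the chosen field (0 = name, 1 = identifier)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return dict(groups)


by_name = group_papers(records, 0)
by_id = group_papers(records, 1)

# Grouping by name merges two different people and splits one person in two.
print(by_name)
# Grouping by identifier keeps each person's papers together.
print(by_id)
```

With identifiers in place, an alert such as "tell me whenever this person publishes" only needs to watch one key.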

12 Social information (need identity for this). Socialisation (e-science > we-science): How many other people have read this paper? What are my friends / enemies reading? What other papers did they also read? Personalisation (e-science > me-science): These are my publications. This is my bibliography (stuff I'm reading / have read). Digital libraries are document-centred rather than people-centred. Author name disambiguation in MEDLINE, by Vetle I. Torvik and Neil R. Smalheiser, ACM Trans. Knowl. Discov. Data, Vol. 3, No. 3 (2009), pp DOI: /
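The socialisation questions on this slide can be answered from nothing more than per-user reading lists. The following is a minimal sketch under that assumption; the user names, paper labels and function names are hypothetical and are not citeulike's actual data model or API.

```python
from collections import Counter

# Hypothetical libraries: user -> set of papers they have bookmarked
libraries = {
    "alice": {"A", "B", "C"},
    "bob": {"A", "B"},
    "carol": {"C", "D"},
}


def reader_count(libraries, paper):
    """How many people have this paper in their library?"""
    return sum(1 for papers in libraries.values() if paper in papers)


def also_read(libraries, paper):
    """Count the other papers held by people who have this paper."""
    counts = Counter()
    for papers in libraries.values():
        if paper in papers:
            counts.update(p for p in papers if p != paper)
    return counts


print(reader_count(libraries, "A"))
print(also_read(libraries, "A").most_common())
```

Both functions depend on knowing who each user is, which is why the slide notes that identity is needed first.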

13 A solution, Lack of personalisation of library data Lack of socialisation of library data Works a lot like

14 Click Post to Citeulike

15 Tag it (optional) e.g. author tags

16 Journal picks is a group of 40+ invited users on campus, who select interesting papers

17 ,016 unique articles in journal picks (less than one year) 3,880,055 unique articles total

18 Citeulike + ZeitGeist = CiteGeist

19 Citeulike incentives: Selfish scientist (just organise my reference mess). What's popular (interesting stuff, CiteGeist). Serendipity (find papers you wouldn't find normally). Increase visibility and PageRank of papers? Person-centred access points into first / second page of Google results e.g. Has result below fairly high up list,

20 Citeulike disincentives: Privacy, don't want to share with rivals (but can make collections private). Citeulike might go bust? But Springer sponsored. Parsers are fragile: easily (and deliberately) broken by publishers. Valuable data in the hands of a commercial company? But Facebook? LinkedIn? Twitter etc? No academic reward for using it: publication = finished. Social software works best with network effects. There are LOTS of other tools that do this…

21 And the rest… iTunes for PDF files of research

22 Giant corporate commercial competitors with significant vested financial interests: Scopus, ISI WOK. Wrote a review of these systems: Hull, D., S. R. Pettifer, and D. B. Kell (2008). Defrosting the digital library: Bibliographic tools for the next generation web. PLoS Comput Biol 4 (10), e DOI: /journal.pcbi

23 Conclusions: Publish or perish has some unfortunate and unintended consequences in science. Citeulike is an interesting Web 2.0 tool. We've had some success using it (typical long tail). Weak incentives for use by many; cultural barriers to adoption. Technical barriers to adoption: many tools, messy data. Future work: social network analysis, clickthroughs, tag analysis. Any other ideas… But the times they are a-changin': Citeulike or something like it will work much better if/when publishing incentives change over time…

24 Acknowledgements Mark Baker for organising this workshop EBI, Christoph Steinbeck (laboratory head) Carole Goble, University of Manchester The Sanger, Alex Bateman, Frances Martin, Tim Hubbard and all the contributors to the Journal Picks group Richard Cameron, Kevin Emamy and the rest of the citeulike team BBSRC for funding Any questions?
