

Data Publication COASP 2012. Publications 26 million abstracts 2.2 million full text articles Citation networks Database links Text-mining 2012 2006 2011 2016?




1 Data Publication COASP 2012

2 Publications

3 Europe PubMed Central: 26 million abstracts, 2.2 million full-text articles, citation networks, database links, text mining (timeline: 2006, 2011, 2012, 2016?)

4 How many open access articles are in UKPMC? PubMed: 995K; UKPMC: 182K (18%); OA: 96K (9.6%)
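A quick arithmetic check of the slide's percentages, assuming both are relative to the 995K PubMed figure and using the rounded K values exactly as shown (so results are approximate):

```python
# Counts from the slide, in thousands (already rounded on the slide)
pubmed_total = 995
ukpmc = 182
open_access = 96

def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

ukpmc_share = share(ukpmc, pubmed_total)    # UKPMC as a share of the PubMed set
oa_share = share(open_access, pubmed_total)  # OA as a share of the PubMed set
oa_within_ukpmc = share(open_access, ukpmc)  # OA as a share of UKPMC itself

print(f"UKPMC: {ukpmc_share}%, OA: {oa_share}%, OA within UKPMC: {oa_within_ukpmc}%")
```

Read this way, roughly half of the UKPMC articles in this set are open access.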

5 Managing the public data ecosystem: Big Data: Deposition; Primary Research articles; Big Data: Curated Annotation; Unstructured Data

6 Literature citation from data (data annotation)

7 Links from Literature to Databases: Proteins, Nucleotides, OMIM, Chemicals, Structure, Clinical reviews, Protein families, Protein-protein interactions, Gene expression experiments [chart values: 800 K, 370 K, 110 K]

8 Database crosslinks

9 Data citation from literature (provenance)

10 Text Mining in UKPMC (2.2 million articles)
Semantic type    Unique terms     Articles    Annotations
Accession No.         233,017       66,356        387,787
Chemical               76,712    1,694,385     83,923,066
Disease               171,692    1,768,214     57,821,871
Gene/Protein          227,318    1,310,382     77,189,022
GO Terms               32,664    1,832,294     65,061,579
Organism              180,637    1,713,280     70,832,222
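The table can be normalised to show how sparse accession numbers are compared with other semantic types. A minimal sketch, assuming "Articles" counts articles with at least one annotation of that type and taking the rounded 2.2 million from the slide title as the corpus size:

```python
# (semantic type, unique terms, articles, annotations), copied from the table
ROWS = [
    ("Accession No.", 233_017, 66_356, 387_787),
    ("Chemical", 76_712, 1_694_385, 83_923_066),
    ("Disease", 171_692, 1_768_214, 57_821_871),
    ("Gene/Protein", 227_318, 1_310_382, 77_189_022),
    ("GO Terms", 32_664, 1_832_294, 65_061_579),
    ("Organism", 180_637, 1_713_280, 70_832_222),
]

TOTAL_ARTICLES = 2_200_000  # "2.2 million articles" from the slide title (rounded)

def summarise(rows, total_articles):
    """Return {type: (annotations per annotated article, fraction of all articles)}."""
    return {
        name: (annotations / articles, articles / total_articles)
        for name, _terms, articles, annotations in rows
    }

summary = summarise(ROWS, TOTAL_ARTICLES)
for name, (per_article, coverage) in summary.items():
    print(f"{name:14} {per_article:6.1f} annotations/article {coverage:7.1%} of articles")
```

Under these assumptions, accession numbers occur in only about 3% of articles, at about 6 mentions per article, whereas chemicals appear in most articles at roughly 50 mentions each.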

11 Accession number stories: data citation in OA articles (Senay Kafkas, Jee-Hyub Kim)

12 Annotation of accession numbers (OA): publisher-annotated, ~10,000 articles; text-mined, >25,000 articles
Névéol A, Wilbur WJ, Lu Z (2012) Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE. Database 2012:bas026. PMC3371192
Névéol A, Wilbur WJ, Lu Z (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27:3306-3312. PMC3223368
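Text-mined accession tagging starts by spotting candidate identifiers in free text. The sketch below is a simple regular-expression pass, not the pipeline used for UKPMC. The patterns are assumptions for illustration: GenBank-style nucleotide accessions (one letter plus five digits, or two letters plus six digits, with an optional version suffix) and GEO series IDs (`GSE` plus digits). A real tagger would need far more patterns and context checks to avoid false positives.

```python
import re

# Illustrative patterns only (assumed, not exhaustive)
PATTERNS = {
    # GenBank-style nucleotide accession, e.g. AY387398 or AY387398.1
    "genbank": re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.\d+)?\b"),
    # GEO series accession, e.g. GSE12345
    "geo": re.compile(r"\bGSE\d+\b"),
}

def find_accessions(text):
    """Return (type, accession) pairs in the order they appear in `text`."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group()))
    hits.sort()  # order by character offset
    return [(kind, accession) for _offset, kind, accession in hits]

sentence = "Sequences were deposited in GenBank (AY387398) and arrays in GEO (GSE12345)."
print(find_accessions(sentence))
```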

13 Efficacy of accession number tagging (OA)
Most publisher tags: BMC Genomics, BMC Evolutionary Biology, The Journal of Cell Biology, Virology Journal, BMC Microbiology, The Journal of Experimental Medicine, BMC Bioinformatics, BMC Plant Biology, The Journal of Biological Chemistry, BMC Molecular Biology
Most articles: PLoS One, Acta Crystallographica Section E, British Journal of Cancer, The Journal of Cell Biology, Environmental Health Perspectives, Nucleic Acids Research, The Journal of Experimental Medicine, Critical Care, Emerging Infectious Diseases, BMC Bioinformatics
Most text-mined tags: PLoS One, Nucleic Acids Research, BMC Genomics, BMC Evolutionary Biology, The Journal of Cell Biology, PLoS Pathogens, BMC Bioinformatics, Virology Journal, BMC Microbiology, Emerging Infectious Diseases
BMC Genomics: 1,484 TM tags*, 4,337 articles
PLoS One: 4,226 TM tags*, 42,888 articles
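Raw counts favour the largest journals, so normalising by article count gives a fairer comparison. A minimal sketch using the two journals quoted on the slide; the meaning of the asterisk on "TM tags" is not recoverable from the transcript, so treating the figure as a plain count of text-mined tags is an assumption:

```python
# (text-mined accession tags, OA articles) per journal, as quoted on the slide
JOURNALS = {
    "BMC Genomics": (1_484, 4_337),
    "PLoS One": (4_226, 42_888),
}

def tags_per_article(tags, articles):
    """Average number of text-mined accession tags per article."""
    return tags / articles

density = {name: tags_per_article(t, a) for name, (t, a) in JOURNALS.items()}
for name, value in density.items():
    print(f"{name}: {value:.3f} TM tags per article")
```

On this measure PLoS One has about three times as many tags as BMC Genomics in absolute terms, but roughly 3.5 times fewer per article.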

14 Implications: why is this important?
Scientific:
- Linking articles that cite the same data
- Citation: data citation as a measure of impact (Thomson: Data Citation Index)
- Context of data citation: submission, reuse, analysis
Operational:
- Services for publishers to improve accession number tagging
- Editorial policies and adherence
- Extension of the NLM DTD
- Lessons learned for considering unstructured data
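Linking articles that cite the same data amounts to inverting (article, accession) pairs into an index keyed by accession. A minimal sketch; the article identifiers and mentions below are hypothetical, chosen only for illustration:

```python
from collections import defaultdict
from itertools import combinations

def articles_by_accession(mentions):
    """Invert (article_id, accession) pairs into {accession: set of article_ids}."""
    index = defaultdict(set)
    for article_id, accession in mentions:
        index[accession].add(article_id)
    return index

def linked_article_pairs(mentions):
    """Return {(article_a, article_b): shared accessions} for articles citing the same data."""
    links = defaultdict(set)
    for accession, articles in articles_by_accession(mentions).items():
        for a, b in combinations(sorted(articles), 2):
            links[(a, b)].add(accession)
    return dict(links)

# Hypothetical mentions, for illustration only
MENTIONS = [
    ("PMC_A", "AY387398"),
    ("PMC_B", "AY387398"),
    ("PMC_B", "GSE12345"),
    ("PMC_C", "GSE12345"),
    ("PMC_C", "AY387398"),
]

links = linked_article_pairs(MENTIONS)
for (a, b), shared in sorted(links.items()):
    print(a, b, sorted(shared))
```

The same index also gives a crude data-citation count: `len(index[accession])` is the number of articles that mention a given record.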

15 AY387398: needle in a haystack

16

17 Unstructured data

18 Articles with supplemental data (UKPMC): 235,000 articles (50K+ in 2011); 718,511 files; 459 extensions; 0.8 TB (1,200 CDs). However, most data are in ~60 extension types. [Chart: % of articles by publication year]
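A claim like "most data in ~60 extension types" can be checked by tallying supplementary files by extension and measuring how much the most common extensions cover. A minimal sketch with hypothetical file names; it counts files rather than bytes, and it uses only the final suffix, so `archive.tar.gz` counts as `.gz`:

```python
from collections import Counter
from pathlib import PurePosixPath

def extension_counts(filenames):
    """Count files by lower-cased final extension ('' when there is none)."""
    return Counter(PurePosixPath(name).suffix.lower() for name in filenames)

def top_k_coverage(counts, k):
    """Fraction of all files accounted for by the k most common extensions."""
    total = sum(counts.values())
    top = sum(n for _ext, n in counts.most_common(k))
    return top / total if total else 0.0

# Hypothetical supplementary file names, for illustration only
FILES = ["table1.xls", "Figure_S1.PDF", "data.xls", "seq.fasta", "movie.mov", "notes"]

counts = extension_counts(FILES)
print(counts.most_common())
print(f"top 2 extensions cover {top_k_coverage(counts, 2):.0%} of files")
```

To measure coverage by volume, as in the 0.8 TB total, each file would be weighted by its size instead of counting 1.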

19 Managing the public data ecosystem: Big Data: Deposition; Primary Research articles; Big Data: Curated Annotation; Structured links; Unstructured Data; reuse, analysis, provenance; Open, Citable, Discoverable, Reusable

20 People: Paula Buttery, Andrew Caines, Norman Cobley, Yuci Gou, Senay Kafkas, Jyothi Katuri, Oliver Kilian, Jee-Hyub Kim, Nikos Marinos, Jo McEntyre, Xingjun Pi, Philip Rossiter, Rebholz Group, Peter Stoehr, University of Manchester, British Library, OpenAIRE/OpenAIRE Plus, NCBI, NLM




