Online tools and standards for Biodiversity data in the Semantic Web Dr Dimitris Koureas Biodiversity Informatics Group | Department of Life Sciences The.


1 Online tools and standards for Biodiversity data in the Semantic Web Dr Dimitris Koureas Biodiversity Informatics Group | Department of Life Sciences The Natural History Museum London

2 http://… What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere

3 http://… link, What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere

4 http://… What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere

5 http://… [Diagram labels: nodes "Fred", "person", "book"; links "is a", "author of"] What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere
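
The labels on this slide (Fred, person, book, "is a", "author of") suggest statements of the form subject, link, object. A minimal sketch of that idea in Python follows. The reading "Fred is a person" and "Fred is author of a book" is an assumption about the diagram, and every `http://example.org/` URI is a hypothetical placeholder.

```python
# Each statement is a (subject, predicate, object) triple.
# All URIs are hypothetical placeholders, not real resources.
EX = "http://example.org/"

triples = {
    (EX + "Fred", EX + "isA", EX + "Person"),
    (EX + "Fred", EX + "authorOf", EX + "Book1"),
    (EX + "Book1", EX + "isA", EX + "Book"),
}


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`, sorted."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# What did Fred write? Who is a person?
fred_books = objects(EX + "Fred", EX + "authorOf")
people = subjects(EX + "isA", EX + "Person")
```

Because both the things and the links are named with URIs, statements published by different sites about the same URI can be merged into one graph.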

6 The Semantic web: “The future of the web …and always will be” – Peter Norvig (Google) What is the semantic web? Slide adjusted from Page R. presentation in pro-iBiosphere

7 Biodiversity informatics: the study of the transformation and communication of information in Life and Earth sciences. It provides the means (generating and enhancing the necessary infrastructure)

8 Research vs Infrastructure Slide adapted from Patterson D. 2013, Tempe, Arizona

9 Research vs Infrastructure. Research: Discovery; Ephemeral; Individualistic; Massive redundancy; Optional; Risk taking. Slide adapted from Patterson D. 2013, Tempe, Arizona

10 Research vs Infrastructure. Research: Discovery; Ephemeral; Individualistic; Massive redundancy; Optional; Risk taking. Infrastructure: Implementation; Communal / agreed; Essential; Persistent; Robust & reliable; Adaptable. Slide adapted from Patterson D. 2013, Tempe, Arizona

11 What are the current challenges in Biodiversity informatics?

12 Current taxonomic data production: publications based on countless specimens, images, maps, keys and datasets. Typically generated by small communities for “local” research projects. Figure from Costello M.J. et al., 2013 doi: 10.1126/science.1230318

13 Our current taxonomic data production: 15-20k new spp. described annually (2M total) [1]; 30k nomenclatural acts (12M total) [1]; 20k phylogenies (750k total) [2]; 31k taxa sequenced (360k taxa total) [3]; 800k BioMed papers (40M total pp. of taxonomy) [4]; countless specimens, images, maps, keys and datasets. 1.8M described spp. (17M names); 300M pages (over last 250 years); 1.5-3B specimens. Figures from [1] Zhang, Zootaxa 2011 4, 1-4; [2] Web-of-Science; [3] GenBank and [4] PubMed.

14 Estimates of 7.5 million species still undescribed [1]. Now imagine that… [1] Mora C. et al., How Many Species Are There on Earth and in the Ocean? doi:10.1371/journal.pbio.1001127

15 Biodiversity informatics landscape Key problems Landscape is complex, fragmented & hard to navigate Many audiences (policy makers, scientists, amateurs, citizen scientists) Many scales (global solutions to local problems) Figure adapted from Peterson et al, Syst. & Biodiv. 2010 doi: 10.1080/14772001003739369

16 Science is carried out “locally”: by local scientists, being part of local infrastructures, having local funders. BUT science is global: it needs global standards, global workflows, cooperation of global players.

17 Expected volume of taxonomic and biodiversity data. Need for extracting, aggregating and linking data on a global level

18 To achieve this… “Link together evolutionary data … by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” (Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE doi:10.1016/j.tree.2011.11.001). This requires data, information & knowledge to be… Digital (not printed paper); Openly accessible (not behind barriers, e.g. paywalls); Linked-up (not in silos)

19 Hour-glass motif for big data infrastructure Data re-use Data generation Data pool Slide adapted from Patterson D. 2013, Tempe, Arizona

20 Big data world with re-use data. Aggregation, Visualization, Analysis, Manipulation; Models, Observations, Experiments, Processed; Data re-use; Data generation; Data pool

21 Big data world with re-use data. Aggregation, Visualization, Analysis, Manipulation; Models, Observations, Experiments, Processed; Data re-use; Data generation; Data pool

22 Nodes interconnected Slide adapted from Patterson D. 2013, Tempe, Arizona

23 But how many biodiversity informatics projects are out there?

24 But how many biodiversity informatics projects are out there? At least 679! Sources: EDIT, TDWG & ViBRANT 2013. Categories: Data Aggregator - a web site that collates data from a variety of sources (digital and hardcopy) and presents it in one form; Data Indexer - a web site that provides lists or indexes of other sites that provide data; Data Provider - a web site that provides data directly from research or other studies; Data Standards - a web site that contributes to formulating or developing standards for data; Facilitator - a web site that facilitates the provision of data by other projects or web sites

25 GBIF: Our global leader in occurrence data Aggregators

26 http://www.eu-nomen.eu/portal/ EU-NOMEN - PESI Aggregators

27 Making taxonomy digital, open & linked Aggregators

28 Scratchpads are an integrated system to enter, curate, mark up, link and publish data: a taxonomic workflow in a single virtual environment

29 A Scratchpad is a website that holds data for you and your community The Scratchpads concept Your data External data & services

30 65,000 unique visitors/month (chart: unique visitors per month to Scratchpads sites). 580 Scratchpads communities by 8,185 active registered users, covering 55,607 taxa in 653,274 pages. In total more than 1,300,000 visitors

31 Facilitators: BOLD (Barcode of Life Data Systems). Researchers can assemble, test, and analyse their data records in BOLD before uploading them to the International Nucleotide Sequence Database Collaboration (DDBJ, ENA, GenBank)

32 Biodiversity literature openly available to the world as part of a global biodiversity community Biodiversity Heritage Library BHL http://www.biodiversitylibrary.org/ > 40 M pages of legacy literature Providers

33 Standard Exchange formats

34 http://rs.tdwg.org/dwc/index.htm Darwin Core (DwC) Primarily used as a specimen records metadata standard Standard Exchange formats
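
To make the idea of a specimen record concrete, here is a minimal sketch of one record keyed by Darwin Core term names. The term names and the namespace `http://rs.tdwg.org/dwc/terms/` come from the standard; all the values, the `REQUIRED` tuple and the two helper functions are hypothetical and only illustrate the shape of the data.

```python
# Namespace under which Darwin Core terms are published.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# One specimen record keyed by Darwin Core term names.
# Every value here is invented for illustration.
record = {
    "occurrenceID": "urn:example:specimen:0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Thymus vulgaris L.",
    "institutionCode": "EX",
    "catalogNumber": "0001",
    "country": "Greece",
    "decimalLatitude": "38.25",
    "decimalLongitude": "21.73",
}

# Assumption for this sketch only: the terms we choose to insist on.
# This is not a rule taken from the standard itself.
REQUIRED = ("occurrenceID", "basisOfRecord", "scientificName")


def missing_terms(rec, required=REQUIRED):
    """List the required terms that are absent or empty in `rec`."""
    return [term for term in required if not rec.get(term)]


def qualify(rec):
    """Return a copy of `rec` keyed by full term URIs instead of short names."""
    return {DWC_NS + term: value for term, value in rec.items()}


problems = missing_terms(record)
qualified = qualify(record)
```

Using the full term URI as the key is what lets two databases agree that their "scientificName" columns mean the same thing.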

35 Access to Biological Collection Data (ABCD) http://www.tdwg.org/standards/115/ highly detailed and aims to provide a complete set of data elements for natural history collection items Standard Exchange formats

36 Audubon Core Multimedia Resources Metadata Schema http://www.tdwg.org/standards/638/ The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections. Standard Exchange formats

37 http://tdwg.napier.ac.uk/index.php?pagename=HomePage Taxonomic Concept Transfer Schema (TCS) Mechanism to exchange data concerning the names of organisms Standard Exchange formats

38 Standards facilitate systems interoperability

39 We need Unique Identifiers. Identifier: a key to find something in a database. UPIDs (Unique Persistent IDentifiers) to identify content

40 10.4289/0013-8797.115.1.75 We need Unique Identifiers

41 http://hdl.handle.net/10.4289/0013-8797.115.1.75 http://dx.doi.org/10.4289/0013-8797.115.1.75 http://www.google.co.uk/search?q=10.4289/0013-8797.115.1.75 http://zoobank.org/10.4289/0013-8797.115.1.75 We need Unique Identifiers
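
The four addresses on this slide differ only in the service in front; the identifier itself is the same string. A minimal sketch that recovers it from each address follows. The regular expression is a simplified assumption about what a DOI looks like, not an official grammar, and the sketch makes no claim that every one of these addresses actually resolves.

```python
import re

# Simplified pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# An assumption for this sketch, not the full DOI syntax.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(text):
    """Return the first DOI-like string found in `text`, or None."""
    match = DOI_PATTERN.search(text)
    return match.group(0) if match else None


# The four addresses shown on the slide.
urls = [
    "http://hdl.handle.net/10.4289/0013-8797.115.1.75",
    "http://dx.doi.org/10.4289/0013-8797.115.1.75",
    "http://www.google.co.uk/search?q=10.4289/0013-8797.115.1.75",
    "http://zoobank.org/10.4289/0013-8797.115.1.75",
]

# Collapsing to a set shows how many distinct identifiers are involved.
dois = {extract_doi(u) for u in urls}
```

All four addresses collapse to a single identifier, which is the property that makes it usable as a database key regardless of which service is used to look it up.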

42 We need Unique Identifiers. Can a taxonomic name be used as a UPID? Is it Unique? Is it Persistent? Is it an Identifier? Are taxonomic names enough for communication between scientists? YES. Are taxonomic names enough for communication between machines? CAN BE, IF…

43 For example: Page R., Brief Bioinform (2008) 9 (5): 345-354. doi: 10.1093/bib/bbn022 We need Unique Identifiers

44 ONLY IF Name reconciliation Patterson, D. J. et al. 2010. Names are key to the big new biology. TREE 25: 686-691 doi: 10.1016/j.tree.2010.09.004 We need Unique Identifiers
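
As a rough illustration of what name reconciliation involves, the sketch below groups spelling variants of a name under one canonical two-word form. The heuristic (keep the first two words, normalise their case, drop the rest as authorship) is an assumption made for this example. It is not the method of Patterson et al., and it ignores synonyms, subspecies, hybrids and misspellings that real reconciliation services must handle.

```python
from collections import defaultdict


def canonical_binomial(name):
    """Reduce a name string to 'Genus epithet', or None if under two words."""
    tokens = name.split()  # split() also collapses repeated whitespace
    if len(tokens) < 2:
        return None
    genus, epithet = tokens[0], tokens[1]
    return f"{genus.capitalize()} {epithet.lower()}"


def reconcile(names):
    """Group the input strings by their canonical binomial."""
    groups = defaultdict(list)
    for name in names:
        key = canonical_binomial(name)
        if key is not None:
            groups[key].append(name)
    return dict(groups)


# Three ways the same name might appear in different data sources.
variants = [
    "Thymus vulgaris L.",
    "thymus  vulgaris",
    "Thymus vulgaris Linnaeus, 1753",
]
groups = reconcile(variants)
```

Once variants share a key, records from different sources that carry different name strings can be linked to each other by machine.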

45 The need for Controlled Vocabularies and Ontologies. Knowledge Organisation Systems: Google has done it: http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html Ontologies: Plant anatomical and structural development Ontology http://www.plantontology.org/

46 Deans A. et al. Time to change how we describe biodiversity, Trends in Ecology & Evolution 2012 doi:10.1016/j.tree.2011.11.007 Example of ontology usage

47 Examples of integrated projects http://protectedplanet.net http://thymus.myspecies.info

48 How is all this relevant to my work? What should I take home?

49 Repositories #bigdata; Providers; Data silos; Community

50 The four nodes of data workflow 1. We collect and generate data 2. We curate, link and structure data 3. We analyse data 4. We publish data

51 The four nodes of data workflow: Data collection & generation; Data curation; Data analysis; Data publishing. What are the bottlenecks in the workflow?

52 What we need is… a seamless workflow: Data collection & generation; Data curation; Data analysis; Data publishing

53 Old Joke: A drunk is crawling around a lamp post on his hands and knees. A cop comes along … Cop: What are you doing? Drunk: Looking for my car keys. Cop: Are you sure you dropped them here? Drunk: No, I dropped them in the alley. Cop: So why are you looking here? Drunk: Because the light’s better.

54 Science is a ‘light’s better’ endeavor in that research effort is not directed at areas where the work is technically infeasible. Research is directed where real, interpretable results may be obtained. We do, in fact, conduct research where the light’s better. But, when the light changes, so does science. With better illumination, we look in new areas. We find new things… Old Joke

55 Addressing the challenges of biodiversity informatics “…the field [of biodiversity informatics] appears to be growing in a void of overarching, motivating questions, effectively making it a set of technologies in search of questions to address.” Peterson et al, Syst. & Biodiv. 2010 doi: 10.1080/14772001003739369

