BIS TDWG Conference, New Orleans, 2011 GBIF: Issues in providing federated access to digital information related to biological specimens David Remsen Senior.


1 BIS TDWG Conference, New Orleans, 2011. GBIF: Issues in providing federated access to digital information related to biological specimens. David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF)

2 Three issues: Issue #1: The consequences of scale; Issue #2: Geospatial integration; Issue #3: Taxonomic integration

3 Issue #1: The consequences of scale. Goal: provide timely access to a large federated network of biodiversity databases.

4 About GBIF: 341 publishers; 9,290 datasets; 310M records; 57 countries; 45 organisations. The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development.

5 “Wrapper” software: install one of these ‘wrappers’ on your database (e.g., Insect Collection, Bird Observations, Herbarium Data): PyWrapper (Python), TAPIR Link (PHP), DiGIR (PHP). Data standards: ABCD, Darwin Core.
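A wrapper sits between a local database and the network and answers requests using a shared standard such as Darwin Core. The minimal sketch below shows only that core idea: renaming one institution's local columns to Darwin Core terms. The local column names and the mapping are hypothetical; the real wrappers listed above also speak a network protocol and translate incoming queries for the database.

```python
# Hypothetical mapping from one institution's local column names to
# Darwin Core terms; a real wrapper is configured with a mapping like
# this instead of hard-coding it.
LOCAL_TO_DWC = {
    "cat_no": "catalogNumber",
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "ctry": "country",
}


def to_darwin_core(row: dict) -> dict:
    """Rename the mapped columns of one local record to Darwin Core terms.

    Unmapped columns (e.g. an internal drawer number) are dropped, because
    the network only understands the shared vocabulary.
    """
    return {LOCAL_TO_DWC[col]: value for col, value in row.items() if col in LOCAL_TO_DWC}


if __name__ == "__main__":
    local_row = {"cat_no": "INS-0042", "species": "Apis mellifera",
                 "lat": 13.75, "lon": 100.5, "ctry": "Thailand", "drawer": "B7"}
    print(to_darwin_core(local_row))
```

Keeping the mapping as data rather than code is what lets the same wrapper software serve very different local schemas.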

6 The promise of federation (GBIF Data Portal as a Gateway). Diagram: the GBIF Data Portal connected to an Insect Collection, a Herbarium, Bird Observations and a second Herbarium. “Any specimens from Thailand?” “I will ask!” “I do!” “Nope!”

7 The challenge of federation. Diagram: the GBIF Data Portal connected to an Insect Collection, a Herbarium, Bird Observations and a second Herbarium. “Hi!” “Hello?” “Server Not Available”

8 The rise of Indexing (GBIF Data Portal as a Data Index). Diagram: the GBIF Data Portal (now with Data!) connected to an Insect Collection, a Herbarium, Bird Observations and a second Herbarium. “Any data records from Thailand?” “Send me a copy of your data.”
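Slides 6 to 8 contrast two architectures: a gateway that forwards every query live to each provider, and an index that periodically copies provider data and answers queries itself. The toy model below is a sketch under stated assumptions, not GBIF's implementation: providers are in-memory objects, an offline server raises `ConnectionError`, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Provider:
    """A hypothetical data provider holding Darwin Core-like records."""
    name: str
    records: List[dict]
    online: bool = True

    def search(self, country: str) -> List[dict]:
        if not self.online:
            raise ConnectionError(f"{self.name}: server not available")
        return [r for r in self.records if r["country"] == country]

    def dump(self) -> List[dict]:
        if not self.online:
            raise ConnectionError(f"{self.name}: server not available")
        return list(self.records)


def gateway_search(providers: List[Provider], country: str) -> List[dict]:
    """Gateway model: ask every provider at query time."""
    hits = []
    for p in providers:
        try:
            hits.extend(p.search(country))
        except ConnectionError:
            pass  # records held by an offline provider are silently missing
    return hits


def harvest(providers: List[Provider], index: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Index model: copy each reachable provider's data into a central store."""
    for p in providers:
        try:
            index[p.name] = p.dump()
        except ConnectionError:
            pass  # keep the copy from the last successful harvest
    return index


def index_search(index: Dict[str, List[dict]], country: str) -> List[dict]:
    """Answer the query from the harvested copies only."""
    return [r for records in index.values() for r in records if r["country"] == country]


if __name__ == "__main__":
    insects = Provider("Insect Collection", [{"id": 1, "country": "Thailand"}])
    herbarium = Provider("Herbarium", [{"id": 2, "country": "Thailand"}, {"id": 3, "country": "Peru"}])
    birds = Provider("Bird Observations", [{"id": 4, "country": "Kenya"}])
    providers = [insects, herbarium, birds]

    index = harvest(providers, {})  # periodic copy while everyone is online
    herbarium.online = False        # later, one server goes down

    print(len(gateway_search(providers, "Thailand")))  # 1: the herbarium record is lost
    print(len(index_search(index, "Thailand")))        # 2: served from the index
```

The trade-off is freshness: the index answers even when a provider is down, but only with data as of the last harvest.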

9 The wrong tools for the job. Diagram: the GBIF Data Portal (now with Data!) connected to an Insect Collection, a Herbarium, Bird Observations and a second Herbarium. “Any data records from Thailand?” “Send me a copy of your data once per month.” “Here is page one. If I go offline, start again.” “Not too fast!” “You ask the same questions every time.”
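“Here is page one. If I go offline, start again” describes a harvest that pages through a provider and must restart from the beginning after a failure. The sketch below counts the requests this costs, compared with a hypothetical harvester that can resume from its last page. The single failure and its position are assumptions chosen for illustration.

```python
from typing import Optional


def harvest_pages(total_records: int, page_size: int,
                  fail_at_request: Optional[int] = None, resume: bool = False) -> int:
    """Count the requests needed to page through one provider's records.

    fail_at_request: hypothetical 1-based request number at which the
        connection drops once (None means no failure).
    resume: False models "if I go offline, start again"; True models a
        harvester that remembers how far it got.
    """
    requests = 0
    offset = 0  # index of the first record of the next page
    failed = False
    while offset < total_records:
        requests += 1
        if not failed and requests == fail_at_request:
            failed = True       # this request returns nothing
            if not resume:
                offset = 0      # back to page one
            continue
        offset += page_size     # a successful page
    return requests


if __name__ == "__main__":
    print(harvest_pages(260_000, 200))                                     # 1300
    print(harvest_pages(260_000, 200, fail_at_request=1000))               # 2300
    print(harvest_pages(260_000, 200, fail_at_request=1000, resume=True))  # 1301
```

With restart semantics, a single late failure nearly doubles the work, which is one reason page-by-page protocols scale poorly to large datasets.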

10 TAPIR request example: a dataset of 260,000 specimens, retrieved at 200 records per request, requires 1,300 request/response pairs over 9 hours to complete. 500 MB of XML data is transferred, which becomes a 32 MB text file on the GBIF server; that 32 MB compresses to a 3 MB zip file.
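The figures on this slide can be checked with a few lines of arithmetic. This sketch only restates the slide's numbers. The per-request time is an average derived from the 9-hour total, not a measured latency.

```python
import math


def tapir_cost(records: int, page_size: int, hours: float):
    """Return (request/response pairs, average seconds per pair) for a paged harvest."""
    requests = math.ceil(records / page_size)
    return requests, hours * 3600 / requests


if __name__ == "__main__":
    requests, per_request = tapir_cost(260_000, 200, 9)
    print(requests)                # 1300
    print(f"{per_request:.1f} s")  # 24.9 s average per request/response pair

    # Size reductions reported on the slide
    xml_mb, text_mb, zip_mb = 500, 32, 3
    print(f"XML -> text: {xml_mb / text_mb:.1f}x smaller")
    print(f"text -> zip: {text_mb / zip_mb:.1f}x smaller")
    print(f"XML -> zip:  {xml_mb / zip_mb:.0f}x smaller")
```

Taken together, the slide's numbers mean the same data could travel as a single download more than 150 times smaller than the XML that was actually transferred.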

11 Darwin Core Archives A text-based solution to publishing biodiversity data

12 A Refined Approach. Diagram: the GBIF Data Portal (now with Data!) connected to an Insect Collection, a Herbarium, Bird Observations and a second Herbarium, with data fetched from a URL. “Any data records from Thailand?” “This is fast!” “This is easy.” Benefits: index very large data sets; reduce latency.
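A Darwin Core Archive is a zip file containing delimited text data plus a `meta.xml` descriptor that maps columns to Darwin Core terms, so a whole dataset can be fetched from one URL in a single request. The sketch below reads only the core data file, following the meta.xml layout as described in the Darwin Core Text Guide. It ignores extensions and the EML metadata file. The file name `occurrence.txt`, the sample record, and the comma fallback delimiter are illustrative assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = "{http://rs.tdwg.org/dwc/text/}"  # namespace used by meta.xml


def read_core(archive_bytes: bytes) -> list:
    """Read the core data file of a Darwin Core Archive into dicts keyed by term name."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(f"{NS}core")
        location = core.find(f"{NS}files/{NS}location").text
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))

    # meta.xml spells delimiters as escapes such as "\t"; a comma is assumed if absent
    delimiter = core.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
    quote = core.get("fieldsEnclosedBy", '"')
    skip = int(core.get("ignoreHeaderLines", "0"))

    # Column index -> short term name, e.g. ".../terms/country" -> "country"
    columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
               for f in core.findall(f"{NS}field")}
    id_el = core.find(f"{NS}id")
    if id_el is not None:
        columns.setdefault(int(id_el.get("index")), "id")

    if quote:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    rows = list(reader)[skip:]
    return [{term: row[i] for i, term in columns.items()} for row in rows if row]


META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>"""


def build_demo_archive() -> bytes:
    """Zip a one-record example archive in memory (hypothetical content)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt",
                    "id\tscientificName\tcountry\n1\tOenanthe oenanthe\tThailand\n")
    return buf.getvalue()


if __name__ == "__main__":
    for record in read_core(build_demo_archive()):
        print(record)
```

Because the descriptor, not the consumer, declares delimiters and term mappings, providers can publish plain text exports without changing their column order.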

13 Growth in records (chart): 2007, 70 million; 2008, 147 million; 2009, 180 million; 2010, 201 million; today, 302 million. Annotation: need for a new standard identified.

14 Issue #2: Geospatial Integration. Goal: provide accurate reporting of nationally bound data. Challenge: inaccurate recording of geospatial coordinates.

15 Geo-referenced USA data (map): verbatim data as shared on the network

16 Issue #2: Geospatial Integration. Remediation includes: using country boundary shapefiles, including EEZ boundaries and islands, to verify that coordinates fall within them; identifying outliers; qualifying the nature of the error (e.g., “coordinates inverted”); and marking offending records and omitting them from display.
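The remediation steps amount to a point-in-polygon test followed by a diagnosis of common errors. The sketch below uses the even-odd (ray casting) rule on plain vertex lists rather than real shapefiles. A country is a list of polygons so that islands or EEZ areas can be included. The rectangular test "country" and the error labels other than “coordinates inverted” are hypothetical.

```python
from typing import List, Optional, Tuple

Polygon = List[Tuple[float, float]]  # vertices as (longitude, latitude)


def point_in_polygon(lon: float, lat: float, poly: Polygon) -> bool:
    """Even-odd (ray casting) test: count edge crossings of a ray heading east."""
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def check_coordinates(lat: float, lon: float, country: List[Polygon]) -> Optional[str]:
    """Return None if the point falls inside the country, else a guess at the error."""
    def inside(la: float, lo: float) -> bool:
        # a country is several polygons, so islands (or an EEZ) can be included
        return -90 <= la <= 90 and any(point_in_polygon(lo, la, p) for p in country)

    if inside(lat, lon):
        return None
    for label, la, lo in [("coordinates inverted", lon, lat),
                          ("latitude sign flipped", -lat, lon),
                          ("longitude sign flipped", lat, -lon)]:
        if inside(la, lo):
            return label
    return "outside country boundary"


if __name__ == "__main__":
    # Crude bounding box standing in for a real country outline
    country_box = [[(97.0, 5.0), (106.0, 5.0), (106.0, 21.0), (97.0, 21.0)]]
    print(check_coordinates(13.75, 100.5, country_box))  # None
    print(check_coordinates(100.5, 13.75, country_box))  # coordinates inverted
```

A record that only passes after a transformation can be flagged with that label and hidden from display, rather than silently corrected.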

17 Geo-referenced USA data (map): data following interpretation. Coastal regions recognised; offshore islands recognised.

18 Issue #3: Taxonomic Integration. Goal: provide access to biodiversity data according to taxonomic groups and concepts. Challenges: heterogeneous and sometimes inaccurate classification, with the same taxon appearing in different classifications; homonyms that complicate reconciling those classifications; misspellings; and a wide range of orthographies for the same name.

19 Enabling authoritative taxonomic data to be published through GBIF

20 Trochilidae (Hummingbirds), today (map): misinterpretations place records outside the Americas, although hummingbirds are restricted to the Americas

21 Trochilidae (Hummingbirds), next month (map): improved interpretation

22 Search for Oenanthe (water dropwort plant or wheatear bird). Today: results that are difficult for the user to interpret. Next month: accurate search results through resolution of homonyms.
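Oenanthe is a genus name used both for water dropworts (plants) and for wheatears (birds), so the name alone cannot say which taxon a record means. The slide does not say how GBIF resolves such homonyms. One plausible approach, sketched here, uses the record's higher classification to pick among authority entries. The tiny authority table is hypothetical.

```python
from typing import Optional

# Hypothetical authority entries: one name, two unrelated taxa
AUTHORITY = {
    "Oenanthe": [
        {"kingdom": "Plantae", "family": "Apiaceae", "common": "water dropwort"},
        {"kingdom": "Animalia", "family": "Muscicapidae", "common": "wheatear"},
    ],
}


def resolve(name: str, kingdom: Optional[str] = None,
            family: Optional[str] = None) -> Optional[dict]:
    """Pick the authority entry for a name, using higher classification to split homonyms.

    Returns None when the name is unknown or still ambiguous.
    """
    candidates = AUTHORITY.get(name, [])
    if kingdom:
        candidates = [c for c in candidates if c["kingdom"] == kingdom]
    if family:
        candidates = [c for c in candidates if c["family"] == family]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    print(resolve("Oenanthe"))                      # None: ambiguous
    print(resolve("Oenanthe", kingdom="Animalia"))  # the wheatear entry
```

Records that carry no usable higher classification stay unresolved instead of being assigned to the wrong kingdom.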

23 Improved means to match names to authority files
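Matching a verbatim name to an authority file typically means normalising orthography first and then tolerating small misspellings. The sketch below normalises whitespace and binomial capitalisation, then falls back to Python's `difflib` similarity ratio. That ratio is a stand-in, not GBIF's matching algorithm. The authority list and the 0.9 cutoff are assumptions.

```python
import difflib
import re
from typing import Optional, Sequence, Tuple

# Hypothetical authority list of accepted canonical names
AUTHORITY_NAMES = ("Oenanthe oenanthe", "Oenanthe aquatica",
                   "Archilochus colubris", "Apis mellifera")


def normalise(name: str) -> str:
    """Collapse whitespace and apply binomial capitalisation (Genus epithet)."""
    parts = re.sub(r"\s+", " ", name).strip().split(" ")
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def match_name(name: str, authority: Sequence[str] = AUTHORITY_NAMES,
               cutoff: float = 0.9) -> Tuple[Optional[str], str]:
    """Return (matched authority name or None, match type)."""
    norm = normalise(name)
    if norm in authority:
        return norm, "exact"
    close = difflib.get_close_matches(norm, list(authority), n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"  # likely misspelling; worth flagging for review
    return None, "none"


if __name__ == "__main__":
    print(match_name("oenanthe  OENANTHE"))    # ('Oenanthe oenanthe', 'exact')
    print(match_name("Archilochus colubrus"))  # ('Archilochus colubris', 'fuzzy')
```

A high cutoff keeps fuzzy matches conservative; since similar-looking names can belong to different taxa, fuzzy hits are better reported with their match type than accepted silently.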

24 In summary: GBIF has had to deploy different data access strategies in order to scale effectively. The Darwin Core Archive offers a scalable solution that has led to rapid growth in data published through GBIF. Geospatial filtering via shapefiles provides a basis for more accurate national reporting, and for additional services later (e.g., ecosystem shapefiles, protected areas). The heterogeneous taxonomy inherent to collections data is nearly impossible to consolidate into a taxonomically accurate structure; comprehensive, authoritative taxonomic data is a key organisational component of collections data.

25 Thank you

