Global Biodiversity Information Facility (GBIF). Hannu Saarenmaa. IABIN/CHM, Cancún, Mexico, 12-14 August 2003. WWW.GBIF.ORG

Presentation transcript:

Information architecture of the GBIF
Hannu Saarenmaa, IABIN/CHM, Cancún, Mexico, August

Outline
1. Data
2. Software
3. Hardware
4. Peopleware (Nodes)
5. Status of network and conclusion

1. Data
- Policy and decisions, knowledge, and information all depend on data
- Data become information and knowledge through refinement, analysis, and synthesis

GBIF is concerned with "primary biodiversity data" only
- Specimens
- Observations
- Names
- Species
- Literature
- Metadata on the above

How will the data be organised?
- By having a common information model and shared data standards
[Figure labels: Institutions; Data sources; Knowledge bases; Unstructured information; Checklists; Redlists; Observation data; Specimen data; Species knowledge; Datasets; Units/Records; Objects; Taxonomies in ECAT and Catalogue of Life; Central; Distributed. Attribute labels: Source: URL; Protocol: SOAP, DiGIR; Format: XML Schema; Description; Rights; Services]

Data exchange standards are the key
Data description in XML
- Specimen / Observation
- Name / Taxon
- Providers / Collections / Persons in various roles
Standards process
- GBIF-DADI works with TDWG
- Discussion, documentation
- Open source
- digir.sourceforge.net
Leading standards
- DiGIR
- Darwin Core
- ABCD/BioCASE
- Dublin Core
- SOAP
- Grid OGSA
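
As a rough illustration of "data description in XML", the sketch below serialises one specimen record using a handful of Darwin Core term names. The flat element layout, the `record` wrapper element, and the sample values are assumptions made for this example; it is not the DiGIR or Darwin Core schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical specimen record: keys are Darwin Core term names,
# values are made-up sample data.
specimen = {
    "institutionCode": "EXAMPLE",
    "collectionCode": "Birds",
    "catalogNumber": "123456",
    "scientificName": "Parus major",
    "country": "Finland",
}


def to_xml(record: dict) -> str:
    """Serialise a flat record as a simple XML element (illustrative layout)."""
    root = ET.Element("record")
    for term, value in record.items():
        ET.SubElement(root, term).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Parse the XML produced by to_xml back into a flat dict."""
    return {child.tag: child.text for child in ET.fromstring(text)}


xml_text = to_xml(specimen)
print(xml_text)
```

Because both sides agree on the element names, a consumer can read the record without knowing anything about the provider's native database.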

2. Software
GBIF is building a distributed network of databases using a web services approach

Web Services: Definitions
- "A Web Service is a software application or component identified by a URI, whose interfaces and bindings are capable of being described by standard XML vocabularies and that supports direct interactions with other software applications or components through the exchange of information that is expressed in terms of an XML infoset via Internet-based protocols." (Chris Ferris, Sun Microsystems, W3C)

The Web Services Stack
[Figure: the web services stack; the only surviving label is DiGIR]

2.1. The DiGIR protocol
- Used for communication between data providers and users
- More light-weight and specialised than SOAP
- Enables a single point of access (portal/search) to distributed information resources
- Resource: a collection of data objects that conform to a common schema (DB records, XML documents)
- Distributed resources conform to a federation schema
- Enables search and retrieval of structured data
- Search for data values in context (semantics)
- Results are returned as a structured data set
- Makes the location and technical characteristics of the native resource transparent to the user
- The Distributed Generic Information Retrieval (DiGIR) protocol was invented by David Vieglais (University of Kansas) and Stan Blum (California Academy of Sciences)
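
The idea of a single access point over resources that conform to a federation schema can be sketched in memory as follows. This is not DiGIR message syntax: the `Resource` class, the field mappings, and the sample data are all hypothetical, and the only point shown is that each provider translates between federation field names and its own native ones.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A provider's data set plus a mapping from federation-schema
    fields to the provider's native field names (all hypothetical)."""
    name: str
    field_map: dict  # federation field -> native field
    rows: list       # native records as dicts

    def search(self, field: str, value: str) -> list:
        native = self.field_map.get(field)
        if native is None:
            return []  # this resource does not expose the field
        hits = [r for r in self.rows if r.get(native) == value]
        # Translate hits back into federation-schema field names.
        return [
            {fed: r.get(nat) for fed, nat in self.field_map.items()}
            for r in hits
        ]


def portal_search(resources: list, field: str, value: str) -> list:
    """Send one query to every resource and merge the structured results."""
    results = []
    for res in resources:
        for record in res.search(field, value):
            results.append({"resource": res.name, **record})
    return results


museum_a = Resource(
    "museum_a",
    {"scientificName": "sci_name", "catalogNumber": "cat_no"},
    [{"sci_name": "Parus major", "cat_no": "A-1"},
     {"sci_name": "Pica pica", "cat_no": "A-2"}],
)
museum_b = Resource(
    "museum_b",
    {"scientificName": "taxon", "catalogNumber": "id"},
    [{"taxon": "Parus major", "id": "B-7"}],
)

found = portal_search([museum_a, museum_b], "scientificName", "Parus major")
print(found)
```

The caller never sees `sci_name` or `taxon`, which is the sense in which the native resource is transparent to the user.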

A simple DiGIR architecture
[Figure labels: Portal; Search engine; Data Providers]

GBIF DiGIR Architecture
[Figure labels: User; Portal; Request Marshaller; Query Engine; Available providers; Registry (UDDI): Institutions, Providers, Services; Index; Cache; Metadata; Accounting; Data provider and Name provider, each with Provider Services and Resource Metadata. Message labels: Provider query; Metadata and name query; Metadata response; Full data query; Full data response; Metadata and logs; Synonyms, GUIDs; Publish availability. Protocol labels: SOAP; DiGIR]

2.2. The registry
- Global marketplace of shared biodiversity data
- Technically available now, awaits being populated
- Multiple UDDI servers possible in 2004 (v3)
- Based on UDDI (Systinet WASP) and web services
- Directory of Participants and data providers
- Services of the providers, i.e., data sources and datasets offered
- tModels of the standards that must be adhered to
- Open interfaces for portals and specialised search engines
- Anybody can write their own portal/search tool that uses the registry
- Use of the index is optional
"You don't get very far with web services unless you have a registry..." (Tom Gaskins, uddi.org)

How does the GBIF UDDI registry work?
1) GBIF Secretariat and other developers create and populate the registry with descriptions of standards (tModels)
2) Museums and other data providers install data provider packages, which are automatically registered
3) The GBIF Participant is notified of a new provider in their domain, with possible endorsement
4) A global index queries the registry and caches metadata and usage statistics, creating a unique identifier for each record (and name)
5) Portals and search engines query the registry and the index to build targeted user interfaces
6) Scientists and policy users use portals to build data sets for analysis and synthesis
[Figure labels: GBIF UDDI Registry; Services Registrations; Provider Registrations]
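
The first four steps of this workflow can be sketched with a small in-memory model. This is not a UDDI client: the `Registry` class, its method names, and the sample provider are assumptions made for illustration only.

```python
class Registry:
    """In-memory stand-in for the services registry (not a UDDI client)."""

    def __init__(self):
        self.standards = set()    # step 1: registered standards (tModels)
        self.providers = {}       # provider name -> registration
        self.notifications = []   # (participant, provider) messages

    def add_standard(self, name):
        self.standards.add(name)

    def register_provider(self, name, participant, standards, datasets):
        # Step 2: a provider package registers itself.
        unknown = set(standards) - self.standards
        if unknown:
            raise ValueError(f"unregistered standards: {sorted(unknown)}")
        self.providers[name] = {
            "participant": participant,
            "datasets": list(datasets),
            "endorsed": False,
        }
        # Step 3: the Participant is told about the new provider.
        self.notifications.append((participant, name))

    def endorse(self, name):
        self.providers[name]["endorsed"] = True


def build_index(registry):
    """Step 4: walk the registry and record each dataset's Participant."""
    return {
        (provider, dataset): reg["participant"]
        for provider, reg in registry.providers.items()
        for dataset in reg["datasets"]
    }


registry = Registry()
registry.add_standard("DiGIR")
registry.add_standard("Darwin Core")
registry.register_provider("museum_a", "FI", ["DiGIR", "Darwin Core"], ["birds"])
registry.endorse("museum_a")

index = build_index(registry)
print(index)
```

A portal (step 5) would then read `index` and `registry.providers` rather than contacting every provider to discover what exists.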

2.3. Metadata and names index
- Closely paired with the services registry will be a global index of the available data
- Retrieves metadata of the datasets/resources available from the registered providers
- Indexes on scope and coverage of datasets/resources (Dublin Core registry)
- Taxonomic, spatial, temporal, ...
- Maintains a cache of key data in case a provider goes off-line
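
The cache behaviour in the last bullet can be sketched as follows. The `MetadataIndex` class, the `stale` flag, and the use of `ConnectionError` to signal an off-line provider are assumptions for this example, not a description of the actual GBIF index.

```python
class MetadataIndex:
    """Keeps the last metadata seen from each provider (hypothetical)."""

    def __init__(self):
        self._cache = {}

    def refresh(self, provider_name, fetch):
        """Try to fetch fresh metadata; fall back to the cached copy."""
        try:
            metadata = fetch()
        except ConnectionError:
            cached = self._cache.get(provider_name)
            if cached is None:
                raise  # nothing cached yet, so the failure is passed on
            return {**cached, "stale": True}
        self._cache[provider_name] = metadata
        return {**metadata, "stale": False}


def fetch_online():
    # Stand-in for a metadata request to a provider that is reachable.
    return {"dataset": "birds", "records": 1200}


def fetch_offline():
    # Stand-in for a metadata request to a provider that is down.
    raise ConnectionError("provider unreachable")


idx = MetadataIndex()
first = idx.refresh("museum_a", fetch_online)
second = idx.refresh("museum_a", fetch_offline)
print(first, second)
```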

Name Service (ECAT) is a major component of the global index
[Figure labels: Catalogue of Life; Biodiversity Data Access; Name Usage Index; Taxonomic Name Service (ECAT); Specimen Data; Observation Data; Name Lists; Unstructured Data; URLs; XML Data Access; HTML Data Access; GBIF Portal; Indexing of usage; Index Manager; GBIF Data Nodes]
ECAT elements are coloured orange in the figure. "Name Lists" are lists of names for a specific purpose (e.g. Red List, regional checklist).

2.4.a. Data provider software
- Each system entails
  - Provider software
  - Communication with the DiGIR protocol
  - Data standards: Darwin Core, Dublin Core
- Installation for each provider
- Configuration for each resource (local existing database)
- Registration with the GBIF UDDI registry
- Turn-key package for Linux and Windows
- Based on PHP and digir.sourceforge.net code
- Available in August 2003

2.4.b. Data repository tool
- A data warehouse tool to manage and share data without a database
- Upload and manage datasets in document format as a) spreadsheet, b) embedded Darwin Core, or c) ABCD
- Release dataset to public
  - Data is parsed into an embedded MySQL database and becomes available as a DiGIR resource
- Revoke release
  - Data is deleted from the database
- Stand-alone package or module of GBIF PTK
- For Linux and Windows
- Based on Python and Zope, available Q3/2003
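
The release and revoke cycle can be sketched as below. The slide names an embedded MySQL database; this example swaps in Python's built-in `sqlite3` so that it runs without a server, and the table layout, function names, and sample spreadsheet are assumptions for illustration.

```python
import csv
import io
import sqlite3

# In-memory database standing in for the tool's embedded database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (dataset TEXT, catalog_number TEXT, scientific_name TEXT)"
)


def release(dataset, csv_text):
    """Parse an uploaded spreadsheet (as CSV) and load it into the database."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(dataset, r["catalogNumber"], r["scientificName"]) for r in reader]
    with conn:  # commits on success
        conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    return len(rows)


def revoke(dataset):
    """Delete every record of a released dataset."""
    with conn:
        cur = conn.execute("DELETE FROM records WHERE dataset = ?", (dataset,))
    return cur.rowcount


def count(dataset):
    """Number of records currently published for a dataset."""
    query = "SELECT COUNT(*) FROM records WHERE dataset = ?"
    return conn.execute(query, (dataset,)).fetchone()[0]


sheet = "catalogNumber,scientificName\nA-1,Parus major\nA-2,Pica pica\n"
released = release("birds", sheet)
published = count("birds")
revoked = revoke("birds")
remaining = count("birds")
print(released, published, revoked, remaining)
```

The uploaded document stays the master copy; the database content exists only while the dataset is released.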

2.5. Logging and accounting
- Track the usage of the network and document the data provided by the nodes
- Why?
  - Recognise the efforts of the data providers
  - Help the users to acknowledge the sources of the data they are using
  - Report back to the Participants whether the GBIF network is really used
  - Optimise network performance and services
- How?
  - Willing data providers log their transactions
  - Central accounting service downloads logs, providing statistics of usage and a citation service on the web site and via
  - Part of the Index
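
A central accounting step that turns provider logs into usage statistics and acknowledgement lines might look like this minimal sketch. The log entry format, provider names, and wording of the output lines are invented for the example.

```python
from collections import Counter

# Hypothetical transaction logs downloaded from two willing providers.
logs = {
    "museum_a": [
        {"dataset": "birds", "records": 120},
        {"dataset": "birds", "records": 30},
    ],
    "museum_b": [
        {"dataset": "moths", "records": 50},
    ],
}


def usage_statistics(provider_logs):
    """Sum the records served per (provider, dataset) pair."""
    totals = Counter()
    for provider, entries in provider_logs.items():
        for entry in entries:
            totals[(provider, entry["dataset"])] += entry["records"]
    return totals


def citation_lines(totals):
    """Render one acknowledgement line per dataset, in sorted order."""
    return [
        f"{provider}/{dataset}: {n} records served"
        for (provider, dataset), n in sorted(totals.items())
    ]


totals = usage_statistics(logs)
lines = citation_lines(totals)
print("\n".join(lines))
```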

2.6. Portals
- Portals are gateways to distributed information resources
- You do not need your own portal in order to become a data provider
  - Just access to one that talks to a registry
- Anybody can write their own specialised portal/search tool that uses the registry and the index through their open interfaces (DiGIR, SOAP)
- The MANIS portal is available now (Java)
- GBIF Portal Toolkit v2, which can be used to access data, is planned for Q1/2004

Two roles of portals
- Communication/coordination needs
  - Portals are integrative tools and gateways to information that go beyond single websites
  - Portals and related directory services can be used to coordinate network activities
- Data access needs
  - Much of the content on the portals can be built automatically out of the contents of the central Index
  - The GBIF central portal is only one of many portals and search engines making use of the central metadata registry and related index through their open interfaces
  - Participant nodes need portals to data in their domain

GBIF Portal Toolkit
Communications portal (version 1), released at the end of 2002, and as portal toolkit (PTK) for use by nodes
- News syndication with RSS/RDF
- Events, calendar of calendars, projects
- Articles, documents, images, audio and video content
- Search within the site, across the GBIF network
- Download area
- Getting started service and how to become a node
- About GBIF
- CIRCA-based group collaboration services
- Directory services (CIRCA-based open LDAP)
- Suggestions and feedback from users
- Prototype data repository
Data access portal (version 2), Q1/2004
- Registry
- Access to primary biodiversity data derived from the central index
- Accounting service of use of data
- Links to Participant nodes and their content

Test version of the central GBIF communications portal

3. Hardware
- Each Participant should have on the Internet one or both of:
  - A network of distributed data providers
  - A central data warehouse
- At least one server and an Internet connection that are stable
- Can be hosted elsewhere if stability is a problem

4. Peopleware
How to become a GBIF data provider? Data is provided by the nodes.

GBIF node responsibilities
[Figure: boxes labelled GBIF Registry, Index, and Portal; Participant Node; Data Node; Portal. The numbered lists attached to the boxes, in extraction order:]
- 1. Network 2. Registry 3. Standards 4. Tools
- 1. Encourage participation 2. Manage registration of Data Nodes
- 1. Coordination 2. Network 3. Registry 4. Standards 5. Tools 6. Consolidated Data
- 1. Register metadata 2. Allow indexing
- 1. Identify Data Nodes 2. Endorse and quality assure data nodes 3. National Language Interfaces

NODES coordinate their Participant networks
- The NODES Committee
  - Comprises the managers of the Participant nodes
  - Works with the Information and Communications Technology (ICT) staff of the Secretariat to develop the network of nodes
- NODES are in a key position to promote and help the inclusion of new data providers and data sets
- Building a data network requires building a human network
- Maintains a global directory of people, roles, data providers
- Sharing best practices, experiences, ideas, and software tools

What tools a Participant node needs
- Registry tools to endorse institutions and data providers
  - Access to the central UDDI registry
  - Local directory server or UDDI server
- Directory of people, collections, institutions and related communication tools
- Portal server for a domain-specific website
  - National language support as needed
- Data warehouse to host data from the willing/unable data nodes
- Tools for quality assurance

Training
- Training programme is being shaped
- 7 regional workshops in 2003 on "Becoming a GBIF data provider"
  - Stockholm, Ottawa, Tsukuba, Lisbon, San Jose, Africa, "francophonie"
- Secretariat only works with the Participant nodes, therefore:
  - "Train the trainer" concept
  - Certification of a cadre of trainers
  - Standardised tools and materials

Helpdesk
- For all operational services
- Ticket handling, follow-up
- Will be geographically distributed
- For "GBIF-approved packages"

Why would I share my data?
- Identity of each record will be maintained
  - Globally unique identifier (LSID/URN)
  - Network:Provider:Namespace:Key:Version, e.g. GBIF-LSID:mysite.org:SpecimenID:123456:1
  - Comparable to authorship of names
- Usage will be logged and statistics provided
  - The efforts of the data providers will be recognised
  - Users required to acknowledge the sources of the data they are using
  - Users will be informed who is using their data (difficult without authentication)
  - Could be required for publication (cf. GenBank)
- "GBIF Public Licence"
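
The five-part identifier pattern on this slide can be parsed and rebuilt with a few lines of code. The sketch follows only the pattern shown here (Network:Provider:Namespace:Key:Version), not any formal LSID specification; the `RecordId` type and function names are hypothetical, and treating the version as an integer is an assumption based on the example.

```python
from typing import NamedTuple


class RecordId(NamedTuple):
    network: str
    provider: str
    namespace: str
    key: str
    version: int


def parse_record_id(text: str) -> RecordId:
    """Split a Network:Provider:Namespace:Key:Version identifier."""
    parts = text.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated parts, got {len(parts)}")
    network, provider, namespace, key, version = parts
    return RecordId(network, provider, namespace, key, int(version))


def format_record_id(rid: RecordId) -> str:
    """Join the five parts back into the identifier string."""
    return ":".join(str(part) for part in rid)


rid = parse_record_id("GBIF-LSID:mysite.org:SpecimenID:123456:1")
print(rid)
print(format_record_id(rid._replace(version=rid.version + 1)))
```

Keeping the provider and key inside the identifier is what lets a record be traced back to its source after it has passed through an index or portal.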

GBIF IPR Principles
- GBIF will seek to ensure that data in GBIF-affiliated databases is in the public domain
  - In particular data enabling linking with other data
- GBIF will seek to ensure that the source of data is acknowledged by all users
  - Cf. Open Source licenses, commons
- Maintenance and control of data remain in the hands of database owners
  - There will be no central data banks (except caches)
  - Database owners can block access to sensitive data
  - Countries have sovereignty over their biological resources
- It follows that GBIF services will mainly be integrative metadata services, and standards

Conclusion

GBIF as a global integrator

GBIF network status
- NODES committee set its goal to have a DiGIR network up and running by the end of 2003
- Seven regional workshops and training events
- Two DiGIR provider implementations available August 2003
- UDDI registry up and running July 2003
- Global index Q4/2003
- Portal to browse and search data Q4/2003, toolkit Q1/2004
- Specialised services such as BIODI GARP service emerging

SUMMARY
- Central registry and marketplace of distributed data
- Anyone can build their own vertical portals or specialised search engines on top of that
- Participant nodes: major role in coordination and dissemination, quality assurance
- Data nodes: register your datasets, provide online access to a database or repository
- Data remains under the control of providers
- Data standards and web services make it work