Ricardo Pereira Software Engineer TDWG Infrastructure Project (TIP)


1980 – Efforts to computerize collections
1990 – Networks and data exchange standards:
- The Species Analyst (Z39.50)
- The Australian Virtual Herbarium (HISPID3)
2000 – The XML boom:
- Allowed integration of millions of collection records
- Data protocols such as BioCase and DiGIR
- Schemas such as ABCD, DarwinCore, SDD, TCS, NCD and TaXMLit
- Developed independently and were largely successful
But...

Lack of synchronization and oversight led to:
- Overlap
- Minimal reuse and no interoperability between standards
- Problems with schema versioning (DiGIR)

Truly distributed environment:
- Authorities publish objects
- Others annotate objects and create derivatives
- Identification of duplicates
- Foreign annotation and aggregation
- Traceability of source in derivative work
Better interoperability between standards
Expressing semantics
XML Schemas were not designed to handle these new use cases

Proposed by TDWG and GBIF and funded by the Moore Foundation (US$1.5m) for 2.5 years
Three full-time staff
Goals (one view):
- Strengthen the TDWG standards development process
- Provide technical guidance to the community
- Create the TDWG Technical Architecture Group (TAG)
- Create a common architecture…

“The architecture is concerned with shared data.”
- Data only matters when it crosses system boundaries
- Not concerned with internal structure
“Biodiversity data will be modeled as a graph of identifiable objects.”
- A means to achieve maximum interoperability
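To make "a graph of identifiable objects" concrete, here is a minimal sketch, assuming shared objects are plain dictionaries keyed by a GUID and that any property value which is itself a known GUID counts as an arc. The identifiers and property names (`identifiedAs`, `nameComplete`) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical shared objects, keyed by GUID. Property values that are
# themselves GUIDs act as the arcs of the graph.
OBJECTS: Dict[str, Dict[str, str]] = {
    "urn:lsid:example.org:specimens:42": {
        "type": "Specimen",
        "identifiedAs": "urn:lsid:example.org:names:1234",
    },
    "urn:lsid:example.org:names:1234": {
        "type": "TaxonName",
        "nameComplete": "Quercus robur",
    },
}


def neighbours(guid: str) -> List[str]:
    """GUIDs linked from the object `guid` (only those known locally)."""
    return [v for v in OBJECTS.get(guid, {}).values() if v in OBJECTS]


def reachable(start: str) -> List[str]:
    """Depth-first walk along the arcs, visiting each object once."""
    seen: List[str] = []
    stack = [start]
    while stack:
        guid = stack.pop()
        if guid in seen:
            continue
        seen.append(guid)
        stack.extend(neighbours(guid))
    return seen
```

In a real distributed setting the linked object would usually live on another server and be fetched by resolving its identifier. The local lookup here only stands in for that step.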

The three legs (ontology, GUIDs and data exchange protocols) are all equally important. Remove one and the architecture fails, because there are multiple dependencies between the legs.

The core ontology acts like a type catalog:
- Shared objects must be typed according to that catalog
- Application-specific ontologies may be defined, extending or constraining existing concepts and properties, or adding new properties from other vocabularies
- Currently being implemented using RDF(S) and OWL
The ontology is not a new model! TDWG has already modeled its domain, and the semantics are available in the existing schemas. Building the ontology is a process of translating, refactoring and mapping those schemas into an RDF representation:
- TCS has been translated into RDF: TaxonName, TaxonConcept, etc.
- DarwinCore is being incorporated
- Others will follow (NCD and ABCD)
- LSID vocabularies
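The "type catalog" idea can be sketched as follows, assuming a catalog is simply a mapping from class name to the properties it allows. This is not how RDF(S)/OWL work: they are open-world and do not reject unknown properties. The sketch only shows how an application-specific extension can add properties without changing the core. The class and property names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Catalog = Dict[str, Set[str]]

# Hypothetical core "type catalog": class name -> properties it allows.
CORE: Catalog = {
    "TaxonName": {"nameComplete", "rank"},
    "TaxonConcept": {"hasName", "accordingTo"},
}


def extend(catalog: Catalog, cls: str, extra: Iterable[str]) -> Catalog:
    """Application-specific ontology: copy the catalog and let `cls`
    accept extra properties, leaving the core catalog untouched."""
    new = {name: set(props) for name, props in catalog.items()}
    new[cls] |= set(extra)
    return new


def problems(obj: Dict[str, str], catalog: Catalog) -> List[str]:
    """Closed-world check that `obj` is typed according to the catalog."""
    cls = obj.get("type", "")
    if cls not in catalog:
        return [f"unknown type: {cls!r}"]
    return [
        f"{cls} does not allow {prop!r}"
        for prop in sorted(obj)
        if prop != "type" and prop not in catalog[cls]
    ]
```

For example, a `TaxonName` carrying a `vernacular` property is flagged against `CORE` but accepted by `extend(CORE, "TaxonName", ["vernacular"])`.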

Limitations of XML Schema:
- A simple statement could be expressed in many different ways
- Requires interpretation by a human reader
- Application programs require prior knowledge of the schema design
RDF:
- Imposes syntactic constraints on how statements are expressed: less flexibility but greater interoperability
- Provides semantic context
- Permits a consistent human and machine interpretation
- Enables reuse of existing vocabularies: may incorporate overlapping structures from different domains
- Metadata may be used by other applications without prior knowledge of the schema
- Improved interoperability

Foundation of a truly distributed system
Implementation of the arcs in the graph model, making linking possible (“Biodiversity data will be modeled as a graph of identifiable objects.”)
New use cases are easier to implement:
- Custodianship
- Discovery of duplication
- Effective validation procedures
- Data update
- Indexing and caching services
- Verification of derived products
- Tracking of annotations
The TDWG GUID Task Group recommended adoption of Life Sciences Identifiers (LSIDs)

Example: urn:lsid:tdwg.org:names:1234
- Persistent association with objects
- Independent of location (vs. HTTP)
- Independent of protocol (vs. HTTP)
- Cost is $0: assigning millions is no problem
But it isn't directly interoperable with Semantic Web technologies, because generic Semantic Web clients cannot dereference it using HTTP. TDWG is addressing this problem by using HTTP proxies (via the LSID Applicability Statement). More on this from Kevin Richards.
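An LSID has the form `urn:lsid:<authority>:<namespace>:<objectID>[:<revision>]`. The sketch below splits one into its parts and builds an HTTP URL for a proxy of the kind mentioned above. It assumes the proxy simply appends the LSID to a base URL. `PROXY_BASE` is a hypothetical placeholder, not the address of any real TDWG service.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<objectID>[:<revision>]."""
    parts = text.strip().split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"
            or not all(parts[2:])):
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


# Hypothetical proxy base URL, for illustration only.
PROXY_BASE = "http://lsid-proxy.example.org/"


def proxy_url(lsid: str) -> str:
    """HTTP URL that a generic Semantic Web client could dereference."""
    parse_lsid(lsid)  # reject malformed identifiers early
    return PROXY_BASE + lsid.strip()
```

With the slide's example, `parse_lsid("urn:lsid:tdwg.org:names:1234")` yields authority `tdwg.org`, namespace `names` and object ID `1234`, with no revision.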

Stack of protocols in increasing order of accessibility and functionality:
Resolution
- Retrieve the object description associated with an identifier
- One object at a time
- Low requirement for resolving an identifier
- HTTP GET and the LSID Resolution Protocol
Harvest
- Retrieve all objects of a given type
- Useful for aggregators (such as GBIF)
Search
- Distributed queries
- Implemented using TAPIR
- Agents can choose the response metadata representation (existing or arbitrary XML Schema, or RDF)
- Potential to use Semantic Web standards (such as SPARQL) in a centralized environment (e.g. an aggregator or indexer)
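The three levels differ mainly in how many objects a single request can touch. The sketch below illustrates them over an in-memory store. The class and method names are hypothetical and do not correspond to the LSID Resolution Protocol or TAPIR themselves.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


class InMemoryProvider:
    """Hypothetical provider exposing the three access levels over a dict."""

    def __init__(self, store: Dict[str, Record]):
        self._store = store

    def resolve(self, guid: str) -> Optional[Record]:
        # Resolution: one object at a time, looked up by its identifier.
        return self._store.get(guid)

    def harvest(self, type_name: str) -> List[str]:
        # Harvest: identifiers of every object of a given type.
        return sorted(g for g, r in self._store.items() if r.get("type") == type_name)

    def search(self, predicate: Callable[[Record], bool]) -> List[str]:
        # Search: arbitrary queries (TAPIR plays this role in the architecture).
        return sorted(g for g, r in self._store.items() if predicate(r))
```

A harvester such as an aggregator would typically call `harvest` once and then `resolve` each identifier. `search` pushes the filtering to the provider instead.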

Slide by Roger Hyam (TIP & TAG)

Any questions? ricardo (at) tdwg (dot) org
Kevin Richards will now present more details about LSIDs and their resolution protocol.
Some slides derived from work by: Tim Berners-Lee, Roger Hyam (add UK metadata folks here)
Clip art provided by Clipart ETC, Florida Center for Instructional Technology (FCIT), University of South Florida, U.S.A.

XML Schema vs. RDF

In XML, a simple statement could be expressed in many different ways:
- Requires interpretation by a human reader
- Application programs require prior knowledge of the schema design

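For example, “the author of page is Ora” could be written as:
    <author><uri>page</uri><name>Ora</name></author>
    <document href="page"><author>Ora</author></document>
    <document><details><uri>href="page"</uri><author><name>Ora</name></author></details></document>
    <document><author><uri>href="page"</uri><details><name>Ora</name></details></author></document>
    <document href="…" author="Ora" />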

XML Schema supports questions about the document structure: Is there a given element within another element? What is the content of an element within that element? Etc.
There is no support for questions about meaning: Who is the author of page?
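The cost of "prior knowledge of schema design" shows up as soon as a program tries to answer the meaning question. In this sketch, three hypothetical XML layouts, modeled on the examples above, encode the same statement. Each one needs its own hard-coded extraction rule.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Three hypothetical encodings of "the author of page is Ora".
VARIANTS = [
    "<author><uri>page</uri><name>Ora</name></author>",
    '<document href="page"><author>Ora</author></document>',
    '<document href="page" author="Ora" />',
]


def author_of(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (page, author). Every layout needs its own rule, because
    the meaning is hard-coded in the structure, not stated explicitly."""
    root = ET.fromstring(xml_text)
    if root.tag == "author":
        return root.findtext("uri"), root.findtext("name")
    if root.tag == "document" and "author" in root.attrib:
        return root.get("href"), root.get("author")
    if root.tag == "document":
        return root.get("href"), root.findtext("author")
    raise ValueError(f"unknown layout: <{root.tag}>")
```

A fourth layout would require a fourth branch. This is the maintenance problem that a shared statement model is meant to remove.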

RDF is the language of the Semantic Web:
- RDF imposes syntactic constraints on how statements are expressed
- RDF provides semantic context
- RDF permits a consistent human and machine interpretation
Less flexibility but greater interoperability:
- Better support for reuse of existing vocabularies
- May incorporate overlapping structures from different domains
- Metadata may be used by other applications without prior knowledge of the schema
- Improved interoperability

RDF models are based on assertions: Subject – Verb (or Predicate) – Object
Examples: “The page's author is John”; “This is a slide”
Subject, predicate and object (triples) are identified by URIs, which are globally unique
Objects can also be literals (e.g. “John Smith”, “house”)
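A triple can be modeled directly as a three-field record, with objects being either URIs or literals. This sketch serializes triples in N-Triples syntax. All URIs are hypothetical, and the literal escaping covers only backslashes and double quotes, which is enough for the examples but not the full N-Triples escaping rules.

```python
from typing import NamedTuple, Union


class URI(str):
    """Marks a string as a URI (as opposed to a literal)."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]


# Hypothetical URIs for illustration.
PAGE = URI("http://example.org/page")
AUTHOR = URI("http://example.org/terms#author")
NAME = URI("http://example.org/terms#name")
JOHN = URI("http://example.org/people/john")

STATEMENTS = [
    Triple(PAGE, AUTHOR, JOHN),         # "The page's author is John": object is a resource
    Triple(JOHN, NAME, "John Smith"),   # object is a literal
]


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line (simplified escaping)."""
    if isinstance(t.obj, URI):
        obj = f"<{t.obj}>"
    else:
        obj = '"' + t.obj.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."
```

Because John is itself identified by a URI, another party can make further assertions about him, such as his name, without touching the page's description.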

<Description about="…" tdwg:Author="John Doe" />
Or: <…> (subject) tdwg:Author (verb) “John Doe” (object)

<Description about="…" x:y="qwerty" />
The machine now knows that we are talking about an identified object, and that this object has the value “qwerty” for the property x:y.
Verbs (predicates) are uniquely identified by URIs and are retrievable. Machines can fetch a description of x:y and ask:
- Is x:y something I already know?
- Is there a label associated with the x:y property, so I can at least display that instead?
Actionable unique identifiers allow others to:
- Make assertions about the same object
- Link to other uniquely identified objects
This makes RDF suitable for a distributed environment, foreign annotation and persistent linking.
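The two questions a machine asks about an unfamiliar predicate can be sketched as follows, assuming some `fetch` function dereferences a predicate URI and returns its description as a dictionary, or `None`. Both `fetch` and the description format are hypothetical stand-ins for a real HTTP or LSID resolution step.

```python
from typing import Callable, Dict, Optional

Description = Dict[str, str]


def make_labeller(fetch: Callable[[str], Optional[Description]]) -> Callable[[str], str]:
    """Return a function giving a display label for a predicate URI."""
    cache: Dict[str, Optional[Description]] = {}

    def label_for(predicate_uri: str) -> str:
        # "Is x:y something I already know?" If not, dereference it once.
        if predicate_uri not in cache:
            cache[predicate_uri] = fetch(predicate_uri)
        desc = cache[predicate_uri]
        # "Is there a label I can display instead?"
        if desc and "label" in desc:
            return desc["label"]
        # Last resort: show the raw URI.
        return predicate_uri

    return label_for
```

Caching the fetched descriptions means each predicate is dereferenced at most once per labeller, however many statements use it.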

Use the information you want; ignore what you don't know.
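"Ignore what you don't know" can be as simple as partitioning incoming triples by whether their predicate is one the application understands. In this sketch, triples are plain string tuples and the set of understood predicates is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def split_known(
    triples: Iterable[Triple], understood: Set[str]
) -> Tuple[List[Triple], List[Triple]]:
    """Partition triples into those whose predicate the application
    understands (used) and the rest (ignored, without raising errors)."""
    used: List[Triple] = []
    ignored: List[Triple] = []
    for t in triples:
        (used if t[1] in understood else ignored).append(t)
    return used, ignored
```

Unlike a strict schema validator, unknown statements here are set aside rather than treated as errors, so new vocabularies do not break existing clients.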

Server A (authority): <…> is a species name
Server B: <…> is a synonym of <…>; <…> is circumscribed to those specimens (http://xxxx.org/xyz)
Foreign assertions can be used or not, depending on:
- Trust (of the source)
- Contents
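A consumer combining statements from an authority and a third party might record where each assertion came from and apply the two criteria above: trust in the source and acceptability of the content. This is a minimal sketch. The server names, URIs and predicates are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def merge(
    sources: Dict[str, List[Triple]],
    trusted: Set[str],
    accept: Callable[[Triple], bool] = lambda t: True,
) -> List[Tuple[Triple, str]]:
    """Combine assertions from several servers, recording which server
    made each one and dropping untrusted sources or unwanted contents."""
    merged: List[Tuple[Triple, str]] = []
    for source, triples in sources.items():
        if source not in trusted:
            continue
        merged.extend((t, source) for t in triples if accept(t))
    return merged
```

Keeping the source next to each triple preserves traceability, so a derived view can later be rebuilt if trust in one server changes.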

Yes, we could, but it would be complicated. We would have to build from scratch:
- A standard way to identify resources globally
- A standard way to express assertions
...That's what RDF does anyway!

RDF does not support all use cases. XML Schema is still appropriate:
- To support document-centered data transfer
- When all parties know how the semantics are hard-coded in the document structure
So how do we integrate both technologies?

TAPIR: TDWG Access Protocol for Information Retrieval
- Based on XML Schema
- Highly configurable: supports arbitrary schemas
- Can be configured to return valid RDF
Keeps the best of both worlds: when properly configured, a TAPIR provider can encode the response using an arbitrary XML Schema and also RDF.
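The "best of both worlds" point is that one underlying record can be rendered either as a schema-specific document or as RDF statements. The sketch below illustrates that idea only. It does not reproduce TAPIR's actual output-model mechanism, and the element names, vocabulary URI and record fields are hypothetical. The literal escaping handles only backslashes and double quotes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _literal(value: str) -> str:
    # Minimal N-Triples literal escaping (backslash and double quote only).
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def as_xml(record: Dict[str, str]) -> str:
    """Document-centered encoding under a hypothetical XML layout."""
    root = ET.Element("record", id=record["id"])
    for field in sorted(k for k in record if k != "id"):
        ET.SubElement(root, field).text = record[field]
    return ET.tostring(root, encoding="unicode")


def as_ntriples(record: Dict[str, str], vocab: str) -> List[str]:
    """The same record as RDF statements in N-Triples syntax."""
    subject = f"<{record['id']}>"
    return [
        f"{subject} <{vocab}{field}> {_literal(record[field])} ."
        for field in sorted(k for k in record if k != "id")
    ]
```

A consumer that knows the XML layout can read the first form directly, while a generic RDF client can merge the second form with statements from other sources.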

Principles:
- Architecture is concerned with shared data
- Data modeled as a graph of identifiable objects
- Data typed according to known vocabularies
- Data transfer protocols for resolution, harvesting and querying