IDs in and out of the database Entomological Collections Network (ECN) 2012 November 10 – 11, Knoxville, TN Debbie Paul, Greg Riccardi.


1 IDs in and out of the database Entomological Collections Network (ECN) 2012 November 10 – 11, Knoxville, TN Debbie Paul, Greg Riccardi

2 Overview
What good is identification?
How are identifiers used by consumers?
Providing IDs
Resolving IDs in a server
  – Strategies for storing IDs in databases
Linked Data
Annotations ~ all sorts
Feedback

3 What good is identification?
Aggregation
  – If you get info from 2 sources that are about the same object, you can combine the info
Resolution (finding information about object)
  – Types of resolution
      Determine where to get information
      Determine how to get information
Providing information
  – How to create IDs
  – How to publish IDs
  – How to fetch database information for IDs
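The aggregation point can be sketched in a few lines. This is a minimal illustration, assuming each source delivers records as dictionaries with an "id" field; the function name `aggregate`, the field names, and the sample values are all hypothetical and not taken from the slides.

```python
from collections import defaultdict

def aggregate(*sources):
    """Combine records from several sources, keyed by identical ID strings."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            # Records are treated as the same object only when their
            # ID strings are exactly equal.
            fields = {k: v for k, v in record.items() if k != "id"}
            merged[record["id"]].update(fields)
    return dict(merged)

# Hypothetical records from two providers that describe the same object.
SPECIMEN_ID = "urn:uuid:424854d7-baec-42cf-a142-805b64117b9f"
source_a = [{"id": SPECIMEN_ID, "collector": "example collector"}]
source_b = [{"id": SPECIMEN_ID, "locality": "example locality"}]

combined = aggregate(source_a, source_b)
```

If the two providers had used different strings for the same specimen, the records would stay separate, which is why shared identifiers matter for aggregation.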

4 HTTP URIs
Biggest problem
  – Identification and 2 types of resolution are commingled
Resolution
  – Where to get information
      Look somewhere
  – How to get information
      Fetch information using some protocol

5 DOI example
The DOI is 10.3897/zookeys.209.3135
URI (for aggregating) is doi:10.3897/zookeys.209.3135
A URL for information retrieval (proxy resolution) is http://dx.doi.org/10.3897/zookeys.209.3135
Information fetched from
  – HTML: http://www.pensoft.net/journals/zookeys/article/3135/abstract/five-task-clusters-that-enable-efficient-and-effective-digitization-of-biological-collections
  – RDF: http://data.crossref.org/10.3897/zookeys.209.3135
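The three forms on this slide (bare DOI, URI for aggregating, proxy URL for retrieval) differ only by prefix, which a short sketch makes concrete. The function names are hypothetical; the proxy address is the one shown on the slide. No network request is made here.

```python
DOI_PROXY = "http://dx.doi.org/"  # proxy resolver named on the slide

def doi_to_uri(doi: str) -> str:
    """Identifier form, used for comparison and aggregation."""
    return "doi:" + doi

def doi_to_proxy_url(doi: str) -> str:
    """Locator form, used to ask the proxy where the information lives."""
    return DOI_PROXY + doi

def uri_to_doi(uri: str) -> str:
    """Strip the 'doi:' scheme to recover the bare DOI."""
    if not uri.lower().startswith("doi:"):
        raise ValueError(f"not a doi: URI: {uri!r}")
    return uri[len("doi:"):]

doi = "10.3897/zookeys.209.3135"
uri = doi_to_uri(doi)
url = doi_to_proxy_url(doi)
```

Keeping the identifier (`doi:` form) separate from the locator (proxy URL) is one way to avoid mixing identification with resolution.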

6 What’s in an ID? For consumer: – NOTHING! No information – Might as well be UUID Can’t type it, remember it, parse it, resolve it – Useful for comparison and aggregation Equal strings (persistence) Different strings about the same object – fetching information Send the ID somewhere for info

7 What’s in an ID? For Provider/resolver: – Use ID to find local storage of information – E.g. parse out the DWC triple Extract the database table and primary key Look up the ID in a table of IDs Look up ID in a URI field of a database table
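One of the provider strategies above, parsing a Darwin Core triplet to find the local table and key, might look like the sketch below. It assumes the triplet is three non-empty parts separated by colons; the table name `herb_specimen`, the `TABLES` mapping, and the function names are hypothetical.

```python
from typing import NamedTuple

class DwcTriplet(NamedTuple):
    institution_code: str
    collection_code: str
    catalog_number: str

def parse_dwc_triplet(text: str) -> DwcTriplet:
    """Split 'institution:collection:catalog' into its three parts."""
    parts = text.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a DwC triplet: {text!r}")
    return DwcTriplet(*parts)

# Hypothetical mapping from (institution, collection) to a local table name.
TABLES = {("flmnh", "herb"): "herb_specimen"}

def locate(text: str):
    """Return the (table, key) a resolver would use for the local lookup."""
    triplet = parse_dwc_triplet(text)
    table = TABLES[(triplet.institution_code, triplet.collection_code)]
    return table, triplet.catalog_number

location = locate("flmnh:herb:bbbrc000123")
```

This approach only works when the ID carries structure the provider can parse. An opaque ID such as a UUID would instead be looked up in a table of IDs or in a URI field.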

8 What’s in an id for the provider?
record id: 112234
uuid: 954c8760-e1a6-4b4b-ab82-6bf7311c25f3
lsid: urn:lsid:example.org:specimen:22545
an http-uri
ezid: http://n2t.net/ark:/99999/fk42b9hdf
doi: doi:10.1038/ng0609-637

9 What about Specimen identifiers? identifier on the specimen? – readable text – encoded data – barcode is a contextual identifier identifier in the database? – http://ids.usms.edu/herb/0014097 – http://ids.usms.edu/herb/0303134303937

10 How do providers identify?
Notice online databases and your database and find the identifiers of the various objects
Some identifiers are local (e.g. primary key)
Some identifiers are globally unique
Some identifiers are URIs

11 Identification in the field

12 Storing IDs in databases your contextual ids?, your guids? What to use for IDs? – record id – uuid – lsid – uri what’s in your wallet database? Morphbank Example

13 IDs in Morphbank Morphbank Example http://www.morphbank.net/818505

14 IDs in Morphbank Morphbank Example http://www.morphbank.net/643261

15 Sharing data with IDs into a publication uploaded to the web data shared with a database integrator / aggregator – GBIF – iDigBio – VertNet – Morphbank what is it exactly in the publication? – an id?, a guid? a link to more information? – what will be cited? searched for?

16 Feedback with IDs Annotations – Target of annotation http://www.morphbank.net/818505 – filtered PUSH linked data ~ the semantic web – (benefits – in a minute) updating the database – be(a)ware – Remember previous IDs

17 What’s coming up next? expect guids for all sorts of objects – collection objects (example: specimen) – georeferences – taxon concepts – determinations – people

18 GUIDs are key
1 to many IDs known for a given object
store and share the ones you know about
Specimen RecordID: 19537
Specimen Previous Catalog Number: 212345
Specimen Catalog Number / bar code: bbbrc000123
Darwin Core Triplet (DwC): flmnh:herb:bbbrc000123
DwC Occurrence URI: urn:catalog:flmnh:herb:bbbrc000123
Specimen GUID of type lsid: urn:lsid:biocol.org:flmnh:bbbrc000123
Specimen Opaque Identifier (UUID): 424854d7-baec-42cf-a142-805b64117b9f
URI for UUID: urn:uuid:424854d7-baec-42cf-a142-805b64117b9f
Specimen GUID of type HTTP-URI: http://ids.flmnh.ufl.edu/herb/bbbrc000123
*Cannot enforce single identifier per object
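One way to "store the ones you know about" is a separate identifier table that points many IDs at one local record. The sketch below uses an in-memory SQLite database; the schema, table names, and `id_type` labels are assumptions for illustration, while the identifier values come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen (record_id INTEGER PRIMARY KEY, catalog_number TEXT)"
)
# Each identifier string appears once, but many rows may share a record_id.
conn.execute(
    "CREATE TABLE identifier ("
    " value TEXT PRIMARY KEY,"
    " id_type TEXT NOT NULL,"
    " record_id INTEGER NOT NULL REFERENCES specimen(record_id))"
)
conn.execute("INSERT INTO specimen VALUES (19537, 'bbbrc000123')")

known_ids = [
    ("flmnh:herb:bbbrc000123", "dwc_triplet"),
    ("urn:catalog:flmnh:herb:bbbrc000123", "dwc_occurrence_uri"),
    ("urn:lsid:biocol.org:flmnh:bbbrc000123", "lsid"),
    ("urn:uuid:424854d7-baec-42cf-a142-805b64117b9f", "uuid_urn"),
    ("http://ids.flmnh.ufl.edu/herb/bbbrc000123", "http_uri"),
]
conn.executemany("INSERT INTO identifier VALUES (?, ?, 19537)", known_ids)

def resolve(value):
    """Return the local record id for any known identifier, or None."""
    row = conn.execute(
        "SELECT record_id FROM identifier WHERE value = ?", (value,)
    ).fetchone()
    return row[0] if row else None

def identifiers_for(record_id):
    """List every identifier known for one record, for sharing downstream."""
    rows = conn.execute(
        "SELECT value FROM identifier WHERE record_id = ? ORDER BY value",
        (record_id,),
    )
    return [r[0] for r in rows]
```

With this layout a new identifier scheme becomes a new row rather than a new column, and previously issued IDs can stay resolvable after a catalog number changes.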

19 caring for guids store them – database adjustments – tweaking current standard practices share them – data standards – 3 ways to modify darwin core reap the benefits

20 caring for guids – reap the benefits Data quality feedback Dialog based on annotation Tracking objects through analysis and use Maintaining attribution to provider Find related objects Find a way to take advantage of efforts of many smart dedicated people – BHL, biscicol, filtered PUSH, GNA, TNRS, SGR,…

21 Thanks from iDigBio

22 uniqueness
Uniqueness can be guaranteed
  – by context as in UPC, ISBN, DOI
  – by design: URI based on scheme plus DNS
  – by sparseness as in UUID
Uniqueness can be reinforced by encoding
  – As in UPC, make values sparse
Cannot enforce single identifier per object
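Two of these routes to uniqueness can be shown together: a random UUID (unique by sparseness) and an HTTP URI built on a domain the provider controls (unique by design). This is a sketch only; the base address `http://ids.example.org/specimen` and the function name are placeholders, not a real service.

```python
import uuid

def mint_specimen_ids(base_uri: str) -> dict:
    """Mint one opaque UUID and derive its URN and HTTP-URI forms."""
    opaque = uuid.uuid4()  # random 122-bit value: collisions are very unlikely
    return {
        "uuid": str(opaque),
        "urn": opaque.urn,  # "urn:uuid:" followed by the UUID string
        # The provider's own DNS name scopes the rest of the path.
        "http_uri": base_uri.rstrip("/") + "/" + str(opaque),
    }

ids = mint_specimen_ids("http://ids.example.org/specimen")
```

Nothing here prevents a second call from minting another identifier for the same specimen, which is the slide's closing point: a single identifier per object cannot be enforced.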

23 persistence “Persistence” refers to the binding of identifier to object – Not object availability – An unexpected interpretation A persistent identifier is one that can be relied on for its connection to an object. – Once assigned to 1 object it will never be assigned to another
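The rule that an identifier, once assigned to one object, is never assigned to another can be enforced in code. The following is a minimal in-memory sketch with a hypothetical `IdentifierRegistry` class; a real system would keep the bindings in durable storage.

```python
class IdentifierRegistry:
    """Keeps identifier-to-object bindings and refuses to rebind them."""

    def __init__(self):
        self._bindings = {}

    def assign(self, identifier: str, object_key: str) -> None:
        existing = self._bindings.get(identifier)
        if existing is not None and existing != object_key:
            # Persistence: the identifier already belongs to another object.
            raise ValueError(
                f"{identifier!r} is already bound to {existing!r}"
            )
        self._bindings[identifier] = object_key

    def lookup(self, identifier: str):
        # The binding is reported even if the object itself is no longer
        # available; persistence concerns the binding, not availability.
        return self._bindings.get(identifier)

registry = IdentifierRegistry()
registry.assign("http://ids.flmnh.ufl.edu/herb/bbbrc000123", "specimen-19537")
```

Assigning the same identifier to the same object again is harmless, while assigning it to a different object raises an error.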

