Globally Unique Identifiers and Life Science Identifiers Dave Thau University of Kansas California Academy of Sciences


1 Globally Unique Identifiers and Life Science Identifiers Dave Thau thau@learningsite.com University of Kansas California Academy of Sciences www.learningsite.com

2 Outline 1. Describe Globally Unique Identifiers 2. Show how they’re relevant 3. Describe one GUID system (LSIDs) 4. Outline some issues around using GUIDs for TDWG-related activities 5. Provide some resources 6. Open discussion

3 GUID Is Not An Ugly Word “It’s guid to be merry and wise, It’s guid to be honest and true,” Robert Burns, Here’s a Health to Them that’s Awa’. Pteroptochos tarnii, AKA Guidguid. Image from: animaldiversity.ummz.umich.edu

4 GUID: Globally Unique Identifier A short name for a complex entity Useful for locating information about the entity Each name identifies only one entity There is some sense of permanence

5 Some things which fit this description GenBank accession numbers: AP006480.1 US Patent numbers: 5443036 (laser guided cat exercise) Digital Object Identifier: 10.121/3212

6 In Our Domain SDD Document – Representing some data set. Cypselurus heterurus (Rafinesque, 1810) lsid.gbif.net:www.fishbase.org:1029 sp Napier Schema Document – Representing some taxon. <TaxonConcept id="urn:lsid:bioguid.org:seek:121212" type="original"> Canis lupus …

7 Features of a GUID system Global uniqueness scoped to Internet Should be easily resolvable by a computer or human Should identify things down to whatever level of granularity necessary Should not be limited to proprietary systems Should serve up all sorts of data –Database records –Text files –Images It would be nice if the identifier had associated metadata

8 Life Science Identifiers Official standard of the Object Management Group (OMG) Support for metadata and authentication Supports multiple protocols (e.g. HTTP, SOAP) Can serve up data in any format Decentralized – anyone can issue an LSID LSID code available in Java and Perl. A young standard, but increasingly used.

9 Organizations Using LSIDs National Center for Biotechnology Information (NCBI) –PubMed –GenBank European Bioinformatics Institute (EBI) US Long Term Ecological Research Network (LTER) BioMOBY – a biological database interoperability program (biomoby.org) Open Bioinformatics Foundation (open-bio.org) myGrid – a BioGRID project (mygrid.org.uk)

10 A Small Pause For More Squid Humor

11 LSID Format Example: urn:lsid:bioguid.org:seek:117866:v1 urn – indicates that this is a URN lsid – indicates that it’s an LSID-type URN bioguid.org – the authority that issued the LSID –Doesn’t have to be a domain name, but for now probably should be. –bioguid.org does not necessarily have the data or metadata. –There may not even be a machine called bioguid.org. seek – a namespace ID internal to that authority –The namespace is meaningless to systems outside that authority. 117866 – the local identifier within that authority –Also internal to the authority v1 – an optional version number –If there is no version, there is no trailing colon either.
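The colon-separated layout described on this slide can be illustrated with a small parser. This is only a sketch: the names `LSID`, `parse_lsid`, and `format_lsid` are hypothetical and not part of any LSID library, and the checks cover only the rules stated on the slide (five or six fields, no trailing colon when the version is absent).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """The parts of an LSID that follow the fixed 'urn:lsid:' prefix."""
    authority: str
    namespace: str
    local_id: str
    version: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into authority, namespace, local id, and version."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    scheme, nid, authority, namespace, local_id = parts[:5]
    # Per the URN syntax, the 'urn' prefix and namespace id are case-insensitive.
    if scheme.lower() != "urn" or nid.lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not (authority and namespace and local_id):
        raise ValueError(f"empty field in LSID: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    if version == "":
        # A trailing colon with no version after it is not allowed.
        raise ValueError(f"trailing colon without a version: {text!r}")
    return LSID(authority, namespace, local_id, version)


def format_lsid(lsid: LSID) -> str:
    """Rebuild the string form, omitting the last colon when there is no version."""
    base = f"urn:lsid:{lsid.authority}:{lsid.namespace}:{lsid.local_id}"
    return base if lsid.version is None else f"{base}:{lsid.version}"


parsed = parse_lsid("urn:lsid:bioguid.org:seek:117866:v1")
print(parsed)
print(format_lsid(parsed))
```

Only the authority field means anything to an outside system; the namespace and local identifier are kept as opaque strings here for that reason.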

12 Data and Metadata An LSID has data –Examples The gene sequence in GenBank The actual LTER data set, maybe in excel, or in a text file –The data should never change An LSID also has metadata –Example metadata The format of the data A display title for clients displaying the LSID Dublin core metadata Anything you want –The metadata can change
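The split between fixed data and changeable metadata might be modeled as follows. This is a sketch under stated assumptions: the class name `LsidObject` is hypothetical, the LSID is the LTER example from this deck, and the data bytes and metadata values are invented placeholders.

```python
class LsidObject:
    """Pairs fixed data bytes with a metadata dictionary that may be edited."""

    def __init__(self, lsid: str, data: bytes, metadata: dict):
        self._lsid = lsid
        self._data = bytes(data)        # private copy; no setter is provided
        self.metadata = dict(metadata)  # ordinary mutable dictionary

    @property
    def lsid(self) -> str:
        return self._lsid

    @property
    def data(self) -> bytes:
        # Read-only: assigning to .data raises AttributeError.
        return self._data


record = LsidObject(
    "urn:lsid:limnology.wisc.edu:dataset:ntlfi02",
    data=b"year,species,count\n",  # placeholder contents, not real LTER data
    metadata={"title": "Fish abundance", "format": "text/csv"},
)

# Metadata can be revised at any time.
record.metadata["title"] = "Fish abundance data set"

# The data cannot be replaced through the public interface.
try:
    record.data = b"something else"
except AttributeError:
    print("data is read-only")
```

If the underlying data really must change, the version field of the LSID gives a way to issue a new identifier rather than altering what an existing one returns.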

13 Example LSIDs An LTER fish abundance data set –urn:lsid:limnology.wisc.edu:dataset:ntlfi02 A PubMed reference: –urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:pubmed:12441808 A GenBank sequence: –urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:30350027

14 How LSIDs work LSID Client (maybe Launchpad, maybe Haystack, maybe BioFerret, maybe myGRID, maybe yours!) 1. Find the authority for this LSID: find the DNS record and resolve it to get the address of the authority; DNS returns the LSID Authority Server. 2. Query the authority for available services: the LSID Authority returns WSDL for this LSID. 3. Choose a service and get the goods from the Data Store or Metadata Store over HTTP, SOAP, FTP, or others.
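The three steps on this slide can be sketched as a single function. Nothing here touches the network: the DNS lookup, the service listing, and the fetchers are passed in as plain callables, and the `FAKE_*` tables, host names, and returned bytes are invented stand-ins, not real endpoints or a real LSID client API.

```python
from typing import Callable, Dict, Sequence


def resolve(
    lsid: str,
    locate_authority: Callable[[str], str],
    list_services: Callable[[str, str], Dict[str, str]],
    fetchers: Dict[str, Callable[[str], bytes]],
    preferred: Sequence[str] = ("http", "soap", "ftp"),
) -> bytes:
    """Follow the three resolution steps for one LSID."""
    # Step 1: find the authority named in the LSID (a DNS lookup in practice).
    authority_name = lsid.split(":")[2]
    endpoint = locate_authority(authority_name)

    # Step 2: ask the authority which services it offers for this LSID.
    # The real answer is WSDL; here it is reduced to {protocol: location}.
    services = list_services(endpoint, lsid)

    # Step 3: choose a service the client can speak and fetch from it.
    for protocol in preferred:
        if protocol in services and protocol in fetchers:
            return fetchers[protocol](services[protocol])
    raise LookupError(f"no usable service for {lsid}")


# In-memory stand-ins for DNS, the authority, and the data store.
FAKE_DNS = {"bioguid.org": "authority.example.org:8080"}
FAKE_SERVICES = {
    "urn:lsid:bioguid.org:seek:117866": {"http": "http://data.example.org/117866"}
}
FAKE_STORE = {"http://data.example.org/117866": b"example data"}

result = resolve(
    "urn:lsid:bioguid.org:seek:117866",
    locate_authority=FAKE_DNS.__getitem__,
    list_services=lambda endpoint, lsid: FAKE_SERVICES[lsid],
    fetchers={"http": FAKE_STORE.__getitem__},
)
print(result)
```

Keeping the lookups injectable mirrors the point of the slide: the client needs only the LSID, and the authority decides which protocols and stores sit behind it.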

15 LSID Promises I promise to never change the data behind an LSID. I will make sure my LSIDs are being served, or give them to someone who can do it. I will give my LSIDs metadata – at least give them a title and a format

16 Other GUID systems URLs –Files move –The data change –Unstructured metadata UUIDs – 128-bit string, guaranteed unique –58f202ac-22cf-11d1-b12d-002035b29092 –No resolution –No metadata Handle System / DOIs (10.12/2312) –Non-standard protocol –Centralized resolution –Unstructured metadata (for Handle System) –High costs (for DOI)
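The UUID on this slide can be inspected with Python's standard `uuid` module. The short sketch below shows what the slide means by "no resolution, no metadata": the object holds a 128-bit value and nothing else.

```python
import uuid

# The example value from the slide, parsed from its canonical text form.
example = uuid.UUID("58f202ac-22cf-11d1-b12d-002035b29092")

# A freshly generated random UUID for comparison.
fresh = uuid.uuid4()

print(example.version)         # which UUID generation scheme produced it
print(len(example.bytes) * 8)  # size of the identifier in bits
print(fresh)                   # differs on every run
```

Nothing in either value says where the identified object lives or what it is, which is the gap an LSID authority and its metadata are meant to fill.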

17 Issues For This Community What gets a GUID? For each of those things, what’s the data, what’s the metadata? One GUID per item? Centralization – who issues GUIDs?

18 What Gets a GUID? These things probably should get GUIDs –Taxonomic concepts –Specimens –Publications –People These things might get GUIDs –Taxonomic names –Journals –Data providers –Observations

19 Specimen Data? Metadata? If specimens get a GUID – what does it identify? –The physical specimen? –A collection’s database record of the specimen? –What about multiple labels? –Main question – what doesn’t change about a specimen? –Other main question – how should the data be represented? Darwin core includes current institution location. Not a good idea for the data of a GUID since that may change.

20 One GUID Per Item? No GUID system inherently enforces a 1:1 mapping between GUID and data. Everyone should TRY to limit the number of GUIDs per item. Should there be any centralization to help achieve this?

21 Degrees of Centralization An index –List your GUID authority in an index so your GUIDs are easy to find. A central authority –One authority could be responsible for issuing GUIDs to the community for specific types of information – you’d have to get one from here. GBIF? The IC_Ns? (ICZN, ICBN, …) lsidauthority.org? –This would help enforce a 1:1 mapping of GUIDs and data items –It would also relieve data providers of the need to maintain their own authorities –It MAY also reduce the likelihood of GUIDs becoming unresolvable –It may also be infeasible technically, or socially. A respected authority –With LSIDs, an authority can be set up to serve its own GUIDs and proxy other authorities. –This would help enforce a 1:1 mapping for those who use the authority –It may also be more feasible.

22 LSID Resources LSID Articles and code from IBM –http://www-124.ibm.com/developerworks/oss/lsid/#whatislsid Current LSID specification –http://www.omg.org/cgi-bin/doc?dtc/04-05-01 Launchpad – An LSID resolver for Windows IE –available from first link A website which resolves LSIDs –http://lsid.biopathways.org/resolver/ URN specification –http://www.ietf.org/rfc/rfc2141.txt

23 Acknowledgements My work on GUIDs has been funded by the SEEK project – seek.ecoinformatics.org. SEEK is funded by National Science Foundation award 0225676. Thanks to Ben Szekely at IBM for his LSID articles, his LSID Java code, and for answering all my questions.

24 Questions for Discussion Do we need GUIDs? What gets a GUID? One GUID per item? Centralization?

