Full implementation of GUIDs at SERNEC institutions: A strategy that accommodates institutions of varying sizes and complex resource relationships


1 Full implementation of GUIDs at SERNEC institutions: A strategy that accommodates institutions of varying sizes and complex resource relationships. Steven J. Baskauf, Vanderbilt University; Thomas Sasek, University of Louisiana at Monroe

2 GUIDs: good for what ails you. Globally Unique Identifiers (GUIDs), a.k.a. persistent identifiers. Properties of GUIDs: 1. Globally unique (no two alike!) 2. Persistent (lasts forever!) 3. Actionable (explains itself to you and to web crawlers on demand!)

3 Identifiers that are persistent should be scalable. Example: http://lod.geospecies.org/ses/4XSQO. This URI could represent a passive file delivery system in which ses is the name of a directory on the server and 4XSQO is the name of a file in that directory (it contains no characters that are illegal in file names). Alternatively, ses/4XSQO could be an identifier passed to a server-side script that generates the file on the fly from a database. In accordance with the principle of REST (representational state transfer), the client (i.e., a user with a web browser) doesn’t need to know how the server produces the file it sends; the method can change over time as needed. Other nice things about this style of URI: it can correspond to a user’s hierarchy (e.g., collectionCode/catalogNumber), it is relatively short, and it contains no characters that need to be escaped in XML. Thanks for the example, Pete DeVries. “My grant got funded!”
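
This URI style can be generated mechanically from collection data. The Python sketch below is a hypothetical illustration, not code from the presentation. The function `make_guid` and the base URI `http://www.example.org/specimens` are assumptions. It joins a collection code and a catalog number into a slash-delimited URI and rejects any segment containing characters that would need escaping.

```python
import re

# Hypothetical base URI for illustration; substitute your institution's domain and path.
BASE = "http://www.example.org/specimens"

# Letters, digits, hyphens, and underscores need no percent-encoding in a URI
# path and no escaping in XML, and they are legal in file names.
SAFE_SEGMENT = re.compile(r"[A-Za-z0-9_-]+")


def make_guid(collection_code: str, catalog_number: str, base: str = BASE) -> str:
    """Build a short, hierarchical HTTP URI: base/collectionCode/catalogNumber."""
    for segment in (collection_code, catalog_number):
        if not SAFE_SEGMENT.fullmatch(segment):
            raise ValueError(f"segment contains unsafe characters: {segment!r}")
    return f"{base}/{collection_code}/{catalog_number}"


print(make_guid("lsu000", "0134"))  # http://www.example.org/specimens/lsu000/0134
```

Restricting segments to this small character set keeps the same URI usable whether it is served from static files or, later, by a server-side script.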

4 Identifiers that are persistent should be able to survive the apocalypse. Grants end. People quit. People lose interest. Example: http://lsid.tdwg.org/urn:lsid:gdb.org:GenomicSegment:GDB132938 “My grant ran out.”

5 How can we provide actionability? (See “Adoption of Persistent Identifiers for Biodiversity Informatics,” GBIF, 2009.) Server Man: “We can do this easily with a mod_rewrite rule that passes requests to a PHP script that queries our MySQL database!” If this is so easy, why aren’t people using actionable GUIDs with occurrence data???

6 The Chicken and Egg Problem of Actionability. Nobody is going to go to the trouble of making their GUIDs actionable if the metadata the GUIDs return are never going to be used for anything. Nobody is going to build a system that gleans data from actionable GUIDs if there aren’t any GUIDs from which to harvest metadata. (This is just like the early Internet, when little content was available for users!)

7 Economics of investing in GUIDs The use of GUIDs for occurrences will increase when the benefits outweigh the costs of implementation. If no one uses the metadata from actionable GUIDs, then in order for them to be adopted either: – the cost of implementation must be very low – there must be other benefits – or both!

8 SERNEC (Southeast Regional Network of Expertise and Collections): representing herbaria in the Southeast USA. 125 member herbaria; 53 survey respondents. 43% of institutions have negligible to no IT support. 40% have web pages (most are rudimentary). 3-4 serve data. Data courtesy of Zack Murrell of SERNEC. Economics 101

9 Databasing technology in SERNEC. 75% are databasing; approximately 35% are using Excel or nothing. Although some are institutions with significant budgets and IT support, others are one-person operations with no budget and no IT staff. Data courtesy of Zack Murrell of SERNEC. “These people don’t need help.” “These people need a lot of help.”

10 Costs: 1. Risk: depending on someone else’s complicated solutions that may result in disaster.

11 Costs: 2. You may invest time in something that never happens.

12 Costs: 3. Unavailability of a template for generating RDF/XML. The TDWG, GBIF, and Linked Data guidelines say we must use the Resource Description Framework (RDF) in XML format to describe metadata. What is it? RDF describes metadata properties in a way that computers can understand. It looks like this (slide example: a field individual of Arborus rarus).
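
As a rough illustration of what RDF/XML looks like, the Python sketch below uses the standard library’s `xml.etree.ElementTree` to describe one resource. The function name `describe_individual` and the example URI are hypothetical. For brevity the sketch attaches the Darwin Core term `dwc:scientificName` directly to the resource. A fuller model would route the name through a separate determination resource.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dwc", DWC)


def describe_individual(uri: str, scientific_name: str) -> str:
    """Return a minimal RDF/XML document with one rdf:Description for `uri`."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # rdf:about holds the resource's own URI (the GUID).
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    # Simplification: the name is attached directly rather than via a determination.
    ET.SubElement(desc, f"{{{DWC}}}scientificName").text = scientific_name
    return ET.tostring(root, encoding="unicode")


print(describe_individual("http://www.example.org/individuals/1", "Arborus rarus"))
```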

13 Summary: Users having few IT resources need a simple system: – that requires little or no help to implement – that can use existing database output – that requires the least possible maintenance on the server The cost of complex systems is too high for small users to implement without a very large benefit.

14 Methods for lowering the cost of implementing actionable GUIDs for small-scale users: RAX and REJAX

15 Review of Linked Data rules: 1. URIs of physical or conceptual (non-information) resources must differ from the URLs of the documents that describe them, e.g., http://bioimages.vanderbilt.edu/vanderbilt/7-314 is an oak tree; http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf is a metadata file describing the oak tree. 2. Content negotiation for actionable non-information resource URIs should produce: A. a web page for humans to see; B. an RDF/XML file for semantic clients (i.e., computers).
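
Rule 2 depends on the HTTP `Accept` header. This header lists the media types a client will take, optionally weighted by `q` values. The Python sketch below picks between the two representations; the function `preferred_type` is hypothetical. It is a simplification that ignores wildcard matching and the other precedence rules in the HTTP specification.

```python
def preferred_type(accept_header: str) -> str:
    """Choose "application/rdf+xml" or "text/html" from an HTTP Accept header.

    Simplified: reads q-values but does no wildcard or specificity matching.
    Falls back to HTML (for humans) when neither type is listed.
    """
    offers = ("application/rdf+xml", "text/html")
    best, best_q = "text/html", 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        # q=0 means "not acceptable", so it can never beat best_q.
        if media_type in offers and q > best_q:
            best, best_q = media_type, q
    return best


print(preferred_type("application/rdf+xml"))  # application/rdf+xml
```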

16 Extensible Stylesheet Language Transformation (XSLT): RDF/XML metadata in the file 0134.rdf + XSLT stylesheet in the file guid-o-matic.xsl → XHTML web page as seen by a human being.
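
The standard way for an XML document to tell a browser which XSLT stylesheet to apply is an `xml-stylesheet` processing instruction near the top of the file. The sketch below shows one way to add that instruction to an RDF/XML document. The function `attach_stylesheet` is hypothetical. Whether a given browser then runs the transform also depends on it treating the served content type as XML, which is an assumption here.

```python
def attach_stylesheet(rdf_xml: str, xsl_href: str = "guid-o-matic.xsl") -> str:
    """Insert an xml-stylesheet processing instruction pointing at `xsl_href`.

    The instruction must follow the XML declaration if one is present.
    """
    pi = f'<?xml-stylesheet type="text/xsl" href="{xsl_href}"?>\n'
    if rdf_xml.startswith("<?xml"):
        end = rdf_xml.index("?>") + 2  # end of the XML declaration
        return rdf_xml[:end] + "\n" + pi + rdf_xml[end:].lstrip("\n")
    return pi + rdf_xml
```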

17 RDF And XSLT (RAX) method 1. Client requests extension-less URI. 2. Server concatenates “.rdf” to the URI. 3. RDF/XML file delivered to client regardless of requested content-type. 4. Web browsers use an XSLT stylesheet to create an XHTML web page for humans from the RDF/XML. 5. Semantic clients just use the RDF/XML.
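
The RAX mapping is simple enough to state as a function. The Python sketch below returns the server path of the file sent for an extension-less URI; the name `rax_resolve` is hypothetical. The `Accept` header is passed in only to show that RAX ignores it. In practice the same mapping would be a server rewrite rule rather than Python.

```python
from urllib.parse import urlparse


def rax_resolve(uri: str, accept_header: str = "") -> str:
    """Return the server path of the file RAX sends for an extension-less URI.

    accept_header is deliberately unused: under RAX, humans and semantic
    clients receive the same RDF/XML file.
    """
    path = urlparse(uri).path  # e.g. "/specimens/lsu000/0134"
    return path + ".rdf"       # e.g. "/specimens/lsu000/0134.rdf"
```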

18 RAX content negotiation. Client: “I am a computer. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134.” Request to the web server: GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134 with Accept: application/rdf+xml. Web server: “I cannot send a specimen!” It sends http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf instead. Response: RDF/XML file.

19 RAX content negotiation. Client: “I am a human. Send me http://www.cyberfloralouisiana.com/specimens/lsu000/0134.” Request to the web server: GET http://www.cyberfloralouisiana.com/specimens/lsu000/0134 with Accept: text/html. Web server: “Duh, what’s that mean? He gets RDF anyway.” It sends http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf. Response: RDF/XML file; screenshot of what the web browser shows.

20 Static file structure for RAX:
http://www.cyberfloralouisiana.com/specimens/
    nlu000/
        9506.rdf
    lsu000/
        0134.rdf
        0435.rdf
        0532.rdf
        guid-o-matic.xsl
    nlu009/
        0505.rdf
The specimen with barcode LSU0000134 is identified by the URI http://www.cyberfloralouisiana.com/specimens/lsu000/0134. Its RDF-formatted metadata are in the file http://www.cyberfloralouisiana.com/specimens/lsu000/0134.rdf.
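
The example barcodes appear to follow a fixed pattern: LSU0000134 maps to lsu000/0134. The Python sketch below assumes that rule: lowercase the 10-character barcode, then use the first six characters as the directory and the last four as the file name. The function name `barcode_to_guid` is hypothetical.

```python
def barcode_to_guid(barcode: str,
                    base: str = "http://www.cyberfloralouisiana.com/specimens") -> tuple[str, str]:
    """Map a barcode such as LSU0000134 to (GUID, URL of its RDF file).

    Assumed rule: lowercase, 6-character directory, 4-character file name.
    """
    if len(barcode) != 10:
        raise ValueError("expected a 10-character barcode")
    code = barcode.lower()
    guid = f"{base}/{code[:6]}/{code[6:]}"
    return guid, guid + ".rdf"
```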

21 Asynchronous JavaScript and XML (AJAX): RDF/XML metadata in the files vanderbilt/4-145.rdf (the tree), baskauf/79687.rdf (an image), baskauf/79695.rdf (another image), etc. JavaScript in the file metadata.htm retrieves the metadata → XHTML web page created from those metadata, as seen by a human being.
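
In REJAX this aggregation happens in the browser, in the JavaScript inside metadata.htm. The Python sketch below is a stand-in for that step; the function `collect_metadata` is hypothetical. It merges simple properties from several RDF/XML documents into one table keyed by subject URI. A page needs this kind of structure before it can display a tree together with its images. The sketch handles only `rdf:Description` elements that carry `rdf:about`, which is an assumption about how the files are laid out.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def collect_metadata(documents: list[str]) -> dict[str, dict[str, list[str]]]:
    """Merge properties from several RDF/XML documents.

    Returns {subject URI: {property tag: [values]}}, where a value is either
    literal text or the target of an rdf:resource link.
    """
    merged: dict[str, dict[str, list[str]]] = {}
    for doc in documents:
        for desc in ET.fromstring(doc).iter(RDF + "Description"):
            subject = desc.get(RDF + "about")
            if subject is None:
                continue  # skip blank nodes in this simplified sketch
            props = merged.setdefault(subject, {})
            for prop in desc:
                value = prop.get(RDF + "resource", prop.text)
                if value is not None:
                    props.setdefault(prop.tag, []).append(value)
    return merged
```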

22 Redirection, JavaScript, and XSLT (REJAX) method: 1. Client requests an extension-less URI. 2. Server does content negotiation based on the requested content type. 3. Semantic clients are sent the RDF/XML. 4. Web browsers are sent a text/html web page, which uses JavaScript (i.e., AJAX) to open RDF/XML files and obtain the metadata required to construct the page. The JavaScript can also retrieve blocks of RDF data formatted by XSLT.
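
Step 2 can be sketched as a function that returns a redirect. The presentation does not give the status code. The Python sketch below assumes 303 See Other, the pattern described in the W3C note Cool URIs for the Semantic Web, and uses a deliberately simple substring test on the `Accept` header. The function name `rejax_redirect` is hypothetical.

```python
def rejax_redirect(uri: str, accept_header: str) -> tuple[int, str]:
    """Return (HTTP status, Location) for a request on an extension-less URI.

    Simplified negotiation: clients that mention application/rdf+xml are sent
    to the RDF file; everyone else (typically browsers) to the .htm page.
    """
    wants_rdf = "application/rdf+xml" in accept_header.lower()
    return 303, uri + (".rdf" if wants_rdf else ".htm")
```

Unlike RAX, the response now depends on the client, so a browser and a semantic client asking for the same URI end up at different files.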

23 REJAX content negotiation. Client: “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145.” Request to the web server: GET http://bioimages.vanderbilt.edu/vanderbilt/4-145 with Accept: application/rdf+xml. Web server: “I cannot send a tree! I’ll send information about the tree.” It sends http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf. Response: RDF/XML file.

24 REJAX content negotiation. Client: “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/4-145.” Request to the web server: GET http://bioimages.vanderbilt.edu/vanderbilt/4-145 with Accept: text/html. Web server: “Got it. I’ll send XHTML.” It sends http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm. Response: XHTML file; screenshot of the web page created by JavaScript.

25 Static file structure for REJAX:
http://bioimages.vanderbilt.edu/
    ind-baskauf/
        66920.rdf
        66920.htm
        70905.rdf
        70905.htm
    vanderbilt/
        4-145.rdf
        4-145.htm
        7-314.rdf
        etc.
    metadata.htm
The tree identified by the URI http://bioimages.vanderbilt.edu/vanderbilt/4-145 has RDF metadata in the file http://bioimages.vanderbilt.edu/vanderbilt/4-145.rdf, while the file http://bioimages.vanderbilt.edu/vanderbilt/4-145.htm passes information to the JavaScript in http://bioimages.vanderbilt.edu/metadata.htm?vanderbilt/4-145/metadata/ind/etc.

26 Comparison of RAX and REJAX.
Similarities: Both use static files. Both will work offline with at least some browsers. Both require modification of only a single file to change the appearance of the web page.
Differences: RAX uses metadata from a single RDF file, while REJAX draws metadata from several RDF files. RAX simply displays the metadata for one or more closely related resources, while REJAX allows the user to interact with many resources in complex ways.
RAX and REJAX are not programs or languages. They are simple content-negotiation methods that use the RDF/XML required by the Linked Data concept to create web pages.

27 Back to economics… Cost reduction: Risk is lowered because both methods can operate on a generic web server with no server-side scripting. No maintenance is required once they are set up (although a minor server rewrite rule is needed). Little time must be invested: an existing database can be used to provide the metadata, and implementation can be immediate. Scalable: the URIs are such that the static files can be replaced at any time by server-side scripting.

28 What about the RDF? RAX (specimen record): a single RDF file using hash URIs, containing [metadata about the individual], [metadata about the determination], [metadata about the specimen], etc.
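
A single file can describe several resources if each one gets its own URI. The Python sketch below assumes the specimen keeps the extension-less URI, while the individual and the determination receive hash URIs under it. The fragment names `#ind` and `#det` and the `ex:` vocabulary (`hasDetermination`, `sampleOf`) are hypothetical placeholders, not the terms used in the presentation.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/terms/"  # hypothetical vocabulary, for illustration only
ET.register_namespace("rdf", RDF)
ET.register_namespace("ex", EX)


def single_file_rdf(specimen_uri: str) -> str:
    """Describe a specimen, its source individual, and a determination in one file.

    The specimen keeps the extension-less URI; the other two resources get
    hash URIs (hypothetical fragments #ind and #det) under it.
    """
    individual = f"{specimen_uri}#ind"
    determination = f"{specimen_uri}#det"
    root = ET.Element(f"{{{RDF}}}RDF")

    def describe(about: str, links: dict[str, str]) -> None:
        # One rdf:Description per resource; links become rdf:resource references.
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
        for prop, target in links.items():
            ET.SubElement(desc, f"{{{EX}}}{prop}", {f"{{{RDF}}}resource": target})

    describe(individual, {"hasDetermination": determination})
    describe(determination, {})
    describe(specimen_uri, {"sampleOf": individual})
    return ET.tostring(root, encoding="unicode")
```

HTTP clients strip the fragment before sending a request. Dereferencing any of the three URIs therefore retrieves the same RDF file.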

29 What about the RDF? REJAX (live plant image records): multiple RDF files, one containing [metadata about the individual] and [metadata about the determination], another describing a DigitalStillImage with [metadata about the image].

30 The importance of separating resources in the RDF: [metadata about the individual], [metadata about the specimen], [metadata about the image]. One file is served from the herbarium’s website; another is served from the image repository’s website. See Biodiversity Informatics 7:17-44 for much more on this.

31 Guid-O-Matic: 1. Create a CSV export containing the terms that vary among specimens. 2. Download guid-o-matic.exe (200 kB) from http://bioimages.vanderbilt.edu/guid-o-matic (no installation required). 3. Create a directory to hold the RDF files. 4. Enter (one time) the information about your institution that doesn’t change. 5. Click the button and poof! The RDF files appear in the directory you created. 6. Re-publish your website using WinSCP or a similar file-transfer tool.
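
Guid-O-Matic itself is a Windows executable, and its internals are not described here. The Python sketch below is only a hypothetical analogue of steps 1 and 3 to 5. It assumes the CSV column headers are Darwin Core term names (including `collectionCode` and `catalogNumber`). It combines each row with fixed institutional values and writes one RDF/XML file per specimen. The function name, base URI, and institution code are placeholders.

```python
import csv
import io
import xml.etree.ElementTree as ET
from pathlib import Path

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DWC = "http://rs.tdwg.org/dwc/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dwc", DWC)

# Step 4 analogue: values that never change for the institution (hypothetical).
INSTITUTION = {"base": "http://www.example.org/specimens", "institutionCode": "XYZ"}


def csv_to_rdf_files(csv_text: str, out_dir: Path, fixed: dict = INSTITUTION) -> list[Path]:
    """Write out_dir/collectionCode/catalogNumber.rdf for each CSV row."""
    written = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        coll, cat = row["collectionCode"], row["catalogNumber"]
        uri = f"{fixed['base']}/{coll}/{cat}"
        root = ET.Element(f"{{{RDF}}}RDF")
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
        ET.SubElement(desc, f"{{{DWC}}}institutionCode").text = fixed["institutionCode"]
        # Assumption: every column header is a valid Darwin Core term name.
        for term, value in row.items():
            ET.SubElement(desc, f"{{{DWC}}}{term}").text = value
        target = out_dir / coll / f"{cat}.rdf"
        target.parent.mkdir(parents=True, exist_ok=True)
        ET.ElementTree(root).write(target, encoding="utf-8", xml_declaration=True)
        written.append(target)
    return written
```

The output directory mirrors the URI hierarchy, so the files can be uploaded as-is (step 6) and served by a simple rewrite rule.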

32 What’s the point??? Appropriate design of the RDF structure allows for both simple methods of generating a representation for humans and semantic clients drawing correct inferences about the relationships among resources. The human end user doesn’t care about this and doesn’t have to know about it (they’ll just see the web page). The raw data provider shouldn’t have to worry about what RDF is or how to use it (they just need some simple software to map their data correctly!).

33 Economics: benefits to small users. Serving the files from the user’s own web server lets users brand their GUIDs with their own domain name rather than that of an external host. Clickable attribution on websites. Reference links in PDF publication citations. An instant iPhone “app” to access collection metadata. The XSLT can easily be modified to meet users’ needs, e.g., QR codes on displays.

34 QR code on a museum display

35 Try these on your portable device (iPhone = yes, others = ?). RAX example: Juncus diffusissimus specimen at the LSU herbarium, http://www.cyberfloralouisiana.com/specimens/lsu000/0428. REJAX example: the “Bicentennial Oak” in Vanderbilt’s arboretum, http://bioimages.vanderbilt.edu/vanderbilt/7-314.

36 Summary: GUIDs of the HTTP URI form can be implemented right now, even by users with very few IT resources. Restricting the URIs to a simple structure (no weird characters, short, slashes to indicate hierarchy) prevents dependence on a particular delivery method (you can change your mind later). Making HTTP URI GUIDs actionable (i.e., resolvable in XHTML) in a simple way provides immediate benefits to the issuer even if the RDF is never used by a semantic client. Making it practical to implement resolvable GUIDs on a large scale increases the likelihood that semantic-web-based databases will evolve, because the economics shift in their favor (the solution to the chicken-and-egg problem).

37 References (links from the Bioimages GUID page, http://bioimages.vanderbilt.edu/pages/guid.htm):
TDWG GUID/LSID applicability statement: http://www.tdwg.org/stdtrack/article/download/150/51
Cool URIs don't change (Tim Berners-Lee): http://www.w3.org/Provider/Style/URI
Cool URIs for the Semantic Web: http://www.w3.org/TR/cooluris/
Recommendations for implementation of GUIDs in the SERNEC collections community: http://bioimages.vanderbilt.edu/guid
Biodiversity Informatics 7:17-44: https://journals.ku.edu/index.php/jbi/article/view/3664
Note: this PowerPoint will be linked from the Bioimages GUID page listed first (a QR code on the slide loads that URL).

