PREMIS Tools and Services
  Rebecca Guenther
  Network Development & MARC Standards Office, Library of Congress
  rgue@loc.gov
  NDIIPP Partners Meeting, July 21, 2010
Outline of presentation
  PREMIS in METS Toolbox (PiM)
  Authorities and vocabularies web service (id.loc.gov)
PREMIS in METS toolbox
  Developed by Florida Center for Library Automation under contract with LC
  A set of open-source tools to support the implementation of PREMIS, especially in the METS container format
  3 components: validate, convert, describe
  Source code being made available: http://pimtoolbox.sourceforge.net
Describe: uses the DAITSS description service
  [Diagram: /a/real/file -> droid/jhove -> <premis> <ext> </premis>]
Convert: PREMIS to PREMIS in METS, or PREMIS in METS to PREMIS
  [Diagram: <premis/> <-> xslt <-> <mets> <premis/> </mets>]
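The two directions of the conversion can be sketched in Python as below. This is only an illustration of the idea, not the PiM implementation, which uses XSLT. The function names and the ID values are hypothetical, and placing the whole PREMIS document in a single digiprovMD section is a simplifying assumption. The output is not a complete METS document (for example, it has no structMap).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NS = "info:lc/xmlns/premis-v2"  # namespace of PREMIS version 2.x

ET.register_namespace("mets", METS_NS)
ET.register_namespace("premis", PREMIS_NS)


def wrap_premis_in_mets(premis_xml: str) -> str:
    """Embed a PREMIS document in a minimal METS wrapper (hypothetical helper)."""
    premis_root = ET.fromstring(premis_xml)
    mets = ET.Element(f"{{{METS_NS}}}mets")
    # Assumption: everything goes into one digiprovMD section.
    amd_sec = ET.SubElement(mets, f"{{{METS_NS}}}amdSec", {"ID": "AMD1"})
    digiprov = ET.SubElement(amd_sec, f"{{{METS_NS}}}digiprovMD", {"ID": "DP1"})
    md_wrap = ET.SubElement(digiprov, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "PREMIS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(premis_root)
    return ET.tostring(mets, encoding="unicode")


def unwrap_premis_from_mets(mets_xml: str) -> str:
    """Extract the first embedded PREMIS document from a METS document."""
    mets = ET.fromstring(mets_xml)
    premis_root = mets.find(f".//{{{PREMIS_NS}}}premis")
    if premis_root is None:
        raise ValueError("no <premis> element found in METS document")
    return ET.tostring(premis_root, encoding="unicode")
```

Wrapping and then unwrapping a document should return an equivalent PREMIS root element, which gives a quick round-trip check.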
Validate: checks a PREMIS in METS document and returns confirmation or errors
  [Diagram: <mets> <premis/> </mets> -> Schematron -> confirmation or errors]
Demo: http://pim.fcla.edu/
  Audio file: http://lcweb2.loc.gov/diglib/ihas/loc.natlib.ihas.200150574/default.html
    http://lcweb2.loc.gov/natlib/ihas/service/sousa/200150574/0001.mp3
  PDF file: describe demo.pdf
  Image: http://lcweb2.loc.gov/diglib/ihas/loc.natlib.gottlieb.09601/default.html
Authorities and vocabularies web service: id.loc.gov
  Makes authorities and vocabularies owned and maintained by LC available as Linked Data
  Allows both human-oriented and programmatic access to LC-promulgated authorities and vocabularies
  First offering was LCSH; additional vocabularies were added later
  Search and download available
Why establish controlled vocabularies?
  Control values that occur in metadata
  Reduce ambiguity
  Control synonyms
  Document and publish for reuse
  Test and validate terms
  Establish formal relationships among terms (where appropriate)
  Includes enumerated values in schemas, formal thesauri, code lists, etc.
Many metadata schemes allow for content from other sources. Some data elements may be more useful if a controlled vocabulary is used. Some vocabularies are published formally; others are developed and used locally. Formal controlled vocabularies may be used for testing and validation of terms; one instance is integrated library systems, where bibliographic records may be validated against authority records. Work is under way on metadata registries for both documentation and machine validation of controlled vocabularies and metadata elements/terms. This could be particularly useful for controlled vocabularies, since their usefulness depends on consistency.
Standards maintained at LC that contain controlled vocabularies
  LCSH/NAF
  Thesaurus of Graphic Materials
  MARC Code lists: GACs, countries, languages
  ISO 639-2 and ISO 639-5 (language codes)
  Other MARC controlled lists
  Enumerated lists in XML schemas
  MODS enumerated values
  METS enumerated values
  MIX (Technical metadata for digital still images)
  PREMIS controlled vocabularies
  Others…
Simple Knowledge Organization System (SKOS)
  RDF application used to express knowledge organization systems, such as thesauri and taxonomies, and the concepts within them
  SKOS has a defined element set that is particularly relevant for controlled vocabularies
  Relationships can be expressed between concepts in a concept scheme (e.g. broader, narrower) and between concepts in different schemes
  Having a dereferenceable URI for concepts and their concept schemes enhances the ability to provide web services for consumers of these standards
  Maintained by the W3C Semantic Web Deployment Working Group
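As a rough illustration of how SKOS expresses relationships between concepts, the sketch below holds a few statements as plain (subject, predicate, object) tuples. The example.org URIs, the two concepts, and the helper names are invented for illustration. Only the SKOS property names (prefLabel, broader, narrower) come from the SKOS vocabulary itself. The narrower lookup relies on the fact that SKOS defines narrower as the inverse of broader.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/scheme/"  # hypothetical concept scheme, not a real LC URI

# A tiny concept scheme stored as (subject, predicate, object) triples.
triples = {
    (EX + "dogs", SKOS + "prefLabel", "Dogs"),
    (EX + "huntingDogs", SKOS + "prefLabel", "Hunting dogs"),
    (EX + "huntingDogs", SKOS + "broader", EX + "dogs"),
}


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def narrower(graph, concept):
    """Return narrower concepts, stated directly or implied by skos:broader."""
    explicit = set(objects(graph, concept, SKOS + "narrower"))
    inferred = {s for s, p, o in graph if p == SKOS + "broader" and o == concept}
    return sorted(explicit | inferred)


for uri in narrower(triples, EX + "dogs"):
    print(uri, objects(triples, uri, SKOS + "prefLabel"))
```

A production service would more likely use an RDF library and a triplestore; the tuple set here only shows the shape of the data.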
"Linked Data"
  A feature of the "Semantic Web" where links are made between resources
  Goes beyond hypertext links (i.e. between web pages) to links between any kind of object or concept
  From Wikipedia: "a term used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web"
  Users can use links to find similar resources and aggregate results
  Interaction between data relies on URIs
Reasons for developing a web service for vocabularies
  Facilitate development and maintenance process for vocabularies
  Make controlled lists openly available
  Provide comprehensive information about controlled terms
  Experiment with semantic web technologies and linked data
  Expose vocabularies to wider communities
URIs in id.loc.gov
  Interaction with any given individual term and vocabulary is with its URI
  Some examples of URIs:
    http://id.loc.gov/vocabulary/relators/art
    http://id.loc.gov/vocabulary/graphicMaterials/tgm005222
    http://id.loc.gov/vocabulary/preservationEvents/migration
    http://id.loc.gov/authorities/sh85063136
  Known-label searches: use when you know the label but not the identifier
    http://id.loc.gov/vocabulary/relators/label/artist
    http://id.loc.gov/authorities/label/hunting%20dogs
  Link goes to RDFa It!
  Visualizations, "Folk music"
  Real hook of site is content negotiation
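The URI patterns and content negotiation described above can be sketched as follows. The helper names are hypothetical, and the media type in the Accept header is an assumption, since the slide does not list which representations the service returns. The code only builds the request object and does not contact the service.

```python
from urllib.parse import quote
from urllib.request import Request

BASE = "http://id.loc.gov"


def term_uri(scheme: str, identifier: str) -> str:
    """Build the URI of a term from its scheme path and identifier."""
    return f"{BASE}/{scheme}/{quote(identifier)}"


def label_uri(scheme: str, label: str) -> str:
    """Build a known-label search URI, percent-encoding the label."""
    return f"{BASE}/{scheme}/label/{quote(label)}"


def negotiated_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Prepare a request that asks for a representation via the Accept header.

    Assumption: which media types the service supports is not stated in the
    slides; application/rdf+xml is used here only as a plausible example.
    """
    return Request(uri, headers={"Accept": media_type})


request = negotiated_request(label_uri("authorities", "hunting dogs"))
print(request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual lookup; the same URI can then yield different representations depending on the Accept header.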
Technical infrastructure
  Django (Python)
  LCSH
    MySQL
    SKOS RDF generated at time of request
    Operates, more or less, as traditional relational DB
    MARC mapped to relational DB tables
  Everything else
    RDFLib (Python library, uses MySQL as triplestore)
    Runs on triples
    XML to SKOS RDF/XML before ingest
    XSL, XQuery used
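The "XML to SKOS RDF/XML before ingest" step might look roughly like the sketch below. The site itself uses XSL and XQuery for this; Python is substituted here purely for illustration. The input format (a codelist element containing code elements with an id attribute and a label child), the function name, and the assumption that labels are in English are all hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("skos", SKOS_NS)


def code_list_to_skos(xml_text: str, scheme_uri: str) -> str:
    """Convert a simple (hypothetical) XML code list to SKOS RDF/XML."""
    source = ET.fromstring(xml_text)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    # Describe the vocabulary itself as a concept scheme.
    ET.SubElement(rdf, f"{{{SKOS_NS}}}ConceptScheme", {f"{{{RDF_NS}}}about": scheme_uri})
    for code in source.findall("code"):
        # Each code becomes a concept whose URI extends the scheme URI.
        concept = ET.SubElement(
            rdf,
            f"{{{SKOS_NS}}}Concept",
            {f"{{{RDF_NS}}}about": f"{scheme_uri}/{code.get('id')}"},
        )
        label = ET.SubElement(concept, f"{{{SKOS_NS}}}prefLabel", {f"{{{XML_NS}}}lang": "en"})
        label.text = code.findtext("label")
        ET.SubElement(concept, f"{{{SKOS_NS}}}inScheme", {f"{{{RDF_NS}}}resource": scheme_uri})
    return ET.tostring(rdf, encoding="unicode")


sample = '<codelist><code id="art"><label>Artist</label></code></codelist>'
print(code_list_to_skos(sample, "http://example.org/vocabulary/relators"))
```

The resulting RDF/XML could then be loaded into a triplestore, which matches the "runs on triples" design noted above.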
Next steps
  MADS OWL Schema to enable identification of facets, e.g. Aeronautics--Soviet Union--History
  Enhance existing vocabularies to show relationships
    Broader/narrower relator terms
    Matches to other vocabulary terms (e.g. MARC vs. ISO 3166 country codes)
  Add new vocabularies
    PREMIS controlled vocabularies
    MARC country, geographic area, languages
    ISO 639-2 and 639-5
    Name authorities
  Enhance PiM to validate PREMIS vocabulary terms