Unified Digital Format Registry a semantic registry for digital preservation Sustaining the Unified Digital Format Registry (UDFR) Stephen Abrams UC Curation.


Unified Digital Format Registry: a semantic registry for digital preservation
Sustaining the Unified Digital Format Registry (UDFR)
Stephen Abrams, UC Curation Center, California Digital Library
Digital Preservation 2012, Library of Congress, July 24-25, 2012

Agenda
- Background
- Current status
- Demonstration
- Next steps

Why formats?
“Format” is the dividing line between bits and information.
[Figure: hex dump of a JPEG file (ffd8ffe000104a ...) shown against its parsed segment structure: SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2, ...]
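The step from raw bytes to the named segments on this slide (SOI, APP0, DQT, ...) can be sketched in a few lines of Python. This is a minimal illustration, not a validating parser: the marker table is a small subset, fill bytes and malformed lengths are not handled, and the sample bytes are synthetic rather than taken from the slide's hex dump.

```python
import struct

# Names for a few common JPEG segment markers (illustrative subset).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}


def list_segments(data: bytes) -> list[str]:
    """Return segment names from the start of a JPEG stream up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        names.append(MARKER_NAMES.get(code, f"0x{code:02X}"))
        if code == 0xDA:
            # SOS: entropy-coded data follows, so stop walking here.
            break
        # The length field is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names


# Synthetic stream: SOI, an APP0 segment carrying "JFIF\0",
# and a DQT segment with an empty body (not valid in a real file).
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 2 + 5) + b"JFIF\x00"
    + b"\xff\xdb" + struct.pack(">H", 2)
)
print(list_segments(sample))  # ['SOI', 'APP0', 'DQT']
```

Without knowledge of the format, the same bytes are only a hex string; with it, they resolve into segments whose meaning can be documented and preserved.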

Why formats?
- There are many necessary preservation activities that can be usefully performed on bits qua bits, but to preserve information you must act on formatted bits and know what those formats represent
  - Preservation of content syntax and semantics (both the structure and meaning of the digital representation)

Unified Digital Format Registry
“A reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community”
- “Unification” of the function and holdings of PRONOM and GDFR, available July 3,
- Funded by the Library of Congress
- Open source platform / GPL
- Semantic wiki

A bit of history …
- PRONOM – National Archives [UK],
  - “ready access to reliable technical information about the nature of electronic records”
- JHOVE – Harvard,
  - “digital object validation and characterization”
- Global Digital Format Registry (GDFR) – Harvard/OCLC,
  - “a distributed and replicated registry of format information populated and vetted by experts and enthusiasts world-wide”

A bit of history …
- Proto-UDFR – Ad hoc stakeholder community, 2009
  - Resolve PRONOM IPR issues and develop a community-supported open source solution
  - Advance beyond legacy RDBMS (PRONOM) and XMLDB (GDFR) technology
- UDFR – CDL, January
  - “a semantic registry for digital preservation”
  - LC/NDIIPP funded
  - Stakeholder meeting, April 2011
  - Beta release, November 2011
  - Production release, July 2012

Representation information
- What you need to know about something in order to exploit that thing meaningfully [OAIS/ISO 14721]
- Information that lets you answer important preservation questions (directly or indirectly)
  - What format is it?
  - What are its significant properties?
  - Is it valid?
  - Is it at risk?
  - How can I render/play/read it?
  - What can it be transformed into?

Why semantic?
- The semantic web lets anyone say anything about anything
  - Understandable to both people and machines
- The web is (or soon will be) a semantic web
  - Linked Data interoperability

Why semantic?
- Triples all the way down…
  - Data expressed as triples
  - Data definition (i.e., ontology) expressed as triples
  - Ontology definition expressed as triples
  - …
- Facilitates self-configuration and easy extension
  - However, the form and function of a semantic wiki may be unfamiliar
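The "triples all the way down" idea can be illustrated with plain Python tuples. This is a sketch only: the `ex:` identifiers are hypothetical and are not taken from the UDFR ontology, and a real deployment would use an RDF store rather than an in-memory set.

```python
# Every statement is a (subject, predicate, object) triple.
RDF_TYPE = "rdf:type"

triples = {
    # Instance data (hypothetical identifiers).
    ("ex:jpeg", RDF_TYPE, "ex:FileFormat"),
    ("ex:jpeg", "rdfs:label", "JPEG"),
    # The data definition (ontology) uses the same form.
    ("ex:FileFormat", RDF_TYPE, "owl:Class"),
    ("ex:FileFormat", "rdfs:subClassOf", "ex:AbstractFormat"),
    # So does the definition of the ontology language itself.
    ("owl:Class", RDF_TYPE, "rdfs:Class"),
}


def type_chain(node: str) -> list[str]:
    """Follow rdf:type links upward from a node until none remain."""
    chain = []
    seen = {node}
    while True:
        types = sorted(o for s, p, o in triples if s == node and p == RDF_TYPE)
        if not types or types[0] in seen:
            return chain
        node = types[0]
        seen.add(node)
        chain.append(node)


print(type_chain("ex:jpeg"))  # ['ex:FileFormat', 'owl:Class', 'rdfs:Class']
```

Because data, schema, and schema language share one representation, the same lookup code walks all three levels, which is what makes self-configuration and extension comparatively easy.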

Provenance
- Open contribution
  - Self-registration, but no further barriers
  - Complete change history at the assertion level
    - Who made the assertion, and when
    - Confidence based on individual/institutional reputation
  - Imprimatur of technically knowledgeable reviewers
- “Trust, but verify”
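One way to picture change history "at the assertion level" is to attach a contributor and a timestamp to every individual statement. The sketch below is an assumption about how such a log could be modelled, not a description of how UDFR or OntoWiki store it; all names and values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Assertion:
    """A single statement plus who made it, when, and who reviewed it."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str
    asserted_at: datetime
    reviewed_by: Optional[str] = None  # set once a reviewer signs off


def history(log: list[Assertion], subject: str, predicate: str) -> list[Assertion]:
    """All assertions about one property of one subject, oldest first."""
    matches = [a for a in log if a.subject == subject and a.predicate == predicate]
    return sorted(matches, key=lambda a: a.asserted_at)


def current_value(log: list[Assertion], subject: str, predicate: str) -> Optional[str]:
    """The most recently asserted value, or None if nothing was asserted."""
    past = history(log, subject, predicate)
    return past[-1].obj if past else None


# Hypothetical log: a contributor's entry is later corrected and reviewed.
log = [
    Assertion("ex:jpeg", "ex:mimeType", "image/jpg", "alice",
              datetime(2012, 6, 1, tzinfo=timezone.utc)),
    Assertion("ex:jpeg", "ex:mimeType", "image/jpeg", "bob",
              datetime(2012, 7, 1, tzinfo=timezone.utc), reviewed_by="carol"),
]
print(current_value(log, "ex:jpeg", "ex:mimeType"))
```

Keeping superseded assertions rather than overwriting them is what lets a consumer "trust, but verify": the earlier value, its author, and the reviewer of the correction all remain visible.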

Roles
Role           Permissions                          How granted
Consumer       Anonymous read
Contributor    Read + write                         Self-registration
Reviewer       Read + write + review                Administratively granted
Administrator  Read + write + review + administer
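The four roles are cumulative: each adds one permission to the one before it. A minimal sketch of that model, using hypothetical names that are not part of the UDFR codebase:

```python
from enum import Flag, auto


class Permission(Flag):
    READ = auto()
    WRITE = auto()
    REVIEW = auto()
    ADMINISTER = auto()


# Each role extends the previous one, as on the slide.
ROLES = {
    "consumer": Permission.READ,
    "contributor": Permission.READ | Permission.WRITE,
    "reviewer": Permission.READ | Permission.WRITE | Permission.REVIEW,
    "administrator": (Permission.READ | Permission.WRITE
                      | Permission.REVIEW | Permission.ADMINISTER),
}


def can(role: str, permission: Permission) -> bool:
    """True if the named role includes the given permission."""
    return permission in ROLES[role]


print(can("contributor", Permission.WRITE))  # True
print(can("consumer", Permission.WRITE))     # False
```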

Technology stack
- OntoWiki
- Virtuoso quadstore
- Zend framework
- PHP
- Apache httpd
- RDF
- RDFauthor / JavaScript
- HTTP / SPARQL
- Erfurt API
- Noid
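Since the stack exposes SPARQL over HTTP, a client can query it with an ordinary GET request carrying the query in a `query` parameter, as the SPARQL Protocol specifies. The sketch below only builds the request URL and does not contact any server; the endpoint address and the query are placeholders, not the actual UDFR endpoint or vocabulary.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Generic query: list ten labelled resources.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label
WHERE { ?resource rdfs:label ?label }
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET URL using the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY.strip())
```

To run the query, the URL could be fetched with `urllib.request.urlopen`, typically with an `Accept: application/sparql-results+json` header to request JSON results.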

Code repository
- All code (and ontologies) managed in public repositories at GitHub
  - OntoWiki (forked from its upstream repository)
  - Erfurt (forked from its upstream repository)
  - RDFauthor (forked from its upstream repository)
- All CDL development available under the GPL

UDFR schema
[Diagram: classes and relationships of the UDFR ontology]
- Classes shown: Abstract Base, Abstract Product, Abstract Format, File Format, Character Encoding, Compression Algorithm, Media, Hardware, Software, Document, File, Agent, IPR, Controlled Vocabulary, Holding, Process, Abstract Signature, External Signature, Internal Signature, Digest, Assessment, Grammar
- Relationship labels shown: specification, reference, file, holder, owner, creator, maintainer, ipr, embodies, product, input / output, dependency, signature, digest, grammar, assessment

Code repository
- All ontologies (and code) managed in public repositories at GitHub
- Ontologies
  udfrs    [onto.owl]     UDFR schema
  udfr     [udfr.owl]     UDFR instance data
  profile  [profile.owl]  UDFR user profiles

Initial data loads
PRONOM as of
- 846 file formats
- 28 character encodings
- 17 compression algorithms
- 1,237 identifiers
- 1,006 external signatures
- 494 internal signatures
- 71 MIME types (not in Appspot)
- 156 agents
- 268 software packages
- 2,080 software processes
- 23 IPR statements
- 217 relationships
Other figures shown: 8,274; 548; 7,816 (deduplicated, June 2012)
Special thanks to TNA: Spencer Ross, Tracey Powell, Tim Gollins

Initial data loads
MIME types from Appspot as of
“Routinely scraped from IANA using code in the mediatypes Google Code project”
- 809 application/*
- 125 audio/*
- 39 image/*
- 19 message/*
- 14 model/*
- 14 multipart/*
- 51 text/*
- 56 video/*
- 1,127 total
- Plus 71 defined by PRONOM

Data licensing
- PRONOM data contributed under UK Open Government License (OGL)
- Other submissions contributed under Creative Commons Attribution license (CC-BY)

UI layout
[Screenshot: UI layout with labeled regions]
- OntoWiki pane: Register/login/logout, SPARQL query form, Documentation, Session reset
- Knowledge base pane
- Ontology browser pane
- Register/login pane
- Workspace pane: function dependent

Contextual menus
[Screenshot: contextual menu]

User’s Guide

Demonstration

Next steps
- Operational control
  - CDL will continue to host the UDFR for one year while a more permanent hosting strategy can be identified
- Administrative control
  - The “admin” role – necessary for adding user privileges, modifying the ontologies, and bulk imports – is held by CDL staff
  - How can this responsibility be shared?
- Technical control
  - How to share “committer” responsibility for the codebase?
  - How to coordinate additional development activity?

Next steps
- Technical development
  - Synchronization with PRONOM and other external sources of bulk imports
  - UI enhancements to provide a lower-barrier learning curve
  - RESTful API (in addition to the SPARQL endpoint)
  - Replication to mirror sites
  - Others?
- Bring under the OPF code repository/issue tracking umbrella

Next steps
- Import additional data sources
  - Library of Congress Sustainability of Digital Formats
  - IT History Society hardware database
  - NIST NSRL (National Software Reference Library)
  - Stanford CPUdb
  - TOTEM (Trustworthy Online Technical Environment Metadata) database
  - Other candidates?
  - How important is merging?

Next steps
- Encourage adoption and use
  - Identify an evangelist
  - Marketing/outreach
  - Cf. Chris Rusbridge’s blog post asking “What was the problem” that UDFR was trying to solve
- Enable the reviewer function
  - Who will review? What are the criteria?
- Sustainable community governance
  - Who will make the decisions?

Questions and discussion

For more information
UDFR (to subscribe, mail “SUB UDFR-L ” to
OntoWiki, Erfurt, RDFauthor, Zend, Virtuoso
- AKSW, Universität Leipzig: Philipp Frischmuth, Norman Heino, Sebastian Tramp
- National Archives, UK: Tim Gollins, Tracey Powell, Spencer Ross
- Library of Congress: Martha Anderson, Leslie Johnston
- UC Curation Center: Stephen Abrams, Lisa Dawn Colvin, Patricia Cruse, John Kunze, Margaret Low, Mark Reyes, Abhishek Salve, Marisa Strong