
1 Unified Digital Format Registry: a semantic registry for digital preservation
UDFR: A Semantic Registry for Format Representation Information
Lisa Dawn Colvin, Abhishek Salve, Stephen Abrams
UC Curation Center, California Digital Library
Digital Library Federation Forum, Baltimore, October 31-November 2, 2011

2 Outline
What
Why
How
When

3 Why formats?
“Format” is the dividing line between bits and information
[Figure: hex dump of a JPEG file (ffd8ffe000104a ffed0fb f746f73686f e d03e90a e e666f f40240ffeeffee fc d) shown alongside its parsed structure: SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2, ...; labeled “Syntax” and “Semantics”]
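To make the step from bits to structure concrete, here is a minimal Python sketch that walks the marker segments of a JPEG stream and prints the kind of marker names shown on this slide. The first bytes of the slide's hex dump (ff d8 ff e0 00 10) read as SOI, then APP0 with a segment length of 16. The `list_markers` function, the marker table, and the synthetic `sample` bytes are illustrative assumptions, not part of UDFR.

```python
import struct

# Names for a few JPEG markers shown on the slide (the byte that follows 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT",
    0xDA: "SOS", 0xD9: "EOI",
}


def list_markers(data: bytes) -> list:
    """Return the marker names of a JPEG stream, up to the start of scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream: missing SOI marker")
    names = ["SOI"]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        names.append(MARKER_NAMES.get(code, f"0x{code:02X}"))
        if code == 0xDA:
            # SOS: entropy-coded data follows, so stop walking segments here.
            break
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length
    return names


# Synthetic, hand-built byte string (not a decodable image).
sample = (
    b"\xff\xd8"                                            # SOI
    + b"\xff\xe0\x00\x10" + b"JFIF\x00\x01\x02" + bytes(7)  # APP0, 16-byte segment
    + b"\xff\xdb\x00\x03" + b"\x00"                        # DQT with a 1-byte payload
    + b"\xff\xda\x00\x02"                                  # SOS
)
print(list_markers(sample))
```

Knowing the syntax (markers and lengths) is what lets a tool reach the semantics (JFIF version, image dimensions) stored inside each segment.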

4 Why formats?
There are many necessary preservation activities that can be usefully performed on bits qua bits
But to preserve information you must act on formatted bits and know what those formats mean
Preservation of syntax and semantics

5 Unified Digital Format Registry
“A reliable, publicly accessible, and sustainable knowledge base of file format representation information for use by the digital preservation community”
“Unification” of the function and holdings of PRONOM and GDFR
Open source platform / GPL
Semantic wiki
Funded by the Library of Congress

6 Timeline
PRONOM – National Archives [UK], “ready access to reliable technical information about the nature of electronic records”
JHOVE – Harvard, “digital object validation and characterization”
GDFR – Harvard/OCLC, “a distributed and replicated registry of format information populated and vetted by experts and enthusiasts world-wide”

7 Timeline
UDFR – Ad hoc stakeholder community, 2009
– Resolve PRONOM IPR issues and develop a community-supported open source solution
– Advance beyond legacy RDBMS and XML database technology
UDFR – CDL, January “a semantic registry for digital preservation”
– Stakeholder meeting, April 2011
– Beta release, November 2011
– Production release, January 2012

8 Representation information
What you need to know about something in order to exploit that thing meaningfully [OAIS/ISO 14721]
Information that lets you answer important preservation questions
– What format is it?
– What are its significant properties?
– Is it valid?
– Is it at risk?
– How can I render/play/read it?
– What can it be transformed into? And how?
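The first question, "What format is it?", is commonly answered by matching a file's leading bytes against known signatures. The sketch below assumes a small hand-written signature table; in practice a registry would supply such entries. The `SIGNATURES` table and the `identify` function are hypothetical names used only for illustration.

```python
from typing import Optional

# Hypothetical, hand-written signature table: (format name, byte offset, magic bytes).
SIGNATURES = [
    ("JPEG", 0, bytes.fromhex("ffd8ff")),
    ("PNG", 0, bytes.fromhex("89504e470d0a1a0a")),
    ("PDF", 0, b"%PDF-"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches, or None if none does."""
    for name, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None


print(identify(b"%PDF-1.4\n"))
print(identify(bytes.fromhex("ffd8ffe000104a46")))
print(identify(b"plain text"))
```

A signature match only suggests a format. Answering "Is it valid?" needs a fuller check against the format's specification.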

9 Why semantic?
Everyone wants to say something about everything
The semantic web lets anyone say anything about anything
Understandable to both people and machines

10 Data modeling
[Diagram: class model. Classes: Abstract Base, Abstract Product, Abstract Format, File Format, Character Encoding, Compression Algorithm, Media, Hardware, Software, Document, File, Agent, IPR, Controlled Vocabulary, Holding, Process, Abstract Signature, External Signature, Internal Signature, Digest, Assessment, Grammar. Relationship labels: specification, reference, file, holder, owner, creator, maintainer, ipr, embodies, product, input / output, dependency, signature, digest, grammar, assessment.]
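A semantic model of this kind can be pictured as subject-predicate-object statements. The following Python sketch stores a few such statements as plain tuples and looks them up. The `udfr:` prefix, the identifiers, and which class each relationship connects are assumptions made for illustration. They borrow class and relationship names from the slide but are not the actual UDFR ontology.

```python
# Illustrative statements only: identifiers and the "udfr:" prefix are hypothetical.
triples = {
    ("udfr:JPEG", "rdf:type", "udfr:FileFormat"),
    ("udfr:JPEG", "udfr:signature", "udfr:JPEG-sig-1"),
    ("udfr:JPEG-sig-1", "rdf:type", "udfr:InternalSignature"),
    ("udfr:JPEG", "udfr:specification", "udfr:JFIF-spec"),
    ("udfr:JFIF-spec", "rdf:type", "udfr:Document"),
}


def objects(subject: str, predicate: str) -> list:
    """Return every object asserted for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


# Anyone can add a statement about anything without changing a schema.
triples.add(("udfr:JPEG", "udfr:signature", "udfr:JPEG-sig-2"))

print(objects("udfr:JPEG", "udfr:signature"))
print(objects("udfr:JFIF-spec", "rdf:type"))
```

Because each statement stands alone, new kinds of assertions can be added later without migrating existing data, which is one reason to prefer this shape over a fixed relational schema.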

11 Provenance
“Trust, but verify”
Complete change history at the assertion level, including
– Who made the assertion, and when?
– Confidence based on personal and institutional reputation
Imprimatur by technically knowledgeable reviewers
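One way to picture change history at the assertion level is an append-only log in which every statement carries its author and timestamp. The sketch below is a minimal illustration under that assumption. The `Assertion` and `AssertionLog` names, the sample values, and the contributor names are hypothetical, and reputation-based confidence and reviewer imprimatur are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    value: str
    asserted_by: str        # who made the assertion
    asserted_at: datetime   # when it was made


class AssertionLog:
    """Append-only log: nothing is overwritten, so the full history survives."""

    def __init__(self) -> None:
        self._entries: List[Assertion] = []

    def record(self, assertion: Assertion) -> None:
        self._entries.append(assertion)

    def history(self, subject: str, predicate: str) -> List[Assertion]:
        """All assertions for a subject/predicate pair, oldest first."""
        matches = [a for a in self._entries
                   if a.subject == subject and a.predicate == predicate]
        return sorted(matches, key=lambda a: a.asserted_at)

    def current(self, subject: str, predicate: str) -> Optional[Assertion]:
        """The most recent assertion, or None if nothing was ever asserted."""
        past = self.history(subject, predicate)
        return past[-1] if past else None


log = AssertionLog()
log.record(Assertion("JPEG", "mimeType", "image/jpg", "alice", datetime(2011, 10, 1)))
log.record(Assertion("JPEG", "mimeType", "image/jpeg", "bob", datetime(2011, 10, 15)))

latest = log.current("JPEG", "mimeType")
print(latest.value, "asserted by", latest.asserted_by)
print(len(log.history("JPEG", "mimeType")), "assertions on record")
```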

12 Ontologies

13 Technology stack
OntoWiki
Virtuoso / 4store
Zend framework
PHP
Apache httpd
RDF
JavaScript / CSS
HTTP / SPARQL
Erfurt / RDFAuthor https://github.com/AKSW/RDFauthor
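The HTTP / SPARQL layer means a client can query the registry with an ordinary web request. Under the SPARQL protocol, a query can be sent as the `query` parameter of a GET request. The sketch below only builds such a URL and does not contact any server. The endpoint address and the class URI in the query are placeholders, not the real UDFR endpoint or ontology.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint and class URI: both are assumptions for illustration.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
SELECT ?format WHERE {
  ?format a <http://example.org/udfr#FileFormat> .
} LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL-protocol GET URL with the query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round-trip check: decoding the URL recovers the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```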

14 Initial population
Export from PRONOM
Working with TNA to identify appropriate subset
Transform to cross-walk modeling differences

15 Licensing
Code is available under GPLv3
– Hosted on BitBucket
Data is contributed and available under CC-BY
– Consistent with UK open government license applicable to PRONOM data

16 Demo

17 Lessons learned
People with semantic experience are scarce
Too much time evaluating/prototyping potential technology choices
More difficulty than anticipated integrating disparate open source products
0.x software is often numbered that for a reason
Feature lists aren’t (always)

18 Lessons learned
Availability of a worldwide selection of products is a good thing
Excellent support from AKSW/Universität Leipzig
Modeling differences
RDF (non-)standards
VM deployment
Disparate IT organizations supporting dev/prod instances
(except when you don’t read German)

19 Next steps
Long-term governance and operational support
Technical maintenance and enhancement
Replication/synchronization
Building contributor and reviewer communities

20 For more information
UDFR
PRONOM
GDFR
OntoWiki
Virtuoso
Agile Knowledge and Semantic Web (AKSW), Universität Leipzig
UC3: Stephen Abrams, Lisa Colvin, Patricia Cruse, Scott Fisher, Erik Hetzner, Greg Janée, John Kunze, Margaret Low, David Loy, Mark Reyes, Abhishek Salve, Tracy Seneca, Joan Starr, Carly Strasser, Marisa Strong, Adrian Turner, Perry Willett

