SEMANTIC UNITS PERTAINING TO OBJECTS. Object entity  Aggregates characteristics relevant to preservation management that are properties of the object.

Slides:



Advertisements
Similar presentations
PRESERVATION METADATA: IMPLEMENTATION STRATEGIES Preservation Metadata: The PREMIS Experience Priscilla Caplan Florida Center for Library Automation (FCLA)
Advertisements

The PREMIS Data Dictionary Michael Day Digital Curation Centre UKOLN, University of Bath JORUM, JISC and DCC.
PREMIS Conformance. Agenda 1.NLNZ and NLB conformance exercise 2.History of PREMIS Conformance 3.Current status 4.Mapping to functionality.
Digital Preservation - Its all about the metadata right? “Metadata and Digital Preservation: How Much Do We Really Need?” SAA 2014 Panel Saturday, August.
PREMIS in Thought: Data Center for LC Digital Holdings Ardys Kozbial, Arwen Hutt, David Minor February 11, 2008.
Data Model Changes IEs and Environments. The PREMIS Data Model Intellectual Entity Object Event Agent Rights identifiers identifier identifiers Slide.
ISO & OAI-PMH By Neal Harmeyer, Amy Hatfield, and Brandon Beatty PURDUE UNIVERSITY RESEARCH REPOSITORY.
Mark Evans, Tessella Digital Preservation Boot Camp – PASIG meeting, Washington DC, 22 nd May 2013 PREMIS Practical Strategies For Preservation Metadata.
Preservation Metadata Extraction and Collection : Tools and Techniques Mat Black National Library of New Zealand Te Puna Matauranga o Aotearoa.
Common Use Cases for Preservation Metadata Deborah Woodyard-Robinson Digital Preservation Consultant Long-term Repositories:
R.Jantz, August 31, Two-day forum on PREMIS Preservation Metadata and the Trusted Digital Repositories August 31, September 1 National Library of.
3. Technical and administrative metadata standards Metadata Standards and Applications.
PREMIS What is PREMIS? – Preservation Metadata Implementation Strategies When is PREMIS use? – PREMIS is used for “repository design, evaluation, and archived.
US GPO AIP Independence Test CS 496A – Senior Design Fall 2010 Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong.
PREMIS What is PREMIS? o Preservation Metadata Implementation Strategies When is PREMIS use? o PREMIS is used for “repository design, evaluation, and archived.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
AIP Archival Information Package – Defines how digital objects and its associated metadata are packaged using XML based files. METS (binding file) MODS.
Descriptive Metadata o When will mods.xml be used by METS (aip.xml) ?  METS will use the mods.xml to encode descriptive metadata. Information that describes,
Cryptography and Network Security Chapter 15 Fourth Edition by William Stallings Lecture slides by Lawrie Brown.
Image Metadata Summary of 4/18/99 NISO/DLF Image Metadata Meeting ( Howard Besser UCLA School of Education & Information.
Long-term Archive Service Requirements draft-ietf-ltans-reqs-00.txt.
Digital Signature Xiaoyan Guo/ Xiaohang Luo/
PREMIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
Secure Systems Research Group - FAU Patterns for Digital Signature using hashing Presented by Keiko Hashizume.
Lecture 9: Security via PGP CS 436/636/736 Spring 2012 Nitesh Saxena.
S/MIME and CMS Presentation for CSE712 By Yi Wen Instructor: Dr. Aidong Zhang.
METS-Based Cataloging Toolkit for Digital Library Management System Dong, Li Tsinghua University Library
XML Signature Prabath Siriwardena Director, Security Architecture.
What Agencies Should Know About PDF/A September 20, 2005 Susan J. Sullivan, CRM
How to build your own Dark Archive (in your spare time) Priscilla Caplan FCLA.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
WS-Security: SOAP Message Security Web-enhanced Information Management (WHIM) Justin R. Wang Professor Kaiser.
Access Across Time: How the NAA Preserves Digital Records Andrew Wilson Assistant Director, Preservation.
SEMANTIC UNITS PERTAINING TO OBJECTS. Object entity  Aggregates characteristics relevant to preservation management that are properties of the object.
Metadata and Geographical Information Systems Adrian Moss KINDS project, Manchester Metropolitan University, UK
DAITSS: Dark Archive in the Sunshine State Priscilla Caplan, Florida Center for Library Automation DCC Workshop on Long-term Curation within Digital Repositories.
What Agencies Should Know About PDF/A-1 April 6, 2006 Mark Giguere
PREMIS Rathachai Chawuthai Information Management CSIM / AIT.
Provenance & Context Workshop - Guiding Documents.
Implementor’s Panel: BL’s eJournal Archiving solution using METS, MODS and PREMIS Markus Enders, British Library DC2008, Berlin.
The FCLA Digital Archive Joint Meeting of CSUL Committees, 2005.
Implementation of PREMIS in METS Rebecca Guenther Sr. Networking & Standards Specialist, Library of Congress PREMIS Implementation Fair San.
Metadata and Documentation Iain Wallace Performing Arts Data Service.
Lifecycle Metadata for Digital Objects October 18, 2004 Transfer / Authenticity Metadata.
Conceptual Data Modelling for Digital Preservation Planets and PREMIS Angela Dappert.
ETD2006 Preserving ETDs With D.A.I.T.S.S. FLORIDA CENTER FOR LIBRARY AUTOMATION FC LA PAPER AUTHORS: Chuck Thomas Priscilla.
PREMIS Implementation Fair, San Francisco, CA October 7, Stanford Digital Repository PREMIS & Geospatial Resources Nancy J. Hoebelheinrich Knowledge.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
PREMIS at the British Library Markus Enders, The British Library PREMIS Implementation Fair, San Fransisco, CA 07 October 2009.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
IAD 2263: System Analysis and Design Chapter 7: Designing System Databases, Interfaces and Security.
AGENTS, RIGHTS, EVENTS. Agents  The Agent entity aggregates information about agents (persons, organizations, or software) associated with rights management.
PREMIS Controlled vocabularies Rebecca Guenther Sr. Networking & Standards Specialist, Library of Congress PREMIS Implementation Fair Vienna,
Lifecycle Metadata for Digital Objects November 15, 2004 Preservation Metadata.
Lifecycle Metadata for Digital Objects October 9, 2002 Transfer / Authenticity Metadata.
An Introduction to PREMIS Jenn Riley Metadata Librarian IU Digital Library Program.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
2/26/2004 Dan Swaney 1 Preservation Metadata and the OAIS Information Model A Metadata Framework to Support the Preservation of Digital Objects A review.
AXF – Archive eXchange Format Report of AXF WG to TC-31FS 6 December, 2012.
Joint Meeting of CSUL Committees,
FLORIDA CENTER FOR LIBRARY AUTOMATION
DAITSS: Dark Archive in the Sunshine State
Metadata for research outputs management
Implementing an Institutional Repository: Part II
Metadata in Digital Preservation: Setting the Scene
Implementing an Institutional Repository: Part II
How to Implement an Institutional Repository: Part II
Image Metadata Summary of 4/18/99 NISO/DLF Image Metadata Meeting
Presentation transcript:

SEMANTIC UNITS PERTAINING TO OBJECTS

Object entity  Aggregates characteristics relevant to preservation management that are properties of the object  Semantic units may not all be applicable to each type of object (representation, file, bitstream)  Main types of information identifier object characteristics creation information software and hardware environment digital signatures relationships to other objects links to other types of entity

preservationLevel and objectCategory  objectCategory Values: representation, file, bitstream preservationLevel What preservation treatment/strategy the repository plans for this object Varying preservation options dependent on factors such as value, uniqueness, preservability of format A business rule only relevant in a given repository Examples: full, bit-level Now mandatory, but revision will change to optional Revision is adding more structure to indicate context (role, rationale, date assigned)

objectCharacteristics  Applicable only to file and bitstream (although some have needed it for representation)  Technical properties common to all/most file formats, not format specific  Container for subunits: compositionLevel fixity size format significantProperties (to be moved in v. 2) inhibitors

fixity  Information used to verify whether an object has been altered; compare message digests (“checksums”) calculated at different times  Container for messageDigestAlgorithm, messageDigest, messageDigestOriginator  Automatically calculated and recorded by repository  messageDigestAlgorithm: controlled vocabulary, example: SHA-1  messageDigest: output of message digest algorithm  messageDigestOriginator: agent that created original message digest; could be a string or a pointer Example: fixity messageDigestAlgorithm=Adler-32 messageDigest=7c9b35da messageDigestOriginator=OCLC

format  Identifies the format of a file or bitstream  Container semantic unit  Preservation activities depend on detailed and accurate knowledge about formats  Should be ascertained by repository on ingest (for example, using JHOVE)  May be a format name (formatDesignation) or a pointer into a registry (formatRegistry)  Will be changed to repeatable in v. 2 to associate a format designation with a particular format registry)

formatDesignation and formatRegistry  formatDesignation Identifies the format of an object by name and version Format may be a matter of opinion: Is it text, xml, or METS? MIME type is most widely used authority list May need more granularity; may be multipart (tiff 6.0/geotiff)  formatRegistry Identifies format by reference to an entry in a format registry Detailed specifications on formats may be contained in a future format registry formatRegistryName, formatRegistryKey, formatRegistryRole Role includes purpose or expected use

Examples of format formatDesignation formatName=.eps formatVersion=2.0 formatRegistry formatRegistryName=PRONOM formatRegistryKey=eps formatRegistryRole=Basic formatDesignation formatName=PDF formatVersion=1.5 formatRegistry formatRegistryName=LC digital format descriptions formatRegistryKey=fdd formatRegistryRole=assessment

significantProperties  Characteristics of an object considered by a repository to be important to maintain through preservation actions  May apply to all objects of a certain class or may be unique to each individual object  May be determined by business rules of the repository  Not an intrinsic property of an object; a particular archive's assessment of which of the object's properties need to persist over time  Related to the preservation strategy chosen by the archive  Listing significant properties implies that the repository plans to preserve those properties and would note any modifications to them in eventOutcome  Revision is adding more structure to indicate aspects or facets of an object  Further work is needed in determining and describing significant properties

Examples of significantProperties  For a PDF with embedded links that are not essential use: “Content only”  For a TIFF file: “Color accuracy (Adobe RGB 1998)”  For a Web page: “One of two embedded FLASH files for splash page”  Revision in v. 2: Example 1: significantPropertiesType = “behavior” significantPropertiesValue = “editable” Example 2: significantPropertiesType = “page width” significantPropertiesValue = “210 mm”

inhibitors  Features of the object intended to inhibit access, use or migration  It is necessary to record the kind of encryption and the access key to allow future use of the object  Applicable to file and bitstream  inhibitorType Inhibitor method employed, e.g. “DES”, “password protection”  inhibitorTarget The content or function protected, e.g. “function: print”  inhibitorKey The decryption key or password Example: inhibitors inhibitorType=DES inhibitorTarget=all content inhibitorKey=[DES encryption key]

compositionLevel  An indication of whether the object is subject to one or more processes of decoding or unbundling  How to describe layers of encodings so they can be correctly reversed? Treat each layer as a “composition level” Repeat description of object characteristics for each composition level  A file with no compression and no encryption has compositionLevel 0 (zero)  Each layer of encoding results in new format and incremented compositionLevel  Only applies if object is encrypted or compressed  Value is an integer

Files again  FILE = a named and ordered sequence of bytes that is known by an operating system. chapter1.pdf photo.tiff mapofGlasgow.jp2  Can be zero or more bytes  Has a file format  Has access permissions and file system statistics such as size and modification date

Bitstreams again  BITSTREAM = contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes. the video stream within an AVI file an image within a TIFF file  Not known to operating system  Can be located by starting position within the file  Can not stand alone as a file without the addition of a header, other structure, or reformatting

But some files aren’t that simple  format = PDF  size = 500,000 bytes  messageDigest = [something] chapter1.pdf chapter1.gz Unix gzip utility  format = gzip  size = 324,876 bytes  messageDigest = [something else]

compositionLevel 0 fixitymessage Digest Algorithm SHA-1 fixitymessage Digest [big string] fixitymessage Digest Originato Submitter size format Designa- tion format Name PDF format Designa- tion format Version 1.2 chapter1.pdf composition Level 1 fixitymessage Digest Algorithm SHA-1 fixitymessage Digest [another string] fixitymessage Digest Originator Repository size format Designa- tion format Name gzip format Designa- tion format Version chapter1.pdf.gz

Ok, but what if you have this: package.tar file1.pdf file2.pdf Inside the TAR file, file1 and file2 are simple PDF files. Neither the containing TAR nor the contained PDFs are encrypted or compressed.

Then you have 3 objects! package.tar file1.pdf file2.pdf package.tar is a file object with compositionLevel 0 and a storageLocation in the file system file1.pdf is a file object with compositionLevel 0 and a storageLocation as an offset in package.tar file2.pdf is a file object with compositionLevel 0 and a storageLocation as an offset in package.tar

In conclusion  Remember: Composition level increments only when you have a single file object with multiple successive encodings.  Bonus question: why aren’t the PDF files within package.tar considered bitstream objects?  Because the PDFs inside the TAR are independently interpretable

Creation information  creatingApplication Information about application which created object Useful for later problem solving Container with 3 subunits: name, version, date Applies to objects created externally or by repository, e.g. by migration event Repeatable if more than one application processed it Example: MS Word 2000 [date created] In v. 2 moving under objectCharacteristics  originalName Name of object as submitted to or harvested by repository Supplements repository supplied names Only applicable to files (but may be extended to representations)

storage  How and where the object is stored  Container for contentLocation and storageMedium  May be repeated if more than one identical copy in a different location  contentLocation Information needed to retrieve a file from a system or a bitstream from within a file Subunits type and value Could be fully qualified path or identifier used by storage system; for bitstream a byte offset  storageMedium Physical medium on which the object is stored Useful for media management (e.g. media migration) May be name of system that knows the medium Examples: hard disk, TSM

Example of creation information and storage creatingApplication creatingApplicationName=Adobe Acrobat creatingApplicationVersion=5.0 dateCreatedByApplication=2004 originalName=main.pdf storage contentLocation contentLocationType=FDA contentLocationValue=fda/prod/data/out/classa/ DF storageMedium=3590 [a type of tape unit]

Environment  What is needed to render or use an object Operating system Application software Computing resources  Why is obligation optional? Preservation strategies may differ in need for this information (e.g., may be unneeded for bit-level preservation) We currently lack practical methods to collect and store this information  Relevance to long-term preservation: Ability to render an object and interact with its content may depend on knowing these technical details  Applies to all types of object (representation, file, bitstream)

Environment semantic units  environmentCharacteristic Multiple environments can support an object, but often not equally well Suggested values: unspecified, known to work, minimum, recommended Repository does not need to record all possible environments  environmentPurpose Use supported by the specified environment Suggested values: render, edit example: for x.pdf Adobe Acrobat (edit), Adobe Reader (render)

Environment semantic units (cont.)  software and hardware identify by name, version, type (broad category) Many may apply; at least one should be recorded  dependency non-software component or file needed dependency vs. swDependency e.g. fonts, schemas, stylesheets name and identifier  environmentNote Any additional information Should not be used as substitute for more rigorous description

Environment example: ETD (PDF file)  environmentCharacteristic=known to work  environmentPurpose=render  software/swName= Mozilla Firefox  software/swVersion= 1.0  software/swType=renderer  swOtherInformation=requires swDependencies as plug-ins  software/swDependency= Adobe Acrobat Reader 7.0  software/swDependency= RealPlayer 10  software/swName= Windows NT  software/swVersion=5.0  software/swType=operatingSystem  hardware/hwName=Intel Pentium II  hardware/hwType=processor  dependency/dependencyName=Mathematica 5.2 True Type math fonts

Environment registries  Information may be complex and increasingly granular  Information often applies to whole class of objects  PREMIS does not assume the existence of an environment registry, but defines the information that would be needed in one  PRONOM has some elements of environment registry for any file extension, gives list of software that can create render identify validate extract metadata from

Digital signatures  In a transaction, verifies the identify of the sender and that the file was unchanged in transmission.  Some archives sign stored objects for verification in the future.  PREMIS digital signature semantic units are based on W3C’s XML Signature Syntax and Processing de facto standard for encoding signature information PREMIS adopts structure/semantics where possible Some departures: e.g., PREMIS permits a given signature to be a property of only 1 object. Version 2 will use XML signatures for signature key

signatureInformation Container  Who signed it? signer (name or pointer to an Agent)  How was it signed? signatureInformationEncoding (e.g., Base64) signatureMethod (e.g., DSA-SHA1)  How can we validate it? signatureValidationRules (could be a pointer to documentation for the validation procedure) signatureProperties (additional information) keyInformation: the signer’s public key and other info Type: e.g., DSA, RSA, PGP, etc. Other info: e.g., certificate, revocation list, etc.  And of course, the signature itself

signatureInformation example signatureInformation signatureInformationEncoding=base64 signer=Florida Digital Archive signatureMethod=RSA-SHA1 signatureValue=MC0CFFrVLtRlkMc3Daon4BqqnkhCOTFEALE= signatureValidationRules=T1=C1 signatureProperties= T12:25:14-05:00 keyInformation keyType=x509v3-sign-rsa2 keyValue= keyvalue