Lifecycle Metadata for Digital Objects
October 18, 2004
Transfer / Authenticity Metadata
Review of metadata seen so far
Creation metadata
Appraisal, records management, scheduling
Transfer / authenticity: not really covered, except in terms of the ingest process
Transferring paper records I
A metaphor for the electronic process
Metadata is generated throughout
Records Center Storage Approval Form
–Agency approval signature
–Description of materials
Initial steps are significant for:
–Setting up for secure transfer
–Defining the metadata required to make sense of records in storage
Approval Number received for transmission
–This step embeds schedule metadata
Transferring paper records II
This stage defines formatting for:
–The wrapper
–The materials inside
Pack and label correctly (agreed standard)
–Use proper boxes
–Label with identifiers (RM descriptors)
–Pack in original order and approved arrangement
–Number boxes in batch
–Stack correctly
Transmittal Form for batch
–“Digest” of contents (this step is a “handshake”)
–Generates metadata for the transfer itself
Access Codes received for boxes
The central problem: Security guaranteeing Authenticity
Guarding the object (authenticity, integrity)
Proving the identities of the people responsible for transferring the object (authentication, non-repudiation)
Transferring the object in a secure way
Completeness and the moment of “recordness”
Assertion that the object is complete (cf. UBC)
Assertion that it is an archivable object
Assertion that the asserter has the authority to create the record or archive it
In the digital environment, all these assertions may be supplied by the system:
–User logins
–User role ID
–Identity of the workstation on the network
–The creator’s action in performing a save
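In a digital system these assertions can be captured automatically at the moment of saving. The Python sketch below bundles them into one metadata record; the function name, field names, user login and role string are hypothetical, and only the workstation name is actually read from the system (via `socket.gethostname()`).

```python
import socket
from datetime import datetime, timezone


def capture_record_assertions(user_login: str, role_id: str, creator_saved: bool) -> dict:
    """Bundle system-supplied evidence for the assertions of completeness and authority.

    Hypothetical field names; a real records system would define its own schema.
    """
    return {
        "user_login": user_login,             # who was logged in when the object was saved
        "user_role_id": role_id,              # role that confers authority to create or archive
        "workstation": socket.gethostname(),  # this machine's host name on the network
        "creator_saved": creator_saved,       # the creator's save action, read as "this is complete"
        "captured_at": datetime.now(timezone.utc).isoformat(),  # UTC time of capture
    }


meta = capture_record_assertions("jsmith", "records-officer", True)
print(sorted(meta))
```

The point of the design is that none of these values is typed in by the creator: each comes from the session, the role system, the network, or the save event itself.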
What is transfer about?
First: it is a COPY
What is a digital copy? What qualifies?
–Data compression issues
–Data segmentation issues
–Creating application vs. file-management application
How can a digital copy be guaranteed accurate? Compare it with the original:
–Digital object as a string of bits
–Message digest of the object as math on the bits
–Ship the message digest with the object
–Recalculate and compare at the other end
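The four-step digest check can be sketched with Python's standard `hashlib`. This assumes SHA-256 as the digest algorithm (the slides mention MD5 later); the function names are illustrative, not part of any transfer standard.

```python
import hashlib
from typing import Tuple


def message_digest(data: bytes) -> str:
    # Treat the object as a string of bits and compute a digest over it.
    return hashlib.sha256(data).hexdigest()


def package_for_transfer(data: bytes) -> Tuple[bytes, str]:
    # Ship the message digest together with the object.
    return data, message_digest(data)


def verify_on_receipt(data: bytes, shipped_digest: str) -> bool:
    # Recalculate at the other end and compare with the shipped digest.
    return message_digest(data) == shipped_digest


obj, digest = package_for_transfer(b"Minutes of the board meeting")
print(verify_on_receipt(obj, digest))         # True
print(verify_on_receipt(obj + b".", digest))  # False
```

A single added byte is enough to make the recalculated digest disagree with the shipped one, which is what lets the receiver treat a match as evidence of an accurate copy.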
Moving from user to repository
Using the public network securely
Sending from user to repository
–Virtual Private Network (VPN)
–Secure Sockets Layer (SSL)
“Secure drop-box” technology
–Separate “hardened” server (between “DMZ”s)
–Only A can deposit, only B can withdraw
Repository harvests objects from the user’s drop-box
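The access rule of a drop-box (only A can deposit, only B can withdraw) can be modelled as a toy in-memory class. This is a hypothetical sketch of the permission logic only; the class and principal names are invented, and a real drop-box would enforce the rule through operating-system or server permissions on the hardened host, not in application code like this.

```python
from typing import List


class DropBox:
    """Toy model of a 'secure drop-box': one principal may deposit, another may withdraw."""

    def __init__(self, depositor: str, withdrawer: str) -> None:
        self._depositor = depositor
        self._withdrawer = withdrawer
        self._items: List[bytes] = []

    def deposit(self, principal: str, obj: bytes) -> None:
        # Only the designated depositor (A) may put objects in.
        if principal != self._depositor:
            raise PermissionError(f"{principal} may not deposit")
        self._items.append(obj)

    def harvest(self, principal: str) -> List[bytes]:
        # Only the designated withdrawer (B, the repository) may take objects out;
        # harvesting empties the box.
        if principal != self._withdrawer:
            raise PermissionError(f"{principal} may not withdraw")
        items, self._items = self._items, []
        return items


box = DropBox("agency-A", "repository-B")
box.deposit("agency-A", b"record 1")
try:
    box.harvest("agency-A")  # the depositor cannot read back the box
    denied = False
except PermissionError:
    denied = True
harvested = box.harvest("repository-B")
```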
Proving the identity of the sender (Authentication I: Identity)
Asymmetric encryption
–Private/public key pair: each key reverses what the other does
Private = held by one juridical person
Public = available to many persons
Digital signature
–Calculate the message digest
–Transform it with one key of the asymmetric pair
If the recipient’s public key is used, only the recipient can decode it (using their own private key): this gives confidentiality
If the sender’s private key is used, only the sender can have sent it (proved with the sender’s public key): this is the signature
–Decrypt with the other key of the asymmetric pair
–Check the recovered message digest against one recalculated from the message
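The signing and verifying steps can be illustrated with textbook RSA and tiny primes. This is a toy under loud assumptions: the key (p = 61, q = 53) is trivially breakable, the digest is reduced modulo a 12-bit modulus so unrelated messages can collide, and real systems use padded signature schemes from a vetted library. It only shows the direction of the keys: private key to sign, public key to verify.

```python
import hashlib

# Toy textbook-RSA key pair for the sender (p = 61, q = 53). Insecure; illustration only.
N = 61 * 53               # public modulus, 3233
E = 17                    # public exponent
D = pow(E, -1, 60 * 52)   # private exponent: modular inverse of E (Python 3.8+)


def digest_as_int(message: bytes) -> int:
    # Step 1: calculate the message digest, reduced mod N so the toy key can process it.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    # Step 2: transform the digest with the sender's PRIVATE key.
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    # Steps 3 and 4: undo the transform with the sender's PUBLIC key,
    # then check the result against a freshly calculated digest of the message.
    return pow(signature, E, N) == digest_as_int(message)


msg = b"Transfer of records from agency A"
sig = sign(msg)
print(verify(msg, sig))            # True
print(verify(msg, (sig + 1) % N))  # False: an altered signature fails
```

Because anyone can hold the public exponent `E`, anyone can run `verify`, but only the holder of `D` could have produced a value that passes it.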
Proving the identity of the sender (Authentication II: Non-repudiation)
Certification (PKI, “XKI”)
–Connecting keys with juridical persons: third-party certifiers
–External or internal (PKI can be managed for internal business, e.g. by a state)
–Endurance over time: what does the certification authority (CA) say?
System permissions and activity
–Data collected from system/network operations logs
–These logs must be collected as archival records!
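A certificate is itself a signature: the certification authority signs a statement binding a name to a public key. The toy sketch below reuses the textbook-RSA idea with a hypothetical CA key (p = 89, q = 97); the holder name and certificate fields are invented for illustration and bear no relation to the X.509 format used by real PKIs.

```python
import hashlib

# Hypothetical toy CA key pair (textbook RSA, p = 89, q = 97). Insecure; illustration only.
CA_N = 89 * 97                 # CA public modulus, 8633
CA_E = 5                       # CA public exponent
CA_D = pow(CA_E, -1, 88 * 96)  # CA private exponent (Python 3.8+)


def _digest_mod(data: bytes, modulus: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def _cert_body(holder: str, n: int, e: int) -> bytes:
    # The statement being certified: "this public key (n, e) belongs to this holder".
    return f"{holder}|{n}|{e}".encode("utf-8")


def issue_certificate(holder: str, holder_n: int, holder_e: int) -> dict:
    # The CA signs the binding with its PRIVATE key.
    body = _cert_body(holder, holder_n, holder_e)
    return {"holder": holder, "n": holder_n, "e": holder_e,
            "ca_signature": pow(_digest_mod(body, CA_N), CA_D, CA_N)}


def check_certificate(cert: dict) -> bool:
    # Anyone holding the CA's PUBLIC key can check the binding.
    body = _cert_body(cert["holder"], cert["n"], cert["e"])
    return pow(cert["ca_signature"], CA_E, CA_N) == _digest_mod(body, CA_N)


cert = issue_certificate("Records Officer, Agency A", 3233, 17)
bad = dict(cert, ca_signature=(cert["ca_signature"] + 1) % CA_N)
print(check_certificate(cert))  # True
print(check_certificate(bad))   # False
```

The sender can no longer plausibly deny a signature made with a key the CA has bound to them, which is why the slide asks how long the CA's word endures.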
Authenticity of the object (Authentication III: Integrity)
Object as open or secret: two issues
–Must we disguise/encrypt the object?
–Can we move it around in clear?
(Cryptographic) Message Digest (e.g. MD5)
–Creates a single 128-bit number, written as 32 hexadecimal digits: a “one-way hash”
–The number will change with the slightest change in the object on which it was calculated
–Not a form of encryption: the object cannot be recovered from the digest
Encryption (Confidentiality)
–Asymmetric (public-key; no shared secret has to be exchanged)
–Symmetric (faster, but raises issues of exchanging keys)
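The “slightest change” property is easy to observe with Python's `hashlib.md5`. Note that MD5 collisions were publicly demonstrated in 2004, so for new work a SHA-2 digest such as SHA-256 is the usual choice; the two sample strings here are invented.

```python
import hashlib

original = b"The minutes were approved on 18 October."
altered = b"The minutes were approved on 19 October."  # one character changed

d1 = hashlib.md5(original).hexdigest()
d2 = hashlib.md5(altered).hexdigest()

print(len(d1))   # 32: a 128-bit value written as 32 hexadecimal digits
print(d1 == d2)  # False
print(d1)
print(d2)
```

Printing both digests shows that the change is not confined to one position: the two values look unrelated, even though the inputs differ by a single character.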
Proving the identity of the receiver
How is this done in the paper/physical case?
–Locations
–Signatures
–Other signs and proofs
How is it done in the digital case?
–Digital signature
–System permissions
–Recorded as part of the repository’s operations records
Documenting the actual transfer
Time-stamps on the copy
System logs of the underlying transmitting and receiving systems
–Desktop Windows systems have system logs, but they are still fairly primitive
–Server logs can be extremely elaborate
–Repository/digital library logs can be designed to any requirement
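A repository log entry can tie a time-stamp to the exact bits that moved. The sketch below writes one hypothetical JSON log line per transfer event; the function name, field names, identifiers and the choice of SHA-256 are assumptions, not a logging standard.

```python
import hashlib
import json
from datetime import datetime, timezone


def transfer_log_entry(object_id: str, data: bytes, sender: str, receiver: str, event: str) -> str:
    """One line of a hypothetical repository transfer log, serialised as JSON."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC time-stamp of the event
        "event": event,                                      # e.g. "sent" or "received"
        "object_id": object_id,
        "sha256": hashlib.sha256(data).hexdigest(),          # ties the entry to the exact bits
        "sender": sender,
        "receiver": receiver,
    }
    return json.dumps(entry, sort_keys=True)


line = transfer_log_entry("obj-001", b"record bytes", "agency-A", "repository-B", "received")
print(line)
```

Recording the digest in the log, not just the file name, means the log itself can later show which version of the object was transferred.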
Verifying the transfer
Quality control: compare with the paper process
Verifying the message digest
Checking the object against the wrapper
–Use metadata to make sure you have all of what was sent, in the proper format
–This is the most fundamental process carried out during ingest
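Checking the object against the wrapper amounts to comparing what arrived with the manifest that was sent. A minimal sketch, assuming the wrapper's metadata is reduced to a mapping from file name to SHA-256 digest (a real submission package would carry much richer metadata, including format information); the file names are invented.

```python
import hashlib
from typing import Dict, List


def verify_against_manifest(manifest: Dict[str, str], received: Dict[str, bytes]) -> List[str]:
    """Compare received objects with the wrapper's manifest (name -> SHA-256 hex digest).

    Returns a list of problems; an empty list means everything listed arrived
    intact and nothing unlisted came with it.
    """
    problems = []
    for name, expected in manifest.items():
        if name not in received:
            problems.append(f"missing: {name}")
        elif hashlib.sha256(received[name]).hexdigest() != expected:
            problems.append(f"digest mismatch: {name}")
    for name in sorted(received.keys() - manifest.keys()):
        problems.append(f"not in manifest: {name}")
    return problems


files = {"letter.pdf": b"letter contents", "memo.txt": b"memo text"}
manifest = {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}

clean = verify_against_manifest(manifest, files)
damaged = verify_against_manifest(manifest, {"letter.pdf": b"letter contents", "extra.doc": b""})
print(clean)    # []
print(damaged)  # ['missing: memo.txt', 'not in manifest: extra.doc']
```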
XML and digital signatures
An XML wrapper for a set of objects permits individual or multiple objects to be signed: “subtree signing”
–Objects can potentially be signed by different people in the workflow
–Thus a born-digital XML-wrapped object may already contain several digital signatures from different sources
This may require verification and re-signing as a single object by the record-asserting entity before transfer
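Subtree signing starts by computing a digest over each signed subtree independently. The sketch below, using Python's `xml.etree.ElementTree` (Python 3.8+ for `canonicalize`), digests each hypothetical `<object>` element after canonicalizing it; in a real XML Signature each such digest would go into a `Reference` inside `SignedInfo`, which is then signed, and that step is not shown here. The wrapper document is invented.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict

WRAPPER = """<package>
  <object id="a" author="clerk"><text>Draft approved</text></object>
  <object id="b" author="manager"><text>Final approval</text></object>
</package>"""


def subtree_digests(wrapper_xml: str) -> Dict[str, str]:
    """Digest each <object> subtree on its own, so each could be signed separately."""
    root = ET.fromstring(wrapper_xml)
    digests = {}
    for obj in root.findall("object"):
        obj.tail = None  # drop the whitespace after the element; it is not part of the subtree
        canonical = ET.canonicalize(ET.tostring(obj, encoding="unicode"))
        digests[obj.get("id")] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digests


before = subtree_digests(WRAPPER)
after = subtree_digests(WRAPPER.replace("Final approval", "Final approval, amended"))
print(before["a"] == after["a"])  # True: object a is untouched
print(before["b"] == after["b"])  # False: only object b's digest changes
```

Because each digest covers only its own subtree, a later edit to one object leaves the other objects' signatures valid, which is exactly why a wrapper can accumulate signatures from several people.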
XML Signature
[Example omitted: a signature element containing a digest value, a signature value, and information about the key]
What is canonicalization?
Two XML documents may differ in their entity structure, attribute ordering, and character encoding yet carry the same content, because the XML standard treats these differences as insignificant
But a valid XML document has a precise logical structure, conforming to its DTD or schema, no matter how it looks or what order things are in
Canonicalization means processing the XML file into a single standard form (as defined by the W3C Canonical XML specification): see xml-c14n#Intro
What does this mean for “authenticity”?
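Python's standard library exposes canonicalization as `xml.etree.ElementTree.canonicalize` (Python 3.8+). It implements Canonical XML 2.0 rather than the 1.0 recommendation cited above, but it illustrates the point: two documents whose bytes differ, and so whose digests differ, reduce to the same canonical form and the same digest. The sample documents are invented.

```python
import hashlib
import xml.etree.ElementTree as ET

# Same logical content, different surface form: attribute order, quote style,
# extra whitespace inside the tag, and empty-element syntax.
doc_1 = '<record id="42" status="final"><title>Minutes</title><note/></record>'
doc_2 = "<record  status='final' id='42'><title>Minutes</title><note></note></record>"


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


raw_equal = sha256_hex(doc_1) == sha256_hex(doc_2)
c14n_1 = ET.canonicalize(doc_1)
c14n_2 = ET.canonicalize(doc_2)
canonical_equal = sha256_hex(c14n_1) == sha256_hex(c14n_2)

print(raw_equal)        # False
print(canonical_equal)  # True
print(c14n_1)
```

This is the answer to the slide's closing question in miniature: a signature computed over raw bytes would be broken by a harmless re-serialization, so XML signatures are computed over the canonical form instead.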