Digital Immortality Dr David Holdsworth http://www.leeds.ac.uk/cedars/ Keeping Digital Data for Ever OR
Digital Immortality Obsolete(?) Data 1 Things that must be kept by law 2 Things that must be destroyed by law 3 Things that we choose to keep 4 Things that we are certain can be thrown away
Digital Immortality Obsolete(?) Data 5 Things that we would like to keep if we have room 6 Things that we would like to throw away, but are not sure about 7 Things that we think we have kept but cannot find 8 Things that we have kept but now cannot decipher 9 Things that we have not kept but now wish that we had
Digital Immortality What to Keep All of 1 and 3 – 1 Things that must be kept by law – 3 Things that we choose to keep As much of 5 and 6 as is cost-effective –5 Things that we would like to keep if we have room –6 Things that we would like to throw away, but are not sure about Data discarded from 5 and 6 has the potential to be in 9 in the future –9 Things that we have not kept but now wish that we had Minimise cost per item
Digital Immortality Some Pitfalls Errors are usually not correctable Failure to index adequately puts data into category 7 –7 Things that we think we have kept but cannot find Failure to know the format puts data into category 8 –8 Things that we have kept but now cannot decipher
Digital Immortality Personal Involvement CEDARS: CURL Exemplars in Digital ARchiveS Collaborative project for libraries Funded by HEFCE/JISC Oxford, Cambridge and Leeds
Digital Immortality Personal Involvement - contd. CAMiLEON: Creative Archiving at Michigan and Leeds Emulating the Old on the New Collaborative project on emulation Funded by NSF/JISC
Digital Immortality Challenges to digital preservation Deteriorating media – Magnetic dropout – Obsolete equipment Obsolete data formats –EBCDIC –UNICODE has established itself –Machine code software is an extreme example
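An obsolete encoding such as EBCDIC stays readable only while its mapping is known. The following is a minimal sketch, assuming the bytes are in EBCDIC code page 037 (one of several EBCDIC variants; the variant of any real file would have to be recorded with it). The helper name `ebcdic_to_text` is illustrative, not from the talk.

```python
def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC bytes to a Python (Unicode) string.

    The default code page is an assumption; other EBCDIC variants
    map some byte values differently.
    """
    return raw.decode(codepage)


# The five bytes below spell HELLO in code page 037.
sample = b"\xc8\xc5\xd3\xd3\xd6"
print(ebcdic_to_text(sample))
```

Decoding with the wrong code page would not raise an error for most inputs; it would silently produce different characters, which is why the format has to be recorded alongside the data.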
Digital Immortality Challenges to digital preservation (contd) Needles in haystacks –ISBN –Meta-data Deteriorating Institutions –Where are the digital legal deposits? –... Or even Digital Equipment Corporation Proprietary systems become obsolete –leaving data inaccessible
Digital Immortality Compatibility - Friend or Foe e.g. z/OS evolves from OS/360 Windows Vista evolves from 16-bit Windows 3.1 Modern machines run old software …… but faster Who keeps old versions? –Computer Museum in California –Microsoft -- ?
Digital Immortality Times Change People don’t always want to process their old data using the tools of yesteryear
Digital Immortality THIS IS GEORGE 3 MARK 8.67 ON 31DEC99 10.19.03_ TIMED OUT 10.19.33 THE SYSTEM HAS TEMPORARILY CLOSED DOWN
Digital Immortality Times Change People don’t always want to process their old data using the tools of yesteryear Need to bridge the gap between data’s origins and the time of access
Digital Immortality Use the Past to Illuminate the Future In 1987 EBCDIC was king In 2007 UNICODE is heir apparent In 2027 ……. In 2038 UNIX time_t overflows 31 bits What has survived the decades?
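The 2038 date can be checked with a short calculation. This is a sketch assuming time_t is a signed 32-bit count of seconds since 1970-01-01 00:00:00 UTC, so that 31 bits hold the positive range; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: signed 32-bit time_t counting seconds from the UNIX epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_TIME_T = 2**31 - 1  # largest value that fits in 31 bits

# Last moment that can still be represented.
last_second = EPOCH + timedelta(seconds=MAX_TIME_T)

# One second later the counter no longer fits.
overflow_at = last_second + timedelta(seconds=1)

print(last_second.isoformat())
print(overflow_at.isoformat())
```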
Digital Immortality Survival of the Abstract Character sets Bytes Unstructured Files (stream of bytes) Hierarchical file tree Associative mappings Programming languages
Digital Immortality All is not lost We can keep a byte-stream for ever The abstract data separated from the medium is technology-neutral i.e. files can be kept for ever Copies are perfect File formats do not last for ever ….. Remember WORDSTAR
Digital Immortality Non-File Objects e.g. CDs, DVDs, magnetic tapes, web sites Map each digital object into a byte-stream and then preserve Multiple files (e.g. websites) can go in a ZIP or tar archive
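Packing several files into one byte-stream can be sketched with a tar archive held in memory. This is an illustration only: the function names `pack_files` and `unpack_files` and the sample file names are hypothetical, and a real archive would also carry metadata such as dates and the format of each file.

```python
import io
import tarfile


def pack_files(files: dict[str, bytes]) -> bytes:
    """Pack a mapping of path -> content into one uncompressed tar byte-stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_files(stream: bytes) -> dict[str, bytes]:
    """Recover the path -> content mapping from a tar byte-stream."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files


# Hypothetical two-file "website".
site = {"index.html": b"<h1>Hello</h1>", "img/logo.gif": b"GIF89a"}
stream = pack_files(site)
print(len(stream), "bytes in the preserved stream")
```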
Digital Immortality Abstraction Identify significant properties of the object; represent them in a byte stream
Digital Immortality Example -- magnetic tape Significant properties –blocks of data –tape marks –start and end of tape Representation –block -- raw bytes, preceded by 32-bit byte count –tape mark -- 4 bytes all ones –start & end -- ends of stream
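The tape representation above can be sketched as an encoder and decoder. The slide does not state the byte order of the 32-bit count, so big-endian is assumed here; the function names and the use of `None` to stand for a tape mark are likewise illustrative choices, not the author's implementation.

```python
import struct

TAPE_MARK = b"\xff\xff\xff\xff"  # 4 bytes, all ones


def encode_tape(items):
    """Encode a list of data blocks (bytes) and tape marks (None) as a byte-stream."""
    out = bytearray()
    for item in items:
        if item is None:
            out += TAPE_MARK
        else:
            if len(item) >= 0xFFFFFFFF:
                raise ValueError("block length would collide with the tape mark")
            # Assumption: the 32-bit byte count is big-endian.
            out += struct.pack(">I", len(item)) + item
    return bytes(out)


def decode_tape(stream):
    """Decode a byte-stream back into data blocks (bytes) and tape marks (None)."""
    items = []
    pos = 0
    while pos < len(stream):  # end of stream represents end of tape
        header = stream[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("truncated header")
        pos += 4
        if header == TAPE_MARK:
            items.append(None)
            continue
        (count,) = struct.unpack(">I", header)
        block = stream[pos:pos + count]
        if len(block) != count:
            raise ValueError("truncated block")
        items.append(block)
        pos += count
    return items


tape = [b"HDR1", b"payload", None, None]
image = encode_tape(tape)
print(decode_tape(image) == tape)
```

Because a count of all ones is reserved for the tape mark, the decoder can always tell a mark from a block header without any further framing.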
Digital Immortality When to convert Conversion is inevitable a) as soon as the format becomes obsolete b) only when we want to read the data c) never - emulate the original system
Digital Immortality Convert as soon as Obsolete Copying to new technology is no longer trivial Any errors are cast in stone Digital signatures are lost Only viable when the number of different formats is small
Digital Immortality Convert when we want to read Preserve the original by simply copying onto current technology Record the format of each stored object Keep an index of all the formats held Maintain access to conversion software from the old to the current Treasure open-source conversion software
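Recording the format of each stored object and keeping an index of the formats held could look like the sketch below. The record layout, the function names, and the format labels are all hypothetical; a real archive would more likely use identifiers from a format registry. The SHA-256 checksum is an added assumption, included to show how a copy can be confirmed as bit-identical.

```python
import hashlib
from collections import defaultdict


def make_record(name: str, data: bytes, fmt: str) -> dict:
    """Describe one stored object: its name, declared format, size and checksum."""
    return {
        "name": name,
        "format": fmt,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def format_index(records: list[dict]) -> dict[str, list[str]]:
    """Index of every format held, mapping format label -> object names."""
    index = defaultdict(list)
    for record in records:
        index[record["format"]].append(record["name"])
    return dict(index)


def is_intact(record: dict, data: bytes) -> bool:
    """Check that a copy of the object still matches the recorded checksum."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


# Hypothetical holdings with illustrative format labels.
records = [
    make_record("report.ws", b"old word-processor bytes", "wordstar"),
    make_record("census.tap", b"\x00\x00\x00\x02AB", "tape-image"),
    make_record("memo.ws", b"another document", "wordstar"),
]
print(format_index(records))
```

The index shows at a glance which conversion software must remain accessible: one converter per format label held.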
Digital Immortality Format Registries National Archives PRONOM Harvard Global Digital Format Registry OAIS ISO14721:2003 Representation Information
Digital Immortality Emulation of Yesteryear Today’s desktop machine far exceeds the mainframe of the 1970s or even 80s George3 –Emulate the George3 executive i.e. order code + system calls + peripherals BBC micro –Publicly available emulation on WWW
Digital Immortality Abstraction for Emulation of 1900 system George3 sits on 1900 instruction set plus executive calls Executive sits on 1900 instruction set plus Fancy I/O stuff George3 provides lots of embellishment of 1900 instruction set Emulate executive + 1900 instruction set
Digital Immortality Malawi Census Data Data stored on ICL magnetic tapes Rescued by using emulated ICL 1900
Digital Immortality Standards Open Archival Information System –OAIS ISO14721:2003 –Originated by Space Data Community Proprietary “standards” –Big enough to be reverse engineered e.g. MS Word –XYZ Software Ltd Open standards, e.g. RFCs
Digital Immortality Really Long-Term Look back 20 years to see how things have changed Today’s Vista is not the final scene Ensure that systems can accommodate new formats Even the standards are likely to change
Digital Immortality Domesday 1986 900th anniversary of William the Conqueror’s version BBC collects data (inc pictures) Data written on 12" LaserVision discs Discs last 100 years, but not the drives Access is via BBC Master computer That won’t last 100 years either Can we preserve it until the 1000th anniversary?
Digital Immortality Stewardship Copies of the discs are lodged with: BBC British Library National Archives (ex PRO) Abstract data held by: DH / Leeds University Longlife Data Ltd
Digital Immortality Stewardship Current archival activity stresses retention of media Retention of digital media is useless Need digital safe deposits
Digital Immortality Keeping Digital Data for Ever Dr David Holdsworth http://www.leeds.ac.uk/cedars/ Digital Immortality OR