1 http://www.ukoln.ac.uk/ Digital preservation Michael Day UKOLN, University of Bath, UK m.day@ukoln.ac.uk University of Bristol, MSc in Library and Information Management, Unit 6A: Advanced Information Systems Bristol, 15th October 2003

2 Session overview
The digital preservation problem
Preservation strategies
Preservation metadata
  – The OAIS model
Non-technical issues
  – collection management, legal issues, costs, …
Case study: the World Wide Web
Selected projects and initiatives

3 The digital preservation problem

4 Definitions (1)
Preservation:
  – a management function
    » “Its objective is to ensure that information survives in usable form for as long as it is wanted” - John Feather (1991)
  – not primarily about:
    » conservation or restoration
    » backups or storage
    » concepts of “permanence”

5 Definitions (2)
Digital preservation:
  – digital information is different
  – technical problems with ensuring continued access
  – but also a managerial problem
    » “... the planning, resource allocation, and application of preservation methods and technologies to ensure that digital information of continuing value remains accessible and usable” - Margaret Hedstrom (1998)

6 Definitions (3)
Potential confusion with:
  – “archiving”
    » a term used in some computing contexts for the creation of secure backup copies
  – “archives”
    » a well-understood term in the archives and recordkeeping professions
    » but also used to refer to almost any collection of data, e.g., e-print archives, image archives, etc.

7 Definitions (4)
Potential confusion (continued):
  – “digitisation”
    » especially where the motive for digitisation is the preservation of original items

8 Digital information (1)
An increasing flood of data...
The Web
  – Billions of pages
  – Internet Archive: more than 300 Terabytes (and growing at 12 Terabytes per month)
  – The "deep Web"
Scientific data
  – Wellcome Trust Sanger Institute: manages several hundred Terabytes of data per year, growing exponentially
  – Particle physics and astronomy: e-Science projects expected to generate Petabytes of data per year (e.g., CERN's Large Hadron Collider)

10 Digital information (2)
Sizes:
Kilobyte: 1,000 bytes
Megabyte: 1,000,000 bytes
Gigabyte: 1 billion bytes
Terabyte: 1,000 Gigabytes
Petabyte: 1,000 Terabytes
Exabyte: 1,000 Petabytes
Zettabyte: 1,000 Exabytes
Yottabyte: 1,000 Zettabytes
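
To make the scale of these units concrete, the following minimal Python sketch converts between the decimal units listed above and applies them to the Internet Archive figures quoted earlier (more than 300 Terabytes, growing at 12 Terabytes per month). The function names are hypothetical, and the assumption of constant linear growth is a simplification made only for illustration.

```python
import math

# Decimal prefixes as listed on the slide: each step is a factor of 1,000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(value, unit):
    """Convert a quantity in the named unit to bytes (decimal prefixes)."""
    return value * 1000 ** UNITS.index(unit.lower())


def months_until(start_tb, growth_tb_per_month, target_tb):
    """Whole months needed to reach target_tb, assuming constant linear growth."""
    if target_tb <= start_tb:
        return 0
    return math.ceil((target_tb - start_tb) / growth_tb_per_month)


if __name__ == "__main__":
    # 300 Terabytes expressed in bytes.
    print(to_bytes(300, "terabyte"))
    # Months for a 300 TB collection to reach 1 Petabyte (1,000 TB) at 12 TB/month.
    print(months_until(300, 12, 1000))
```

Under these assumptions a collection of that size would pass one Petabyte in under five years, which gives a rough sense of why storage planning is part of the preservation problem.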

11 Digital preservation (1)
Media issues:
currently magnetic or optical tape and disks
  – e.g., CD-ROM, DVD (optical), DAT, DLT (magnetic)
unknown lifetimes
  – but relatively short compared to paper or good quality microform
  – probably years rather than decades

12 Digital preservation (2)
Media issues (continued):
technical solutions
  – longer lasting media:
    » e.g. Norsam's High Density Rosetta system - analogue storage on nickel plates
    » COM (output to good-quality microform)
    » Keeping paper copies!
  – periodic copying of data bits on to new media (refreshing)
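
Refreshing, as described above, amounts to copying the bit-stream to new media and confirming that nothing changed on the way. The Python sketch below illustrates one way this might be done with a checksum comparison; the choice of SHA-256 and the function names are assumptions made for this example and are not taken from the presentation.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source_path, target_path):
    """Copy a file to a new location and verify the copy bit-for-bit.

    Returns the digest of the verified copy, or raises IOError if the
    copy differs from the source.
    """
    before = sha256_of(source_path)
    shutil.copyfile(source_path, target_path)
    after = sha256_of(target_path)
    if before != after:
        raise IOError("copy does not match the source bit-stream")
    return after
```

Note that refreshing of this kind only protects the bits; it does nothing about the hardware and software dependence discussed on the next slide.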

13 Digital preservation (3)
Dependence on particular hardware and software:
the heart of the digital preservation problem
relatively short obsolescence cycle for:
  – hardware
    » e.g., BBC Domesday Project (1986) used a special type of videodisc player developed by Philips
  – software
    » e.g., word-processing files
http://www.atsf.co.uk/dottext/domesday.html

14 Digital preservation strategies

15 Preservation strategies
  – Main proposed types:
    » technology preservation
    » emulation
    » migration
    » encapsulation
    » others...

16 Technology preservation
The preservation of an information object together with all of the hardware and software needed to interpret it
  – preserves the look and feel and behaviour of the whole system
  – but will lead to museums of “ageing and incompatible computer hardware” - Mary Feeney (1999)
  – storage space, maintenance, costs...
  – may have a short-term role in the rescue of digital objects (digital archaeology)

17 Emulation (1)
Preserving the original application software and running it on emulators that mimic the behaviour of obsolete hardware and operating systems
  – preserves ‘look-and-feel’
  – may be useful where the digital object is complex (e.g. multimedia) or cannot easily be migrated
  – development of ‘virtual machines’ that would have to be migrated to work on different platforms (Jeff Rothenberg)

18 Emulation (2)
  – strategy has been tested in:
    » Camileon project (JISC/NSF)
    » NEDLIB experiments (European national libraries)
  – requires the maintenance of a huge (and growing) amount of information about platforms and operating systems
  – preserves the defects embedded in original software
  – hard to know whether user experience has been accurately preserved

19 Migration (1)
Managed transformations:
  – The periodic transfer of digital information from one hardware and software configuration to another, or from one generation of computer technology to a subsequent one - CPA/RLG report (1996)
  – abandons attempts to keep old technology (or substitutes) working
  – a linear migration strategy is used by software vendors for some data types (e.g. Microsoft Excel files)

20 Migration (2)
  – Migration can often be combined with some form of standardisation (e.g., on ingest)
    » ASCII
    » bit-mapped page images
    » well-defined XML formats
  – Migration on Request
    » Camileon project proposal

21 Encapsulation
Encapsulating the digital object with information on how it should be interpreted
  – self-describing objects
  – the principle underlying the OAIS reference model
  – can also support emulation or migration on demand strategies
  – examples:
    » Universal Preservation Format (UPF)
    » “Buckets” (NASA Langley Research Center)

22 Other strategies
Digital archaeology
  – data recovery
  – time consuming process (expensive)
“Persistent archives”
  – San Diego Supercomputer Center
  – research funded by NSF, DARPA, NARA
  – comprehensive strategy based on an information management architecture
  – infrastructure independent representations of digital objects (tagged in XML)
  – tested on an e-mail collection (Reagan Moore, et al., 2000)

23 Mixed strategies
  – Preservation strategies are not in competition
    » different strategies can work together
    » but have implications for:
  – the technical infrastructure required (and metadata)
  – collection management priorities
    » e.g., encouraging the consistent use of standards (migration), the collection of software and documentation (emulation)
  – rights management
    » e.g., holding the rights to re-engineer software
  – costs

24 Preservation metadata

25 Preservation metadata (1)
All digital preservation strategies depend - to some extent - on the creation, capture and maintenance of metadata
  – "Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)
Defined as:
  – The various types of data that will allow the re-creation and interpretation of the structure and content of digital data over time (Ludäscher, Marciano & Moore, 2001)

26 Preservation metadata (2)
Metadata fulfil various roles, e.g.:
  – "… to find, manage, control, understand or preserve … information over time" (Cunningham, 2000)
  – Descriptive information; technical information about formats and structure; information about provenance and context; administrative information, e.g. for rights management
  – Current schemas are either very complex or only provide a basic framework (sometimes both!)
  – Perception that different strategies and objects will need different metadata

27 Preservation metadata - standards
  – Developed from many different perspectives:
Digital libraries:
  – METS, NISO Z39.87 (to support digitisation initiatives)
  – OCLC/RLG Framework, Cedars, NEDLIB, NLA, NLNZ
  – OAIS influence has been greatest in this area
Records management and archival description:
  – Pittsburgh BAC, RKMS, NAA, VERS, PRO, EAD, etc.
  – Also standards not specifically developed for preservation, but with some overlap:
Multimedia
  – MPEG-7, SMPTE, etc.
Rights management:
  –, MPEG-21, etc.

28 The OAIS model
  – Reference Model for an Open Archival Information System (OAIS)
  – ISO 14721:2003
  – Established a common framework of terms and concepts
  – Influential on the design of some schemas
    » e.g., OCLC/RLG Metadata Framework
  – Identified basic functions:
    » Ingest, Data Management, Archival Storage, Administration, Access, Preservation Planning

29 OAIS functional model
[Figure: OAIS Functional Entities (Figure 4-1). Functional entities shown: Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning. External roles: Producer, Consumer, Management. Labelled flows: SIP, AIP, DIP, descriptive info., queries, result sets, orders.]

30 OAIS information objects
Information Object (basic concept)
  – Data Object (bit-stream)
  – Representation Information (permits “the full interpretation of Data Object into meaningful information”)
Information Object Classes
  – Content Information
  – Preservation Description Information (PDI)
  – Packaging Information
  – Descriptive Information

31 OAIS information packages
Information package:
  – Container that encapsulates Content Information and PDI
  – Packages for submission (SIP), archival storage (AIP) and dissemination (DIP)
    » AIP = “... a concise way of referring to a set of information that has, in principle, all of the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object”
  – PDI = other information (metadata) “which will allow the understanding of the Content Information over an indefinite period of time”
    » Reference, Provenance, Context, Fixity

32 The OAIS model (4)
[Figure: OAIS Information Package Taxonomy (Figure 4-14). Preservation Description Information comprises Reference Information, Provenance Information, Context Information and Fixity Information.]
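
The information package structure described on the preceding slides can be sketched as a small data structure. The Python below is an illustrative model only: the class and field names follow the OAIS terms quoted above, but the layout, the `make_aip` helper and the use of a SHA-256 digest as Fixity Information are assumptions made for this example, not something the OAIS model prescribes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ContentInformation:
    data_object: bytes                 # the bit-stream
    representation_information: str   # what is needed to interpret the bit-stream


@dataclass
class PreservationDescriptionInformation:
    reference: str    # identifier(s) for the content
    provenance: str   # history of the content
    context: str      # relationship to its environment
    fixity: str       # here: a SHA-256 hex digest (an assumption of this sketch)


@dataclass
class InformationPackage:
    kind: str   # "SIP", "AIP" or "DIP"
    content: ContentInformation
    pdi: PreservationDescriptionInformation

    def fixity_ok(self):
        """Check that the stored bit-stream still matches its Fixity Information."""
        actual = hashlib.sha256(self.content.data_object).hexdigest()
        return actual == self.pdi.fixity


def make_aip(data, representation, reference, provenance, context):
    """Hypothetical helper: build an AIP and record fixity at creation time."""
    pdi = PreservationDescriptionInformation(
        reference=reference,
        provenance=provenance,
        context=context,
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return InformationPackage("AIP", ContentInformation(data, representation), pdi)
```

A repository could call `fixity_ok()` periodically to detect silent corruption; Packaging Information and Descriptive Information are left out here to keep the sketch short.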

33 Metadata schema categorisation
Earliest schemas were largely conceptual in nature:
  – e.g. Pittsburgh BAC model, Cedars outline specification, OCLC/RLG WG I
Gradually moving towards a more practical focus:
  – e.g., VERS, NLNZ, METS, PREMIS WG
  – Convergence on XML (DTDs and Schemas)
But there is an urgent need for all this practical experience to be shared
  – e.g., published schemas, advice on implementation, etc.

34 Sustainability issues (1)
Balance risks with costs:
  – There is a perception that metadata creation and maintenance will be expensive
  – But costs associated with data recovery are not trivial
  – Need to balance the risks of data loss with the cost of creating metadata
    » Cost/benefit analysis
    » Robust selection criteria
    » Co-operation between repositories
    » Re-use of existing metadata

35 Sustainability issues (2)
Avoid imposing unnecessary costs:
  – Avoid large schemas (?)
  – Need to identify the right metadata - 'core metadata' (?)

36 Metadata creation issues
Created by humans or captured automatically?
  – Some metadata already exists, e.g.:
    » Embedded within objects
    » In separate databases
    » Generated by particular processes
  – Need for this metadata to be captured at creation, ingest, migration, and at other appropriate points in the object life-cycle
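
As an illustration of metadata that can be captured automatically rather than keyed in by hand, the Python sketch below gathers a few technical properties of a file at ingest. The function name and the choice of fields are assumptions made for this example; they are not drawn from any of the schemas named in the presentation, and the MIME type is only guessed from the file extension.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def capture_technical_metadata(path):
    """Collect basic technical metadata for a file without human input."""
    stat = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    # Guess based on the file extension only; the result may be None.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "mime_type_guess": mime_type,
        "sha256": digest,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
```

Descriptive, provenance and rights information usually cannot be derived this way, which is one reason the slide asks whether metadata is created by humans or captured automatically.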

37 Interoperability issues
Benefits of interoperability
  – Support for ingest process
  – To support the management of multiple formats and metadata schemas within a digital preservation system
    » Current metadata specifications not entirely clear on how this should be done
  – To support the exchange of information packages outside the repository, e.g. by converting to standard 'exchange formats'
    » Networks of 'trusted repositories'

38 Format and metadata registries
Format registries
  – There is "… a pressing need to establish reliable, sustained repositories of file format specifications, documentation, and related software" (Lawrence, et al., 2000)
  – DSpace 'bitstream format registry'
  – Digital Library Federation, et al. recently proposed a Global digital format registry
Metadata registries
  – More research into these is required
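
To give a feel for what a format registry might be used for, the Python sketch below identifies a file format from the leading bytes ("magic numbers") of a bit-stream. It is a toy lookup table written for this example: it does not reproduce the DSpace bitstream format registry or the proposed Global digital format registry, and a real registry would also hold specifications, documentation and related software.

```python
# A tiny, hypothetical registry: (leading byte signature, format name).
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
]


def identify(data):
    """Return the format name whose signature prefixes the data, or None."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


def identify_file(path, probe_size=16):
    """Identify a file by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(probe_size))
```

Identification by signature is more robust than trusting a file extension, but it still depends on someone maintaining the table of signatures, which is the sustainability point the slide makes.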

39 Non-technical issues

40 Collection management
  – Selection, storage, access, "de-selection"
  – Issues:
    » Preservation issues need to be considered early in an object's life-cycle (the traditional 'transfer to repository' model will not work)
    » An important role for creators (and funding bodies): guidance, documentation
    » Sharing of responsibilities: a need for collaboration
    » Digital storage costs are cheap, so should we keep everything?

41 Legal issues (1)
Institutions need to obtain the legal rights to preserve digital objects and make them accessible:
  – e.g., copying, the re-engineering of software
  – identify and negotiate with rights holders?
    » but difficult to identify all rights holders...
  – safeguard rights
  – part of legal deposit?
  – monitoring legislation and case law

42 Legal issues (2)
Rights holders want increasing control over content
  – e.g., the extension of copyright periods, licensing of access
  – Digital Millennium Copyright Act (US)
  – European Union Copyright Directive
Consideration of “dark archives” - repositories without access...

43 Costs
  – Still very little known about costs:
    » no widely used economic models
    » no clear idea of who pays
    » Moore’s Law (technology)
  – digital storage densities increase while costs decrease
  – not necessarily applicable to Petabytes of data from e-science projects
    » identification of cost elements is the best approach

44 Capturing and preserving the World Wide Web

45 Web archiving (1)
Four main approaches (to date):
  – Crawler based (for surface Web)
    » Internet Archive
    » Swedish Royal Library (Kulturarw3)
    » Iceland, Finland, Austria, etc.
  – Selective approach
    » National Library of Australia (PANDORA)
    » British Library pilot
  – Direct deposit by creators
  – Combined approaches
    » Bibliothèque nationale de France

46 Web archiving (2)
  – an important response to the transitory nature of the Web
  – existing projects more concerned with collection strategies than access or preservation
  – major focus on events, e.g. national elections
    » Internet Archive Special Collections
    » NARA (US National Archives and Records Administration) snapshots of US federal agencies and departments in 2001
    » The National Archives (PRO) - capture of No. 10, Downing Street site (2001); current work with Internet Archive (UK Central Government Web Archive)

47 Web archiving (3)
  – limited consideration of access issues, except for:
    » Internet Archive (Wayback Machine)
    » PANDORA Archive (NLA)
    » Nordic Web Archive project
  – A look at the Wayback Machine...
    » http://www.archive.org/

49 Some projects and initiatives

50 The Cedars project
CURL Exemplars in Digital Archives: Consortium of University Research Libraries (CURL)
Funded by the JISC (1998-2002)
Main partners: Universities of Cambridge, Leeds and Oxford; support from UKOLN for the work on metadata
Final phase produced guides to collection development, intellectual property rights issues, metadata, etc.
http://www.leeds.ac.uk/cedars/

51 Digital Preservation Coalition
formed in 2001
aims to foster joint action in the UK and internationally
  – dissemination (handbook, bulletin, …)
  – getting digital preservation on the agenda of key stakeholders
  – members include BL, the e-Science core programme, JISC, OCLC, the National Archives, Resource, the BBC, etc.

52 Digital Curation Centre
An initiative of the JISC and the Research Councils e-Science Core Programme
$1 million (per annum, for 3 years)
Key objectives (simplified), to develop:
  – A research programme
  – A centre and repository for tools and documentation
  – Pilot services, e.g. format registries
  – Advisory services, identifying best practice, etc.
Initial bids currently being evaluated
Deadline for full proposals = November 2003

53 OCLC/RLG Working Groups
  – Preservation Metadata - Implementation Strategies:
    » OAIS model based Metadata Framework (2002)
    » PREMIS Working Group
    » http://www.oclc.org/research/pmwg/
  – Digital Archive Attributes Working Group:
    » “Trusted digital repositories: attributes and responsibilities” (May 2002)
    » http://www.rlg.org/longterm/

54 NDIIPP
  – National Digital Information Infrastructure and Preservation Program
    » Funded by the US Congress
    » A national planning effort led by the Library of Congress, in co-operation with representatives of other federal, research, library, and business organisations
    » $100 million
    » Master plan approved by Congress, December 2002
    » NDIIPP Programme Announcement
  – For projects between $500K - $3 million
  – Proposals due 12 November 2003

56 Summing up

57 Summing up:
Digital preservation is a managerial as well as a technical problem
Technical agenda is being developed
  – there is much work being undertaken into developing sustainable preservation strategies and metadata schemas
Co-operation is essential
  – some progress, e.g. the DPC, DCC, NDIIPP
Many problems remain
  – costs, legal issues, etc.

58 Further information

61 More information
Preserving Access to Digital Information (PADI) gateway:
  – http://www.nla.gov.au/padi/
DPC/PADI “What’s New” bulletin:
  – http://www.dpconline.org/graphics/whatsnew/

62 Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath, where it is based.