Preservation Seminar, 8 Jan 2007
CASPAR: Long term preservation of digitally encoded information
David Giaretta



2 CASPAR aims
Produce tools and techniques to support digital preservation and make it easier to share the cost:
– must be relatively easy to use
– must have a low “buy-in” in terms of the effort required for adoption
– must avoid requiring wholesale change of everyone else’s systems
– must be decentralised and reproducible, so that it can live on after the formal end of the CASPAR project
– must be “preservable”
– must be open: open source, open standards
Cannot do everything, but should do something broadly useful
Working closely with the UK Digital Curation Centre
– http://www.dcc.ac.uk

3 Digital Preservation…
Easy to do… …as long as you can provide money forever
Easy to test claims about tools… …as long as you live a long time

4 Validation
Demonstrate the theoretical basis
“Accelerated lifetime” tests
– Changes in hardware
– Changes in environment
– Changes in the Designated Community
Demonstrate increased trustworthiness
– Measured using the draft Certification Standard

5 Digital Preservation
Need to preserve information and knowledge, not just “the bits”
– Documents and videos are rendered: simple?
– Data must be processed, in new ways: harder
Need to manage knowledge to keep archives alive through time
– Preservation is a process, not a one-time event
– Preservation is expensive: costs need to be shared
The alternative is money: endless supplies of money
The Open Archival Information System (OAIS) Reference Model (ISO 14721) provides a general conceptual framework (http://public.ccsds.org/publications/archive/650x0b1.pdf)

6 Disincentives for preservation: cost
[Chart: Money against Time, showing the Budget available]
If the cost of preserving old information increases…
Need to show that costs are contained

7 Immediate benefits of Digital Preservation: use of unfamiliar data
Global cyber-infrastructures allow users to find, and try to use, data from many sources
– Some sources will be familiar
– Most available sources will be unfamiliar
How can one be sure that unfamiliar data is used correctly?
Garbage in, garbage out
Need to be able to deal with unfamiliar data, whether it is contemporary or old (preserved)

8 OAIS Reference Model
ISO 14721: Reference Model for an Open Archival Information System (OAIS). http://public.ccsds.org/publications/archive/650x0b1.pdf
An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
Long Term Preservation: The act of maintaining information, in a correct and Independently Understandable form, over the Long Term.
Long Term: Long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.
Designated Community: An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities.
Independently Understandable: Has sufficient documentation to allow the information to be understood and used by the Designated Community without having to resort to special resources not widely available, including named individuals.

9 OAIS Reference Model: Functional Model

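10 OAIS Information Model
[Figure: OAIS Information Model. An Information Object consists of a Data Object interpreted using Representation Information; Representation Information is itself interpreted using 1+ further pieces of Representation Information. A Data Object is either a Physical Object or a Digital Object; a Digital Object consists of 1+ Bit Sequences.]
The recursion ends at the Knowledge Base of the Designated Community (this knowledge will change over time and region)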
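The recursion in this model can be sketched in Python. This is a minimal illustration, not CASPAR or OAIS software: the names RepInfo, DigitalObject, InformationObject and needed_rep_info are hypothetical, and the links between the FITS-related items (names borrowed from the FITS example on slide 12) are assumptions. It shows interpretation recursing through RepInfo until it reaches something already in the Designated Community's Knowledge Base, so a smaller Knowledge Base means more RepInfo must be supplied.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepInfo:
    """Representation Information; it is itself interpreted using further RepInfo."""
    name: str
    interpreted_using: List["RepInfo"] = field(default_factory=list)


@dataclass
class DigitalObject:
    """A Data Object made up of one or more bit sequences."""
    bit_sequences: List[bytes]


@dataclass
class InformationObject:
    """A Data Object interpreted using one or more pieces of RepInfo."""
    data_object: DigitalObject
    rep_info: List[RepInfo]


def needed_rep_info(obj: InformationObject, knowledge_base: Set[str]) -> List[str]:
    """List (depth-first) the RepInfo a Designated Community must be given.

    Recursion stops at any RepInfo already in the community's Knowledge Base.
    """
    needed: List[str] = []
    seen: Set[str] = set()

    def visit(ri: RepInfo) -> None:
        if ri.name in knowledge_base or ri.name in seen:
            return
        seen.add(ri.name)
        needed.append(ri.name)
        for child in ri.interpreted_using:
            visit(child)

    for ri in obj.rep_info:
        visit(ri)
    return needed


# Hypothetical links between RepInfo items, for illustration only.
unicode_spec = RepInfo("UNICODE SPECIFICATION")
xml_spec = RepInfo("XML SPECIFICATION", [unicode_spec])
fits_standard = RepInfo("FITS STANDARD", [RepInfo("PDF STANDARD")])
fits_dictionary = RepInfo("FITS DICTIONARY SPECIFICATION", [xml_spec])
fits_file = InformationObject(DigitalObject([b"SIMPLE  ="]), [fits_standard, fits_dictionary])

today = {"PDF STANDARD", "XML SPECIFICATION"}  # assumed present-day Knowledge Base
later = {"PDF STANDARD"}                       # assumed Knowledge Base once XML is forgotten

print(needed_rep_info(fits_file, today))
# ['FITS STANDARD', 'FITS DICTIONARY SPECIFICATION']
print(needed_rep_info(fits_file, later))
# ['FITS STANDARD', 'FITS DICTIONARY SPECIFICATION', 'XML SPECIFICATION', 'UNICODE SPECIFICATION']
```

Because the Knowledge Base changes over time, the same Data Object can need different amounts of RepInfo at different times.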

11 Representation Information classification

12 [Figure: example Representation Information network for a FITS file. Nodes: FITS FILE, FITS STANDARD, PDF STANDARD, FITS JAVA s/w, JAVA VM, PDF s/w, FITS DICTIONARY SPECIFICATION, UNICODE SPECIFICATION, XML SPECIFICATION]

13 Representation Information
The Data Object is “interpreted using” the Representation Information (RepInfo)
The Reference Model is designed to ensure that an OAIS is not set the impossible task of having to provide all possible RepInfo immediately. Hence:
– Take account of the Designated Community and its associated Knowledge Base
The amount of RepInfo is not fixed
– Additional RepInfo will be needed over time

14 Early Results
High-level architecture for sharing the cost of, and access to, Representation Information
Detailed examinations of specific datasets to understand what is really needed to keep them understandable and usable

15 RepInfo use and maintenance

16 Registry for Representation Information
The Digital Object could have RepInfo packed with it, as well as a CPID
Support automated access and processing
1. The user gets data from the archive. The data has an associated Curation Persistent Identifier (CPID)
2. The user is unfamiliar with the data, so requests the RepInfo using the CPID
3. The user receives the RepInfo, which has its own CPID in case it is not immediately usable
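The three-step flow above can be sketched with an in-memory registry. Everything here is a hypothetical illustration: RegistryEntry, InMemoryRegistry, fetch_rep_info and the identifiers prefixed with cpid: are invented, and a real Registry would be reached over a network. It shows why each piece of RepInfo carries its own CPID: the client keeps following CPIDs until it reaches RepInfo the user already understands.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RegistryEntry:
    cpid: str                 # this RepInfo's own CPID
    description: str          # stand-in for the RepInfo content
    needs: List[str] = field(default_factory=list)  # CPIDs of RepInfo needed to use this one


class InMemoryRegistry:
    """Hypothetical stand-in for a RepInfo Registry (not a real CASPAR/DCC API)."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        self._entries[entry.cpid] = entry

    def lookup(self, cpid: str) -> RegistryEntry:
        return self._entries[cpid]


def fetch_rep_info(registry: InMemoryRegistry, cpid: str, understood: Set[str]) -> List[str]:
    """Steps 2 and 3: request RepInfo by CPID, then follow the CPIDs of any
    RepInfo that is not immediately usable. Returns the CPIDs fetched, in order."""
    fetched: List[str] = []
    pending = [cpid]
    while pending:
        current = pending.pop()
        if current in understood or current in fetched:
            continue
        entry = registry.lookup(current)
        fetched.append(current)
        pending.extend(reversed(entry.needs))  # reversed so needs are fetched in listed order
    return fetched


registry = InMemoryRegistry()
registry.register(RegistryEntry("cpid:laser-format", "Layout of the laser data", ["cpid:east"]))
registry.register(RegistryEntry("cpid:east", "EAST language description", ["cpid:ascii"]))
registry.register(RegistryEntry("cpid:ascii", "ASCII character set"))

# Step 1: the data arrives from the archive carrying a CPID that points at its RepInfo.
data_cpid = "cpid:laser-format"
print(fetch_rep_info(registry, data_cpid, understood={"cpid:ascii"}))
# ['cpid:laser-format', 'cpid:east']
print(fetch_rep_info(registry, data_cpid, understood=set()))
# ['cpid:laser-format', 'cpid:east', 'cpid:ascii']
```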

17 CASPAR information flow architecture
[Figure: information flow architecture diagram; one element is labelled “Rep Info”]

18 CASPAR Testbeds
Three testbeds
– Cultural: UNESCO
– Performing Arts: INA, IRCAM
– Scientific: ESA and CCLRC
Complex, multi-source, multifaceted data
Many common preservation, evaluation and validation issues
Some specific requirements on preservation (technical, delivery, legal)
– Specific user communities / Knowledge Bases
Also test the OAIS model

19 Science: CCLRC example
[Figure: world map of ionosondes]

20 Example of use of RepInfo
A laser facility produces binary data normally used by proprietary software
– Describe the data using the EAST data description language
– Use a generic application (shown here) to display/process the data
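The idea behind this slide, describing the bytes once and letting generic software read them, can be sketched with Python's standard struct module. This is not EAST: the DESCRIPTION dictionary and its field names (shot_number, energy_joules, n_samples) are hypothetical and do not describe the real laser facility format. The point is that read_records contains no format-specific code; changing the description is enough to read a different layout.

```python
import struct
from typing import Dict, List

# Hypothetical record description. EAST has its own syntax; this dict only mimics
# the idea of describing the layout separately from the software that reads it.
DESCRIPTION = {
    "byte_order": "<",  # struct prefix: little-endian, no padding
    "fields": [
        ("shot_number", "I"),    # unsigned 32-bit integer
        ("energy_joules", "d"),  # 64-bit IEEE float
        ("n_samples", "H"),      # unsigned 16-bit integer
    ],
}


def read_records(data: bytes, description: Dict) -> List[Dict]:
    """Generic reader: decodes fixed-size records laid out as `description` says."""
    fmt = description["byte_order"] + "".join(code for _, code in description["fields"])
    names = [name for name, _ in description["fields"]]
    size = struct.calcsize(fmt)
    if len(data) % size:
        raise ValueError(f"{len(data)} bytes is not a whole number of {size}-byte records")
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, data)]


raw = struct.pack("<IdH", 1, 2.5, 100) + struct.pack("<IdH", 2, 3.0, 200)
for record in read_records(raw, DESCRIPTION):
    print(record)
# {'shot_number': 1, 'energy_joules': 2.5, 'n_samples': 100}
# {'shot_number': 2, 'energy_joules': 3.0, 'n_samples': 200}
```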

21 Some Issues
Difficult to derive physical quantities from the data
– Can be analysed in multiple ways
– Raises fundamental questions about Representation Information
The common automated method is proprietary
– The data structure is also proprietary
– Paper documentation: restricted access
Provenance and trust

22 ESA example: GOME
Global Ozone Monitoring Experiment on ERS-2

23 GOME data processing

24 GOME products
– Level 2 product: ozone profile at a given location
– Level 3 product: integration of time and space data
– Level 4 product: integration of GOME data, other data and models

25 Some Issues
Provenance and Context of processed data: their relationship to the Representation Information of the raw data and to the Knowledge Base of the Designated Community

26 UNESCO examples
Data:
– Scanned documents and maps
– Aerial and close-range photography (digital photogrammetry)
– Monument measurements (laser scanning)
– Satellite images (remote sensing and image processing)
– Multi-scale digital cartography (Geographic Information Systems (GIS) and CAD)
– 3D models, virtual tours (computer visualization)
Mandatory documentation (World Heritage List):
– Identification of the property
– Description of the property
– Justification of inscription
– State of conservation and factors affecting the property
– Protection and management
– Monitoring
– Documentation
– Contact information of responsible authorities
– Signature on behalf of the State Party(ies)

27 Performing Arts examples
Examples: score, MAX/MSP patches, additional instructions
[Figure 2: Preservation of interactive multimedia performances. Components: Motion Capture and Processing; Motion Analysis and Recognition; Motion-Multimedia Mapping Strategy; Multimedia Generation; GUI (for monitor & control). Data flows: motions, 3D motion data, mapping parameters, multimedia output.]

28 Some Issues
What is preservation of “performability”?
– Composer’s intention
Authenticity
Proprietary software and hardware
Copyright
Digital Rights Management

29 Shared Infrastructure
Registries of Representation Information
Persistent Identifier name resolvers
– DOI? ARK? URL? None is guaranteed
Interfaces: support preservation and interoperability
Standards: Preservation Description Information
– Fixity, Provenance, Reference, Context
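Fixity, one of the four kinds of Preservation Description Information listed above, is often implemented as a cryptographic digest. The sketch below assumes SHA-256 via Python's hashlib; the slide does not prescribe an algorithm, and the PreservationDescription class, the describe and fixity_ok functions and the field values are hypothetical. Recording the algorithm name alongside the digest leaves room to re-check, or re-compute with a stronger algorithm, later on.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationDescription:
    reference: str         # identifier for the content
    provenance: str        # history of the content
    context: str           # how the content relates to other information
    fixity_algorithm: str  # recorded so the check can be repeated, or migrated, later
    fixity_value: str      # hex digest of the content's bits


def describe(content: bytes, reference: str, provenance: str, context: str,
             algorithm: str = "sha256") -> PreservationDescription:
    digest = hashlib.new(algorithm, content).hexdigest()
    return PreservationDescription(reference, provenance, context, algorithm, digest)


def fixity_ok(content: bytes, pdi: PreservationDescription) -> bool:
    """True if the bits are unchanged since the Fixity value was recorded."""
    return hashlib.new(pdi.fixity_algorithm, content).hexdigest() == pdi.fixity_value


pdi = describe(b"ozone profile v1", reference="hypothetical-id-001",
               provenance="derived from raw instrument data", context="part of a Level 2 series")
print(fixity_ok(b"ozone profile v1", pdi))  # True
print(fixity_ok(b"ozone profile v2", pdi))  # False
```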

30 Accreditation/Certification for repositories
Long-standing demand for the ability to measure the trustworthiness of digital repositories
Part of the OAIS “roadmap”
RLG/NARA working group
– Version 1.0 Audit and Certification Checklist about to be released
New open working group to produce an ISO standard for Audit and Certification
– See http://mailman.ccsds.org/cgi-bin/mailman/listinfo/moims-rac to join the mailing list

31 Knowledge at the heart of preservation
Knowledge-driven approach
Knowledge management to support long-term preservation of concepts/information, including:
– Single, complex, on-demand, interactive objects
– DRM
– Authenticity
– Access
– Storage
– Designated Community descriptions
Knowledge Base definition ontologies

32 Possible Infrastructure Build-up
[Figure: European Preservation Infrastructure; Task Force on Permanent Access; Alliance; Other Alliance Members; CCLRC Curation Activities; CASPAR; Other CCLRC projects; FP7 projects]
http://tfpa.kb.nl

33 When
– Component architecture and prototypes: by month 12
– Framework architecture: month 18
– Component integration: months 24-30
– Testbed implementations: months 30-36
– Project completion: month 42

34 www.casparpreserves.eu

35 Conclusions
Preserving information and knowledge needs more than just storing the “bits”
Understanding, and being able to process, the vast amount of unfamiliar data that is available is hard
It is expensive
– Costs must be shared
So far the Open Archival Information System Reference Model provides a conceptual framework
– Many similarities can be exploited
– Many subtleties need to be explored
Watch this space

36 Backup slides

37 Example RepInfo Label
A Label is itself RepInfo. It provides a way to collect together, in a sensible way, many individual pieces of RepInfo

38 Re-using RepInfo
Existing RepInfo can be used to build up further RepInfo
– E.g. refer to existing RepInfo in labels

39 Versioning and LID
Each object has a unique identifier
Versions of an object share a “logical ID” (LID)
Using the LID alone gives the latest version
A particular version can be specified
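The versioning rules above can be sketched as follows. The VersionedStore class, the UUID-based object IDs and the lid:example-repinfo identifier are assumptions for illustration, not any particular Registry's implementation.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    object_id: str  # unique identifier of this particular version
    lid: str        # logical ID shared by every version of the object
    number: int     # 1 for the first version, 2 for the second, ...
    content: str


class VersionedStore:
    """Hypothetical store showing unique IDs versus logical IDs (LIDs)."""

    def __init__(self) -> None:
        self._by_lid: Dict[str, List[Version]] = {}

    def add(self, lid: str, content: str) -> Version:
        versions = self._by_lid.setdefault(lid, [])
        version = Version(str(uuid.uuid4()), lid, len(versions) + 1, content)
        versions.append(version)
        return version

    def get(self, lid: str, number: Optional[int] = None) -> Version:
        versions = self._by_lid[lid]
        if number is None:
            return versions[-1]          # LID alone: latest version
        if not 1 <= number <= len(versions):
            raise KeyError(f"{lid} has no version {number}")
        return versions[number - 1]      # a particular version


store = VersionedStore()
v1 = store.add("lid:example-repinfo", "first draft")
v2 = store.add("lid:example-repinfo", "second draft")
print(store.get("lid:example-repinfo").content)     # second draft
print(store.get("lid:example-repinfo", 1).content)  # first draft
print(v1.object_id != v2.object_id)                 # True
```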

40 Clients
DCC Registry:
– Web browser
– Thick client (http://registry.dcc.ac.uk)
Any Registry:
– Applications using the API

41 GUI access to Registry

42 Classifications
Many classification schemes
Help to find RepInfo

43 Initial RepInfo
Simple text
– ASCII
– Unicode
– UTF-7/8
PDF, Word(!)
FITS format
FITS standard dictionaries
Things that are “MISSING”

44 RepInfo entry
Simple command line tool

45 Creating RepInfo
There are many tools which can be used to create RepInfo:
– Simple text editor to create text describing the data
– Complex tools to capture the data description, e.g. EAST (see next slides), DFDL, etc.
– Programming languages of various sorts

46 EAST descriptions

47 Screenshot: OASIS, a tool for creating EAST descriptions

48 Example of an EAST description

49 Using RepInfo
A pointer to RepInfo can be attached to data
The RepInfo can be used to display, examine, process and re-use the data

50 Example of use of RepInfo (repeat of slide 20)

51 Simple Buy-In
Need to add RepInfo to your Data Objects?
Does the RepInfo already exist?
– Yes: get its ID and put that in a label
– No: register what you have and be assigned an ID. Add more details later when needed, or others can add more details
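The buy-in decision above can be sketched in a few lines. SimpleRegistry, Label, add_rep_info and the repinfo-NNNN ID format are hypothetical; a real Registry would assign identifiers its own way. The sketch shows the low-effort path: a second data object in the same format re-uses the existing ID, and the RepInfo behind that ID can be enriched later without touching either label.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class SimpleRegistry:
    """Hypothetical registry: one ID per named piece of RepInfo, details added over time."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._ids_by_name: Dict[str, str] = {}
        self.details: Dict[str, List[str]] = {}

    def find(self, name: str) -> Optional[str]:
        return self._ids_by_name.get(name)

    def register(self, name: str, what_you_have: str) -> str:
        rep_id = f"repinfo-{next(self._counter):04d}"
        self._ids_by_name[name] = rep_id
        self.details[rep_id] = [what_you_have]
        return rep_id

    def add_detail(self, rep_id: str, detail: str) -> None:
        self.details[rep_id].append(detail)  # you, or others, can add more later


@dataclass
class Label:
    data_object: str
    rep_info_ids: List[str] = field(default_factory=list)


def add_rep_info(registry: SimpleRegistry, label: Label, name: str, what_you_have: str) -> str:
    rep_id = registry.find(name)
    if rep_id is None:                    # No: register what you have, be assigned an ID
        rep_id = registry.register(name, what_you_have)
    label.rep_info_ids.append(rep_id)     # Yes (or now registered): put its ID in the label
    return rep_id


registry = SimpleRegistry()
first = Label("run-001.dat")
second = Label("run-002.dat")
add_rep_info(registry, first, "laser binary format", "one-paragraph text note")
add_rep_info(registry, second, "laser binary format", "ignored: already registered")
registry.add_detail(first.rep_info_ids[0], "EAST description")
print(first.rep_info_ids, second.rep_info_ids)  # ['repinfo-0001'] ['repinfo-0001']
print(registry.details["repinfo-0001"])         # ['one-paragraph text note', 'EAST description']
```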

52 Preservation Issues
Given a file or a stream of bits, how does one know what Representation Information is needed? (This question applies to Representation Information itself as well as to the digital objects we are primarily interested in preserving and using.) How does one know, for example, whether this thing is in FITS format?
– Someone may simply “know” what it is and how to deal with it, i.e. the bits are within the Knowledge Base
– One may be able to recognise the format by looking for various types of patterns
– One may feed the bits into all available interpreters to see which accept the data as valid
– Other means…
The only safe way: have an associated label which points to the appropriate Representation Information
– Note this does not exclude the other methods, e.g. for data rescue
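The strategies above can be tried in order of safety. This is an illustrative sketch, not a format registry: the identify function, its return convention and the accepts_utf8 interpreter are hypothetical, and only two signatures are included. The "SIMPLE  =" and "%PDF-" prefixes are the standard openings of FITS and PDF files; the label is just a string standing in for a pointer to RepInfo. When more than one interpreter accepts the bits, the sketch reports "unknown" rather than guess.

```python
from typing import Callable, Dict, Optional, Tuple

# Leading byte patterns: a FITS primary header starts with the "SIMPLE  =" card,
# and a PDF file starts with "%PDF-".
SIGNATURES: Dict[bytes, str] = {b"SIMPLE  =": "FITS", b"%PDF-": "PDF"}


def identify(bits: bytes,
             label: Optional[str] = None,
             interpreters: Optional[Dict[str, Callable[[bytes], bool]]] = None
             ) -> Tuple[Optional[str], str]:
    """Return (what the bits are, how that was decided)."""
    if label is not None:               # the only safe way: a label pointing to RepInfo
        return label, "label"
    for signature, fmt in SIGNATURES.items():
        if bits.startswith(signature):  # pattern recognition
            return fmt, "pattern"
    accepted = [name for name, accepts in (interpreters or {}).items() if accepts(bits)]
    if len(accepted) == 1:              # exactly one interpreter accepts the data
        return accepted[0], "interpreter"
    return None, "unknown"              # no interpreter, or an ambiguous result


def accepts_utf8(bits: bytes) -> bool:
    """Hypothetical interpreter: accepts anything that decodes as UTF-8."""
    try:
        bits.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(identify(b"SIMPLE  =                    T"))                          # ('FITS', 'pattern')
print(identify(b"plain notes", interpreters={"UTF-8 text": accepts_utf8}))  # ('UTF-8 text', 'interpreter')
print(identify(b"\xff\xfe\x00", interpreters={"UTF-8 text": accepts_utf8})) # (None, 'unknown')
print(identify(b"SIMPLE  =", label="repinfo-0001"))                         # ('repinfo-0001', 'label')
```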

53 Example Label

54 Access to Registry
Send a letter? Phone? Email?
Read the Web page and copy the relevant information?
Software access?
– URL
– Web Service
– Application?

55 Registries: software access
Roll-your-own?

56 Lazy person’s Registry/Repository
Use existing standards
– UDDI: no repository
– ebXML
Additional advantage: helps integration with the Grid

57 Registry/Repository access
Interfaces and protocols: JAXR “standard”
– Can talk to UDDI and ebXML registries
FreebXML implementation
– Many access methods: URL, Web Services, API, etc.

58 Persistent IDs
Findability
– Persistent IDs: DOI, URN, ARK, PURL, etc.
What can we rely on?
Don’t put all your eggs in one basket

59 Example
[Label fragment: identifier value e1fe9271-cd48-4418-a63e-b112ebf792c7; resolver strings http://foobar.zaf.org/ark:/64269/ and 10.123456/]
For example, the ARK identifier is created by appending the string in "value" to the string in the resolver with resolverType="ark".
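The composition rule in this example, together with the advice not to rely on a single identifier scheme, can be sketched as below. The resolver strings and identifier value come from the example; treating "10.123456/" as a DOI-type resolver is an assumption, and persistent_ids, first_resolvable and the is_resolvable check are hypothetical helpers (a real check would query the resolver).

```python
from typing import Callable, Dict, Iterable, Optional

# Resolver strings from the example; the "doi" resolverType for the second is an assumption.
RESOLVERS: Dict[str, str] = {
    "ark": "http://foobar.zaf.org/ark:/64269/",
    "doi": "10.123456/",
}
VALUE = "e1fe9271-cd48-4418-a63e-b112ebf792c7"


def persistent_ids(value: str, resolvers: Dict[str, str]) -> Dict[str, str]:
    """Append `value` to the string of each resolver, giving one identifier per resolverType."""
    return {resolver_type: prefix + value for resolver_type, prefix in resolvers.items()}


def first_resolvable(ids: Dict[str, str], order: Iterable[str],
                     is_resolvable: Callable[[str], bool]) -> Optional[str]:
    """Try several identifier schemes in turn rather than relying on just one."""
    for resolver_type in order:
        candidate = ids.get(resolver_type)
        if candidate is not None and is_resolvable(candidate):
            return candidate
    return None


ids = persistent_ids(VALUE, RESOLVERS)
print(ids["ark"])  # http://foobar.zaf.org/ark:/64269/e1fe9271-cd48-4418-a63e-b112ebf792c7
# Hypothetical check: pretend only the DOI-style identifier currently resolves.
print(first_resolvable(ids, ["ark", "doi"], lambda s: s.startswith("10.")))
# 10.123456/e1fe9271-cd48-4418-a63e-b112ebf792c7
```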

60 Registry/Repository (regrep)
Has to be a trusted repository (of RepInfo)
– Authenticity of RepInfo
– Access control
– Certificates/digests (are they trustworthy over the long term?)
Extensibility
Distributed
– Share the effort
Notification service

61 Operating Registries
See http://dev.dcc.ac.uk/twiki/bin/view/Main/RegistryProcedures

