e-IRG Open Workshop on e-Infrastructures 4-5 Oct 2006 CASPAR Project Digital Preservation and Digital interoperability


1 e-IRG Open Workshop on e-Infrastructures 4-5 Oct 2006 CASPAR Project Digital Preservation and Digital interoperability

2 Outline
- Unfamiliar Data
- Usability
- Link to Preservation
- OAIS Reference Model
- OAIS Information Model
- Representation Information
- Preservation and Virtualisation
- CASPAR project

3 Unfamiliar Data
- E-Research/e-Infrastructures allow users to find and try to use data from many sources
- Some familiar sources
- Most available sources will be unfamiliar
- How can one be sure that the unfamiliar data is used correctly?
- Garbage in, garbage out principle
- Various horror stories

4 Usability
- Ability for the user to “do something” with the bits
- Preferably using software
  - Even better if software does not have to be specially written
- Better still if user does not have to guess what to do or trawl around looking for documentation
- Could use existing software to display and process, but how do we prevent nonsense being produced accidentally?

5 Link to Preservation
- An archive is just another remote source of digitally encoded information
  - Preserved digital data was created some time ago, possibly a considerable time ago (decades)
- Digital Preservation can mean many things
- Simplest type is just keeping the “bits” and making sure they are available
- A more useful definition comes from OAIS

6 OAIS Reference Model
- ISO 14721: Reference Model for an Open Archival Information System (OAIS).
- An OAIS is an archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.
- Long Term Preservation: The act of maintaining information, in a correct and Independently Understandable form, over the Long Term.
- Long Term: long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.
- Designated Community: An identified group of potential Consumers who should be able to understand a particular set of information. The Designated Community may be composed of multiple user communities.
- Independently Understandable: has sufficient documentation to allow the information to be understood and used by the Designated Community without having to resort to special resources not widely available, including named individuals.

7 Information Objects
[Diagram: an Information Object is a Data Object interpreted using Representation Information (1+). A Data Object is either a Physical Object or a Digital Object; a Digital Object is made up of 1+ Bit Sequences. Representation Information is itself interpreted using further Representation Information.]
- Recursion ends at KNOWLEDGEBASE (of whom?) (tacit knowledge)

8 Representation Information
- The Data Object is “interpreted using” the Representation Information (RepInfo)
- The Reference Model is designed to ensure that an OAIS is not set the impossible task of having to provide all possible RepInfo immediately
- Hence:
  - Take account of the Designated Community and its associated Knowledge Base
- Note that RepInfo may itself need further RepInfo
- NB very important for CERTIFICATION
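
The idea that RepInfo may itself need further RepInfo, with the recursion stopping at the Designated Community's Knowledge Base, can be sketched as a small graph walk. This is only an illustration: the function name `missing_repinfo`, the dictionary `net` and the example entries are assumptions made for the sketch, not part of OAIS or of any CASPAR software.

```python
def missing_repinfo(item, repinfo_net, knowledge_base):
    """List the RepInfo a community still needs in order to interpret `item`.

    repinfo_net maps each object to the RepInfo it is "interpreted using".
    The walk stops wherever the community's knowledge base already covers
    a piece of RepInfo, which is where the recursion ends.
    """
    needed = []
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for dep in repinfo_net.get(current, []):
            if dep in knowledge_base or dep in seen:
                continue  # already understood, or already queued
            seen.add(dep)
            needed.append(dep)
            stack.append(dep)  # this RepInfo may need RepInfo of its own
    return needed


# Hypothetical Representation Net for one astronomical data file.
net = {
    "spectrum.fits": ["FITS standard", "instrument dictionary"],
    "FITS standard": ["ASCII", "IEEE 754"],
    "instrument dictionary": ["English"],
}

# An astronomer already knows FITS; a general reader does not.
for_astronomer = missing_repinfo("spectrum.fits", net, {"FITS standard", "English"})
for_public = missing_repinfo("spectrum.fits", net, {"English", "ASCII"})
```

The same data object needs a different amount of RepInfo depending on who the Designated Community is, which is why the archive does not have to supply everything up front.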

9 Representation Information
- The Representation Information accompanying a physical object, like a moon rock, may give additional meaning
  - It typically is a result of some analysis of the physically observable attributes of the rock
- The Representation Information accompanying a digital object, or sequence of bits, is used to provide additional meaning.
  - It typically maps the bits into commonly recognized data types such as character, integer, and real and into groups of these data types.
  - It associates these with higher level meanings which can have complex inter-relationships that are also described
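
As a rough illustration of mapping bits into character, integer and real data types, the sketch below decodes a byte sequence using a small structure description. The field names, the `STRUCTURE` layout and the sample record are all invented for the example; only Python's standard `struct` module is used.

```python
import struct

# Hypothetical structure description: (field name, struct format code).
# "4s" = 4 bytes of characters, "i" = 32-bit integer, "d" = 64-bit real.
STRUCTURE = [("station", "4s"), ("count", "i"), ("temperature", "d")]


def decode(bits, structure=STRUCTURE):
    """Map a raw byte sequence onto named, typed values."""
    # ">" selects big-endian byte order with no alignment padding.
    fmt = ">" + "".join(code for _, code in structure)
    values = struct.unpack(fmt, bits)
    record = {}
    for (name, code), value in zip(structure, values):
        # Character fields come back as bytes; turn them into text.
        record[name] = value.decode("ascii") if code.endswith("s") else value
    return record


# Without the structure description these 16 bytes are just a "bag of bits".
raw = struct.pack(">4sid", b"RAL1", 3, 21.5)
record = decode(raw)
```

The structure description only gives types and groupings; what "temperature" means (units, instrument, calibration) would be the job of semantic RepInfo.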

10 Designated Community
- General English reading public educated to High School and above, with access to a Web Browser (HTML 4.0 capable)
- GIS data: GIS researchers, undergraduates and above, having an understanding of the concepts of Geographic data; having access to current (2005, USA) GIS tools/computer software e.g. ArcInfo (2005)
- Astronomer (undergraduate and above) with access to FITS software such as FITSIO, familiar with astronomical spectrographic instruments
- Student of Middle English with an understanding of TEI encoding and access to an XML rendering environment.
  - Variant 1: Cannot understand TEI
  - Variant 2: Cannot understand TEI and no access to XML rendering environment
  - Variant 3: No understanding of Middle English but does understand TEI and XML

11 Rep.Info. Classification

12 Structure
- Distinguish:
  - formats which are used mainly for rendering, to be followed by human inspection, and
  - formats used for automated processing, particularly important for science data
- Distinguish:
  - Things with unknown structure, which need software
    - proprietary software e.g. MS Word
    - Open Source software e.g. CDF
  - Things with known/well described structure
    - ASCII file, FITS file, telemetry etc
  - Document the format
  - Use a description language if possible e.g. EAST, DFDL
  - The EAST tools are themselves Representation Information which in due course will have to be fully defined; the closure of their Representation Nets will be the EAST standard
- Higher level definitions should include useful scientific objects and humanities objects

13 Layered Model from OAIS

14 Semantics
- Meaning/Relationships
- Data Dictionaries
- Thesauri
- Ontologies
- Semantic interoperability

15 Time Dependent Information
- Many, perhaps most, datasets change over time and the state at each particular moment in time may be important.
- It may be useful to break the issue into separate parts.
- At each moment in time we could, in principle, take a snapshot and store it. That snapshot has its associated Representation Net.
- Efficient storage of a series of snapshots may lead one to store differences or include time tags in the data
  - Additional Representation Information would be needed which describes how to get to a particular time's snapshot from the efficiently encoded version.
  - Also applies to ANNOTATION: who said what about which and when did they say it
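
The point about storing differences can be made concrete with a minimal sketch: a base snapshot plus time-tagged changes, and a function that rebuilds the state at a chosen time. The function `snapshot_at`, the tuple layout `(time, key, value)` and the sample data are assumptions for illustration; the rule "apply every change up to and including the requested time" is exactly the kind of thing the additional Representation Information would have to record.

```python
def snapshot_at(base, changes, when):
    """Rebuild the dataset state at time `when`.

    base    -- the first full snapshot, as a dict
    changes -- time-tagged differences: (time, key, new_value);
               a new_value of None means the key was deleted
    """
    state = dict(base)  # never modify the stored base snapshot
    for timestamp, key, value in sorted(changes, key=lambda c: c[0]):
        if timestamp > when:
            break  # later changes are not part of this snapshot
        if value is None:
            state.pop(key, None)
        else:
            state[key] = value
    return state


# Hypothetical dataset: one base snapshot and three later changes.
base = {"a": 1}
changes = [(2, "b", 5), (1, "a", 2), (3, "a", None)]

state_t0 = snapshot_at(base, changes, 0)
state_t2 = snapshot_at(base, changes, 2)
state_t3 = snapshot_at(base, changes, 3)
```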

16 Actions and Processes (Behaviour)
- Some information has, as an integral part of its content, an implicit or explicit process associated with it
  - An example of this is a database or other time dependent or reactive system such as a Neural Net.
- Emulations
  - Limited, but may be adequate for rendered document-type data

17 Sharing RepInfo
- RepInfo is needed
- RepInfo is extensive
- May need to “extend” RepInfo as Designated Community and/or its knowledgebase changes
- How can we avoid every Repository repeating the work?
  - Need to control costs
- Need to share the effort

18 Requirements
- Data users: need to be able to obtain pre-identified RepInfo
- Curators: need to be able to find suitable pre-existing RepInfo to re-use, or create RepInfo

19 Registry for Representation Info
- The Digital Object could have RepInfo packed with it, as well as CPID
- Support automated access & processing
- Step 1: User gets data from archive. Data has associated Curation Persistent Identifier (CPID)
- Step 2: User unfamiliar with data so requests Rep.Info. using CPID
- Step 3: User receives Rep.Info, which has its own CPID in case it is not immediately usable
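
The three steps on this slide can be sketched as a lookup loop against a registry. Everything here is hypothetical: the in-memory `REGISTRY`, the `RepInfo` class, the CPID strings and the function `fetch_repinfo_chain` are stand-ins for illustration and do not describe the actual DCC or CASPAR registry interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RepInfo:
    description: str
    # CPID of further RepInfo, in case this RepInfo is not immediately usable.
    repinfo_cpid: Optional[str] = None


def fetch_repinfo_chain(cpid: Optional[str], registry: Dict[str, RepInfo]) -> List[RepInfo]:
    """Follow CPIDs from a data object until no further RepInfo is referenced."""
    chain: List[RepInfo] = []
    visited = set()
    while cpid is not None and cpid not in visited:
        visited.add(cpid)          # guard against circular references
        info = registry[cpid]      # step 2: request RepInfo using the CPID
        chain.append(info)         # step 3: receive the RepInfo
        cpid = info.repinfo_cpid   # it carries its own CPID, so repeat if needed
    return chain


# Hypothetical registry contents.
REGISTRY = {
    "cpid:table-layout": RepInfo("FITS table layout", "cpid:fits-standard"),
    "cpid:fits-standard": RepInfo("FITS standard document"),
}

# Step 1: the data obtained from the archive arrives with this CPID attached.
chain = fetch_repinfo_chain("cpid:table-layout", REGISTRY)
descriptions = [info.description for info in chain]
```

In practice a user would stop following the chain as soon as the RepInfo received is something they already understand, rather than always walking to the end.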

20 Use of RepInfo
- Each “bag of bits” has an associated pointer (CPID) to a Label
- DCC Label: points to other RepInfo
[Diagram labels: External Registry; Label entries of the form Structure = CPID, Semantics = CPID, Rendering s/w = CPID; CPID copy]

21 CASPAR: EU FP6
- Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
- Closely follows DCC Development ideas
- Approx 16 M Euro, 8.8M from EU
- 17 Partners
- Led by CCLRC
  - Co-ordinator: David Giaretta
- See http://www.casparpreserves.eu

22 CASPAR Consortium
See http://www.casparpreserves.eu

23 CASPAR information flow architecture
[Diagram; surviving label: Rep Info]

24 CASPAR Integrated architecture
See http://www.casparpreserves.eu

25 Possible Infrastructure Build-up
[Diagram labels: European Preservation Infrastructure; Task Force on Permanent Access; Alliance; Other Alliance Members; CCLRC Curation Activities; CASPAR; Other CCLRC projects; FP7 projects]
http://tfpa.kb.nl

