Manfred Thaller Universität zu Köln Preserving for 2016, 2106, 3006 Or Is there a life for an object outside a digital library?

1 Manfred Thaller Universität zu Köln manfred.thaller@uni-koeln.de Preserving for 2016, 2106, 3006 Or Is there a life for an object outside a digital library? DELOS International Summer School 2006 San Miniato 4-9 June 2006

2 DELOS Int SS2006, San Miniato, © Manfred Thaller, Universität zu Köln 2 A persistent object

3 A persistent object: Authenticity, Integrity, Metadata, Context, Easily usable

4 A persistent object: Authenticity, Integrity, Metadata, Context, Easily usable, Discussable

5 A persistent object: Authenticity, Integrity, Metadata, Context, Easily usable, Discussable No

6 A persistent object: Authenticity, Integrity, Metadata, Context, Easily usable, Discussable No

7 A persistent object: Authenticity, Integrity, Metadata, Context, Easily usable, Discussable No

8 A persistent object: Authenticity, Integrity, Metadata, Context, Easily usable, Discussable No 1799 - 1821

9 Persistent until: 2016. No major breakdown of civil society. 'Library' system continues to function without serious interruption. No fundamental change in underlying technology. No major 'holes' in the relevant WWW.

10 Persistent until: 2016. Assumption therefore: Persistency is a function of the overall system.

11 Persistent until: 2106. No major breakdown of civil society. 'Library' system continues to function without serious interruption. No fundamental change in underlying technology. No major 'holes' in the relevant WWW.

12 Persistent until: 2106. No major breakdown of civil society. 'Library' changes functional assumptions without serious interruption of service. No fundamental change in underlying technology. No major 'holes' in the relevant WWW.

13 Persistent until: 2106. No major breakdown of civil society. 'Library' changes functional assumptions without serious interruption of service. Fundamental changes in underlying technology. No major 'holes' in the relevant WWW.

14 Persistent until: 2106. No major breakdown of civil society. 'Library' changes functional assumptions without serious interruption of service. Fundamental changes in underlying technology. Major 'holes' in the relevant WWW.

15 Persistent until: 2106. Assumption: Persistent storage media are "around the corner". (Holographic storage, storage crystals.) Question: Can a digital object be revived in 2106 if the library does not care for it after 2016?

16 Persistent until: 2106. Why not? Bit stream deterioration. Authenticity not guaranteed. Metadata get lost. Context gets lost.

17 Bit stream deterioration. An image file before ...

18 Bit stream deterioration ... and after one byte is changed.

19 Bit stream deterioration ... and after one byte is changed. Undetectable by software.

20 Bit stream deterioration. Sketch of a technical solution. Underlying assumption: bit stream deterioration becomes less of a problem if "files" are designed for persistency.

21 Bit stream deterioration. Proposal 1: Measure file robustness. Proposed metric: A file is m / n robust if you can change m arbitrarily selected bytes of the stored data without affecting more than n of the payload bytes of the file. Background terminology: Any file format can be described as consisting of a processing dictionary (roughly: technical metadata) and a payload, which represents the information presented to the user. Proposed implementation: Apply a random change to m randomly selected bytes at least one thousand to one million times and take the mean number of affected payload bytes.
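The proposed measurement can be tried out on toy data. The following is a minimal sketch, not the author's implementation: it assumes two hypothetical "formats" (the payload stored as-is, and the payload stored as a zlib stream), damages m randomly chosen stored bytes per trial, and reports the mean number of payload bytes that no longer match. A file that can no longer be decoded at all is counted as a total loss. All function names are illustrative.

```python
import random
import zlib
from typing import Callable, Optional


def encode_raw(payload: bytes) -> bytes:
    """Hypothetical format 1: the stored bytes are the payload itself."""
    return payload


def decode_raw(stored: bytes) -> Optional[bytes]:
    return stored


def decode_deflate(stored: bytes) -> Optional[bytes]:
    """Hypothetical format 2: a zlib stream; None means 'unreadable'."""
    try:
        return zlib.decompress(stored)
    except zlib.error:
        return None


def affected_payload_bytes(original: bytes, decoded: Optional[bytes]) -> int:
    """Count payload bytes that differ; an unreadable file counts as fully lost."""
    if decoded is None:
        return len(original)
    differing = sum(a != b for a, b in zip(original, decoded))
    return differing + abs(len(original) - len(decoded))


def mean_affected(payload: bytes,
                  encode: Callable[[bytes], bytes],
                  decode: Callable[[bytes], Optional[bytes]],
                  m: int = 1, trials: int = 1000, seed: int = 0) -> float:
    """Mean number of affected payload bytes after changing m stored bytes."""
    rng = random.Random(seed)
    stored = encode(payload)
    total = 0
    for _ in range(trials):
        damaged = bytearray(stored)
        for pos in rng.sample(range(len(damaged)), m):
            damaged[pos] ^= rng.randrange(1, 256)  # XOR with non-zero: byte always changes
        total += affected_payload_bytes(payload, decode(bytes(damaged)))
    return total / trials


if __name__ == "__main__":
    payload = bytes(range(256)) * 4
    print("raw    :", mean_affected(payload, encode_raw, decode_raw))
    print("deflate:", mean_affected(payload, zlib.compress, decode_deflate))
```

Under these assumptions the uncompressed format loses exactly one payload byte per changed byte, while the compressed format usually loses the whole payload, which is the kind of difference the metric is meant to expose.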

22 Bit stream deterioration. Proposal 2: Measure error awareness. Proposed metric: A file / file reader is n error aware if you can change at most n arbitrarily selected bytes of the stored data without the change being detected at every attempt at reading. Background terminology: Any file format which has predicted lengths for each reading operation, plus some additional information on the result of the reading operation, has this property to some degree. Proposed implementation: Experiment to understand the situation better.
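One way to start the proposed experiment is sketched below. It assumes a hypothetical chunk layout, loosely modelled on PNG chunks: a predicted length, the data, and a CRC-32 as the "additional information on the result of the reading operation". The experiment changes one randomly chosen stored byte per trial and counts how often the reader notices. The layout and function names are assumptions for illustration only.

```python
import random
import struct
import zlib
from typing import Tuple


def write_chunk(data: bytes) -> bytes:
    """Store data as: 4-byte length, data, 4-byte CRC-32 of the data."""
    return struct.pack(">I", len(data)) + data + struct.pack(">I", zlib.crc32(data))


def read_chunk(stored: bytes) -> Tuple[bytes, bool]:
    """Return (data, ok); ok is False when the reader notices damage."""
    if len(stored) < 8:
        return b"", False
    (length,) = struct.unpack(">I", stored[:4])
    if length != len(stored) - 8:
        return b"", False  # the predicted length does not match what was read
    data = stored[4:4 + length]
    (crc,) = struct.unpack(">I", stored[4 + length:])
    return data, zlib.crc32(data) == crc


def detection_rate(payload: bytes, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of single-byte changes that the reader detects."""
    rng = random.Random(seed)
    stored = write_chunk(payload)
    detected = 0
    for _ in range(trials):
        damaged = bytearray(stored)
        pos = rng.randrange(len(damaged))
        damaged[pos] ^= rng.randrange(1, 256)  # guaranteed to change the byte
        _, ok = read_chunk(bytes(damaged))
        detected += not ok
    return detected / trials


if __name__ == "__main__":
    print(detection_rate(b"some payload bytes " * 20))
```

In this sketch every single-byte change is detected, since it alters either the length, the data covered by the CRC, or the CRC itself. Extending the experiment to several changed bytes would show where detection starts to fail.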

23 Bit stream deterioration. Proposal 3: Improve relevant file qualities - Hardening. Proposed metric: A file is n hardened if it contains n synchronized redundant copies of the processing dictionary. Background terminology: Two chunks of data are synchronized if a processing environment guarantees that they are always changed in parallel. Proposed implementation: Create TIFF / PNG writers / readers which signal, by an additional tag / chunk, the further copies of the processing dictionary.
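The idea of hardening can be illustrated without touching TIFF or PNG. The sketch below assumes a hypothetical container (the marker `HRD1` and the layout are invented for this example): the writer stores n identical copies of the processing dictionary in one step, so they are synchronized by construction, and the reader takes a majority vote among the copies.

```python
import json
import struct
from collections import Counter
from typing import Tuple

MAGIC = b"HRD1"  # hypothetical marker, not a registered format


def harden(dictionary: dict, payload: bytes, copies: int = 3) -> bytes:
    """Write `copies` identical copies of the processing dictionary, then the payload."""
    blob = json.dumps(dictionary, sort_keys=True).encode("utf-8")
    header = MAGIC + struct.pack(">BI", copies, len(blob))
    return header + blob * copies + payload


def read_hardened(stored: bytes) -> Tuple[dict, bytes]:
    """Recover the dictionary by majority vote among its copies."""
    if stored[:4] != MAGIC:
        raise ValueError("not a hardened file")
    copies, size = struct.unpack(">BI", stored[4:9])
    start = 9
    candidates = [stored[start + i * size:start + (i + 1) * size]
                  for i in range(copies)]
    winner, votes = Counter(candidates).most_common(1)[0]
    if votes <= copies // 2:
        raise ValueError("no majority among dictionary copies")
    return json.loads(winner), stored[start + copies * size:]


if __name__ == "__main__":
    stored = bytearray(harden({"width": 2, "height": 2}, b"\x00\x01\x02\x03"))
    stored[10] ^= 0xFF  # damage a byte inside the first dictionary copy
    print(read_hardened(bytes(stored)))
```

Note that the small fixed header (copy count and size) remains a single point of failure in this sketch; a real design would have to protect it as well.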

24 Bit stream deterioration. Proposal 4: Improve processing capabilities – Self repairing. Definition: A file is self repairing if a reader is able to recover after discovering that internal data are missing. Example: PDF files tolerate modest distortions, as they are able to identify the beginning of major sections within the file.
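The recovery behaviour described here can be sketched with a toy format. This is not how PDF works internally; it is a hypothetical layout in which every section starts with a recognizable marker followed by its length and CRC-32. A reader that hits a damaged section scans forward to the next marker and carries on. The marker value and function names are assumptions.

```python
import struct
import zlib
from typing import Iterable, List

SYNC = b"\xffSEC"  # hypothetical section marker


def write_sections(sections: Iterable[bytes]) -> bytes:
    """Each section: marker, 4-byte length, 4-byte CRC-32, body."""
    out = bytearray()
    for body in sections:
        out += SYNC + struct.pack(">II", len(body), zlib.crc32(body)) + body
    return bytes(out)


def read_sections(stored: bytes) -> List[bytes]:
    """Return all sections that can still be verified, skipping damaged ones."""
    recovered = []
    pos = stored.find(SYNC)
    while pos != -1:
        head = stored[pos + 4:pos + 12]
        if len(head) == 8:
            length, crc = struct.unpack(">II", head)
            body = stored[pos + 12:pos + 12 + length]
            if len(body) == length and zlib.crc32(body) == crc:
                recovered.append(body)
                pos = stored.find(SYNC, pos + 12 + length)
                continue
        # Damaged section: resynchronize at the next marker.
        pos = stored.find(SYNC, pos + 1)
    return recovered


if __name__ == "__main__":
    stored = bytearray(write_sections([b"alpha", b"bravo", b"charlie"]))
    stored[30] ^= 0xFF  # damage one byte inside the body of the second section
    print(read_sections(bytes(stored)))
```

The damaged section is lost, but the sections before and after it survive, which is the property the definition asks for.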

25 Authenticity not guaranteed. Problem: While paper has physical properties which can be evaluated, digital documents do not. Solution: Add digital signatures, recognisable by dedicated software, registered with a suitable authority.

26 Authenticity not guaranteed. Problem: While paper has physical properties which can be evaluated, digital documents do not. Solution: Add digital signatures, recognisable by dedicated software, registered with a suitable authority. Violates assumption of change of framework.

27 Authenticity not guaranteed. Problem: While paper has physical properties which can be evaluated, digital documents do not. Proposal: Insert a fingerprint of the institution (potentially of the individual PC) implicitly into every file generated.

28 Authenticity not guaranteed. Binary file sealing: 1) Modify payload to provide parity within small form in byte stream. 2) Select arbitrary start address within payload. 3) Build path of parity forms.

29 Authenticity not guaranteed. Problem: While paper has physical properties which can be evaluated, digital documents do not. Proposal: Insert a fingerprint of the institution (potentially of the individual PC) implicitly into every file generated. Problem: Incompatible with the logic of storing text as XML.

30 Metadata get lost. "Metadata" and data are stored separately in current information system designs. Take an image database as an example.

31 Metadata get lost.

32 Metadata get lost.

33 Metadata get lost. "thumbs.db, but more so"

34 Metadata get lost. MA thesis Jan Schnasse: http://lehre.hki.uni-koeln.de/~schnasse/ediod/; schnasse@gmx.de

35 Metadata get lost. MA thesis Jan Schnasse: http://lehre.hki.uni-koeln.de/~schnasse/ediod/; schnasse@gmx.de

36 Metadata get lost. MA thesis Jan Schnasse: http://lehre.hki.uni-koeln.de/~schnasse/ediod/; schnasse@gmx.de

37 Metadata get lost. MA thesis Jan Schnasse: http://lehre.hki.uni-koeln.de/~schnasse/ediod/; schnasse@gmx.de

38 Metadata get lost. MA thesis Jan Schnasse: http://lehre.hki.uni-koeln.de/~schnasse/ediod/; schnasse@gmx.de

39 Context gets lost... More on properties of preservation aware files: Localized. Definition: A file is localized if a reader can process it without accessing a remote server. Counterexample: Virtually all XML-based standards of the DL community assume that a program processing the file has access to a fully operational web, preferably in the structure of 2005, and / or to functioning authorities such as a URN resolution mechanism. Solution: Snapshot of referred components.
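Before referred components can be snapshotted, they have to be found. The sketch below is one rough way to list them, under stated assumptions: it only inspects attribute values of an XML document, and it treats anything that looks like an `http(s)`, `ftp` or `urn:` reference as remote. References hidden in element text or in a DTD are not covered, and namespace declarations are ignored because they are identifiers rather than fetched resources. The function names are illustrative.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed notion of "remote": URL schemes and URNs that need a resolver.
REMOTE = re.compile(r"\b(?:https?|ftp)://|\burn:", re.IGNORECASE)


def remote_references(xml_text: str) -> List[Tuple[str, str, str]]:
    """List (element tag, attribute name, value) for attributes pointing off-file."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        for name, value in element.attrib.items():
            if REMOTE.search(value):
                found.append((element.tag, name, value))
    return found


def is_localized(xml_text: str) -> bool:
    """True if no attribute refers to a remote resource (a necessary, not sufficient, check)."""
    return not remote_references(xml_text)


if __name__ == "__main__":
    doc = (
        '<record xmlns:xlink="http://www.w3.org/1999/xlink">'
        '<image xlink:href="http://example.org/scan.tif"/>'
        '<image xlink:href="scan-local.tif"/>'
        '</record>'
    )
    for tag, attr, value in remote_references(doc):
        print(tag, attr, value)
    print("localized:", is_localized(doc))
```

Each reported reference is a candidate for the snapshot: the referred component would be stored alongside the file and the reference rewritten to point at the local copy.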

40 Context gets lost... More on properties of preservation aware files: Autonomous. Definition: A file is autonomous if a reader can process it without accessing another file. Counterexample: A PDF is usually not autonomous in the strict sense, as it assumes that font information is available. Could be discussed at length, as it comes down to the question of which resources are defined as part of the processing environment. Solution: "Discuss at length".

41 Context gets lost... More on properties of preservation aware files: Selfdocumenting. Definition: A file is selfdocumenting if it contains, as part of the processing dictionary, a complete set of metadata. Solution: Register appropriate tags / chunks with the TIFF / PNG authorities.
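Carrying metadata inside the image file can already be tried with existing means. The sketch below does not register a new chunk type; as a stand-in it uses the standard PNG `tEXt` chunk (keyword, a zero byte, Latin-1 text) and inserts it before `IEND`. The helper names and the one-pixel test image are assumptions made for the example.

```python
import struct
import zlib
from typing import Dict, Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, data: bytes) -> bytes:
    """PNG chunk: 4-byte length, 4-byte type, data, CRC-32 over type and data."""
    return (struct.pack(">I", len(data)) + kind + data
            + struct.pack(">I", zlib.crc32(kind + data)))


def minimal_png() -> bytes:
    """A 1x1 8-bit grayscale image, built by hand for the example."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0, one black pixel
    return PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")


def iter_chunks(png: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, data) for each chunk; CRCs are not verified here."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length


def add_text(png: bytes, keyword: str, text: str) -> bytes:
    """Insert a tEXt chunk with the given metadata just before IEND."""
    body = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    out = PNG_SIGNATURE
    for kind, data in iter_chunks(png):
        if kind == b"IEND":
            out += chunk(b"tEXt", body)
        out += chunk(kind, data)
    return out


def read_text(png: bytes) -> Dict[str, str]:
    """Collect all tEXt metadata found in the file."""
    result = {}
    for kind, data in iter_chunks(png):
        if kind == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            result[keyword.decode("latin-1")] = text.decode("latin-1")
    return result


if __name__ == "__main__":
    png = add_text(minimal_png(), "Source", "Archive shelf mark 42")
    print(read_text(png))
```

A generic text chunk is not yet a "complete set of metadata", but it shows that the metadata can travel with the image instead of living in a separate database.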

42 Context gets lost... More on properties of preservation aware files: Preservation encapsulated. Definition: A file is preservation encapsulated if it starts with a preservation header, acting as processing dictionary for a subset of the capabilities defined from "hardening" to "self documenting", and continues with a standard file of a recognized standard format. Solution: Well, if we can register URNs, why should we not be able to maintain an encapsulation format?

43 Context gets lost... More on properties of preservation aware files: Preservation encapsulated. Definition: A file is preservation encapsulated if it starts with a preservation header, acting as processing dictionary for a subset of the capabilities defined from "hardening" to "self documenting", and continues with a standard file of a recognized standard format. Solution: Well, if we can register URNs, why should we not be able to maintain an encapsulation format? Registry violates autonomy, however.
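The shape of such an encapsulation can be sketched in a few lines. No encapsulation format is defined in the slides, so everything below is hypothetical: the marker `PRESENC1`, the JSON header and the field names are chosen only to show a preservation header followed by an untouched standard file.

```python
import json
import struct
from typing import Tuple

MAGIC = b"PRESENC1"  # hypothetical marker for the encapsulation format


def encapsulate(standard_file: bytes, header: dict) -> bytes:
    """Prefix a standard file with a length-delimited preservation header."""
    blob = json.dumps(header, sort_keys=True).encode("utf-8")
    return MAGIC + struct.pack(">I", len(blob)) + blob + standard_file


def unwrap(stored: bytes) -> Tuple[dict, bytes]:
    """Return (preservation header, original standard file)."""
    if not stored.startswith(MAGIC):
        raise ValueError("no preservation header found")
    start = len(MAGIC) + 4
    (size,) = struct.unpack(">I", stored[len(MAGIC):start])
    header = json.loads(stored[start:start + size])
    return header, stored[start + size:]


if __name__ == "__main__":
    original = b"\x89PNG\r\n\x1a\n...rest of a standard image file..."
    wrapped = encapsulate(original, {
        "format": "PNG",                 # what follows the header
        "capabilities": ["hardened"],    # subset of the properties listed above
        "description": "Scan of a charter, page 1",
    })
    header, recovered = unwrap(wrapped)
    print(header["format"], recovered == original)
```

Because the header in this sketch describes itself in plain text inside the file, reading it does not depend on a registry; whether that is enough to avoid the autonomy problem noted on the slide is left open.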

44 Group Exercise. Background: (1) Assume that sometime between now and 2106 your institution gets into a serious financial crisis. ("Serious" means, e.g., budget cuts to, not by, 5 % of the previous year's budget.) (2) Your digital data have to survive for 30 years on their own under extremely bad conditions, where random deletions are a certainty.

45 Group Exercise. Planning for that now: (1) What would be the most sensible autonomous unit to divide your holdings into, making each unit able to survive on its own? (2) Units which are too small are disastrously redundant; units which are too big are disastrously vulnerable.

46 Group Exercise. Notice to people with IT training: The usual warnings against redundancy in database design relate to "living" databases and aim at avoiding "anomalies", which cannot occur in long-term storage. (If you do not understand the above, you may safely ignore it.)

47 Persistent until: 2016. No major breakdown of civil society. 'Library' system continues to function without serious interruption. No fundamental change in underlying technology. No major 'holes' in the relevant WWW.

48 Persistent until: 3006. Major breakdown of civil society. 'Library' system continues to function without serious interruption. No fundamental change in underlying technology. No major 'holes' in the relevant WWW.

49 Persistent until: 3006. Major breakdown of civil society. 'Library' system seriously interrupted. No fundamental change in underlying technology. No major 'holes' in the relevant WWW.

50 Persistent until: 3006. Major breakdown of civil society. 'Library' system seriously interrupted. 'n' fundamental changes in underlying technology. No major 'holes' in the relevant WWW.

51 Persistent until: 3006. Major breakdown of civil society. 'Library' system seriously interrupted. 'n' fundamental changes in underlying technology. WWW completely replaced by another type of connectivity.

52 Persistent until: 3006. Any chance at all?

53 Persistent until: 3006. Any chance at all? No real answer, but some stuff for thinking.

54 Persistent until: 3006. Is this information?

55 Persistent until: 3006. Is this information?

56 Persistent until: 3006. Is this information?

57 Persistent until: 3006. Is this information?

58 Persistent until: 3006. 1. Recognizing information 2. Technological assumptions 3. Cultural assumptions 4. Processing assumptions

59 Persistent until: 3006. 1. Announcement headers that introduce the assumptions hierarchically? 2. Preservation encapsulation with different horizons?

60 Group Exercise. (1) List all the assumptions that the material stored in printed form in your institution makes about the background knowledge of the reader. (2) Design an "announcement header" for such information.

