1 Archival Storage for Digital Libraries Arturo Crespo Hector Garcia-Molina Stanford University.


1 Archival Storage for Digital Libraries
Arturo Crespo, Hector Garcia-Molina
Stanford University

2 Motivation
Digital information already lost:
– Early NASA records
– U.S. Census information
– Toxic waste records
Decay time for common media:
– Magnetic tapes: 10-20 years
– CD-ROM: 5-50 years
– Hard drive: 3-5 years
Obsolescence of digital media is even faster

3 Preservation of Digital Objects
Data Preservation
Meaning Preservation
Our work only addresses Data Preservation

4 A Case Study: Stanford/MIT CSTR
(Figure: Stanford, MIT, CSTR)
Scenario:
– Need for on-line access to documents
– But also for long-term archival of documents

5 Is This a Solved Database Problem?
Database systems can reliably store objects
However:
– Need same or compatible system
– Migration is problematic
Our architecture coordinates database systems; it does not replace them.

6 Contribution
An architecture and algorithms for:
– Long-term Archival Storage of Digital Objects
– Allowing on-line access to Digital Objects
– Preserving data as technology and organizations evolve

7 Key Concepts
Signatures as Object Handles
Deletions are not allowed
Reliability is achieved through Replication
Layered Architecture
Awareness Everywhere
Disposable Auxiliary Structures

8 Signatures as Object Handles
Object Handles identify objects
– Internal to the Digital Library Repository
– Users may need high level naming facilities
– Traditional approaches
Signatures:
– Checksum or CRC of the object
(Figure: signature = f(object))

9 Properties of Signatures as Object Handles
Each site can generate handles independently
Handles can be reconstructed from the object
Copies automatically have the same handle
Objects with different content have a high probability of having different handles
Cannot modify objects
(Figure: signatures s1, s2 ≠ s1, s3 ≠ s2, s4 = s1)
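The handle properties listed on this slide can be illustrated with a minimal sketch. The slides mention only "checksum or CRC"; the use of SHA-256 here, the function name `handle_for`, and the sample byte strings are all assumptions made for illustration.

```python
import hashlib


def handle_for(content: bytes) -> str:
    """Derive a handle from the object's bytes alone (hypothetical helper).

    SHA-256 is an assumption; the slides only say "checksum or CRC".
    """
    return hashlib.sha256(content).hexdigest()


report = b"Technical report, first edition"
copy_at_other_site = bytes(report)           # an exact copy held elsewhere
revised = b"Technical report, second edition"

h_original = handle_for(report)
h_copy = handle_for(copy_at_other_site)      # computed independently, no coordination
h_revised = handle_for(revised)              # changed content yields a new handle

print(h_original == h_copy)     # copies share a handle
print(h_original == h_revised)  # a modified object is a different object
```

Because the handle depends only on the content, "modifying" an object in place is not possible under this scheme: any change produces a different handle, which is one reading of the "Cannot modify objects" bullet.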

10 Signature Collisions
A very rare event if signatures are 128 bits or more.
Assumes uniform distribution of handles and objects bigger than signatures

Collection Size   Probability of Having Collisions   Signature Size
10^7              10^-9                              76 bits
10^7              10^-24                             128 bits
10^10             10^-18                             128 bits
10^10             10^-57                             256 bits
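The slide does not say how these probabilities were computed. A rough check, assuming the standard birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n uniformly distributed b-bit signatures, gives values within one order of magnitude of the table (at or below the listed figures). The function name `collision_probability` is hypothetical.

```python
import math


def collision_probability(num_objects: int, signature_bits: int) -> float:
    """Birthday approximation for at least one collision among uniform signatures."""
    pairs = num_objects * (num_objects - 1) / 2
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-pairs / 2.0 ** signature_bits)


for n, bits in [(10**7, 76), (10**7, 128), (10**10, 128), (10**10, 256)]:
    print(f"objects={n:.0e} bits={bits}: p ~ {collision_probability(n, bits):.1e}")
```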

11 No Deletions
Objects are never (voluntarily) deleted
This simplifies many things:
– Distinguishes between deleted and corrupted objects (improving reliability)
– Easier recovery from failures
However, it complicates others:
– “Wasted” space
– Version management

12 No Deletions
The no-deletion rule is natural in Digital Libraries
Wasted space is not critical as:
– Storage cost is low
– Only archiving library objects, not all possible data
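One way to read the reliability argument: if nothing is ever deleted on purpose, a handle that was once stored but is now missing or mismatched can only indicate a failure. The sketch below assumes an in-memory store and SHA-256 handles; the class `AppendOnlyStore` and its methods are hypothetical, not the authors' interface.

```python
import hashlib


class AppendOnlyStore:
    """Hypothetical store: objects can be added but never deleted."""

    def __init__(self):
        self._objects = {}  # handle -> object bytes

    def put(self, content: bytes) -> str:
        handle = hashlib.sha256(content).hexdigest()
        # Storing the same content twice is a no-op: same bytes, same handle.
        self._objects.setdefault(handle, content)
        return handle

    def check(self, handle: str) -> str:
        """Classify a handle that is known to have been stored earlier."""
        content = self._objects.get(handle)
        if content is None:
            return "lost"       # no voluntary deletes, so absence means failure
        if hashlib.sha256(content).hexdigest() != handle:
            return "corrupted"  # stored bytes no longer match their signature
        return "ok"


store = AppendOnlyStore()
h = store.put(b"archived document")
status_before = store.check(h)

store._objects[h] = b"archived d0cument"  # simulate silent media decay
status_after = store.check(h)
print(status_before, status_after)
```

There is deliberately no `delete` method, so every "lost" or "corrupted" result can be treated as a signal to recover the object from a replica.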

13 Version Management
“Natural” structure for versions
(Figure: Version Object, Object O1, Object O2)

14 Versions
How can we find the latest version?
(Figure: Version Object with tuples Version1 and Version2 (latest); Object O1, Object O2)
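The slide poses the question without giving the answer in the transcript. A minimal sketch of one possibility, assuming each tuple in the version object carries an explicit ordering key alongside the handle of the versioned object; `VersionTuple`, its `number` field, and the placeholder handles are hypothetical.

```python
from typing import NamedTuple


class VersionTuple(NamedTuple):
    number: int  # hypothetical ordering key; the slides do not specify one
    handle: str  # signature of the object that holds this version


def latest(version_object):
    """Return the tuple with the highest version number."""
    return max(version_object, key=lambda t: t.number)


# A version object modelled as a set of tuples, as in the slide's figure.
version_object = {
    VersionTuple(1, "handle-of-O1"),
    VersionTuple(2, "handle-of-O2"),
}
print(latest(version_object).handle)
```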

15 Sets
(Figure: Set Object with tuples Member 1 and Member 2; Object O1, Object O2)

16 Reliability Service
Long term persistence is achieved by replication
Sites establish Replication Agreements to maintain copies of objects in a given Replication Group
(Figure: Stanford, MIT)

17 Reliability Service
Initial State:
(Figure: sites Stanford and MIT; Version Object V containing V1)

18 Reliability Service
(Figure: sites Stanford and MIT; Version Object V containing V1; a second Version Object)

19 Reliability Service
(Figure: sites Stanford and MIT; each holds a copy of Version Object V containing V1)

20 Reliability Service
(Figure: sites Stanford and MIT; one copy of Version Object V contains V1, V2; the other contains V1, V3)

21 Reliability Service
(Figure: Stanford holds Version Object V containing V1, V2, V3; MIT holds Version Object V containing V1, V2, V3)
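The transcript shows only the before and after states of the two sites, not the reconciliation algorithm. A set union is one assumption consistent with the final state shown, and it is safe precisely because tuples are never deleted. The function `reconcile` and the string labels are hypothetical.

```python
def reconcile(*replicas):
    """Merge diverged copies of a version object by taking the union of their tuples.

    This is an assumed strategy; with no deletions, union never resurrects
    anything that was meant to be gone.
    """
    return frozenset().union(*replicas)


stanford = frozenset({"V1", "V2"})  # Stanford added V2 locally
mit = frozenset({"V1", "V3"})       # MIT added V3 locally

merged = reconcile(stanford, mit)
stanford = mit = merged             # both sites converge on the same state
print(sorted(merged))
```

Union is commutative and idempotent, so the sites reach the same result regardless of which one initiates the exchange or how often it is repeated.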

22 Layered Architecture
User Access
Security and Accounting
Import
Metadata and Indexing
Reliability
Complex Object
Identity
Object Store
Data Store

23 Awareness Everywhere
Awareness services: standing orders, subscriptions, alerts, etc.
Critical for Digital Libraries
Should be part of the interface of every layer.
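As a rough illustration of awareness being "part of the interface of every layer", the sketch below gives a layer a subscription method alongside its normal operations. The `Layer` class, its method names, and the event vocabulary are hypothetical; the slides do not define an API.

```python
class Layer:
    """Hypothetical base class: every layer exposes subscribe() in its interface."""

    def __init__(self, name):
        self.name = name
        self._subscribers = []

    def subscribe(self, callback):
        """Register a standing order: callback(layer_name, event, handle)."""
        self._subscribers.append(callback)

    def notify(self, event, handle):
        """Alert every subscriber about an event on the given object handle."""
        for callback in self._subscribers:
            callback(self.name, event, handle)


alerts = []
object_store = Layer("Object Store")
object_store.subscribe(lambda layer, event, handle: alerts.append((layer, event, handle)))
object_store.notify("created", "handle-123")
print(alerts)
```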

24 Disposable Auxiliary Structures
Auxiliary Structures can be reconstructed from the Digital Objects
– Avoid potential inconsistencies
– Easier to migrate objects
Example: Index of disk-ids to object handles
(Figure: disks D1, D2; Identity Layer; Index with entry Handle, D1)
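The index example can be sketched as follows: because handles are derived from content, the index relating handles to disk-ids can be thrown away and rebuilt by scanning the disks. The function `rebuild_index`, the dictionary layout of `disks`, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib


def rebuild_index(disks):
    """Rebuild the handle -> disk-ids index purely from the stored objects.

    `disks` maps a disk-id to the list of object byte strings on that disk
    (a hypothetical stand-in for scanning real media).
    """
    index = {}
    for disk_id, objects in disks.items():
        for content in objects:
            handle = hashlib.sha256(content).hexdigest()
            index.setdefault(handle, set()).add(disk_id)
    return index


disks = {
    "D1": [b"object A", b"object B"],
    "D2": [b"object A"],
}
index = rebuild_index(disks)
handle_a = hashlib.sha256(b"object A").hexdigest()
print(sorted(index[handle_a]))
```

Since the index holds no information that is not already in the objects, it never needs to be migrated or kept consistent; after a move to new media it is simply rebuilt.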

25 Related Work
Other Digital Library architectures
Report of the task force on preserving digital information
Petal and Frangipani projects
COLD systems

26 Conclusions
Architecture for long-term archiving of digital objects
Allows efficient on-line access
Simple, yet powerful repository built by:
– using signatures as handles
– not allowing deletions
– having awareness services everywhere
– using only disposable auxiliary structures

