1 Archival Storage for Digital Libraries
Arturo Crespo, Hector Garcia-Molina
Stanford University
2 Motivation
Digital information already lost:
  – Early NASA records
  – U.S. Census information
  – Toxic waste records
Decay time for common media:
  – Magnetic tapes: 10-20 years
  – CD-ROM: 5-50 years
  – Hard drive: 3-5 years
Obsolescence of digital media is even faster
3 Preservation of Digital Objects
Data Preservation
Meaning Preservation
Our work addresses only Data Preservation
4 A Case Study: Stanford/MIT CSTR
[Figure: Stanford, MIT, CSTR]
Scenario:
  – Need for on-line access to documents
  – But also for long-term archival of documents
5 Is This a Solved Database Problem?
Database systems can reliably store objects
However:
  – Need same or compatible system
  – Migration is problematic
Our architecture coordinates database systems; it does not replace them.
6 Contribution
An architecture and algorithms for:
  – Long-term Archival Storage of Digital Objects
  – Allowing on-line access to Digital Objects
  – Preserving data as technology and organizations evolve
7 Key Concepts
Signatures as Object Handles
Deletions are not allowed
Reliability is achieved through Replication
Layered Architecture
Awareness Everywhere
Disposable Auxiliary Structures
8 Signatures as Object Handles
Object Handles identify objects
  – Internal to the Digital Library Repository
  – Users may need high level naming facilities
  – Traditional approaches
Signatures:
  – Checksum or CRC of the object
[Figure: object, f( ), signature]
9 Properties of Signatures as Object Handles
Each site can generate handles independently
Handles can be reconstructed from the object
Copies automatically have the same handle
Objects with different content have a high probability of having different handles
Cannot modify objects
[Figure: handles s1, s2, s3, s4, with s4 = s1]
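The handle properties above can be illustrated with a minimal sketch. It assumes SHA-256 as the signature function f (the slides only say "checksum or CRC"); the name `make_handle` and the sample byte strings are hypothetical.

```python
import hashlib


def make_handle(obj: bytes) -> str:
    """Return a handle derived only from the object's content."""
    # SHA-256 is an assumption; the slides only call for a checksum or CRC.
    return hashlib.sha256(obj).hexdigest()


original = b"example archived document"
copy = bytes(original)                  # a replica held at another site
modified = original + b" (revised)"     # any change yields a new object

# Copies automatically share a handle; no coordination between sites is needed.
same_handle = make_handle(original) == make_handle(copy)

# Different content gives a different handle with high probability,
# which is why an object cannot be modified in place under its old handle.
different_handle = make_handle(original) != make_handle(modified)
```

Because the handle is a pure function of the content, a site that loses its handle table can recompute every handle from the stored objects.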
10 Signature Collisions
A very rare event if signatures are 128 bits or more.
Assumes uniform distribution of handles and objects bigger than signatures

Collection Size    Probability of having Collisions    Signature Size
10^7               10^-9                               76 bits
10^7               10^-24                              128 bits
10^10              10^-18                              128 bits
10^10              10^-57                              256 bits
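The orders of magnitude in the table can be checked with the standard birthday-bound approximation, p ≈ n(n-1) / 2^(b+1), for n objects and b-bit signatures under the uniform-distribution assumption stated on the slide. This is a sketch, not necessarily the formula the authors used; the function name `collision_probability` is hypothetical.

```python
def collision_probability(num_objects: int, signature_bits: int) -> float:
    """Birthday-bound approximation of the chance of at least one collision."""
    # Number of distinct object pairs that could collide.
    pairs = num_objects * (num_objects - 1) / 2
    # Each pair collides with probability 2^-b for uniform b-bit signatures.
    return pairs / 2 ** signature_bits


# (collection size, signature size in bits, bound quoted on the slide)
rows = [
    (10**7, 76, 1e-9),
    (10**7, 128, 1e-24),
    (10**10, 128, 1e-18),
    (10**10, 256, 1e-57),
]

# True for a row when the approximation stays below the slide's figure.
within_bound = [collision_probability(n, b) < bound for n, b, bound in rows]
```

Under this approximation every row stays below the figure quoted on the slide, so the slide's values read as conservative upper bounds.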
11 No Deletions
Objects are never (voluntarily) deleted
This simplifies many things:
  – Distinguishes between deleted and corrupted objects (improving reliability)
  – Easier recovery from failures
However, it complicates others:
  – “Wasted” space
  – Version management
12 No Deletions
The no-deletion rule is natural in Digital Libraries
Wasted space is not critical because:
  – Storage cost is low
  – Only library objects are archived, not all possible data
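A minimal sketch of why the no-deletion rule helps reliability: if the store offers no delete operation, a handle that no longer resolves can only mean a failure, never an intentional removal. The class `AppendOnlyStore`, its methods, and the status strings are hypothetical, and SHA-256 again stands in for the signature function.

```python
import hashlib


class AppendOnlyStore:
    """In-memory object store with put and check, and deliberately no delete."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}  # handle -> object content

    def put(self, obj: bytes) -> str:
        handle = hashlib.sha256(obj).hexdigest()
        # Storing the same content twice is harmless: same handle, same bytes.
        self._objects.setdefault(handle, obj)
        return handle

    def check(self, handle: str) -> str:
        obj = self._objects.get(handle)
        if obj is None:
            # Nothing is ever deleted, so a missing object must have been lost.
            return "lost"
        if hashlib.sha256(obj).hexdigest() != handle:
            # Content no longer matches its own signature.
            return "corrupted"
        return "ok"
```

In both failure cases the recovery action is the same: fetch a good copy from another site, with no need to ask whether the object was meant to disappear.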
13 Version Management
“Natural” structure for versions
[Figure: Version Object, Object O1, Object O2]
14 Versions
How can we find the latest version?
[Figure: Version Object tuple with entries Version1 and Version2 (latest); Object O1, Object O2]
15 Sets
[Figure: Set Object tuple with entries Member 1 and Member 2; Object O1, Object O2]
16 Reliability Service
Long term persistence is achieved by replication
Sites establish Replication Agreements to maintain copies of objects in a given Replication Group
[Figure: Stanford, MIT]
17 Reliability Service
Initial State:
[Figure: Stanford, MIT; Version Object V containing V1]
18 Reliability Service
[Figure: Stanford, MIT; Version Object V containing V1; a second Version Object]
19 Reliability Service
[Figure: Stanford, MIT; Version Object V containing V1 at each site]
20 Reliability Service
[Figure: Stanford, MIT; one Version Object V containing V1, V2; the other Version Object V containing V1, V3]
21 Reliability Service
[Figure: Stanford: Version Object V containing V1, V2, V3; MIT: Version Object V containing V1, V2, V3]
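The end state of the animation (both sites holding V1, V2, V3) is what a set-union reconciliation would produce. The sketch below assumes that reading; the slides do not spell out the actual algorithm, and which site held V2 and which held V3 is also an assumption. The function name `reconcile` is hypothetical.

```python
def reconcile(site_a: set[str], site_b: set[str]) -> set[str]:
    """Merge two replicas of a version object by taking their union."""
    # Union is safe only because nothing is ever deleted: an entry missing
    # from one replica was simply never received there.
    return site_a | site_b


# Diverged replicas of version object V (assignment to sites is assumed).
stanford = {"V1", "V2"}
mit = {"V1", "V3"}

merged = reconcile(stanford, mit)

# After reconciliation both sites hold the same contents.
stanford, mit = set(merged), set(merged)
```

Union is commutative and idempotent, so the sites can reconcile in either order and repeat the exchange without changing the result.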
22 Layered Architecture
User Access
Security and Accounting
Import
Metadata and Indexing
Reliability
Complex Object
Identity
Object Store
Data Store
23 Awareness Everywhere
Awareness services: standing orders, subscriptions, alerts, etc.
Critical for Digital Libraries
Should be part of the interface of every layer.
24 Disposable Auxiliary Structures
Auxiliary Structures can be reconstructed from the Digital Objects
  – Avoid potential inconsistencies
  – Easier to migrate objects
Example: Index of disk-ids to object handles
[Figure: D1, D2; Identity Layer; Index with entry Handle, D1]
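A minimal sketch of the index example: because handles are signatures of the content, the index can be thrown away and rebuilt by scanning the stored objects. The function `rebuild_index`, the dictionary layout, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def rebuild_index(disk: dict[str, bytes]) -> dict[str, str]:
    """Recompute the handle -> disk-id index from the objects themselves."""
    index: dict[str, str] = {}
    for disk_id, content in disk.items():
        # The handle is derived from the content, so no saved state is needed.
        index[hashlib.sha256(content).hexdigest()] = disk_id
    return index


# Objects as they sit on storage, keyed by disk-id (sample data).
disk = {"D1": b"first object", "D2": b"second object"}

index = rebuild_index(disk)
handle_of_first = hashlib.sha256(b"first object").hexdigest()
location = index[handle_of_first]
```

Migration then only has to move the objects; the index is regenerated on the new system instead of being converted.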
25 Related Work
Other Digital Library architectures
Report of the task force on preserving digital information
Petal and Frangipani projects
COLD systems
26 Conclusions
Architecture for long-term archiving of digital objects
Allows efficient on-line access
Simple, yet powerful repository built by:
  – using signatures as handles
  – not allowing deletions
  – having awareness services everywhere
  – using only disposable auxiliary structures