Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University.


1 Peer-to-peer archival data trading Brian Cooper and Hector Garcia-Molina Stanford University

2 Data trading 2 Problem: fragile data. Data: easy to create, hard to preserve. Broken tapes; human deletions; going out of business.

3 Data trading 3 Replication-based preservation

4 Data trading 4 Replication-based preservation

5 Data trading 5 Motivation. Several systems use replication: preserve digital collections; SAV, others. Archival part of digital library. Individual organizations cooperate. Not a lot of money to spend.

6 Data trading 6 Goal: reliable replication of digital collections. Given that: resources are limited; sites are autonomous; not all sites are equal. Traditional methods: central control; random; replicate popular. Metric: reliability, not necessarily “efficiency”.

7 Data trading 7 Our solution: data trading. “I’ll store a copy of your collection if you’ll store a copy of mine.” Sites make local decisions: who to trade with; how many copies to make; how much space to provide; etc.

8 Data trading 8 Trading network: a series of binary, peer-to-peer trading links. [Diagram: trading network of sites A through H]

9 Data trading 9 Architecture [Diagram labels: Users, Filesystem, InfoMonitor, SAV Archive, Reliability layer, Archived data, Internet; a local archive and a remote archive]

10 Data trading 10 Overview: trading model; trading algorithm; simulating trading; simulation results.

11 Data trading 11 Trading model

12 Data trading 12 Trading model Archive site: an autonomous archiving provider

13 Data trading 13 Trading model Archive site: an autonomous archiving provider Digital collection: a set of related digital materials

14 Data trading 14 Trading model Archive site: an autonomous archiving provider Digital collection: a set of related digital materials Archival storage: stores locally and remotely owned digital collections

15 Data trading 15 Trading model Archive site: an autonomous archiving provider Digital collection: a set of related digital materials Archival storage: stores locally and remotely owned digital collections Archiving client: deposit and retrieve materials

16 Data trading 16 Trading model Archive site: an autonomous archiving provider Digital collection: a set of related digital materials Archival storage: stores locally and remotely owned digital collections Archiving client: deposit and retrieve materials Data reliability: probability that data is not lost

17 Data trading 17 Deeds: a right to use space at another site. Bookkeeping mechanism for trades. Used, saved, split, or transferred. Trading algorithm: sites trade deeds; sites exercise deeds to replicate collections. [Figure: sample deed reading "Deed for space. For use by: Library of Congress or for transfer. 623 gigabytes. Stanford University"]
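The deed operations named on this slide (split, transfer) can be sketched as a small immutable record. This is only an illustrative sketch, not the authors' implementation: the `Deed` class, its field names, and its method names are hypothetical, and the sample values are taken from the deed pictured on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Deed:
    """Hypothetical record of a right to use space at another site."""
    issuer: str       # site that provides the space
    holder: str       # site entitled to use (or pass on) the space
    gigabytes: float  # amount of space the deed covers

    def split(self, amount: float) -> tuple[Deed, Deed]:
        """Split into two deeds: one of `amount` GB and one with the remainder."""
        if not 0 < amount < self.gigabytes:
            raise ValueError("split amount must be between 0 and the deed size")
        return (replace(self, gigabytes=amount),
                replace(self, gigabytes=self.gigabytes - amount))

    def transfer(self, new_holder: str) -> Deed:
        """Hand the deed to another site; the issuer still provides the space."""
        return replace(self, holder=new_holder)


# Values from the sample deed on the slide.
deed = Deed(issuer="Stanford University", holder="Library of Congress", gigabytes=623)
used, saved = deed.split(200)        # use 200 GB now, save the rest for later
passed_on = saved.transfer("Site C")  # "Site C" is a made-up site name
```

Making the record immutable means every split or transfer yields a new deed, which keeps the bookkeeping trail easy to audit; that is a design choice of this sketch, not something the slides specify.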

18 Data trading 18 Deed trading [Diagram: sites A, B, C; Collections 1, 2, 3]

19 Data trading 19 The challenge [Diagram: sites A, B, C; replicas of Collections 1, 2, 3]

20 Data trading 20 The challenge [Diagram: sites A, B, C; replicas of Collections 1, 2, 3]

21 Data trading 21 Alternative solutions Are there other ways besides trading?

22 Data trading 22 Other solutions: central control [Diagram: sites A, B, C; replicas of Collections 1, 2, 3]

23 Data trading 23 Other solutions: client-based [Diagram: sites A, B, C; replicas of Collections 1, 2, 3]

24 Data trading 24 Other solutions: random [Diagram: sites A, B, C; replicas of Collections 1, 2, 3]

25 Data trading 25 Why is trading good? High reliability: framework for replication. Site autonomy: make local decisions; no submission to external authority. Fairness: contribute more = more reliability; must contribute resources. [Diagram: trading network of sites A through H]

26 Data trading 26 Decisions facing an archive: who to trade with; providing space; advertising space; picking a number of copies; joining a cluster; coping with varying site reliabilities.

27 Data trading 27 How do we evaluate policies? Trading simulator: generate scenario; simulate trading with different policies; evaluate reliability for each policy; compare each policy.

28 Data trading 28 Simulation parameters
Number of sites: 2 to 15
Site reliability: 0.5 to 0.8
Collections per site: 4 to 25
Data per collection: 50 GB to 1000 GB
Space per site: 2x data to 7x data
Replication goal: 2 to 15 copies
Scenarios per simulation: 200

29 Data trading 29 Reliability. Site reliability: will a site fail? Example: 0.9 = 10% chance of failure. Data reliability: how safe is the data, despite site failures? Example: 320-year MTTF.
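One simple way to connect site reliability to data reliability is to assume that sites fail independently, so a collection is lost only if every site holding a copy fails. The slides do not state this assumption or give a formula, so the function below is a hedged illustration, and the name `data_reliability` is hypothetical.

```python
import math


def data_reliability(site_reliabilities):
    """Probability that at least one copy survives.

    Assumes (not stated in the slides) that sites fail independently.
    Each entry is the reliability of one site holding a copy, e.g. 0.9
    means a 10% chance that the site fails.
    """
    # Probability that every copy-holding site fails.
    p_all_fail = math.prod(1.0 - r for r in site_reliabilities)
    return 1.0 - p_all_fail


# Three copies on sites with reliabilities 0.8, 0.8 and 0.5:
# 1 - (0.2 * 0.2 * 0.5) = 0.98
three_copies = data_reliability([0.8, 0.8, 0.5])
```

Under this assumption, every extra copy multiplies the loss probability by that site's failure chance, which is why even modestly reliable sites add value as trading partners.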

30 Data trading 30 Example: trading strategy Who should we try to trade with? The most reliable sites? Sites with reliability close to ours? The sites we have traded with before? Some other policy (like random)?

31 Data trading 31 Example: trading strategy R=0.8

32 Data trading 32 Example: reliability estimates. Cannot predict when a site will fail. Estimate site reliabilities: past performance; reputation; components (Arturo Crespo’s work). How does that impact policies? Estimate error affects resulting data reliability.

33 Data trading 33 Example: reliability estimates Ignore reliability

34 Data trading 34 Results. How much space? How many copies? Related questions: more space = more copies. Result: for n copies, provide n + 1 space. No need for central control, lots of space.

35 Data trading 35 Results. Clusters of sites? Social or political clusters, e.g. all universities within a particular state. Is the cluster big enough? What if it isn’t? Result: a few archives are sufficient, e.g. 5 archives to make 3 copies. Too many sites is counter-productive.

36 Data trading 36 Trading clusters

37 Data trading 37 Trading strategy. Goal: pick a good trading partner. Strategy = order to contact remote sites. Strategies: clustering (trade with previous partners); best fit (trade with the site whose free space “best fits” the collection).

38 Data trading 38 Trading strategy. New strategies to deal with reliability: highest reliability; lowest reliability; closest reliability. Weighted strategies: weighted clustering; weighted best fit.
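Since a strategy is defined as the order in which remote sites are contacted, each strategy can be sketched as a sort over candidate sites. The code below is a minimal illustration under assumptions: the `Site` class and function names are hypothetical, and "best fit" is read here as "the smallest leftover free space among sites that can hold the collection", which the slides do not spell out. The weighted strategies are not covered.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical view of a remote site as seen by the local archive."""
    name: str
    reliability: float    # estimated probability that the site does not fail
    free_space_gb: float  # space the site currently advertises


def highest_reliability(remotes: list[Site]) -> list[Site]:
    """Contact the most reliable sites first."""
    return sorted(remotes, key=lambda s: s.reliability, reverse=True)


def lowest_reliability(remotes: list[Site]) -> list[Site]:
    """Contact the least reliable sites first."""
    return sorted(remotes, key=lambda s: s.reliability)


def closest_reliability(local_reliability: float, remotes: list[Site]) -> list[Site]:
    """Contact sites whose reliability is nearest to the local site's first."""
    return sorted(remotes, key=lambda s: abs(s.reliability - local_reliability))


def best_fit(collection_gb: float, remotes: list[Site]) -> list[Site]:
    """Assumed reading of best fit: among sites that can hold the collection,
    prefer the one left with the least unused space."""
    fitting = [s for s in remotes if s.free_space_gb >= collection_gb]
    return sorted(fitting, key=lambda s: s.free_space_gb - collection_gb)


# Made-up sites for illustration only.
remotes = [Site("A", 0.9, 500), Site("B", 0.6, 120), Site("C", 0.75, 300)]
order_highest = [s.name for s in highest_reliability(remotes)]
order_closest = [s.name for s in closest_reliability(0.7, remotes)]
order_best_fit = [s.name for s in best_fit(100, remotes)]
```

Expressing every strategy as a function that returns an ordered list makes it straightforward to swap policies in a simulator and compare the resulting data reliability.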

39 Data trading 39 Current and future work. Bidding versus direct trading: local site holds an auction; bids = size of local site’s deed. “Deviant” sites: greedy sites; follow protocol but do not play nice. Access: support searching over collections; distribute indexes via trading.

40 Data trading 40 Current and future work. Security: Will sites actually preserve data? Will they give it to others? Can I protect sensitive information? What if I fail and lose my keys? Can I authenticate myself?

41 Data trading 41 Other parts of SAV project. SAV data model: write-once objects; signature-based naming. How to get objects into SAV: InfoMonitor – filesystem; other inputs (Web, DBMS, etc.). Modeling archival repositories (Arturo Crespo): choose best components and design.

42 Data trading 42 Related work. Peer-to-peer replication: SAV, Intermemory, LOCKSS, OceanStore… Fault-tolerant systems: RAID, mirrored disks, replicated databases. Caching systems (Andrew, Coda). Barter/auction-based systems: ContractNet. Distributed resource allocation: File Allocation Problem.

43 Data trading 43 Conclusion. Important, exciting area: preservation critical; difficult to accomplish. Many decisions are ad hoc today: an effective framework is needed; scientific evaluation of decisions. Trading networks replicate data: model for trading networks; trading algorithm; simulation results. [Diagram: trading network of sites A through H]

44 Data trading 44 For more information: cooperb@stanford.edu, http://www-diglib.stanford.edu/

