1 Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Gabi Kliot, Computer Science Department, Technion Topics in Reliable Distributed Computing 21/11/2004 Partially borrowed from Peter Druschel's presentation

2 Outline  Introduction  Pastry overview  PAST Overview  Storage Management  Caching  Experimental Results  Conclusion

3 Sources: "Storage management and caching in PAST, a large-scale persistent peer-to-peer storage utility", Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University). "Pastry: scalable, decentralized object location and routing for large-scale peer-to-peer systems", Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University).

4 PASTRY

5 Pastry Generic p2p location and routing substrate (DHT) Self-organizing overlay network (join, departures, locality repair) Consistent hashing Lookup/insert of an object in < ⌈log_{2^b} N⌉ routing steps (expected) O(log N) per-node state Network locality heuristics Scalable, fault resilient, self-organizing, locality aware, secure

6 Pastry: API nodeId = pastryInit(Credentials, Application): join the local node to the Pastry network route(M, X): route message M to the node with nodeId numerically closest to X Application callbacks: deliver(M): deliver message M to the application forwarding(M, X): message M is being forwarded towards key X newLeaf(L): report a change in leaf set L to the application
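The callback side of this API can be sketched as an interface. The method names follow the slide; the Python signatures and the `LoggingApp` example are illustrative assumptions, not part of the Pastry specification:

```python
from abc import ABC, abstractmethod

class PastryApp(ABC):
    """Application callbacks invoked by the Pastry substrate (sketch)."""

    @abstractmethod
    def deliver(self, msg):
        """Called when msg arrives at the node responsible for its key."""

    @abstractmethod
    def forwarding(self, msg, key):
        """Called as msg is being forwarded towards key."""

    @abstractmethod
    def new_leaf(self, leaf_set):
        """Called to report a change in the leaf set."""

class LoggingApp(PastryApp):
    """A trivial application that just records every callback it receives."""
    def __init__(self):
        self.events = []
    def deliver(self, msg):
        self.events.append(("deliver", msg))
    def forwarding(self, msg, key):
        self.events.append(("forwarding", msg, key))
    def new_leaf(self, leaf_set):
        self.events.append(("new_leaf", tuple(leaf_set)))
```

Applications such as PAST, Scribe, or Squirrel plug into Pastry through exactly this kind of callback interface.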

7 Pastry: Object distribution Consistent hashing over a 128-bit circular id space nodeIds (uniform random) objIds/keys (uniform random) Invariant: the node with the numerically closest nodeId maintains the object [figure: nodeIds and an objId/key on the circular id space]

8 Pastry: Object insertion/lookup A message with key X is routed to the live node with nodeId closest to X Problem: a complete routing table is not feasible [figure: Route(X) across the circular id space]

9 Pastry: Routing Tradeoff O(log N) routing table size: 2^b * ⌈log_{2^b} N⌉ + 2l entries O(log N) message forwarding steps
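The per-node state formula above is easy to evaluate. A minimal sketch (the function name and parameters are ours):

```python
import math

def pastry_state_size(n_nodes: int, b: int, leaf: int) -> int:
    """Approximate per-node state: 2^b entries in each of
    ceil(log_{2^b} N) routing-table rows, plus 2*l leaf-set entries."""
    rows = math.ceil(math.log(n_nodes, 2 ** b))
    return (2 ** b) * rows + 2 * leaf

# For N = 100,000 nodes, b = 4, l = 16:
# ceil(log16(100000)) = 5 rows -> 16*5 + 2*16 = 112 table entries per node
print(pastry_state_size(100_000, 4, 16))
```

With a million nodes the count only grows to one more row, which is the point of the O(log N) tradeoff.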

10 Pastry: Routing table ⌈log_{2^b} N⌉ rows (at most 128/b rows) with 2^b columns each, plus L nodes in the leaf set and a set of neighbors

11 Pastry: Leaf sets Each node maintains the IP addresses of the L/2 numerically closest larger and the L/2 numerically closest smaller nodeIds. Used for: routing efficiency/robustness, fault detection (keep-alive), application-specific local coordination

12 Pastry: Routing procedure
If (destination D is within range of our leaf set)
  forward to the numerically closest member
else
  let l = length of the prefix shared with D
  let d = value of the l-th digit of D
  if (routing table entry R[l][d] exists) forward to R[l][d]
  else forward to a known node (from the leaf set, routing table, or neighborhood set) that (a) shares at least as long a prefix with D and (b) is numerically closer to D than this node
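The procedure above can be sketched in a few lines. This is a simplified model, assuming hex-digit ids (b = 4) and ignoring the wraparound of the circular id space; `route` returns the next hop, or `None` when the local node is the destination:

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common hex-digit prefix of two ids."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def route(node_id, key, leaf_set, routing_table):
    """One Pastry routing step (sketch). leaf_set is a list of ids;
    routing_table[row][col] holds an id or None."""
    num = lambda x: int(x, 16)
    if node_id == key:
        return None
    # Case 1: key within leaf set range -> numerically closest member
    if leaf_set and min(map(num, leaf_set)) <= num(key) <= max(map(num, leaf_set)):
        best = min(leaf_set + [node_id], key=lambda n: abs(num(n) - num(key)))
        return None if best == node_id else best
    # Case 2: routing table entry sharing one more prefix digit
    l = shared_prefix_len(node_id, key)
    d = int(key[l], 16)
    if routing_table[l][d] is not None:
        return routing_table[l][d]
    # Case 3 (rare): any known node with an equal-or-longer prefix
    # that is numerically closer to the key than this node
    known = [n for row in routing_table for n in row if n] + leaf_set
    closer = [n for n in known
              if shared_prefix_len(n, key) >= l
              and abs(num(n) - num(key)) < abs(num(node_id) - num(key))]
    return min(closer, key=lambda n: abs(num(n) - num(key))) if closer else None
```

For example, at node `65a1fc` a message for key `d46a1c` shares a zero-length prefix, so the row-0, column-`d` entry (e.g. `d13da3`) is taken; at node `d467c4` the key falls inside the leaf set range and the message is delivered locally.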

13 Pastry: Routing Properties ⌈log_{2^b} N⌉ steps, O(log N) state [figure: Route(d46a1c) from node 65a1fc via d13da3, d4213f, d462ba to d467c4]

14 Pastry: Routing Integrity of overlay: guaranteed unless ⌊L/2⌋ nodes with adjacent nodeIds fail simultaneously Number of routing hops: no failures: < ⌈log_{2^b} N⌉ expected, 128/b + 1 max; during failure recovery: O(N) worst case, average case much better

15 Pastry: Locality properties Assumption: a scalar proximity metric, e.g. ping/RTT delay, # IP hops (traceroute), subnet masks; a node can probe its distance to any other node Proximity invariant: each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix.

16 Pastry: Geometric routing in proximity space [figure: the route Route(d46a1c) shown in both the nodeId space and the proximity space] The proximity distance traveled by the message in each routing step is exponentially increasing (the entry in row l is chosen from a set of nodes of size N/2^(bl)) The distance traveled by the message from its source increases monotonically at each step (the message takes larger and larger strides)

17 Pastry: Locality properties Each routing step is local, but there is no guarantee of a globally shortest path Nevertheless, simulations show: the expected distance traveled by a message in the proximity space is within a small constant of the minimum; among the k nodes with nodeIds closest to the key, a message is likely to reach the node closest to the source node first

18 Pastry: Self-organization Initializing and maintaining routing tables and leaf sets Node addition Node departure (failure) Goal: keep every routing table entry referring to a nearby node, among all live nodes with the appropriate prefix

19 Pastry: Node addition New node X contacts a nearby node A A routes a "join" message with key X, which arrives at Z, the node numerically closest to X X obtains its leaf set from Z, and the i'th row of its routing table from the i'th node on the path from A to Z X informs any nodes that need to be aware of its arrival X also improves its table locality by requesting neighborhood sets from all nodes it knows In practice: an optimistic approach

20 Pastry: Node addition [figure: new node X=d46a1c joins by routing Route(d46a1c) from A=65a1fc, arriving at Z=d467c4]

21 Pastry: Node addition [figure: the join route shown in both the nodeId space and the proximity space] X is close to A, and B is close to B1; why is X then also close to B1? Because the expected distance from B to its row-one entries (such as B1) is much larger than the expected distance from A to B (entries are chosen from sets of exponentially decreasing size)

22 Node departure (failure) Leaf set repair (eager, runs all the time): leaf set members exchange keep-alive messages; on failure, request the leaf set of the furthest live node in the set Routing table repair (lazy, upon failure): get a replacement entry from peers in the same row; if none is found, from higher rows Neighborhood set repair (eager)

23 Pastry: Security Secure nodeId assignment Randomized routing: pick a random node among all potential next hops Byzantine fault-tolerant leaf set membership protocol

24 Pastry: Distance traveled [figure: |L| = 16, 100k random queries; proximity measured in an emulated network, nodes placed randomly]

25 Pastry: Summary Generic p2p overlay network Scalable, fault resilient, self-organizing, secure O(log N) routing steps (expected) O(log N) routing table size Network locality properties

26 PAST

27 INTRODUCTION PAST system: an Internet-based, peer-to-peer global storage utility Characteristics: strong persistence and high availability (by keeping k replicas) scalability (due to efficient Pastry routing) short insert and query paths query load balancing and latency reduction (due to wide dispersion, Pastry locality, and caching) security Composed of nodes connected to the Internet; each node has a 128-bit nodeId Uses Pastry as its routing scheme No support for mutable files, searching, or directory lookup

28 INTRODUCTION Functions of a node: store replicas of files; initiate and route client requests to insert or retrieve files in PAST File-related properties: inserted files have a quasi-unique fileId; a file is replicated across multiple nodes; to retrieve a file, the client must know its fileId and decryption key (if necessary) fileId: 160 bits, computed as the SHA-1 hash of the file name, the owner's public key, and a random salt
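The fileId computation can be sketched directly with `hashlib`. The three inputs are from the slide; the field order and encodings below are illustrative assumptions, since the slide does not specify them:

```python
import hashlib
import os

def make_file_id(name: str, owner_pubkey: bytes, salt: bytes) -> str:
    """fileId = SHA-1 over (file name, owner's public key, random salt).
    Yields a 160-bit id, rendered here as 40 hex digits."""
    h = hashlib.sha1()
    h.update(name.encode("utf-8"))
    h.update(owner_pubkey)
    h.update(salt)
    return h.hexdigest()

salt = os.urandom(8)   # a fresh salt yields a new, quasi-unique fileId
fid = make_file_id("report.pdf", b"owner-public-key-bytes", salt)
assert len(fid) == 40
```

Because the hash output is uniformly distributed, fileIds spread evenly over the same 128-bit-prefix id space as nodeIds, which is what makes the k-closest-nodes replica placement balanced.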

29 PAST Operations Insert: fileId = Insert(name, owner-credentials, k, file) 1. the fileId is computed (hash of the file name, owner's public key, and salt) 2. the request message reaches one of the k nodes closest to fileId 3. that node accepts a replica of the file and forwards the message to the other k-1 nodes in its leaf set 4. once all k nodes accept, an 'ack' message with a store receipt is passed back to the client Lookup: file = Lookup(fileId) Reclaim: Reclaim(fileId, owner-credentials)

30 STORAGE MANAGEMENT: why? Responsibility: replicas of a file must be maintained by the k nodes with nodeIds closest to the fileId, while balancing free storage space among nodes in PAST Conflict: the k closest nodes may have insufficient storage while neighboring nodes have plenty Causes of load imbalance (3 differences): the number of files assigned to each node; the size of each inserted file; the storage capacity of each node Resolution: replica diversion and file diversion

31 STORAGE MANAGEMENT: Replica Diversion GOAL: balance the remaining free storage space among the nodes in a leaf set Diversion steps of node A (which received an insertion request but has insufficient space): 1. choose a node B from A's leaf set that is not among the k closest and does not already hold a diverted replica 2. ask B to store a copy 3. enter a file entry in A's table with a pointer to B 4. send the store receipt as usual

32 STORAGE MANAGEMENT: Replica Diversion Policy for accepting a replica at a node: reject the file if file_size / remaining_storage > t the threshold t is t_pri for primary replicas and t_div for diverted replicas (t_pri > t_div) this avoids unnecessary diversion while a node still has space prefer diverting large files, to minimize the number of diversions prefer accepting primary replicas over diverted replicas
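The acceptance policy is a one-line test. A minimal sketch, with the default thresholds taken from the experimental section of the talk (t_pri = 0.1, t_div = 0.05):

```python
def accepts(file_size: int, free_space: int, is_primary: bool,
            t_pri: float = 0.1, t_div: float = 0.05) -> bool:
    """Accept a replica iff file_size / free_space < threshold.
    Primary replicas use the more permissive t_pri; diverted
    replicas use the stricter t_div."""
    if free_space <= 0:
        return False
    t = t_pri if is_primary else t_div
    return file_size / free_space < t

# A 5 MB file against 100 MB of free space (ratio 0.05):
# accepted as a primary replica, rejected as a diverted one.
```

The size-relative threshold is what makes nodes reject large files early: a file that consumes a large fraction of the remaining space is diverted, while many small files can still be accepted.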

33 STORAGE MANAGEMENT: File Diversion GOAL: balance the remaining free storage space among nodes across the whole PAST network When all k nodes and their leaf sets have insufficient space, the client node generates a new fileId using a different salt value Retry limit: 3 times; on the fourth failure, reduce the file size by fragmenting it
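The client-side retry loop can be sketched as follows. The salt-into-hash scheme and the `try_insert` callback are illustrative assumptions standing in for the real PAST insert operation:

```python
import hashlib

def insert_with_diversion(name: str, data: bytes, try_insert, max_salts: int = 3):
    """File diversion sketch: on failure, retry the insert under a new
    fileId derived from a different salt; after three failed salts the
    client must fall back to fragmenting the file.
    try_insert(file_id, data) is assumed to return True on success."""
    for salt in range(max_salts):
        file_id = hashlib.sha1(name.encode() + salt.to_bytes(4, "big")).hexdigest()
        if try_insert(file_id, data):
            return file_id
    raise RuntimeError("insert failed for 3 salts; fragment the file and retry")
```

Each new salt rehashes the file to a different point of the id space, so the k responsible nodes land in a different, hopefully less loaded, leaf set neighborhood.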

34 STORAGE MANAGEMENT: maintaining k replicas In Pastry, neighboring nodes exchange keep-alive messages If a node is silent for a period T, its leaf set neighbors: remove the failed node from their leaf sets; include a live node with the next closest nodeId Handling node joins and departures in leaf sets: if the failed node was one of the k nodes for certain files (a primary or diverted replica holder), the replicas it held are re-created; to cope with the failure of a diverting node, diversion pointers are replicated Optimization: a joining node may, instead of requesting all its replicas at once, install a pointer to the previous replica holder in its file table (as in replica diversion), then migrate the replicas gradually

35 STORAGE MANAGEMENT: Fragmenting and file encoding Reed-Solomon encoding can be used to increase availability Fragmentation: improves equal disk utilization; improves bandwidth via parallel download; but adds latency, since several nodes must be contacted for retrieval

36 CACHING GOAL: minimize client access latency, maximize query throughput, balance the query load Create and maintain additional copies of highly popular files in the "unused" disk space of nodes Done during successful insertions and lookups, on all nodes along the route GreedyDual-Size (GD-S) replacement policy: each cached file f is assigned a value H(f) = cost(f)/size(f); the file with the lowest H(f) is replaced
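A minimal sketch of GD-S replacement follows. The slide gives only H(f) = cost(f)/size(f); standard GreedyDual-Size additionally adds a running inflation value L to every new H so that recently cached files outrank long-evicted ones, and that term is included here as an assumption:

```python
class GreedyDualSizeCache:
    """Minimal GreedyDual-Size sketch: each cached file f gets
    H(f) = L + cost(f)/size(f); on overflow the file with the
    lowest H is evicted and L is raised to that file's H."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # total bytes available for caching
        self.used = 0
        self.inflation = 0.0       # L: rises as files are evicted
        self.files = {}            # file_id -> (H, size)

    def insert(self, file_id: str, size: int, cost: float = 1.0):
        if size > self.capacity:
            return                 # too big to cache at all
        while self.used + size > self.capacity:
            victim = min(self.files, key=lambda f: self.files[f][0])
            h, vsize = self.files.pop(victim)
            self.inflation = h     # L := H of the evicted file
            self.used -= vsize
        self.files[file_id] = (self.inflation + cost / size, size)
        self.used += size
```

With a uniform cost, H favors small files, which matches the slide's intent: large cached copies are the first to be displaced as utilization grows.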

37 Security in PAST Smartcard-based private/public key scheme ensures the integrity of nodeId and fileId assignment Against malicious nodes: store receipts prevent a node from keeping fewer than k replicas; file certificates let clients verify the authenticity of file content; file privacy via client-side encryption; routing table entries are signed; the routing scheme is randomized, to prevent denial of service Cannot completely prevent a malicious node from suppressing valid entries

38 EXPERIMENTAL RESULTS: Effects of storage management Policy: accept a file if file_size / free_space < t No diversion (t_pri = 1, t_div = 0): max utilization 60.8%, 51.1% of inserts failed; leaf set size shows the effect of local load balancing Replica/file diversion (t_pri = 0.1, t_div = 0.05): max utilization > 98%, < 1% of inserts failed

39 EXPERIMENTAL RESULTS: Determining threshold values Policy: accept a file if file_size / free_space < t Insertion statistics and utilization as t_pri is varied (t_div = 0.05): as t_pri increases, fewer files are successfully inserted, but higher storage utilization is achieved; the lower t_pri, the less likely a large file can be stored, so many small files are stored instead, and utilization drops because large files are rejected at low utilization levels Insertion statistics and utilization as t_div is varied (t_pri = 0.1): as t_div increases, storage utilization improves, but fewer files are successfully inserted

40 EXPERIMENTAL RESULTS: Impact of file and replica diversion File diversions are negligible for storage utilization below 83% The number of replica diversions is small even at high utilization: at 80% utilization, fewer than 10% of replicas are diverted => the overhead imposed by replica and file diversion is small as long as utilization stays below 95%

41 EXPERIMENTAL RESULTS: File insertion failures File insertion failures vs. storage utilization: the failure ratio increases sharply above 90% utilization; failed insertions are heavily biased towards large files

42 EXPERIMENTAL RESULTS: Caching Global cache hit ratio and average number of message hops: the hit ratio drops as storage utilization and the number of files increase, since files in the caches get replaced; as the hit ratio falls, the average number of routing hops rises

43 CONCLUSION Design and evaluation of PAST: storage management and caching Nodes and files are assigned uniformly distributed IDs Replicas of a file are stored at the k nodes closest to its fileId Experimental results: storage utilization of 98% is achieved; low file insertion failure ratio even at high storage utilization; effective caching achieves load balancing

44 Weaknesses Does not support mutable files; storage is read-only No searching or directory lookup A local fault in a network segment may leave a functioning node unable to contact the outside world, since its routing table is mainly local No direct support for anonymity or confidentiality Breaking a large node apart: is it good or bad? The simulation is too sterile No experimental comparison of PAST to other systems

45 Comparison to other systems

46 Comparison Pastry compared to Freenet and Gnutella: a guaranteed answer in a bounded number of steps, while retaining the scalability of Freenet and the self-organization of Freenet and Gnutella Pastry compared to Chord: Chord makes no explicit effort to achieve good network locality PAST compared to OceanStore: PAST has no support for mutable files, searching, or directory lookup; more sophisticated storage semantics could be built on top of PAST Pastry (and Tapestry) are similar to Plaxton: routing based on prefixes, a generalization of hypercube routing; but Plaxton is not self-organizing, and one node is associated with each file, creating a single point of failure

47 Comparison PAST compared to FarSite: FarSite has traditional file system semantics, with a distributed directory service to locate content; every node maintains a partial list of live nodes, from which it chooses nodes to store replicas; FarSite's LAN assumptions may not hold in a wide-area environment PAST compared to CFS: CFS is built on top of Chord; a file sharing medium, block-oriented and read-only; each block is stored on multiple nodes with adjacent Chord nodeIds, with caching of popular blocks; this increases file retrieval overhead, though parallel block retrieval is good for large files; CFS assumes an abundance of free disk space; it relies on hosting multiple logical nodes (with separate ids) in one physical Chord node to accommodate nodes with large storage capacity => increased query overhead

48 Comparison PAST compared to LAND: expected constant number of outgoing links at each node; constant number of pointers to each object; constant bound on distortion (stretch), i.e. the accumulated route cost divided by the distance cost; link choices enforce a distance upper bound at each stage of the route; LAND uses a two-tier architecture with super-nodes

49 The END