Storage Management and Caching in PAST, a large-scale, persistent peer-to-peer storage utility. Authors: Antony Rowstron (Microsoft Research), Peter Druschel.


Storage Management and Caching in PAST, a large-scale, persistent peer-to-peer storage utility. Authors: Antony Rowstron (Microsoft Research), Peter Druschel (Rice University). Presented by: Rama Alebouyeh

Outline: Goals; PAST; Security; Storage management; Caching; Experimental results; Notes

Goals: Strong persistence, by providing persistent storage for replicated, read-only files. High availability, through replication and caching. Scalability, by achieving high storage utilization through local cooperation among nodes. Security, through smart cards and store receipts. PAST is an archival storage and content distribution utility. It is not a replacement for traditional file systems; rather, it assumes that traditional file systems can be used as a local cache for PAST.

PAST overview: PAST is built on Pastry. A fileId is 160 bits; a nodeId is 128 bits. fileIds and nodeIds are uniformly distributed in their respective domains. The fileId is computed as a secure hash (SHA-1) of the file's name, the owner's public key, and a salt. Each file is stored on the k PAST nodes whose nodeIds are numerically closest to the 128 most significant bits (msb) of the fileId.
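A minimal sketch of the placement rule on this slide. It assumes the SHA-1 input is a plain concatenation of the UTF-8 file name, the owner's public-key bytes, and the salt (the slide does not specify the encoding), and it assumes Pastry's circular 128-bit identifier space when measuring "numerically closest". All helper names are hypothetical.

```python
import hashlib
from typing import List

FILE_ID_BITS = 160  # SHA-1 output length
NODE_ID_BITS = 128  # Pastry nodeId length
RING = 1 << NODE_ID_BITS

def compute_file_id(name: str, owner_public_key: bytes, salt: bytes) -> int:
    # Assumed encoding: name (UTF-8) || public key || salt, hashed with SHA-1.
    digest = hashlib.sha1(name.encode("utf-8") + owner_public_key + salt).digest()
    return int.from_bytes(digest, "big")

def routing_key(file_id: int) -> int:
    # Keep the 128 most significant bits of the 160-bit fileId.
    return file_id >> (FILE_ID_BITS - NODE_ID_BITS)

def ring_distance(a: int, b: int) -> int:
    # Distance on the circular 128-bit identifier space (assumption).
    d = abs(a - b) % RING
    return min(d, RING - d)

def replica_nodes(file_id: int, node_ids: List[int], k: int) -> List[int]:
    # The k nodeIds numerically closest to the routing key (ties broken by nodeId).
    key = routing_key(file_id)
    return sorted(node_ids, key=lambda n: (ring_distance(n, key), n))[:k]
```

Because the salt is part of the hash input, choosing a new salt moves the file to an unrelated region of the nodeId space.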

PAST operations: fileId = Insert(name, owner-credentials, k, file): k is the user-specified number of file replicas, and k replicas are maintained over the lifetime of the file. file = Lookup(fileId): the client must provide the fileId; the file is retrieved from a live node holding it that is close to the client. Reclaim(fileId, owner-credentials): does not guarantee that all replicas are deleted; after a reclaim, Lookup is no longer guaranteed to return the file.

PAST operations (2): Insert: A file certificate is issued and signed with the owner's private key. The file certificate contains the fileId, a SHA-1 hash of the file content, k, the salt, the date, and file metadata. The file and its certificate are routed to the node whose nodeId is closest to the 128 msb of the fileId. On success, a store receipt is sent back to the client; otherwise, an error is reported to the client.
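The certificate fields listed above can be sketched as follows. PAST signs certificates with the owner's private key (held on a smart card); this sketch substitutes an HMAC-SHA1 over a length-prefixed encoding of the fields purely so the example runs with the Python standard library. An HMAC is a shared-key MAC, not a public-key signature, so it only illustrates the sign-then-verify flow. The class and function names are hypothetical.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass, replace
from typing import Optional

def _encode(*fields: bytes) -> bytes:
    # Length-prefix each field so that distinct field tuples never encode identically.
    return b"".join(len(f).to_bytes(4, "big") + f for f in fields)

@dataclass(frozen=True)
class FileCertificate:
    file_id: bytes
    content_hash: bytes  # SHA-1 of the file content
    k: int               # number of replicas
    salt: bytes
    date: int            # creation time, seconds since the epoch
    metadata: str
    signature: bytes = b""

    def payload(self) -> bytes:
        # Every field except the signature itself is covered by the signature.
        return _encode(self.file_id, self.content_hash, str(self.k).encode(),
                       self.salt, str(self.date).encode(), self.metadata.encode("utf-8"))

def issue_certificate(file_id: bytes, content: bytes, k: int, salt: bytes, metadata: str,
                      owner_secret: bytes, date: Optional[int] = None) -> FileCertificate:
    unsigned = FileCertificate(file_id, hashlib.sha1(content).digest(), k, salt,
                               int(time.time()) if date is None else date, metadata)
    # Stand-in for a private-key signature: HMAC-SHA1 keyed with the owner's secret.
    tag = hmac.new(owner_secret, unsigned.payload(), hashlib.sha1).digest()
    return replace(unsigned, signature=tag)

def verify_certificate(cert: FileCertificate, content: bytes, owner_secret: bytes) -> bool:
    # Check both the signature and that the content matches the certified hash.
    expected = hmac.new(owner_secret, cert.payload(), hashlib.sha1).digest()
    return (hmac.compare_digest(expected, cert.signature)
            and hashlib.sha1(content).digest() == cert.content_hash)
```

A storing node would run a check like `verify_certificate` before accepting the file and issuing a store receipt.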

PAST operations (3): Lookup: The client sends a request message with the fileId as the destination. As soon as the request reaches a node that stores the file, that node replies with the file and its certificate and stops forwarding the request. Reclaim: Analogous to Insert; the client issues a reclaim certificate.
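Lookup's early-termination rule can be sketched as a walk along the route a request would take. The route is passed in as a precomputed list of hypothetical nodeIds rather than computed by Pastry routing, and each node's store is a plain dictionary; both are simplifying assumptions.

```python
from typing import Dict, List, Optional, Tuple

Replica = Tuple[bytes, str]  # (file content, certificate), simplified

def route_lookup(route: List[int], stores: Dict[int, Dict[int, Replica]],
                 file_id: int) -> Optional[Tuple[int, bytes, str]]:
    """Forward a lookup along `route`; the first node holding the file answers."""
    for hop in route:
        held = stores.get(hop, {})
        if file_id in held:
            content, certificate = held[file_id]
            return hop, content, certificate  # reply and stop forwarding
    return None  # no node on the route held the file
```

Since replicas sit on the nodes numerically closest to the fileId, and cached copies may sit on nodes along the way, the walk often stops before reaching the closest node.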

PAST Security: PAST provides security through: smart cards (for nodes and users); file and reclaim certificates; store and reclaim receipts; a randomized Pastry routing scheme; and routing table entries signed by the associated nodes.

Storage management: The goal is to achieve high global storage utilization and graceful degradation as the system approaches its maximum utilization. The responsibilities of storage management are to: balance the remaining free space among nodes as utilization approaches its maximum; and maintain the invariant that copies of each file are stored by the k nodes whose nodeIds are closest to the fileId. Storage management relies on local coordination among nodes.

Replica diversion: If a node A cannot store a replica, it chooses a node B in its leaf set to hold the diverted replica. B must not be among the k closest nodes, and B must not already hold a diverted replica of the file. A keeps a pointer to B in its table and issues a store receipt. A also installs a pointer to B on the (k+1)th closest node, C. If B fails, a replacement replica is created. If C fails, A installs another pointer on the current (k+1)th closest node.
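A sketch of how node A might pick B under the two constraints on this slide. The slide does not say how A chooses among several eligible leaf-set members; picking the one with the most free space, and having B apply the t_div acceptance threshold from the storage management policy, are assumptions here. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

@dataclass
class LeafNode:
    node_id: int
    free_space: int
    diverted_files: Set[int] = field(default_factory=set)  # fileIds of diverted replicas held

def choose_diversion_target(leaf_set: List[LeafNode], k_closest_ids: Set[int],
                            file_id: int, size: int, t_div: float) -> Optional[LeafNode]:
    """Pick a leaf-set node B for a replica that node A cannot store itself."""
    candidates = [
        n for n in leaf_set
        if n.node_id not in k_closest_ids        # B must not be among the k closest
        and file_id not in n.diverted_files      # B must not already hold a diverted replica
        and n.free_space > 0
        and size / n.free_space <= t_div         # B applies the t_div threshold (assumption)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_space)  # most free space (assumption)

def divert_replica(leaf_set: List[LeafNode], k_closest_ids: Set[int], k_plus_1th_id: int,
                   file_id: int, size: int, t_div: float) -> Optional[Dict[str, int]]:
    b = choose_diversion_target(leaf_set, k_closest_ids, file_id, size, t_div)
    if b is None:
        return None  # A cannot divert; the replica is rejected
    b.diverted_files.add(file_id)
    b.free_space -= size
    # A keeps a pointer to B; a second pointer to B is installed on C, the (k+1)th closest node.
    return {"file_id": file_id, "pointer_to": b.node_id, "backup_pointer_on": k_plus_1th_id}
```

Keeping the second pointer on C means the diverted replica stays reachable even if A itself fails.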

File diversion: The goal is to balance the remaining free storage space among different portions of the nodeId space. When a client receives a NACK in response to an Insert operation, it generates a new fileId using a different salt and retries the Insert operation. The client tries up to three times.
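The client-side retry loop can be sketched as below. `try_insert` is a hypothetical callback that returns True when the client receives store receipts and False on a NACK; reading "three times" as three retries after the first attempt is an assumption.

```python
import os
from typing import Callable, Optional

def insert_with_file_diversion(try_insert: Callable[[bytes], bool],
                               new_salt: Callable[[], bytes] = lambda: os.urandom(20),
                               max_retries: int = 3) -> Optional[bytes]:
    """Return the salt that led to a successful insert, or None on failure."""
    salt = new_salt()
    if try_insert(salt):
        return salt
    for _ in range(max_retries):
        # A fresh salt yields a new fileId, i.e. a different part of the nodeId space.
        salt = new_salt()
        if try_insert(salt):
            return salt
    return None  # give up and report the failure to the application
```

Returning the salt lets the caller recompute the final fileId it needs for later Lookup and Reclaim calls.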

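Storage management policy: File acceptance policy: node N accepts a replica of file D only if $S_D / F_N \le t$, where $S_D$ is the size of file D and $F_N$ is the free storage space of node N. The threshold is $t = t_{pri}$ for nodes among the k closest to the fileId and $t = t_{div}$ for nodes that are not among the k closest (i.e., nodes receiving diverted replicas).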
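The acceptance test translates directly into a predicate. The default thresholds t_pri = 0.1 and t_div = 0.05 are the values used in the experiments reported later in this talk, not constants fixed by PAST; the function name is hypothetical.

```python
def accepts(file_size: int, free_space: int, is_primary: bool,
            t_pri: float = 0.1, t_div: float = 0.05) -> bool:
    """Accept file D on node N only if S_D / F_N <= t."""
    if free_space <= 0:
        return False  # no free space: reject outright
    t = t_pri if is_primary else t_div  # primary (k closest) vs. diverted replica
    return file_size / free_space <= t
```

Because t_div is smaller than t_pri, a node is stricter about accepting diverted replicas than replicas it is primarily responsible for, which leaves room for files that map to it directly.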

Maintaining replicas: Nodes learn about their neighbors through the Pastry leaf set, which is maintained with periodic keep-alive messages. When a node joins or comes back online, it first installs pointers to the replicas of the files it is now responsible for and then gradually transfers the files. Nodes also exchange explicit keep-alive messages with the nodes that hold replicas on their behalf. Under high utilization, a node may ask the two most distant nodes in its leaf set to locate, within their own leaf sets, a node that can store the file. Under high utilization it is possible for the number of replicas to drop below k.

Caching: The goal is to minimize client access latency, maximize query throughput, and balance the query load in the system. The unused portion of a node's advertised storage is used as a cache, and cached files can be evicted at any time. A node caches a file when the file is routed through it as part of a lookup or insert, provided the file size is smaller than a fraction (c) of the node's current cache size. The cache replacement policy is GreedyDual-Size (GD-S).
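A minimal GreedyDual-Size sketch with the admission rule from this slide. GD-S keeps an inflation value L and assigns each cached file H = L + cost/size; the file with the smallest H is evicted and L is raised to that H. Using a cost of 1 for every file (which favors keeping small files) and the value 0.5 for c are assumptions of this sketch, and the class and parameter names are hypothetical.

```python
from typing import Dict, Tuple

class GreedyDualSizeCache:
    def __init__(self, capacity: int, admit_fraction: float = 0.5):
        self.capacity = capacity              # the node's current cache size
        self.admit_fraction = admit_fraction  # the fraction c (assumed value)
        self.inflation = 0.0                  # GD-S inflation value L
        self.used = 0
        self.entries: Dict[str, Tuple[float, int]] = {}  # file_id -> (H, size)

    def _h(self, size: int, cost: float = 1.0) -> float:
        return self.inflation + cost / size

    def access(self, file_id: str) -> bool:
        """Cache hit: restore the file's H value using the current L."""
        if file_id not in self.entries:
            return False
        _, size = self.entries[file_id]
        self.entries[file_id] = (self._h(size), size)
        return True

    def insert(self, file_id: str, size: int) -> bool:
        """Offer a file that is being routed through this node."""
        if file_id in self.entries:
            return self.access(file_id)
        if size >= self.admit_fraction * self.capacity:
            return False  # too large relative to the cache: do not cache
        while self.used + size > self.capacity and self.entries:
            victim = min(self.entries, key=lambda f: self.entries[f][0])
            h, victim_size = self.entries.pop(victim)
            self.inflation = h  # raise L to the evicted file's H
            self.used -= victim_size
        self.entries[file_id] = (self._h(size), size)
        self.used += size
        return True
```

This sketch uses a fixed capacity; in PAST the cache shrinks as primary replicas consume the node's advertised storage, which would also force evictions.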

Experimental Results: Two data sets were used: one from 8 web proxy logs and another from a file system. Parameters: k = 5, b = 4 (Pastry), N = 2250 nodes. The first experiment used no diversion (t_pri = 1, t_div = 0): 51.1% of file insertions failed, and global storage utilization was only 60.8%. These results demonstrate the need for storage management in a system like PAST.

Experimental Results: t_pri = 0.1, t_div = 0.05, l = 16 or 32. With l = 16, utilization exceeds 94%; with l = 32, it exceeds 98%. A larger leaf set increases the scope for load balancing, but a larger l also increases the cost of node arrivals and departures.

Experimental Results: Varying t_pri. The lower the value of t_pri, the less likely it is that a large file can be stored on a node. Many small files can still be stored, so the number of files stored increases as t_pri decreases. However, utilization drops because large files are rejected even at low utilization levels.

Experimental Results: Varying t_div. As t_div is increased, fewer file insertions succeed, but storage utilization is higher.

Impact of File and Replica Diversion: File diversion is negligible when storage utilization is below 83%. The number of diverted replicas remains small even at high utilization: about 10% at 80% utilization.

Impact of caching

Notes: Key lookup and directory search facilities are needed, since a client must already know a file's fileId to retrieve it. The immutable-file property and the lack of directory search limit the applications of PAST. The effect of file reclaim on performance is not measured.