Large Scale Sharing
Marco F. Duarte
COMP 520: Distributed Systems
September 19, 2004

Introduction
P2P sharing systems are very popular. In P2P, all nodes have identical capabilities and responsibilities. Popular existing approaches are partially centralized, do not scale well, or do not provide the desired anonymity. Because scalability is critical, there is a need for decentralized, load-balancing architectures.

Features desired in a P2P sharing system
- Decentralized architecture: no single point of failure
- Scalability: bandwidth and load balancing
- Fault tolerance: content replication
- Anonymity for users: posters, readers, and storers
- Resilience against DoS attacks

Freenet
- Provides anonymity: no requester or provider information is implicit in communication.
- The presence of a file on a node does not imply authorship.
- Popular files are replicated to improve locality.
- Does not intend to provide permanent storage.

Freenet Queries
Files receive FileIDs: the 160-bit SHA-1 hash of a "file identifier" string. Queries carry a pseudo-unique random identifier (QueryID) and a hops-to-live count. Each node's routing table lists previously retrieved FileIDs and the addresses of the nodes they came from. At each stage, a query is routed to the node listed with the FileID closest to the requested one, and loops are detected with the QueryID.
[Figure: routing table with columns FileID and Node Address]
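
The query rules above can be sketched as a depth-first search with backtracking. This is a minimal sketch, not Freenet's implementation. It assumes in-memory nodes and treats "closest" as the smallest absolute difference between integer FileIDs, which is an assumption. The names `file_id`, `Node` and `lookup` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def file_id(identifier: str) -> int:
    """FileID: 160-bit SHA-1 hash of the file identifier string, as an integer."""
    return int.from_bytes(hashlib.sha1(identifier.encode()).digest(), "big")


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)   # FileID -> file contents held locally
    routes: dict = field(default_factory=dict)  # FileID -> neighbour Node it came from
    seen: set = field(default_factory=set)      # QueryIDs this node has already handled

    def lookup(self, fid: int, query_id: int, hops_to_live: int):
        if query_id in self.seen:               # loop detected via the QueryID
            return None
        self.seen.add(query_id)
        if fid in self.store:                   # file found locally
            return self.store[fid]
        if hops_to_live <= 0:                   # request has expired
            return None
        # Hypothetical closeness metric: absolute difference between FileIDs.
        for known in sorted(self.routes, key=lambda k: abs(k - fid)):
            result = self.routes[known].lookup(fid, query_id, hops_to_live - 1)
            if result is not None:              # success: stop backtracking
                return result
        return None                             # every candidate failed
```

If a neighbour fails, the node tries its next-closest entry. A node reached a second time by the same query refuses it, so the search terminates even when routing tables point at each other.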

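Freenet Queries: Lookups and Stores
When a lookup succeeds, the file is passed back along the request path. A copy is stored at every node on that path, and each of those nodes adds a record for the file to its routing table. A write first performs a lookup for the FileID; if no match is found, the file is inserted along the path the lookup followed.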
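
The reply and insert behaviour can be sketched over an explicit request path. This is a simplified model. It assumes nodes are plain dicts and that routing has already found the path. The helpers `make_node`, `propagate_reply` and `insert` are hypothetical. On a FileID collision, the existing file is passed back and cached in place of the new one.

```python
def make_node(name):
    """A node with a local file store and a FileID -> data-source routing table."""
    return {"name": name, "store": {}, "routes": {}}


def propagate_reply(path, fid, data):
    """path[0] is the requester and path[-1] the node that supplied the file.

    Every node on the way back caches a copy and records the data source."""
    source = path[-1]["name"]
    for node in reversed(path[:-1]):
        node["store"][fid] = data
        node["routes"][fid] = source


def insert(path, fid, data):
    """Look up fid along the path first; store along the path only if no match."""
    for i, node in enumerate(path):
        if fid in node["store"]:
            existing = node["store"][fid]
            propagate_reply(path[: i + 1], fid, existing)  # the real file spreads back
            return existing                               # collision reported
    inserter = path[0]["name"]
    for node in path:
        node["store"][fid] = data
        node["routes"][fid] = inserter                    # inserter recorded as source
    return None                                           # insert succeeded
```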

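Freenet Properties
- FileID-based clustering improves routing as usage increases: nodes come to specialize in similar FileIDs.
- LRU-like capacity management: rarely requested files are purged from the system.
- The random nature of FileIDs gives each node a diversity of information, since similar FileIDs do not imply similar content.
- An attempt to supplant an existing file, by inserting different data under the same FileID, causes the real file to be returned and propagated further.
- Anonymity features: nodes randomly claim ownership of files they pass on, only minimal routing information is needed at each hop, and a hops-to-live count of 1 is handled randomly so that it does not reveal how close the originator is.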
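
LRU-like capacity management can be sketched as a byte-bounded store that evicts the least recently requested file first. This is a sketch of the general LRU technique, not Freenet's code. The class `DataStore` and its methods are hypothetical.

```python
from collections import OrderedDict


class DataStore:
    """Byte-bounded file store that evicts least recently used files first."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # FileID -> bytes, least recently used first

    def get(self, fid):
        if fid not in self.files:
            return None
        self.files.move_to_end(fid)  # a request marks the file as recently used
        return self.files[fid]

    def put(self, fid, data: bytes) -> bool:
        if len(data) > self.capacity:
            return False  # can never fit
        if fid in self.files:
            self.used -= len(self.files.pop(fid))
        while self.used + len(data) > self.capacity:
            _, old = self.files.popitem(last=False)  # purge the least recently used
            self.used -= len(old)
        self.files[fid] = data
        self.used += len(data)
        return True
```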

Freenet Problems
- Files that are stored in the network may not be found: Freenet does not provide reliable storage.
- There is no notion of locality in routing.
- The simulations do not involve file insertion or node discovery.

PAST: Reliable Distributed Storage
- Customizable file persistence
- High availability and load balancing
- Efficient routing and storage allocation
- Uses FileIDs generated from hashes, as in Freenet
- Uses owner credentials to verify the identity of authors
- Interface: Insert, Lookup, Reclaim

PAST Architecture
The FileID is computed from a hash of the filename, the owner's public key, and a random salt. Each node receives a pseudorandom NodeID that is independent of the node's properties. On insert, the owner specifies the number k of replicas of the file to store in the system. The file is then stored on the k nodes whose NodeIDs are numerically closest to the FileID. Routing is provided by Pastry.
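
The FileID computation and replica placement can be sketched as follows. This is a minimal sketch under stated assumptions. SHA-1 is used as the hash. FileIDs and NodeIDs are treated as integers in one circular ID space of `id_bits` bits, which simplifies PAST's actual ID sizes. The function names are hypothetical.

```python
import hashlib


def past_file_id(filename: str, owner_public_key: bytes, salt: bytes) -> int:
    """FileID: SHA-1 over filename, owner's public key and salt (160-bit integer)."""
    h = hashlib.sha1()
    h.update(filename.encode())
    h.update(owner_public_key)
    h.update(salt)
    return int.from_bytes(h.digest(), "big")


def replica_nodes(file_id: int, node_ids: list, k: int, id_bits: int = 160) -> list:
    """Return the k NodeIDs numerically closest to file_id on the circular ID space."""
    space = 1 << id_bits

    def ring_distance(node_id: int) -> int:
        d = abs(node_id - file_id) % space
        return min(d, space - d)  # the ID space wraps around

    return sorted(node_ids, key=ring_distance)[:k]
```

Because the salt is an input, changing it yields an unrelated FileID and therefore a different set of k nodes.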

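Pastry: Routing for P2P Networks
Under normal operation, Pastry routes in fewer than $\lceil \log_{2^b} N \rceil$ hops, where N is the number of nodes and b is the number of bits per NodeID digit. Delivery is guaranteed unless $\lfloor l/2 \rfloor$ nodes with adjacent NodeIDs fail simultaneously. The proximity metric is flexible. Each node contains:
- Leaf set: the l nodes with the numerically closest NodeIDs
- Routing table: a set of neighbors organized by NodeID prefix
- Neighborhood set: the l nodes closest according to the proximity metric
Each NodeID is paired with its node's network address, giving direct routes to the neighbors and to the l numerically closest NodeIDs.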
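
The hop bound follows from reading NodeIDs as digits in base 2^b. Each routing-table row resolves one more digit, and each row holds up to 2^b - 1 entries. The sketch below computes $\lceil \log_{2^b} N \rceil$ with integer arithmetic to avoid floating-point rounding. The function names are hypothetical, and b = 4 is only a typical choice.

```python
def routing_table_rows(n_nodes: int, b: int) -> int:
    """Smallest r with (2**b)**r >= n_nodes, i.e. ceil(log_{2^b} N), in integers."""
    base = 2 ** b
    rows, reach = 0, 1
    while reach < n_nodes:
        reach *= base
        rows += 1
    return rows


def routing_table_entries(n_nodes: int, b: int) -> int:
    """Each populated row holds up to 2**b - 1 entries (the local digit is skipped)."""
    return routing_table_rows(n_nodes, b) * (2 ** b - 1)
```

For example, with N = 10^6 nodes and b = 4 this gives 5 rows, so routing takes fewer than 5 hops. The table has at most 75 entries.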

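Pastry: Example
The routing table is organized by how long a prefix each entry shares with the local NodeID. The neighborhood set is used for node addition and recovery. A query is forwarded to a node whose NodeID shares a longer prefix with the key than the local NodeID does. If no such node is known, it goes to a node with an equally long shared prefix that is numerically closer to the key.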
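
The forwarding rule can be sketched as a next-hop function. This is a simplified sketch under stated assumptions. IDs are fixed-length hex strings (b = 4). Numeric distance ignores ring wraparound. Each routing-table row is a dict from the next digit to a NodeID. The names `shared_prefix_len` and `next_hop` are hypothetical.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits two IDs have in common."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def next_hop(current: str, key: str, leaf_set: list, routing_table: list) -> str:
    """Pick the next node for `key`; returns `current` if it should deliver locally.

    routing_table[row] maps a hex digit to a NodeID that shares `row` digits
    with `current` and has that digit next (a simplified routing table).
    """
    def dist(node_id: str) -> int:
        return abs(int(node_id, 16) - int(key, 16))  # no ring wraparound here

    # 1. Key within the leaf set's range: deliver to the numerically closest node.
    if leaf_set:
        ids = [int(n, 16) for n in leaf_set + [current]]
        if min(ids) <= int(key, 16) <= max(ids):
            return min(leaf_set + [current], key=dist)

    # 2. Routing-table entry that shares one more digit with the key.
    p = shared_prefix_len(current, key)
    if p == len(key):
        return current
    entry = routing_table[p].get(key[p])
    if entry is not None:
        return entry

    # 3. Rare case: a known node with an equally long prefix that is numerically closer.
    known = leaf_set + [n for row in routing_table for n in row.values()]
    closer = [n for n in known
              if shared_prefix_len(n, key) >= p and dist(n) < dist(current)]
    return min(closer, key=dist) if closer else current
```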

Pastry Routing Table
[Figure: circular NodeID space (0 = 2^M) showing a node's routing table, leaf set, and neighborhood set]

Pastry Routing Example
[Figure: a query routed around the NodeID ring (0 = 2^M); other nodes exist but are not shown]

Pastry Node Insertion Example
[Figure: node insertion on the NodeID ring (0 = 2^M), showing the neighborhood set and leaf set]

Pastry Node Removal Example
[Figure: node removal on the NodeID ring (0 = 2^M)]

PAST Insertions
fileID = Insert(name, owner-credentials, k, file)
[Figure: the owner inserts a file with FileID 3130 into the ring (0 = 2^M); each of the k nodes closest to 3130 stores "3130: File, Certificate", so the file is stored k times]

PAST Insertions (continued)
fileID = Insert(name, owner-credentials, k, file)
[Figure: the k storing nodes return k store receipts to the owner]
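
The insert exchange (file certificate out, store receipts back) can be sketched as below. This is an illustrative sketch, not PAST's protocol code. HMAC stands in for the owner's public-key signature, and storing nodes only check the content hash. The caller is assumed to pass the k nodes closest to the FileID. The names `make_certificate`, `StorageNode` and `insert` are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def make_certificate(owner_secret: bytes, file_id: int, content: bytes, k: int) -> dict:
    """File certificate binding FileID, content hash and k to the owner.

    HMAC is a stand-in for the owner's public-key signature (an assumption)."""
    content_hash = hashlib.sha1(content).hexdigest()
    body = f"{file_id}:{content_hash}:{k}".encode()
    signature = hmac.new(owner_secret, body, hashlib.sha1).hexdigest()
    return {"file_id": file_id, "content_hash": content_hash, "k": k, "sig": signature}


class StorageNode:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.files = {}  # FileID -> (content, certificate)

    def store(self, cert: dict, content: bytes) -> Optional[dict]:
        """Store a replica and return a store receipt, or None if the content is bad."""
        if hashlib.sha1(content).hexdigest() != cert["content_hash"]:
            return None
        self.files[cert["file_id"]] = (content, cert)
        return {"node_id": self.node_id, "file_id": cert["file_id"]}


def insert(name: str, owner_public_key: bytes, owner_secret: bytes, k: int,
           content: bytes, closest_nodes: list, salt: bytes = b"") -> Optional[int]:
    """Hypothetical Insert: returns the FileID only if all k store receipts arrive."""
    digest = hashlib.sha1(name.encode() + owner_public_key + salt).digest()
    file_id = int.from_bytes(digest, "big")
    cert = make_certificate(owner_secret, file_id, content, k)
    receipts = [r for r in (n.store(cert, content) for n in closest_nodes[:k]) if r]
    return file_id if len(receipts) == k else None
```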

PAST Semantics
- file = Lookup(fileID): the request is routed toward the NodeID closest to the FileID, and the first of the k replica holders reached returns the file and its certificate.
- Reclaim(fileID, owner-credentials): same semantics as Insert. The owner issues a reclaim certificate, and the storing nodes issue reclaim receipts.
- Changes in leaf sets trigger changes in replica locations. A new node creates "pointers" to the files it should hold, so migration is gradual.
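
Lookup and Reclaim can be sketched in the same simplified style. This sketch assumes nodes are dicts holding (content, certificate) pairs. It also assumes the routing path toward the FileID is already known. The owner check compares a plain owner field in place of real credential verification. The functions `lookup` and `reclaim` are hypothetical.

```python
def lookup(file_id, route):
    """route: nodes visited, in order, while routing toward file_id.

    The first node on the route holding a replica answers."""
    for node in route:
        if file_id in node["files"]:
            return node["files"][file_id]  # (content, certificate)
    return None


def reclaim(file_id, owner, replica_holders):
    """Delete the owner's replicas and collect one reclaim receipt per storing node."""
    receipts = []
    for node in replica_holders:
        entry = node["files"].get(file_id)
        if entry and entry[1]["owner"] == owner:  # simplified credential check
            del node["files"][file_id]
            receipts.append({"node": node["id"], "file_id": file_id})
    return receipts
```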

Load Balancing in PAST: Replica Diversion
[Figure: leaf sets around NodeIDs 3130 and 3201]

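Load Balancing in PAST: File Diversion
[Figure: leaf sets around NodeIDs 3130 and 3201]
If the replicas cannot be placed, the file's ID is changed by changing the salt, which moves the file to a different region of the NodeID space. Policies govern the acceptance of replicas and diverted replicas, and the selection of the node that receives a diverted replica. A node rejects a file whose ratio of file size to free space exceeds a maximum: t_pri for primary replicas and t_div for diverted replicas.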
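
The acceptance test and replica diversion can be sketched as follows. The sketch makes two assumptions. A diverted replica goes to the leaf-set node, outside the k closest, that has the most free space. A `None` result means the caller should fall back to file diversion with a new salt. The names `accepts` and `place_replica` and the dict layout are hypothetical.

```python
def accepts(file_size: int, free_space: int, t: float) -> bool:
    """Accept only if file_size / free_space does not exceed the threshold t."""
    return free_space > 0 and file_size / free_space <= t


def place_replica(file_size, primary, leaf_set, k_closest, t_pri, t_div):
    """primary and leaf_set entries are dicts with 'id' and 'free' (bytes).

    Returns (node_id, diverted) or (None, False) if file diversion is needed."""
    if accepts(file_size, primary["free"], t_pri):
        return primary["id"], False              # stored on the primary node
    candidates = [n for n in leaf_set if n["id"] not in k_closest]
    if candidates:
        best = max(candidates, key=lambda n: n["free"])
        if accepts(file_size, best["free"], t_div):
            return best["id"], True              # replica diverted
    return None, False                           # retry with a new salt
```

A smaller t_div than t_pri makes nodes pickier about diverted replicas than about their own primary ones.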

Caching in PAST
- Highly popular files may demand more replicas than the k specified.
- Files located "far away" need to be fetched only once to be cached locally.
- Unused disk space is allocated as cache, so caching performance degrades gradually as storage utilization increases.
- The cache insertion policy is similar to the diversion policies.
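
A cache that lives in unused disk space can be sketched as follows. This is a minimal sketch with several assumptions. It uses GreedyDual-Size replacement (the policy the PAST paper describes) with a uniform cost per file. A hypothetical insertion fraction `c` rejects files that are large relative to the cache. A `shrink` method models replicas reclaiming disk space. The class name is hypothetical.

```python
class GreedyDualSizeCache:
    """Cache in unused disk space, GreedyDual-Size replacement with uniform cost."""

    def __init__(self, capacity: int, c: float = 0.5):
        self.capacity = capacity   # unused disk space currently available for caching
        self.c = c                 # hypothetical insertion fraction
        self.inflation = 0.0       # the "L" value of GreedyDual-Size
        self.entries = {}          # FileID -> (priority H, size)
        self.used = 0

    def _priority(self, size: int) -> float:
        return self.inflation + 1.0 / size  # cost 1, so small files are favoured

    def _evict_one(self) -> None:
        victim = min(self.entries, key=lambda f: self.entries[f][0])
        priority, size = self.entries.pop(victim)
        self.inflation = priority           # ages the remaining entries
        self.used -= size

    def hit(self, fid) -> bool:
        if fid not in self.entries:
            return False
        size = self.entries[fid][1]
        self.entries[fid] = (self._priority(size), size)  # refresh on access
        return True

    def offer(self, fid, size: int) -> bool:
        if fid in self.entries or size > self.c * self.capacity:
            return False                    # insertion policy rejects the file
        while self.used + size > self.capacity:
            self._evict_one()
        self.entries[fid] = (self._priority(size), size)
        self.used += size
        return True

    def shrink(self, new_capacity: int) -> None:
        """Replicas claim disk space: cached copies are evicted to make room."""
        self.capacity = new_capacity
        while self.used > self.capacity and self.entries:
            self._evict_one()
```

Because the cache only occupies leftover space, rising storage utilization shrinks it gradually rather than disabling it at once.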

PAST Performance: t_pri comparison, t_div = 0.05

PAST Performance: Ratio of File Diversions

PAST Performance: Ratio of Replica Diversions

PAST Performance: Failed Insertions

PAST Performance: Cache Hits

Conclusions
- Content-based routing improves the scalability of distributed storage systems.
- Distributed systems need user authentication.
- Caching is crucial for system performance.
- Diversion allows for graceful performance degradation.
- Still needed: file mutability, and file search or indexing services.