1 School of Computing Science Simon Fraser University CMPT 771/471: Overlay Networks and P2P Systems Instructor: Dr. Mohamed Hefeeda.

2 P2P Computing: Definitions
- Peers cooperate to achieve desired functions
  - Peers: end-systems (typically, user machines)
    - Interconnected through an overlay network
    - Peer ≡ like the others (similar, or behave in a similar manner)
  - Cooperate: share resources, e.g., data, CPU cycles, storage, bandwidth
    - Participate in protocols, e.g., routing, replication, …
  - Functions: file sharing, distributed computing, communications, content distribution, streaming, …
- Note: the P2P concept is much wider than file sharing

3 When Did P2P Start?
- Napster (late 1990s)
  - A court shut Napster down in 2001
- Gnutella (2000)
- Then the killer FastTrack (Kazaa, ...)
- Now BitTorrent, and many others
- Accompanied by significant research interest
- Claim
  - P2P is much older than Napster!
- Proof
  - The original Internet!
  - Remember UUCP (Unix-to-Unix copy)?

4 What IS and IS NOT New in P2P?
- What is not new
  - Concepts!
- What is new
  - The term P2P (maybe!)
  - New characteristics of the nodes that constitute the systems we build

5 What IS NOT New in P2P?
- Distributed architectures
- Distributed resource sharing
- Node management (join/leave/fail)
- Group communications
- Distributed state management
- …

6 What IS New in P2P?
- Nodes (peers) are
  - Quite heterogeneous
    - Several orders of magnitude difference in resources
    - Compare the bandwidth of a dial-up peer vs. a high-speed LAN peer
  - Unreliable
    - Failure is the norm!
  - Of limited capacity
    - Load sharing and balancing are critical
  - Autonomous
    - Rational, i.e., they maximize their own benefit!
    - Peers must be given incentives to cooperate in a way that optimizes system performance

7 What IS New in P2P? (cont’d)
- System
  - Scale
    - Huge number of peers (millions)
  - Structure and topology
    - Ad hoc: no control over peers joining/leaving
    - Highly dynamic
  - Membership/participation
    - Typically open → more security concerns: trust, privacy, data integrity, …
  - Cost of building and running
    - A small fraction of the cost of same-scale centralized systems
    - How much would it cost to build/run a supercomputer with the processing power of 3 million PCs?

8 What IS New in P2P? (cont’d)
- So what?
- We need to design new, lighter-weight algorithms and protocols that scale to millions (or billions!) of nodes, given these new characteristics
- Question: why now, and not two decades ago?
  - We did not have such abundant (and underutilized) computing resources back then!
  - And network connectivity was very limited

9 Why is it Important to Study P2P?
- P2P traffic is a major portion of Internet traffic (50+%), the current killer app
- P2P traffic has exceeded web traffic (the former killer app)!
- Direct implications for the design, administration, and use of computer networks and network resources
  - Think of ISP designers or campus network administrators
- Many potential distributed applications

10 Sample P2P Applications
- File sharing
  - BitTorrent, Overnet, eDonkey, Gnutella, …
- Distributed cycle sharing …
- File and storage systems
  - OceanStore, CFS, Freenet, Farsite, …
- Media streaming and content distribution
  - SopCast, CoolStreaming, …
  - SplitStream, CoopNet, PeerCast, Bullet, Zigzag, NICE, …

11 P2P vs. its Cousin (Grid Computing)
- Common goal:
  - Aggregate resources (e.g., storage, CPU cycles, and data) into a common pool and provide efficient access to them
- Differences along five axes:
  - Target communities and applications
  - Type of shared resources
  - Scalability of the system
  - Services provided
  - Software required

12 P2P vs. Grid Computing (cont’d)
Issue | Grid | P2P
--- | --- | ---
Communities and applications | Established communities, e.g., scientific institutions; computationally intensive problems | Grassroots communities (anonymous); mostly file swapping
Resources shared | Powerful and reliable machines, clusters; high-speed connectivity; specialized instruments | PCs with limited capacity and connectivity; unreliable; very diverse

13 P2P vs. Grid Computing (cont’d)
Issue | Grid | P2P
--- | --- | ---
System scalability | Hundreds to thousands of nodes | Hundreds of thousands to millions of nodes
Services provided | Sophisticated services: authentication, resource discovery, scheduling, access control, and membership control; members usually trust each other | Limited services: resource discovery; limited trust among peers
Software required | Sophisticated suites, e.g., Globus, Condor | Simple, e.g., BitTorrent (screen saver)

14 P2P vs. Grid Computing: Discussion
- The differences above reflect the traditional view of each paradigm
  - Both paradigms are expected to converge and complement each other
- Target communities and applications
  - Grid: is becoming open
- Type of shared resources
  - P2P: is expanding to include more varied and more powerful resources
- Scalability of the system
  - Grid: is increasing its number of nodes
- Services provided
  - P2P: is adding authentication, data integrity, trust management, …

15 P2P Systems: Simple Model
[Figure: software architecture model on a peer (P2P Application / Middleware / P2P Substrate / Operating System / Hardware)]
[Figure: system architecture; peers form an overlay according to the P2P Substrate]

16 Overlay Network
- An abstract layer built on top of the physical network
- Neighbors in the overlay can be several hops away in the physical network

17 Overlay Network (cont’d)

18 Overlay Network (cont’d)
- Why do we need overlays?
- Flexibility in
  - Choosing neighbors
  - Forming and customizing the topology to fit the application’s needs (e.g., short delay, reliability, high BW, …)
  - Designing communication protocols among nodes
- Get around limitations in legacy networks
- Enable new (and old!) network services

19 Overlay Network (cont’d)
- Overlay design issues
  - Select neighbors
  - Handle node arrivals and departures
  - Detect and handle failures (nodes, links)
  - Monitor and adapt to network dynamics
  - Match with the underlying physical network

20 Overlay Network (cont’d)
- Some applications that use overlays
  - Application-level multicast, e.g., ESM, Zigzag, NICE, …
    - Build multicast tree(s) or mesh(es) in the application (not network) layer
  - Reliable inter-domain routing, e.g., RON
    - Improves on BGP by finding robust routes faster
  - Content Distribution Networks (CDNs)
    - Distribute bandwidth-intensive content (software updates, …)
  - Peer-to-peer file sharing
    - File exchange among peers
  - P2P streaming
    - Real-time streaming

21 Overlay Network (cont’d)
- Example application: Application-Level Multicast (ALM)
- Let us first see IP Multicast

22 Overlay Network (cont’d)
- Recall: IP Multicast
[Figure: IP multicast delivery from a source]

23 Overlay Network (cont’d)
- IP Multicast
  - Most efficient (packets traverse each link only once)
- What is wrong with IP Multicast?
  - Not enabled in many routers
  - Not scalable (core routers need to maintain state for multicast sessions)
- Now let us see ALM …

24 Overlay Network (cont’d)
- Application-Level Multicast (ALM)
[Figure: ALM delivery from a source]

25 Overlay Network (cont’d)
- Several algorithms have been proposed to improve the efficiency of ALM
  - Get it as close as possible to IP Multicast
  - See the ESM, NICE, and Zigzag papers

26 Peer Software Model
- A software client installed on each peer
- Three components:
  - P2P Substrate
  - Middleware
  - P2P Application
[Figure: software model on a peer (P2P Application / Middleware / P2P Substrate / Operating System / Hardware)]

27 Peer Software Model (cont’d)
- P2P Substrate (key component)
  - Overlay management
    - Construction
    - Maintenance (peer join/leave/fail and network dynamics)
  - Resource management
    - Allocation (storage)
    - Discovery (routing and lookup)
- Ex: Pastry, CAN, Chord, …
- More on this later

28 Peer Software Model (cont’d)
- Middleware
  - Provides auxiliary services to P2P applications:
    - Peer selection
    - Trust management
    - Data integrity validation
    - Authentication and authorization
    - Membership management
    - Accounting (economics and rationality)
    - …
  - Ex: CollectCast, EigenTrust, micropayments

29 Peer Software Model (cont’d)
- P2P Application
  - Potentially, multiple applications could run on top of a single P2P substrate
  - Applications include
    - File sharing
    - File and storage systems
    - Distributed cycle sharing
    - Content distribution
  - This layer provides functions and bookkeeping relevant to the target application
    - File assembly (file sharing)
    - Buffering and rate smoothing (streaming)
- Ex: Promise, Bullet, CFS

30 P2P Substrate
- Key component, which
  - Manages the overlay
  - Allocates and discovers objects
- Based on the flexibility of placing objects at peers, P2P substrates can be
  - Structured
  - Unstructured

31 P2P Substrates: Classification
- Structured (or tightly controlled, DHT)
  - Objects are rigidly assigned to specific peers
  - Looks like a Distributed Hash Table (DHT)
  - Efficient search and guarantee of finding
  - Lack of partial-name and keyword queries
  - Maintenance overhead
  - Ex: Chord, CAN, Pastry, Tapestry, Kademlia (Overnet)

32 P2P Substrates: Classification (cont’d)
- Unstructured (or loosely controlled)
  - Objects can be anywhere
  - Support partial-name and keyword queries
  - Inefficient search and no guarantee of finding
  - Some heuristics exist to enhance performance
  - Ex: Gnutella, Kazaa (super node), GIA [Chawathe et al. 03]

33 Structured P2P Substrates
- Objects are rigidly assigned to peers
  - Objects and peers have IDs (usually obtained by hashing some attributes)
  - Objects are assigned to peers based on their IDs
- Peers in the overlay form a specific geometric shape, e.g.,
  - Tree, ring, hypercube, butterfly network
- The shape (to some extent) determines
  - How neighbors are chosen, and
  - How messages are routed

34 Structured P2P Substrates (cont’d)
- The substrate provides a Distributed Hash Table (DHT)-like interface
  - insertObject(key, value), findObject(key), …
  - In the literature, many authors refer to structured P2P substrates as DHTs
- It also provides peer management (join, leave, fail) operations
- Most of these operations are done in O(log n) steps, where n is the number of peers
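To make the DHT-like interface concrete, here is a minimal single-process sketch. It assumes a Chord-style ring: peer names and object keys are hashed with SHA-1 into a small 16-bit ID space, and each key is stored at its successor (the first peer ID at or after the key's ID, wrapping around). The class ToyDHT and the methods insert_object/find_object (mirroring insertObject/findObject) are hypothetical names, and the O(log n) multi-hop routing of a real substrate is replaced by a direct lookup.

```python
import bisect
import hashlib


def hash_id(name: str, bits: int = 16) -> int:
    """Map a name (peer address or object key) to an ID in [0, 2**bits)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class ToyDHT:
    """Hypothetical in-memory stand-in for a structured P2P substrate."""

    def __init__(self, peer_names, bits: int = 16):
        self.bits = bits
        self.ring = sorted(hash_id(p, bits) for p in peer_names)
        # Each peer ID owns a local table of (key, value) pairs.
        self.store = {pid: {} for pid in self.ring}

    def owner(self, key: str) -> int:
        """Successor rule: first peer ID >= the key's ID, wrapping around."""
        kid = hash_id(key, self.bits)
        i = bisect.bisect_left(self.ring, kid)
        return self.ring[i % len(self.ring)]

    def insert_object(self, key: str, value) -> None:
        self.store[self.owner(key)][key] = value

    def find_object(self, key: str):
        return self.store[self.owner(key)].get(key)


# Usage: eight peers; any peer computing owner("song.mp3") reaches the same ID.
dht = ToyDHT([f"peer{i}" for i in range(8)])
dht.insert_object("song.mp3", "10.0.0.5")
print(dht.find_object("song.mp3"))  # 10.0.0.5
```

Because placement depends only on the hash of the key, every peer agrees on where an object lives; this is what gives structured substrates their "guarantee of finding", and also why they cannot answer partial-name queries.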

35 Structured P2P Substrates (cont’d)
- DHTs: efficient search and guarantee of finding
- However,
  - Lack of partial-name and keyword queries
  - Maintenance overhead; even O(log n) may be too much in very dynamic environments
- Ex: Chord, CAN, Pastry, Tapestry, Kademlia (Overnet)

36 Example: Content Addressable Network (CAN) [Ratnasamy 01]
- Nodes form an overlay in a d-dimensional space
  - Node IDs are chosen randomly from the d-space
  - Object IDs (keys) are chosen from the same d-space
- The space is dynamically partitioned into zones
- Each node owns a zone
  - Zones are split and merged as nodes join and leave
- Each node stores
  - The portion of the hash table that belongs to its zone
  - Information about its immediate neighbors in the d-space
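A minimal sketch of how a key could be placed in CAN's d-dimensional space, assuming the space is the unit hypercube [0, 1)^d, zones are axis-aligned boxes, and a key's point is derived from SHA-256 by reading successive 4-byte chunks as coordinates. The helper key_to_point, the Zone class, and owner_of are hypothetical names for illustration, not the CAN paper's code.

```python
import hashlib
from dataclasses import dataclass


def key_to_point(key: str, d: int) -> tuple:
    """Hash a key to a point in the unit hypercube [0, 1)^d."""
    assert 1 <= d <= 8, "a 32-byte digest yields at most 8 coordinates here"
    digest = hashlib.sha256(key.encode()).digest()
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2 ** 32
        for i in range(d)
    )


@dataclass
class Zone:
    lo: tuple  # lower corner (inclusive)
    hi: tuple  # upper corner (exclusive)

    def contains(self, p: tuple) -> bool:
        return all(low <= x < high for low, x, high in zip(self.lo, p, self.hi))


def owner_of(key: str, zones: dict, d: int) -> str:
    """Return the node whose zone contains the key's point."""
    p = key_to_point(key, d)
    for node, zone in zones.items():
        if zone.contains(p):
            return node
    raise KeyError("zones do not cover the whole space")


# Usage: a 2-d space split into four equal zones.
zones = {
    "n1": Zone((0.0, 0.0), (0.5, 0.5)),
    "n2": Zone((0.5, 0.0), (1.0, 0.5)),
    "n3": Zone((0.0, 0.5), (0.5, 1.0)),
    "n4": Zone((0.5, 0.5), (1.0, 1.0)),
}
print(owner_of("K1", zones, d=2))
```

Since zones partition the space, exactly one zone contains each point, so each (key, value) pair has exactly one home node.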

37 2-d CAN: Dynamic Space Division
[Figure: the 2-d space divided into zones owned by nodes n1 to n5]

38 2-d CAN: Key Assignment
[Figure: keys K1 to K4 mapped to points in the zones of nodes n1 to n5]

39 2-d CAN: Routing (Lookup)
[Figure: a lookup for key K4 routed across the zones of nodes n1 to n5]

40 CAN: Routing
- Each node keeps state for 2d = O(d) neighbors (neighbor coordinates, IPs)
  - Constant for a fixed d; does not depend on the number of nodes n
- Greedy routing
  - Forward to the neighbor that is closest to the destination
  - The average path length is O(d·n^(1/d)) hops, which is O(log n) when d = (log n)/2
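The greedy rule can be sketched on an idealized CAN. The sketch assumes all n zones are equal cells of a k × k × … grid (n = k^d) that wraps around, like CAN's torus-shaped coordinate space, so every node has exactly 2d neighbors. Real CAN zones have unequal sizes and routing compares actual coordinates; the grid and the helper names are simplifying assumptions.

```python
import itertools


def ring_dist(a: int, b: int, k: int) -> int:
    """Distance between cell indices a and b on a wrap-around axis of k cells."""
    diff = abs(a - b)
    return min(diff, k - diff)


def distance(u: tuple, v: tuple, k: int) -> int:
    """Number of zone hops between cells u and v on the torus."""
    return sum(ring_dist(a, b, k) for a, b in zip(u, v))


def neighbors(u: tuple, k: int):
    """The 2d neighbors of cell u: one step down/up along each dimension."""
    for i in range(len(u)):
        for step in (-1, 1):
            v = list(u)
            v[i] = (v[i] + step) % k
            yield tuple(v)


def greedy_route(src: tuple, dst: tuple, k: int) -> list:
    """Repeatedly forward to the neighbor closest to dst; return the path."""
    path = [src]
    cur = src
    while cur != dst:
        cur = min(neighbors(cur, k), key=lambda v: distance(v, dst, k))
        path.append(cur)
    return path


# Usage: d = 2, k = 8, i.e. n = 64 equal zones; average over all ordered pairs.
k, d = 8, 2
cells = list(itertools.product(range(k), repeat=d))
hops = [len(greedy_route(s, t, k)) - 1 for s in cells for t in cells]
avg_hops = sum(hops) / len(hops)
print(avg_hops)  # 4.0
```

On this grid every greedy step reduces the remaining distance by exactly one hop, so the average equals d·k/4 = (d/4)·n^(1/d), the average path length reported for CAN.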

41 CAN: Node Join
- The new node finds a node already in the CAN
  - (Bootstrap: one or a few dedicated nodes outside the CAN maintain a partial list of active nodes)
- It then finds the node whose zone will be split
  - Choose a random point P (this will be its ID)
  - Forward a JOIN request to P through the existing node
- The node that owns P splits its zone in half and hands one half to the new node, together with the corresponding (key, value) pairs and neighbor (routing) information
- Neighbors of the split zone are notified
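The split step of a join can be sketched as follows, assuming zones are axis-aligned boxes in [0, 1)^d and that keys are stored by their points. As a simplification, the zone is halved along its longest side (ties go to the lowest dimension, which for zones that start as the unit cube amounts to cycling through dimensions in a fixed order), the newcomer always receives the upper half, and neighbor notification is omitted. Zone, Node, and split_zone are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Zone:
    lo: list  # lower corner (inclusive)
    hi: list  # upper corner (exclusive)

    def contains(self, p) -> bool:
        return all(low <= x < high for low, x, high in zip(self.lo, p, self.hi))


@dataclass
class Node:
    name: str
    zone: Zone
    store: dict = field(default_factory=dict)  # key point -> value


def split_zone(owner: Node, newcomer_name: str) -> Node:
    """Halve owner's zone along its longest side (ties: lowest dimension).
    The owner keeps the lower half; the newcomer gets the upper half and
    every (key, value) pair whose point lies in it."""
    z = owner.zone
    dim = max(range(len(z.lo)), key=lambda i: z.hi[i] - z.lo[i])
    mid = (z.lo[dim] + z.hi[dim]) / 2
    upper = Zone(list(z.lo), list(z.hi))
    upper.lo[dim] = mid
    z.hi[dim] = mid
    newcomer = Node(newcomer_name, upper)
    for point in [p for p in owner.store if upper.contains(p)]:
        newcomer.store[point] = owner.store.pop(point)
    return newcomer


# Usage: n1 owns the whole unit square and holds two keys.
n1 = Node("n1", Zone([0.0, 0.0], [1.0, 1.0]), {(0.2, 0.3): "a", (0.7, 0.9): "b"})
n2 = split_zone(n1, "n2")
print(n1.zone, n1.store)
print(n2.zone, n2.store)
```

Only the owner of P and its neighbors are touched, which is why a join costs O(d) state updates plus the O(log n) routing of the JOIN request.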

42 CAN: Node Leave, Fail
- Graceful departure
  - The leaving node hands over its zone to one of its neighbors
- Failure
  - Detected by the absence of heartbeat messages, which are sent periodically in regular operation
  - Neighbors start takeover timers, proportional to the volume of their zones
  - The neighbor with the smallest timer takes over the zone of the dead node
    - It notifies the other neighbors so they cancel their timers (some negotiation between neighbors may occur)
  - Note: the (key, value) entries stored at the failed node are lost
    - Nodes that insert (key, value) pairs periodically refresh (re-insert) them
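The takeover rule reduces to "the timer that expires first wins". A minimal sketch, assuming each neighbor knows its own zone volume as a fraction of the space and that timers are a hypothetical 100 seconds per unit volume (the scale factor is illustrative only):

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    name: str
    zone_volume: float  # fraction of the d-space this neighbor owns


def takeover_winner(neighbors, seconds_per_unit_volume: float = 100.0):
    """Each neighbor of a failed node starts a timer proportional to its own
    zone volume; the first timer to expire (smallest zone) wins, and that
    neighbor would then tell the others to cancel their timers."""
    timers = {n.name: seconds_per_unit_volume * n.zone_volume for n in neighbors}
    winner = min(timers, key=timers.get)
    return winner, timers


# Usage: three live neighbors of a failed node.
winner, timers = takeover_winner(
    [Neighbor("n2", 0.25), Neighbor("n3", 0.125), Neighbor("n5", 0.5)]
)
print(winner, timers)  # n3 {'n2': 25.0, 'n3': 12.5, 'n5': 50.0}
```

Making the timer proportional to zone volume favors the neighbor with the smallest zone, which tends to keep zone sizes (and hence load) balanced.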

43 CAN: Discussion
- Scalable
  - O(log n) steps for operations (when d = (log n)/2)
  - State information is O(d) at each node
- Locality
  - Nodes are neighbors in the overlay, not necessarily in the physical network
  - Suggestion (for better routing)
    - Each node measures the RTT between itself and each of its neighbors
    - It forwards the request to the neighbor with the maximum ratio of progress to RTT
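A worked check of the O(log n) claim, using the average greedy path length of (d/4)·n^(1/d) hops reported for CAN, with n nodes, a d-dimensional space, and logarithms taken base 2:

```latex
d = \tfrac{1}{2}\log_2 n
\;\Rightarrow\;
n^{1/d} = 2^{(\log_2 n)/d} = 2^{2} = 4
\;\Rightarrow\;
\frac{d}{4}\, n^{1/d} = d = \tfrac{1}{2}\log_2 n = O(\log n),
\qquad
\text{neighbors per node} = 2d = \log_2 n .
```

For example, n = 2^20 (about one million nodes) gives d = 10, an average of 10 hops, and 20 neighbors per node. Note the trade-off: getting logarithmic paths requires d to grow with n, so the per-node state is then no longer constant but grows as log n as well.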

44 CAN: Discussion (cont’d)
- What is wrong with CAN (and DHTs in general)?
- Maintenance cost
  - Although logarithmic in the number of nodes, it is still too much for very dynamic P2P systems
  - Peers are joining and leaving all the time

45 Unstructured P2P Substrates
- Objects can be anywhere → loosely controlled overlays
- The loose control
  - Makes the overlay tolerate the transient behavior of nodes
    - For example, when a peer leaves, nothing needs to be done because there is no structure to restore
  - Enables the system to support flexible search queries
    - Queries are sent in plain text and every node runs a mini database engine
- But we lose on searching
  - Usually done by flooding, which is inefficient
  - Some heuristics exist to enhance performance
  - No guarantee of locating a requested object (e.g., rarely requested objects)
- Ex: Gnutella, Kazaa (super node), GIA [Chawathe et al. 03]

46 Example: Gnutella
- Peers are called servents
- All peers form an unstructured overlay
- Peer join
  - Find an active peer already in Gnutella (e.g., contact known Gnutella hosts)
  - Send a Ping message through the active peer
  - Peers willing to accept new neighbors reply with a Pong
- Peer leave, fail
  - Just drop out of the network!
- To search for a file
  - Send a Query message to all neighbors with a TTL (= 7)
  - Upon receiving a Query message, a peer
    - Checks its local database and replies with a QueryHit to the requester
    - Decrements the TTL and, if it is nonzero, forwards the query to all neighbors
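A minimal simulation of the search procedure above. It assumes the overlay is given as an adjacency dict and that each peer drops a Query it has already seen (real servents detect duplicates by message ID; here the simulation simply tracks visited peers). flood_query and its parameters are hypothetical names; the message counter makes the cost of flooding visible.

```python
from collections import deque


def flood_query(graph, source, has_file, ttl=7):
    """Simulate Gnutella-style flooding of a Query message.

    graph: dict mapping each peer to a list of its overlay neighbors
    has_file: set of peers that would answer with a QueryHit
    Returns (peers that produced a QueryHit, total Query messages sent).
    """
    hits = set()
    messages = 0
    seen = {source}                   # assumption: duplicate queries are dropped
    frontier = deque([(source, ttl)])
    while frontier:
        peer, t = frontier.popleft()
        if t == 0:
            continue                  # TTL exhausted: do not forward
        for nb in graph[peer]:
            messages += 1             # one Query sent over this overlay link
            if nb in seen:
                continue              # duplicate: receiver discards it
            seen.add(nb)
            if nb in has_file:
                hits.add(nb)          # nb replies with a QueryHit
            frontier.append((nb, t - 1))  # nb decrements the TTL
    return hits, messages


# Usage: a small overlay where only E shares the requested file.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
print(flood_query(graph, "A", {"E"}, ttl=7))  # ({'E'}, 10)
print(flood_query(graph, "A", {"E"}, ttl=2))  # (set(), 6)
```

Even in this five-peer overlay the query costs 10 messages, and with a TTL that is too small the file is not found at all, illustrating both the overhead and the lack of a guarantee.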

47 Flooding in Gnutella → Scalability Problem
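As rough arithmetic for the scalability problem, assume (hypothetically) that every peer has k neighbors and that the overlay has no cycles, so each forwarding peer passes the query on to its k - 1 other neighbors. The number of peers reached within TTL t is then at most:

```latex
\sum_{i=1}^{t} k\,(k-1)^{i-1} \;=\; \frac{k\left[(k-1)^{t}-1\right]}{k-2}, \qquad k > 2 .
```

For k = 4 and the default TTL t = 7 this gives 4(3^7 - 1)/2 = 4372 peers, each costing at least one Query message for a single search; in a real overlay, cycles add duplicate messages on top of that.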

48 Heuristics for Searching [Yang and Garcia-Molina 02]
- Iterative deepening
  - Multiple BFS rounds with increasing TTLs
  - Reduces traffic but increases response time
- Directed BFS
  - Send to “good” neighbors (the subset of your neighbors that returned many results in the past) → need to keep history
- Local indices
  - Keep a small index over files stored on neighbors (within a number of hops)
  - May answer queries on their behalf
  - Saves the cost of sending queries over the network
  - Index currency?
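Iterative deepening can be sketched by re-running a TTL-limited BFS with growing depths until enough results arrive. This is a simplified version that re-floods from scratch each round; the published policy is more economical about repeated rounds. The depth schedule (1, 3, 5, 7), the wanted threshold, and the function names are illustrative assumptions.

```python
from collections import deque


def bfs_search(graph, source, has_file, ttl):
    """One BFS flood limited to `ttl` hops; returns (hits, messages)."""
    hits, messages = set(), 0
    seen = {source}
    frontier = deque([(source, ttl)])
    while frontier:
        peer, t = frontier.popleft()
        if t == 0:
            continue
        for nb in graph[peer]:
            messages += 1
            if nb not in seen:
                seen.add(nb)
                if nb in has_file:
                    hits.add(nb)
                frontier.append((nb, t - 1))
    return hits, messages


def iterative_deepening(graph, source, has_file, depths=(1, 3, 5, 7), wanted=1):
    """Retry with larger TTLs until `wanted` results are found.
    Returns (hits, TTL of the last round, total messages over all rounds)."""
    hits, total = set(), 0
    for ttl in depths:
        hits, msgs = bfs_search(graph, source, has_file, ttl)
        total += msgs
        if len(hits) >= wanted:
            return hits, ttl, total
    return hits, depths[-1], total


# Usage: a chain of 8 peers A-B-...-H; the file is 3 hops away at D.
peers = "ABCDEFGH"
graph = {p: [] for p in peers}
for a, b in zip(peers, peers[1:]):
    graph[a].append(b)
    graph[b].append(a)
print(iterative_deepening(graph, "A", {"D"}))  # ({'D'}, 3, 6)
print(bfs_search(graph, "A", {"D"}, 7)[1])     # 13 messages with a full TTL-7 flood
```

When the object is nearby, the small early rounds find it with fewer messages than a full flood; when it is far away, the repeated rounds cost extra messages and time, which is the traffic vs. response-time trade-off noted above.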

49 Heuristics for Searching: Super Node
- Used in recent Gnutella-like networks
- Relatively powerful nodes play a special role
  - They maintain indexes over other peers
[Figure legend: Super Node (SN), Ordinary Node (ON)]

50 Super Node Systems
- File search
  - An ON sends a query to its SN
  - The SN replies with a list of IPs of ONs that have the file
  - The SN may forward the query to other SNs
- Parallel downloads take place between ONs
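The SN-side bookkeeping can be sketched as an inverted index from file names to ON addresses. This assumes ONs upload their file lists when they connect and that an SN forwards a query only one SN hop; the SuperNode class, its methods, and the IP addresses are hypothetical.

```python
from collections import defaultdict


class SuperNode:
    """Hypothetical super node that indexes files held by its ordinary nodes."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # file name -> set of ON IP addresses
        self.peers = []                # other SNs this SN can forward to

    def register(self, on_ip, files):
        """An ON uploads metadata about the files it shares."""
        for f in files:
            self.index[f].add(on_ip)

    def query(self, filename, forward=True):
        """Answer from the local index; optionally ask neighboring SNs (one hop)."""
        result = set(self.index.get(filename, set()))
        if forward:
            for sn in self.peers:
                result |= sn.query(filename, forward=False)
        return result


# Usage: two connected SNs, each indexing one ON.
sn1, sn2 = SuperNode("SN1"), SuperNode("SN2")
sn1.peers.append(sn2)
sn2.peers.append(sn1)
sn1.register("10.0.0.2", ["a.mp3"])
sn2.register("10.0.1.7", ["a.mp3", "b.avi"])
print(sorted(sn1.query("a.mp3")))  # ['10.0.0.2', '10.0.1.7']
```

The requesting ON would then download directly from the returned ONs, possibly in parallel, so the SNs carry only signaling traffic.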

51 Super Node Systems (cont’d)
- Two types of traffic
  - Signaling
    - Handshaking, connection establishment, uploading metadata, …
    - Over TCP connections between SN-SN and SN-ON
  - Content traffic
    - Files exchanged
    - Mostly through HTTP between ON-ON

52 Lessons from Deployed P2P Systems
- Distributed design
- Exploit heterogeneity
- Load balancing
- Locality in neighbor selection
- Connection shuffling
  - If a peer searches for a file and does not find it, it may try later and get it!
- Efficient gossiping algorithms
  - To learn about other SNs and perform shuffling
- Consider peers behind NATs and firewalls
  - They are everywhere!

53 P2P Systems: Summary
- P2P is an active research area with many potential applications in industry and academia
- In the P2P computing paradigm:
  - Peers cooperate to achieve desired functions
- New characteristics
  - Heterogeneity, unreliability, rationality, scale, ad hoc
  - → New and lighter-weight algorithms are needed
- Simple model for P2P systems:
  - Peers form an abstract layer called an overlay
  - A peer software client may have three components
    - P2P substrate, middleware, and P2P application
    - Borders between components may be blurred

54 Summary (cont’d)
- P2P substrate: a key component, which
  - Manages the overlay
  - Allocates and discovers objects
- P2P substrates can be
  - Structured (DHT)
    - Example: CAN
  - Unstructured
    - Example: Gnutella