CSPP 54001: Large-Scale Networked Systems Week 5: P2P Technologies and Applications Matei Ripeanu.



CSPP – February 7, 2003 P2P Definition(s) A number of definitions coexist: Def 1: “A class of applications that takes advantage of resources — storage, cycles, content, human presence — available at the edges of the Internet.” Edges often turned off, without permanent IP addresses Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.” Lots of other definitions that fit in between

Week 5: P2P
- Definitions
- Impact
- Uses and Examples
- Principles
- More in-depth Case Studies
  - Gnutella
  - CDN: P2P, Web and Akamai traffic analysis
  - DHTs
- Your Role

P2P Impact: Widespread adoption
KaZaA: 170 million downloads (3.5M/week), the most popular application ever!
Number of users for file-sharing applications (2/4/2003):
  FastTrack       4,114,120
  iMesh           1,375,199
  eDonkey           569,097
  DirectConnect     136,552
  Blubster           97,128
  Gnutella           92,678
  Overnet            91,750

P2P Impact (2): Huge traffic
P2P-generated traffic now dominates the Internet load.
- Internet2 traffic statistics
- Cornell.edu (March ’02): 60% P2P
- UChicago estimate (March ’01): Gnutella control traffic is about 1% of all Internet traffic.

P2P Impact (3)
Shows a huge pool of underutilized resources that can be put to work.
Entropia statistics:
                              Total          Last 24 Hours
  Users                       4,236,090      2,365
  Results received            764M           1.13M
  Total CPU time              1.3M years     1.3K years
  Floating Point Operations   e              e+18 (51.40 TeraFLOPs/sec)

P2P Impact (4)
- Might force a few companies to change their business models
- Data copying and distribution carries almost zero cost now → this might impact copyright laws
- New research domain → grants and PhD theses

Week 5: P2P
- Definitions
- Impact
- Applications
- Mechanisms
- More in-depth Case Studies
- Your Role

Applications: Number crunching
Examples: Entropia, UnitedDevices, DistributedScience, and many others
Approach suitable for a particular class of problems.
Some characteristics (for
- Massive parallelism
- Low bandwidth/computation ratio
- Fixed-rate data processing task
- Error tolerance
Users do donate *real* resources
What are the problems?
- Centralized. Does it scale?
- How to prevent cheating?
- How to extend the model to problems that are not massively parallel?
$1.5M / year extra consumed power

Applications: File sharing
The ‘killer application’ to date.
Too many to list them all: Napster, FastTrack (KaZaA, KazaaLite), Gnutella (LimeWire, Morpheus, BearShare), iMesh

Applications: Content distribution
uServ project: your own Web server at no (or little) cost
Goal: Provide a system and software that users run to serve content on the web with their own machines
Alternatives:
- Run your own webserver
- Hosting services (free or paid)
- Other P2P applications
Not just a webserver, but a webserving community:
- Users cooperatively improve availability (through site replication/mirroring)
- Users cooperatively liberate fire-walled content (through P2P proxying/relaying)

Applications: Content Distribution
Streaming: the user plays the data as it arrives.
[Figure: client/server approach (the source: “Oh, I am exhausted!”) versus P2P approach.]
Possible solution:
- If A is the first user, A gets the stream from the server.
- Else, A can get the stream from the central site or from a set of users who have already been receiving the stream.
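The source-selection rule on this slide can be sketched in a few lines. This is only an illustration: the function name, the `max_sources` cap, and the choice to prefer peers over the central site whenever any peer is available are assumptions, not details from the slides.

```python
def choose_sources(server, peers_receiving, max_sources=3):
    """Pick where a new user should pull the stream from.

    server           -- identifier of the central site (hypothetical)
    peers_receiving  -- users already receiving the stream, in any order
    max_sources      -- assumed cap on how many peers to pull from
    """
    if not peers_receiving:
        # First user: nobody else has the stream yet, so use the server.
        return [server]
    # Later users: offload the server by pulling from existing receivers.
    return list(peers_receiving[:max_sources])


if __name__ == "__main__":
    print(choose_sources("server", []))
    print(choose_sources("server", ["A", "B", "C", "D"]))
```

A real system would also have to handle peers that leave mid-stream, which this sketch ignores.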

Applications: Measurements
- Evaluate the performance of your Web site from an end-user perspective (example: Porivo Networks)
  - Multiple views on your site performance
- Generating Internet statistics
  - Connectivity statistics
  - Routing errors

Measurements: The Performance “Blind Spot”
[Figure: network landscape from the back-end infrastructure (database, app server, web server, firewall, 3rd party content) across backbone, regional network, major provider, enterprise provider ISP and local ISP to consumer and corporate users. Labels: Component Testing, Datacenter Monitoring, Datacenter Testing “Beacon”, Last-mile “Blind Spot”. Vendors shown: BMC, Mercury Interactive, Tivoli, ProactiveNet, HP OpenView, Computer Associates, Keynote Systems, BMC/SiteAngel, Service Metrics.]
Critical to estimate end-to-end performance.
Slide source:

Measurements: End-to-end Performance
[Figure: the same network landscape, from back-end infrastructure to consumer and corporate users. Labels: Component Testing, Datacenter Monitoring, End-to-end Web Performance Testing.]
Slide source:

More applications …
- Instant messaging (Yahoo, AOL)
- Collaborative environments (Groove)
- Backup storage (HiveNet, OceanStore)
- Spam filtering
- Anonymous
- Censorship-resistant publishing systems (Ethernity, Freenet)

Week 5: P2P
- Definitions
- Impact
- Uses and Examples
- Mechanisms
- More in-depth Case Studies
- Your Role

Mechanisms (1)
- To obtain a resilient system → integrate multiple components with uncorrelated failure curves. Use data and service replication.
- To improve the quality of service delivered:
  → integrate multiple providers with uncorrelated demand curves (this way less over-provisioning is necessary for each of them)
  → move service delivery closer to the user
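To see why uncorrelated failures matter, here is a minimal sketch under a strong assumption: each replica is unavailable independently with the same probability `p_down`. Both the function name and the numbers are illustrative and do not come from the slides. Under that assumption all replicas are down at once with probability `p_down ** replicas`, so availability rises quickly with each extra copy.

```python
def availability(p_down: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up.

    Assumes failures are fully uncorrelated; correlated failures
    (same power feed, same ISP) would make the real figure lower.
    """
    return 1.0 - p_down ** replicas


if __name__ == "__main__":
    # Edge machines are often turned off, so try a pessimistic p_down.
    for n in (1, 2, 3, 5, 10):
        print(f"{n:2d} replicas -> availability {availability(0.5, n):.4f}")
```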

Mechanisms (2)
- To provide anonymity → use a large number of independent components, “hide in the crowd” and make search impossible (or costly)
- To detect anomalies and generate good statistics → use multiple views
Other facts:
- It is not clear what QoS is acceptable to users (sometimes they are willing to accept a degradation in service quality if they get it for free)
- Social engineering: building communities

Week 5: P2P
- Definitions
- Impact
- Uses and Examples
- Mechanisms
- More in-depth Case Studies
  - Gnutella
  - Traffic analysis: P2P vs. Web vs. Akamai
  - DHTs
- Your Role

Gnutella Network
Why analyze the Gnutella network?
- Large scale: up to 500k nodes, 100TB data, 10M files today
- Self-organizing network
- Fast growth in its early stages: more than 50 times during the first half of 2001
- Open architecture, simple and flexible protocol
- Interesting mix of social and technical issues

Gnutella protocol overview
- P2P file sharing app. on top of an overlay network
  - Nodes maintain open TCP connections
  - Messages are broadcast (flooded) or back-propagated
- (Initial) protocol:
                   Broadcast (flooding)   Back-propagated   Node to node
    Membership     PING                   PONG
    Query          QUERY                  QUERY HIT
    File download                                           GET, PUSH
- Protocol refinements (2001 and later)
  - Ping messages used more efficiently, vendor-specific extensions, GWebCaches, XML searches, super-nodes (2-layer hierarchy)

Gnutella search mechanism
Steps:
1. Node 2 initiates search for file A
2. Sends message to all neighbors
3. Neighbors forward message
4. Nodes that have file A initiate a reply message
5. Query reply message is back-propagated
6. File download
[Figure: overlay graph animated over the steps above; the reply messages are labeled A:5 and A:7.]
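The search steps can be sketched as a small simulation. This is a simplified model, not the Gnutella wire protocol: the seven-node topology and the TTL values are invented, the flood runs as a synchronous breadth-first pass, and reading the figure labels A:5 and A:7 as "nodes 5 and 7 hold file A" is an assumption. The `parent` map plays the role of the routing state that lets a reply retrace the query's path.

```python
from collections import deque

# Hypothetical overlay: a ring 2-1-3-5-7-6-4-2 (not taken from the slides).
GRAPH = {1: [2, 3], 2: [1, 4], 3: [1, 5], 4: [2, 6],
         5: [3, 7], 6: [4, 7], 7: [5, 6]}
HAS_FILE_A = {5, 7}  # assumed meaning of the "A:5" and "A:7" labels


def flood_query(graph, start, has_file, ttl):
    """Flood a query from `start`; return {responder: reply path to start}."""
    seen = {start}            # nodes drop queries they have already seen
    parent = {start: None}    # reverse route used for back-propagation
    frontier = deque([(start, ttl)])
    hits = {}
    while frontier:
        node, hops_left = frontier.popleft()
        if node != start and node in has_file:
            # Reply travels hop by hop back along the query's path.
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            hits[node] = path
        if hops_left == 0:
            continue          # TTL exhausted: do not forward further
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                parent[neighbor] = node
                frontier.append((neighbor, hops_left - 1))
    return hits


if __name__ == "__main__":
    for responder, path in flood_query(GRAPH, 2, HAS_FILE_A, ttl=3).items():
        print(f"node {responder} replies via {path}")
```

With a TTL of 2 on this toy graph the query never reaches nodes 5 and 7, which illustrates how the TTL bounds both traffic and search reach. The file download itself happens directly between the two nodes and is not modeled here.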

Gnutella: tools for network exploration
- Eavesdropper: a modified node inserted into the network to log traffic.
- Crawler: connects to all active nodes and uses the membership protocol to discover the graph topology. Client-server approach.
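A crawler of this kind is essentially a breadth-first traversal of the overlay. The sketch below runs against an in-memory toy graph; `ask_neighbors` is a hypothetical stand-in for whatever membership exchange the real crawler performed, and `OVERLAY` is invented data. It also derives the average number of links per node, the kind of statistic such a crawl makes available.

```python
from collections import deque

# Invented overlay used only to exercise the crawler.
OVERLAY = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}


def crawl(seed, ask_neighbors):
    """Discover nodes and links reachable from `seed`.

    ask_neighbors(node) must return the node's current neighbors; in a
    live crawl this would be a network round trip (assumption).
    """
    visited = {seed}
    edges = set()
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in ask_neighbors(node):
            edges.add(frozenset((node, neighbor)))  # undirected link
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return visited, edges


def average_degree(nodes, edges):
    """Each undirected link contributes to the degree of two nodes."""
    return 2 * len(edges) / len(nodes)


if __name__ == "__main__":
    nodes, edges = crawl("a", lambda n: OVERLAY[n])
    print(len(nodes), "nodes,", len(edges), "links,",
          average_degree(nodes, edges), "links/node")
```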

Gnutella: Network size
- High user interest
  - Users tolerate high latency, low-quality results
- Better resources
  - DSL and cable modem nodes grew from 24% to 41% over the first 6 months
Explosive growth in 2001, but slowly shrinking after.

Gnutella: Growth Invariants
(1) Unchanged average node connectivity
  - 3.4 links/node on average
(2) Node-to-node distance maintains a similar distribution
  - Average node-to-node distance varied only 25% while the network grew 50 times over 6 months

Is Gnutella a power-law network?
Power-law networks: the number of links per node follows a power-law distribution, $N = L^{-k}$
Examples:
- The Internet
- In/out links to/from HTML pages
- Citation networks
- US power grid
- Social networks
Implications: high tolerance to random node failure, but low reliability when facing an ‘intelligent’ adversary.
[Plot dated November 2000]
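One common, if crude, way to check for a power law is to fit a straight line to the degree distribution on log-log axes: if the number of nodes N with L links is proportional to L^(-k), then log N falls linearly in log L with slope -k. The sketch below does that with plain least squares on synthetic data. The function name and the data are illustrative, and this is not necessarily the method used for the measurements in these slides.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate k in N ~ L^(-k) from {links_per_node: number_of_nodes}.

    Uses least squares on (log L, log N); zero entries are skipped
    because their logarithm is undefined.
    """
    points = [(math.log(d), math.log(c))
              for d, c in degree_counts.items() if d > 0 and c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return -slope


# Synthetic distribution with a known exponent of 2 (not measured data).
SYNTHETIC = {links: 1000.0 * links ** -2.0 for links in range(1, 21)}

if __name__ == "__main__":
    print("estimated k:", round(fit_power_law_exponent(SYNTHETIC), 3))
```

On real crawl data the tail is noisy, so a fit like this is a first look rather than a proof that the network follows a power law.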

CSPP – February 7, 2003 Is Gnutella a power-law network?  Later, larger networks display a bimodal distribution  Implications:  High tolerance to random node failures preserved  Increased reliability when facing an attack. May 2001

Gnutella: Query distribution
Similar to Web page popularity: Zipf distribution for query popularity.
Significance: caching will work well.
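A quick calculation shows why a Zipf-like popularity curve favors caching. Assume the query at popularity rank r is issued with probability proportional to 1/r^alpha, and that a cache holds answers for the `cache_size` most popular queries. The exponent `alpha=1.0`, the catalog size, and the cache size below are assumptions chosen for illustration, not values from the measurements.

```python
def zipf_hit_rate(catalog_size, cache_size, alpha=1.0):
    """Fraction of queries answered by a cache of the top-ranked items.

    Popularity of rank r is proportional to 1 / r**alpha (assumed model).
    """
    weights = [1.0 / rank ** alpha for rank in range(1, catalog_size + 1)]
    return sum(weights[:cache_size]) / sum(weights)


if __name__ == "__main__":
    # Cache only 1% of 10,000 distinct queries.
    print(f"Zipf (alpha=1): {zipf_hit_rate(10_000, 100):.2f}")
    # Same cache under uniform popularity, for comparison.
    print(f"Uniform:        {zipf_hit_rate(10_000, 100, alpha=0.0):.2f}")
```

Under these assumptions, caching 1% of the distinct queries serves roughly half of all requests, whereas under uniform popularity the same cache would serve only 1%.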

Gnutella: Traffic analysis
- 6-8 kbps per link over all connections
- Traffic structure changed over time

Gnutella: Total generated traffic
1Gbps (or 330TB/month)!
- Note that this estimate excludes actual file transfers
Q: Does it matter?
- Compare to 15,000TB/month estimated in the US Internet backbone (Dec. 2000)
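The conversion behind these figures is easy to verify. The helper below is an illustrative sketch that uses decimal units (1 TB = 10^12 bytes); the function name and the 30-day month are assumptions. A sustained 1 Gbps comes to about 324 TB over 30 days, in line with the slide's rounded 330TB/month, and 330 out of 15,000 TB is about 2.2% of the quoted backbone estimate.

```python
def gbps_to_tb_per_month(gbps, days=30):
    """Convert a sustained rate in Gbps to terabytes transferred per month."""
    bytes_per_second = gbps * 1e9 / 8       # 8 bits per byte
    seconds = 86_400 * days                  # seconds in `days` days
    return bytes_per_second * seconds / 1e12


if __name__ == "__main__":
    monthly_tb = gbps_to_tb_per_month(1.0)
    print(f"1 Gbps over 30 days: {monthly_tb:.0f} TB")
    print(f"Share of 15,000 TB/month backbone: {330 / 15_000:.1%}")
```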

Gnutella: Topology mismatch
[Figure: nodes A through H, showing physical links and logical (overlay) links, in two configurations.]
- Perfect mapping!
- Inefficient mapping: link D-E needs to support six times higher traffic.

Gnutella: Topology mismatch
The overlay network topology doesn’t match the underlying Internet infrastructure topology!
- 40% of all nodes are in the 10 largest Autonomous Systems (AS)
- Only 2-4% of all TCP connections link nodes within the same AS
- Largely ‘random wiring’

Gnutella: Free Riding
- More than 25% of Gnutella clients share no files; 75% share 100 files or fewer
- Conclusion: Gnutella has a high percentage of free riders
- If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.
Adar and Huberman (Aug ’00)

Gnutella: Summary
- Gnutella: a self-organizing, large-scale P2P application based on an overlay network. It works!
- Discovered growth invariants specific to large-scale systems that:
  - Help predict resource usage.
  - Give hints for better search and resource organization techniques.
- Growth hindered by the volume of generated traffic and inefficient resource use.

Week 5: P2P
- Definitions
- Impact
- Uses and Examples
- Mechanisms
- More in-depth Case Studies
  - Gnutella
  - Content Distribution Systems traffic analysis: P2P vs. Web vs. Akamai
  - DHTs
- Your Role

Week 5: P2P
- Definitions
- Impact
- Uses and Examples
- Mechanisms
- More in-depth Case Studies
- Your Role

Your Role
- Network administrator
- Lawyer
- Application designer/developer
- Entrepreneur
- User

More information
Links to reports, papers, slides used to prepare this presentation:

Course Outline (Subject to Change)
1. (January 9th) Internet design principles and protocols
2. (January 16th) Internetworking, transport, routing
3. (January 23rd) Mapping the Internet and other networks
4. (January 30th) Security
5. (February 6th) P2P technologies & applications (Matei Ripeanu) (plus midterm)
6. (February 13th) Optical networks (Charlie Catlett)
7. *(February 20th) Web and Grid Services (Steve Tuecke)
8. (February 27th) Network operations (Greg Jackson)
9. *(March 6th) Advanced applications (with guest lecturers: Terry Disz, Mike Wilde)
10. (March 13th) Final exam
* Ian Foster is out of town.