CSPP 54001: Large-Scale Networked Systems Week 5: P2P Technologies and Applications Matei Ripeanu.


1 CSPP 54001: Large-Scale Networked Systems Week 5: P2P Technologies and Applications Matei Ripeanu

2 CSPP 54001 – February 7, 2003 P2P Definition(s) A number of definitions coexist: Def 1: “A class of applications that takes advantage of resources — storage, cycles, content, human presence — available at the edges of the Internet.” Edges often turned off, without permanent IP addresses Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.” Lots of other definitions that fit in between

3 CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Principles More in-depth Case Studies Gnutella CDN: P2P, Web and Akamai traffic analysis DHTs Your Role

4 CSPP 54001 – February 7, 2003 P2P Impact: Widespread adoption KaZaA – 170 million downloads (3.5M/week), the most popular application ever!
Number of users for file-sharing applications (www.slyck.com, 2/4/2003):
FastTrack       4,114,120
iMesh           1,375,199
eDonkey           569,097
DirectConnect     136,552
Blubster           97,128
Gnutella           92,678
Overnet            91,750

5 CSPP 54001 – February 7, 2003 P2P Impact (2): Huge traffic P2P-generated traffic now dominates the Internet load (Internet2 traffic statistics). Cornell.edu (March ’02): 60% P2P. UChicago estimate (March ’01): Gnutella control traffic is about 1% of all Internet traffic.

6 CSPP 54001 – February 7, 2003 P2P Impact (3) Shows a huge pool of underutilized resources that can be put to work. Links: Seti@Home statistics, Entropia statistics. Statistics:
                            Total           Last 24 hours
Users                       4,236,090       2,365
Results received            764M            1.13M
Total CPU time              1.3 M years     1.3 K years
Floating point operations   2.564760e+21    4.441277e+18 (51.40 TeraFLOPs/sec)
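The rate quoted in parentheses can be reproduced from the last-24-hours operation count. A minimal check, assuming the rate is a plain average over an 86,400-second day (the variable names are illustrative, not from the slides):

```python
# Floating-point operations reported for the last 24 hours (from the slide).
ops_last_24h = 4.441277e18
seconds_per_day = 24 * 60 * 60

# Average sustained rate over the day; 1 TeraFLOP = 1e12 operations.
teraflops = ops_last_24h / seconds_per_day / 1e12
print(f"{teraflops:.2f} TeraFLOPs/sec")
```

The result agrees with the 51.40 TeraFLOPs/sec figure on the slide.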

7 CSPP 54001 – February 7, 2003 P2P Impact (4) Might force a few companies to change their business models. Data copying and distribution carries almost zero cost now → this might impact copyright laws. New research domain → grants and PhD theses

8 CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Applications Mechanisms More in-depth Case Studies Your Role

9 CSPP 54001 – February 7, 2003 Applications: Number crunching Examples: Seti@Home, Entropia, UnitedDevices, DistributedScience, many others. Approach suitable for a particular class of problems. Some characteristics (for Seti@Home): massive parallelism; low bandwidth/computation ratio; fixed-rate data processing task; error tolerance. Users do donate *real* resources. What are the problems? Centralized. Does it scale? How to prevent cheating? How to extend the model to problems that are not massively parallel? $1.5M / year extra consumed power

10 CSPP 54001 – February 7, 2003 Applications: File sharing The ‘killer application’ to date Too many to list them all: Napster, FastTrack (KaZaA, KazaaLite), Gnutella (LimeWire, Morpheus, BearShare), iMesh

11 CSPP 54001 – February 7, 2003 Applications: Content distribution uServ project: your own Web Server at no (or little) cost uServ Goal: Provide a system and software that users run to serve content on the web with their own machines Alternatives: Run your own webserver Hosting Services (free or paid) Other P2P applications Not just a webserver, but a webserving community: Users cooperatively improve availability (through site replication/mirroring) Users cooperatively liberate fire-walled content (through P2P proxying/relaying)

12 CSPP 54001 – February 7, 2003 Applications: Content Distribution Streaming: the user plays the data as it arrives. [Figure: client/server approach, where the overloaded source says “Oh, I am exhausted!”, versus P2P approach] Possible solution: If A is the first user, A gets the stream from the server. Else, A can get the stream from the central site or from a set of users who have already been receiving the stream

13 CSPP 54001 – February 7, 2003 Applications: Measurements Evaluate the performance of your Web site from an end-user perspective (example: Porivo Networks). Multiple views on your site performance. Generating Internet statistics: connectivity statistics, routing errors

14 CSPP 54001 – February 7, 2003 Measurements: The Performance “Blind Spot” [Figure: network landscape from the back-end infrastructure (database, app server, web server, firewall, 3rd party content) through backbone, regional network, and ISPs to corporate and consumer users. Labels: component testing, datacenter monitoring, datacenter testing “beacon”. Vendors named: BMC, Mercury Interactive, Tivoli, ProactiveNet, HP OpenView, Computer Associates, Keynote Systems, BMC/SiteAngel, Service Metrics.] Critical to estimate end-to-end performance. Last-mile “Blind Spot”. Slide source: www.porivo.com

15 CSPP 54001 – February 7, 2003 Measurements: End-to-end Performance [Figure: the same network landscape, with component testing and datacenter monitoring at the back end and end-to-end Web performance testing spanning the path to corporate and consumer users.] Slide source: www.porivo.com

16 CSPP 54001 – February 7, 2003 More applications … Instant messaging (Yahoo, AOL). Collaborative environments (Groove). Backup storage (HiveNet, OceanStore). Spam filtering. Anonymous email. Censorship-resistant publishing systems (Eternity, Freenet)

17 CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Your Role

18 CSPP 54001 – February 7, 2003 Mechanisms (1) To obtain a resilient system → integrate multiple components with uncorrelated failure curves. Use data and service replication. To improve the quality of service delivered → integrate multiple providers with uncorrelated demand curves (this way less over-provisioning is necessary for each of them); move service delivery closer to the user

19 CSPP 54001 – February 7, 2003 Mechanisms (2) To provide anonymity → use a large number of independent components, “hide in the crowd”, and make search impossible (or costly). To detect anomalies and generate good statistics → use multiple views. Other facts: It is not clear what QoS is acceptable to users (sometimes they are willing to accept a degradation in service quality if they get it for free). Social engineering – building communities
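The replication argument can be made concrete with a small calculation. This is a sketch under a strong assumption that the slides only gesture at: replicas fail independently ("uncorrelated failure curves"), each with the same probability. The function name and the 0.3 failure probability are illustrative.

```python
def availability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up.

    Assumes every copy fails independently with probability `p_fail`,
    so all copies are down at once with probability p_fail ** replicas.
    """
    return 1.0 - p_fail ** replicas


# Hypothetical edge nodes that are unavailable 30% of the time.
for n in (1, 2, 3, 5):
    print(n, round(availability(0.3, n), 4))
```

If failures were correlated (for example, all replicas sitting behind the same ISP), the gain from each extra replica would be smaller than this model suggests.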

20 CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Gnutella Traffic analysis: P2P vs. Web vs. Akamai DHTs Your Role

21 CSPP 54001 – February 7, 2003 Gnutella Network Why analyze the Gnutella network? Large scale: up to 500k nodes, 100TB data, 10M files today. Self-organizing network. Fast growth in its early stages: more than 50-fold during the first half of 2001. Open architecture, simple and flexible protocol. Interesting mix of social and technical issues

22 CSPP 54001 – February 7, 2003 Gnutella protocol overview
P2P file sharing application on top of an overlay network. Nodes maintain open TCP connections. Messages are broadcast (flooded) or back-propagated.
(Initial) protocol:
                 Broadcast (flooding)   Back-propagated   Node to node
Membership       PING                   PONG
Query            QUERY                  QUERY HIT
File download                                             GET, PUSH
Protocol refinements (2001 and later): ping messages used more efficiently, vendor-specific extensions, GWebCaches, XML searches, super-nodes (2-layer hierarchy).

23 CSPP 54001 – February 7, 2003 Gnutella search mechanism [Figure: overlay network of nodes 1 to 7] Steps: Node 2 initiates search for file A

24 CSPP 54001 – February 7, 2003 Gnutella search mechanism [Figure: overlay network of nodes 1 to 7; query A leaves node 2] Steps: Node 2 initiates search for file A; sends message to all neighbors

25 CSPP 54001 – February 7, 2003 Gnutella search mechanism [Figure: overlay network of nodes 1 to 7; query A spreads] Steps: Node 2 initiates search for file A; sends message to all neighbors; neighbors forward message

26 CSPP 54001 – February 7, 2003 Gnutella search mechanism [Figure: overlay network of nodes 1 to 7; replies A:5 and A:7 appear] Steps: Node 2 initiates search for file A; sends message to all neighbors; neighbors forward message; nodes that have file A initiate a reply message

27 CSPP 54001 – February 7, 2003 Gnutella search mechanism [Figure: overlay network of nodes 1 to 7; replies A:5 and A:7 travel back] Steps: Node 2 initiates search for file A; sends message to all neighbors; neighbors forward message; nodes that have file A initiate a reply message; query reply message is back-propagated

28 CSPP 54001 – February 7, 2003 Gnutella search mechanism [Figure: overlay network of nodes 1 to 7; replies A:5 and A:7 reach node 2] Steps: Node 2 initiates search for file A; sends message to all neighbors; neighbors forward message; nodes that have file A initiate a reply message; query reply message is back-propagated

29 CSPP 54001 – February 7, 2003 Gnutella search mechanism [Figure: overlay network of nodes 1 to 7; file A is downloaded] Steps: Node 2 initiates search for file A; sends message to all neighbors; neighbors forward message; nodes that have file A initiate a reply message; query reply message is back-propagated; file download
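The search steps above can be sketched as a small simulation. The slides do not give the exact links between nodes 1 to 7, so the adjacency list below is an assumption made purely for illustration, as are the function name `flood_query` and the default TTL. Only the roles (node 2 asks, nodes 5 and 7 hold file A) come from the slides.

```python
from collections import deque

# Hypothetical overlay links; the real figure's wiring is not recoverable.
OVERLAY = {
    1: [2, 3],
    2: [1, 4, 5],
    3: [1, 6],
    4: [2, 7],
    5: [2],
    6: [3, 7],
    7: [4, 6],
}
FILES = {5: {"A"}, 7: {"A"}}  # nodes 5 and 7 hold file A, as in the slides


def flood_query(overlay, files, origin, wanted, ttl=7):
    """Flood a query from `origin`; return {holder: reply path back to origin}."""
    parent = {origin: None}  # node from which each node first received the query
    queue = deque([(origin, ttl)])
    hits = {}
    while queue:
        node, remaining = queue.popleft()
        if node != origin and wanted in files.get(node, set()):
            # The reply is back-propagated along the reverse query path.
            path, hop = [], node
            while hop is not None:
                path.append(hop)
                hop = parent[hop]
            hits[node] = path
        if remaining == 0:
            continue  # TTL exhausted: do not forward any further
        for neighbor in overlay[node]:
            if neighbor not in parent:  # ignore duplicates of a query already seen
                parent[neighbor] = node
                queue.append((neighbor, remaining - 1))
    return hits


print(flood_query(OVERLAY, FILES, origin=2, wanted="A"))
```

With this assumed wiring, node 2 learns of both holders, and each reply path ends at node 2. The final download step is a direct node-to-node transfer and is not modeled here. Lowering `ttl` shows how the flooding horizon limits which holders are found.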

30 CSPP 54001 – February 7, 2003 Gnutella: tools for network exploration  Eavesdropper - modified node inserted into the network to log traffic.  Crawler - connects to all active nodes and uses the membership protocol to discover graph topology.  Client-server approach.

31 CSPP 54001 – February 7, 2003 Gnutella: Network size  High user interest  Users tolerate high latency, low quality results  Better resources  DSL and cable modem nodes grew from 24% to 41% over first 6 months. Explosive growth in 2001, but slowly shrinking after

32 CSPP 54001 – February 7, 2003 Gnutella: Growth Invariants  (1) Unchanged average node connectivity  3.4 links/node on average

33 CSPP 54001 – February 7, 2003 Gnutella: Growth Invariants  (1) Unchanged average node connectivity  (2) Node-to-node distance maintains similar distribution Average node-to-node distance varied only 25% while the network grew 50 times over 6 months

34 CSPP 54001 – February 7, 2003 Is Gnutella a power-law network? [Plot dated November 2000] Power-law networks: the number of links per node follows a power-law distribution $N = L^{-k}$. Examples: the Internet, in/out links to/from HTML pages, citations network, US power grid, social networks. Implications: high tolerance to random node failure but low reliability when facing an ‘intelligent’ adversary

35 CSPP 54001 – February 7, 2003 Is Gnutella a power-law network? [Plot dated May 2001] Later, larger networks display a bimodal distribution. Implications: high tolerance to random node failures preserved; increased reliability when facing an attack.
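One common way to test for a power law is to fit a straight line to the degree histogram on log-log axes; the negated slope estimates the exponent k. The sketch below assumes this least-squares approach and uses synthetic data, not Gnutella measurements. It also allows a constant factor C in N = C * L^(-k), which the slide's formula omits.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate k in N ~ C * L^(-k) by least squares on log(N) versus log(L).

    `degree_counts` maps a link count L to the number of nodes N with L links.
    """
    points = [(math.log(links), math.log(count))
              for links, count in degree_counts.items() if count > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return -slope


# Synthetic histogram generated with k = 2 (illustrative, not measured data).
synthetic = {links: 1000 * links ** -2.0 for links in range(1, 21)}
print(round(fit_power_law_exponent(synthetic), 3))
```

On real crawl data the points would scatter around the line, and a visibly bent or two-humped log-log plot would argue against a pure power law.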

36 CSPP 54001 – February 7, 2003 Gnutella: Query distribution Similar to Web pages popularity: Zipf distribution for query popularity Significance: caching will work well
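Why a Zipf distribution makes caching effective can be seen from a short calculation. This sketch assumes query popularity proportional to 1/rank^s with s = 1 and a cache that holds answers for the most popular queries; the function name and the sizes are illustrative, not taken from the measurements.

```python
def zipf_hit_rate(num_distinct, cache_size, exponent=1.0):
    """Fraction of queries served by caching the `cache_size` most popular ones.

    Assumes the query of rank r is issued with probability proportional
    to 1 / r ** exponent.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, num_distinct + 1)]
    return sum(weights[:cache_size]) / sum(weights)


# Caching only 1% of 10,000 distinct queries.
print(round(zipf_hit_rate(10_000, 100), 2))
```

Under these assumptions, caching the top 1% of distinct queries already covers roughly half of all query traffic, which is the sense in which "caching will work well".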

37 CSPP 54001 – February 7, 2003 Gnutella: Traffic analysis   6-8 kbps per link over all connections  Traffic structure changed over time

38 CSPP 54001 – February 7, 2003 Gnutella: Total generated traffic 1Gbps (or 330TB/month)! Note that this estimate excludes actual file transfers Q: Does it matter? Compare to 15,000TB/month estimated in US Internet backbone (Dec. 2000)
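The conversion from 1 Gbps to a monthly volume, and the comparison with the backbone estimate, can be checked directly. A minimal sketch, assuming a 30.5-day month and decimal terabytes (1 TB = 1e12 bytes):

```python
rate_bits_per_sec = 1e9                  # 1 Gbps of Gnutella control traffic
seconds_per_month = 30.5 * 24 * 3600     # assumed average month length

# Bits -> bytes -> terabytes per month.
tb_per_month = rate_bits_per_sec / 8 * seconds_per_month / 1e12

backbone_tb_per_month = 15_000           # US backbone estimate quoted on the slide
share = tb_per_month / backbone_tb_per_month

print(f"{tb_per_month:.0f} TB/month, {share:.1%} of the backbone estimate")
```

This gives about 330 TB/month, or roughly 2% of the quoted backbone volume, before counting any file transfers.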

39 CSPP 54001 – February 7, 2003 Gnutella: Topology mismatch [Figure: nodes A to H, showing physical links and logical (overlay) links] Perfect mapping!

40 CSPP 54001 – February 7, 2003 Gnutella: Topology mismatch [Figure: nodes A to H with a different set of overlay links] Inefficient mapping: link D-E needs to support six times higher traffic.

41 CSPP 54001 – February 7, 2003 Gnutella: Topology mismatch The overlay network topology doesn’t match the underlying Internet infrastructure topology!  40% of all nodes are in the 10 largest Autonomous Systems (AS)  Only 2-4% of all TCP connections link nodes within the same AS  Largely ‘random wiring’
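The D-E example can be reproduced in miniature by routing every overlay link over the physical network and counting how many routes cross a given physical link. The figure's exact wiring is not recoverable from the transcript, so both topologies below are hypothetical: two star-shaped sites joined only by the physical link D-E, with one overlay that mirrors the physical links and one that is wired mostly across sites.

```python
from collections import deque

# Hypothetical physical network: sites {A,B,C,D} and {E,F,G,H} joined by D-E.
PHYSICAL = {
    "A": ["D"], "B": ["D"], "C": ["D"], "D": ["A", "B", "C", "E"],
    "E": ["D", "F", "G", "H"], "F": ["E"], "G": ["E"], "H": ["E"],
}


def physical_path(src, dst):
    """Shortest physical route between two nodes (breadth-first search)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in PHYSICAL[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path, hop = [], dst
    while hop is not None:
        path.append(hop)
        hop = parent[hop]
    return path[::-1]


def load_on(link, overlay_links):
    """Number of overlay links whose physical route crosses `link`."""
    a, b = link
    load = 0
    for src, dst in overlay_links:
        path = physical_path(src, dst)
        hops = set(zip(path, path[1:]))
        if (a, b) in hops or (b, a) in hops:
            load += 1
    return load


# Overlay that mirrors the physical links versus one wired across sites.
matched = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "E"),
           ("E", "F"), ("E", "G"), ("E", "H")]
mismatched = [("A", "F"), ("B", "G"), ("C", "H"), ("A", "H"),
              ("B", "F"), ("C", "G"), ("D", "A")]

print(load_on(("D", "E"), matched), load_on(("D", "E"), mismatched))
```

In this constructed example the same number of overlay links puts six times as many routes on D-E when the overlay ignores the physical topology. With only 2-4% of connections staying inside an AS, most Gnutella overlay links behave like the cross-site ones here.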

42 CSPP 54001 – February 7, 2003 Gnutella: Free Riding  More than 25% of Gnutella clients share no files; 75% share 100 files or less  Conclusion: Gnutella has a high percentage of free riders If only a few individuals contribute to the public good, these few peers effectively act as centralized servers. Adar and Huberman (Aug ’00)

43 CSPP 54001 – February 7, 2003 Gnutella: Summary Gnutella: self-organizing, large-scale, P2P application based on an overlay network. It works! Discovered growth invariants specific to large-scale systems that: help predict resource usage; give hints for better search and resource organization techniques. Growth hindered by the volume of generated traffic and inefficient resource use.

44 CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Gnutella Content Distribution Systems traffic analysis: P2P vs. Web vs. Akamai DHTs Your Role

45 CSPP 54001 – February 7, 2003 Week 5: P2P Definitions Impact Uses and Examples Mechanisms More in-depth Case Studies Your Role

46 CSPP 54001 – February 7, 2003 Your Role Network administrator Lawyer Application designer/developer Entrepreneur User

47 CSPP 54001 – February 7, 2003 More information Links to reports, papers, slides used to prepare this presentation: http://www.cs.uchicago.edu/~matei/P2P/

48 CSPP 54001 – February 7, 2003 Course Outline (Subject to Change)
1. (January 9th) Internet design principles and protocols
2. (January 16th) Internetworking, transport, routing
3. (January 23rd) Mapping the Internet and other networks
4. (January 30th) Security
5. (February 6th) P2P technologies & applications (Matei Ripeanu) (plus midterm)
6. (February 13th) Optical networks (Charlie Catlett)
7. *(February 20th) Web and Grid Services (Steve Tuecke)
8. (February 27th) Network operations (Greg Jackson)
9. *(March 6th) Advanced applications (with guest lecturers: Terry Disz, Mike Wilde)
10. (March 13th) Final exam
* Ian Foster is out of town.

