Matei Ripeanu http://www.ece.ubc.ca/~matei EECE 411: Design of Distributed Software Applications (or Distributed Systems 101)


1 Matei Ripeanu http://www.ece.ubc.ca/~matei
EECE 411: Design of Distributed Software Applications (or Distributed Systems 101)

2 Today’s Objectives
- Class mechanics
- Understand real-world applications in terms of:
  - Motivation and objectives
  - Resource requirements: compute/storage/network resources
  - Architecture (“distributed systems” part)
  - Examples: recent P2P applications
- Start thinking of computer networks from the perspective of a networked application
  - Why? More intuitive

3 P2P Definition(s)
Def 1: “A class of applications that takes advantage of resources -- storage, cycles, content, human presence -- available at the edges of the Internet.”
- Edges are often turned on/off and have no permanent IP addresses
Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.”
Lots of other definitions fit in between. Lots of (P2P?) systems fit nowhere …
Notes:
Def 1. Emphasis: on what resources are integrated. Problem: it is vague what one means by ‘edges’. A core network person would consider everything except routers and wires to be sitting on the edges of the network. Example:
Def 2. Emphasis: on how resources are integrated, that is, the architectural / organizational solution chosen to integrate resources. Problem: quite restrictive: (1) most people use the term P2P for a lot of applications that do not fit this definition; (2) applications like Gnutella that used to fit this definition are moving away from it. Vague again: “most communication is symmetric”. Example: Gnutella, DHTs (CAN, Chord, Tapestry, Pastry)

4 P2P Impact: Widespread adoption
Skype: 560M registered users (Q2’10), 120M active, 8M paying; 15M users online
Number of users for file-sharing applications (estimate, Sept ’06):
eDonkey            3,108,909
FastTrack (Kazaa)  2,114,120
Gnutella           2,899,788
Overnet              691,750
Filetopia              3,405
P2P design techniques are now mainstream!

5 P2P Impact (2): Huge resource users
- P2P-generated traffic now dominates the Internet load (30-50% of the traffic)
- Internet2 traffic statistics
- Cornell.edu (March ’02): 60% P2P

6 P2P Impact (3) – Demonstrate that volatile, small, non-proprietary resources can be efficiently harnessed
- Resources: CPU, storage space
- But also: network bandwidth, availability, user attention and expertise
[Figure: BOINC statistics]

7 P2P Impact (4) – Social / Business
- Data distribution at (almost) zero cost
- Forces companies to change their business models
  - Digital content production and distribution
  - Telecommunications companies
- New collaboration models
  - Crowd-sourcing!

8 Roadmap
- Definitions
- Impact
- Applications
- Mechanisms
- A case study

9 Applications: Number crunching
Examples: UnitedDevices, etc.
Characteristics:
- Massive parallelism
- Low bandwidth/computation ratio
- Error tolerance
- Users do donate *real* resources
  - $1.5M per year in extra consumed power
Problems:
- Centralized. Does it scale?
- Cheating!
- Approach suitable for a particular class of problems. How to extend the model to problems that are not massively parallel?

10 Applications: Content distribution (files, video)
- The ‘killer application’ to date
- Too many to list them all: BitTorrent, FastTrack (KaZaA, KazaaLite, iMesh), Gnutella (LimeWire, BearShare)
- Two independent problems:
  - Distributed index
  - Fast content download
- Environment: unreliable, non-cooperative

11 Applications: Performance evaluation
- Poor online performance costs businesses $25 billion per year (Zona Research)
- 28% of attempted online purchases fail (BCG)
- Slow page download is the primary reason for transaction abandonment
  - Business transactions are at particular risk
  - User expectations for page download are around 4 seconds
- Performance evaluation & monitoring requires multiple vantage points
  - Connectivity statistics
  - Routing errors
  - Evaluate Web-site performance from the end-user perspective

12 Measurements: The Performance “Blind Spot”
[Figure labels: Back-end Infrastructure; Network Landscape; Last-mile “Blind Spot”; Datacenter Testing “Beacon”; Web server; App server; Database; Firewall; 3rd party content; ISP; Backbone; Enterprise Provider; Major Provider; Regional Network; Local ISP; T1; Corporate Network; Corporate User; Consumer User; Component Testing. Datacenter Monitoring: BMC, Mercury Interactive, Tivoli, ProactiveNet, HP OpenView, Computer Associates. Also listed: Keynote Systems, Mercury Interactive, BMC/SiteAngel, Service Metrics.]
Critical to estimate end-to-end performance
Notes: Internet latency exists, and online businesses must measure it. Forrester, IDC, and Gartner Research all say that many online customers will click out of your site or off of a page if its content does not download within 5-8 seconds. 50-80% of the Internet latency impacting your customers occurs in the “last mile.” Research and IDC. End-to-end web performance testing measures your site's quality of service from your customer's browser to your origin server. Porivo’s distributed technology provides an active performance testing service that delivers true end-to-end web performance testing.
Slide source:

13 Measurements: End-to-end Performance
[Figure labels: Back-end Infrastructure; Network Landscape; Web server; App server; Database; Firewall; 3rd party content; ISP; Backbone; Enterprise Provider; Major Provider; Regional Network; Local ISP; T1; Corporate Network; Corporate User; Consumer User; Component Testing; Datacenter Monitoring; End-to-end Web Performance Testing]
Slide source:

14 More applications …
- Backup storage (HiveNet, OceanStore)
- Collaborative environments
- Spam filtering
- Anonymous / censorship-resistant publishing systems (Eternity, Freenet)

15 Roadmap
- Definitions
- Impact
- Applications
- Mechanisms
- A case study

16 Mechanisms (I)
To obtain a resilient system:
- Use redundancy for data and services
- Integrate multiple components with uncorrelated failure curves
To reduce cost and improve the QoS delivered:
- Move service delivery closer to the user
- Integrate multiple clients with uncorrelated demand curves (lower over-provisioning at resource providers)
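
A rough illustration of the redundancy point: if each replica is up with probability a, and failures are fully independent (an idealized reading of "uncorrelated"), then a service that needs any one of k replicas is up with probability 1 - (1 - a)^k. The function name and the numbers are hypothetical, chosen only for this sketch.

```python
def availability(per_replica: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up.

    Assumes failures are independent; correlated failures (same rack,
    same power feed, same software bug) make the real figure lower.
    """
    if not 0.0 <= per_replica <= 1.0:
        raise ValueError("per_replica must be a probability")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The service is down only if every replica is down at the same time.
    return 1.0 - (1.0 - per_replica) ** replicas
```

For example, under these assumptions three replicas that are each up 90% of the time give availability(0.9, 3), roughly 0.999.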

17 Example (I): Cooperative Web serving
Problem: Flash-crowds!
[Figure labels: Browser, Resolver, DNS Query, dnssrv, Origin Server, Other Server]

18 Example (I): Cooperative Web serving
Cooperative Web Caching
- DNS redirection: return a proxy, preferably one near the client
- Fetch data from nearby
[Figure labels: Browser, Resolver, dnssrv, httpprx (two instances), Origin Server, akamai.cnn.com]
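
The redirection step ("return proxy, preferably one near client") could be sketched as below. This is only an illustration: the Proxy record, the use of plane coordinates as a stand-in for network distance, and the load cutoff are all assumptions, not details from the slide. A real redirector would rely on measured latency or topology data.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Proxy:
    name: str
    x: float      # hypothetical coordinates standing in for network position
    y: float
    load: float   # fraction of capacity in use, 0.0 to 1.0


def pick_proxy(client_xy, proxies, max_load=0.9):
    """Return the nearest proxy that is not overloaded, or None."""
    usable = [p for p in proxies if p.load < max_load]
    if not usable:
        return None  # the caller would fall back to the origin server
    cx, cy = client_xy
    return min(usable, key=lambda p: hypot(p.x - cx, p.y - cy))
```

The load filter is one possible way to avoid sending a flash crowd to a single nearby proxy; the slide itself only asks for proximity.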

19 Example (II): Server consolidation
ibm.com external site (2001)
- Daily fluctuations (3x)
- Workday cycle
- Weekends off
[Figure: one-week trace, M T W Th F S S]
Light load: concentrate load on a minimal set of servers
- Step down surplus servers to low-power state
- Activate surplus servers on demand
Optimization: place workload to optimize cooling efficiency
Power draw by state:
work       watts
CPU idle   93 W
CPU max    120 W
boot       136 W
disk spin  6-10 W
off/hib    2-3 W
Idling consumes 60% to 70% of peak power demand.
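
A back-of-the-envelope sketch of why consolidation helps, using the idle, max, and off/hibernate figures from the table above. The linear interpolation between idle and max power, the 2.5 W midpoint, and the function name are assumptions made for illustration; boot and disk-spin costs are ignored.

```python
import math

IDLE_W = 93.0   # from the slide: CPU idle
MAX_W = 120.0   # from the slide: CPU max
OFF_W = 2.5     # assumed midpoint of the slide's 2-3 W off/hibernate range


def cluster_power(total_servers: int, load: float, consolidate: bool) -> float:
    """Estimated cluster draw in watts.

    `load` is expressed in units of one server's full capacity.
    Assumes power grows linearly from IDLE_W to MAX_W with utilization.
    """
    if not 0 <= load <= total_servers:
        raise ValueError("load must fit within the cluster")
    if consolidate:
        # Pack the load onto as few servers as possible, keep at least one on.
        active = max(1, math.ceil(load))
    else:
        active = total_servers
    utilization = load / active
    per_active = IDLE_W + (MAX_W - IDLE_W) * utilization
    return active * per_active + (total_servers - active) * OFF_W
```

With ten servers and a load of two servers' worth of work, this model gives about 984 W when the load is spread over all ten and about 260 W when it is packed onto two, with the other eight stepped down.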

20 Dynamic Provisioning
Static provisioning dedicates resources
- Typical of “co-lo” hosting
- Reprovision manually as needed
But load is dynamic
- Must overprovision for surges
- High variable cost of capacity
Need dynamic provisioning to achieve true economies of scale
- Load multiplexing
- Tradeoff cost vs. quality
- Service level agreements
- Dynamic resource acquisition
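
The load-multiplexing argument can be made concrete with a small calculation: static provisioning sizes each customer for its own peak, while a shared pool only has to cover the peak of the aggregate. The function names and the demand numbers are hypothetical, and the sketch assumes all demand series have the same length.

```python
def static_capacity(demands):
    """Capacity needed if every customer gets a dedicated allocation
    sized for its own peak (sum of per-customer peaks)."""
    return sum(max(series) for series in demands)


def shared_capacity(demands):
    """Capacity needed if all customers share one pool sized for the
    peak of the summed demand at any single time step."""
    return max(sum(step) for step in zip(*demands))
```

For three customers whose surges fall at different times, for example [[10, 2, 2], [2, 10, 2], [2, 2, 10]], dedicated allocations need 30 units while a shared pool needs 14. The gap shrinks as demand curves become more correlated.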

21 Power Management via MUSE: IBM Trace Run (Before)
[Figure: power draw (watts), latency (ms*50), and throughput (requests/s) over the trace run]
MUSE: Jeff Chase et al., Duke University (SOSP 2001)

22 Power Management via MUSE: IBM Trace Run (After)
MUSE: Jeff Chase et al., Duke University (SOSP 2001)

23 Mechanisms (II)
To detect anomalies, to generate good statistics:
- Use multiple views
- Example: Web-server performance characterization
To provide anonymity:
- Use a large number of independent components (“hide in the crowd”) and make search impossible (or at least costly)
- Example: onion routing
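
One simple way to use multiple views, sketched under assumptions: collect the same measurement (say, page download latency) from several vantage points and flag the ones that sit far above the median. The threshold factor and the vantage-point names are made up for this example; this is not a method taken from the slides.

```python
from statistics import median


def flag_outliers(latency_ms, factor=3.0):
    """Return the vantage points whose latency exceeds `factor` times
    the median across all vantage points, sorted by name.

    latency_ms: dict mapping vantage-point name -> measured latency in ms
    """
    if not latency_ms:
        return []
    mid = median(latency_ms.values())
    return sorted(name for name, ms in latency_ms.items() if ms > factor * mid)
```

A single vantage point cannot tell a slow server from a slow path to the server; comparing several views is what makes the distinction possible.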

24 Roadmap
- Definitions
- Impact
- Uses and Examples
- Mechanisms
- A case study
  - File sharing: The Gnutella Network & BitTorrent

25 Basic Primitives for File Sharing
- Join: How do I begin participating?
- Publish: How do I advertise my file(s)?
- Search: How do I find a file?
- Fetch: How do I retrieve a file?
Lots of different solutions for each of these four primitives.

26 What makes these systems interesting?
- Large scale
- Self-organizing networks
- Fast growth
  - Gnutella: more than 50x during first half of 2001; 50x again 2001 to 2006
- Open architecture, simple and flexible protocols
- Interesting mix of social and technical issues

27 Gnutella search mechanism
[Figure labels: Gnutella nodes (UBC, Calgary, Chicago, MIT, Boston); TCP overlay tunnels; Routers; Q: Beatles; Beatles: Yellow Submarine]
Search steps:
1. Initiates search for “Yellow Submarine”
2. Sends message to all neighbors
3. Neighbors forward message
4. Initiate reply message
5. Reply message is back-propagated
6. File download
Notes: Let me briefly explain how the Gnutella network works. Gnutella nodes set up TCP tunnels to other existing Gnutella nodes, and all messages are forwarded on this overlay. If a node at UBC is looking for a Beatles album, it creates a query message, and the query is “flooded” into the network. We have built tools to extract the topology of the Gnutella overlay and to intercept the traffic.

28 Gnutella: Overview
- Join: on startup, client contacts a few other nodes; these become its “neighbors”
- Publish: no need
- Search:
  - Flooding: pass query to neighbors, who pass the query in turn to their own neighbors, and so on...
  - Back-propagation in case of success
- Fetch: get the file directly from peer (HTTP)
[Note: this was the original design. Later the network moved to a two-layer structure]
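
The flooding search with back-propagation described above can be simulated in a few lines. This is a simplified sketch, not the wire protocol: the overlay is an in-memory dictionary, the hop limit (`ttl`) and the duplicate-suppression rule are assumptions added so the flood terminates, and all names are hypothetical.

```python
from collections import deque


def flood_search(neighbors, files, origin, wanted, ttl=4):
    """Flood a query over the overlay; return {hit_node: reply_path}.

    neighbors: dict node -> list of neighbor nodes (the overlay links)
    files:     dict node -> set of file names stored at that node
    reply_path lists the nodes a reply traverses, from the hit back to origin.
    """
    parent = {origin: None}           # who first forwarded the query to each node
    queue = deque([(origin, ttl)])
    hits = {}
    while queue:
        node, hops_left = queue.popleft()
        if node != origin and wanted in files.get(node, set()):
            # Back-propagate the reply along the reverse of the query path.
            path, cur = [], node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            hits[node] = path
        if hops_left == 0:
            continue                  # hop limit reached, stop forwarding
        for nxt in neighbors.get(node, []):
            if nxt not in parent:     # ignore a query this node has already seen
                parent[nxt] = node
                queue.append((nxt, hops_left - 1))
    return hits
```

Note that nothing is published in advance: every query costs messages at every node it reaches, which is the scalability question raised later in the discussion slide. The file itself would then be fetched directly from the hit node, outside the overlay.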

29 BitTorrent Ingredients
- A “seed” node that has the file
- A “.torrent” meta-file is built for the file
- A web server (usually) to index torrents
- A “tracker” node is associated with each file
  - Identified in the .torrent
- File is split into fixed-size segments (e.g., 256KB)
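
A minimal sketch of the last two ingredients: splitting a file into fixed-size segments and building a meta-record that names the tracker and carries one hash per segment. The dictionary layout and field names are a simplified stand-in, not the actual .torrent encoding, and the tracker URL in the usage note is a placeholder.

```python
import hashlib

SEGMENT_SIZE = 256 * 1024  # 256 KB, the example size from the slide


def split_into_segments(data: bytes, segment_size: int = SEGMENT_SIZE):
    """Split data into fixed-size segments; the last one may be shorter."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def build_metadata(data: bytes, tracker_url: str, segment_size: int = SEGMENT_SIZE):
    """Build a simplified meta-record: tracker location plus one SHA-1
    digest per segment, so a downloader can verify each segment alone."""
    segments = split_into_segments(data, segment_size)
    return {
        "tracker": tracker_url,
        "length": len(data),
        "segment_size": segment_size,
        "segment_hashes": [hashlib.sha1(s).hexdigest() for s in segments],
    }
```

For instance, build_metadata(payload, "http://tracker.example/announce") on a payload slightly larger than 512 KB yields three segment hashes. Per-segment hashes are what let a peer accept data from untrusted peers and still detect corruption segment by segment.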

30 How does it work
[Figure labels: Web page with link to .torrent; Web Server; .torrent; Tracker; Downloader “US”; Peer; A, B, C; [Seed]; [Leech]]

31 Overview – system components
[Same figure; step shown: Get-announce]

32 Overview – system components
[Same figure; step shown: Response-peer list]

33 Overview – system components
[Same figure; step shown: Shake-hand]

34 Overview – system components
[Same figure; step shown: pieces]

35 Overview – system components
[Same figure; step shown: pieces]

36 Overview – system components
[Same figure; steps shown: Get-announce, Response-peer list, pieces]

37 BitTorrent: Overview
- Join: nothing; just find a server/community
- Publish: create ‘tracker’, spread .torrent file
- Search:
  - For file: (not included in the protocol) the community is supposed to provide search tools
  - For segments: exchange segment ID maps with other peers
- Fetch: exchange segments with other peers (HTTP)
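
The segment-level search ("exchange segment ID maps with other peers") might look like the sketch below: each peer advertises the set of segment IDs it holds, and the downloader decides which missing segment to ask from whom. The selection rule here (first peer, in name order, that has the segment) is a deliberately naive assumption; real clients use more elaborate policies.

```python
def plan_requests(have, peer_maps):
    """Map each missing segment ID to one peer that advertises it.

    have:      set of segment IDs this peer already holds
    peer_maps: dict peer name -> set of segment IDs that peer advertises
    Segments that no peer advertises are simply absent from the plan.
    """
    plan = {}
    for peer in sorted(peer_maps):
        for seg in sorted(peer_maps[peer] - have):
            plan.setdefault(seg, peer)   # keep the first peer found per segment
    return plan
```

With have = {0} and peers A: {0, 1}, B: {1, 2}, C: {3}, the plan is {1: "A", 2: "B", 3: "C"}. Because different peers can supply different segments at once, download capacity grows with the number of participants.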

38 Gnutella vs. BitTorrent: Discussion
System properties
- Reliability?
- Scalability?
- Fairness?
- Overheads?
Quality of Service
- Search coverage for content?
- Ability to download content fast?
- Ability to survive flash crowds?
The rest of this course: How to build (distributed) systems with desirable characteristics.

39 Assignment 0
To do: Subscribe to mailing list

41 Gnutella -- Network Resilience
[Figure: Gnutella topology in three panels: original; after a random 30% of nodes die; after a targeted 4% of nodes die]
From Saroiu et al., MMCN 2002

42 Gnutella: Query distribution
Highly heterogeneous distribution for query popularity, similar to Web page popularity => caching will work well
From Kunwadee et al., 2002

43 Gnutella: Topology issues (1)
[Figure labels: 56kbps Modem, 10Mbps LAN, 1.5Mbps DSL]

44 Gnutella Topology Mismatch

