Characterizing Overlay Topologies & Dynamics in Peer-to-Peer Networks
Daniel Stutzbach, Reza Rejaie (University of Oregon); Subhabrata Sen (AT&T Labs)
IEEE Computer & Communications Workshop, Huntington Beach, October 25th, 2005
Slide 2/18: Motivation (CCW 2005, http://mirage.cs.uoregon.edu/P2P)
P2P file-sharing systems are very popular in practice: several million simultaneous users collectively, and 60% of all Internet traffic [CacheLogic Research 2005].
Most use an unstructured overlay.
Understanding overlay properties and dynamics is important for understanding how existing P2P systems function and for developing and evaluating new systems.
Unstructured overlays are not well understood.
We characterized the overlay topology in Gnutella because it is:
Large: one of the largest P2P systems, with more than 1 million users.
Mature: in use for several years, with older studies available for comparison.
Open: no reverse-engineering needed.
Slide 3/18: Defining the Problem
Gnutella uses a two-tier overlay, which improves scalability: ultrapeers form an unstructured mesh, and leaf peers connect to the ultrapeers. eDonkey and FastTrack are similar.
Studying the overlay requires snapshots. Snapshots capture the overlay as a graph: individual snapshots reveal graph properties, and consecutive snapshots reveal dynamics.
However, capturing accurate snapshots is difficult.
[Figure labels: Top-level overlay, Leaf, Ultrapeer]
Slide 4/18: Challenges in Capturing Accurate Snapshots
Snapshots are captured iteratively by a crawler.
An ideal snapshot is instantaneous, but the overlay is large and rapidly changing, so captured snapshots are likely to be distorted.
Previous studies captured either complete snapshots with a slow crawler (distorted) or partial snapshots (less distorted, but unrepresentative).
Some types of analysis require the whole graph.
Increasing crawler speed reduces distortion in captured snapshots.
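To make "captured iteratively by a crawler" concrete, here is a minimal sketch of a breadth-first crawl. It is not Cruiser's implementation: the `get_neighbors` callback and the in-memory `overlay` dictionary are hypothetical stand-ins for querying live peers over the network.

```python
from collections import deque

def crawl(seeds, get_neighbors):
    """Breadth-first crawl from the seed peers; returns (nodes, edges)."""
    seen = set(seeds)
    queue = deque(seeds)
    edges = set()
    while queue:
        peer = queue.popleft()
        # Ask the peer for its neighbor list (hypothetical callback).
        for nbr in get_neighbors(peer):
            edges.add(frozenset((peer, nbr)))  # undirected edge
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen, edges

# Toy overlay standing in for live peers (made-up data).
overlay = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
nodes, edges = crawl(["a"], lambda peer: overlay.get(peer, []))
```

In a live system the overlay keeps changing while the queue drains, which is why a slower crawl yields a more distorted snapshot.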
Slide 5/18: Cruiser: a Fast Gnutella Crawler
Features: a distributed, highly parallelized implementation; dynamic adaptation to bandwidth and CPU constraints.
Cruiser is orders of magnitude faster than other P2P crawlers: it captures one million nodes in around 7 minutes, or 140,000 peers/min, compared to 2,500 peers/min [Saroiu 02].
We investigated the effects of speed on distortion: 4% node distortion and 15% edge distortion.
Reference: Daniel Stutzbach and Reza Rejaie, "Capturing Accurate Snapshots of the Gnutella Network," Global Internet Symposium, March 2005.
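The crawl rates quoted on this slide can be checked with simple arithmetic. The helper name `crawl_minutes` below is illustrative only; the inputs are the figures from the slide.

```python
def crawl_minutes(num_peers, peers_per_minute):
    """Time to crawl num_peers at a constant rate, in minutes."""
    return num_peers / peers_per_minute

fast = crawl_minutes(1_000_000, 140_000)  # about 7.1 minutes at Cruiser's rate
slow = crawl_minutes(1_000_000, 2_500)    # 400 minutes at the older rate
rate_ratio = 140_000 / 2_500              # ratio of the two quoted rates: 56
```

At 2,500 peers/min, one million peers would take 400 minutes (over six hours), during which many peers would join and leave.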
Slide 6/18: Data Set
More than 80,000 snapshots, captured over the past year.
To examine static properties, we focus on four snapshots:

Date      Total Nodes  Leaves   Ultrapeers  Top-level Edges
9/27/04   725,120      614,912  110,208     1,212,772
10/11/04  779,535      662,568  116,967     1,244,219
10/18/04  806,948      686,719  120,229     1,331,745
2/2/05    1,031,471    873,130  158,345     1,964,121

To examine dynamic properties, we use slices. Each slice is 2 days of ~500 back-to-back snapshots. Slices were captured starting 10/14/04, 10/21/04, 11/25/04, 12/21/04, and 12/27/04.
Slide 7/18: Summary of Characterizations
Graph Properties
Implementation heterogeneity
Degree distribution: top-level degree distribution, ultrapeer-leaf connectivity, degree-distance correlation
Reachability: path lengths, eccentricity
Small world properties
Resiliency
Dynamic Properties
Existence of stable core: uptime distribution, biased connectivity
Properties of stable core: largest connected component, path lengths, clustering coefficient
Slide 8/18: Top-level Degree
This is the degree distribution among ultrapeers.
There are obvious peaks at 30 and 70 neighbors.
A substantial number of ultrapeers have fewer than 30.
What happened to the power law reported by prior studies?
[Figure annotations: Max 30 in most clients; Max 75 in some clients; Custom]
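A degree distribution like the one plotted here can be computed from a snapshot's edge list. The sketch below assumes the snapshot is available as a list of undirected `(u, v)` pairs between ultrapeers; the toy edges are made up for illustration.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree value to the number of nodes with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())

# Toy top-level edge list (made-up data).
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
histogram = degree_distribution(toy_edges)
```

Plotting the histogram on log-log axes is the usual way to check for a power law; spikes at client-imposed limits such as 30 would show up as isolated peaks instead of a straight line.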
Slide 9/18: What happened to power-law?
When a crawl is slow, many short-lived peers report long-lived peers as neighbors, but those neighbors are not all present at the same time.
The degree distribution from a slow crawl resembles prior results [Ripeanu 02 ICJ].
Slide 10/18: Shortest-Path Distances
Distribution of distances among ultrapeers (left): 70% of distances are exactly 4 hops.
Distribution of distances among all peers (right): most distances are 5 or 6 hops. This shows the effect of the two-tier structure with multiple parents.
Despite the large size of the overlay, pair-wise distances are short.
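Pair-wise hop distances of this kind can be obtained by running a breadth-first search from every node. This is a small-scale sketch, assuming an adjacency dictionary with comparable node IDs; on a million-node snapshot an all-pairs computation would need sampling or a far more efficient implementation.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_histogram(adj):
    """Count unordered node pairs at each hop distance."""
    hist = Counter()
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if s < t:  # count each unordered pair once
                hist[d] += 1
    return hist

# Toy path graph 0-1-2-3 (made-up data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
hist = distance_histogram(path)
```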
Slide 11/18: Is Gnutella a Small World?
Small worlds arise naturally in many places: movie actors, the power grid, co-authors of papers.
Small-world graphs have short distances but significant clustering, compared to a similar random graph.
Gnutella is a small world.
Very high clustering adversely affects flooding queries, but Gnutella isn't clustered enough to affect performance.
[Table: Mean Distance and Clustering Coefficient, Gnutella vs. Random]
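The clustering coefficient compared here can be illustrated with the common "mean of local clustering coefficients" definition. That choice is an assumption, since the slide does not say which variant was used, and the toy graph is made up.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def mean_clustering(adj):
    """Average of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Toy graph: triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
coefficient = mean_clustering(toy)
```

A small-world test would compute the same quantity, together with the mean distance, on a random graph with the same number of nodes and edges and compare the two.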
Slide 12/18: Resiliency to Node Failure
The plot shows the ratio of connected peers after node failure.
The Gnutella topology is extremely resilient to random node failure.
It's resilient even when the highest-degree nodes are removed.
Complex algorithms are not necessary to achieve resiliency.
[Figure legend: Random; Highest degree first]
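The two removal strategies in the plot can be sketched as follows. This interprets "ratio of connected peers" as the fraction of surviving nodes in the largest connected component, which is an assumption; the function names and the toy star graph are illustrative.

```python
import random

def largest_component_fraction(adj, removed):
    """Fraction of surviving nodes that lie in the largest connected component."""
    alive = set(adj) - set(removed)
    if not alive:
        return 0.0
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(alive)

def choose_failures(adj, count, strategy, rng=None):
    """Pick nodes to remove: uniformly at random or highest degree first."""
    nodes = list(adj)
    if strategy == "random":
        rng = rng or random.Random(0)
        return set(rng.sample(nodes, count))
    if strategy == "highest_degree":
        return set(sorted(nodes, key=lambda n: len(adj[n]), reverse=True)[:count])
    raise ValueError("unknown strategy: " + strategy)

# Toy star graph: hub "h" connected to four leaves (made-up data).
star = {"h": {"1", "2", "3", "4"}, "1": {"h"}, "2": {"h"}, "3": {"h"}, "4": {"h"}}
targeted = largest_component_fraction(star, choose_failures(star, 1, "highest_degree"))
```

The star is the worst case for targeted removal: losing the hub isolates every leaf. A topology that stays connected under the same test is resilient in the sense used on the slide.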
Slide 13/18: Dynamic Properties
How does node churn affect overlay dynamics?
Are some regions of the overlay more stable? How can we identify such a region?
Methodology:
Capture a long series of back-to-back snapshots.
Estimate the uptime of individual peers in the last snapshot.
Group peers with uptime higher than a threshold.
Examine biased connectivity within each group.
[Figure legend: Newly arrived peer; Departed peer; Present for 2 snapshots; Present for 5 snapshots; axis: Time]
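The uptime-estimation and grouping steps of this methodology can be sketched as below. The sketch assumes each snapshot is a set of peer IDs and measures uptime in consecutive snapshots rather than hours; both the representation and the function names are illustrative, not the authors' code.

```python
def consecutive_presence(snapshots):
    """For each peer in the last snapshot, count how many consecutive
    snapshots (ending with the last one) contain that peer."""
    uptime = {}
    for peer in snapshots[-1]:
        count = 0
        for snap in reversed(snapshots):
            if peer not in snap:
                break
            count += 1
        uptime[peer] = count
    return uptime

def peers_above_threshold(snapshots, threshold):
    """Peers whose estimated uptime is higher than the threshold."""
    uptime = consecutive_presence(snapshots)
    return {peer for peer, count in uptime.items() if count > threshold}

# Three back-to-back snapshots (made-up data).
snaps = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
uptimes = consecutive_presence(snaps)
long_lived = peers_above_threshold(snaps, 1)
```

Peer "b" is counted as present for only one snapshot because it was missing from the middle one, so a peer that leaves and rejoins is treated as newly arrived.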
Slide 14/18: Stable Core
Most peers have a short uptime. Other peers have been around for a long time.
Stable core: a set of peers with uptime higher than a threshold.
A higher threshold gives a more stable group of peers.
[Figure labels: T > 20 h; T > 10 h]
Slide 15/18: Biased Connectivity
Hypothesis: long-lived nodes tend to be more connected to other long-lived nodes.
Rationale: once connected, they stay connected, and long-lived peers have more opportunities to become neighbors.
To quantify bias in the connectivity of the stable core:
Randomize the edges to create a graph without biased connectivity.
Compare the edges in the observed stable core with those in the randomized graph.
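One way to carry out the randomize-and-compare procedure is sketched below. The slide does not specify the randomization method; this sketch assumes a degree-preserving edge swap, a common choice, and all function names and the toy graph are illustrative.

```python
import random

def core_edge_count(edges, core):
    """Number of edges with both endpoints inside the core."""
    return sum(1 for u, v in edges if u in core and v in core)

def randomize_edges(edges, swaps, rng):
    """Degree-preserving shuffle: swap endpoints of random edge pairs."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def bias_ratio(edges, core, trials=20, seed=0):
    """Observed core edges divided by the mean over randomized graphs."""
    rng = random.Random(seed)
    observed = core_edge_count(edges, core)
    total = 0
    for _ in range(trials):
        shuffled = randomize_edges(edges, 10 * len(edges), rng)
        total += core_edge_count(shuffled, core)
    baseline = total / trials
    return observed / baseline if baseline else float("inf")

# Toy graph: two triangles joined by one edge; the core is one triangle.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
toy_core = {"a", "b", "c"}
ratio = bias_ratio(toy_edges, toy_core)
```

A ratio above 1 means the observed core has more internal edges than degree alone would explain.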
Slide 16/18: Stable Core Edges
There are 20% to 40% more edges in the stable core than in the randomized graph.
Connectivity is biased in an onion-like pattern: peers are more likely to connect to other peers with the same or higher uptime.
We examined other properties of the stable core.
Despite high churn, there is a relatively stable backbone.
Slide 17/18: Summary
Characterizations of the Gnutella overlay based on recent and accurate snapshots.
Graph properties:
The degree distribution in Gnutella is not a power law.
Gnutella exhibits small-world characteristics.
Gnutella is resilient.
Dynamic properties:
There is a stable core within the overlay topology.
Peer churn causes the stable core to exhibit an onion-like biased connectivity.
This effect is likely to occur in other unstructured P2P systems.
Reference: Daniel Stutzbach, Reza Rejaie, Subhabrata Sen, "Characterizing Unstructured Overlay Topologies in Modern P2P File-Sharing Systems," Internet Measurement Conference, Berkeley, 2005.
Slide 18/18: Future Work
Examining underlying causes of the biased connectivity.
Exploring long-term trends in overlay properties.
Characterizing churn.
Characterizing properties of other widely deployed P2P systems: Kad (a DHT with more than 1 million users) and BitTorrent.
Developing sampling techniques for P2P.
Slide 19/18: Ultrapeer->Leaf Degree
LimeWire ultrapeers have a limit of 30 leaf peers.
BearShare ultrapeers have a limit of 45 leaf peers.
There are distinct spikes at those points, with an even distribution of fewer leaf peers.
[Figure legend: LimeWire; BearShare; Other; Custom]