
1 Modeling and Caching of P2P Traffic
Osama Saleh
Thesis Defense and Seminar, 21 November 2006

2 Outline
 Motivation
 Related Work
 Modeling P2P Traffic
 P2P Caching Algorithm
 Performance Evaluation
 Conclusions & Future Work

3 Motivations
 P2P traffic is a major fraction of Internet traffic
 60% of Internet traffic is P2P [CacheLogic ’04]
 … and it is increasing [Karagiannis 04]
 Negative consequences:
- increased load on networks
- higher costs for ISPs (and users!)
- more congestion
 Can traffic caching help?

4 Our Problem
 Design an effective proxy caching scheme for P2P traffic
 Main objective:
- Reduce WAN traffic → reduce cost & congestion
[Figure: a proxy cache (C) deployed in an AS serving peers (P)]

5 Our Solution Approach
 Measure and model P2P traffic characteristics relevant to caching, i.e.,
- as seen by a cache deployed in an autonomous system (AS)
- characteristics include popularity, size, and popularity dynamics
 Then, develop a caching algorithm

6 Why Not Use Web/Video Caching Algorithms?
 Different traffic characteristics:
- P2P vs. Web: P2P objects are large, immutable, and have different popularity models
- P2P vs. Video: P2P objects do not impose any timing constraints
 Different caching objectives:
- Web: minimize latency, make users happy
- Video: minimize start-up delay and enhance quality
- P2P: minimize bandwidth consumption

7 Related Work
 Several P2P measurement studies, e.g.,
- [Gummadi 03]: object popularity is not Zipf, but no closed-form model is given; conducted in one network domain
- [Klemm 04]: query popularity follows a mixture of two Zipf distributions; we use the popularity of actual object transfers
- [Leibowitz 02], [Karagiannis 04]: highlight the potential of caching P2P traffic; no caching algorithms presented
- All provide useful insights, but they were not explicitly designed to study caching of P2P traffic
 P2P caching algorithms
- [Wierzbicki 04]: proposed two P2P caching algorithms; we compare against the best of them (LSB)
- We also compare against LRU, LFU, and GDS

8 Measurement Study
 Modified LimeWire (Gnutella) to:
- run in super-peer mode
- maintain up to 500 concurrent connections (70% with other super nodes)
- log all Query and QueryHit messages
 Measure and model:
- object popularity
- popularity dynamics
- object sizes
 Why Gnutella?
- supports passive measurements
- open source: easy to modify
- one of the top three most popular protocols [Zhao 06]

9 Measurement Study: Stats
 Is it representative of P2P traffic? We believe so.
- Traffic characteristics are similar in different P2P systems:
- [Gummadi 03]: non-Zipf traffic in Kazaa, same as ours
- [Saroiu 03]: Napster and Gnutella have similar session durations, host uptimes, and numbers of files shared
- [Pouwelse 04]: similar host uptimes and object properties in BitTorrent
 Trace summary:
- Measurement period: Jan 06 – Sep 06
- Unique objects: 17 M
- Unique IPs: 39 M
- ASes with more than 100,000 downloads: 127
- Total traffic volume: 6,262 terabytes

10 Measuring Object Popularity
 Organize traces into autonomous systems:
- A QueryHit message contains a list of objects on a host and the IP address of that host
- Record (object, IP) pairs in trace files
- Group (object, IP) pairs into their autonomous systems (ASes) using the GeoIP database
 Get object popularity (a sketch of this counting step follows below):
- We use URNs (Uniform Resource Names) to identify unique objects
- For each unique object in an AS, count the number of IP addresses associated with it
- This count of unique IPs = number of downloads = object popularity
- Rank objects by popularity and plot popularity vs. rank
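
A minimal sketch of this counting step, assuming a trace file of whitespace-separated (URN, IP) pairs and a stand-in `ip_to_asn` function in place of the GeoIP database:

```python
from collections import defaultdict

def ip_to_asn(ip):
    # Hypothetical stand-in for the GeoIP database lookup used in the study.
    return "AS0"

def popularity_per_as(trace_path):
    # downloads[asn][urn] = set of unique IPs that requested this object in this AS
    downloads = defaultdict(lambda: defaultdict(set))
    with open(trace_path) as trace:
        for line in trace:
            urn, ip = line.split()
            downloads[ip_to_asn(ip)][urn].add(ip)
    # Popularity of an object in an AS = number of unique IPs associated with it.
    ranked = {}
    for asn, objects in downloads.items():
        counts = sorted((len(ips) for ips in objects.values()), reverse=True)
        ranked[asn] = counts  # counts[r] is the popularity of the object at rank r+1
    return ranked
```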

11 Measurement Study: Object Popularity
 Notice the flattened head, unlike Zipf

12 Modeling Object Popularity
 We propose a Mandelbrot-Zipf (MZipf) model for P2P object popularity (formula below):
- α: skewness factor, same as in Zipf-like distributions
- q: plateau factor, controls the plateau shape (flattened head) near the lowest-ranked objects
- Larger q values → more flattened head
 Validation across the top 20 ASes (in terms of traffic)
- Sample in previous slide
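
The model's formula was an image on the slide and did not survive extraction; the standard Mandelbrot-Zipf form, consistent with the α and q definitions above, is:

```latex
% Popularity (access probability) of the object at rank i, for N objects:
p(i) = \frac{K}{(i + q)^{\alpha}},
\qquad
K = \left( \sum_{j=1}^{N} \frac{1}{(j + q)^{\alpha}} \right)^{-1}
% Setting q = 0 recovers the plain Zipf-like distribution.
```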

13 Zipf vs. Mandelbrot-Zipf
 Zipf over-estimates the popularity of objects at the lowest ranks
- which are the good candidates for caching

14 Effect of MZipf on Caching
 Simple analysis using the LFU policy (a sketch of this analysis follows below)
 Significant byte hit rate loss at realistic cache sizes (e.g., 10%)
 Relative loss plotted: (H_Zipf − H_MZipf) / H_Zipf
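
A minimal sketch of the kind of analysis behind this slide, assuming unit-size objects (so hit rate equals byte hit rate) and an idealized LFU cache that holds the most popular objects; the parameter values are illustrative, not measured:

```python
import numpy as np

def hit_rate_lfu(cache_frac, n_objects, alpha, q=0.0):
    """Hit rate of an idealized LFU cache holding the top-ranked objects,
    when requests follow Mandelbrot-Zipf(alpha, q); q = 0 gives plain Zipf."""
    ranks = np.arange(1, n_objects + 1)
    p = 1.0 / (ranks + q) ** alpha
    p /= p.sum()
    cache_size = max(1, int(cache_frac * n_objects))
    return p[:cache_size].sum()  # popularities are already sorted by rank

# Relative byte hit rate loss at a 10% cache, with illustrative parameters.
h_zipf = hit_rate_lfu(0.10, n_objects=100_000, alpha=0.8, q=0.0)
h_mzipf = hit_rate_lfu(0.10, n_objects=100_000, alpha=0.8, q=50.0)
print((h_zipf - h_mzipf) / h_zipf)
```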

15 Effect of MZipf on Caching (cont’d)
 Trace-based simulation using the optimal policy in two ASes
 Larger q (more flattened head) → smaller byte hit rate

16 When is q large?
 In ASes with a small number of hosts:
- immutable objects → download-at-most-once behavior
- object popularity bounded by the number of hosts → large q
 In ASes with a large average number of downloads per host:
- download-at-most-once behavior → users download more unpopular objects
- frequency of popular objects saturates while frequency of unpopular objects increases → large q

17 Popularity Dynamics
 We trace the popularity of the top 100 objects observed in the third month of our measurement, as seen in:
- the top 1st AS
- the top 2nd AS
- all ASes
 Popularity is dynamic: objects enjoy around 3 months of popularity

18 Object Size
 The size distribution exhibits several peaks corresponding to different types of content
 Consequence: algorithms might be biased against certain workloads:
- recency-based algorithms: biased against large objects
- size-based algorithms: biased against smaller objects

19 P2P Caching Algorithm: Basic Idea
 Proportional partial caching
- cache a fraction of the object proportional to its popularity
- motivated by the Mandelbrot-Zipf popularity model
- minimizes the effect of caching large unpopular objects
 Segmentation
- divide objects into segments of different sizes
- motivated by the existence of multiple workloads
 Replacement
- replace the objects with the least γ value:
- γ_i = bytes served from object i / cached size of object i

20 P2P Caching Algorithm: Admission
 Rank cached objects: γ_1 ≥ γ_2 ≥ γ_3 ≥ … ≥ γ_n
 Average object size of workload w = μ_w
 Admit one segment of an object when it is first seen
 After that, object i deserves to cache max[1, (γ_i / γ_1) μ_w] segments
 Catch: do not cache more than what is requested, so the actual number of segments cached, k, never exceeds the number of segments requested

21 P2P Caching Algorithm (Pseudo-code)
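
The pseudo-code on this slide was an image and did not survive extraction. The Python sketch below reconstructs the behavior described on the previous two slides (one-segment admission, γ-proportional deserved size capped by the request, least-γ eviction). It is not the author's pseudo-code: class and method names are illustrative, and a single μ_w is used instead of per-workload averages for brevity.

```python
from dataclasses import dataclass

@dataclass
class CachedObject:
    segment_size: int          # segment size used for this object's workload
    cached_segments: int = 0   # number of segments currently in the cache
    bytes_served: int = 0      # bytes served from the cache for this object

    @property
    def gamma(self):
        cached_bytes = self.cached_segments * self.segment_size
        return self.bytes_served / cached_bytes if cached_bytes else 0.0


class P2PCache:
    def __init__(self, capacity_bytes, mu_w_segments):
        self.capacity = capacity_bytes
        self.used = 0
        self.mu_w = mu_w_segments   # average object size in segments (single value for brevity)
        self.objects = {}           # object id -> CachedObject

    def on_request(self, obj_id, requested_segments, segment_size):
        obj = self.objects.get(obj_id)
        if obj is None:
            # Admission: cache exactly one segment when an object is first seen.
            obj = CachedObject(segment_size=segment_size)
            self.objects[obj_id] = obj
            self._grow(obj, 1)
            return
        # Serve the cached prefix of the request and credit it to the object's gamma.
        obj.bytes_served += min(obj.cached_segments, requested_segments) * obj.segment_size
        # Deserved segments: max[1, (gamma_i / gamma_1) * mu_w], never more than requested.
        gamma_1 = max(o.gamma for o in self.objects.values())
        deserved = max(1, int(obj.gamma / gamma_1 * self.mu_w)) if gamma_1 > 0 else 1
        target = min(deserved, requested_segments)
        if target > obj.cached_segments:
            self._grow(obj, target - obj.cached_segments)

    def _grow(self, obj, extra_segments):
        needed = extra_segments * obj.segment_size
        while self.used + needed > self.capacity and self._evict_segment(protect=obj):
            pass
        if self.used + needed <= self.capacity:
            obj.cached_segments += extra_segments
            self.used += needed

    def _evict_segment(self, protect):
        # Replacement: drop one segment from the cached object with the least gamma value.
        victims = [o for o in self.objects.values() if o.cached_segments and o is not protect]
        if not victims:
            return False
        victim = min(victims, key=lambda o: o.gamma)
        victim.cached_segments -= 1
        self.used -= victim.segment_size
        return True
```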

22 Trace-based Performance Evaluation
 Algorithms implemented:
- Web policies: LRU, LFU, Greedy-Dual Size (GDS)
- P2P policies: Least Sent Bytes (LSB) [Wierzbicki 04]
- Offline optimal policy (OPT): looks at the entire trace and caches the objects that maximize byte hit rate
 Scenarios:
- with and without aborted downloads
- various degrees of temporal locality (popularity, temporal correlation)
 Performance:
- byte hit rate (BHR) in the top 10 ASes (a helper for computing BHR is sketched below)
- importance of partial caching
- sensitivity of our algorithm to segment size, plateau, and skewness factors
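
Byte hit rate is the fraction of requested bytes that the cache serves; a minimal helper under that standard definition, with a hypothetical transaction format:

```python
def byte_hit_rate(transactions):
    """transactions: iterable of (bytes_requested, bytes_served_from_cache) pairs."""
    requested = served = 0
    for bytes_requested, bytes_from_cache in transactions:
        requested += bytes_requested
        served += bytes_from_cache
    return served / requested if requested else 0.0

# Example: three downloads, one fully cached, one half cached, one missed.
print(byte_hit_rate([(100, 100), (200, 100), (300, 0)]))  # 0.333...
```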

23 Byte Hit Rate: No Aborted Downloads
 The BHR of our algorithm is close to optimal and much better than LRU, LFU, GDS, and LSB (results shown for AS 397)

24 Byte Hit Rate: No Aborted Downloads (cont’d)
 Our algorithm consistently outperforms all others in the top 10 ASes

25 Byte Hit Rate: Aborted Downloads
 Same traces as before, adding 2 partial transactions for every complete transaction [Gummadi 03]; aborts fail anywhere in the session [Wierzbicki 04] (a sketch of this trace transformation follows below)
 The performance gap is even wider:
- our BHR is at least 40% higher, and
- at most triple the BHR of other algorithms
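
A minimal sketch of injecting aborted downloads into a trace under the slide's 2:1 ratio and uniform abort point; the (object_id, size_bytes) transaction format is an assumption:

```python
import random

def add_aborted_downloads(trace, partial_per_complete=2, seed=0):
    """trace: list of (object_id, size_bytes) complete transactions.
    Returns a new trace where every complete transaction is followed by
    `partial_per_complete` aborted ones that stop at a uniformly random point."""
    rng = random.Random(seed)
    augmented = []
    for object_id, size_bytes in trace:
        augmented.append((object_id, size_bytes))      # the complete download
        for _ in range(partial_per_complete):
            abort_at = rng.randint(1, size_bytes)      # fails anywhere in the session
            augmented.append((object_id, abort_at))    # partial (aborted) download
    return augmented
```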

26 Importance of Partial Caching (1)
 Compare our algorithm with and without partial caching
- keeping everything else fixed
 The performance of our algorithm degrades without partial caching in all top 10 ASes

27 Importance of Partial Caching (2)
 Compare against an optimal policy that does not do partial caching
 MKP = store the Most K Popular full objects that fill the cache
 Our policy outperforms MKP in 6 out of the top 10 ASes and is close to it in the others
- MKP: optimal, no partial caching
- P2P: heuristic with partial caching

28 Importance of Partial Caching (3)
 Given that our P2P partial caching algorithm:
- outperforms LRU, LFU, GDS (all full caching)
- is close to the offline OPT (which maximizes byte hit rate)
- outperforms the offline MKP (which stores the most K popular objects)
- suffers when we remove partial caching
 It is reasonable to believe that partial caching is critical in P2P systems, because of large object sizes and MZipf popularity

29 Sensitivity to Temporal Locality (1)
 Temporal locality = temporal correlations + popularity
 Temporal correlations: how clustered requests to the same objects are
 To study the combined effect, we use the original traces (results shown for AS 397)
 LRU & GDS improve: they exploit request recency
 Our policy does not suffer much (BHR reduction < 3%)
 Reason: object size is the dominant factor in P2P traffic

30 Sensitivity to Temporal Locality (2)
 Here we fix popularity and vary the degree of temporal correlation
 We use the LRU stack model and generate synthetic traces with MZipf popularity by modifying ProWGen (see the sketch below)
 Conclusion: our P2P algorithm is not very sensitive to temporal correlations
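
This is not ProWGen; a minimal sketch of drawing a synthetic request stream from MZipf(α, q) popularity, with the temporal-correlation control (the LRU-stack part) omitted:

```python
import numpy as np

def mzipf_trace(n_requests, n_objects, alpha, q, seed=0):
    """Draw object ranks for a synthetic request stream from MZipf(alpha, q)."""
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, n_objects + 1)
    p = 1.0 / (ranks + q) ** alpha
    p /= p.sum()
    return rng.choice(ranks, size=n_requests, p=p)

# Illustrative parameters, not measured values.
requests = mzipf_trace(n_requests=1_000_000, n_objects=50_000, alpha=0.7, q=20)
```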

31 Effect of the Skewness Factor α on the Performance of the P2P Algorithm
 A large α means popular objects receive a larger portion of the overall traffic → high BHR
 ASes with a large α benefit more from caching

32 Effect of the Plateau Factor q on the Performance of the P2P Algorithm
 A small q value → less flattened head → popular objects receive a larger portion of the overall traffic
 ASes with a large number of hosts and a small average number of downloads per host → small q
 Such ASes benefit more from caching

33 Effect of Segmentation on P2P Caching
 Same segment size for all objects:
- small segments: too much overhead
- large segments: biased against large objects
 Our segmentation: large segments for large objects, small segments for small objects (a sketch follows below)
- fair: all objects get a fair chance of being cached at a similar rate
- less overhead
- and some performance gain as well
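
A minimal illustration of workload-dependent segment sizes; the size tiers and segment sizes below are hypothetical placeholders, not the values used in the thesis:

```python
MB = 1024 * 1024

def segment_size(object_size_bytes):
    """Pick a segment size that grows with object size (hypothetical tiers)."""
    if object_size_bytes < 10 * MB:       # e.g., audio clips, documents
        return 512 * 1024
    if object_size_bytes < 700 * MB:      # e.g., albums, small video files
        return 8 * MB
    return 64 * MB                        # e.g., large video files
```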

34 Conclusions
 Conducted an eight-month study to measure and model P2P traffic characteristics relevant to caching
 Found that object popularity can be modeled by the Mandelbrot-Zipf distribution (flattened head)
 Proposed a new proportional partial caching algorithm for P2P traffic
- outperforms other algorithms by wide margins
- robust against different traffic patterns

35 Future Work
 Implement a P2P proxy cache prototype
 Extend the measurement study to include other P2P protocols
 Analyze our P2P caching algorithm analytically
 Use cooperative caching between proxy caches in different ASes
 Cache zoning & partitioning: different zones (algorithms?) for different workloads

36 Thank You! Questions?

