
1 Modeling and Caching of P2P Traffic
Osama Saleh
Thesis Defense and Seminar, 21 November 2006

2 Outline
 Motivation
 Related Work
 Modeling P2P Traffic
 P2P Caching Algorithm
 Performance Evaluation
 Conclusions & Future Work

3 Motivations
 P2P traffic is a major fraction of Internet traffic
 60% of Internet traffic is P2P [CacheLogic ’04]
 … and it is increasing [Karagiannis 04]
 Negative consequences:
- increased load on networks
- higher costs for ISPs (and users!)
- more congestion
 Can traffic caching help?

4 Our Problem
 Design an effective proxy caching scheme for P2P traffic
 Main objective:
- Reduce WAN traffic → reduce cost & congestion
[Figure: a proxy cache (C) deployed in an AS serving peers (P)]

5 Our Solution Approach
 Measure and model P2P traffic characteristics relevant to caching, i.e.,
- as seen by a cache deployed in an autonomous system (AS)
- characteristics include popularity, size, and popularity dynamics
 Then, develop a caching algorithm

6 Why Not Use Web/Video Caching Algorithms?
 Different traffic characteristics:
- P2P vs. Web: P2P objects are large, immutable, and have different popularity models
- P2P vs. Video: P2P objects do not impose any timing constraints
 Different caching objectives:
- Web: minimize latency, make users happy
- Video: minimize start-up delay and enhance quality
- P2P: minimize bandwidth consumption

7 Related Work
 Several P2P measurement studies, e.g.,
- [Gummadi 03]: object popularity is not Zipf, but no closed-form model is given; conducted in one network domain
- [Klemm 04]: query popularity follows a mixture of two Zipf distributions; we use the popularity of actual object transfers
- [Leibowitz 02], [Karagiannis 04]: highlight the potential of caching P2P traffic; no caching algorithms presented
- All provide useful insights, but they were not explicitly designed to study caching of P2P traffic
 P2P caching algorithms
- [Wierzbicki 04]: proposed two P2P caching algorithms; we compare against the best of them (LSB)
- We also compare against LRU, LFU, and GDS

8 Measurement Study
 Modified LimeWire (Gnutella) to:
- run in super-peer mode
- maintain up to 500 concurrent connections (70% with other super nodes)
- log all Query and QueryHit messages
 Measure and model:
- object popularity
- popularity dynamics
- object sizes
 Why Gnutella?
- supports passive measurements
- open source: easy to modify
- one of the top three most popular protocols [Zhao 06]

9 Measurement Study: Stats
 Is it representative of P2P traffic? We believe so.
- Traffic characteristics are similar in different P2P systems:
- [Gummadi 03]: non-Zipf traffic in Kazaa, same as ours
- [Saroiu 03]: Napster and Gnutella have similar session durations, host uptimes, and numbers of files shared
- [Pouwelse 04]: similar host uptimes and object properties in BitTorrent
 Trace summary:
- Measurement period: Jan 06 – Sep 06
- Unique objects: 17 M
- Unique IPs: 39 M
- ASes with more than 100,000 downloads: 127
- Total traffic volume: 6,262 terabytes

10 Measuring Object Popularity
 Organize traces into autonomous systems:
- A QueryHit message contains a list of objects on a host and the IP address of that host
- Record (object, IP) pairs in trace files
- Group (object, IP) pairs into their autonomous systems (ASes) using the GeoIP database
 Get object popularity (a sketch of this counting step follows below):
- We use URNs (Uniform Resource Names) to identify unique objects
- For each unique object in an AS, count the number of IP addresses associated with it
- This count of unique IPs = number of downloads = object popularity
- Rank objects by popularity and plot popularity vs. rank
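
A minimal sketch of this counting step, assuming a trace file of whitespace-separated (URN, IP) pairs and a stand-in `ip_to_asn` function in place of the GeoIP database:

```python
from collections import defaultdict

def ip_to_asn(ip):
    # Hypothetical stand-in for the GeoIP database lookup used in the study.
    return "AS0"

def popularity_per_as(trace_path):
    # downloads[asn][urn] = set of unique IPs that requested this object in this AS
    downloads = defaultdict(lambda: defaultdict(set))
    with open(trace_path) as trace:
        for line in trace:
            urn, ip = line.split()
            downloads[ip_to_asn(ip)][urn].add(ip)
    # Popularity of an object in an AS = number of unique IPs associated with it.
    ranked = {}
    for asn, objects in downloads.items():
        counts = sorted((len(ips) for ips in objects.values()), reverse=True)
        ranked[asn] = counts  # counts[r] is the popularity of the object at rank r+1
    return ranked
```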

11 Measurement Study: Object Popularity
 Notice the flattened head, unlike Zipf

12 Modeling Object Popularity
 We propose a Mandelbrot-Zipf (MZipf) model for P2P object popularity (formula below):
- α: skewness factor, same as in Zipf-like distributions
- q: plateau factor, controls the plateau shape (flattened head) near the lowest-ranked objects
- Larger q values → more flattened head
 Validation across the top 20 ASes (in terms of traffic)
- Sample in previous slide
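
The model's formula was an image on the slide and did not survive extraction; the standard Mandelbrot-Zipf form, consistent with the α and q definitions above, is:

```latex
% Popularity (access probability) of the object at rank i, for N objects:
p(i) = \frac{K}{(i + q)^{\alpha}},
\qquad
K = \left( \sum_{j=1}^{N} \frac{1}{(j + q)^{\alpha}} \right)^{-1}
% Setting q = 0 recovers the plain Zipf-like distribution.
```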

13 Zipf vs. Mandelbrot-Zipf
 Zipf over-estimates the popularity of objects at the lowest ranks
- which are the good candidates for caching

14 Effect of MZipf on Caching
 Simple analysis using the LFU policy (a sketch of this analysis follows below)
 Significant byte hit rate loss at realistic cache sizes (e.g., 10%)
 Relative loss plotted: (H_Zipf − H_MZipf) / H_Zipf
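
A minimal sketch of the kind of analysis behind this slide, assuming unit-size objects (so hit rate equals byte hit rate) and an idealized LFU cache that holds the most popular objects; the parameter values are illustrative, not measured:

```python
import numpy as np

def hit_rate_lfu(cache_frac, n_objects, alpha, q=0.0):
    """Hit rate of an idealized LFU cache holding the top-ranked objects,
    when requests follow Mandelbrot-Zipf(alpha, q); q = 0 gives plain Zipf."""
    ranks = np.arange(1, n_objects + 1)
    p = 1.0 / (ranks + q) ** alpha
    p /= p.sum()
    cache_size = max(1, int(cache_frac * n_objects))
    return p[:cache_size].sum()  # popularities are already sorted by rank

# Relative byte hit rate loss at a 10% cache, with illustrative parameters.
h_zipf = hit_rate_lfu(0.10, n_objects=100_000, alpha=0.8, q=0.0)
h_mzipf = hit_rate_lfu(0.10, n_objects=100_000, alpha=0.8, q=50.0)
print((h_zipf - h_mzipf) / h_zipf)
```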

15 Effect of MZipf on Caching (cont’d)
 Trace-based simulation using the optimal policy in two ASes
 Larger q (more flattened head) → smaller byte hit rate

16 When is q large?
 In ASes with a small number of hosts:
- immutable objects → download-at-most-once behavior
- object popularity bounded by the number of hosts → large q
 In ASes with a large average number of downloads per host:
- download-at-most-once behavior → users download more unpopular objects
- frequency of popular objects saturates while frequency of unpopular objects increases → large q

17 Popularity Dynamics
 We trace the popularity of the top 100 objects observed in the third month of our measurement, as seen in:
- the top 1st AS
- the top 2nd AS
- all ASes
 Popularity is dynamic: objects enjoy around 3 months of popularity

18 Object Size
 The size distribution exhibits several peaks corresponding to different types of content
 Consequence: algorithms might be biased against certain workloads:
- recency-based algorithms: biased against large objects
- size-based algorithms: biased against smaller objects

19 P2P Caching Algorithm: Basic Idea
 Proportional partial caching
- cache a fraction of the object proportional to its popularity
- motivated by the Mandelbrot-Zipf popularity model
- minimizes the effect of caching large unpopular objects
 Segmentation
- divide objects into segments of different sizes
- motivated by the existence of multiple workloads
 Replacement
- replace the objects with the least γ value:
- γ_i = bytes served from object i / cached size of object i

20 P2P Caching Algorithm: Admission
 Rank cached objects: γ_1 ≥ γ_2 ≥ γ_3 ≥ … ≥ γ_n
 Average object size of workload w = μ_w
 Admit one segment of an object when it is first seen
 After that, object i deserves to cache max[1, (γ_i / γ_1) μ_w] segments
 Catch: do not cache more than what is requested, so the actual number of segments cached, k, never exceeds the number of segments requested

21 P2P Caching Algorithm (Pseudo-code)
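
The pseudo-code on this slide was an image and did not survive extraction. The Python sketch below reconstructs the behavior described on the previous two slides (one-segment admission, γ-proportional deserved size capped by the request, least-γ eviction). It is not the author's pseudo-code: class and method names are illustrative, and a single μ_w is used instead of per-workload averages for brevity.

```python
from dataclasses import dataclass

@dataclass
class CachedObject:
    segment_size: int          # segment size used for this object's workload
    cached_segments: int = 0   # number of segments currently in the cache
    bytes_served: int = 0      # bytes served from the cache for this object

    @property
    def gamma(self):
        cached_bytes = self.cached_segments * self.segment_size
        return self.bytes_served / cached_bytes if cached_bytes else 0.0


class P2PCache:
    def __init__(self, capacity_bytes, mu_w_segments):
        self.capacity = capacity_bytes
        self.used = 0
        self.mu_w = mu_w_segments   # average object size in segments (single value for brevity)
        self.objects = {}           # object id -> CachedObject

    def on_request(self, obj_id, requested_segments, segment_size):
        obj = self.objects.get(obj_id)
        if obj is None:
            # Admission: cache exactly one segment when an object is first seen.
            obj = CachedObject(segment_size=segment_size)
            self.objects[obj_id] = obj
            self._grow(obj, 1)
            return
        # Serve the cached prefix of the request and credit it to the object's gamma.
        obj.bytes_served += min(obj.cached_segments, requested_segments) * obj.segment_size
        # Deserved segments: max[1, (gamma_i / gamma_1) * mu_w], never more than requested.
        gamma_1 = max(o.gamma for o in self.objects.values())
        deserved = max(1, int(obj.gamma / gamma_1 * self.mu_w)) if gamma_1 > 0 else 1
        target = min(deserved, requested_segments)
        if target > obj.cached_segments:
            self._grow(obj, target - obj.cached_segments)

    def _grow(self, obj, extra_segments):
        needed = extra_segments * obj.segment_size
        while self.used + needed > self.capacity and self._evict_segment(protect=obj):
            pass
        if self.used + needed <= self.capacity:
            obj.cached_segments += extra_segments
            self.used += needed

    def _evict_segment(self, protect):
        # Replacement: drop one segment from the cached object with the least gamma value.
        victims = [o for o in self.objects.values() if o.cached_segments and o is not protect]
        if not victims:
            return False
        victim = min(victims, key=lambda o: o.gamma)
        victim.cached_segments -= 1
        self.used -= victim.segment_size
        return True
```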

22 Trace-based Performance Evaluation
 Algorithms implemented:
- Web policies: LRU, LFU, Greedy-Dual Size (GDS)
- P2P policies: Least Sent Bytes (LSB) [Wierzbicki 04]
- Offline optimal policy (OPT): looks at the entire trace and caches the objects that maximize byte hit rate
 Scenarios:
- with and without aborted downloads
- various degrees of temporal locality (popularity, temporal correlation)
 Performance:
- byte hit rate (BHR) in the top 10 ASes (a helper for computing BHR is sketched below)
- importance of partial caching
- sensitivity of our algorithm to segment size, plateau, and skewness factors
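
Byte hit rate is the fraction of requested bytes that the cache serves; a minimal helper under that standard definition, with a hypothetical transaction format:

```python
def byte_hit_rate(transactions):
    """transactions: iterable of (bytes_requested, bytes_served_from_cache) pairs."""
    requested = served = 0
    for bytes_requested, bytes_from_cache in transactions:
        requested += bytes_requested
        served += bytes_from_cache
    return served / requested if requested else 0.0

# Example: three downloads, one fully cached, one half cached, one missed.
print(byte_hit_rate([(100, 100), (200, 100), (300, 0)]))  # 0.333...
```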

23 Byte Hit Rate: No Aborted Downloads
 The BHR of our algorithm is close to optimal and much better than LRU, LFU, GDS, and LSB (results shown for AS 397)

24 Byte Hit Rate: No Aborted Downloads (cont’d)
 Our algorithm consistently outperforms all others in the top 10 ASes

25 Byte Hit Rate: Aborted Downloads
 Same traces as before, adding 2 partial transactions for every complete transaction [Gummadi 03]; aborts fail anywhere in the session [Wierzbicki 04] (a sketch of this trace transformation follows below)
 The performance gap is even wider:
- our BHR is at least 40% higher, and
- at most triple the BHR of other algorithms
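
A minimal sketch of injecting aborted downloads into a trace under the slide's 2:1 ratio and uniform abort point; the (object_id, size_bytes) transaction format is an assumption:

```python
import random

def add_aborted_downloads(trace, partial_per_complete=2, seed=0):
    """trace: list of (object_id, size_bytes) complete transactions.
    Returns a new trace where every complete transaction is followed by
    `partial_per_complete` aborted ones that stop at a uniformly random point."""
    rng = random.Random(seed)
    augmented = []
    for object_id, size_bytes in trace:
        augmented.append((object_id, size_bytes))      # the complete download
        for _ in range(partial_per_complete):
            abort_at = rng.randint(1, size_bytes)      # fails anywhere in the session
            augmented.append((object_id, abort_at))    # partial (aborted) download
    return augmented
```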

26 Importance of Partial Caching (1)
 Compare our algorithm with and without partial caching
- keeping everything else fixed
 The performance of our algorithm degrades without partial caching in all top 10 ASes

27 Importance of Partial Caching (2)
 Compare against an optimal policy that does not do partial caching
 MKP = store the Most K Popular full objects that fill the cache
 Our policy outperforms MKP in 6 out of the top 10 ASes and is close to it in the others
- MKP: optimal, no partial caching
- P2P: heuristic with partial caching

28 Importance of Partial Caching (3)
 Given that our P2P partial caching algorithm:
- outperforms LRU, LFU, GDS (all full caching)
- is close to the offline OPT (which maximizes byte hit rate)
- outperforms the offline MKP (which stores the most K popular objects)
- suffers when we remove partial caching
 It is reasonable to believe that partial caching is critical in P2P systems, because of large object sizes and MZipf popularity

29 Sensitivity to Temporal Locality (1)
 Temporal locality = temporal correlations + popularity
 Temporal correlations: how clustered requests to the same objects are
 To study the combined effect, we use the original traces (results shown for AS 397)
 LRU & GDS improve: they exploit request recency
 Our policy does not suffer much (BHR reduction < 3%)
 Reason: object size is the dominant factor in P2P traffic

30 Sensitivity to Temporal Locality (2)
 Here we fix popularity and vary the degree of temporal correlation
 We use the LRU stack model and generate synthetic traces with MZipf popularity by modifying ProWGen (see the sketch below)
 Conclusion: our P2P algorithm is not very sensitive to temporal correlations
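
This is not ProWGen; a minimal sketch of drawing a synthetic request stream from MZipf(α, q) popularity, with the temporal-correlation control (the LRU-stack part) omitted:

```python
import numpy as np

def mzipf_trace(n_requests, n_objects, alpha, q, seed=0):
    """Draw object ranks for a synthetic request stream from MZipf(alpha, q)."""
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, n_objects + 1)
    p = 1.0 / (ranks + q) ** alpha
    p /= p.sum()
    return rng.choice(ranks, size=n_requests, p=p)

# Illustrative parameters, not measured values.
requests = mzipf_trace(n_requests=1_000_000, n_objects=50_000, alpha=0.7, q=20)
```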

31 Effect of the Skewness Factor α on the Performance of the P2P Algorithm
 A large α means popular objects receive a larger portion of the overall traffic → high BHR
 ASes with a large α benefit more from caching

32 Effect of the Plateau Factor q on the Performance of the P2P Algorithm
 A small q value → less flattened head → popular objects receive a larger portion of the overall traffic
 ASes with a large number of hosts and a small average number of downloads per host → small q
 Such ASes benefit more from caching

33 Effect of Segmentation on P2P Caching
 Same segment size for all objects:
- small segments: too much overhead
- large segments: biased against large objects
 Our segmentation: large segments for large objects, small segments for small objects (a sketch follows below)
- fair: all objects get a fair chance of being cached at a similar rate
- less overhead
- and some performance gain as well
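
A minimal illustration of workload-dependent segment sizes; the size tiers and segment sizes below are hypothetical placeholders, not the values used in the thesis:

```python
MB = 1024 * 1024

def segment_size(object_size_bytes):
    """Pick a segment size that grows with object size (hypothetical tiers)."""
    if object_size_bytes < 10 * MB:       # e.g., audio clips, documents
        return 512 * 1024
    if object_size_bytes < 700 * MB:      # e.g., albums, small video files
        return 8 * MB
    return 64 * MB                        # e.g., large video files
```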

34 Conclusions
 Conducted an eight-month study to measure and model P2P traffic characteristics relevant to caching
 Found that object popularity can be modeled by the Mandelbrot-Zipf distribution (flattened head)
 Proposed a new proportional partial caching algorithm for P2P traffic
- outperforms other algorithms by wide margins
- robust against different traffic patterns

35 Future Work
 Implement a P2P proxy cache prototype
 Extend the measurement study to include other P2P protocols
 Analyze our P2P caching algorithm analytically
 Use cooperative caching between proxy caches in different ASes
 Cache zoning & partitioning: different zones (algorithms?) for different workloads

36 Thank You! Questions?

