Simulation Evaluation of Web Caching Architectures
Carey Williamson and Mudashiru Busari
Department of Computer Science, University of Saskatchewan


Slide 2: Outline
• Introduction: Web Caching
• Proxy Workload Generator (ProWGen)
• Evaluation of Single-Level Caches
• Evaluation of Multi-Level Caches
• Conclusions and Future Work
• Questions?

Slide 3: Introduction
• “The Web is both a blessing and a curse…”
• Blessing:
  ◦ Internet available to the masses
  ◦ Seamless exchange of information
• Curse:
  ◦ Internet available to the masses
  ◦ Stress on networks, protocols, servers, and users
• Motivation: techniques to improve the performance and scalability of the Web

Slide 4: Why Is the Web So Slow?
Three main possible reasons:
• Client-side bottlenecks (PC, modem)
  ◦ Solution: better access technologies (TRLabs)
• Server-side bottlenecks (busy Web site)
  ◦ Solution: faster, scalable server designs
• Network bottlenecks (Internet congestion)
  ◦ Solutions: caching, replication; improved protocols for client-server communication

Slide 5: What Is a Web Proxy Cache?
• Intermediary between Web clients (browsers) and Web servers
• Controlled Internet access point for an institution or organization (e.g., firewall)
• Natural point for Web document caching
• Stores local copies of popular documents
• Forwards requests to servers only if needed

Slide 6: Web Caching Proxy
[Figure: Web clients (C) inside a region or organization boundary reach Web servers across the Internet through a proxy.]

Slide 7: Some Technical Issues
• Size of cache
• Replacement policy when the cache is full
• Cache coherence (conditional GET with If-Modified-Since)
• Some content is uncacheable
• Multi-cache coordination, peering (ICP)
• Security and privacy; “hit metering”
• Other issues...

Slide 8: Our Previous Work
• Collaborative project with CANARIE, through the Advanced Networks Applications program (July 1998 to June 1999)
• Design and evaluation of Web caching strategies for Canada’s CA*net II backbone (National Web Caching Infrastructure, NWCI)
• For more information, see http://www.cs.usask.ca/faculty/carey/projects/nwci.html

Slide 9: CA*net II Web Caching Hierarchy (Dec 1998)
[Figure]

Slide 10: [Figure: the CA*net II caching hierarchy, with USask and CANARIE (Ottawa) marked as the selected measurement points for our traffic analyses (3-6 months of data from each), and a link to NLANR]

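Slide 11: Caching Hierarchy Overview
[Figure: clients (C) feed regional/university proxies, which feed the national and then the top-level caches.]
Level                      Cache size   Cache hit ratio (empirically observed)
Regional/University        5-10 GB      30-40%
National                   10-20 GB     15-20%
Top-Level/International    20-50 GB     5-10%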
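As a rough back-of-the-envelope sketch (not part of the talk): if each level's hit ratio is measured over the requests that actually reach that level, so that only misses are forwarded upward, then the fraction of client requests served somewhere in the hierarchy is 1 - (1 - h1)(1 - h2)(1 - h3). The function name `hierarchy_hit_ratio` is hypothetical, and the inputs are simply the midpoints of the observed ranges on slide 11.

```python
from math import prod

def hierarchy_hit_ratio(level_hit_ratios):
    """Fraction of client requests served by some cache level, assuming each
    level's hit ratio applies only to the misses forwarded from the level below."""
    miss_everywhere = prod(1.0 - h for h in level_hit_ratios)
    return 1.0 - miss_everywhere

# Midpoints of the observed ranges: regional 35%, national 17.5%, top-level 7.5%
overall = hierarchy_hit_ratio([0.35, 0.175, 0.075])
print(f"{overall:.3f}")  # 0.504
```

Under these assumptions about half of all requests would be served inside the hierarchy, with the regional level contributing most of that.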

Slide 12: NWCI Project Contributions
• Workload characterization and evaluation of the CA*net II Web caching hierarchy (IEEE Network, May/June 2000)
• Developed a Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching hierarchies
• Recommendations for CANARIE NWCI about the configuration of future caches

Slide 13: Overview of This Talk
• Constructed a synthetic Web proxy workload generation tool (ProWGen) that captures the salient characteristics of empirical Web proxy workloads
• Use ProWGen to evaluate the sensitivity of proxy caches to workload characteristics
• Use ProWGen to evaluate the effectiveness of multi-level Web caching hierarchies (and cache management techniques)

Slide 14: Research Methodology
• Design, construction, and parameterization of workload models
• Validation of ProWGen (statistically, and versus empirical workloads)
• Simulation evaluation of a single cache
  ◦ Sensitivity to workload characteristics
  ◦ Different cache sizes, replacement policies
• Simulation evaluation of a multi-level cache
  ◦ Sensitivity to workload characteristics
  ◦ Novel (heterogeneous) cache management policies

Slide 15: Key Workload Characteristics
• “One-timers”: documents referenced only once (60-70% of documents; caching them is useless)
• Zipf-like document referencing popularity
• Heavy-tailed file size distribution (i.e., most files are small, but most bytes are in big files)
• Correlations (if any) between document size and document popularity (a subject of debate)
• Temporal locality (temporal correlation between recent past and near-future references)
[Mahanti et al. 2000]
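One-timers can be counted directly from a reference trace. The sketch below, run on a small hypothetical trace, reports both the fraction of distinct documents referenced exactly once and the fraction of requests those documents account for; every such request is a guaranteed miss for any cache.

```python
from collections import Counter

def one_timer_stats(trace):
    """Return (fraction of distinct documents referenced exactly once,
    fraction of all requests that go to such documents)."""
    counts = Counter(trace)
    one_timers = sum(1 for c in counts.values() if c == 1)
    return one_timers / len(counts), one_timers / len(trace)

# Hypothetical trace: "a" x3, "b" x2, and "c", "d", "e" once each
trace = ["a", "b", "a", "c", "d", "a", "b", "e"]
doc_frac, req_frac = one_timer_stats(trace)
print(doc_frac, req_frac)  # 0.6 0.375
```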

Slides 16-19: ProWGen Conceptual View
[Figure, built up over four slides: input parameters feed the ProWGen software, which produces a synthetic workload. Successive builds add a Zipf popularity curve P(r) over rank r, a file size distribution F(s) shown as an LLCD (log-log complementary distribution) plot, and a size-popularity correlation setting of -1, 0, or +1.]
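An LLCD plot graphs log10 P[X > x] against log10 x; for a Pareto tail it is approximately a straight line whose slope is minus the tail index. A minimal sketch of estimating the tail index this way, assuming continuous (untied) sizes and a known start of the tail; the function names are our own, and the sample is synthetic rather than trace data.

```python
import math
import random

def llcd_points(sizes):
    """Empirical LLCD: pairs (log10 x, log10 P[X > x]) for a sample of sizes.
    Assumes no ties; the largest value is dropped because P[X > max] = 0."""
    xs = sorted(sizes)
    n = len(xs)
    return [(math.log10(x), math.log10((n - i - 1) / n))
            for i, x in enumerate(xs[:-1])]

def ls_slope(points):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

def tail_index(sizes, tail_start):
    """Estimate alpha as minus the LLCD slope over sizes >= tail_start."""
    pts = [p for p in llcd_points(sizes) if p[0] >= math.log10(tail_start)]
    return -ls_slope(pts)

# Synthetic Pareto sample with alpha = 1.3, starting at 10,000 bytes
rng = random.Random(7)
sample = [10_000 * rng.paretovariate(1.3) for _ in range(20_000)]
print(round(tail_index(sample, 10_000), 2))
```

On this synthetic sample the estimate should come out close to 1.3; on real traces the choice of `tail_start` strongly affects the result.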

Slide 20: ProWGen: Workload Modeling Details
• Modeled workload characteristics:
  ◦ One-time referencing
  ◦ Zipf-like referencing behaviour (Zipf’s law: P(r) ∝ 1/r^β for the document of popularity rank r)
  ◦ File size distribution
     ▪ Body: lognormal distribution
     ▪ Tail: Pareto distribution
  ◦ Correlation between file size and popularity
  ◦ Temporal locality
     ▪ Static probabilities in a finite-size LRU stack model
     ▪ Dynamic probabilities in a finite-size LRU stack model
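A compact sketch of how two of these components might be generated, under assumptions not stated on the slide: the lognormal mean and standard deviation from slide 22 are treated as arithmetic (not log-space) values, the lognormal body is truncated at the start of the tail, and requests follow an independent reference model. It omits ProWGen's one-timer, correlation, and LRU stack (temporal locality) controls, so it is not ProWGen's actual algorithm; all function names are hypothetical.

```python
import math
import random

def lognormal_params(mean, std):
    """Convert the arithmetic mean/std of a lognormal into (mu, sigma) of log X."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def sample_size(rng, mu, sigma, tail_start, tail_index, tail_fraction):
    """Hybrid size model (assumed): Pareto tail with probability tail_fraction,
    otherwise a lognormal body truncated below tail_start by rejection."""
    if rng.random() < tail_fraction:
        return int(tail_start * rng.paretovariate(tail_index))
    while True:
        s = rng.lognormvariate(mu, sigma)
        if s < tail_start:
            return max(1, int(s))

def zipf_requests(rng, n_docs, n_requests, slope):
    """Independent reference model: document of rank r (0-based index r-1)
    is requested with probability proportional to 1 / r**slope."""
    weights = [1.0 / r ** slope for r in range(1, n_docs + 1)]
    return rng.choices(range(n_docs), weights=weights, k=n_requests)

# Parameter values borrowed from the workload synthesis table (slide 22)
rng = random.Random(1)
mu, sigma = lognormal_params(7_000, 11_000)
sizes = [sample_size(rng, mu, sigma, 10_000, 1.322, 0.22) for _ in range(1_000)]
requests = zipf_requests(rng, n_docs=1_000, n_requests=10_000, slope=0.807)
```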

Slide 21: Validation of ProWGen
• Goal: establish that the synthetic workloads possess the desired characteristics (quantitative and qualitative), and that these characteristics are similar to those in empirical workloads
• Example: analyze 5 million requests from a proxy server trace, and parameterize ProWGen to generate a similar workload

Slide 22: Workload Synthesis
Parameter                                                   Value
Total number of requests                                    5,000,000
Unique documents (% of total requests)                      34%
One-timers (% of unique documents)                          72%
Zipf slope                                                  0.807
Tail index                                                  1.322
Documents in the tail                                       22%
Beginning of the tail (bytes)                               10,000
Mean of the lognormal file size distribution (bytes)        7,000
Standard deviation of the lognormal distribution (bytes)    11,000
Correlation between file size and popularity                Zero
LRU stack model for temporal locality                       Static and Dynamic
LRU stack size                                              1,000
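The percentages on slide 22 translate into absolute counts with simple arithmetic. The sketch also derives one consequence that the slide does not state: since each distinct document misses on its first reference, even an infinite cache (ignoring coherence checks and uncacheable content) could not exceed a 66% hit ratio on this workload.

```python
# Parameters from the workload synthesis table (slide 22)
total_requests = 5_000_000
unique_fraction = 0.34      # unique documents, as a fraction of total requests
one_timer_fraction = 0.72   # one-timers, as a fraction of unique documents
tail_fraction = 0.22        # documents whose size falls in the Pareto tail

unique_docs = round(total_requests * unique_fraction)
one_timers = round(unique_docs * one_timer_fraction)
tail_docs = round(unique_docs * tail_fraction)

# Each distinct document's first reference is a compulsory miss,
# so this bounds the hit ratio of any cache, however large.
max_hit_ratio = 1 - unique_docs / total_requests

print(unique_docs, one_timers, tail_docs, f"{max_hit_ratio:.2f}")
# 1700000 1224000 374000 0.66
```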

Slide 23: Zipf-like Referencing Behaviour
[Figure: rank-frequency plots; empirical trace slope = 0.81, synthetic trace slope = 0.83]
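One common way to measure a slope like the 0.81 and 0.83 reported on slide 23 is a least-squares fit of log reference count against log popularity rank. The sketch below assumes that approach; the talk does not say exactly how its slopes were fitted, and in real traces the flat region of one-timers at the highest ranks is often excluded from the fit. The function name is hypothetical.

```python
import math
from collections import Counter

def zipf_slope(trace):
    """Estimate the Zipf slope as minus the least-squares slope of
    log10(reference count) versus log10(popularity rank)."""
    counts = sorted(Counter(trace).values(), reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx

# Counts 60, 30, 20, 15, 12, 10 follow 60 / rank exactly, i.e. slope 1
trace = [f"doc{r}" for r in range(1, 7) for _ in range(60 // r)]
print(round(zipf_slope(trace), 6))  # 1.0
```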

Slide 24: Transfer Size Distribution
[Figure: panels for references and for bytes transferred]

Slide 25: Research Questions: Single-Level Caches
• In a single-level proxy cache, how sensitive is Web proxy caching performance to certain workload characteristics (one-timers, Zipf-ness, heavy-tail index)?
• How does the degree of sensitivity change depending on the cache replacement policy?

Slide 26: Simulation Model
[Figure: Web clients generate an aggregate workload that passes through a single proxy server to the Web servers]

Slide 27: Factors and Levels
• Cache size
• Cache replacement policy
  ◦ Recency-based: LRU
  ◦ Frequency-based: LFU-Aging
  ◦ Size-based: GD-Size
• Workload characteristics
  ◦ One-timers, Zipf slope, tail index, correlation, temporal locality model
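LRU evicts the least recently used document, while GD-Size (GreedyDual-Size) also weighs document size. Below is a minimal sketch of GreedyDual-Size assuming a uniform cost of 1 per document, the variant that targets hit ratio; the slides do not say which cost function was used, and the class name is our own. Each cached document carries a value H = L + cost/size, the lowest-H document is evicted first, and the inflation value L rises to the evicted H so that documents not referenced recently gradually age out.

```python
import heapq

class GDSizeCache:
    """GreedyDual-Size sketch with cost = 1 for every document (assumed),
    which favours keeping small files."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the "L" value, raised on each eviction
        self.h = {}            # doc -> current H value
        self.size = {}         # doc -> size in bytes
        self.heap = []         # (H, doc) entries; stale ones are skipped lazily

    def access(self, doc, size):
        """Process one request; return True on a hit, False on a miss."""
        hit = doc in self.h
        if not hit:
            if size > self.capacity:
                return False   # too big to cache at all
            while self.used + size > self.capacity:
                self._evict()
            self.size[doc] = size
            self.used += size
        # On both hits and insertions, H is reset relative to the current L
        self.h[doc] = self.inflation + 1.0 / size
        heapq.heappush(self.heap, (self.h[doc], doc))
        return hit

    def _evict(self):
        """Remove the cached document with the smallest H value."""
        while True:
            h, doc = heapq.heappop(self.heap)
            if self.h.get(doc) == h:   # ignore entries superseded by later updates
                break
        self.inflation = h
        self.used -= self.size.pop(doc)
        del self.h[doc]

cache = GDSizeCache(100)
for doc, size in [("big", 80), ("a", 10), ("b", 20), ("a", 10)]:
    print(doc, cache.access(doc, size))
```

Because H is inversely proportional to size, large files are the first to go, which is consistent with GD-Size trading some byte hit ratio for hit ratio in the multi-level results.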

Slide 28: Performance Metrics
• Cache hit ratio
  ◦ Percentage of requested documents found in the cache (HR)
  ◦ Percentage of requested bytes found in the cache (BHR)
• User response time
  ◦ Estimated analytically using request rates, cache hit ratios, and (relative) cache miss penalties
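A sketch of how HR and BHR can be measured by replaying a trace of (document, size) requests through an LRU cache. The response-time function is an assumed two-outcome model (hits and misses weighted by the hit ratio); the talk only says response time was estimated analytically from request rates, hit ratios, and relative miss penalties, so its actual formula may differ. Function names and the trace are hypothetical.

```python
from collections import OrderedDict

def simulate_lru(requests, capacity_bytes):
    """Replay (doc, size) requests through an LRU cache; return (HR, BHR)."""
    cache = OrderedDict()   # doc -> size, least recently used first
    used = hits = hit_bytes = total_bytes = 0
    for doc, size in requests:
        total_bytes += size
        if doc in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(doc)   # mark as most recently used
            continue
        if size > capacity_bytes:
            continue                 # cannot be cached
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[doc] = size
        used += size
    return hits / len(requests), hit_bytes / total_bytes

def mean_response_time(hit_ratio, t_hit, t_miss):
    """Assumed model: hits cost t_hit, misses cost t_miss."""
    return hit_ratio * t_hit + (1 - hit_ratio) * t_miss

trace = [("a", 100), ("b", 300), ("a", 100), ("c", 200), ("a", 100), ("b", 300)]
hr, bhr = simulate_lru(trace, capacity_bytes=500)
print(f"HR={hr:.3f} BHR={bhr:.3f}")  # HR=0.333 BHR=0.182
```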

Slide 29: Simulation Results (Preview)
• Cache performance is very sensitive to:
  ◦ Slope of the Zipf-like document referencing popularity
  ◦ Temporal locality property
  ◦ Correlations between size and popularity
• Cache performance is relatively insensitive to:
  ◦ Tail index of the heavy-tailed file size distribution
  ◦ One-timers

Slide 30: Sensitivity to One-timers (LRU)
[Figure: (a) Hit Ratio, (b) Byte Hit Ratio]

Slide 31: Sensitivity to Zipf Slope (LRU)
[Figure: (a) Hit Ratio, (b) Byte Hit Ratio]
• A difference of 0.2 in the Zipf slope changes hit ratio and byte hit ratio by as much as 10-15%

Slide 32: Sensitivity to Heavy Tail Index (LRU Replacement Policy)
[Figure: (a) Hit Ratio, (b) Byte Hit Ratio]

Slide 33: Sensitivity to Heavy Tail Index (GD-Size Replacement Policy)
[Figure: (a) Hit Ratio, (b) Byte Hit Ratio]
• A difference of 0.2 in the heavy tail index changes performance by less than 3%

Slide 34: Sensitivity to Correlation (LRU)
[Figure: (a) Hit Ratio, (b) Byte Hit Ratio]

Slide 35: Sensitivity to Temporal Locality (LRU)
[Figure: (a) Hit Ratio, (b) Byte Hit Ratio]

Slide 36: Summary: Single-Level Caches
• Cache performance is sensitive to:
  ◦ Slope of the Zipf-like document referencing popularity
  ◦ Temporal locality
  ◦ Correlation between size and popularity
• Cache performance is insensitive to:
  ◦ Tail index of the heavy-tailed file size distribution
  ◦ One-timers

Slide 37: Multi-Level Caching...
• Workload characteristics change as you move up the Web caching hierarchy (due to filtering effects, aggregation, etc.)
• Idea #1: try different cache replacement policies at different levels of the hierarchy
• Idea #2: limit replication of cache content in the overall hierarchy through “partitioning” (by size, type, sharing, ...)

Slide 38: Research Questions: Multi-Level Caches
• In a multi-level caching hierarchy, can overall caching performance be improved by using different cache replacement policies at different levels of the hierarchy?
• In a multi-level caching hierarchy, can overall performance be improved by keeping disjoint document sets at each level of the hierarchy?

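Slide 39: Simulation Model
[Figure: Web clients send requests to lower-level (child) proxy servers, which forward misses to an upper-level (parent) proxy server and on to the Web servers; the children's workloads are configured with complete overlap, partial overlap (50%), or no overlap]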

Slide 40: Experiment 1: Different Policies at Different Levels of the Hierarchy
[Figure: (a) Hit Ratio, (b) Byte Hit Ratio, for the parent and the children]

Slide 41: Experiment 2: Shared Files at the Upper Level of the Hierarchy
[Figure: parent and children results for (a) complete overlap, (b) partial overlap, (c) no overlap]

Slide 42: Experiment 3: Size-based Partitioning
• Partition files across the two levels based on size (e.g., keep small files at the lower level and large files at the upper level, or vice versa)
• Three size thresholds:
  ◦ 5,000 bytes
  ◦ 10,000 bytes
  ◦ 100,000 bytes
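A minimal sketch of size-threshold routing in a two-level hierarchy, assuming each level runs plain LRU, that files strictly smaller than the threshold are cached only at the requesting child, and that larger files are cached only at the shared parent. Class and function names are hypothetical; the talk does not describe its simulator's internals.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity_bytes):
        self.capacity, self.used, self.items = capacity_bytes, 0, OrderedDict()

    def lookup_or_insert(self, doc, size):
        """Return True on a hit; on a miss, insert doc (evicting LRU docs)."""
        if doc in self.items:
            self.items.move_to_end(doc)
            return True
        if size <= self.capacity:
            while self.used + size > self.capacity:
                self.used -= self.items.popitem(last=False)[1]
            self.items[doc] = size
            self.used += size
        return False

def size_partitioned_request(children, parent, child_id, doc, size, threshold):
    """Route one request: files below threshold live only at the requesting
    child, larger files only at the shared parent. Returns where the request
    was satisfied: 'child', 'parent', or 'origin'."""
    if size < threshold:
        return "child" if children[child_id].lookup_or_insert(doc, size) else "origin"
    return "parent" if parent.lookup_or_insert(doc, size) else "origin"

children = [LRUCache(50_000), LRUCache(50_000)]
parent = LRUCache(500_000)
trace = [(0, "a", 2_000), (1, "big", 200_000), (0, "a", 2_000),
         (0, "big", 200_000), (1, "a", 2_000)]
results = [size_partitioned_request(children, parent, c, d, s, 10_000)
           for c, d, s in trace]
print(results)  # ['origin', 'origin', 'child', 'parent', 'origin']
```

Because each file has exactly one home level, nothing is duplicated between parent and children; small files can still be duplicated across children, as "a" is here.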

Slide 43: Small Files at the Lower Level; Large Files at the Upper Level
[Figure: parent and children results for size thresholds of 5,000 bytes and 10,000 bytes]

Slide 44: Large Files at the Lower Level; Small Files at the Upper Level
[Figure: children and parent results for size thresholds of 5,000 bytes and 10,000 bytes]

Slide 45: Summary: Multi-Level Caches
• Different policies at different levels
  ◦ LRU or LFU-Aging at the lower level plus GD-Size at the upper level improved performance
  ◦ GD-Size at both levels gave a better hit ratio, but with some penalty in byte hit ratio
• Sharing-based approach
  ◦ No benefit compared to the other cases studied
• Size-threshold approach
  ◦ Small files at the lower level plus large files at the upper level improved performance
  ◦ Reversing this policy offered no performance advantage

Slide 46: Conclusions
• ProWGen is a valuable tool for evaluating Web proxy caching architectures with synthetic workloads
• Existing multi-level caching hierarchies are not always very effective
• “Heterogeneous” caching architectures may better exploit workload characteristics and improve Web caching performance

Slide 47: Future Work
• Extend the multi-level experiments
  ◦ Look into configurations where the lower-level proxies communicate with each other
  ◦ Investigate configurations involving more levels and more lower-level proxies
• Extend ProWGen
  ◦ Model response time
  ◦ Model file size modifications

Slide 48: For More Information...
• M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, June 2000
• Two papers available soon (under review)
• ProWGen tool is available now
• Email: carey@cs.usask.ca
• http://www.cs.usask.ca/faculty/carey/

