
1 Empirical Quantification of Opportunities for Content Adaptation in Web Servers
Michael Gopshtein and Dror Feitelson
School of Engineering and Computer Science, The Hebrew University of Jerusalem
Supported by a grant from the Israel Internet Association

2 Capacity Planning
[Figure: daily cycle of activity; capacity vs. time, with the area under the installed-capacity line divided into utilized and wasted capacity]

3 Capacity Planning
[Figure: a flash crowd; a traffic spike that exceeds the installed capacity]

4 Capacity Planning
The problem:
– Required capacity for flash crowds cannot be anticipated in advance
– Even provisioning for daily fluctuations wastes substantial capacity
Academic solution: use admission control
Business practice: rejecting any client is unacceptable
– Especially during a surge in traffic

5 Content Adaptation
Trade off quality for throughput:
– Install capacity to match the normal load
– Handle abnormal load by reducing quality
– But still provide meaningful service to all clients
Assumes the usual optimizations have already been made:
– Compress or combine images, promote caching, …
– Empirically, this is usually not the case

6 Content Adaptation
[Figure: under low load, every client is served the full-quality page]

7 Content Adaptation
[Figure: under high load, clients are served a reduced-quality version so that all can still be served]

8 Content Adaptation
Maintain the invariant: (number of clients) × (cost per client) ≤ installed capacity
This requires changing the quality (and cost!) of the content:
– Prepare multiple versions in advance

9 The Questions
What are the main costs in web service?
– Is the bottleneck the CPU, the network, or the disk?
– What do we gain by eliminating HTTP requests?
– What do we gain by reducing file sizes?
What can realistically be done?
– What is the structure of a “random” site?
– How much can we reduce quality?
Assumption: static web pages only

10 Costs of Serving Web Pages

11 Measuring Random Web Sites
– Start from http://en.wikipedia.org/wiki/Special:Random
– Use the title of the page as the input to a Google search
– Extract the domain of the first result link to get the site's home page (see the sketch below)
– Retrieve it using Internet Explorer
– Collect statistical data by intercepting the system calls to send and receive
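A minimal sketch of this sampling pipeline, assuming Python with the requests and beautifulsoup4 libraries. The talk itself retrieved pages with IE and intercepted send/receive system calls; only the site-selection step is reproduced here, and the Google-scraping details are illustrative assumptions, not the authors' harness.

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urlparse, parse_qs

    def random_site_homepage():
        # 1. Fetch a random Wikipedia article and take its title.
        r = requests.get("http://en.wikipedia.org/wiki/Special:Random")
        title = BeautifulSoup(r.text, "html.parser").find("h1").get_text()

        # 2. Feed the title to a Google search.
        g = requests.get("https://www.google.com/search", params={"q": title},
                         headers={"User-Agent": "Mozilla/5.0"})

        # 3. Take the domain of the first external result as the home page.
        for a in BeautifulSoup(g.text, "html.parser").find_all("a", href=True):
            href = a["href"]
            if href.startswith("/url?"):  # Google wraps result links
                href = parse_qs(urlparse(href).query).get("q", [""])[0]
            host = urlparse(href).netloc
            if host and "google" not in host and "wikipedia" not in host:
                return "http://" + host + "/"
        return None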

12 Retrieved Component Sizes
[Figure: distribution of retrieved component sizes]
Components larger than 200 KB are only 0.02% of all components, yet account for about ¼ of the total data

13 Download Times
Download time (and the bandwidth requirement) is roughly proportional to image size

14 Network Bandwidth
Typical Ethernet packets are 1526 bytes:
– Ethernet and TCP/IP headers require 54 bytes
– HTTP response headers require 280–325 bytes
Most components fit into a few packets:
– 43% fit into a single packet
– Another 24% fit into 2 packets
Save bandwidth by reducing the number of small components or the size of large components (see the sketch below)
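A back-of-the-envelope check of these numbers, as a minimal Python sketch; the 300-byte HTTP header figure is a mid-range assumption from the 280–325 range above.

    import math

    FRAME = 1526    # typical Ethernet frame size (bytes), per slide 14
    HEADERS = 54    # Ethernet + TCP/IP headers per packet
    HTTP = 300      # HTTP response headers, mid-range of 280-325

    def packets_for(content_bytes):
        payload = FRAME - HEADERS   # 1472 usable bytes per packet
        first = payload - HTTP      # first packet also carries the HTTP headers
        if content_bytes <= first:
            return 1                # the 43% single-packet case
        return 1 + math.ceil((content_bytes - first) / payload)

    print(packets_for(1024))        # a 1 KB icon -> 1 packet
    print(packets_for(100 * 1024))  # a 100 KB photo -> 70 packets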

15 Locality and Caching
– Flash crowds typically involve a very small number of pages (possibly just the home page)
– Servers allocate gigabytes of memory for caching
– This is enough for thousands of files
– The disk is not expected to be a bottleneck

16 CPU Overhead
CPU usage reflects several activities:
– Opening the TCP connection
– Processing the request
– Sending the data
Measured using combinatorial microbenchmarks (sketched below):
– Open a connection only
– One extremely large file
– Many small files
– Many requests for a nonexistent file
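A sketch of the "open connection only" case, assuming a Python load generator; HOST, PORT, and N are placeholders, and server-side CPU usage would be sampled separately (e.g. from /proc/stat) while this runs. The other cases vary the request rather than the connection.

    import socket

    HOST, PORT, N = "localhost", 80, 10000   # assumed test server

    for _ in range(N):
        s = socket.create_connection((HOST, PORT))
        s.close()                            # connection only, no HTTP request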

17 CPU Overhead
Example: a single 10 KB file
– Establishing connection: 25%
– Processing request: 72%
– Data transfer: 3%
Transfer cost scales with file size, so processing and transfer become equal at 72/3 × 10 KB = 240 KB:
– Only 0.3% of files are that big
If the CPU is the bottleneck, reduce the number of requests

18 Optimizations

19 Guidelines
– Either the CPU or the network is the bottleneck
– Network bandwidth is saved by shrinking large components
– CPU is saved by eliminating small components
– What counts as “acceptable” quality is subjective

20 Eliminating Images
Images serve many functions:
– Story (the main illustrative item)
– Preview (of another page)
– Commercial
– Logo
– Decoration (bullets, backgrounds)
– Navigation (buttons, menus)
– Text (special formatting)
Some can be eliminated or replaced

21 Distribution of Types
Manually classified 959 images from 30 random sites:
– 50% decoration
– 18% preview
– 11% commercial
– 6% logo
– 6% text

22 Automatic Identification
Decorations are candidates for elimination
Identified by a combination of attributes (see the heuristic sketched below):
– Use the GIF format
– Appear in HTML tags other than <img> (e.g., as backgrounds)
– Appear multiple times in the same page
– Small original size
– Displayed size much bigger than the original
– Large change in aspect ratio when displayed
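A hedged Python sketch of such a heuristic: the talk lists the attributes but not the thresholds, so the cutoffs (2 KB, 2× area, 0.5 aspect-ratio delta, 3 votes) are invented for illustration.

    def looks_like_decoration(img):
        """img: dict with url, tag, count (occurrences in the page),
        bytes (original file size), natural (w, h), displayed (w, h)."""
        nw, nh = img["natural"]
        dw, dh = img["displayed"]
        votes = [
            img["url"].lower().endswith(".gif"),     # GIF format
            img["tag"] != "img",                     # e.g. a table/CSS background
            img["count"] > 1,                        # repeated in the same page
            img["bytes"] < 2048,                     # small original size
            dw * dh > 2 * nw * nh,                   # displayed much bigger than original
            nh and dh and abs(dw/dh - nw/nh) > 0.5,  # large aspect-ratio change
        ]
        return sum(votes) >= 3                       # combination of attributes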

23 Image Sizes Distribution
[Figure: size distributions of decoration, preview, and commercial images]

24 Auxiliary Files
JavaScript:
– May be crucial for page function
– Impossible to understand automatically
CSS (style sheets):
– May be crucial for page structure
– It may be possible to identify which parts are actually used

25 Auxiliary Files
These cannot be eliminated
Common wisdom: use separate files
– Allows caching at the client
– Saves retransmission with each page
Alternative: embed them in the HTML (sketched below)
– Reduces the number of requests
– May be better during flash crowds, when most visitors fetch only a single page, so client caching buys little
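A minimal sketch of the embedding alternative, assuming Python with requests and beautifulsoup4; this is not the authors' implementation, just an illustration of inlining external CSS and JavaScript to save one request each.

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def inline_assets(html, base_url):
        soup = BeautifulSoup(html, "html.parser")
        # Replace each external stylesheet with an inline <style> block.
        for link in soup.find_all("link", rel="stylesheet", href=True):
            style = soup.new_tag("style")
            style.string = requests.get(urljoin(base_url, link["href"])).text
            link.replace_with(style)
        # Replace each external script with an inline one.
        for script in soup.find_all("script", src=True):
            script.string = requests.get(urljoin(base_url, script["src"])).text
            del script["src"]
        return str(soup)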

26 Text and HTML
Some areas may be eliminated under extreme conditions:
– Commercials
– Some previews and navigation options
These are often encapsulated in <div> tags
Sometimes identifiable by ID or class names, e.g. “sidebanner” (see the sketch below)
– Especially when the site uses a modular design
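A sketch of dropping such blocks by name, in Python with beautifulsoup4; only "sidebanner" comes from the talk, and the other patterns are illustrative assumptions.

    import re
    from bs4 import BeautifulSoup

    DROP = re.compile(r"(sidebanner|banner|advert)", re.I)

    def strip_blocks(html):
        soup = BeautifulSoup(html, "html.parser")
        for el in soup.find_all(attrs={"id": DROP}):
            el.decompose()                 # remove element and its subtree
        for el in soup.find_all(attrs={"class": DROP}):
            el.decompose()
        return str(soup)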

27 Summary

28 Content Adaptation
– Degraded content is usually better than excluding clients
– It is the only way to handle flash crowds that overwhelm the installed capacity
Empirical results identify the main options:
– Identify and eliminate decorations
– Compress large images (story, commercial)
– Embed JavaScript and CSS
– Hide unnecessary blocks

29 Next Paper Preview
– Implementation in Apache
– Monitor CPU utilization and idle threads to switch between modes
– Use mod_rewrite to redirect URLs to adapted content
– Achieves up to a 10× increase in throughput under extreme adaptation

