Empirical Quantification of Opportunities for Content Adaptation in Web Servers Michael Gopshtein and Dror Feitelson School of Engineering and Computer Science

Presentation transcript:

Empirical Quantification of Opportunities for Content Adaptation in Web Servers Michael Gopshtein and Dror Feitelson School of Engineering and Computer Science The Hebrew University of Jerusalem Supported by a grant from the Israel Internet Association

Capacity Planning [Figure: daily cycle of activity, showing utilized and wasted capacity; axes: time, capacity]

Capacity Planning [Figure: flash crowd; axes: time, capacity]

Capacity Planning The problem: –Required capacity for flash crowds cannot be anticipated in advance –Even capacity for daily fluctuations is highly wasteful Academic solution: use admission control Business practice: unacceptable to reject any clients –Especially in cases of surge in traffic

Content Adaptation Trade off quality for throughput –Installed capacity matches normal load –Handle abnormal load by reducing quality –But still manage to provide meaningful service to all clients Assumes normal optimizations have been made already –Compress or combine images, promote caching, … –Empirically this usually is not the case

Content Adaptation [Figure: low load, one smiley]

Content Adaptation [Figure: high load, two smileys]

Content Adaptation Maintain the invariant: [formula not preserved in the transcript] Need to change quality (and cost!) of content –Prepare multiple versions in advance

The Questions What are the main costs in web service? –Bottleneck is CPU / network / disk? –What do we gain by eliminating HTTP requests? –What do we gain by reducing file sizes? What can realistically be done? –What is the structure of a “random” site? –How much can we reduce quality? Assumption: static web pages only

Costs of Serving Web Pages

Measuring Random Web Sites Use title of page as input to Google search Extract domain of first link to get home page Retrieve it using IE Collect statistical data by intercepting system calls to send and receive

Retrieved Component Sizes [Figure annotation: "This is only 0.02% of the components"] A quarter of the total data comes from components larger than 200 KB

Download Times Download time (and bandwidth requirements) roughly proportional to image size

Network Bandwidth Typical Ethernet packets are 1526 bytes –Ethernet and TCP/IP headers require 54 bytes –HTTP response headers require Most components fit into a few packets –43% fit into a single packet –24% more fit into 2 packets Save bandwidth by reducing the number of small components or the size of large components
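
The packet arithmetic on this slide can be sketched as follows. The 1526-byte packet size and the 54-byte header overhead are taken from the slide; the HTTP response header size is missing from the transcript, so `http_header_bytes=300` is a placeholder assumption, as are the function names. The payload per packet is a plain subtraction and ignores details such as TCP options.

```python
import math

PACKET_BYTES = 1526                          # typical Ethernet packet (from the slide)
HEADER_BYTES = 54                            # Ethernet + TCP/IP headers (from the slide)
PAYLOAD_BYTES = PACKET_BYTES - HEADER_BYTES  # bytes left for HTTP data per packet


def packets_needed(component_bytes, http_header_bytes=300):
    """Packets used by one response (HTTP header size is an assumed value)."""
    total = component_bytes + http_header_bytes
    return max(1, math.ceil(total / PAYLOAD_BYTES))


def wire_bytes(component_bytes, http_header_bytes=300):
    """Total bytes on the wire, including per-packet header overhead."""
    n = packets_needed(component_bytes, http_header_bytes)
    return component_bytes + http_header_bytes + n * HEADER_BYTES


if __name__ == "__main__":
    for size in (500, 1000, 2000, 200_000):
        print(size, packets_needed(size), wire_bytes(size))
```

Under these assumptions a small component pays proportionally more overhead than a large one, which is why the slide separates "fewer small components" from "smaller large components".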

Locality and Caching Flash crowds typically involve a very small number of pages (possibly the home page) Servers allocate GB of memory for cache This is enough for thousands of files Disk is not expected to be a bottleneck

CPU Overhead CPU usage reflects several activities –Opening TCP connection –Processing request –Sending data Measure using combinatorial microbenchmarks –Open connection only –One extremely large file –Many small files –Many requests for non-existent file

CPU Overhead Example: single 10 KB file –Establishing connection: 25% –Processing request: 72% –Data transfer: 3% Equal processing and transfer at 240 KB –Only 0.3% of files are so big If CPU is the bottleneck, need to reduce the number of requests
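
The 240 KB figure can be checked with a small worked calculation. This sketch assumes that connection and request-processing costs are fixed per request and that data-transfer cost grows linearly with file size; that model is an assumption, not stated on the slide, but it reproduces the slide's break-even point. All names are illustrative.

```python
REF_SIZE_KB = 10  # the slide's example file size
# CPU shares measured for the 10 KB example (from the slide), in percent.
SHARES_AT_REF = {"connection": 25.0, "processing": 72.0, "transfer": 3.0}


def cpu_cost(size_kb):
    """Relative CPU cost per request, assuming transfer cost scales with size."""
    cost = dict(SHARES_AT_REF)
    cost["transfer"] = SHARES_AT_REF["transfer"] * size_kb / REF_SIZE_KB
    return cost


def breakeven_size_kb():
    """File size at which transfer cost equals request-processing cost."""
    return REF_SIZE_KB * SHARES_AT_REF["processing"] / SHARES_AT_REF["transfer"]


def transfer_fraction(size_kb):
    """Fraction of per-request CPU spent on data transfer."""
    cost = cpu_cost(size_kb)
    return cost["transfer"] / sum(cost.values())


if __name__ == "__main__":
    print(breakeven_size_kb())     # 10 * 72 / 3 = 240.0
    print(transfer_fraction(10))
```

Since nearly all files are far below the break-even size, per-request costs dominate, which matches the slide's conclusion that a CPU-bound server benefits from fewer requests rather than smaller files.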

Optimizations

Guidelines Either CPU or network is the bottleneck Network bandwidth saved by reducing large components CPU saved by eliminating small components Maintaining “acceptable” quality is subjective

Eliminating Images Images have many functions –Story (main illustrative item) –Preview (for other page) –Commercial –Logo –Decoration (bullets, background) –Navigation (buttons, menus) –Text (special formatting) Some can be eliminated or replaced

Distribution of Types Manually classified 959 images from 30 random sites: 50% decoration, 18% preview, 11% commercial, 6% logo, 6% text

Automatic Identification Decorations are candidates for elimination Identified by combination of attributes: –Use GIF format –Appear in HTML tags other than <img> –Appear multiple times in same page –Small original size –Displayed size much bigger than original –Large change in aspect ratio when displayed
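
One way to combine the attributes listed above is a simple vote over boolean signals. The sketch below is illustrative only: the slide does not say how the attributes are combined, so the voting rule, every threshold (`small_pixels`, `area_ratio`, `aspect_ratio`, `min_signals`), the `ImageUse` record, and the choice of `img` as the reference tag name are assumptions. Image dimensions are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class ImageUse:
    """One image as used on a page (hypothetical record layout)."""
    file_format: str   # e.g. "gif", "jpeg"
    tag: str           # HTML tag that references the image
    occurrences: int   # times the image appears on the page
    orig_w: int        # original width in pixels
    orig_h: int        # original height in pixels
    shown_w: int       # displayed width in pixels
    shown_h: int       # displayed height in pixels


def decoration_signals(img, small_pixels=1024, area_ratio=4.0, aspect_ratio=2.0):
    """Evaluate each attribute from the slide as a boolean signal."""
    orig_area = img.orig_w * img.orig_h
    shown_area = img.shown_w * img.shown_h
    orig_aspect = img.orig_w / img.orig_h
    shown_aspect = img.shown_w / img.shown_h
    aspect_change = max(orig_aspect, shown_aspect) / min(orig_aspect, shown_aspect)
    return {
        "gif": img.file_format.lower() == "gif",
        "non_img_tag": img.tag.lower() != "img",
        "repeated": img.occurrences > 1,
        "small": orig_area <= small_pixels,
        "stretched": shown_area >= area_ratio * orig_area,
        "aspect_changed": aspect_change >= aspect_ratio,
    }


def is_decoration(img, min_signals=3):
    """Flag the image as decoration when enough signals agree (assumed rule)."""
    return sum(decoration_signals(img).values()) >= min_signals


if __name__ == "__main__":
    bullet = ImageUse("gif", "td", 12, 1, 1, 200, 1)
    photo = ImageUse("jpeg", "img", 1, 640, 480, 640, 480)
    print(is_decoration(bullet), is_decoration(photo))
```

A vote is used here because no single attribute is decisive on its own: a logo may also be a small GIF, and a story photo may be repeated as a thumbnail.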

Image Sizes Distribution [Figure: image size distributions for decoration, preview, and commercial images]

Auxiliary Files JavaScript –May be crucial for page function –Impossible to understand automatically CSS (style sheets) –May be crucial for page structure –May be possible to identify those parts that are used

Auxiliary Files Cannot be eliminated Common wisdom: use separate files –Allow caching at client –Save retransmission with each page Alternative: embed in HTML –Reduce number of requests –May be better for flash crowds that do not request multiple pages
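
The embedding alternative can be sketched as a preprocessing step that produces an adapted copy of a page in advance. This is a minimal regex-based illustration, not a robust HTML rewriter: the `files` mapping (URL to file contents), the function name, and the patterns are assumptions, and edge cases such as scripts containing a literal `</script>` are ignored.

```python
import re

LINK_RE = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
SCRIPT_RE = re.compile(r"<script\b([^>]*)>\s*</script>", re.IGNORECASE)
HREF_RE = re.compile(r"\bhref\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)
SRC_RE = re.compile(r"\bsrc\s*=\s*[\"']([^\"']+)[\"']", re.IGNORECASE)
REL_CSS_RE = re.compile(r"\brel\s*=\s*[\"']stylesheet[\"']", re.IGNORECASE)


def embed_auxiliary(html, files):
    """Inline external CSS and JavaScript whose contents are known in `files`."""

    def inline_css(match):
        tag = match.group(0)
        href = HREF_RE.search(tag)
        if REL_CSS_RE.search(tag) and href and href.group(1) in files:
            return "<style>" + files[href.group(1)] + "</style>"
        return tag  # leave unknown or non-stylesheet links untouched

    def inline_js(match):
        src = SRC_RE.search(match.group(1))
        if src and src.group(1) in files:
            return "<script>" + files[src.group(1)] + "</script>"
        return match.group(0)  # leave unknown scripts untouched

    html = LINK_RE.sub(inline_css, html)
    return SCRIPT_RE.sub(inline_js, html)


if __name__ == "__main__":
    page = ('<head><link rel="stylesheet" href="site.css">'
            '<script src="menu.js"></script></head>')
    print(embed_auxiliary(page, {"site.css": "p{margin:0}", "menu.js": "init();"}))
```

Each inlined file removes one HTTP request per page view, at the price of losing client-side caching of that file across pages.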

Text and HTML Some areas may be eliminated under extreme conditions –Commercials –Some previews and navigation options Often encapsulated in tags Sometimes identified by ID or class names, e.g. “sidebanner” –Especially when using modular design
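
Removing blocks identified by ID or class name can be sketched with Python's standard `html.parser`. This is an illustration under stated assumptions: the class name, the `strip_blocks` helper, and the set of names to hide are hypothetical, the input is assumed to be well nested, and comments and the doctype are dropped rather than copied.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BlockStripper(HTMLParser):
    """Copy HTML through, omitting elements whose id or class is hidden."""

    def __init__(self, hidden_names):
        super().__init__(convert_charrefs=False)
        self.hidden_names = set(hidden_names)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a hidden element

    def _is_hidden(self, attrs):
        attr = dict(attrs)
        names = set((attr.get("class") or "").split())
        if attr.get("id"):
            names.add(attr["id"])
        return bool(names & self.hidden_names)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._is_hidden(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._is_hidden(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_blocks(html, hidden_names):
    parser = BlockStripper(hidden_names)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `strip_blocks(page, {"sidebanner"})` would return the page without any element carrying that ID or class, together with everything nested inside it.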

Summary

Content Adaptation Degraded content usually better than exclusion Only way to handle flash crowds that overwhelm installed capacity Empirical results identify main options –Identify and eliminate decorations –Compress large images (story, commercial) –Embed JavaScript and CSS –Hide unnecessary blocks

Next Paper Preview Implementation in Apache Monitor CPU utilization and idle threads to switch between modes Use mod_rewrite to redirect URLs to adapted content Achieve up to a 10x increase in throughput for extreme adaptation
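
The mode-switching idea can be illustrated with a small stand-alone sketch. It is not the Apache implementation and does not use mod_rewrite: the mode names, the thresholds, the step-by-step switching with hysteresis, and the URL prefix scheme are all hypothetical choices made for illustration.

```python
MODES = ["full", "reduced", "minimal"]  # hypothetical quality levels


def choose_mode(cpu_util, idle_threads, current="full",
                high_cpu=0.9, low_cpu=0.6, min_idle=5):
    """Move one step toward lower quality under load, one step back when relaxed.

    Separate high and low CPU thresholds (assumed values) avoid flapping
    between modes when the load hovers around a single cut-off.
    """
    i = MODES.index(current)
    overloaded = cpu_util >= high_cpu or idle_threads < min_idle
    relaxed = cpu_util <= low_cpu and idle_threads >= min_idle
    if overloaded and i < len(MODES) - 1:
        i += 1
    elif relaxed and i > 0:
        i -= 1
    return MODES[i]


def adapted_path(path, mode):
    """Map a requested path to its pre-generated adapted version."""
    return path if mode == "full" else f"/{mode}{path}"


if __name__ == "__main__":
    mode = "full"
    for cpu, idle in [(0.95, 20), (0.97, 2), (0.70, 20), (0.30, 30)]:
        mode = choose_mode(cpu, idle, mode)
        print(cpu, idle, mode, adapted_path("/index.html", mode))
```

Keeping the adapted versions under a per-mode prefix means the switch itself only changes which files are served, so it adds little work at the moment the server is already overloaded.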