2005/2/23 HUT T-110.456 Characterizing Web Workload of Mobile Clients Chuang Yu Juha Raitio.

Characterizing Web Workload of Mobile Clients (HUT T-110.456, Yu & Raitio, 2005/2/23)

Outline
- Web workload analyses: what, why, how
- Characteristics of workload: wireline, wireless, case study results
- Statistical characteristics of Web workload: power laws, self-similarity
- Examples of workload analysis tools
- Summary

What?
- Content analysis
- User behavior analysis: user load distribution, session duration, temporal stability, spatial locality
- System load analysis
Questions the analysis tries to answer:
- How do users come to visit the Web site?
- Why do users leave the Web site?
- What content are users interested in?
- How do users' interests vary over time?
- How do users' interests vary across geographic regions?

Why?
The characteristics of user load have significant implications for:
- Web site design
- Content management
- Protocol design
- Capacity planning
Who benefits:
- Content providers: a better user experience through more effective design and content management
- Service providers: efficient resource allocation, capacity planning, and pricing
- System designers: insight into performance bottlenecks and the effectiveness of protocols

How?
- Gather requirements: what are the goals of the analysis?
- Plan and design the data collection:
  - What data to collect, over how long a period, and from where (Web proxies, Web browsers, Web servers)?
  - What is the scope: how large, how many?
  - What methods to use, what analysis is needed, and how will the data be analyzed?
- Collect the data
- Analyze the traces with statistical and mathematical methods
- Run the different analyses: content analysis, user behavior analysis, system load analysis
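As a minimal sketch of the data-collection and first-pass analysis steps, the Python below parses access-log lines and computes a few per-trace totals. It assumes the logs are in the Apache Common Log Format and uses the authuser field as the user identifier; the case study's own log layout is not given here, so the regex and the helper names parse_line and summarize are illustrative assumptions.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict for one log line, or None if the line is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" = no body
    return rec

def summarize(lines):
    """Per-trace totals that feed content, user-behavior and load analyses."""
    recs = [r for r in map(parse_line, lines) if r is not None]
    return {
        "requests": len(recs),
        "bytes": sum(r["size"] for r in recs),
        "users": len({r["user"] for r in recs}),
        "top_urls": Counter(r["url"] for r in recs).most_common(3),
        "per_hour": Counter(r["time"].hour for r in recs),  # diurnal pattern
    }
```

Malformed lines are skipped rather than raising, since real traces usually contain some junk entries.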

Wireline user workload characterization (1): content analysis
- Content type: pure text, graphics-rich multimedia, and, in the majority of cases, a mix of both
- Content size: both the size of all content on a Web server and the size of content transferred by a Web server
  - A non-negligible fraction of files are very large
  - Median transfer size is ~2 kB; median content size is a few hundred bytes larger
- Content popularity: depends highly on where the traces are collected
- Content modification pattern:
  - Large variation: much content is never modified, while some is modified at least once between two consecutive accesses
  - Depends on content type, e.g. news Web sites
  - Most file modifications are small
  - A document's past modification interval gives a rough prediction of its next modification time

Wireline user workload characterization (2): user behavior analysis
- User request arrival and duration
  - Occur at three levels: session, click, and request
  - User dependent
  - The number of clicks in a session, the number of embedded images in a Web page, think time, and active time can be modeled with heavy-tailed Pareto distributions
  - 8-second rule
- Temporal locality and stability
  - If a page is accessed now, how likely is it to be accessed again in the near future?
  - Stronger temporal locality implies that caching is more effective
  - Access ranking stability: stability is high on the scale of days
- Spatial locality
  - Captures how likely people in the same geographic location or the same organization are to request a similar set of documents
  - Relevant to the effectiveness of proxy caching
  - Organization and domain membership is significant
  - "Hot" events can dominate the effect of membership
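Heavy-tailed quantities such as think time or clicks per session can be drawn by inverse-transform sampling of a Pareto distribution, whose tail is P(X > x) = (xm/x)^alpha for x >= xm. The sketch below assumes this parameterization; the alpha and minimum values used in the hypothetical synthetic_session helper are illustrative, not parameters fitted in the studies summarized here.

```python
import random

def pareto_sample(alpha, xm, rng):
    """Draw one Pareto(alpha, xm) value by inverse-transform sampling.

    P(X > x) = (xm / x) ** alpha for x >= xm; smaller alpha = heavier tail.
    """
    u = 1.0 - rng.random()          # u in (0, 1], avoids division by zero
    return xm / u ** (1.0 / alpha)

def synthetic_session(rng, alpha_clicks=1.5, alpha_think=1.4, min_think_s=1.0):
    """Hypothetical session model: Pareto click count, Pareto think times."""
    n_clicks = int(pareto_sample(alpha_clicks, 1.0, rng))   # always >= 1
    return [pareto_sample(alpha_think, min_think_s, rng) for _ in range(n_clicks)]
```

Python's standard library also offers random.paretovariate(alpha), which draws the xm = 1 case; multiplying its result by xm gives the same distribution.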

Wireline user workload characterization (3): system load analysis
- Load varies with time and with recent events, e.g. the World Cup, September 11
- Web traffic is self-similar

Wireless user workload characterization
- WAP traffic
  - Access rate is still low: 80,000 entries in 7 months (1999)
  - Amount of data is less than for voice
- Metropolitan wireless network
  - Usage shows diurnal and weekly patterns
  - Users do not move frequently
- WLAN
  - Campus: session-oriented and chat-oriented use; incoming traffic exceeds outgoing traffic; a high degree of roaming within sessions; sessions are normally short
  - Conference: users are evenly distributed across APs; Web and SSH account for 64% of traffic; sessions are short (60% last less than 10 min); bandwidth is distributed very unevenly across APs
  - Corporate: different users impose different loads

Case study: "A popular commercial Web site designed for mobile clients"
- Provides Web access for wireline, wireless, and offline use
- Provides notification services
Analyses:
- Web access
- Notifications
- Comparison between Web access and notification use
- Comparison between wireline and wireless use
Motivation:
- To give a general overview of the analysis process and data
- To show some more concrete results
- To illustrate the possibilities of the analyses
- To point out direct implications of the results

Case study: architecture

Case study: material
- Web access logs for 12 days (August 2000), per user, per request
- Notification logs for 6 days, per user, per notification
- Types of Web access

Case study: Web content
What content was available for wireless use?

Case study: Web content size
How did the size of retrieved content vary?
- Replies are small:
  - 98% of replies for wireless use are less than 3 kB
  - 98% of replies for offline use are less than 6 kB
- However, 80% of bytes are carried in replies of 10 kB or more
- Implications: systems could be optimized for small replies
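The two size statistics above measure different things: one counts replies, the other weights them by bytes. A short sketch with toy data (not the case-study trace) shows how a trace can have 98% small replies while most bytes still travel in a few large ones; the function name and the 1 kB = 1024 bytes convention are assumptions.

```python
def size_profile(reply_sizes, small_limit, large_limit):
    """Return (fraction of replies smaller than small_limit,
    fraction of all bytes carried by replies of at least large_limit).
    Sizes are in bytes."""
    total_bytes = sum(reply_sizes)
    small = sum(1 for s in reply_sizes if s < small_limit) / len(reply_sizes)
    large_bytes = sum(s for s in reply_sizes if s >= large_limit) / total_bytes
    return small, large_bytes

# Toy trace: 98 replies of 1,000 bytes and 2 replies of 200,000 bytes.
sizes = [1000] * 98 + [200_000] * 2
small, large = size_profile(sizes, small_limit=3 * 1024, large_limit=10 * 1024)
```

Here small is 0.98 while the two large replies carry about 80% of the bytes, so optimizing for the common small reply and for the bulk of bytes are different goals.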

Case study: Web content popularity
How did popularity vary across documents?
- Heavy-tailed distribution: 0.1-0.5% of documents account for 90% of the requests
- Implications: caching could be very effective
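One way to quantify this concentration is to sort documents by request count and find the smallest fraction of documents that covers 90% of requests. The sketch below assumes the trace is simply a list of requested document IDs; the function name is hypothetical.

```python
from collections import Counter

def top_share_of_docs(requests, request_share=0.9):
    """Smallest fraction of distinct documents that together account for
    at least `request_share` of all requests."""
    counts = sorted(Counter(requests).values(), reverse=True)  # most popular first
    target = request_share * len(requests)
    covered = 0
    for k, c in enumerate(counts, start=1):
        covered += c
        if covered >= target:
            return k / len(counts)
    return 1.0
```

A result close to 0 means a handful of documents serve almost all requests, which is the situation where a small cache pays off.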

Case study: Web user load distribution
How did individual users contribute to the load?
- Heavy-tailed distribution: a small group of users generates the majority of the load
- Implications: different pricing is needed for different user groups
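The share of load generated by the most active users can be computed from a per-request list of user IDs. This is an illustrative sketch with a hypothetical function name; applied to a per-notification list of recipient IDs, the same calculation yields figures of the kind reported for the notification load (top 5% and top 10% of clients).

```python
import math
from collections import Counter

def top_user_share(user_ids, top_fraction):
    """Share of all requests issued by the most active `top_fraction` of users.

    user_ids: one entry per request, naming the user who issued it.
    """
    per_user = sorted(Counter(user_ids).values(), reverse=True)
    k = max(1, math.ceil(top_fraction * len(per_user)))  # at least one user
    return sum(per_user[:k]) / len(user_ids)
```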

Case study: stability of Web access
How did interests vary across weekdays?
- Interests are relatively stable:
  - Of the top 100 most popular requests, 80% remain popular during a week
  - Of the top 1000, 70% remain popular
- Implications: performance can be optimized for the stable set
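Stability of interest can be measured as the overlap between the top-N requested items of two periods. A minimal sketch, assuming each period is given as a list of requested item IDs; the function names are hypothetical, and ties in request counts are broken arbitrarily by Counter.most_common.

```python
from collections import Counter

def top_n(requests, n):
    """Set of the n most requested items in one period."""
    return {item for item, _ in Counter(requests).most_common(n)}

def top_n_overlap(period_a, period_b, n):
    """Fraction of period_a's top-n items still in period_b's top-n."""
    a, b = top_n(period_a, n), top_n(period_b, n)
    return len(a & b) / len(a)
```

Computing top_n_overlap(day1, day_k, 100) for each later day k gives the kind of "80% of the top 100 remain popular" figure quoted above.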

Case study: locality of Web access
Did people in the same region issue similar requests?
- Randomly sampled user groups do not differ from local users: geographic locality in requests is insignificant
- Implications: geographic distribution of servers/content does not require localization

Case study: notification popularity
What types of content were available as notifications, and how popular were they?

Case study: notification size
How did notification messages vary in size?
- Notifications are small: all messages are less than 256 bytes
- Implications: if delivery is not optimized, the overhead caused by network protocols may be considerable
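To see why small messages make protocol overhead significant, compare header bytes with payload bytes. The sketch assumes one packet per message and minimum IPv4 (20-byte), TCP (20-byte), and UDP (8-byte) headers, ignoring link-layer framing and IP/TCP options. With a 256-byte message over TCP/IPv4, headers are 40 of 296 bytes, about 13.5%.

```python
IPV4_HEADER = 20   # bytes, minimum IPv4 header (no options)
TCP_HEADER = 20    # bytes, minimum TCP header (no options)
UDP_HEADER = 8     # bytes

def header_overhead(payload_bytes, header_bytes):
    """Fraction of transmitted bytes that are protocol headers,
    assuming the whole payload fits in a single packet."""
    return header_bytes / (payload_bytes + header_bytes)

tcp_share = header_overhead(256, IPV4_HEADER + TCP_HEADER)
udp_share = header_overhead(256, IPV4_HEADER + UDP_HEADER)
```

If every notification also opened a fresh TCP connection, the header-only segments for connection setup and teardown would add further overhead on top of this per-packet figure.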

Case study: notification popularity
How did popularity vary across notifications?
- Heavy-tailed distribution: the top 1% of notifications accounted for 60% of messages
- Implications: multicasting notifications would yield significant savings

Case study: notification load distribution
How did individual users contribute to the notification load?
- Heavy-tailed distribution:
  - The top 5% of clients received 25% of notification messages
  - The top 10% received 40%
- Implications: different pricing is needed for different user groups

Case study: locality of notifications
Did people in the same region receive the same notifications?
- Randomly sampled user groups differ from local users: users in the same region share notification content
- Implications: regional differences may be exploited when planning the geographic distribution of servers/content

Correlation between browsing and notification
- There is limited correlation between a client's notification usage and browsing usage
- People use the two services for different purposes, and the two services deliver different types of content
- The result is useful for Web design and pricing plans
(Figure: number of users whose top N browsing categories overlap with their top N notification categories)
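The overlap measure behind that figure can be sketched as follows. It assumes per-user lists of category labels, one per browsing request and one per received notification; the data layout and function name are hypothetical. Evaluating it for N = 1, 2, 3, ... gives one point per N of such a plot.

```python
from collections import Counter

def users_with_overlap(browse, notify, n):
    """Count users whose top-n browsing categories share at least one
    category with their top-n notification categories.

    browse, notify: dict of user id -> list of category labels.
    Only users present in both dicts are considered.
    """
    count = 0
    for user in browse.keys() & notify.keys():
        top_browse = {c for c, _ in Counter(browse[user]).most_common(n)}
        top_notify = {c for c, _ in Counter(notify[user]).most_common(n)}
        if top_browse & top_notify:
            count += 1
    return count
```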

Workload comparison between wireline and mobile Web
- Content
  - Web content is richer than wireless content
  - Content size is smaller for wireless, due to limited display and bandwidth
  - Wireless content shares the Zipf-like popularity distribution of wireline content
- User behavior
  - Both are user dependent
  - Both exhibit temporal stability
  - Wireless users do not exhibit strong spatial locality (limited content)
- System load
  - Both exhibit diurnal and weekly variation
  - Wireless server load is smaller than wireline server load
  - A Web site for mobile clients has a more heterogeneous population of users

Power laws
- A measure y depends on another measure x as a power of x: y = c * x^a, for some constant c and exponent a
- Power-law distributions (a.k.a. heavy-tailed distributions) include, e.g., the Zipf and Pareto distributions
- Why? Finding a suitable distribution for the observed data allows probabilistic inference about the underlying phenomenon in closed form
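A common first check for a power law is a straight line on a log-log plot, since log y = log c + a log x. The sketch below fits c and a by ordinary least squares on the logarithms. It is a diagnostic only: for fitting the tail of a distribution, maximum-likelihood estimators are generally preferred over log-log regression. The function name is hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log c + a * log x; returns (c, a).
    All xs and ys must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx                      # slope on the log-log plot
    c = math.exp(my - a * mx)          # intercept mapped back from log space
    return c, a
```

For example, fitting exact data y = 5 * x^-0.46 for x = 1..100 recovers a = -0.46 and c = 5 up to rounding.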

Power laws and the Web
- Several distributions derived from the topology of the Internet at the router and domain level follow a power law
- Further power-law quantities: number of documents per Web site or file system, size of documents per Web site or file system, session durations, links between Web pages
- Example (figure): a power law with a = -0.46

Self-similarity
- Self-similar (a.k.a. fractal) data:
  - Maintains its bursty character even when aggregated over a wide range of time scales
  - Slowly decaying variance
  - Long-range dependence (not memoryless)
- Underlying phenomenon:
  - Data sources that are either ON or OFF
  - The distributions of ON and OFF times (or message sizes) are heavy-tailed
  - Aggregating many such sources leads to self-similarity
- Internet/WWW traffic is self-similar
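The ON/OFF mechanism and the "slowly decaying variance" property can both be sketched in a few lines. on_off_aggregate sums ON/OFF sources whose period lengths are Pareto-distributed, and hurst_aggregated_variance applies the aggregated-variance method: for self-similar traffic the variance of the m-aggregated series decays like m^(2H-2), so the log-log slope gives the Hurst parameter H (H = 0.5 for independent samples, 0.5 < H < 1 for long-range dependence). The time-slot model, one unit per ON slot, and all parameter values are assumptions for illustration.

```python
import math
import random

def on_off_aggregate(n_sources, length, alpha, rng):
    """Sum of n_sources ON/OFF sources over `length` time slots.
    ON and OFF period lengths are Pareto(alpha, 1) rounded down (>= 1 slot);
    an ON source contributes one unit per slot."""
    total = [0] * length
    for _ in range(n_sources):
        t, on = 0, rng.random() < 0.5          # random initial state
        while t < length:
            period = int(rng.paretovariate(alpha))
            if on:
                for i in range(t, min(t + period, length)):
                    total[i] += 1
            t += period
            on = not on
    return total

def hurst_aggregated_variance(series, block_sizes):
    """Estimate H from Var(X^(m)) ~ m^(2H - 2): H = 1 + slope / 2."""
    logs_m, logs_v = [], []
    for m in block_sizes:
        blocks = [sum(series[i:i + m]) / m
                  for i in range(0, len(series) - m + 1, m)]
        mean = sum(blocks) / len(blocks)
        var = sum((b - mean) ** 2 for b in blocks) / (len(blocks) - 1)
        logs_m.append(math.log(m))
        logs_v.append(math.log(var))
    mx, my = sum(logs_m) / len(logs_m), sum(logs_v) / len(logs_v)
    slope = (sum((u - mx) * (v - my) for u, v in zip(logs_m, logs_v))
             / sum((u - mx) ** 2 for u in logs_m))
    return 1 + slope / 2
```

For ON/OFF sources with Pareto shape 1 < alpha < 2, theory predicts H = (3 - alpha)/2, so estimates well above 0.5 are expected, although the aggregated-variance estimator is biased on short traces.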

Self-similarity and the Web

WebTraff: A GUI for Web Proxy Cache Workload Modelling and Analysis
- An extended and improved version of ProWGen (Proxy Workload Generator), adding a GUI to a useful set of tools for Web traffic modelling and analysis
- Purpose: to make it easy to generate and analyze controllable, representative workloads for Web caching simulations
- The WebTraff toolkit provides three main functions:
  - Web workload trace generation
  - Web workload trace analysis
  - Web proxy cache simulation
- Graphs are displayed in PostScript format

WebTraff GUI Interface

Web Workload Generation

Web Workload Analysis
- Two main categories of analysis functions:
  - Time series analysis (on the left)
  - Web workload analysis (on the right)
- Radio buttons, sliders, and text boxes are available to control plotting characteristics

Requests per Interval (time series plot)

Popularity Distribution plot

Document Size Distribution (zoomed)

Web Proxy Cache Simulation
- Application-level caching simulation parameters:
  - Cache size
  - Cache replacement policy
- Five replacement policies are currently available:
  - Random replacement (RAND)
  - First-In-First-Out (FIFO)
  - Least-Recently-Used (LRU) (default setting)
  - Least-Frequently-Used (LFU)
  - Greedy-Dual-Size (GDS)
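The default LRU policy can be illustrated with a small trace-driven simulator. This is not WebTraff's code; it is a minimal sketch that assumes a trace of (url, size) pairs with fixed document sizes and a cache bounded by total bytes, and it reports both the document hit ratio and the byte hit ratio. The function name is hypothetical.

```python
from collections import OrderedDict

def simulate_lru(trace, capacity_bytes):
    """Replay (url, size) requests through an LRU cache bounded by total bytes.
    Returns (document hit ratio, byte hit ratio)."""
    cache = OrderedDict()          # url -> size, least recently used first
    used = hits = hit_bytes = total_bytes = 0
    for url, size in trace:
        total_bytes += size
        if url in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(url)                 # now most recently used
            continue
        if size > capacity_bytes:
            continue                               # too large to cache at all
        while used + size > capacity_bytes:
            _, evicted = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted
        cache[url] = size
        used += size
    return hits / len(trace), hit_bytes / total_bytes
```

Swapping the eviction rule (for example, popping a random entry or the least frequently used one) turns the same loop into the other policies listed above.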

For More Information about WebTraff
- WebTraff toolkit: "ProWGen: A Synthetic Workload Generation Tool for the Simulation Evaluation of Web Proxy Caches", Busari/Williamson, Computer Networks, Vol 38, No 6, June
- Contact information:

Summary
- Workload characterization provides information that is useful for making better decisions on:
  - Web site/application design
  - Content management
  - Protocol design
  - Capacity planning
  - Service pricing, etc.
- Workload characterization can be obtained through:
  - Gathering requirements for the analyses
  - Planning of data acquisition
  - Statistical analyses of the data
  - Mathematical modeling
- Tools exist for workload characterization
- The power-law and self-similarity characteristics of the load make the Web different from the good old telephony world: the same models and optimizations don't necessarily apply in both worlds

References
- Adya A, Bahl V, Qiu L. "Characterizing Web Workload of Mobile Clients". In Dixit S, Wu T (eds), "Content Networking in the Mobile Internet", Ch. 5, 2004.
- Adya A, Bahl V, Qiu L. "Characterizing Alert and Browse Services for Mobile Clients". USENIX Annual Technical Conference, 2002.
- Kramer G. "Self-similar Network Traffic", 2001.
- Fischer MJ, Fowler TB. "Fractals, Heavy-Tails, and the Internet", 2001.
- Markatchev N, Williamson C. "WebTraff: A GUI for Web Proxy Cache Workload Modelling and Analysis". Department of Computer Science, University of Calgary, 2002.