1 Analyzing Browse Patterns of Mobile Clients Lili Qiu Joint work with Atul Adya and Victor Bahl Microsoft Research ACM SIGCOMM Measurement Workshop San Francisco, CA, November 2001
2 Outline Overview. Related work. Analysis of a popular mobile Web site: document popularity analysis, user behavior analysis, system load analysis, content analysis. Summary and implications.
3 Motivation Phenomenal growth in the cellular industry and handheld devices. Crucial to understand the performance of the wireless Web. Limited understanding of how wireless Web services are being used.
4 Related Work Workload of clients on wireline networks: server-based studies [ABC+96], [AW96], [MS97], [AJ99], [PQ00]; proxy-based studies [BCF+99], [DMF97], [GB97], [VDA+99], [WVS+99]; client-based studies [CBC95] and [BBB+98]. Workload of wireless clients: [KBZ+2000], only 80K requests over seven months.
5 Overview A popular mobile Web site. Content: news, weather, stock quotes, yellow pages, travel reservations, entertainment, etc. Period studied: August 15, 2000 – August 26, 2000; about 33 million accesses in 12 days. Type of analyses: this paper is part of a larger analysis study covering analysis of browse patterns, analysis of notification logs, and the correlation between how browsing and notification services are being used.
6 Overview: Types of Analysis Document popularity analysis; user behavior analysis; system load analysis; content analysis.
7 Overview: User Categories
Cellular users: browse the Web in real time on cellular phones.
Offline users: download content onto their PDAs for later (offline) browsing, e.g. AvantGo.
Desktop users: sign up for services and specify preferences.
Many more users now.

User Type | # Users | # Requests
Cellular | 58,432 | 2,210,758
Offline | 50,968 | 20,508,272
Desktop | 639,971 | 7,342,206
Misc. | 1,634 | 2,944,708
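As a rough check on how the load is split across user categories, the short sketch below recomputes each category's share of requests and its requests per user from the figures in the table above. The variable and function names are illustrative only and do not come from the talk.

```python
# User counts and request counts as read from the user-category table.
usage = {
    "Cellular": (58_432, 2_210_758),
    "Offline": (50_968, 20_508_272),
    "Desktop": (639_971, 7_342_206),
    "Misc.": (1_634, 2_944_708),
}

total_requests = sum(requests for _, requests in usage.values())


def request_share(user_type):
    """Fraction of all logged requests issued by one user category."""
    return usage[user_type][1] / total_requests


def requests_per_user(user_type):
    """Average number of requests per user in one category."""
    users, requests = usage[user_type]
    return requests / users


for name in usage:
    print(f"{name:9s} share={request_share(name):6.1%} "
          f"per-user={requests_per_user(name):9.1f}")
```

Under this reading of the table, offline users issue a little over 60% of all requests and cellular users a little under 7%, even though desktop users are by far the largest group.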
8 Document Popularity Previous Web research has found that Web accesses follow a Zipf-like distribution (i.e. the request frequency of the i-th most popular document is proportional to 1/i^α). Two definitions of document: URL alone, or URL plus parameters (i.e. query).
9 Document Popularity (Cont.) Document popularity does not closely follow a Zipf-like distribution.
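One common way to test for Zipf-like behavior is to plot request count against popularity rank on log-log axes and fit a straight line; the negated slope estimates the exponent α. The sketch below does this with ordinary least squares. It is a generic illustration, not the authors' procedure, and `fit_zipf_alpha` and the shape of `request_log` (a list with one document identifier per request) are assumptions.

```python
import math
from collections import Counter


def fit_zipf_alpha(request_log):
    """Estimate alpha in frequency ~ 1/rank**alpha by least squares on log-log data.

    request_log is assumed to be an iterable with one document ID per request.
    """
    counts = sorted(Counter(request_log).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct documents")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is negative for a decaying popularity curve.
    return -(sxy / sxx)
```

If the points deviate strongly from the fitted line, as the slide reports for this site, a single α does not describe the workload well.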
10 Document Popularity (Cont.) The majority of requests are concentrated on a small number of documents: 0.1% - 0.5% of URL and parameter combinations (i.e. 112 – 442) account for 90% of requests. Only a very small amount of memory is needed to cache popular query results.
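The concentration figure can be measured directly from a request log by sorting documents by popularity and counting how many are needed to reach a target share of requests. The following is a minimal sketch; `documents_for_coverage` and the log format (one document or query identifier per request) are assumptions made for illustration.

```python
from collections import Counter


def documents_for_coverage(request_log, target=0.90):
    """Return (n_docs, fraction_of_docs) needed to cover `target` of all requests."""
    counts = sorted(Counter(request_log).values(), reverse=True)
    if not counts:
        raise ValueError("request log is empty")
    total = sum(counts)
    covered = 0
    # Add documents from most to least popular until the target is reached.
    for n_docs, count in enumerate(counts, start=1):
        covered += count
        if covered >= target * total:
            return n_docs, n_docs / len(counts)
    return len(counts), 1.0
```

The first returned value is the number of entries a cache would need to hold to serve that share of requests, assuming popularity stays stable over time.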
11 User Behavior Analysis Goal: understand how long a wireless user stays on the channel while browsing the Web. Determine user sessions. Intuition: if a session is idle for a sufficiently long time, we say it has ended. A heuristic is used to determine the session inactivity period.
12 User Behavior Analysis (Cont.) Determine the session inactivity period (s): too small s => too many sessions; too large s => too few sessions. An appropriate value is at the knee point, which is between 30 and 45 seconds. 95% of users have session times of less than 3 minutes and initiated fewer than 35 sessions during the 12 days. We can reclaim IP addresses more quickly than the 90 seconds used previously in [KBZ+2000].
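The inactivity heuristic can be sketched as follows: a gap between two consecutive requests from the same user that exceeds s seconds starts a new session, and the total session count is recomputed for a range of s values so the knee of the curve can be located. The function names and the input format (a mapping from user ID to request timestamps in seconds) are assumptions, not details taken from the talk.

```python
def count_sessions(timestamps_by_user, inactivity_s):
    """Count sessions when a gap longer than inactivity_s seconds ends a session."""
    sessions = 0
    for timestamps in timestamps_by_user.values():
        previous = None
        for t in sorted(timestamps):
            # The first request, or a request after a long idle gap, opens a session.
            if previous is None or t - previous > inactivity_s:
                sessions += 1
            previous = t
    return sessions


def sessions_vs_threshold(timestamps_by_user, thresholds):
    """Session count for each candidate inactivity period, for locating the knee."""
    return [(s, count_sessions(timestamps_by_user, s)) for s in thresholds]
```

Plotting the output of `sessions_vs_threshold` shows the session count falling steeply for small s and flattening for large s; the knee is where the curve bends.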
13 System Load Analysis Goal: understand how to optimize the Web server for better performance. Small replies: 98% of replies to wireless users are < 3 KB; 99% of replies to offline users are < 6.3 KB. Diurnal pattern and weekday vs. weekend variation. Over 60% of browsing requests are from offline PDA users, and less than 7% are from wireless users. Implications: 1) Highly optimize sending small replies. 2) Identify what type of user issued the request, and prioritize the request according to the user type.
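Prioritizing by user type could be sketched with a priority queue keyed on the category of the requester. The ordering used here (interactive cellular users first, offline sync traffic last) is an assumption chosen for illustration; the talk only says requests should be prioritized by user type. `RequestQueue` and `PRIORITY` are hypothetical names.

```python
import heapq
import itertools

# Hypothetical priority order (lower value is served first): interactive
# cellular users ahead of desktop users, with offline sync traffic last.
PRIORITY = {"cellular": 0, "desktop": 1, "offline": 2}


class RequestQueue:
    """Serve pending requests by user-type priority, FIFO within a type."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, user_type, request):
        # Unknown user types are placed after all known ones.
        priority = PRIORITY.get(user_type, max(PRIORITY.values()) + 1)
        heapq.heappush(self._heap, (priority, next(self._order), request))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A strict priority order like this can starve low-priority traffic under sustained load, so a real server would likely combine it with some form of aging or per-class quotas.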
14 Content Analysis
Top three preferences for different kinds of users:

User Type | Rank #1 | Rank #2 | Rank #3
Wireless | Stock quotes | News | Yellow pages
Offline | Help | News | Stock quotes
Desktop | Sign-ups | | Sports

Important to content providers: what content is interesting to users.
15 Summary of Results and Implications
Facts | Implications
0.1% - 0.5% of queries (i.e. 112 – 442) account for 90% of requests. | Caching the results of popular queries can be very effective.
A large fraction of requests come from automated sync programs. | System designers should prioritize requests according to user type.
16 Summary of Results and Implications
Facts | Implications
Most of the replies are short (< 3KB for wireless users, and < 6KB for offline users). | Wireless Web servers should highly optimize sending short replies.
The session inactivity period is between 30 and 45 seconds. | We may reclaim IP addresses more quickly than the 90 seconds used previously.