A Popularity-Based Prediction Model for Web Prefetching. By Ravi Shankar Dubasi and Sivani Kavuri.


What is Web Latency? What is Web Caching? How does Web Caching help in reducing Web Latency? What is Web Prefetching? How does Web Prefetching help in reducing Web Latency? Does Web Prefetching really decrease Web Latency?

Combining Caching and Prefetching. Performance Improvement. Why Prediction Models? What are Prediction Models? How aggressive is Prefetching? How aggressive can Prefetching be?

PPM (Prediction by Partial Match) Model. Slight variations of this model. Model proposed by Xin Chen and Xiaodong Zhang: POPULARITY BASED PREDICTION MODEL

Log files

Access Session: ENTER EXIT URL

3 Major Regularities:
Regularity 1: The majority of clients start their access sessions from popular URLs of a server. However, the majority of URLs on a server are not popular files.
Regularity 2: The majority of long access sessions are headed by popular URLs.
Regularity 3: The accessing paths in the majority of access sessions start from popular URLs, move to less popular URLs, and exit from the least popular URLs. The accessing paths in a minority of access sessions start from less popular URLs, remain among URLs of the same type, and exit from the least popular URLs.

Popularity of URLs
How to determine the popularity of URLs? How do we grade the URLs? How to determine Relative Popularity (RP)?
Grade 3: 10% < RP ≤ 100%
Grade 2: 1% < RP ≤ 10%
Grade 1: 0.1% < RP ≤ 1%
Grade 0: RP ≤ 0.1%
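The grading thresholds above can be sketched in a few lines of Python. The slide does not say how RP is computed, so the sketch assumes RP is a URL's access count expressed as a percentage of the most-accessed URL's count. The function names and the sample URLs and counts are hypothetical.

```python
def relative_popularity(accesses: int, max_accesses: int) -> float:
    """RP in percent. ASSUMPTION: measured against the most-accessed URL."""
    return 100.0 * accesses / max_accesses


def popularity_grade(rp_percent: float) -> int:
    """Map an RP value in percent to grade 0..3 using the slide's thresholds."""
    if rp_percent > 10.0:
        return 3
    if rp_percent > 1.0:
        return 2
    if rp_percent > 0.1:
        return 1
    return 0


# Hypothetical access counts taken from a server log.
counts = {"/index.html": 5000, "/news.html": 400, "/faq.html": 30, "/old.html": 2}
top = max(counts.values())
grades = {url: popularity_grade(relative_popularity(n, top)) for url, n in counts.items()}
```

Each boundary value falls into the lower grade, matching the "≤" in the grade table: an RP of exactly 10% is grade 2, not grade 3.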

Distribution of Popularity Grades. To examine the relationship between URL popularity and access sessions, each trace was divided into 4 session groups. Regularity 1 is observed. Observations: Paying special attention to popular URLs, which are only a small percentage of all URLs. Is this advantageous? Paying little attention to less popular URLs, which can be a large share. What about this?

Popularity and session length. Day 79 traces: 86% of access sessions started from popular URLs, moved to less popular URLs, and exited from the least popular URLs. Regularity 2 is observed. The average popularity grade decreases as the session length increases. Observations: Clients starting with less popular URLs tend to surf among URLs with the same popularity.

3 Prediction Models
1. Standard model
2. LRS model (longest repeating sequence)
3. Popularity-based model
(All models are evaluated here according to the 92-day evaluation period.)
(All models use the Markov tree representation.)

Standard Model. Node 0 represents the root of the forest. When a client accesses URL A, the model builds a new tree with root A. The counter is set to 1. The counter is incremented every time that URL is accessed in the session. The process continues until all the sessions have been processed. Every path from a root node to a leaf node represents the URL session of at least one client.

[Figure: STANDARD PPM tree. Root 0 has children A/2, B/2, C/2, A’/2, B’/2 and C’/2; each child heads the branch of URLs that followed it, e.g. A/2, B/2, C/2, A’/1, B’/1, C’/1.]
The three access sequences are: {ABCA’B’C’}, {ABC}, {A’B’C’}

Advantages and Disadvantages:
Easy to build (not complex).
Prediction accuracy improves.
More space required (increases with increase in prediction order) (determined by entropy analysis and empirical studies).
Attempts at space optimization:
The tree no longer resembles the regular surfing patterns.
Prediction accuracy is low (short tree).
A small increase in height rapidly increases storage requirements.

LRS Model. The LRS model keeps the longest repeating subsequences and stores only long branches with frequently accessed URLs. The server builds the tree the same way as in standard PPM. It scans each branch for non-repeating sequences, then identifies and eliminates them. The stored longest sequence is the frequently repeating sequence (at least one occurrence of one subsequence belongs to an independent access session).
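The "eliminate non-repeating sequences" step can be illustrated with a simplified sketch. It builds an unlimited-height tree from the three access sequences used in the slides, then drops every node seen fewer than twice. This covers only the "repeating" criterion; the full longest-repeating-subsequence selection is not reproduced here. The nested-dict tree layout, the function names and the `min_count` parameter are assumptions made for the illustration.

```python
def build_tree(sessions):
    """Build an unlimited-height PPM-style tree: every suffix of a session is a path."""
    root = {"count": 0, "children": {}}
    for session in sessions:
        for start in range(len(session)):
            node = root
            for url in session[start:]:
                node = node["children"].setdefault(url, {"count": 0, "children": {}})
                node["count"] += 1
    return root


def prune_non_repeating(node, min_count=2):
    """Drop every child (and its subtree) seen fewer than min_count times."""
    node["children"] = {
        url: child for url, child in node["children"].items()
        if child["count"] >= min_count
    }
    for child in node["children"].values():
        prune_non_repeating(child, min_count)


def branches(node, prefix=()):
    """List root-to-leaf paths as tuples of URLs."""
    if not node["children"]:
        return [prefix] if prefix else []
    paths = []
    for url, child in node["children"].items():
        paths.extend(branches(child, prefix + (url,)))
    return paths


sessions = [["A", "B", "C", "A'", "B'", "C'"], ["A", "B", "C"], ["A'", "B'", "C'"]]
tree = build_tree(sessions)
prune_non_repeating(tree)
kept = branches(tree)
```

With these sequences, the branch that continues from C into A' occurs only once and is removed, while A, B, C and A', B', C' survive because each occurs in two sessions.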

[Figure: LRS PPM tree for the same three access sequences, drawn on the node labels of the Standard PPM tree.]
The three access sequences are: {ABCA’B’C’}, {ABC}, {A’B’C’}

Advantages and Disadvantages:
The LRS PPM model offers lower storage requirements and higher prediction accuracy.
It has low hit rates: because the tree keeps only a small number of frequently accessed (popular) branches, it ignores prefetching for less frequently accessed (unpopular) URLs, so the overall prefetching rate can be low.
The process is expensive: to find the longest match, the server must have all previous URLs of the current session, so it must maintain sessions and update them.

Popularity Based Prediction Model
It uses only the most popular URLs as root nodes.
Each URL in a sequence is added only once to the tree unless its popularity grade is higher than that of the root node.
Maximum tree height is based on: available memory space; access session lengths.
Space optimization is applied to the completed tree based on: relative access probability (RAP); absolute number of accesses.
(RAP = number of accesses to the URL / number of accesses to the parent URL)
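A small sketch of the space-optimization step, using the RAP formula above. The slide does not state how the two criteria are combined or which cut-off values are used, so the sketch assumes a node is kept only if it passes both a RAP threshold and a minimum access count. The tree layout, the helper names and the example counts are hypothetical.

```python
def make(count, children=None):
    """Hypothetical node layout: an access counter plus child nodes keyed by URL."""
    return {"count": count, "children": children or {}}


def rap(child, parent):
    """Relative access probability: accesses to the URL / accesses to its parent."""
    return child["count"] / parent["count"]


def prune_by_rap(node, rap_threshold, min_count):
    """Keep a child only if it passes BOTH cut-offs (ASSUMPTION), then recurse."""
    kept = {}
    for url, child in node["children"].items():
        if rap(child, node) >= rap_threshold and child["count"] >= min_count:
            prune_by_rap(child, rap_threshold, min_count)
            kept[url] = child
    node["children"] = kept


# Tree rooted at a popular URL A with 10 accesses (made-up numbers).
tree_a = make(10, {"B": make(6, {"D": make(3), "E": make(1)}), "C": make(1)})
prune_by_rap(tree_a, rap_threshold=0.3, min_count=2)
```

Here C (RAP 0.1) and E (RAP 1/6) are removed, while B (RAP 0.6) and D (RAP 0.5) are kept, so the tree shrinks from five nodes to three.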

[Figure: Popularity-based PPM tree. Nodes: root 0, A/2, B/2, C/2, A’/1, A’/2, B’/2, C’/2.]
The three access sequences are: {ABCA’B’C’}, {ABC}, {A’B’C’}

Advantages and Disadvantages:
Space optimization (since there are fewer nodes).
High prediction accuracy (since it includes access information).
For higher thresholds, the hit ratio decreases (since the dominance of unpopular files increases).

OBSERVATIONS
Models compared: the Standard PPM model without limiting branch height; the LRS PPM model keeping the longest repeating subsequence; the Popularity-based PPM model with space optimization.
1) In the Standard PPM model, not limiting the height of each branch increases prediction accuracy.
2) In the LRS PPM model, keeping the longest repeating sequence (i.e., removing independent access sessions) saves space.
3) In the Popularity-based PPM model, space optimization that considers relative access probability preserves prediction accuracy.

Integrating the prediction model with prefetching and caching. Cache memory is divided into 2 parts: prefetch buffer and cache memory. Prefetching manager. Cache manager. PREDICTION ENGINE: Constructs and updates the prediction model (based on requests issued). Offers predictions independently to each client.

Integrated Web Caching and Prefetching Model

PREDICTION ALGORITHM
current_context[0] := root node of T;
for length j = 1 to m
    current_context[j] := NULL;
for every event R in S
    for length j = 0 to m
    {
        if current_context[j] has child node C representing event R
        {
            node C occurrence_count := occurrence_count + 1;
            current_context[j+1] := node C;
        }
        else
        {
            construct child node C representing event R;
            node C occurrence_count := 1;
            current_context[j+1] := node C;
        }
        current_context[0] := root node of T;
    }
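The tree-construction pseudocode above can be made runnable roughly as follows. This sketch makes three adjustments that are not in the slide: contexts are reset at the start of every session, empty (NULL) contexts are skipped, and contexts are visited from longest to shortest so that each slot is read before it is overwritten. The `Node` class and the function names are hypothetical.

```python
class Node:
    """One URL in the Markov tree, with its occurrence counter."""

    def __init__(self, url=None):
        self.url = url
        self.count = 0
        self.children = {}


def build_ppm_tree(sessions, max_order):
    """Build a PPM tree whose branches hold at most max_order URLs."""
    root = Node()
    for session in sessions:
        # contexts[j] is the node reached by the last j events of this session.
        contexts = [root] + [None] * max_order
        for url in session:
            # Longest context first, so contexts[j] is read before contexts[j + 1] is written.
            for j in range(max_order - 1, -1, -1):
                parent = contexts[j]
                if parent is None:
                    continue
                child = parent.children.get(url)
                if child is None:
                    child = parent.children[url] = Node(url)
                child.count += 1
                contexts[j + 1] = child
    return root


def count_at(root, path):
    """Occurrence count of the node reached by following path from the root."""
    node = root
    for url in path:
        node = node.children[url]
    return node.count


def count_nodes(node):
    """Number of nodes below (and excluding) the given node."""
    return sum(1 + count_nodes(child) for child in node.children.values())


sessions = [["A", "B", "C", "A'", "B'", "C'"], ["A", "B", "C"], ["A'", "B'", "C'"]]
root = build_ppm_tree(sessions, max_order=6)
```

For the three access sequences used in the slides, the root's children A, B, C, A', B' and C' each end up with a count of 2, which agrees with the labels in the Standard PPM figure. Lowering `max_order` caps the branch height and with it the node count.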

PREFETCHING ALGORITHM
LET S be the set of all objects currently in the prefetch buffer;
LET P = Ø; // P is the set of objects to be prefetched
LET TotalSize = 0; // the total size of all objects in P
LET j = 0;
WHILE (j ≤ n) AND (TotalSize < SIZEOF(prefetch buffer))
    IF (O(j) not in cache) AND (O(j) not in prefetch buffer) THEN
        Put O(j) into P;
        LET TotalSize = TotalSize + SIZEOF(O(j));
    END IF
    j = j + 1;
END WHILE
LET M = S.P;
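The selection loop of the prefetching algorithm could look like this in Python. It assumes that O(0)..O(n) are the predicted objects in the order given by the prediction engine and that object sizes are known in advance. The function name, parameter names and sample data are hypothetical. The final step that combines S and P is left out because the slide does not make that operation clear.

```python
def select_prefetch(predicted, sizes, cache, prefetch_buffer, buffer_capacity):
    """Pick predicted objects that are in neither the cache nor the prefetch buffer."""
    to_prefetch = []  # P in the pseudocode
    total_size = 0    # TotalSize in the pseudocode
    j = 0
    while j < len(predicted) and total_size < buffer_capacity:
        obj = predicted[j]
        if obj not in cache and obj not in prefetch_buffer:
            to_prefetch.append(obj)
            total_size += sizes[obj]
        j += 1  # advance whether or not the object was selected
    return to_prefetch, total_size


# Made-up example: object "b" is already cached, so it is skipped.
predicted = ["a", "b", "c", "d"]
sizes = {"a": 40, "b": 30, "c": 50, "d": 10}
selected, total = select_prefetch(predicted, sizes, cache={"b"},
                                  prefetch_buffer=set(), buffer_capacity=80)
```

As in the pseudocode, the size check happens before an object is added, so the selected total can overshoot the buffer size by up to one object (90 against a capacity of 80 in this example). A stricter variant would test the size before appending.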

Simulation Parameters 1. Order of Prediction 2. Confidence 3. Previous requests 4. Number of predictions 5. Browsing session idle time 6. Client cache size 7. Client cache idle time

Performance Metrics
1. Usefulness of predictions (hit ratio)
2. Accuracy of predictions
3. Network traffic
4. Space optimization
(The model aims at maximizing the first two metrics and minimizing the last two.)
The maximum size of prefetched files affects both hit ratio and network traffic. Large values »» more traffic »» high hit ratio

Hit Ratio: Ratio between the no. of requests that hit the browser or cache and the total no. of requests.
Latency Reduction: Average access latency time reduction per request.
Space: Required memory allocation, measured by the no. of nodes for building a PPM model in the web server for prefetching.
Traffic Increment: Ratio between the total no. of transferred bytes and the total no. of useful bytes for the clients, minus 1.
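The metric definitions above translate directly into small helper functions. The function and parameter names below are hypothetical, and the latency helper assumes "reduction" means the difference between total latency without and with prefetching, averaged over all requests.

```python
def hit_ratio(hits: int, total_requests: int) -> float:
    """Requests served from the browser or proxy cache / all requests."""
    return hits / total_requests


def traffic_increment(transferred_bytes: int, useful_bytes: int) -> float:
    """Transferred bytes / bytes the clients actually used, minus 1."""
    return transferred_bytes / useful_bytes - 1


def latency_reduction(total_latency_without: float, total_latency_with: float,
                      total_requests: int) -> float:
    """ASSUMPTION: average latency saved per request."""
    return (total_latency_without - total_latency_with) / total_requests


# Made-up numbers for illustration only.
example_hit_ratio = hit_ratio(30, 120)
example_traffic = traffic_increment(1500, 1000)
example_latency = latency_reduction(600.0, 450.0, 100)
```

A traffic increment of 0 means every transferred byte was useful; a value of 0.5 means half as many bytes again were transferred as the clients needed.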

Hit ratio vs threshold

Traffic Increment Vs Threshold

Number of Nodes Vs Number of Clients

CONCLUSIONS Effective web management approach. Makes searching and prefetching highly objective and highly efficient. Web prefetching can have both high prediction accuracy and a low space requirement.

FUTURE WORK To make the model more flexible. To find more elaborate ways of making predictions. Filtering out the effect of backward references. Extending prediction engine to accommodate more predictions.