Effective Prediction of Web-user Accesses: A Data Mining Approach

1 Effective Prediction of Web-user Accesses: A Data Mining Approach
Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos (Aristotle University of Thessaloniki, Greece)
Presentation: Spyros Papadimitriou, Carnegie Mellon University

2 Introduction (1/2)
Web prefetching: deducing forthcoming user accesses from log information.
Focus on: predictive prefetching (based on access history) and server-initiated prefetching (the server makes predictions and piggybacks them on its responses to the clients).

3 Introduction (2/2)
Within a site, users navigate by following links [5].
For server-initiated predictive prefetching, the interest lies in access patterns that reflect this behavior.

4 Outline
Motivation & related work
Proposed method
Comparative performance evaluation
Conclusions

6 Requirements
Site structure and contents impose two factors: (1) the order of dependencies (first or higher order) among the documents; (2) the interleaving of documents that belong to patterns with random visits (noise).
Discovered patterns should respect these factors.

7 Related work
Dependency graph (DG) [9]: a graph maintains pairwise accesses.
Prediction by Partial Match (PPM) [10]: a trie maintains sequences of consecutive accesses.
LBOT [6]: a special form of association rules of length 2.
Others: variations of the above [3,11].
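To make the contrast between DG and PPM on the related-work slide concrete, here is a minimal Python sketch of how each might count accesses. The lookahead window, the confidence threshold, the function names, and the toy sessions are all assumptions for illustration (the slide only says that DG keeps pairwise accesses and PPM a trie of consecutive accesses); a dict keyed by context tuples stands in for the PPM trie.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Session = Sequence[str]

def build_dg(sessions: Sequence[Session], window: int = 2) -> Tuple[Counter, Counter]:
    """DG sketch (hypothetical): arc (a, b) counts how often b appears within
    `window` requests after an occurrence of a in the same session."""
    node_count: Counter = Counter()
    arc_count: Counter = Counter()
    for s in sessions:
        for i, a in enumerate(s):
            node_count[a] += 1
            # count each distinct follower once per occurrence of a
            for b in set(s[i + 1:i + 1 + window]):
                arc_count[(a, b)] += 1
    return node_count, arc_count

def dg_predict(node_count: Counter, arc_count: Counter, page: str, threshold: float) -> List[str]:
    """Predict every page b whose estimated P(b follows page) >= threshold."""
    if node_count[page] == 0:
        return []
    return sorted(b for (a, b), c in arc_count.items()
                  if a == page and c / node_count[page] >= threshold)

def build_ppm(sessions: Sequence[Session], max_order: int = 2) -> Dict[Tuple[str, ...], Counter]:
    """PPM-style sketch (hypothetical): for each contiguous context of length
    1..max_order, count which page immediately follows it."""
    table: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for s in sessions:
        for i in range(1, len(s)):
            for k in range(1, min(i, max_order) + 1):
                table[tuple(s[i - k:i])][s[i]] += 1
    return table

sessions = [["A", "X", "B"], ["A", "B"], ["A", "Y", "B"]]
nodes, arcs = build_dg(sessions, window=2)
print(dg_predict(nodes, arcs, "A", threshold=0.5))    # ['B']
print(dict(build_ppm(sessions, max_order=1)[("A",)]))  # {'X': 1, 'B': 1, 'Y': 1}
```

On these toy sessions the DG links A to B in all three sessions despite the noise pages X and Y, whereas the order-1 PPM context ⟨A⟩ sees B as the immediate successor only once. Conversely, the DG keeps only pairs, so it cannot express a dependency on a longer prefix.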

8 Motivation
Method     Order (1st req.)   Noise (2nd req.)
DG         No                 Yes
PPM
LBOT
Proposed   Yes                Yes

10 Proposed Method (1)
A novel, Apriori-like Web log mining algorithm (WMo).
Effective: immune to noise; considers high-order dependencies.
Efficient: significantly reduces the number of candidates.

11 Proposed Method (2)
Session (or transaction): a sequence of requests that occur within a specified time interval of each other [2].
The containment relationship addresses the 2nd requirement (noise). Example: T = ⟨A, X, B, Y, C⟩, where X and Y are noise, and S = ⟨A, B, C⟩; the pattern S is contained in T.
Comment: if support counted only contiguous subsequences, the pattern S would be missed.
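Both notions on this slide can be sketched in a few lines of Python. This is a minimal sketch assuming each log entry is a (timestamp in seconds, URL) pair for a single user, and a 30-minute gap (max_gap=1800) as the session cut-off; the slide only says "a specified time interval", so that value and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

def split_sessions(requests: Sequence[Tuple[float, str]], max_gap: float) -> List[List[str]]:
    """Split one user's time-ordered (timestamp, url) requests into sessions;
    a new session starts when the gap since the previous request exceeds max_gap."""
    sessions: List[List[str]] = []
    last: Optional[float] = None
    for t, url in requests:
        if last is None or t - last > max_gap:
            sessions.append([])
        sessions[-1].append(url)
        last = t
    return sessions

def contains(transaction: Sequence[str], pattern: Sequence[str]) -> bool:
    """Subsequence containment: pattern items occur in order, gaps (noise) allowed."""
    it = iter(transaction)
    return all(p in it for p in pattern)  # `in` consumes the iterator up to the match

def contains_contiguous(transaction: Sequence[str], pattern: Sequence[str]) -> bool:
    """Contiguous containment: the pattern must appear as an unbroken run."""
    n = len(pattern)
    return any(list(transaction[i:i + n]) == list(pattern)
               for i in range(len(transaction) - n + 1))

log = [(0, "A"), (20, "X"), (45, "B"), (70, "Y"), (90, "C"), (4000, "A")]
T = split_sessions(log, max_gap=1800)[0]           # ['A', 'X', 'B', 'Y', 'C']
S = ["A", "B", "C"]
print(contains(T, S), contains_contiguous(T, S))  # True False
```

Because `contains` consumes a single iterator, it also respects order: ⟨B, A⟩ is not contained in ⟨A, B⟩.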

12 Proposed Method (3)
Candidate generation respects the ordering of accesses in transactions, e.g., ⟨A, B⟩ ≠ ⟨B, A⟩.
This causes a dramatic increase in the number of candidates, so WMo exploits the site structure for pruning [7,8].
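To gauge the size of this effect, the sketch below exhaustively counts length-k candidates over a hypothetical four-page site: unordered itemsets, ordered sequences, and ordered sequences that are also paths in the site graph. Treating "exploits the site structure" as "candidates must follow links of the site graph" is our reading, and pages are taken as distinct for simplicity; enumeration like this is for illustration only, not how a miner would proceed.

```python
from itertools import combinations, permutations
from typing import Dict, Sequence, Set, Tuple

Graph = Dict[str, Set[str]]  # page -> set of pages it links to (assumed representation)

def candidate_counts(pages: Sequence[str], graph: Graph, k: int) -> Tuple[int, int, int]:
    """Return (unordered, ordered, ordered-and-on-a-graph-path) counts of
    length-k candidates over distinct pages. Exhaustive; illustration only."""
    unordered = sum(1 for _ in combinations(pages, k))
    ordered = sum(1 for _ in permutations(pages, k))
    on_paths = sum(1 for p in permutations(pages, k)
                   if all(p[i + 1] in graph.get(p[i], set()) for i in range(k - 1)))
    return unordered, ordered, on_paths

site = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}}
print(candidate_counts("ABCD", site, 2))  # (6, 12, 4)
print(candidate_counts("ABCD", site, 3))  # (4, 24, 3)
```

Ordering multiplies the candidate count by k! relative to itemsets, while the link structure cuts it back to the handful of sequences a user could actually traverse.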

13 Proposed Method (4)
Algorithm genCandidates(Lk, G)
// Lk: the set of large k-paths; G: the site graph
begin
  foreach L = ⟨l1, …, lk⟩ ∈ Lk {
    N+(lk) = {v | ∃ arc lk → v ∈ G}
    foreach v ∈ N+(lk) {
      // apply modified Apriori pruning
      if v ∉ L and L′ = ⟨l2, …, lk, v⟩ ∈ Lk {
        C = ⟨l1, …, lk, v⟩
        if (∀ S ⊂ C, S ≠ L′ ⇒ S ∈ Lk)
          insert C in the candidate-trie
      }
    }
  }
end
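The genCandidates procedure on this slide can be rendered as a minimal Python sketch, wrapped in an Apriori-like level-wise loop so it can be run end to end. Several details are assumptions on our part rather than statements from the slides: the site graph is a dict of successor sets, paths are tuples, a Python set replaces the candidate-trie, support is a fraction counted by a plain scan with subsequence containment, and the test ∀ S ⊂ C ranges over the sequences obtained by deleting one node of C that are still paths in G. The names `gen_candidates` and `wmo` are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Path = Tuple[str, ...]
Graph = Dict[str, Set[str]]

def is_path(seq: Sequence[str], graph: Graph) -> bool:
    """True if every consecutive pair of seq is an arc of the site graph."""
    return all(b in graph.get(a, set()) for a, b in zip(seq, seq[1:]))

def contains(transaction: Sequence[str], pattern: Sequence[str]) -> bool:
    """Subsequence containment: pattern in order, noise allowed in between."""
    it = iter(transaction)
    return all(p in it for p in pattern)

def gen_candidates(large_k: Set[Path], graph: Graph) -> Set[Path]:
    candidates: Set[Path] = set()
    for L in large_k:
        for v in graph.get(L[-1], set()):          # v in N+(l_k)
            L_prime = L[1:] + (v,)
            if v in L or L_prime not in large_k:
                continue
            C = L + (v,)
            # modified Apriori pruning (ASSUMPTION: S = C minus one node, kept
            # only if S is still a path in G; L' was already checked above)
            subs = (C[:i] + C[i + 1:] for i in range(len(C)))
            if all(S in large_k for S in subs if S != L_prime and is_path(S, graph)):
                candidates.add(C)                  # a set stands in for the trie
    return candidates

def wmo(transactions: List[List[str]], graph: Graph, min_support: float) -> Set[Path]:
    """Apriori-like level-wise loop: count, keep large paths, generate next level."""
    def support(p: Path) -> float:
        return sum(contains(t, p) for t in transactions) / len(transactions)

    pages = {page for t in transactions for page in t}
    level = {(p,) for p in pages if support((p,)) >= min_support}
    result = set(level)
    while level:
        level = {c for c in gen_candidates(level, graph) if support(c) >= min_support}
        result |= level
    return result

site = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}}
logs = [["A", "X", "B", "Y", "C"], ["A", "B", "C"], ["A", "B", "Z", "C"], ["D"]]
print(sorted(wmo(logs, site, min_support=0.5), key=lambda p: (len(p), p)))
# [('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
```

With min_support = 0.5, ⟨A, B, C⟩ is found even though only one of the four sessions contains it contiguously, and ⟨C, D⟩ is never generated because ⟨D⟩ is not large.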

14 Discussion
Sequential patterns [1]: applicable when each "customer sequence" is taken to be a "user session", but they suffer from a large number of candidates because they do not consider the site structure.
Path fragments [4]: containment is expressed with regular expressions and the "*" label; the focus is on semantics (recommendation systems).
In prefetching, patterns are for system rather than human consumption, so WMo focuses on efficiency and effectiveness rather than on expressiveness (semantics).

16 Methodology
Synthetic data: a sample site with 1000 nodes; the synthetic data generator (see the paper) models site nodes, site linkage, and document sizes.
Real data sets (see the paper).
We examine the impact of noise, order, and the client cache (see the paper), as well as efficiency.

17 Accuracy w.r.t. noise

18 Usefulness w.r.t. noise

19 Traffic w.r.t. noise

20 Accuracy w.r.t. order

21 Usefulness w.r.t. order

22 Traffic w.r.t. order
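Slides 17-22 report three measures whose formulas are given in the paper rather than on the slides. The sketch below uses common precision/recall-style definitions as an assumption: accuracy as the fraction of prefetched documents that were later requested, usefulness as the fraction of requested documents that had been prefetched, and traffic as bytes transferred with prefetching divided by bytes without it. The paper's exact formulas, and how it accounts for the client cache, may differ; the function name and data are hypothetical.

```python
from typing import Dict, Set, Tuple

def prefetch_metrics(prefetched: Set[str], requested: Set[str],
                     size: Dict[str, int]) -> Tuple[float, float, float]:
    """Return (accuracy, usefulness, traffic) for one evaluation run (assumed definitions).
    accuracy   = |prefetched & requested| / |prefetched|
    usefulness = |prefetched & requested| / |requested|
    traffic    = (bytes of requested docs + bytes of prefetched-but-unused docs)
                 / bytes of requested docs"""
    hits = prefetched & requested
    accuracy = len(hits) / len(prefetched) if prefetched else 0.0
    usefulness = len(hits) / len(requested) if requested else 0.0
    base = sum(size[d] for d in requested)
    wasted = sum(size[d] for d in prefetched - requested)
    traffic = (base + wasted) / base if base else 1.0
    return accuracy, usefulness, traffic

sizes = {"A": 10, "B": 10, "C": 10, "D": 10, "E": 20}
print(prefetch_metrics({"B", "C", "E"}, {"A", "B", "C", "D"}, sizes))
# (0.6666666666666666, 0.5, 1.5)
```

The three measures pull against each other: prefetching more documents tends to raise usefulness while lowering accuracy and increasing traffic.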

23 Efficiency (see also [7,8])

25 Conclusions
Factors that influence Web prefetching: noise and order.
A new algorithm, WMo, based on data mining was presented.
WMo compares favorably with previously proposed algorithms.
WMo is an effective and efficient Web prefetching algorithm.

26 References
[1] R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
[2] R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
[3] M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
[4] W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
[5] B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, 1998.
[6] B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
[7] A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
[8] A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), 2001.
[9] V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996.
[10] T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
[11] J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS 1999.
[12] L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.

