
Effective Prediction of Web-user Accesses: A Data Mining Approach
Alexandros Nanopoulos, Dimitrios Katsaros
WebKDD 2001, Aristotle University of Thessaloniki


1 Effective Prediction of Web-user Accesses: A Data Mining Approach
Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos
Aristotle Univ. of Thessaloniki, Greece
Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.

2 Introduction (1/2)
Web Prefetching: deducing forthcoming user accesses based on log information
Focus on:
– Predictive prefetching (use of history)
– Server-initiated prefetching (the server makes predictions and piggybacks them to the clients)

3 Introduction (2/2)
Within a site, users navigate by following links [5]
For server-initiated predictive prefetching, the interest is in access patterns that reflect this behavior

4 Outline
Motivation & Related work
Proposed method
Comparative performance evaluation
Conclusions

5 Presentation Outline
Motivation & Related work
Proposed method
Comparative performance evaluation
Conclusions

6 Requirements
Site structure and contents impose:
1. The order of dependencies (first or higher) among the documents
2. The interleaving of documents belonging to patterns with random visits (noise)
Discovered patterns should respect these factors

7 Related work
Dependency graph (DG) [9]
– A graph maintains pairwise accesses
Prediction by Partial Match (PPM) [10]
– A trie maintains sequences of consecutive accesses
LBOT [6]
– Special form of association rules of length 2
Others (variations of the above) [3,11]

8 Motivation
Method      Order (1st Req.)    Noise (2nd Req.)
DG          No                  Yes
PPM         Yes                 No
LBOT        No
Proposed    Yes                 Yes

9 Presentation Outline
Motivation & Related work
Proposed method
Comparative performance evaluation
Conclusions

10 Proposed Method (1)
Novel Web log mining algorithm (WMo)
– Apriori-like
– Effective: immune to noise; considers high-order dependencies
– Efficient: significant reduction in the number of candidates

11 Proposed Method (2)
Session (or transaction): a sequence of requests that occur within a specified time interval from each other [2]
The containment relationship addresses the 2nd requirement (tolerating noise)
Example:
T = <A, X, B, Y, C>, where X and Y are noise
S = <A, B, C>
The pattern S is contained in T
Comment: if only contiguous subsequences were counted, T would not contribute to the support of S, and the pattern S would be missed.
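The containment example above can be illustrated with a minimal Python sketch. It assumes that "contained" means an ordered, not necessarily contiguous, subsequence; the function names `contains_in_order` and `contains_contiguous` are hypothetical and do not come from the slides or the paper.

```python
from typing import Sequence


def contains_in_order(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `session` in the same order, gaps allowed."""
    remaining = iter(session)
    # `page in remaining` consumes the iterator up to the first match,
    # so later pattern elements can only match later session elements.
    return all(page in remaining for page in pattern)


def contains_contiguous(session: Sequence[str], pattern: Sequence[str]) -> bool:
    """True only if `pattern` occurs in `session` as one unbroken run."""
    n, m = len(session), len(pattern)
    return any(list(session[i:i + m]) == list(pattern) for i in range(n - m + 1))


T = ["A", "X", "B", "Y", "C"]  # session with noise X, Y (from the slide)
S = ["A", "B", "C"]            # pattern (from the slide)

print(contains_in_order(T, S))    # True: noise is skipped
print(contains_contiguous(T, S))  # False: X and Y break the run
```

Under the contiguous definition, the session T adds nothing to the support of S, which is the situation the slide's comment warns about.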

12 Proposed Method (3)
Candidate generation respects the ordering of accesses in transactions
Example: <A,B> ≠ <B,A>
Dramatic increase in the number of candidates
Exploits the site structure for pruning [7,8]

13 Proposed Method (4)
Algorithm genCandidates(L_k, G)
// L_k: the set of large k-paths; G: the site graph
begin
  foreach L = <l_1, …, l_k>, L ∈ L_k {
    N+(l_k) = {v | ∃ arc l_k → v ∈ G}
    foreach v ∈ N+(l_k) {
      // apply modified apriori pruning
      if v ∉ L and L' = <l_2, …, l_k, v> ∈ L_k {
        C = <l_1, …, l_k, v>
        if (∀ S ⊂ C, S ≠ L' ⇒ S ∈ L_k)
          insert C in the candidate-trie
      }
    }
  }
end
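A rough Python sketch of the candidate-generation step may make the pseudocode easier to follow. It is an illustration under stated assumptions, not the authors' implementation: paths are tuples of page names, the site graph is a dictionary of successor sets, a plain set stands in for the candidate trie, and the pruning test is read as "every k-length subsequence of C that is itself a path in G must be large". The names `gen_candidates` and `is_path` are hypothetical.

```python
from typing import Dict, Set, Tuple

Path = Tuple[str, ...]
Graph = Dict[str, Set[str]]


def is_path(seq: Path, graph: Graph) -> bool:
    """True if every consecutive pair in `seq` is an arc of the site graph."""
    return all(b in graph.get(a, set()) for a, b in zip(seq, seq[1:]))


def gen_candidates(large_k: Set[Path], graph: Graph) -> Set[Path]:
    """Extend each large k-path by one out-neighbour of its last node."""
    candidates: Set[Path] = set()
    for path in large_k:
        for v in graph.get(path[-1], set()):   # N+(l_k)
            if v in path:                       # v must not already be in L
                continue
            if path[1:] + (v,) not in large_k:  # L' must be large
                continue
            cand = path + (v,)
            # Assumed reading of the pruning rule: drop one element at a
            # time and require each result that is a valid path to be large.
            subs = (cand[:i] + cand[i + 1:] for i in range(len(cand)))
            if all(s in large_k for s in subs if is_path(s, graph)):
                candidates.add(cand)
    return candidates


site = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
large_2 = {("A", "B"), ("B", "C"), ("A", "C")}
print(gen_candidates(large_2, site))
```

Because extensions only follow arcs of the site graph, sequences such as <B,A> are never generated when no arc B → A exists, which is where the reduction in candidates comes from.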

14 Discussion
Sequential patterns [1]
– Reduction when “customer-sequence” = “user-session”
– Suffers from a large number of candidates (by not considering the site structure)
Path Fragments [4] (the containment relationship is expressed with regular expressions and the “*” label)
– Focus on semantics (recommendation systems)
Prefetching: patterns are for system and not for human consumption
WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)

15 Presentation Outline
Motivation & Related work
Proposed method
Comparative performance evaluation
Conclusions

16 Methodology
Synthetic data (sample site with 1000 nodes)
– Synthetic data generator (see the paper), modeling site nodes, site linkage, size of documents
Real data sets (see the paper)
Examine the impact of:
– noise
– order
– client cache (see the paper)
– efficiency

17 Accuracy w.r.t. noise

18 Usefulness w.r.t. noise

19 Traffic w.r.t. noise

20 Accuracy w.r.t. order

21 Usefulness w.r.t. order

22 Traffic w.r.t. order

23 Efficiency (see also [7,8])

24 Presentation Outline
Motivation & Related work
Proposed method
Comparative performance evaluation
Conclusions

25 Conclusions
Factors that influence Web prefetching:
– Noise
– Order
A new algorithm, WMo, based on data mining was presented
It compares favorably with previously proposed algorithms
WMo is an effective and efficient Web prefetching algorithm

26 References
1. R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
2. R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
3. M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
4. W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
5. B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998.
6. B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
7. A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
8. A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), pp. 243-266, 2001.
9. V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996.
10. T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
11. J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS 1999.
12. L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.

