A New Reactive Method for Processing Web Usage Data
Murat Ali Bayır
Middle East Technical University, Department of Computer Engineering, Ankara, Turkey


Outline
- Web Mining
- Previous Session Reconstruction Heuristics
- Smart-SRA
- Agent Simulator
- Experimental Results
- Conclusion
Murat Ali Bayir, June 06

Data & Web Mining
Data mining: discovery of useful and interesting patterns from a large dataset.
Web mining: the application of data mining techniques to discover and retrieve useful information and patterns from World Wide Web documents and services.
Dimensions:
- Web content mining
- Web structure mining
- Web usage mining

Web Mining: Web Usage Mining (WUM)
Application of data mining techniques to web log data in order to discover user access patterns. The web server log captures the information necessary for WUM.

Example user web access log:
IP Address | Request Time | Method | URL | Protocol | Success of Return Code | Number of Bytes Transmitted
 | [25/Apr/2005:03:04:41–05] | GET | A.html | HTTP/ | |
 | [25/Apr/2005:03:04:43–05] | GET | B.html | HTTP/ | |
 | [25/Apr/2005:03:04:48–05] | GET | C.html | HTTP/ | |
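To make the log columns above concrete, here is a minimal parsing sketch. It assumes the server writes the Common Log Format; the sample line is hypothetical (its IP address, time zone offset, protocol version, status code, and byte count are made up, since those values are not legible in the slide), and `parse_log_line` is an illustrative name, not something from the presentation.

```python
import re
from datetime import datetime

# Assumed layout (Common Log Format):
# host ident user [time] "method url protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_log_line(line):
    """Return a dict with the seven log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


# Hypothetical log line; only the date, method, and page name echo the slide.
sample = '192.0.2.1 - - [25/Apr/2005:03:04:41 -0500] "GET /A.html HTTP/1.1" 200 2050'
record = parse_log_line(sample)
print(record["ip"], record["url"], record["status"])
```

Session reconstruction then works on the (IP address, request time, URL) triples extracted this way.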

Phases of Web Usage Mining
1. Data Processing
- Includes reconstruction of user sessions by using heuristic techniques. This is the most important phase, since it directly affects the quality of the frequent patterns extracted in the final step.
2. Pattern Discovery
- Includes discovering useful patterns from the reconstructed sessions obtained in the first phase. We have related work on the pattern discovery phase [Bayir 06-1].

Previous Session Reconstruction Heuristics: Session Reconstruction
Session reconstruction selects and groups requests belonging to the same user by using heuristic techniques.
Types:
- Reactive strategies process requests after they are handled by the web server; they process the web server logs to obtain sessions. The approach proposed in this thesis is reactive.
- Proactive strategies process requests during the user's interactive browsing of the web site. Session data is gathered while the user interacts with the site. They are applied on dynamic server pages.

Session Reconstruction: Proactive vs. Reactive
- Proactive strategies need to change the internal structure of the web site, for example the source code of each dynamic web page.
- Reactive strategies need no such change. They are used for web analytics: customers supply the web logs of their site, and the logs are analyzed with these methods.
- Reactive methods are applicable to all web sites that use the same log format.

Previous Reactive Heuristics
Two types of reactive heuristics were defined before:
- Time-oriented heuristics [Spiliopoulou 98, Cooley 99-1]
- Navigation-oriented heuristic [Cooley 99-1, Cooley 99-2]
Smart-SRA [Bayir 06-2] is the new approach proposed in this thesis. It combines these heuristics with web topology information in order to increase the accuracy of the reconstructed sessions.

Example Web Topology and Request Sequence
The topology of a web site can be represented by a directed web graph. The topology information can be extracted by using the crawling module of search engine APIs.
[Figure: example web topology graph used for applying the heuristics]
Example web page request sequence (Page / Timestamp table): P1, P20, P13, P49, P34, P23

Time-Oriented Heuristics - 1
Two types of time-oriented heuristics are defined. In the first, the total duration of a discovered session is limited by a threshold δ1.
Request sequence: P1, P20, P13, P49, P34, P23
Example, with time threshold δ1 = 30 min:
- [P1, P20, P13, P49] (t(P49) - t(P1) = 29 < 30)
- [P34, P23] (t(P23) - t(P34) = 15 < 30)
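The first time-oriented heuristic can be sketched as follows. The timestamps (in minutes) are hypothetical: the slide's timestamp row is not legible, so the values here were chosen only to agree with the two differences the slide states (29 and 15). Treating the threshold as inclusive is also an assumption, and `split_by_total_duration` is an illustrative name.

```python
def split_by_total_duration(requests, max_duration):
    """Split (page, minute) requests so that no session lasts longer than max_duration."""
    sessions = []
    current = []
    start = None
    for page, ts in requests:
        # Close the current session once this request would exceed the total duration.
        if current and ts - start > max_duration:
            sessions.append(current)
            current = []
        if not current:
            start = ts  # the first request of a session fixes its start time
        current.append(page)
    if current:
        sessions.append(current)
    return sessions


# Hypothetical timestamps, consistent with t(P49) - t(P1) = 29 and t(P23) - t(P34) = 15.
requests = [("P1", 0), ("P20", 6), ("P13", 15), ("P49", 29), ("P34", 32), ("P23", 47)]
print(split_by_total_duration(requests, 30))
# [['P1', 'P20', 'P13', 'P49'], ['P34', 'P23']]
```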

Time-Oriented Heuristics - 2
In the second, the time spent on any page is limited by a threshold δ2. That means t(P_{n+1}) - t(P_n) < δ2.
Request sequence: P1, P20, P13, P49, P34, P23
Example, with time threshold δ2 = 10 min:
- [P1, P20, P13]
- [P49, P34]
- [P23]
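A sketch of the page-stay heuristic, under the same caveat: the timestamps are hypothetical values chosen so that the split matches the three sessions listed on the slide, and `split_by_page_stay` is an illustrative name. The strict comparison follows the inequality as written on the slide.

```python
def split_by_page_stay(requests, max_stay):
    """Split (page, minute) requests wherever the gap to the previous request reaches max_stay."""
    sessions = []
    current = []
    last_ts = None
    for page, ts in requests:
        # A session continues only while t(P_{n+1}) - t(P_n) < max_stay.
        if current and ts - last_ts >= max_stay:
            sessions.append(current)
            current = []
        current.append(page)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions


# Hypothetical timestamps in minutes; the gaps are 6, 9, 14, 3, 15.
requests = [("P1", 0), ("P20", 6), ("P13", 15), ("P49", 29), ("P34", 32), ("P23", 47)]
print(split_by_page_stay(requests, 10))
# [['P1', 'P20', 'P13'], ['P49', 'P34'], ['P23']]
```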

Navigation-Oriented Heuristic
When processing the user request sequence, there are two cases for adding a new page WP_{N+1} to a session [WP_1, WP_2, ..., WP_N]:
- If WP_N has a hyperlink to WP_{N+1}, the page is appended: [WP_1, WP_2, ..., WP_N, WP_{N+1}]
- If WP_N does not have a hyperlink to WP_{N+1}, assume that WP_Kmax is the nearest page having a hyperlink to WP_{N+1}, and add backward browser moves: [WP_1, WP_2, ..., WP_N, WP_{N-1}, WP_{N-2}, ..., WP_Kmax, WP_{N+1}]

Navigation-Oriented Heuristic: Example
User request sequence: P1, P20, P13, P49, P34, P23

Current Session | Condition | New Page
[ ] | | P1
[P1] | Link[P1, P20] = 1 | P20
[P1, P20] | Link[P20, P13] = 0, Link[P1, P13] = 1 | P13
[P1, P20, P1, P13] | Link[P13, P49] = 1 | P49
[P1, P20, P1, P13, P49] | Link[P49, P34] = 0, Link[P13, P34] = 1 | P34
[P1, P20, P1, P13, P49, P13, P34] | Link[P34, P23] = 1 | P23
[P1, P20, P1, P13, P49, P13, P34, P23] | |
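The table can be reproduced with a short sketch of the navigation-oriented heuristic. The edge set `LINKS` is an assumption: the topology figure is not available, so the edges are inferred from the `Link[...]` conditions and the worked examples in these slides. Starting a new session when no earlier page links to the new page is also an assumption, since the slide does not cover that case.

```python
# Assumed topology, inferred from the worked examples (not taken from the figure).
LINKS = {
    ("P1", "P20"), ("P1", "P13"), ("P13", "P49"), ("P13", "P34"),
    ("P34", "P23"), ("P49", "P23"), ("P20", "P23"),
}


def navigation_sessions(pages, links):
    """Build sessions from a request sequence, inserting backward browser moves."""
    sessions = []
    current = []
    for page in pages:
        if not current or (current[-1], page) in links:
            current.append(page)
            continue
        # Search backwards for the nearest earlier page that links to the new page.
        k = next(
            (i for i in range(len(current) - 2, -1, -1) if (current[i], page) in links),
            None,
        )
        if k is None:
            # Assumption: no referrer found, so the page opens a new session.
            sessions.append(current)
            current = [page]
        else:
            # Backward moves WP_{N-1}, ..., WP_Kmax, followed by the new page.
            current = current + current[k:-1][::-1] + [page]
    if current:
        sessions.append(current)
    return sessions


print(navigation_sessions(["P1", "P20", "P13", "P49", "P34", "P23"], LINKS))
# [['P1', 'P20', 'P1', 'P13', 'P49', 'P13', 'P34', 'P23']]
```

The single long session with repeated pages is the behavior the table shows: back moves are inserted artificially into the reconstructed session.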

Smart-SRA
Phase 1: Shorter request sequences (candidate sessions) are constructed by using the overall session duration time and page-stay time criteria.
Phase 2: Candidate sessions are partitioned into maximal sub-sessions such that between each consecutive page pair in a session there is a hyperlink from the previous page to the next page.
For a session [P_1, ..., P_n]:
Topology rule:
- ∀i: 1 ≤ i < n, there is a hyperlink from P_i to P_{i+1}
Time rules:
- ∀i: 1 ≤ i < n, Timestamp(P_i) < Timestamp(P_{i+1})
- ∀i: 1 ≤ i < n, Timestamp(P_{i+1}) - Timestamp(P_i) ≤ page-stay time threshold
- Timestamp(P_n) - Timestamp(P_1) ≤ δ (session duration time)
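The topology and time rules can be read as a validity check on a finished session. The sketch below is one way to write that check; the function name, the edge set, and the timestamps in the usage lines are hypothetical and serve only as an illustration.

```python
def satisfies_smart_sra_rules(session, links, max_stay, max_duration):
    """Check a list of (page, timestamp) pairs against the topology and time rules."""
    pairs = list(zip(session, session[1:]))
    # Topology rule: every consecutive pair is connected by a hyperlink.
    topology_ok = all((a, b) in links for (a, _), (b, _) in pairs)
    # Time rule 1: timestamps strictly increase.
    order_ok = all(ta < tb for (_, ta), (_, tb) in pairs)
    # Time rule 2: each page stay is within the page-stay threshold.
    stay_ok = all(tb - ta <= max_stay for (_, ta), (_, tb) in pairs)
    # Time rule 3: the whole session is within the session duration threshold.
    duration_ok = not session or session[-1][1] - session[0][1] <= max_duration
    return topology_ok and order_ok and stay_ok and duration_ok


# Hypothetical edges and timestamps (minutes).
links = {("P1", "P13"), ("P13", "P34"), ("P34", "P23"), ("P1", "P20")}
good = [("P1", 0), ("P13", 9), ("P34", 14), ("P23", 15)]
bad = [("P1", 0), ("P20", 6), ("P13", 9)]  # no hyperlink from P20 to P13
print(satisfies_smart_sra_rules(good, links, 10, 30))
print(satisfies_smart_sra_rules(bad, links, 10, 30))
```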

Smart-SRA: Phase 2
Phase 2 of Smart-SRA processes a candidate session from left to right by repeating the following steps until the candidate session is empty:
1. Determine the web pages without any referrer (on their left) and remove them from the candidate session.
2. For each one of these pages, and for each previously constructed session:
- If there is a hyperlink from the last page of the session to the web page, and the page-stay time constraint is satisfied, then append the web page to the session.
3. Remove non-maximal sessions.

Smart-SRA: Example
Example candidate session (Page / Timestamp table): P1, P20, P13, P49, P34, P23
[Figure: example web topology used for applying Smart-SRA]

Smart-SRA: Example Trace

Iteration 1 (non-referrers in the set)
- Candidate session: [P1, P20, P13, P49, P34, P23]
- New session set (before): empty
- Temp page set: {P1}
- Temp session set: [P1]
- New session set (after): [P1]

Iteration 2
- Candidate session: [P20, P13, P49, P34, P23]
- New session set (before): [P1]
- Temp page set: {P20, P13}
- Temp session set: [P1, P20], [P1, P13]
- New session set (after): [P1, P20], [P1, P13]

Iteration 3
- Candidate session: [P49, P34, P23]
- New session set (before): [P1, P20], [P1, P13]
- Temp page set: {P49, P34}
- Temp session set: [P1, P13, P34], [P1, P13, P49]
- New session set (after): [P1, P13, P34], [P1, P13, P49], [P1, P20]

Iteration 4
- Candidate session: [P23]
- New session set (before): [P1, P13, P34], [P1, P13, P49], [P1, P20]
- Temp page set: {P23}
- Temp session set: [P1, P13, P34, P23], [P1, P13, P49, P23], [P1, P20, P23]
- New session set (after): [P1, P13, P34, P23], [P1, P13, P49, P23], [P1, P20, P23]
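The trace can be followed step by step with a minimal sketch of Phase 2. Several things here are assumptions rather than details from the slides: the edge set is inferred from the worked examples, the timestamps and the 10-minute page-stay threshold are hypothetical values under which every extension in the trace is allowed, non-maximal sessions are removed by dropping any session that was extended, and a page that extends no existing session starts a new one.

```python
# Assumed topology, inferred from the worked examples (not taken from the figure).
LINKS = {
    ("P1", "P20"), ("P1", "P13"), ("P13", "P49"), ("P13", "P34"),
    ("P34", "P23"), ("P49", "P23"), ("P20", "P23"),
}


def smart_sra_phase2(candidate, links, max_stay):
    """Partition a candidate session of (page, timestamp) pairs into maximal sessions."""
    remaining = list(candidate)
    sessions = []  # each session is a list of (page, timestamp) pairs
    while remaining:
        # Step 1: pages with no referrer to their left in the remaining candidate.
        no_referrer = [
            (page, ts)
            for i, (page, ts) in enumerate(remaining)
            if not any((prev, page) in links for prev, _ in remaining[:i])
        ]
        remaining = [item for item in remaining if item not in no_referrer]
        # Step 2: try to append each such page to every existing session.
        extended = set()
        new_sessions = []
        for page, ts in no_referrer:
            attached = False
            for idx, session in enumerate(sessions):
                last_page, last_ts = session[-1]
                if (last_page, page) in links and 0 < ts - last_ts <= max_stay:
                    new_sessions.append(session + [(page, ts)])
                    extended.add(idx)
                    attached = True
            if not attached:
                new_sessions.append([(page, ts)])  # assumption: open a new session
        # Step 3: sessions that were extended are no longer maximal.
        sessions = [s for i, s in enumerate(sessions) if i not in extended] + new_sessions
    return [[page for page, _ in session] for session in sessions]


# Hypothetical timestamps in minutes.
candidate = [("P1", 0), ("P20", 6), ("P13", 9), ("P49", 12), ("P34", 14), ("P23", 15)]
for session in smart_sra_phase2(candidate, LINKS, max_stay=10):
    print(session)
```

With these inputs the sketch ends with the same three sessions as iteration 4 of the trace, and none of them contains an inserted back move.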

Agent Simulator
- Models the behavior of web users and generates both the web user navigation and the log data kept by the web server.
- Used to compare the performance of alternative session reconstruction heuristics.

Agent Simulator: Behaviors
The simulator provides 4 basic behaviors of a web user:
- A web user can start a session with any one of the possible entry pages of a web site.
- A web user can select a next page having a link from the most recently accessed page.
- A web user can press the back button one or more times and thus select as the next page a page having a link from any one of the previously browsed pages (i.e., pages accessed before the most recently accessed one).
- A web user can terminate his/her session.

Agent Simulator: Behavior I
A web user can start a new session with any one of the possible entry pages of the web site.
[Figure: example topology (P1, P13, P20, P23, P34, P49) with session starts S1 and S2]

Agent Simulator: Behavior II
A web user can select a new page having a link from the most recently accessed page.
[Figure: example topology (P1, P13, P20, P23, P34, P49)]

Agent Simulator: Behavior III
A web user can select as the next page a page having a link from any one of the previously browsed pages.
[Figure: example topology (P1, P13, P20, P23, P34, P49)]

Agent Simulator: Behavior IV
A web user can terminate the session. The example session is terminated in P23.
[Figure: example topology (P1, P13, P20, P23, P34, P49)]

Agent Simulator: Parameters
Parameters for simulating the behavior of a web user:
- Session Termination Probability (STP)
- Link from Previous pages Probability (LPP)
- New Initial page Probability (NIP)
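A rough sketch of how the three probabilities could drive the four behaviors is given here. It is not the simulator from the presentation: the order in which the probabilities are tested, the way a back move forks the real session, the `max_steps` guard, the topology, and the entry pages are all assumptions made for illustration.

```python
import random

# Hypothetical topology and entry pages.
OUT_LINKS = {
    "P1": ["P20", "P13"],
    "P13": ["P49", "P34"],
    "P20": ["P23"],
    "P34": ["P23"],
    "P49": ["P23"],
}
ENTRY_PAGES = ["P1", "P13"]


def simulate_agent(out_links, entry_pages, stp, lpp, nip, rng, max_steps=1000):
    """Return the real sessions (lists of pages) generated by one simulated user."""
    sessions = []
    current = [rng.choice(entry_pages)]  # Behavior I: start at an entry page
    for _ in range(max_steps):
        if rng.random() < stp:  # Behavior IV: terminate
            break
        if rng.random() < nip:  # Behavior I again: jump to a new entry page
            sessions.append(current)
            current = [rng.choice(entry_pages)]
            continue
        earlier = [i for i in range(len(current) - 1) if out_links.get(current[i])]
        if earlier and rng.random() < lpp:  # Behavior III: go back, then follow a link
            k = rng.choice(earlier)
            sessions.append(current)
            current = current[: k + 1] + [rng.choice(out_links[current[k]])]
            continue
        next_pages = out_links.get(current[-1])
        if not next_pages:  # dead end, nothing left to follow
            break
        current.append(rng.choice(next_pages))  # Behavior II: follow a link
    sessions.append(current)
    return sessions


rng = random.Random(0)
for session in simulate_agent(OUT_LINKS, ENTRY_PAGES, stp=0.05, lpp=0.3, nip=0.3, rng=rng):
    print(session)
```

Because the simulator knows the real sessions it generated, they can be compared with the sessions a heuristic reconstructs from the corresponding log.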

Experimental Results: Heuristics Tested
- Time-oriented heuristic (heur1) (total time ≤ 30 min)
- Time-oriented heuristic (heur2) (page stay ≤ 10 min)
- Navigation-oriented heuristic (heur3)
- Smart-SRA heuristic (heur4)

Accuracy
Accuracy is based on capture: a reconstructed session H captures a real session R if R occurs as a contiguous subsequence of H (R ⊏ H). A string-matching relation is needed.
Example, with R = [P1, P3, P5]:
- H = [P9, P1, P3, P5, P8] => R ⊏ H: yes
- H = [P1, P9, P3, P5, P8] => R ⊏ H: no
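The capture relation is a contiguous match, which the two examples make clear: the pages of R must appear in H next to each other and in order. A minimal sketch, with `captures` as an illustrative name:

```python
def captures(reconstructed, real):
    """Return True if `real` occurs as a contiguous run of pages inside `reconstructed`."""
    n = len(real)
    return any(
        reconstructed[i:i + n] == real
        for i in range(len(reconstructed) - n + 1)
    )


real = ["P1", "P3", "P5"]
print(captures(["P9", "P1", "P3", "P5", "P8"], real))  # True
print(captures(["P1", "P9", "P3", "P5", "P8"], real))  # False
```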

Parameters for Generating User Sessions and Web Topology

Parameter | Value
Number of web pages (nodes) in topology | 300
Average out-degree | 15
Average page stay time | 2.2 min
Deviation of page stay time | 0.5 min
Number of agents | 10000
STP | fixed 5%, range 1%-20%
LPP | fixed 30%, range 0%-90%
NIP | fixed 30%, range 0%-90%

Accuracy vs. STP
Increasing STP leads to sessions with fewer pages, which are easier to predict. In short sessions, the probability that LPP or NIP behavior occurs is also small.

Accuracy vs. LPP
As LPP increases, the real accuracy decreases. Increasing LPP leads to more complex sessions. Intelligent path completion is needed for discovering more accurate sessions.

Accuracy vs. NIP
Increasing NIP causes more complex sessions, and the accuracy decreases for all heuristics. Path separation is needed for discovering more accurate sessions.

Conclusion
New session reconstruction heuristic: Smart-SRA
- Does not allow sequences with unrelated consecutive requests (no hyperlink from the previous page to the next one).
- Does not insert artificial browser (back) requests in order to prevent unrelated consecutive requests.
- Discovers only maximal sessions.
The agent simulator simulates the behavior of real WWW users, which makes it possible to evaluate the accuracy of the heuristics.
Experimental results show that Smart-SRA outperforms previous reactive heuristics.

References
[Bayir 06-1] M. A. Bayir, I. H. Toroslu, A. Cosar (2006). A Performance Comparison of Pattern Discovery Methods on Web Log Data. AICCSA-06, the 4th ACS/IEEE International Conference on Computer Systems and Applications.
[Bayir 06-2] M. A. Bayir, I. H. Toroslu, A. Cosar (2006). A New Approach for Reactive Web Usage Data Processing. ICDE Workshops, 44.
[Cooley 99-1] R. Cooley, B. Mobasher, J. Srivastava (1999). Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems, Vol. 1, No. 1.
[Cooley 99-2] R. Cooley, P. Tan, J. Srivastava (1999). Discovery of Interesting Usage Patterns from Web Data. Advances in Web Usage Analysis and User Profiling. LNAI 1836, Springer, Berlin, Germany.
[Spiliopoulou 98] M. Spiliopoulou, L. C. Faulstich (1998). WUM: A Tool for Web Utilization Analysis. Proceedings EDBT Workshop WebDB'98, LNCS 1590, Springer, Berlin, Germany.

Thank you for listening. Any questions?