Web Usage Mining: Processes and Applications

Qiaoyuan Jiang, CSE 8331, November 24, 2003

Outline
- Brief overview of Web mining
- Web usage mining
- Application areas of Web usage mining
- Future research directions
- Conclusions

Web Mining
Web mining is the application of data mining techniques to discover and retrieve useful information and patterns from World Wide Web documents and services [Etzioni, 1996].

Web Mining Categories
- Web Content Mining: extracting knowledge from the content of the Web
- Web Structure Mining: discovering the model underlying the link structure of the Web
- Web Usage Mining: discovering users' navigation patterns and predicting user behavior

Web Usage Mining Processes
Web usage mining (WUM) consists of three processes:
- Preprocessing: converting the raw data into the data abstractions (users, sessions, episodes, clickstreams, and pageviews) needed before a data mining algorithm can be applied.
- Pattern Discovery: the key component of WUM, drawing on algorithms and techniques from research areas such as data mining, machine learning, statistics, and pattern recognition.
- Pattern Analysis: validation and interpretation of the mined patterns.

Web Usage Mining - Preprocessing
- Data Cleaning: remove outliers and/or irrelevant data
- User Identification: associate page references with different users
- Session Identification: divide all pages accessed by a user into sessions
- Path Completion: add important page access records that are missing from the access log because of browser and proxy server caching
- Formatting: format the sessions according to the type of data mining to be performed
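
A minimal Python sketch of three of these steps (data cleaning, user identification, session identification) follows. It assumes log lines have already been parsed into simple records; the `LogEntry` type, the list of ignored file suffixes, the "same IP address and user agent means same user" heuristic, and the 30-minute inactivity timeout are all illustrative assumptions, not taken from the slides. Path completion and formatting are left out because they depend on the site's link structure and on the mining algorithm chosen later.

```python
from dataclasses import dataclass

# Hypothetical, already-parsed log record (real server logs must be parsed first).
@dataclass
class LogEntry:
    ip: str
    agent: str
    timestamp: int  # seconds since the epoch
    url: str

# Assumed values, chosen only for illustration.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
SESSION_TIMEOUT = 30 * 60  # 30-minute inactivity threshold

def clean(entries):
    """Data cleaning: drop requests for embedded resources that are not pageviews."""
    return [e for e in entries if not e.url.lower().endswith(IRRELEVANT_SUFFIXES)]

def identify_users(entries):
    """User identification heuristic: same IP address and user agent => same user."""
    users = {}
    for e in sorted(entries, key=lambda e: e.timestamp):
        users.setdefault((e.ip, e.agent), []).append(e)
    return users

def sessionize(user_entries, timeout=SESSION_TIMEOUT):
    """Session identification: start a new session after a gap longer than `timeout`."""
    sessions, current, last_ts = [], [], None
    for e in user_entries:
        if last_ts is not None and e.timestamp - last_ts > timeout:
            sessions.append(current)
            current = []
        current.append(e.url)
        last_ts = e.timestamp
    if current:
        sessions.append(current)
    return sessions

def preprocess(entries):
    """Cleaning -> user identification -> session identification."""
    return {user: sessionize(es) for user, es in identify_users(clean(entries)).items()}

log = [
    LogEntry("10.0.0.1", "Mozilla", 0, "/index.html"),
    LogEntry("10.0.0.1", "Mozilla", 5, "/logo.gif"),
    LogEntry("10.0.0.1", "Mozilla", 60, "/products.html"),
    LogEntry("10.0.0.1", "Mozilla", 4000, "/index.html"),
    LogEntry("10.0.0.2", "Opera", 10, "/about.html"),
]
print(preprocess(log))
# {('10.0.0.1', 'Mozilla'): [['/index.html', '/products.html'], ['/index.html']], ('10.0.0.2', 'Opera'): [['/about.html']]}
```

The image request is removed by cleaning, and the gap of more than 30 minutes before the last `/index.html` request starts a second session for the first user.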

Web Usage Mining - Pattern Discovery Tasks
- Statistical Analysis
- Clustering
- Classification
- Association Rules
- Sequential Patterns
- Dependency Modeling

Web Usage Mining - Pattern Discovery Tasks (Cont.)
- Statistical Analysis: frequency analysis, mean, median, etc.
  - Improve system performance
  - Provide support for marketing decisions
  - Simplify the site modification task
- Clustering:
  - Clustering users helps discover groups of users with similar navigation patterns => provide personalized Web content
  - Clustering pages helps discover groups of pages with related content => useful for search engines
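
To make user clustering concrete, here is a minimal sketch that represents each session as a binary pageview vector and groups the vectors with plain k-means. The choice of k-means, Euclidean distance, and "first k sessions as initial centroids" are assumptions made for brevity; the slides do not name a specific clustering algorithm.

```python
import math

def to_vector(session, pages):
    """Binary pageview vector: 1.0 if the page occurs in the session, else 0.0."""
    return [1.0 if p in session else 0.0 for p in pages]

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors serve as initial centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each session joins its nearest centroid.
        labels = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its member sessions.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

pages = ["/home", "/sports", "/scores", "/news", "/politics"]
sessions = [
    {"/home", "/sports", "/scores"},
    {"/news", "/politics"},
    {"/sports", "/scores"},
    {"/home", "/news", "/politics"},
]
labels, _ = kmeans([to_vector(s, pages) for s in sessions], k=2)
print(labels)  # [0, 1, 0, 1]: sports readers vs. news readers
```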

Web Usage Mining - Pattern Discovery Tasks (Cont.)
- Classification: the technique of mapping a data item into one of several predefined classes
  - Develop profiles of users belonging to a particular class or category
- Association Rules: discover correlations among pages accessed together by a client
  - Help restructure the Web site
  - Page prefetching
  - Develop e-commerce marketing strategies
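
The support and confidence of page-to-page association rules can be computed as in the sketch below. It is deliberately limited to rules with a single page on each side (the size-2 case of Apriori-style mining); the transactions and page names are made up for illustration.

```python
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Mine rules {a} -> {b} between single pages (size-2 case of Apriori-style mining)."""
    n = len(transactions)
    item_count, pair_count = {}, {}
    for t in transactions:
        items = set(t)
        for i in items:
            item_count[i] = item_count.get(i, 0) + 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of transactions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

transactions = [
    ["/home", "/cart"],
    ["/home", "/cart", "/checkout"],
    ["/home", "/faq"],
    ["/cart", "/checkout"],
]
print(pair_rules(transactions, min_support=0.5, min_confidence=0.9))
# [('/checkout', '/cart', 0.5, 1.0)]
```

Every visitor who reached `/checkout` also visited `/cart`, so that rule reaches 100% confidence, while the reverse direction only reaches 2/3.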

Web Usage Mining - Pattern Discovery Tasks (Cont.)
- Sequential Patterns: extract frequently occurring inter-session patterns, such that the presence of a set of items is followed by another item in time order
  - Predict future user visit patterns => place ads or recommendations
  - Page prefetching
- Dependency Modeling: determine whether there are significant dependencies among the variables in the Web domain
  - Predict future Web resource consumption
  - Develop business strategies to increase sales
  - Improve users' navigational convenience
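
Unlike association rules, sequential patterns respect time order. The sketch below counts only ordered pairs "page a, later page b", assuming each input is one time-ordered sequence per user (for example, that user's sessions concatenated in time order); real sequential pattern miners handle longer patterns and itemsets, which this simplification omits.

```python
def sequential_pairs(sequences, min_support):
    """Frequent ordered pairs (a, b): page a is visited before page b in the same sequence.
    Support is the fraction of sequences containing the ordered pair at least once."""
    counts = {}
    for s in sequences:
        found = set()
        for i, a in enumerate(s):
            for b in s[i + 1:]:
                if a != b:
                    found.add((a, b))
        for pair in found:
            counts[pair] = counts.get(pair, 0) + 1
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

sequences = [
    ["/home", "/products", "/cart"],
    ["/home", "/cart"],
    ["/products", "/home"],
]
print(sequential_pairs(sequences, min_support=0.5))
# {('/home', '/cart'): 0.6666666666666666}
```

Note that `("/home", "/products")` and `("/products", "/home")` are counted separately, which an order-insensitive association rule miner would merge.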

Web Usage Mining - Pattern Analysis
Pattern analysis is the final stage of WUM; it involves the validation and interpretation of the mined patterns.
- Validation: eliminate irrelevant rules or patterns and extract the interesting ones from the output of the pattern discovery process
- Interpretation: the output of mining algorithms is mainly in mathematical form and is not suitable for direct human interpretation
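
One common way to eliminate uninteresting rules during validation is an interestingness measure such as lift. The slides do not prescribe a measure, so the use of lift, the rule tuple layout, and the threshold below are assumptions for illustration.

```python
def filter_interesting(rules, page_support, min_lift=1.0):
    """Keep rules (lhs, rhs, support, confidence) whose lift exceeds min_lift.
    lift = confidence / support(rhs); lift <= 1 means lhs does not raise the chance of rhs."""
    kept = []
    for lhs, rhs, _support, confidence in rules:
        lift = confidence / page_support[rhs]
        if lift > min_lift:
            kept.append((lhs, rhs, round(lift, 3)))
    return kept

rules = [("/home", "/cart", 0.5, 0.6), ("/checkout", "/cart", 0.5, 1.0)]
page_support = {"/cart": 0.75}  # /cart appears in 75% of all transactions
print(filter_interesting(rules, page_support))  # [('/checkout', '/cart', 1.333)]
```

The first rule has 60% confidence, yet `/cart` is visited in 75% of all transactions anyway, so the rule carries no useful information and is dropped.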

Web Usage Mining - Pattern Analysis Methodologies and Tools
- Visualization: helps people understand both real and abstract concepts
  - WebViz: the Web is visualized as a directed graph
- Query mechanism: allows analysts to extract only relevant and useful patterns by specifying constraints
  - WEBMINER
- On-Line Analytical Processing (OLAP): enables analysts to perform ad hoc analysis of data in multiple dimensions for decision-making
  - WebLogMiner

WEBMINER Query Example
Find all association rules (ARs) with a minimum support of 1% and a minimum confidence of 90%. The analyst is only interested in clients from the ".edu" domain, in data from Nov. 1st, 2003 onward, and in page accesses that start with URL A and contain B and C in that order:

    SELECT association-rules(A*B*C*)
    FROM log.data
    WHERE date>=031101 AND domain="edu"
      AND support = 1.0 AND confidence = 90.0
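
The constraint part of this query can be mimicked in a few lines of Python. This is not WEBMINER's query engine; it is an illustrative sketch assuming each session is a record with a client host, a date, and an ordered page list (all hypothetical), and it applies only the domain, date, and A*B*C* path constraints. The support and confidence thresholds would then be handed to a rule miner running over the surviving sessions.

```python
from datetime import date

def matches_template(path, template):
    """A*B*C*-style template: the path starts with template[0] and then contains
    the remaining pages in the same order, with any pages allowed in between."""
    if not path or path[0] != template[0]:
        return False
    rest = iter(path[1:])
    # `page in rest` consumes the iterator, so the order of matches is enforced.
    return all(page in rest for page in template[1:])

def constrained_sessions(sessions, template, domain_suffix, since):
    """Apply the WHERE-style constraints before any rule mining is run."""
    return [
        s for s in sessions
        if s["host"].endswith(domain_suffix)
        and s["date"] >= since
        and matches_template(s["pages"], template)
    ]

sessions = [  # hypothetical hosts and paths
    {"host": "cs.example.edu", "date": date(2003, 11, 5), "pages": ["A", "X", "B", "C"]},
    {"host": "cs.example.edu", "date": date(2003, 10, 5), "pages": ["A", "B", "C"]},
    {"host": "shop.example.com", "date": date(2003, 11, 5), "pages": ["A", "B", "C"]},
    {"host": "lib.example.edu", "date": date(2003, 11, 2), "pages": ["A", "C", "B"]},
]
hits = constrained_sessions(sessions, ["A", "B", "C"], ".edu", date(2003, 11, 1))
print(len(hits), hits[0]["pages"])  # 1 ['A', 'X', 'B', 'C']
```

The second session is too old, the third comes from a ".com" client, and the fourth visits C before B, so only the first survives.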

Application Areas for Web Usage Mining
- Personalized: discover the preferences and needs of individual Web users in order to provide a personalized Web site for certain types of users
- Impersonalized: examine general user navigation patterns in order to understand how users in general use the site
- System Improvement
- Site Modification
- Business Intelligence
- Web Characterization

System Improvement
- High performance of a Web application is expected, since it directly affects user satisfaction
- WUM provides a key to understanding Web traffic behavior
- Applications:
  - Develop policies for Web caching, network transmission, load balancing, or data distribution
  - Detect intrusion, fraud, and attempted break-ins to the system

Site Modification
- Besides content, the structure of a Web site is a crucial attribute for attracting users
- WUM can provide detailed feedback on users' navigation behavior, which can be used to redesign the site structure for users' navigational convenience
- Adaptive Web site project [Perkowitz & Etzioni, 1998-1999]

Business Intelligence
- Information on how customers use a Web site is critical for marketers in e-commerce businesses
- WUM can support business process optimization and marketing decisions
- Business intelligence includes personalization for C2B systems

Usage Characterization
- Mining general usage patterns (not focused on any specific user or Web site) helps in studying how browsers are used and how users interact with a browser interface
- It also makes it possible to study the dynamics of the Web and how it is growing

Personalization
- Choosing among thousands of options is a challenge for Web users
- Goal: provide users with dynamic content tailored to their individual interests
- Form: recommending one or more items or pages to a user, based on the user's profile and usage behavior, or on the patterns of past visitors with similar profiles
- Performance measurement:
  - Effectiveness: accuracy + coverage
  - Scalability
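
Accuracy and coverage are often measured by comparing a recommendation set R with the set E of pages the user actually visited afterwards. The definitions below (accuracy as precision, |R ∩ E| / |R|, and coverage as |R ∩ E| / |E|) are a common convention in the recommendation literature, assumed here rather than stated in the slides.

```python
def precision_and_coverage(recommended, actual):
    """precision = |R & E| / |R|  (share of recommendations that were right)
    coverage  = |R & E| / |E|  (share of the real later navigation that was predicted)"""
    hits = len(set(recommended) & set(actual))
    precision = hits / len(recommended) if recommended else 0.0
    coverage = hits / len(actual) if actual else 0.0
    return precision, coverage

# Recommendations R for a session vs. pages E the user actually visited later.
print(precision_and_coverage(["/a", "/b", "/c", "/d"], ["/b", "/d"]))  # (0.5, 1.0)
```

The two measures pull in opposite directions: recommending more pages tends to raise coverage and lower precision.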

Applications of Personalization
- Customizing access to information sources
- Filtering news or e-mails
- Recommendation services for the browsing process
- Tutoring systems
- Search
- More ...

3 Phases of Personalization
- Data preparation and transformation: data cleaning, filtering, transaction identification
- Pattern discovery: discover usage patterns
- Recommendation: generate personalized content for a user by matching the user's current session against the discovered patterns (online process)

Personalization Techniques - Collaborative Filtering (CF)
- Pattern discovery: an online kNN algorithm is applied to user profiles in a given domain, matching people who have the same taste
- Recommendation: pages or items that interest the k nearest neighbors are assumed to interest the active user as well
- Drawbacks:
  - Online process => lack of scalability
  - Static user profiles => low quality of recommendations
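
A minimal kNN collaborative filtering sketch follows, assuming user profiles are sets of visited pages and using cosine similarity on the implied binary vectors (the similarity measure and scoring rule are assumptions, not taken from the slides). Because every stored profile is compared with the active user at request time, the sketch also shows where the scalability drawback comes from.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sets of visited pages (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def knn_recommend(active, profiles, k=2, top_n=3):
    """Score unseen pages by the summed similarity of the k most similar profiles."""
    # Every stored profile is compared with the active user at request time.
    neighbours = sorted(profiles, key=lambda p: cosine(active, p), reverse=True)[:k]
    scores = {}
    for p in neighbours:
        sim = cosine(active, p)
        for page in p - active:  # only suggest pages the active user has not seen
            scores[page] = scores.get(page, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]

profiles = [{"/a", "/b", "/c"}, {"/a", "/b", "/d"}, {"/x", "/y"}]
print(knn_recommend({"/a", "/b"}, profiles))  # ['/c', '/d']
```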

Personalization Techniques - Clustering
- Technique: clustering user transactions and pageviews
- Advantages:
  - User preferences are learned automatically from usage data and are therefore up to date
  - Better scalability through clustering
- Drawbacks: low accuracy
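
The scalability gain comes from moving the expensive work offline: each cluster of sessions is summarized once as an aggregate usage profile, and online matching only compares the active session with those few profiles. The sketch below assumes clusters are already computed, weights a page by the fraction of the cluster's sessions that contain it, and uses a 0.5 weight cut-off and cosine matching; all three choices are illustrative assumptions.

```python
import math

def aggregate_profile(cluster_sessions, pages, min_weight=0.5):
    """Offline: summarize a cluster as {page: weight}, weight = fraction of sessions with the page."""
    n = len(cluster_sessions)
    profile = {}
    for p in pages:
        w = sum(p in s for s in cluster_sessions) / n
        if w >= min_weight:
            profile[p] = w
    return profile

def match_score(active, profile):
    """Online: cosine similarity between a binary active session and a weighted profile."""
    dot = sum(w for p, w in profile.items() if p in active)
    norm = math.sqrt(len(active)) * math.sqrt(sum(w * w for w in profile.values()))
    return dot / norm if norm else 0.0

def recommend_from_profiles(active, profiles):
    """Recommend unseen pages of the best-matching profile, highest weight first."""
    best = max(profiles, key=lambda prof: match_score(active, prof))
    return sorted((p for p in best if p not in active), key=lambda p: (-best[p], p))

pages = ["/home", "/sports", "/scores", "/news", "/politics"]
sports_cluster = [{"/home", "/sports", "/scores"}, {"/sports", "/scores"}]
news_cluster = [{"/news", "/politics"}, {"/home", "/news", "/politics"}]
profiles = [aggregate_profile(c, pages) for c in (sports_cluster, news_cluster)]
print(recommend_from_profiles({"/sports"}, profiles))  # ['/scores', '/home']
```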

Personalization Techniques - Association Rules (ARs)
- For each user, create a transaction containing all the items the user has ever accessed
- Find all rules that satisfy the given support and confidence thresholds
- For each active user, find all the rules supported by the user's active session; the items predicted by these rules are the candidate recommendations
- Drawbacks:
  - All association rules must be discovered before generating recommendations. This can be improved by generating ARs in real time from the subset of transactions within the active user's neighborhood
  - High support => better scalability and accuracy, but low coverage
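
The recommendation step can be sketched as follows, assuming rules were mined offline and stored as `(antecedent, consequent, confidence)` triples (a hypothetical layout). A rule applies when its antecedent is contained in the active session, and each candidate page is scored by the highest confidence among the rules that predict it.

```python
def ar_recommend(active, rules, top_n=3):
    """rules: iterable of (antecedent: frozenset, consequent: str, confidence: float)."""
    scores = {}
    for antecedent, consequent, confidence in rules:
        # The rule is supported by the session if all antecedent pages were visited.
        if antecedent <= active and consequent not in active:
            scores[consequent] = max(scores.get(consequent, 0.0), confidence)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]

rules = [
    (frozenset({"/home"}), "/cart", 0.6),
    (frozenset({"/home", "/cart"}), "/checkout", 0.9),
    (frozenset({"/faq"}), "/contact", 0.8),
    (frozenset({"/cart"}), "/checkout", 0.7),
]
print(ar_recommend({"/home", "/cart"}, rules))  # ['/checkout']
```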

Personalization Techniques - Sequential Patterns (SPs)
- Technique: Markov model
- Advantages: better accuracy, since SPs contain more precise information about user navigation behavior
- Drawbacks: low recommendation coverage
- More suitable for predictive tasks, e.g., Web prefetching
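
A first-order Markov model is the simplest version of this technique: it estimates the probability of the next page from the current page only. The sketch below (first order, alphabetical tie-breaking, made-up sessions) is an illustrative assumption; it also shows the coverage drawback, since a page never seen with a successor yields no prediction.

```python
from collections import Counter, defaultdict

def train_markov(sessions):
    """Estimate P(next | current) from consecutive page pairs in each session."""
    counts = defaultdict(Counter)
    for s in sessions:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    model = {}
    for cur, nxts in counts.items():
        total = sum(nxts.values())
        model[cur] = {p: c / total for p, c in nxts.items()}
    return model

def predict_next(model, current_page):
    """Most probable next page, or None if the page has no observed successor."""
    if current_page not in model:
        return None  # coverage gap
    return max(sorted(model[current_page]), key=lambda p: model[current_page][p])

sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/reviews"],
    ["/home", "/about"],
]
model = train_markov(sessions)
print(predict_next(model, "/home"), predict_next(model, "/cart"))  # /products None
```

Such a prediction can also drive Web prefetching: the server or browser fetches `/products` in advance while the user is still on `/home`.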

Personalization Techniques - Hybrid Models
- Hybrid models automatically switch among different personalization models based on the localized degree of hyperlink connectivity:
  - High connectivity degree => non-SP models
  - Low connectivity degree and deeper navigation path => SP models
- Performance: better than any individual model
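
The switching rule can be sketched as below. Measuring local connectivity as the number of out-links of the current page, and the thresholds `max_degree=5` and `min_depth=3`, are hypothetical choices for illustration; `sp_model` and `non_sp_model` are stand-ins for real recommenders such as a Markov model and an association rule recommender.

```python
def connectivity_degree(page, out_links):
    """Local connectivity: number of hyperlinks leaving the current page."""
    return len(out_links.get(page, []))

def hybrid_recommend(session, out_links, sp_model, non_sp_model, max_degree=5, min_depth=3):
    """Use the SP model on sparsely linked pages deep in a path, otherwise the non-SP model."""
    current = session[-1]
    if connectivity_degree(current, out_links) <= max_degree and len(session) >= min_depth:
        return sp_model(session)  # few choices, long path: visit order is informative
    return non_sp_model(session)  # many choices: order-insensitive model

def sp_model(session):
    return ["sp:" + session[-1]]  # stand-in for e.g. a Markov-model recommender

def non_sp_model(session):
    return ["non-sp:" + session[-1]]  # stand-in for e.g. an AR-based recommender

out_links = {"/home": [f"/cat{i}" for i in range(20)], "/cat1": ["/item7"], "/item7": ["/specs"]}
print(hybrid_recommend(["/home"], out_links, sp_model, non_sp_model))  # ['non-sp:/home']
print(hybrid_recommend(["/home", "/cat1", "/item7"], out_links, sp_model, non_sp_model))  # ['sp:/item7']
```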

Future Research Directions
- Usage mining on the Semantic Web
  - WUM can help build the Semantic Web
  - With the Semantic Web, WUM can be improved
- Multimedia Web data mining
  - Representation, problem solving, and learning from multimedia data is indeed a challenge

Future Research Directions (Cont.)
- Soft computing technology for Web mining
  - Fuzzy logic: deals with imprecision and conceptual data; used in clustering Web log data and mining ARs
  - Neural networks:
    - Adaptive to new data and information
    - Suitable for parallel processing
    - Robust to missing, confusing, or ill-defined data
    - Capable of modeling non-linear decision boundaries
    - Effective for learning user profiles
  - Genetic algorithms: randomized search and optimization guided by evaluation criteria
    - Efficient, adaptive, robust, and suited to parallel processing
    - Used in search and query optimization and in predicting user preferences

Future Research Directions (Cont.)
- Analysis of discovered patterns
  - Research on efficient, flexible, and powerful analysis tools
- More applications
  - Temporal evolution of usage behavior
  - Improving Web services
  - Detecting credit card fraud
- Privacy issues

Conclusions