Web Usage Mining: Processes and Applications


1 Web Usage Mining: Processes and Applications
Qiaoyuan Jiang CSE 8331 November 24, 2003

2 Outline
Brief overview of Web mining; Web usage mining; Application areas of Web usage mining; Future research directions; Conclusions

3 Web Mining Web Mining is the application of data mining techniques to discover and retrieve useful information and patterns from World Wide Web documents and services [Etzioni, 1996].

4 Web Mining Categories Web Content Mining: extracting knowledge from the content of the Web. Web Structure Mining: discovering the model underlying the link structures of the Web. Web Usage Mining: discovering users’ navigation patterns and predicting their behavior.

5 Web Usage Mining Processes
Preprocessing: conversion of the raw data into the data abstractions (users, sessions, episodes, clickstreams, and pageviews) needed to apply the data mining algorithms. Pattern Discovery: the key component of WUM, which draws on algorithms and techniques from data mining, machine learning, statistics, and pattern recognition. Pattern Analysis: validation and interpretation of the mined patterns.

6 Web Usage Mining Processes (Cont.)

7 Web Usage Mining- Preprocessing
Data Cleaning: remove outliers and/or irrelevant data. User Identification: associate page references with different users. Session Identification: divide all pages accessed by a user into sessions. Path Completion: add important page access records that are missing from the access log due to browser and proxy server caching. Formatting: format the sessions according to the type of data mining to be performed.
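
The user and session identification steps above can be sketched in a few lines. This is a minimal illustration, not the method from the slides: it assumes users are identified by IP address, that timestamps are in seconds, and that a 30-minute inactivity gap ends a session (a common heuristic; the constant `SESSION_TIMEOUT` and the function name `sessionize` are hypothetical).

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed heuristic, not taken from the slides


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) records into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same user exceeds `timeout` seconds.
    """
    # User identification: here simply the first field (e.g. an IP address).
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


# Toy access log: (user, seconds since start, requested page).
log = [
    ("10.0.0.1", 0, "/index.html"),
    ("10.0.0.1", 60, "/products.html"),
    ("10.0.0.2", 90, "/index.html"),
    ("10.0.0.1", 4000, "/index.html"),
]
sessions = sessionize(log)
```

In this toy log the first user ends up with two sessions, because the last request arrives more than 30 minutes after the previous one. Real logs would also need the cleaning and path completion steps, which are omitted here.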

8 Web Usage Mining –Preprocessing (Cont.)

9 Web Usage Mining - Pattern Discovery Tasks
Statistical Analysis Clustering Classification Association Rules Sequential Patterns Dependency Modeling

10 Web Usage Mining - Pattern Discovery Tasks (Cont.)
Statistical Analysis: frequency analysis, mean, median, etc. Uses: improve system performance; provide support for marketing decisions; simplify site modification tasks. Clustering: clustering of users helps to discover groups of users with similar navigation patterns => provide personalized Web content; clustering of pages helps to discover groups of pages having related content => search engines.
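
As a rough illustration of the statistical analysis task, the sketch below computes page hit frequencies and the mean and median session length. The session data is invented for the example and uses only the Python standard library.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sessions, each a list of pages viewed in order.
sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/about.html"],
    ["/index.html", "/products.html"],
    ["/products.html"],
]

# Frequency analysis: how often each page was requested.
page_hits = Counter(page for s in sessions for page in s)
top_pages = page_hits.most_common(2)

# Descriptive statistics on session length (pages per session).
session_lengths = [len(s) for s in sessions]
avg_len = mean(session_lengths)
med_len = median(session_lengths)
```

Reports of this kind (most requested pages, typical session length) are the sort of summary the slide associates with performance tuning and marketing decisions.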

11 Web Usage Mining - Pattern Discovery Tasks (Cont.)
Classification: a technique that maps a data item into one of several predefined classes. Use: develop profiles of users belonging to a particular class or category. Association Rules: discover correlations among pages accessed together by a client. Uses: help restructure the Web site; page prefetching; develop e-commerce marketing strategies.
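
To make the association rule idea concrete, here is a minimal sketch that enumerates single-page rules A => B and keeps those meeting a minimum support and confidence. It is a brute-force illustration on invented data, not an efficient miner such as Apriori; the function name `pair_rules` is hypothetical, and thresholds are expressed as fractions rather than percentages.

```python
from itertools import permutations


def pair_rules(sessions, min_support, min_confidence):
    """Return rules (a, b, support, confidence) meaning 'a => b'.

    support    = fraction of sessions containing both a and b
    confidence = fraction of sessions containing a that also contain b
    """
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    pages = sorted(set().union(*page_sets))
    rules = []
    for a, b in permutations(pages, 2):
        count_a = sum(1 for s in page_sets if a in s)
        count_ab = sum(1 for s in page_sets if a in s and b in s)
        support = count_ab / n
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical sessions.
sessions = [
    ["/a.html", "/b.html", "/c.html"],
    ["/a.html", "/b.html"],
    ["/a.html", "/c.html"],
    ["/b.html"],
]
rules = pair_rules(sessions, min_support=0.5, min_confidence=0.9)
```

On this data only the rule `/c.html => /a.html` survives: both pages appear together in half of the sessions, and every session that contains `/c.html` also contains `/a.html`.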

12 Web Usage Mining - Pattern Discovery Tasks (Cont.)
Sequential Patterns: extract frequently occurring inter-session patterns in which the presence of a set of items is followed by another item in time order. Uses: predict future user visit patterns => placing ads or recommendations; page prefetching. Dependency Modeling: determine whether there are any significant dependencies among the variables in the Web domain. Uses: predict future Web resource consumption; develop business strategies to increase sales; improve navigational convenience for users.

13 Web Usage Mining - Pattern Analysis
Pattern Analysis is the final stage of WUM and involves the validation and interpretation of the mined patterns. Validation: eliminate the irrelevant rules or patterns and extract the interesting ones from the output of the pattern discovery process. Interpretation: the output of mining algorithms is mainly in mathematical form and is not suitable for direct human interpretation.

14 Web Usage Mining - Pattern Analysis Methodologies and Tools
Visualization: helps people understand both real and abstract concepts. Tool: WebViz, in which the Web is visualized as a directed graph. Query mechanism: allows analysts to extract only relevant and useful patterns by specifying constraints. Tool: WEBMINER. On-Line Analytical Processing (OLAP): enables analysts to perform ad hoc analysis of data in multiple dimensions for decision-making. Tool: WebLogMiner.

15 WEBMINER Query Example Find all ARs with a minimum support of 1% and a minimum confidence of 90%. The analyst is interested only in clients from the “.edu” domain and in data later than Nov. 1st, 2003, with page accesses that start with URL A and contain B and C in that order: SELECT association-rules(A*B*C*) FROM log.data WHERE date>= AND domain=“edu” AND support = 1.0 AND confidence = 90.0

16 Application Areas for Web Usage Mining
Personalized: discover the preferences and needs of individual Web users in order to provide a personalized Web site for certain types of users. Impersonalized: examine general user navigation patterns in order to understand how general users use the site. System Improvement; Site Modification; Business Intelligence; Web Characterization.

17 System Improvement High performance of a Web application is expected since it directly affects user satisfaction. WUM provides a key to understanding Web traffic behavior. Applications: develop policies for Web caching, network transmission, load balancing, or data distribution; detect intrusion, fraud, and attempted break-ins to the system.

18 Site Modification Besides content, the structure of a Web site is another crucial attribute for attracting users. WUM can provide detailed feedback on users’ navigation behavior, which can be used to redesign the Web site structure for users’ navigational convenience. Example: Adaptive Web site project [Perkowitz & Etzioni, ]

19 Business Intelligence
Information on how customers are using a Web site is critical for marketers of e-commerce businesses. WUM can support business process optimization and marketing decisions. Business intelligence includes personalization for C2B systems.

20 Usage Characterization
Mining general usage patterns (not focused on any specific users or Web sites) helps in the study of how browsers are used and how users interact with a browser interface. It also makes it possible to look at the dynamics of the Web and how it is growing.

21 Personalization Choosing among thousands of options is a challenge for Web users. Goal: provide users with dynamic content tailored to their individual interests. Form: recommending one or more items or pages to a user, based on the user’s profile and usage behavior, or on the patterns of past visitors who have similar profiles. Performance Measurement: effectiveness (accuracy + coverage) and scalability.

22 Applications of Personalization
Customizing access to information sources; filtering news or e-mails; recommendation services for the browsing process; tutoring systems; search; more ...

23 3 phases of Personalization
Data preparation and transformation: data cleaning, filtering, transaction identification. Pattern discovery: discover usage patterns. Recommendation: generate personalized content for a user based on matching the user’s session (online process).

24

25 Personalization Techniques – Collaborative Filtering (CF)
Pattern discovery: an online kNN algorithm is applied to user profiles in a given domain, matching people who have the same taste. Recommendation: pages or items that interest the k nearest neighbors are assumed to interest the active user as well. Drawbacks: online process => lack of scalability; static user profiles => low quality of recommendations.
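
A minimal sketch of kNN-based collaborative filtering follows. It assumes user profiles are dictionaries mapping pages to interest weights and uses cosine similarity to find neighbors; the similarity measure, the profile format, and the function names are assumptions made for illustration and are not specified in the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse profiles (page -> weight)."""
    dot = sum(u[p] * v[p] for p in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def recommend(active, profiles, k=2, top_n=3):
    """Recommend pages liked by the k profiles most similar to `active`."""
    neighbors = sorted(
        profiles.values(), key=lambda p: cosine(active, p), reverse=True
    )[:k]
    scores = {}
    for profile in neighbors:
        weight = cosine(active, profile)
        for page, interest in profile.items():
            if page not in active:  # only suggest pages not yet seen
                scores[page] = scores.get(page, 0.0) + weight * interest
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda page: (-scores[page], page))[:top_n]


# Hypothetical stored profiles and an active user.
profiles = {
    "u1": {"/a.html": 1, "/b.html": 1, "/c.html": 1},
    "u2": {"/a.html": 1, "/d.html": 1},
    "u3": {"/e.html": 1},
}
active = {"/a.html": 1, "/b.html": 1}
suggestions = recommend(active, profiles)
```

Note that every call compares the active user against all stored profiles, which is the scalability drawback the slide points out.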

26 Personalization Techniques – Clustering
Technique: clustering user transactions and pageviews. Advantages: user preferences are learned automatically from usage data and are therefore up to date; better scalability through clustering. Drawbacks: low accuracy.

27 Personalization Techniques – Association Rules (ARs)
For each user, create a transaction that contains all the items the user has ever accessed. Find all rules that satisfy the given support and confidence. For each active user, find all the rules supported by the user. Items predicted by these rules are the candidate recommendations. Drawbacks: all association rules must be discovered prior to generating recommendations; this can be improved by generating ARs in real time from a subset of transactions within the active user’s neighborhood. High support => better scalability and accuracy, low coverage.
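
The recommendation step described above can be sketched as follows, assuming the rules have already been mined offline. The rule representation (a frozenset antecedent, a single consequent page, and a confidence value) and the function name `recommend_from_rules` are hypothetical choices for this example.

```python
def recommend_from_rules(rules, active_pages, top_n=3):
    """Recommend consequents of rules supported by the active session.

    A rule is 'supported' when every page in its antecedent has been
    visited in the active session.
    """
    visited = set(active_pages)
    best = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= visited and consequent not in visited:
            # Keep the highest confidence seen for each candidate page.
            best[consequent] = max(best.get(consequent, 0.0), confidence)
    return sorted(best, key=lambda page: (-best[page], page))[:top_n]


# Hypothetical pre-mined rules: (antecedent, consequent, confidence).
rules = [
    (frozenset({"/index.html"}), "/products.html", 0.6),
    (frozenset({"/index.html", "/products.html"}), "/cart.html", 0.8),
    (frozenset({"/about.html"}), "/contact.html", 0.9),
]
candidates = recommend_from_rules(rules, ["/index.html", "/products.html"])
```

Only the second rule contributes here: the first predicts a page already visited, and the third is not supported by the active session.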

28 Personalization Techniques – Sequential Patterns (SPs)
Technique: Markov model. Advantages: better accuracy, since SPs contain more precise information about user navigation behavior. Drawbacks: low recommendation coverage. More suitable for predictive tasks, e.g., Web prefetching.
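
One simple reading of the Markov model technique is a first-order model that predicts the next page from the current one. The sketch below assumes that reading; the slides do not state the model order, and the function names and session data are invented for illustration.

```python
from collections import Counter, defaultdict


def train(sessions):
    """Count page-to-page transitions observed in the sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, page):
    """Return (most likely next page, estimated probability), or None
    when the page was never seen with a successor in the training data."""
    followers = counts.get(page)
    if not followers:
        return None
    total = sum(followers.values())
    # Ties on count are broken by page name so the result is deterministic.
    best_page, best_count = max(followers.items(), key=lambda kv: (kv[1], kv[0]))
    return best_page, best_count / total


# Hypothetical training sessions.
sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html", "/products.html"],
    ["/index.html", "/about.html"],
]
model = train(sessions)
prediction = predict_next(model, "/index.html")
unseen = predict_next(model, "/cart.html")
```

The `None` result for a page with no observed successor illustrates the low coverage drawback: the model can only predict from transitions it has already seen.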

29 Personalization Techniques – Hybrid Models
Hybrid models automatically switch among different personalization models based on the localized degree of hyperlink connectivity. High connectivity degree => non-SP models. Low connectivity degree and deeper navigation path => SP models. Performance: better than any individual model.

30 Future Research Directions
Usage Mining on the Semantic Web: helps to build the Semantic Web; with the Semantic Web, WUM can be improved. Multimedia Web Data Mining: representation, problem solving, and learning from multimedia data is indeed a challenge.

31 Future Research Directions (Cont.)
Soft Computing Technology for Web Mining. Fuzzy logic: deals with imprecision and conceptual data; used in clustering Web log data and mining ARs. Neural networks: adaptive to new data and information; suitable for parallel processing; robust to missing, confusing, ill-defined data; capable of modeling non-linear decision boundaries; effective for learning user profiles. Genetic algorithms: randomized search and optimization guided by evaluation criteria; efficient, adaptive, robust, parallel processing; used in search and query optimization and in predicting user preferences.

32 Future Research Directions (Cont.)
Analysis of Discovered Patterns: research on efficient, flexible, and powerful analysis tools. More Applications: temporal evolution of usage behavior; improving Web services; detecting credit card fraud. Privacy issues.

33 Conclusions

