Discovery of Aggregate Usage Profiles for Web Personalization


1 Discovery of Aggregate Usage Profiles for Web Personalization
Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Jim Wiltshire
School of Computer Science, Telecommunications, and Information Systems, DePaul University

2 Web Personalization
The Problem
- dynamically serve customized content (pages, products, etc.) to users based on their profiles, preferences, or expected interests
Current Approaches
- rule-based filtering: usually relies on a static profile for users, in part obtained through explicit registration
- collaborative filtering: usually requires explicit ratings from users on similar types of objects
- content-based filtering: learn/store personal profiles locally or on the server side, based on content similarity of the user profile to pages or product descriptions
Limitations of Current Technologies
- user input may be subjective and prone to bias
- explicit (and non-binary) user ratings may not be available
- profiles may be static and can become outdated quickly
- collaborative filtering: problems with scalability due to sparse data
- content-based filtering: may miss other semantic relationships among objects

3 Usage-Based Web Personalization
Basic Idea
- find aggregate user profiles by automatically discovering user access patterns through Web usage mining (offline process)
- data sources for mining include server logs, other click-stream data (e.g., product-oriented user events), and site structure
- match a user's active session against the discovered profiles to provide dynamic content (online process)
Advantages / Goals
- profiles are based on objective information (how users actually use the site)
- no explicit user ratings or interaction with users (to enter a profile, etc.)
- helps preserve user privacy by making effective use of anonymous data
- usage data captures relationships missed by content-based approaches
- can help enhance the effectiveness of collaborative or content-based filtering techniques

4 Automatic Web Personalization: Offline Process
[Diagram. Inputs: Server Logs & Other Click-Stream Data, Site Files, Domain Knowledge. Data Preparation: Data Cleaning, Session Identification, Pageview Identification, Transaction Identification, Support Filtering. Intermediate output: User Transaction File. Usage Mining: Transaction Clustering, Pageview Clustering, Association-Rule Discovery, Frequent Itemsets. Output: Usage Profiles.]

5 Automatic Web Personalization: Online Process
[Diagram. Components: Client Browser, Web Server, Active Session, Recommendation Engine, Recommendations, Usage Profiles (input from the batch process).]

6 Data Preparation Tasks
Preprocess and filter logs and other usage data
- remove redundant references and create pageviews
- use domain knowledge to assign types to pageviews
- handle references to scripts creating dynamic pages
- map logs against site topology
Identify user sessions and transactions
- heuristics based on IP, referrer, agent fields, and session time-outs are used to identify unique user sessions (may need to infer missing references)
- intra-session transactions can be obtained based on a model of user behavior (involves classifying references as "content" or "navigational" for each user)
- weights are assigned to each pageview based on static pageview types as well as some measure of user interest (e.g., duration of pageview)
Support filtering
- remove very low/high support pageviews
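The session time-out heuristic and support filtering described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `sessionize` and `support_filter`, the 30-minute time-out, and the `(ip, agent, timestamp, url)` input layout are all assumptions, and the referrer-based heuristics and the inference of missing references are omitted.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed value, not taken from the slides


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group hits into sessions.

    hits: iterable of (ip, agent, timestamp_seconds, url) tuples.
    A user is approximated by the (ip, agent) pair; a gap longer than
    `timeout` between consecutive hits starts a new session.
    """
    by_user = defaultdict(list)
    for ip, agent, ts, url in hits:
        by_user[(ip, agent)].append((ts, url))

    sessions = []
    for visits in by_user.values():
        visits.sort()  # order each user's hits by timestamp
        current = [visits[0][1]]
        for (prev_ts, _), (ts, url) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
        sessions.append(current)
    return sessions


def support_filter(sessions, min_support, max_support):
    """Drop pageviews whose support (fraction of sessions containing
    them) falls outside [min_support, max_support]."""
    n = len(sessions)
    counts = defaultdict(int)
    for s in sessions:
        for url in set(s):  # count each pageview once per session
            counts[url] += 1
    keep = {u for u, c in counts.items() if min_support <= c / n <= max_support}
    filtered = [[u for u in s if u in keep] for s in sessions]
    return [s for s in filtered if s]  # discard sessions left empty
```

For example, `support_filter(sessionize(hits), 0.01, 0.9)` would keep only pageviews seen in between 1% and 90% of sessions; both cut-offs here are illustrative.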

7 Aggregate Usage Profiles
Characteristics of Aggregate Profiles
- the goal is to effectively capture common usage patterns from potentially anonymous click-stream data
- profiles are represented as weighted collections of pageviews
- weights represent the significance of pageviews within each profile
- profiles are overlapping in order to capture common interests among different groups/types of users
- multiple profiles may contribute to the recommendation set for a given user
Example Profiles from the ACR (Assoc. for Consumer Research) Site:

1.00 Call for Papers
0.67 ACR News Special Topics
0.67 CFP: Journal of Psychology and Marketing I
0.67 CFP: Journal of Psychology and Marketing II
0.67 CFP: Journal of Consumer Psychology II
0.67 CFP: Journal of Consumer Psychology I

1.00 CFP: Winter 2000 SCP Conference
1.00 Call for Papers
0.36 CFP: ACR 1999 Asia-Pacific Conference
0.30 ACR 1999 Annual Conference
0.25 ACR News Updates
0.24 Conference Update

8 Methodologies for the Discovery of Aggregate Profiles
Discovery of Profiles Based on Transaction Clusters
- cluster user transactions; features are significant pageviews identified in the preprocessing stage
- derive usage profiles (set of pageview-weight pairs) based on characteristics of each transaction cluster
Cluster Pageviews Directly
- compute overlapping clusters of pageviews based on co-occurrence patterns across transactions
- features are user transactions, so dimensionality poses a problem for traditional clustering algorithms
- we use Association-Rule Hypergraph Partitioning with an overlap factor

9 Profile Aggregation Based on Clustering Transactions (PACT)
Input
- set of relevant pageviews in the preprocessed log
- set of user transactions; each transaction is a pageview vector
Transaction Clusters
- each cluster contains a set of transaction vectors
- for each cluster, compute the centroid as the cluster representative
Aggregate Usage Profiles
- a set of pageview-weight pairs: for transaction cluster C, select each pageview p_i whose weight in the cluster centroid is greater than a pre-specified threshold
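The profile-derivation step can be sketched as below, assuming the transaction clusters have already been produced by some clustering algorithm. The names `centroid` and `usage_profile` and the dict-based transaction representation are hypothetical, and the centroid is taken to be the plain mean of the transaction vectors.

```python
def centroid(cluster, pageviews):
    """Mean weight of each pageview over the transactions in a cluster.

    cluster: list of transactions, each a dict {pageview: weight};
    pageviews absent from a transaction count as weight 0.
    """
    n = len(cluster)
    return {p: sum(t.get(p, 0.0) for t in cluster) / n for p in pageviews}


def usage_profile(cluster, pageviews, threshold):
    """Keep the pageview-weight pairs whose centroid weight exceeds
    the pre-specified threshold."""
    c = centroid(cluster, pageviews)
    return {p: w for p, w in c.items() if w > threshold}
```

With binary weights, a pageview's centroid weight is simply the fraction of the cluster's transactions that contain it, so the threshold acts as a minimum within-cluster frequency.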

10 Hypergraph-Based Clustering
- Construct a hypergraph from sets of related items
- Each hyperedge represents a frequent itemset
- Weight of each hyperedge can be based on the characteristics of frequent itemsets or association rules
- Recursively partition the hypergraph so that each partition contains only highly connected data items
- Given a hypergraph G=(V,E), we find a k-way partitioning such that the weight of the hyperedges that are cut is minimized
- The fitness of partitions is measured in terms of the ratio of weights of cut edges to the weights of uncut edges within the partitions
- The connectivity measures the percentage of edges within the partition with which the vertex is associated (used for filtering partitions)
- Vertices from partial edges can be added back to clusters based on a user-specified overlap factor

11 Profiles Based on Hypergraph Clusters of Pageviews
Input
- input for clustering is the set of large itemsets from the association rule module
- each itemset is a hyperedge (weights are a function of the interest of the itemset)
Aggregate Profiles (Pageview Clusters)
- hMETIS is used as the underlying hypergraph partitioning algorithm
- the clustering program directly outputs a set of overlapping pageview clusters
- the weight associated with pageview p in a cluster C is based on the connectivity value of p in the hypergraph partition
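As a rough illustration of the connectivity value, the sketch below follows the verbal definition given on slide 10 (the percentage of edges within the partition with which the vertex is associated). The function name `connectivity` and the set-based hypergraph representation are assumptions, and the exact weighting used by the authors may differ.

```python
def connectivity(vertex, partition, hyperedges):
    """Fraction of the hyperedges lying entirely inside `partition`
    that contain `vertex`.

    partition: set of pageviews; hyperedges: iterable of sets
    (each hyperedge is a frequent itemset).
    """
    inside = [e for e in hyperedges if set(e) <= set(partition)]
    if not inside:
        return 0.0  # no hyperedge falls within this partition
    return sum(1 for e in inside if vertex in e) / len(inside)
```

A pageview that appears in every itemset of its cluster gets weight 1.0, while one that appears in few of them gets a low weight and is a candidate for filtering.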

12 Recommendations Based on Usage Profiles
Match the current user's activity against the discovered usage profiles
- a sliding window over the active session captures the current user's "short-term" history depth
- usage profiles and the active session are treated as vectors
- the matching score is computed based on the similarity between vectors (e.g., normalized cosine similarity)
Recommendations
- each pageview is assigned a recommendation score based on:
  - its matching score to aggregate profiles
  - the "information value" of the pageview based on domain knowledge (e.g., link distance of the candidate recommendation to the active session)
- recommendations are contributed by multiple matching aggregate profiles
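A minimal sketch of the matching step, under stated assumptions: the active session is reduced to a binary vector over its last `window` pageviews, the match score is cosine similarity, and a candidate's recommendation score is the product of its profile weight and the match score. That product, the names `cosine` and `recommend`, and the default parameter values are illustrative choices; the "information value" factor is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(session, profiles, window=3, threshold=0.5):
    """Score pageviews not yet in the active window.

    session: list of visited pageviews, oldest first.
    profiles: list of dicts {pageview: weight}.
    Returns {pageview: score} for scores at or above `threshold`.
    """
    active = {p: 1.0 for p in session[-window:]}  # sliding window
    scores = {}
    for profile in profiles:
        match = cosine(active, profile)
        for page, weight in profile.items():
            if page in active:
                continue  # do not recommend pages already visited
            score = weight * match  # assumed combination rule
            # several profiles may contribute; keep the best score
            scores[page] = max(scores.get(page, 0.0), score)
    return {p: s for p, s in scores.items() if s >= threshold}
```

Taking the maximum over profiles is one simple way to let multiple matching profiles contribute to the recommendation set; other aggregation rules are equally plausible.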

13 Experimental Set-up
The Data Sets
- log data from the Association for Consumer Research Web site
- 18342 transactions, 62 pageview URLs (after filtering)
- data set divided into training and evaluation sets
Evaluation Methodology
- a portion of each transaction (based on a specified window size) in the evaluation set was used to generate a recommendation set (based on a given recommendation threshold)
- for each transaction, the overall coverage of the recommendation set was divided by the number of recommendations to produce an accuracy measure
- the overall score was computed (for each threshold) by taking the average scores over all transactions in the evaluation set
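One possible reading of this accuracy measure is sketched below. It assumes that the first `window` pageviews of a transaction form the active session, and that "coverage" means the number of recommended pageviews that appear in the held-out remainder of the transaction; both are interpretations rather than details stated on the slide. The names `transaction_accuracy` and `overall_score` are hypothetical, and `recommender` stands for any function that maps an active session to a collection of recommended pageviews.

```python
def transaction_accuracy(transaction, recommender, window):
    """Hits among the recommendations divided by the number of
    recommendations, for a single evaluation transaction."""
    active = transaction[:window]
    held_out = set(transaction[window:])
    recs = set(recommender(active))
    if not recs:
        return 0.0  # no recommendations were produced
    return len(recs & held_out) / len(recs)


def overall_score(transactions, recommender, window):
    """Average per-transaction accuracy over the evaluation set."""
    scores = [transaction_accuracy(t, recommender, window) for t in transactions]
    return sum(scores) / len(scores)
```

Repeating `overall_score` with recommenders configured at different recommendation thresholds would give one score per threshold, as described above.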

14 Average Visit Percentage
AVP measures the likelihood that a user who visits any page in a given profile also visits other pages in that profile.

15 Evaluation: Measuring Recommendation Accuracy
Recommendation accuracy results, using an active session window of size 3.

16 Evaluation: Impact of Filtering
Comparison of PACT and Hypergraph (using window size 2) for filtered and unfiltered data sets. Filtering involved the removal of top-level navigational pages from the data set, leaving only deeper content-oriented pages.

17 Conclusions
Usage-Based Web Personalization
- results suggest that effective personalization can be achieved even with anonymous and short-term click-stream data
- possibly useful in the early stages of personalization, when more detailed profiles are not available for individual users
- could be used effectively in conjunction with other methods based on content-based or collaborative filtering
Which Method is Best?
- PACT may be most appropriate when the goal is to provide a more general personalization solution involving a variety of objects across the whole site
- Hypergraph may be most appropriate when the goal is to provide a highly focused set of recommendations for specific portions of the site
- in practice, usage-based methods need to be combined with other techniques to provide an integrated solution

