Discovery of Aggregate Usage Profiles for Web Personalization

Discovery of Aggregate Usage Profiles for Web Personalization Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Yuqing Sun, Jim Wiltshire WebKDD 2000.

Discovery of Aggregate Usage Profiles for Web Personalization Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Jim Wiltshire School of Computer Science, Telecommunications, and Information Systems DePaul University

Web Personalization
The Problem
- dynamically serve customized content (pages, products, etc.) to users based on their profiles, preferences, or expected interests
Current Approaches
- rule-based filtering: usually relies on static user profiles, obtained in part through explicit registration
- collaborative filtering: usually requires explicit ratings from users on similar types of objects
- content-based filtering: learns and stores personal profiles locally or on the server side, based on the content similarity of user profiles to pages or product descriptions
Limitations of Current Technologies
- user input may be subjective and prone to bias
- explicit (and non-binary) user ratings may not be available
- profiles may be static and can quickly become outdated
- collaborative filtering: problems with scalability and with sparse data
- content-based filtering: may miss other semantic relationships among objects

Usage-Based Web Personalization
Basic Idea
- find aggregate user profiles by automatically discovering user access patterns through Web usage mining (offline process)
  - data sources for mining include server logs, other click-stream data (e.g., product-oriented user events), and site structure
- match a user's active session against the discovered profiles to provide dynamic content (online process)
Advantages / Goals
- profiles are based on objective information (how users actually use the site)
- no explicit user ratings or interaction with users (to enter a profile, etc.)
- helps preserve user privacy by making effective use of anonymous data
- usage data captures relationships missed by content-based approaches
- can help enhance the effectiveness of collaborative or content-based filtering techniques

Automatic Web Personalization: Offline Process
[Diagram] Inputs: Server Logs & Other Click-Stream Data, Site Files, Domain Knowledge
- Data Preparation: Data Cleaning, Pageview Identification, Session Identification, Transaction Identification, and Support Filtering, producing the User Transaction File
- Usage Mining: Transaction Clustering, Pageview Clustering, and Association-Rule Discovery (Frequent Itemsets), producing the Usage Profiles

Automatic Web Personalization: Online Process
[Diagram] The Client Browser and Web Server supply the Active Session to the Recommendation Engine, which combines it with the Usage Profiles (input from the batch process) to produce Recommendations.

Data Preparation Tasks
Preprocess and filter logs and other usage data
- remove redundant references and create pageviews
- use domain knowledge to assign types to pageviews
- handle references to scripts that create dynamic pages
- map logs against the site topology
Identify user sessions and transactions
- heuristics based on the IP, referrer, and agent fields, and on session time-outs, are used to identify unique user sessions (missing references may need to be inferred)
- intra-session transactions can be obtained from a model of user behavior (this involves classifying each user's references as "content" or "navigational")
- weights are assigned to each pageview based on static pageview types as well as some measure of user interest (e.g., the duration of the pageview)
Support filtering
- remove pageviews with very low or very high support
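The session-identification and support-filtering steps can be sketched in Python as follows. This is a minimal sketch, assuming log entries have already been parsed into records with IP, agent, timestamp, and URL fields; the `LogEntry` type, the function names, and the 30-minute time-out are illustrative assumptions, not values from the presentation. It uses only the IP/agent pair plus a time-out, and does not attempt the referrer-based inference of missing references.

```python
from collections import defaultdict
from dataclasses import dataclass

SESSION_TIMEOUT = 30 * 60  # assumed 30-minute idle time-out, in seconds

@dataclass
class LogEntry:  # hypothetical parsed log record
    ip: str
    agent: str
    timestamp: int  # seconds since epoch
    url: str

def identify_sessions(entries, timeout=SESSION_TIMEOUT):
    """Group entries by (IP, agent) and start a new session after an idle gap > timeout."""
    by_user = defaultdict(list)
    for e in entries:
        by_user[(e.ip, e.agent)].append(e)
    sessions = []
    for user_entries in by_user.values():
        user_entries.sort(key=lambda e: e.timestamp)
        current = [user_entries[0]]
        for prev, cur in zip(user_entries, user_entries[1:]):
            if cur.timestamp - prev.timestamp > timeout:
                sessions.append([e.url for e in current])
                current = []
            current.append(cur)
        sessions.append([e.url for e in current])
    return sessions

def support_filter(sessions, min_support, max_support):
    """Drop pageviews whose session-level support lies outside [min_support, max_support]."""
    n = len(sessions)
    counts = defaultdict(int)
    for s in sessions:
        for url in set(s):  # count each pageview at most once per session
            counts[url] += 1
    keep = {u for u, c in counts.items() if min_support <= c / n <= max_support}
    filtered = [[u for u in s if u in keep] for s in sessions]
    return [s for s in filtered if s]  # discard sessions left empty
```

Filtering on support fractions rather than raw counts keeps the thresholds comparable across logs of different sizes.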

Aggregate Usage Profiles
Characteristics of Aggregate Profiles
- the goal is to effectively capture common usage patterns from potentially anonymous click-stream data
- profiles are represented as weighted collections of pageviews
- weights represent the significance of pageviews within each profile
- profiles overlap in order to capture common interests among different groups/types of users
- multiple profiles may contribute to the recommendation set for a given user
Example Profiles from the ACR (Assoc. for Consumer Research) Site:
Profile 1
  1.00 Call for Papers
  0.67 ACR News Special Topics
  0.67 CFP: Journal of Psychology and Marketing I
  0.67 CFP: Journal of Psychology and Marketing II
  0.67 CFP: Journal of Consumer Psychology II
  0.67 CFP: Journal of Consumer Psychology I
Profile 2
  1.00 CFP: Winter 2000 SCP Conference
  1.00 Call for Papers
  0.36 CFP: ACR 1999 Asia-Pacific Conference
  0.30 ACR 1999 Annual Conference
  0.25 ACR News Updates
  0.24 Conference Update

Methodologies for the Discovery of Aggregate Profiles
Discovery of Profiles Based on Transaction Clusters
- cluster user transactions; features are the significant pageviews identified in the preprocessing stage
- derive usage profiles (sets of pageview-weight pairs) from the characteristics of each transaction cluster
Clustering Pageviews Directly
- compute overlapping clusters of pageviews based on co-occurrence patterns across transactions
- features are user transactions, so dimensionality poses a problem for traditional clustering algorithms
- we use Association-Rule Hypergraph Partitioning with an overlap factor

Profile Aggregation Based on Clustering Transactions (PACT)
Input
- set of relevant pageviews in the preprocessed log
- set of user transactions; each transaction is a pageview vector
Transaction Clusters
- each cluster contains a set of transaction vectors
- for each cluster, compute the centroid as the cluster representative
Aggregate Usage Profiles
- a set of pageview-weight pairs: for transaction cluster $C$, select each pageview $p_i$ whose weight in the cluster centroid, $\mathrm{weight}(p_i, C) = \frac{1}{|C|} \sum_{t \in C} w(p_i, t)$, is greater than a pre-specified threshold, where $w(p_i, t)$ is the weight of $p_i$ in transaction vector $t$
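The profile-derivation step of PACT can be sketched with NumPy. This is a minimal sketch, assuming the transactions have already been clustered by some separate algorithm (k-means is one common choice, but the slide does not specify it here), so cluster labels are passed in; `pact_profiles` and its parameter names are hypothetical.

```python
import numpy as np

def pact_profiles(transactions, labels, pageviews, threshold):
    """Derive one aggregate profile per transaction cluster.

    transactions: (n_transactions, n_pageviews) array of pageview weights w(p, t).
    labels: cluster index of each transaction, produced by an assumed external clustering step.
    pageviews: names of the pageview dimensions.
    threshold: a pageview enters the profile only if its centroid weight exceeds this.
    """
    profiles = {}
    for c in np.unique(labels):
        members = transactions[labels == c]
        # Centroid component = weight(p, C) = mean of w(p, t) over transactions t in C.
        centroid = members.mean(axis=0)
        profiles[int(c)] = {
            pageviews[i]: float(centroid[i])
            for i in np.flatnonzero(centroid > threshold)
        }
    return profiles
```

Because each profile is a thresholded centroid, a pageview can appear in several profiles, which is how PACT profiles come to overlap.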

Hypergraph-Based Clustering
Construct a hypergraph from sets of related items
- each hyperedge represents a frequent itemset
- the weight of each hyperedge can be based on the characteristics of frequent itemsets or association rules
Recursively partition the hypergraph so that each partition contains only highly connected data items
- given a hypergraph G=(V,E), we find a k-way partitioning such that the weight of the hyperedges that are cut is minimized
- the fitness of partitions is measured in terms of the ratio of weights of cut edges to the weights of uncut edges within the partitions
- the connectivity measures the percentage of edges within the partition with which the vertex is associated; it is used for filtering partitions
- vertices from partial edges can be added back to clusters based on a user-specified overlap factor
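The connectivity-based filtering and the overlap step can be sketched as follows, given one partition already produced by a partitioner such as hMETIS. This is a minimal sketch: hyperedges are modeled as frozensets of pageviews, and the exact add-back rule for the overlap factor (add the rest of a cut hyperedge when at least that fraction of its vertices lies in the partition) is an assumption, since the slide does not define it. All function names are hypothetical.

```python
def internal_edges(partition, hyperedges):
    """Hyperedges lying entirely inside the partition."""
    return [e for e in hyperedges if e <= partition]

def connectivity(v, partition, hyperedges):
    """Fraction of the partition's internal hyperedges that contain vertex v."""
    inside = internal_edges(partition, hyperedges)
    if not inside:
        return 0.0
    return sum(1 for e in inside if v in e) / len(inside)

def filter_partition(partition, hyperedges, min_conn):
    """Keep only vertices whose connectivity is at least min_conn."""
    return {v for v in partition if connectivity(v, partition, hyperedges) >= min_conn}

def add_overlap(partition, hyperedges, overlap):
    """Assumed rule: if a cut hyperedge has at least `overlap` of its vertices
    in the partition, add its remaining vertices back to the cluster."""
    extended = set(partition)
    for e in hyperedges:
        shared = len(e & partition)
        if 0 < shared < len(e) and shared / len(e) >= overlap:
            extended |= e
    return extended
```

Applying `add_overlap` to each partition independently lets a vertex end up in more than one cluster, which is what makes the resulting pageview clusters overlap.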

Profiles Based on Hypergraph Clusters of Pageviews
Input
- the input for clustering is the set of large itemsets from the association-rule module
- each itemset is a hyperedge (weights are a function of the interest of the itemset)
Aggregate Profiles (Pageview Clusters)
- hMETIS is used as the underlying hypergraph partitioning algorithm
- the clustering program directly outputs a set of overlapping pageview clusters
- the weight associated with pageview $p$ in a cluster $C$ is based on the connectivity value of $p$ in the hypergraph partition, i.e. the fraction of hyperedges lying entirely within $C$ that contain $p$: $\mathrm{weight}(p, C) = \frac{|\{e : e \subseteq C,\ p \in e\}|}{|\{e : e \subseteq C\}|}$
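Building the weighted hyperedges from frequent itemsets can be sketched as below. This is a minimal sketch, assuming the "interest" of an itemset is its support divided by the product of its items' individual supports (values above 1 indicate positive co-occurrence); `frequent_itemset_hyperedges` is a hypothetical name, and the brute-force enumeration stands in for a proper frequent-itemset miner such as Apriori, so it is only suitable for small illustrative inputs.

```python
from itertools import combinations
from math import prod

def frequent_itemset_hyperedges(transactions, min_support, max_size=3):
    """Return {itemset: weight} for frequent itemsets of size 2..max_size.

    Weight is the assumed interest measure:
    support(X) / product of support({x}) for x in X.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(1 for t in sets if itemset <= t) / n

    single = {i: support(frozenset([i])) for i in items}
    edges = {}
    for k in range(2, max_size + 1):
        for combo in combinations(items, k):  # brute force; Apriori would prune here
            s = support(frozenset(combo))
            if s >= min_support:
                edges[frozenset(combo)] = s / prod(single[i] for i in combo)
    return edges
```

The resulting dictionary maps each hyperedge to its weight, which is the form a weighted hypergraph partitioner expects as input.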

Recommendations Based on Usage Profiles
Match the current user's activity against the discovered usage profiles
- a sliding window over the active session captures the current user's "short-term" history depth
- usage profiles and the active session are treated as vectors
- the matching score is computed from the similarity between the vectors (e.g., normalized cosine similarity)
Recommendations
- each pageview is assigned a recommendation score based on:
  - its matching score to the aggregate profiles
  - the "information value" of the pageview based on domain knowledge (e.g., the link distance of the candidate recommendation from the active session)
- recommendations are contributed by multiple matching aggregate profiles
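The online matching step can be sketched as follows. This is a minimal sketch, assuming binary weights for the pageviews in the session window and, as an illustrative choice rather than the presentation's formula, a per-profile score equal to the square root of (profile weight times match score), keeping the best score when several profiles recommend the same page. The link-distance "information value" is omitted, and `recommend`, `window`, and `threshold` are hypothetical names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {pageview: weight}."""
    dot = sum(w * b.get(p, 0.0) for p, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(active_session, profiles, window=3, threshold=0.0):
    """Rank unseen pageviews by their score across all matching profiles."""
    recent = active_session[-window:]  # sliding window = short-term history
    session_vec = {p: 1.0 for p in recent}  # assumed binary session weights
    scores = {}
    for profile in profiles:
        match = cosine(session_vec, profile)
        if match <= 0:
            continue  # profile shares nothing with the current window
        for page, weight in profile.items():
            if page in recent:
                continue  # do not recommend pages already in the window
            score = math.sqrt(weight * match)  # assumed combination rule
            scores[page] = max(scores.get(page, 0.0), score)
    ranked = [(p, s) for p, s in scores.items() if s >= threshold]
    return sorted(ranked, key=lambda ps: -ps[1])
```

For example, with the session `["home", "cfp"]` and a profile that contains "cfp", the other pages of that profile are returned, while profiles sharing no page with the window contribute nothing.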

Experimental Set-up
The Data Sets
- log data from the Association for Consumer Research Web site
- 18,342 transactions and 62 pageview URLs (after filtering)
- the data set was divided into training and evaluation sets
Evaluation Methodology
- for each transaction in the evaluation set, a portion (determined by a specified window size) was used as the active session to generate a recommendation set (based on a given recommendation threshold)
- for each transaction, the number of recommendations that were covered by the remainder of that transaction was divided by the total number of recommendations to produce an accuracy measure
- the overall score was computed (for each threshold) by averaging the scores over all transactions in the evaluation set
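The evaluation loop can be sketched as below. This is a minimal sketch, assuming that the first `window` pageviews of each evaluation transaction form the active session, that the remaining pageviews are the hidden set the recommendations are checked against, and that an empty recommendation set scores zero accuracy; coverage (hits divided by the size of the hidden set) is computed alongside as an extra, assumed measure. `evaluate` and the `recommender` callable are hypothetical.

```python
def evaluate(eval_transactions, recommender, window, threshold):
    """Average accuracy and coverage over the evaluation transactions.

    recommender: callable mapping an active session (list of pageviews)
    to a {pageview: score} dict.
    accuracy = |R & hidden| / |R|; coverage = |R & hidden| / |hidden|.
    """
    acc_total = cov_total = 0.0
    counted = 0
    for t in eval_transactions:
        session, hidden = t[:window], set(t[window:])
        if not hidden:
            continue  # nothing left in the transaction to predict
        scores = recommender(session)
        recs = {p for p, s in scores.items() if s >= threshold}
        hits = len(recs & hidden)
        acc_total += hits / len(recs) if recs else 0.0
        cov_total += hits / len(hidden)
        counted += 1
    if counted == 0:
        return 0.0, 0.0
    return acc_total / counted, cov_total / counted
```

Running this for a range of thresholds yields one accuracy value per threshold, which is the form of the curves described in the evaluation slides.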

Average Visit Percentage
AVP measures the likelihood that a user who visits any page in a given profile also visits the other pages in that profile
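AVP is stated here only in words. One plausible formalization, which is an assumption rather than the exact formula used in the study, averages the fraction of a profile's pages visited over the transactions that touch the profile at all; `average_visit_percentage` is a hypothetical name.

```python
def average_visit_percentage(profile_pages, transactions):
    """Assumed AVP: among transactions that visit at least one page of the
    profile, the mean fraction of the profile's pages that they visit."""
    profile = set(profile_pages)
    fractions = [
        len(profile & set(t)) / len(profile)
        for t in transactions
        if profile & set(t)  # only transactions that touch the profile count
    ]
    return sum(fractions) / len(fractions) if fractions else 0.0
```

A profile whose pages are always visited together scores close to 1, while one whose pages are visited in isolation scores close to 1 divided by the profile size.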

Evaluation: Measuring Recommendation Accuracy
Recommendation accuracy results, using an active session window of size 3.

Evaluation: Impact of Filtering Comparison of PACT and Hypergraph (using window size 2) for filtered and unfiltered data sets. Filtering involved the removal of top-level navigational pages from the data set, leaving only deeper content-oriented pages.

Conclusions
Usage-Based Web Personalization
- results suggest that effective personalization can be achieved even with anonymous and short-term click-stream data
- possibly useful in the early stages of personalization, when more detailed profiles are not yet available for individual users
- could be used effectively in conjunction with methods based on content-based or collaborative filtering
Which Method is Best?
- PACT may be most appropriate when the goal is to provide a more general personalization solution involving a variety of objects across the whole site
- Hypergraph may be most appropriate when the goal is to provide a highly focused set of recommendations for specific portions of the site
- in practice, usage-based methods need to be combined with other techniques to provide an integrated solution