Discovery of Aggregate Usage Profiles for Web Personalization Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Yuqing Sun, Jim Wiltshire WebKDD 2000
System Architecture
Data Abstractions
Drafts from the W3C Web Characterization Activity (WCA)

TERM            DEFINITION
user            A single individual who is accessing files from one or more Web servers through a browser.
pageview        Every file that contributes to the display on a user's browser at one time. It is usually associated with a single user action.
clickstream     A sequential series of pageview requests.
user session    The clickstream of pageviews for a single user across the entire Web.
server session  The set of pageviews in a user session for a particular Web site.
episode         Any semantically meaningful subset of a user or server session.
Typical Web Usage Mining Preprocessing
Example
[Figure: site graph with pageview nodes A through T]
USER1: A B F O G A D
USER2: A B C J
USER3: L R
Usage Mining
After preprocessing, we have:
– A set of n pageview records, $P = \{p_1, p_2, \ldots, p_n\}$
– A set of m user transactions, $T = \{t_1, t_2, \ldots, t_m\}$
Each transaction can be viewed as an n-dimensional vector over the pageviews, $t = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$, where $w(p_i, t)$ is the weight of pageview $p_i$ in transaction $t$.
Goal of Usage Mining
– Aggregate usage profiles, each representing the behavior of a different group of users.
– Each item in a usage profile is a URL representing a relevant pageview object, and can have an associated weight representing its significance within the profile.
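The transaction-vector representation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_transaction_vectors` is hypothetical, and binary weights (1 if the pageview occurs in the transaction, 0 otherwise) are an assumption, since the slide does not say how w(p, t) is computed.

```python
from typing import List, Sequence


def build_transaction_vectors(
    pageviews: Sequence[str], transactions: Sequence[Sequence[str]]
) -> List[List[float]]:
    """Map each transaction to an n-dimensional vector over the pageview set P."""
    index = {p: i for i, p in enumerate(pageviews)}
    vectors = []
    for t in transactions:
        vec = [0.0] * len(pageviews)
        for p in t:
            if p in index:
                # Assumption: binary weight; repeated visits do not add up.
                vec[index[p]] = 1.0
        vectors.append(vec)
    return vectors


pageviews = ["A", "B", "C", "D"]
transactions = [["A", "B"], ["B", "C", "B"], ["D"]]
vectors = build_transaction_vectors(pageviews, transactions)
```

Other weighting schemes, such as time spent on a page, would only change the assignment inside the inner loop.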
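Transaction Clustering
Use the k-means algorithm to partition the transaction vectors in this pageview space into clusters.
PACT (Profile Aggregations based on Clustering Transactions)
Given a transaction cluster c, construct a usage profile $pr_c$:
$$pr_c = \{\, \langle p, \mathrm{weight}(p, pr_c) \rangle \mid p \in P,\ \mathrm{weight}(p, pr_c) \ge \mu \,\}$$
$$\mathrm{weight}(p, pr_c) = \frac{1}{|c|} \sum_{t \in c} w(p, t)$$
where $\mu$ is a minimum weight threshold.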
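The profile construction for one cluster can be sketched as below. The cluster is assumed to have been produced already (for example by k-means); the function name `pact_profile` and the threshold parameter `mu` are illustrative assumptions, with `mu` standing for a minimum mean weight below which a pageview is left out of the profile.

```python
from typing import Dict, Sequence


def pact_profile(
    cluster: Sequence[Sequence[float]], pageviews: Sequence[str], mu: float
) -> Dict[str, float]:
    """Build a usage profile from one transaction cluster.

    The weight of a pageview is its mean weight over the transactions
    in the cluster; pageviews whose weight is below mu are dropped.
    """
    size = len(cluster)
    profile = {}
    for i, p in enumerate(pageviews):
        weight = sum(t[i] for t in cluster) / size
        if weight >= mu:
            profile[p] = weight
    return profile


# Four transaction vectors over the pageviews A, B, C, D.
cluster = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
profile = pact_profile(cluster, ["A", "B", "C", "D"], mu=0.5)
```

In this toy cluster, A appears in every transaction and B in three of four, so both survive the threshold, while C and D (one of four each) do not.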
Pageview Clustering (1/2)
Use the Apriori algorithm to find frequent itemsets.
Use Association Rule Hypergraph Partitioning (ARHP) to find aggregate profiles.
Hypergraph H = (V, E)
V: the set of pageviews
E: the frequent itemsets, as weighted hyperedges (weight: average confidence)
[Figure: hypergraph over pageviews A through R]
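One possible reading of the "average confidence" hyperedge weight is sketched below. It assumes the frequent itemsets have already been found (for example with Apriori) and averages the confidence of every rule that has a single item of the itemset as its consequent. Both that reading and the helper names `support` and `average_confidence` are assumptions for illustration.

```python
from typing import FrozenSet, Iterable, Sequence


def support(itemset: Iterable[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def average_confidence(
    itemset: Iterable[str], transactions: Sequence[FrozenSet[str]]
) -> float:
    """Average confidence of the rules (itemset - {x}) -> x for each x in itemset."""
    items = frozenset(itemset)
    whole = support(items, transactions)
    confidences = []
    for consequent in items:
        antecedent_support = support(items - {consequent}, transactions)
        confidences.append(whole / antecedent_support if antecedent_support else 0.0)
    return sum(confidences) / len(confidences)


transactions = [
    frozenset({"A", "B"}),
    frozenset({"A", "B"}),
    frozenset({"A"}),
    frozenset({"B", "C"}),
]
weight_ab = average_confidence({"A", "B"}, transactions)
```

Here {A, B} has support 2/4 while A and B each have support 3/4, so both rules A -> B and B -> A have confidence 2/3, which is also the hyperedge weight.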
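Pageview Clustering (2/2)
[Figure: partitioned hypergraph over pageviews A through R]
$$\mathrm{Fitness}(C) = \frac{\sum_{e \subseteq C} \mathrm{Weight}(e)}{\sum_{|e \cap C| > 0} \mathrm{Weight}(e)}$$
$$\mathrm{Connectivity}(v) = \frac{|\{\, e \mid e \subseteq C,\ v \in e \,\}|}{|\{\, e \mid e \subseteq C \,\}|}$$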
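The two partition measures can be sketched as follows, assuming the hypergraph is stored as a dictionary from hyperedge (a frozenset of pageviews) to weight. The denominator of the fitness is taken here to range over every hyperedge that shares at least one vertex with the partition; that reading, the data layout, and the function names are assumptions.

```python
from typing import Dict, FrozenSet, Iterable


def fitness(partition: Iterable[str], hyperedges: Dict[FrozenSet[str], float]) -> float:
    """Weight of edges fully inside the partition over weight of edges touching it."""
    part = frozenset(partition)
    inside = sum(w for e, w in hyperedges.items() if e <= part)
    touching = sum(w for e, w in hyperedges.items() if e & part)
    return inside / touching if touching else 0.0


def connectivity(
    vertex: str, partition: Iterable[str], hyperedges: Dict[FrozenSet[str], float]
) -> float:
    """Fraction of the edges inside the partition that contain the vertex."""
    part = frozenset(partition)
    inside = [e for e in hyperedges if e <= part]
    if not inside:
        return 0.0
    return sum(1 for e in inside if vertex in e) / len(inside)


hyperedges = {
    frozenset({"A", "B"}): 0.5,
    frozenset({"B", "C"}): 0.25,
    frozenset({"C", "D"}): 0.25,
}
partition = {"A", "B", "C"}
fit = fitness(partition, hyperedges)
conn_a = connectivity("A", partition, hyperedges)
conn_b = connectivity("B", partition, hyperedges)
```

In this toy hypergraph the edge {C, D} crosses the partition boundary, so it counts in the denominator of the fitness but not in the numerator. Vertex B lies on both internal edges and A on only one, which is why B is the more strongly connected member of the partition.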
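Recommendation
Given a usage profile C, we can represent C as a vector $C = \langle w_1^C, w_2^C, \ldots, w_n^C \rangle$, where
$$w_i^C = \begin{cases} \mathrm{weight}(p_i, C), & \text{if } p_i \in C \\ 0, & \text{otherwise} \end{cases}$$
Given the current active session S, represented as a vector $S = \langle s_1, s_2, \ldots, s_n \rangle$:
$$\mathrm{match}(S, C) = \frac{\sum_k w_k^C \, s_k}{\sqrt{\sum_k (s_k)^2 \times \sum_k (w_k^C)^2}}$$
$$\mathrm{Rec}(S, p) = \sqrt{\mathrm{weight}(p, C) \cdot \mathrm{match}(S, C)}$$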
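The matching and scoring step can be sketched as below, with profiles and sessions stored as sparse dictionaries from pageview to weight. Several choices are assumptions rather than statements from the slide: the match score is normalized as a cosine similarity, the recommendation score takes the square root of the product of profile weight and match score, session entries are binary, and pages already visited in the session are not recommended. The function names are hypothetical.

```python
import math
from typing import Dict


def match(session: Dict[str, float], profile: Dict[str, float]) -> float:
    """Cosine similarity between the active session and a usage profile."""
    dot = sum(profile.get(p, 0.0) * s for p, s in session.items())
    norm = math.sqrt(
        sum(s * s for s in session.values()) * sum(w * w for w in profile.values())
    )
    return dot / norm if norm else 0.0


def recommend(session: Dict[str, float], profile: Dict[str, float]) -> Dict[str, float]:
    """Score each profile pageview that the session has not visited yet."""
    m = match(session, profile)
    return {
        p: math.sqrt(w * m)
        for p, w in profile.items()
        if session.get(p, 0.0) == 0.0  # assumption: skip pages already seen
    }


profile = {"A": 1.0, "B": 0.75, "C": 0.5}
session = {"A": 1.0, "B": 1.0}  # assumption: binary session weights
score = match(session, profile)
scores = recommend(session, profile)
```

With several profiles, the same two functions would be applied to each one, and a pageview's final score could be taken as its maximum over the matching profiles.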