1 Web Personalization and Recommender Systems
Bamshad Mobasher, School of Computing, DePaul University
2 What is Web Personalization?
Web Personalization: "personalizing the browsing experience of a user by dynamically tailoring the look, feel, and content of a Web site to the user's needs and interests."
Related phrases: mass customization, one-to-one marketing, site customization, target marketing.
Why personalize?
- broaden and deepen customer relationships
- provide continuous relationship marketing to build customer loyalty
- help automate the process of proactively marketing products to customers ("lights-out marketing")
- cross-sell and up-sell products
- measure customer behavior and track how well customers respond to marketing efforts
3 Personalization vs. Customization
It is a question of who controls the user's browsing experience.
Customization: the user controls and customizes the site or the product based on his/her preferences; usually manual, but sometimes semi-automatic based on a given user profile.
Personalization: done automatically based on the user's actions, the user's profile, and (possibly) the profiles of others with "similar" profiles.
6 Challenges and Pitfalls
Technical challenges:
- data collection and data preprocessing
- discovering actionable knowledge from the data
- choosing which personalization algorithms to use
Implementation/deployment challenges:
- what to personalize
- when to personalize
- degree of personalization or customization
- how to target information without being intrusive
7 Web Personalization & Recommender Systems
Dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests. The most common type of personalization is the recommender system, which combines a recommendation algorithm with a user profile.
8 Common Recommendation Techniques
Collaborative filtering: give recommendations to a user based on the preferences of "similar" users; preferences on items may be explicit or implicit.
Content-based filtering: give recommendations to a user based on items with "similar" content to the items in the user's profile.
Rule-based (knowledge-based) filtering: provide recommendations to users based on predefined (or learned) rules, e.g.:
  age(x, 25-35) and income(x, 70-100K) and children(x, >=3) => recommend(x, Minivan)
A minimal sketch of such a rule appears below.
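The following is a minimal sketch of rule-based filtering, not from the original slides: the rule above is expressed as a predicate over a made-up user-profile dictionary, and all names and thresholds are illustrative.

```python
# Each rule pairs a predicate over a user profile with an item to recommend.

def minivan_rule(user: dict) -> bool:
    """age(x, 25-35) and income(x, 70-100K) and children(x, >=3)."""
    return (25 <= user.get("age", 0) <= 35
            and 70_000 <= user.get("income", 0) <= 100_000
            and user.get("children", 0) >= 3)

RULES = [(minivan_rule, "Minivan")]

def rule_based_recommend(user: dict) -> list:
    # Recommend every item whose rule fires for this user.
    return [item for predicate, item in RULES if predicate(user)]

print(rule_based_recommend({"age": 30, "income": 85_000, "children": 3}))  # ['Minivan']
```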
9 The Recommendation Task
Basic formulation as a prediction problem: given a profile P_u for a user u and a target item i_t, predict the preference score of user u on item i_t.
Typically, the profile P_u contains preference scores by u on some other items {i_1, ..., i_k} different from i_t. Preference scores on i_1, ..., i_k may have been obtained explicitly (e.g., movie ratings) or implicitly (e.g., time spent on a product page or a news article).
10 Notes on User Profiling
Utilizing user profiles for personalization assumes that (1) past behavior is a useful predictor of future behavior, and (2) there is a wide variety of behaviors amongst users.
The basic task in user profiling is preference elicitation, which may be based on explicit judgments from users (e.g., ratings) or on implicit measures of user interest.
Automatic user profiling:
- use machine learning or data mining techniques to learn models of user behavior and preferences
- may build a model for each specific user, or build group profiles
- usually based on passive observation of user behavior
- advantages: less work for the user and the application writer; adaptive behavior; user and system build a trust relationship gradually
11 Consequences of Passiveness
Weak heuristics:
- example: clicking through multiple uninteresting pages en route to an interesting one
- example: the user browses to an uninteresting page, then goes for a coffee
- example: hierarchies tend to get more hits near the root
Cold start: no ability to fine-tune the profile or express interest without visiting "appropriate" pages.
Some possible alternatives/extensions to internally maintained profiles:
- expose the profile to the user (e.g., to fine-tune it)?
- expose it to other users/agents (e.g., collaborative filtering)?
- expose it to the web server (e.g., cnn.com custom news)?
12 Content-Based Filtering Systems
Track which pages/items the user visits and recommend other pages/items with similar content. This often involves client-side learning interface agents, and may require the user to enter a profile or to rate pages/objects as "interesting" or "uninteresting."
Advantages:
- useful for large information-based sites (e.g., portals) or for domains where items have content-rich features
- can be easily integrated with "content servers"
Disadvantages:
- may miss important pragmatic relationships among items (based on usage)
- not effective for small, niche sites or sites that are not content-oriented
13 Content-Based Recommenders
Predictions for unseen (target) items are computed based on their similarity (in terms of content) to the items in the user profile: given the items in user profile P_u, target items whose content closely matches the profile are recommended highly, and those with weaker content overlap are recommended "mildly." A sketch of this scoring follows.
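A minimal content-based scoring sketch using TF-IDF and cosine similarity via scikit-learn. The item texts, page names, and profile are made-up examples, not data from the slides.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

items = {
    "page1": "object oriented development course syllabus java",
    "page2": "data mining web usage clustering association rules",
    "page3": "object oriented design patterns java programming",
}
profile_items = ["page1"]          # items the user already liked/visited

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(items.values())
ids = list(items.keys())

# Profile vector: average of the TF-IDF vectors of the profile items.
profile_vec = np.asarray(matrix[[ids.index(p) for p in profile_items]].mean(axis=0))

scores = cosine_similarity(profile_vec, matrix)[0]
for item_id, score in sorted(zip(ids, scores), key=lambda x: -x[1]):
    if item_id not in profile_items:
        print(item_id, round(score, 2))   # page3 scores highest: similar content
```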
14 Content-Based Recommender Systems
15 Content-Based Recommenders: Personalized Search
How can the search engine determine the "user's context"? Consider the query "Madonna and Child": is the user an art historian, or a pop music fan? The engine needs to "learn" the user profile to disambiguate.
16 Content-Based Recommenders
Music recommendations and playlist generation. Example: Pandora.
17 Collaborative Filtering Recommenders
Predictions for unseen (target) items are computed based on other users with similar interest scores on the items in user u's profile, i.e., users with similar tastes (aka "nearest neighbors"). This requires computing correlations between user u and other users according to interest scores or ratings.
18 Collaborative Recommender Systems
21 Basic Collaborative Filtering Process
Two phases (shown as a diagram in the original slide):
- Neighborhood formation phase: the current user record is compared against historical user records (user-item ratings) to find the nearest neighbors.
- Recommendation phase: a combination function over the neighbors' ratings produces the recommendations.
Both the neighborhood formation and the recommendation phases are real-time components.
22 Collaborative Filtering: Measuring Similarities
Pearson correlation: weight neighbors by their degree of correlation with user U. A value of 1 means very similar, 0 means no correlation, and -1 means dissimilar. Over the items i co-rated by users U and J:

  sim(U, J) = SUM_i (r_{U,i} - rbar_U)(r_{J,i} - rbar_J) / ( sqrt(SUM_i (r_{U,i} - rbar_U)^2) * sqrt(SUM_i (r_{J,i} - rbar_J)^2) )

where rbar_J is the average rating of user J on all items.
This works well for user ratings (where there is at least a range, e.g., 1-5). It is not always possible: in some situations we may only have implicit binary values (e.g., whether a user did or did not select a document). Alternatively, a variety of distance or similarity measures can be used. A sketch follows.
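A small sketch of Pearson correlation between two users over their co-rated items. The dict-of-ratings data mirrors the Alice/User 3 example a couple of slides below; the function names are illustrative.

```python
from math import sqrt

def pearson(ratings_u: dict, ratings_j: dict) -> float:
    common = set(ratings_u) & set(ratings_j)          # co-rated items only
    if len(common) < 2:
        return 0.0
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_j = sum(ratings_j[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_j[i] - mean_j) for i in common)
    den = (sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common))
           * sqrt(sum((ratings_j[i] - mean_j) ** 2 for i in common)))
    return num / den if den else 0.0

alice = {"Item1": 5, "Item2": 2, "Item3": 3, "Item4": 3}
user3 = {"Item1": 4, "Item2": 2, "Item3": 3, "Item4": 2, "Item6": 1}
print(round(pearson(alice, user3), 2))   # ~0.90
```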
23 Collaborative Filtering: Making Predictions
When generating predictions from the nearest neighbors, neighbors can be weighted based on their distance (similarity) to the target user. To generate a prediction for target user a on item i:

  pred(a, i) = rbar_a + SUM_{u=1..k} sim(a,u) * (r_{u,i} - rbar_u) / SUM_{u=1..k} |sim(a,u)|

where rbar_a is the mean rating for user a; u_1, ..., u_k are the k nearest neighbors of a; r_{u,i} is the rating of user u on item i; and sim(a,u) is the Pearson correlation between a and u. This is a weighted average of deviations from the neighbors' mean ratings (and closer neighbors count more); see the sketch below.
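A sketch of the weighted-deviation prediction formula above, with a compact Pearson helper so the block is self-contained. The ratings are the same made-up toy data as before.

```python
from math import sqrt

def pearson(u, j):
    common = set(u) & set(j)
    if len(common) < 2:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mj = sum(j[i] for i in common) / len(common)
    num = sum((u[i] - mu) * (j[i] - mj) for i in common)
    den = (sqrt(sum((u[i] - mu) ** 2 for i in common))
           * sqrt(sum((j[i] - mj) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(a, item, neighbors):
    mean_a = sum(a.values()) / len(a)
    num = den = 0.0
    for u in neighbors:                        # the k nearest neighbors of a
        if item in u:
            s = pearson(a, u)
            mean_u = sum(u.values()) / len(u)
            num += s * (u[item] - mean_u)      # deviation from neighbor's mean
            den += abs(s)
    return mean_a + num / den if den else mean_a

alice = {"Item1": 5, "Item2": 2, "Item3": 3, "Item4": 3}
user3 = {"Item1": 4, "Item2": 2, "Item3": 3, "Item4": 2, "Item6": 1}
print(round(predict(alice, "Item6", [user3]), 2))
```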
24 Example Collaborative System

|        | Item1 | Item2 | Item3 | Item4 | Item5 | Item6 | Correlation with Alice |
|--------|-------|-------|-------|-------|-------|-------|------------------------|
| Alice  | 5     | 2     | 3     | 3     |       | ?     |                        |
| User 1 | 2     | 4     | 4     | 1     |       |       |                        |
| User 2 | 2     | 1     | 3     | 1     |       | 2     | 0.33                   |
| User 3 | 4     | 2     | 3     | 2     |       | 1     | 0.90 (best match)      |
| User 4 | 3     | 3     | 2     | 3     |       | 1     | 0.19                   |
| User 5 | 3     | 2     | 2     | 2     |       |       |                        |
| User 6 | 5     | 3     | 1     | 3     |       | 2     | 0.65                   |
| User 7 | 5     | 1     | 5     | 1     |       |       |                        |

Prediction for Alice on the target item uses k-nearest neighbor with k = 1: the best match is User 3.
25 Item-Based Collaborative Filtering
Find similarities among the items based on ratings across users, often measured with a variation of the cosine measure. The prediction of item i for user a is based on the past ratings of user a on items similar to i.
Suppose sim(Star Wars, Indep. Day) > sim(Jur. Park, Indep. Day) > sim(Termin., Indep. Day). Then the predicted rating for Karen on Indep. Day will be 7, because she rated Star Wars 7; that is, if we only use the most similar item. Otherwise, we can use the k most similar items and again take a weighted average.
26 Item-Based Collaborative Filtering

|                 | Item1 | Item2 | Item3 | Item4 | Item5 | Item6 |
|-----------------|-------|-------|-------|-------|-------|-------|
| Alice           | 5     | 2     | 3     | 3     |       | ?     |
| User 1          | 2     | 4     | 4     | 1     |       |       |
| User 2          | 2     | 1     | 3     | 1     |       | 2     |
| User 3          | 4     | 2     | 3     | 2     |       | 1     |
| User 4          | 3     | 3     | 2     | 3     |       | 1     |
| User 5          | 3     | 2     | 2     | 2     |       |       |
| User 6          | 5     | 3     | 1     | 3     |       | 2     |
| User 7          | 5     | 1     | 5     | 1     |       |       |
| Item similarity | 0.76  | 0.79  | 0.60  | 0.71  | 0.75  |       |

The bottom row gives the cosine similarity of each item to the target item (Item6); the best match is Item2, and the prediction for Alice on Item6 is based on her ratings of the best-matching item(s). A sketch follows.
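A minimal item-based CF sketch: cosine similarity between item rating columns (over co-rated entries), then a weighted average of the user's own ratings on the k most similar items. The matrix is toy data loosely echoing the table above; 0 means "not rated".

```python
import numpy as np

def cosine(a, b):
    mask = (a > 0) & (b > 0)               # co-rated entries only
    if not mask.any():
        return 0.0
    den = np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])
    return float(a[mask] @ b[mask]) / den if den else 0.0

# rows = users, cols = items; 0 = unrated (toy data)
R = np.array([[5, 2, 3, 3, 0, 0],
              [2, 1, 3, 1, 0, 2],
              [4, 2, 3, 2, 0, 1],
              [5, 3, 1, 3, 0, 2]])

def predict_item_based(R, user, target, k=2):
    sims = [(j, cosine(R[:, j], R[:, target]))
            for j in range(R.shape[1]) if j != target and R[user, j] > 0]
    top = sorted(sims, key=lambda x: -x[1])[:k]    # k most similar items
    num = sum(s * R[user, j] for j, s in top)
    den = sum(abs(s) for _, s in top)
    return num / den if den else 0.0

print(round(predict_item_based(R, user=0, target=5), 2))   # Alice on Item6
```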
27 Collaborative Filtering: Evaluation
- split users into train/test sets
- for each user a in the test set, split a's votes into observed (I) and to-predict (P)
- measure the average absolute deviation between predicted and actual votes in P:
  MAE = (1/|P|) SUM_{i in P} |predicted_i - actual_i|   (mean absolute error)
- average over all test users
28 Other Forms of Collaborative and Social Filtering
Social tagging (folksonomy): people add free-text tags to their content; where people happen to use the same terms, their content is linked. Frequently used terms float to the top, creating a kind of positive feedback loop for popular tags.
Examples: Del.icio.us, Flickr, Last.fm.
29 Social Tagging
By allowing loose coordination, tagging systems allow a social exchange of conceptual information, facilitating a similar but richer information exchange than collaborative filtering. If I comment that a movie is "romantic" or "a good holiday movie," everyone who overhears me has access to this metadata about the movie. The social exchange goes beyond collaborative filtering, facilitating the transfer of more abstract, conceptual information about the movie.
Note: the preference information is transferred implicitly; we are more likely to tag items we like than items we don't. And no algorithm mediates the connection between individuals: when we navigate by tags, we are directly connecting with others.
30 Social Tagging
Tagging deviates from standard mental models: instead of browsing topical, categorized navigation or searching for an explicit term or phrase, I use the language I use to define my world. Sharing my language and contexts creates community: tagging creates community through the overlap of perspectives, which leads to the creation of social networks that may further develop and evolve.
But does this lead to the dynamic evolution of complex concepts or knowledge? Collective intelligence?
31 Folksonomies
32 Hybrid Recommender Systems
33 Semantically Enhanced Collaborative Filtering
Basic idea: extend item-based collaborative filtering to incorporate both similarity based on ratings (or usage) and semantic similarity based on domain knowledge.
Semantic knowledge about items:
- can be extracted automatically from the Web based on domain-specific reference ontologies
- is used in conjunction with user-item mappings to create a combined similarity measure for item comparisons
- singular value decomposition is used to reduce noise in the semantic data
A semantic combination threshold determines the proportion of semantic and rating (or usage) similarities in the combined measure.
34 Semantically Enhanced Hybrid Recommendation
An extension of the item-based algorithm: use a combined similarity measure to compute item similarities:

  CombinedSim(i_p, i_q) = alpha * RateSim(i_p, i_q) + (1 - alpha) * SemSim(i_p, i_q)

where SemSim is the similarity of items i_p and i_q based on semantic features (e.g., keywords, attributes, etc.), RateSim is their similarity based on user ratings (as in standard item-based CF), and alpha is the semantic combination parameter:
- alpha = 1: only user ratings, no semantic similarity
- alpha = 0: only semantic features, no collaborative similarity
A one-line sketch follows.
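A sketch of the combined measure, assuming SemSim and RateSim are both normalized to the same scale; the similarity functions passed in are stand-ins for real implementations.

```python
def combined_sim(p, q, rate_sim, sem_sim, alpha=0.5):
    # alpha = 1.0 -> ratings only; alpha = 0.0 -> semantic features only
    return alpha * rate_sim(p, q) + (1 - alpha) * sem_sim(p, q)

print(combined_sim("i1", "i2",
                   rate_sim=lambda p, q: 0.8,
                   sem_sim=lambda p, q: 0.4,
                   alpha=0.5))   # 0.6
```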
35 Semantically Enhanced CF: Movie Data Set
Movie ratings come from the MovieLens data set; semantic information was extracted from IMDb based on a domain ontology (shown in the original slide).
36 Semantically Enhanced CF: Evaluation Setup
Used 10-fold cross-validation on randomly selected test and training data sets; each user in the training set has at least 20 ratings (scale 1-5).
37 Semantically Enhanced CF: New Items and Sparse Data
For new items, all movies with only one rating were selected as the test data; degrees of sparsity were simulated using different ratios for the training data.
38 Collaborative Filtering: Problems
Problems with standard CF:
- the major problem is scalability: neighborhood formation is done in real time
- a small number of users relative to items may result in poor performance
- data become too sparse to provide accurate predictions
- the "new item" problem
- vulnerability to attacks (we will come back to this later)
Problems in the context of clickstream / e-commerce data:
- explicit user ratings are not available
- features are binary (a visit or non-visit for a particular item) or a function of the time spent on a particular item
- a visit to a page is not necessarily an indication of interest in that item
- the number of user records (and items) is far larger than in the standard domains for CF, where users are limited to purchasers or people who rated items
- need to rely on very short user histories
39 Web Mining Approach to Personalization
Basic idea:
- generate aggregate user models (usage profiles) by discovering user access patterns through Web usage mining (offline process): clustering user transactions, clustering items/pageviews, association rule mining, sequential pattern discovery
- match a user's active session against the discovered models to provide dynamic content (online process)
Advantages:
- no explicit user ratings or interaction with users required
- helps preserve user privacy by making effective use of anonymous data
- enhances the effectiveness and scalability of collaborative filtering
- more accurate and broader recommendations than content-only approaches
40 Automatic Web Personalization: Offline Process
Data preparation phase: Web and application server logs go through data preprocessing (data cleaning, pageview identification, sessionization, data integration, data transformation) to produce a user transaction database.
Pattern discovery phase: usage mining (transaction clustering, pageview clustering, correlation analysis, association rule mining, sequential pattern mining) produces patterns, which pattern analysis (pattern filtering, aggregation, characterization) turns into aggregate usage profiles, drawing on site content and structure and on domain knowledge.
41 Automatic Web Personalization: Online Process
The recommendation engine sits alongside the Web server: the client browser's active session is matched against the aggregate usage profiles, the stored user profile, and domain knowledge (together forming an integrated user profile), and recommendations are returned with the requested page.
42 Conceptual Representation of User Transactions or Sessions
Each session or user transaction is represented as a vector of weights over the pageviews/objects. Raw weights are usually based on the time spent on a page, but in practice they need to be normalized and transformed, as sketched below.
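A sketch of turning raw time-spent values into normalized session vectors. The numbers are made up, and unit-length normalization is only one common choice.

```python
import numpy as np

# rows = sessions, cols = pageviews; values = seconds spent (toy data)
raw = np.array([[30.0, 0.0, 90.0],
                [10.0, 20.0, 0.0]])

# Normalize each session vector to unit length so long sessions don't dominate.
norm = raw / np.linalg.norm(raw, axis=1, keepdims=True)
print(norm.round(2))
```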
43 Real-Time Recommendation Engine
- Keep track of the user's navigational history through the site: a fixed-size sliding window over the active session captures the current user's "short-term" history depth.
- Match the current user's activity against the discovered profiles: profiles can be based on aggregate usage profiles, or obtained directly from association rules or sequential patterns.
- Dynamically generated recommendations are added to the returned page: each pageview can be assigned a recommendation score based on its matching score to user profiles (e.g., aggregate usage profiles) and on the "information value" of the pageview based on domain knowledge (e.g., the link distance of the candidate recommendation to the active session).
44 Recommendations Based on Aggregate Profiles
The matching score is computed using cosine similarity: the user's active session (the pageviews in the current window) is compared to each aggregate profile, with both viewed as pageview vectors. The weight of an item in the profile vector is the significance weight of that item for the profile; the weights of items in the session vector can be all 1's, or based on some method for determining their significance in the current session.
Generating recommendations from matching profiles:
- from each matching profile, recommend the items not already in the user session window and not directly linked from the pages in the current session window
- the recommendation score for an item is a combination of the profile matching score (similarity to the session window) and the weight of the item in that profile
- additionally, items farther away from the user's current location can be weighted higher (i.e., considered better recommendations)
A sketch of this matching appears below.
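A sketch of matching an active-session window against aggregate profiles with cosine similarity and scoring candidates as (profile match x item weight); this simple product is one possible combination, not necessarily the one used in the original system. The profile mirrors the clustering example a few slides below.

```python
import numpy as np

pages = ["A.html", "B.html", "C.html", "D.html", "E.html", "F.html"]

profiles = {
    "profile1": {"B.html": 1.0, "F.html": 1.0, "A.html": 0.75, "C.html": 0.25},
}
session_window = {"A.html": 1.0, "B.html": 1.0}   # current short-term history

def to_vec(weights):
    return np.array([weights.get(p, 0.0) for p in pages])

def cosine(a, b):
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / den if den else 0.0

s = to_vec(session_window)
for name, prof in profiles.items():
    match = cosine(s, to_vec(prof))
    for page, w in prof.items():
        if page not in session_window:
            # recommendation score = profile match x item weight in profile
            print(name, page, round(match * w, 2))   # F.html scores highest
```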
45 Discovery of Aggregate Profiles
Transaction clusters as aggregate profiles:
- each transaction is viewed as a pageview vector
- each cluster contains a set of transaction vectors with a centroid
- each centroid acts as an aggregate profile, with the centroid's value for pageview p_i representing the weight of p_i in the profile
Personalization then involves computing the similarity between a current user's profile (or the active user session) and the cluster centroids; a clustering sketch follows.
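A sketch of deriving aggregate profiles by clustering transaction vectors with k-means (scikit-learn), keeping centroid weights above a threshold as the profile. The binary transactions are toy data in the style of the example two slides below.

```python
import numpy as np
from sklearn.cluster import KMeans

pages = ["A.html", "B.html", "C.html", "D.html", "E.html", "F.html"]
transactions = np.array([
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 0],
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(transactions)
for c, centroid in enumerate(km.cluster_centers_):
    # Keep only pageviews with significant weight in the centroid.
    profile = {p: round(w, 2) for p, w in zip(pages, centroid) if w >= 0.25}
    print(f"PROFILE {c}:", profile)
```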
46 Web Usage Mining: Clustering Example
Transaction clusters: cluster similar user transactions and use the centroid of each cluster as a usage profile (a representative for a user segment). Sample cluster centroid from a department Web site (cluster size = 330):

| Support | URL | Pageview Description |
|---------|-----|----------------------|
| 1.00 | /courses/syllabus.asp?course=450-96-303&q=3&y=2002&id=290 | SE 450 Object-Oriented Development class syllabus |
| 0.97 | /people/facultyinfo.asp?id=290 | Web page of the lecturer who taught the above course |
| 0.88 | /programs/ | Current Degree Descriptions 2002 |
| 0.85 | /programs/courses.asp?depcode=96&deptmne=se&courseid=450 | SE 450 course description in SE program |
| 0.82 | /programs/2002/gradds2002.asp | M.S. in Distributed Systems program description |
47 Using Clusters for Personalization
Result of clustering the original session/user data:

PROFILE 0 (cluster size = 3): 1.00 C.html, 1.00 D.html
PROFILE 1 (cluster size = 4): 1.00 B.html, 1.00 F.html, 0.75 A.html, 0.25 C.html
PROFILE 2 (cluster size = 3): 1.00 A.html, 1.00 D.html, 1.00 E.html, 0.33 C.html

Given an active session A -> B, the best-matching profile is Profile 1. This may result in a recommendation for page F.html, since it appears with high weight in that profile.
48 Association Rules & Personalization
Approach of Fu, Budzik, Hammond (2000):
- proposed a solution to the problem of reduced coverage due to sparse data
- rank all discovered rules by the degree of intersection between the rule's left-hand side and the user's active session, then generate the top k recommendations
- problem: requires the generation of all association rules, and a search of the full rule space during the recommendation process
Approach of Lin, Alvarez, Ruiz (2000):
- find an appropriate number of rules for each target user by automatically selecting the minimum support
- the recommendation engine generates association rules among both users and articles
- problem: requires online generation of the relevant rules for each user
49 Association Rules & Personalization
Approach of Mobasher et al. (2001):
- discovered frequent itemsets are stored in an "itemset graph" (an extension of the lexicographic tree structure of Agrawal et al., 1999)
- each node at depth d in the graph corresponds to an itemset I of size d, and is linked to the itemsets of size d+1 that contain I
- the single root node at level 0 corresponds to the empty itemset
- frequent itemsets are matched against a user's active session S by searching the graph to depth |S|
- recommendation generation can be done in constant time and does not require a priori generation of association rules from the frequent itemsets
- a recommendation r is an item at level |S|+1 whose recommendation score is the confidence of the rule S => r
A sketch of this scheme appears below.
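A sketch of recommending from frequent itemsets without generating rules: itemsets are indexed by their contents (a flattened stand-in for the itemset graph), the active session S is extended by single items, and each extension is scored by confidence = supp(S + item) / supp(S). The toy supports mirror the itemset-graph example a few slides below.

```python
support = {
    frozenset("BE"): 5,
    frozenset("ABE"): 5,
    frozenset("BCE"): 4,
}

def recommend(session, support, min_conf=0.5):
    s = frozenset(session)
    recs = {}
    for itemset, supp in support.items():
        # Only itemsets that extend the session by exactly one item.
        if len(itemset) == len(s) + 1 and s < itemset:
            (item,) = itemset - s
            conf = supp / support[s]        # confidence of rule s => item
            if conf >= min_conf:
                recs[item] = conf
    return recs

print(recommend(["B", "E"], support))   # {'A': 1.0, 'C': 0.8}
```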
50 Sequential Patterns & Personalization
Sequential/navigational patterns as aggregate profiles: similar to association rules, but the ordering of accessed items is taken into account.
Two basic approaches:
- use contiguous sequential patterns (CSP), e.g., Web navigational patterns
- use general sequential patterns (SP)
Contiguous sequential patterns are often modeled as Markov chains and used for prefetching (i.e., predicting the immediate next user access based on previously accessed pages). In the context of recommendations they can achieve high accuracy, but it may be difficult to obtain reasonable coverage.
51 Sequential Patterns & Personalization (continued)
Representation as Markov chains often leads to high space complexity due to model sizes, so some approaches have focused on reducing model size:
- selective Markov models (Deshpande & Karypis, 2000): use various pruning strategies to reduce the number of states (e.g., support or confidence pruning, error pruning)
- longest repeating subsequences (Pitkow & Pirolli, 1999): similar to support pruning, used to focus only on significant navigational paths
Increased coverage can be achieved by using all-Kth-order models (i.e., using all possible sizes for user histories). A first-order sketch follows.
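A sketch of a first-order Markov model for next-page prediction (the contiguous-pattern / prefetching case), with transition probabilities estimated from toy session counts.

```python
from collections import Counter, defaultdict

sessions = [["A", "B", "E"], ["A", "B", "C"], ["B", "E", "A"], ["A", "B", "E"]]

trans = defaultdict(Counter)
for s in sessions:
    for cur, nxt in zip(s, s[1:]):
        trans[cur][nxt] += 1            # count page-to-page transitions

def predict_next(page):
    counts = trans[page]
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

print(predict_next("B"))   # {'E': 0.75, 'C': 0.25}
```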
52 Sequential Patterns & Personalization (Mobasher et al., 2002)
A Frequent Sequence Trie (FST) is used to store both the sequential and the contiguous sequential patterns:
- it is organized into levels from 0 to k, where k is the maximal size among all sequential (respectively, contiguous sequential) patterns
- each non-root node N at depth d contains an item s_d and represents the frequent sequence <s_1, ..., s_d>; along with each node, the support (or frequency) value of the corresponding pattern is stored
- for each active session window w = <w_1, ..., w_n>, perform a depth-first search of the FST to level n
- if a match is found, the children of the matching node N are used to generate candidate recommendations
- given a sequence S = <w_1, ..., w_n, p>, the item p is added to the recommendation set if the confidence of S is greater than or equal to the confidence threshold
53 Example: Frequent Itemsets
Sample transactions and the frequent itemsets derived from them (using minimum support frequency = 4) were shown in the original slide.
54 Example: Sequential Patterns
Sample transactions with the resulting SPs and CSPs (both using minimum support frequency = 4) were shown in the original slide.
55 Example: An Itemset Graph
Frequent Itemset Graph for the example (figure in the original slide). Given the active session window <B, E>, the algorithm finds items A and C with recommendation scores of 1 and 4/5 (corresponding to the confidences of the rules {B,E} => {A} and {B,E} => {C}).
56 Example: Frequent Sequence Trie
Frequent Sequence Trie for the example (figure in the original slide). Given the active session window <A, B>, the algorithm finds item E with a recommendation score of 1 (corresponding to the confidence of the rule {A,B} => {E}).
57 Quantitative Evaluation of Recommendation Effectiveness
Two important factors in evaluating recommendations:
- Precision: the ratio of "correct" recommendations to all recommendations produced by the system; low precision results in angry or frustrated users.
- Coverage: the ratio of "correct" recommendations to all pages/items that will be accessed by the user; low coverage inhibits the system's ability to give relevant recommendations at critical points in the user's navigation.
Transactions are divided into training and evaluation sets: the training set is used to build the models (generation of aggregate profiles, neighborhood formation), and the evaluation set is used to measure precision and coverage. 10-fold cross-validation is generally used in the experiments.
58 Evaluation Methodology
Each transaction t in the evaluation set is divided into two parts:
- as_t: the portion containing the first n items of t, used as the user session to generate recommendations (n is the maximum allowable window size)
- Eval_t: the remaining portion of t, used to evaluate the recommendations (|Eval_t| = |t| - n)
R(as_t, tau) is the recommendation set, containing all items whose recommendation score is greater than or equal to the threshold tau.
Example: t = A,B,C,D,E,F,G,H
- use A,B,C,D to generate recommendations, say E,G,K
- match E,G,K against E,F,G,H: number of matches = 2
- |Eval_t| = 4; size of the recommendation set = 3
- Coverage = 2/4 = 50%; Precision = 2/3 = 67%
A sketch of this computation appears below.
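A minimal sketch of the precision/coverage computation above, using the same worked example from the slide.

```python
def precision_coverage(recommended, eval_items):
    hits = len(set(recommended) & set(eval_items))
    precision = hits / len(recommended) if recommended else 0.0
    coverage = hits / len(eval_items) if eval_items else 0.0
    return precision, coverage

t = list("ABCDEFGH")
n = 4
as_t, eval_t = t[:n], t[n:]          # session part and evaluation part
recs = ["E", "G", "K"]               # recommendations generated from as_t
print(precision_coverage(recs, eval_t))   # (0.666..., 0.5)
```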
59 Impact of Window Size
Increasing window sizes (using a larger portion of the user's history) generally leads to improvements in precision. (This example is based on the association rule approach.)
60 Associations vs. Sequences
Comparison of recommendations based on association rules, sequential patterns, contiguous sequential patterns, and standard k-nearest neighbor. Support threshold for association rules, SP, and CSP = 0.04.
61 Problems with Web Usage Mining
- New item problem: patterns will not capture recently added items; bad for dynamic Web sites.
- Poor machine interpretability: hard to generalize and reason about patterns, and no domain knowledge is used to enhance results (e.g., knowing a user is interested in a program, we could recommend the prerequisites, core, or popular courses in that program).
- Poor insight into the patterns themselves: the nature of the relationships among the items or users in a pattern is not directly available.
62 Solution: Integrate Semantic Knowledge with Web Usage Mining
Information retrieval/extraction approach:
- represent semantic knowledge in pageviews as keyword vectors, with keywords extracted from text or metadata
- text mining can be used to capture higher-level concepts or associations among concepts
- cannot capture deeper relationships among objects based on their inherent properties or attributes
Ontology-based approach:
- represent domain knowledge using a relational model or ontology representation languages
- process Web usage data together with the structured domain knowledge
- requires the extraction of ontology instances from Web pages
- challenge: performing the underlying mining operations on structured objects (e.g., computing similarities or performing aggregations)
63 Integration of Content Features
Pre-mining:
- initial transaction vector: t = <p_1, ..., p_n> over pageviews
- transform it into a content-enhanced transaction over the content features (e.g., terms) of those pageviews
- transaction clustering can then be performed based on content similarity among user transactions
Post-mining:
- first perform mining operations on the usage and content data independently, then integrate the usage and content patterns in the recommendation phase
- example, content profiles: perform clustering on the term-pageview matrix; each cluster centroid represents pages with similar content; use both content and usage profiles to generate recommendations
64 User Transaction and Feature-Pageview Matrices

User transaction matrix UT:

|       | A.html | B.html | C.html | D.html | E.html |
|-------|--------|--------|--------|--------|--------|
| user1 | 1 | 0 | 1 | 0 | 1 |
| user2 | 1 | 1 | 0 | 0 | 1 |
| user3 | 0 | 1 | 1 | 1 | 0 |
| user4 | 1 | 0 | 1 | 1 | 1 |
| user5 | 1 | 1 | 0 | 0 | 1 |
| user6 | 1 | 0 | 1 | 1 | 1 |

Feature-pageview matrix FP:

|              | A.html | B.html | C.html | D.html | E.html |
|--------------|--------|--------|--------|--------|--------|
| web          | 0 | 0 | 1 | 1 | 1 |
| data         | 0 | 1 | 1 | 1 | 0 |
| mining       | 0 | 1 | 1 | 1 | 0 |
| business     | 1 | 1 | 0 | 0 | 0 |
| intelligence | 1 | 1 | 0 | 0 | 1 |
| marketing    | 1 | 1 | 0 | 0 | 1 |
| ecommerce    | 0 | 1 | 1 | 0 | 0 |
| search       | 1 | 0 | 1 | 0 | 0 |
| information  | 1 | 0 | 1 | 1 | 1 |
| retrieval    | 1 | 0 | 1 | 1 | 1 |
65 Content-Enhanced Transactions
User-feature matrix UF, where UF = UT x FP^T:

|       | web | data | mining | business | intelligence | marketing | ecommerce | search | information | retrieval |
|-------|-----|------|--------|----------|--------------|-----------|-----------|--------|-------------|-----------|
| user1 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | 2 | 3 | 3 |
| user2 | 1 | 1 | 1 | 2 | 3 | 3 | 1 | 1 | 2 | 2 |
| user3 | 2 | 3 | 3 | 1 | 1 | 1 | 2 | 1 | 2 | 2 |
| user4 | 3 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 4 | 4 |
| user5 | 1 | 1 | 1 | 2 | 3 | 3 | 1 | 1 | 2 | 2 |
| user6 | 3 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 4 | 4 |

Example: users 4 and 6 are more interested in concepts related to Web information retrieval, while user 3 is more interested in data mining. The sketch below reproduces this computation.
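Reproducing the content-enhanced transaction computation UF = UT x FP^T with NumPy, using the matrices from the previous slide.

```python
import numpy as np

UT = np.array([[1, 0, 1, 0, 1],    # user1 over pages A-E
               [1, 1, 0, 0, 1],
               [0, 1, 1, 1, 0],
               [1, 0, 1, 1, 1],
               [1, 1, 0, 0, 1],
               [1, 0, 1, 1, 1]])

FP = np.array([[0, 0, 1, 1, 1],    # web
               [0, 1, 1, 1, 0],    # data
               [0, 1, 1, 1, 0],    # mining
               [1, 1, 0, 0, 0],    # business
               [1, 1, 0, 0, 1],    # intelligence
               [1, 1, 0, 0, 1],    # marketing
               [0, 1, 1, 0, 0],    # ecommerce
               [1, 0, 1, 0, 0],    # search
               [1, 0, 1, 1, 1],    # information
               [1, 0, 1, 1, 1]])   # retrieval

UF = UT @ FP.T                     # users x features
print(UF[0])   # user1: [2 1 1 1 2 2 1 2 3 3]
```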
66 Integrating Content and Usage For Personalization
67 Example: Content Profiles
Examples of feature (word) clusters from the Association for Consumer Research Web site (cluster centroids; terms are stemmed):
- CLUSTER 0: anthropologi, associ, behavior, ...
- CLUSTER 4: consum, journal, market, psychologi, ...
- CLUSTER 10: ballot, result, vote, ...
- CLUSTER 11: advisori, appoint, committe, council, ...
68 Example: Usage Profiles
Example usage profiles from the ACR site, generated by clustering user transactions directly. Usage profiles represent groups of users who commonly access certain pages together; content profiles represent groups of pages with similar content.

Profile 1:
1.00 Call for Papers
0.67 ACR News Special Topics
0.67 CFP: Journal of Psychology and Marketing I
0.67 CFP: Journal of Psychology and Marketing II
0.67 CFP: Journal of Consumer Psychology II
0.67 CFP: Journal of Consumer Psychology I

Profile 2:
1.00 CFP: Winter 2000 SCP Conference
1.00 Call for Papers
0.36 CFP: ACR 1999 Asia-Pacific Conference
0.30 ACR 1999 Annual Conference
0.25 ACR News Updates
0.24 Conference Update
69 Comparison of Recommendations
70 Ontology-Based Usage Mining
Approach 1: ontology-enhanced transactions
- initial transaction vector: t = <p_1, ..., p_n> over pageviews
- transform it into a semantically enhanced transaction t = <o_1, ..., o_r>, where the structured objects o_1, ..., o_r are instances of ontology entities extracted from the pages p_1, ..., p_n in the transaction
- mining tasks can then be performed based on ontological similarity among user transactions
Approach 2: ontology-enhanced patterns
- discover usage patterns in the standard way
- transform the patterns by creating an aggregate representation of each pattern based on the ontology
- requires the categorization of similar objects into ontology classes
- also requires the specification of a different aggregation/combination function for each attribute of each class in the ontology
71 Example: Ontology for a Movie Site
An example of a Movie object instance (figure in the original slide).
72 Ontology-Based Pattern Aggregation
In the original figure, a usage profile (0.50 Movie1.html, 0.35 Movie2.html, 0.15 Movie3.html) is first mapped to movie object instances via object extraction. Each movie carries attributes such as Name (A, B, C), Year ({2000}, {1999}, {2002}), Actor distributions (Movie 1: {S: 0.7; T: 0.2; U: 0.1}, Movie 2: {S: 0.5; T: 0.5}, Movie 3: {S: 0.6; W: 0.4}), and Genre values drawn from a genre hierarchy (Genre-All, Romance, Comedy, Romantic Comedy, Kid&Family). Ontology-based aggregation then combines these into a semantic usage pattern: Year interval [1999, 2002], aggregated Actor distribution {S: 0.58; T: 0.27; W: 0.09; U: 0.05}, aggregated Name weights {A: 0.5; B: 0.35; C: 0.15}, and the common Genre ancestor.
73 Personalization with Semantic Usage Patterns
The current user profile is matched against the aggregate semantic usage patterns, yielding an extended user profile; this is then instantiated to real Web objects to produce recommendations of items.
Note that the matching between the semantic representations of the user's profile and the patterns requires computing similarities at the ontological level (which may be defined based on domain-specific characteristics).
74 Profile Injection Attacks
Attacks consist of a number of "attack profiles" added to the system by providing ratings for various items, engineered to bias the system's recommendations.
Two basic types:
- "push attack" ("shilling"): designed to promote an item
- "nuke attack": designed to demote an item
Prior work has shown that CF recommender systems are highly vulnerable to such attacks.
Attack models are strategies for assigning ratings to items based on knowledge of the system, products, or users; examples include the "random," "average," "bandwagon," "segment," and "love-hate" attacks.
75 A Successful Push Attack

|          | Item1 | Item2 | Item3 | Item4 | Item5 | Item6 | Correlation with Alice |
|----------|-------|-------|-------|-------|-------|-------|------------------------|
| Alice    | 5     | 2     | 3     | 3     |       | ?     |                        |
| User 1   | 2     | 4     | 4     | 1     |       |       |                        |
| User 2   | 2     | 1     | 3     | 1     |       | 2     | 0.33                   |
| User 3   | 4     | 2     | 3     | 2     |       | 1     | 0.90                   |
| User 4   | 3     | 3     | 2     | 3     |       | 1     | 0.19                   |
| User 5   | 3     | 2     | 2     | 2     |       |       |                        |
| User 6   | 5     | 3     | 1     | 3     |       | 2     | 0.65                   |
| User 7   | 5     | 1     | 5     | 1     |       |       |                        |
| Attack 1 | 2     | 3     | 2     |       |       | 5     |                        |
| Attack 2 | 3     | 2     | 3     | 2     |       | 5     | 0.76                   |
| Attack 3 | 3     | 2     | 2     | 2     |       | 5     | 0.93 (best match)      |

With the "user-based" algorithm using k-nearest neighbor (k = 1), the best match is now Attack 3, so the prediction for Alice on the pushed target item becomes 5.
76 A Generic Attack Profile
An attack profile (shown as a diagram in the original slide) partitions the item set into:
- I_S: a set of k selected items, with ratings assigned by a model-specific function
- I_F: a set of l filler items, with ratings assigned by a model-specific function
- the remaining unrated items in the attack profile (null)
- i_t: the target item, which receives the rating that pushes or nukes it
Attack models differ based on the ratings assigned to the filler and selected items.
77 Average and Random Attack Models
Random attack: the l filler items are assigned random ratings drawn from the overall distribution of ratings on all items across the whole database; the target item i_t receives r_max.
Average attack: the rating for each filler item is drawn from a distribution defined by the average rating for that item in the database.
The percentage of filler items determines the amount of knowledge (and effort) required by the attacker. A generation sketch appears below.
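A sketch of generating push-attack profiles under the two models above. The rating database, item names, and sizes are all made-up toy values; a real average attack samples around the per-item mean rather than using it directly, so this is a simplification.

```python
import random
import statistics

db = {"i1": [5, 4, 4, 3], "i2": [2, 1, 2], "i3": [3, 3, 4], "i4": [1, 2]}
R_MAX = 5

def push_profile(target, filler_items, mode="average"):
    profile = {target: R_MAX}                       # push the target item
    all_ratings = [r for rs in db.values() for r in rs]
    for item in filler_items:
        if mode == "average":                       # per-item mean rating
            profile[item] = round(statistics.mean(db[item]))
        else:                                       # "random": overall distribution
            profile[item] = random.choice(all_ratings)
    return profile

print(push_profile("i4", ["i1", "i2"], mode="average"))   # {'i4': 5, 'i1': 4, 'i2': 2}
```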
78 Bandwagon Attack Model
What if the system's rating distribution is unknown? Identify products that are frequently rated (e.g., "blockbuster" movies) and associate the pushed product with them: the k frequently rated items receive r_max, the target item receives r_max, and the ratings for the l filler items are centered on the overall system average rating (similar to the random attack). Frequently rated items can be guessed or obtained externally.
79 Segment Attack Model
Assume the attacker wants to push a product to a target segment of users: those with a preference for similar products (fans of Harrison Ford, fans of horror movies, etc.). Like the bandwagon attack, but using semantically similar items: the k favorite items of the user segment receive r_max, the filler items receive r_min, and the target item receives r_max.
Originally designed for attacking item-based CF algorithms: maximize sim(target item, segment items) and minimize sim(target item, non-segment items).
80 Nuke Attacks: Love/Hate Attack Model
A limited-knowledge attack in its simplest form: the target item is given the minimum rating value (r_min), while all ratings in the filler item set are given the maximum rating value (r_max).
Note: variations of this (and the other models) can also be used as push or nuke attacks, essentially by switching the roles of r_min and r_max.
81 How Effective Can Attacks Be?
First, a methodological note:
- using the MovieLens 100K data set
- 50 different "pushed" movies, selected randomly but mirroring the overall distribution
- 50 users randomly pre-selected
- results are averages over all runs for each movie-user pair
- k = 20 in all experiments
Evaluating results:
- prediction shift: how much the rating of the pushed movie differs before and after the attack
- hit ratio: how often the pushed movie appears in a recommendation list before and after the attack
Minimal sketches of both metrics follow.
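Minimal sketches of the two metrics named above, on made-up numbers; the real evaluation averages these over all movie-user pairs.

```python
def prediction_shift(before, after):
    """Average change in the pushed item's predicted rating across users."""
    return sum(a - b for b, a in zip(before, after)) / len(before)

def hit_ratio(rec_lists, pushed_item):
    """Fraction of users whose top-N recommendation list contains the pushed item."""
    return sum(pushed_item in recs for recs in rec_lists) / len(rec_lists)

print(prediction_shift([2.0, 3.0], [4.5, 4.0]))        # 1.75
print(hit_ratio([["i4", "i1"], ["i2", "i3"]], "i4"))   # 0.5
```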
82 Example Results: Average Attack
The average attack is very effective against the user-based algorithm (the random attack is not as effective). Item-based CF is more robust, but vulnerable to other attack types such as the "segment attack" (Burke & Mobasher, 2005).
83 Example Results: Bandwagon Attack
Only a small profile is needed (3%-7%), and only a few (< 10) popular movies are needed: the attack is as effective as the more data-intensive average attack (but still not effective against item-based algorithms).
84 Results: Impact of Profile Size
Only a small number of filler items need to be assigned ratings; an attacker therefore only needs to use part of the product space to make the attack effective. In the item-based algorithm we do not see the same drop-off, but the prediction shift shows logarithmic behavior, reaching near its maximum at about 7% filler size.
85 Example Results: Segmented Attack Against Item-Based CF
- very effective against the targeted group
- works best against item-based CF, but is also effective against user-based CF
- requires little knowledge
86 Possible Solutions
- Explicit trust calculation? Select peers through a network of trust relationships; but by the law of large numbers, it is hard to reach the numbers needed for CF to work well.
- Hybrid recommendation: some indications that certain hybrids may be more robust.
- Model-based recommenders: certain recommenders using clustering are more robust, but generally at the cost of accuracy; a probabilistic approach, however, has been shown to be relatively accurate.
- Detection and response.
87 Results: Semantically Enhanced Hybrid
Alpha 0.0 = 100% semantic item-based similarity; alpha 1.0 = 100% collaborative item-based similarity. Semantic features extracted for movies: top actors, director, genre, synopsis (top keywords), etc.
88 Approaches to Detection & Response
Profile classification:
- a classification model identifies attack profiles and excludes them when computing predictions
- uses the characteristic features of the most successful attack models
- designed to increase the cost of attacks by detecting the most effective ones
- but what if the attack does not closely correspond to a known attack signature?
Anomaly detection:
- classify items (as being possibly under attack)
- not dependent on known attack models
- can shed some light on which types of items are most vulnerable to which types of attacks
In practice, a comprehensive framework combining both approaches is needed.
89 Conclusions
Why recommender systems? Many algorithmic advances have led to more accurate and reliable systems, and to more confidence by users.
They assist users in finding more relevant information, items, and products; give users alternatives that broaden their knowledge; and help build communities. They help companies better engage users and customers (building loyalty) and increase sales (on average 5-10%).
Problems and challenges: more complex Web-based applications and more complex user interactions need more sophisticated models; we need to further explore the impact of recommendations on (a) user behavior and (b) the evolution of Web communities; and privacy, security, and trust remain open issues.