
1 HyPER: A Flexible and Extensible Probabilistic Framework for Hybrid Recommender Systems
Pigi Kouki, Shobeir Fakhraei, James Foulds, Magdalini Eirinaki, Lise Getoor
University of California, Santa Cruz; University of Maryland, College Park; San Jose State University
Thanks for the introduction. This is joint work with my colleagues from the University of California Santa Cruz, the University of Maryland, and San Jose State University. In this talk I will introduce a general and extensible framework for constructing hybrid recommender systems.

2 Motivation Increasing amount of data useful for recommendations
(Figure: data sources useful for recommendation – ratings, content, social, demographic.)
But first, some motivation. We all know that the amount of data available in online settings such as recommender systems is rapidly increasing. It isn't just that we have more users to recommend to, more items to recommend to them, and more recorded user interactions – the number of TYPES of recorded data is increasing as well. Traditional recommender systems make recommendations to users based on information such as ratings matrices and the content features of items, such as a movie's genre, or its lead actors or director. However, in many cases we have many other sources of data that we'd like to leverage. For example, if a user logs in to our recommender site via Facebook, we may have access to their social network, which could inform our recommendations. In other settings, we may have demographic information about users, such as gender and nationality, which is also likely to be informative. As recommender websites continue to evolve, and web companies record every piece of data that they can in the hopes of monetizing it using machine learning, chances are that new kinds of useful data will continue to arise, some of which will likely be completely new to us.

3 Multiple Data Sources
Combining ratings with other data sources improves performance
Content [Gunawardana and Meek, RecSys 2009] [Forbes and Zhu, RecSys 2011] [de Campos et al., IJAR 51(7) 2010]
Social relationships [Ma et al., WSDM 2011] [Liu et al., DSS 55(3) 2013]
But is all of this data really useful? The growing body of work on hybrid recommender systems has demonstrated that the answer is YES. Combining ratings data with other data sources can improve recommendation performance.

4 Multiple Data Sources
Combining ratings with other data sources improves performance
Content [Gunawardana and Meek, RecSys 2009] [Forbes and Zhu, RecSys 2011] [de Campos et al., IJAR 51(7) 2010]
Social relationships [Ma et al., WSDM 2011] [Liu et al., DSS 55(3) 2013]
Many authors have found a benefit to incorporating content information together with collaborative filtering techniques based on ratings data, including Gunawardana and Meek, Forbes and Zhu, and de Campos et al. As you'd expect, content is very much predictive of user preferences – if a user likes one horror movie, it's probable that they will like other horror movies as well.

5 Multiple Data Sources
Combining ratings with other data sources improves performance
Content [Gunawardana and Meek, RecSys 2009] [Forbes and Zhu, RecSys 2011] [de Campos et al., IJAR 51(7) 2010]
Social relationships [Ma et al., WSDM 2011] [Liu et al., DSS 55(3) 2013]
Social network relationships have also been found to be beneficial, e.g. by Ma et al. and Liu et al. This shouldn't be surprising – user-user collaborative filtering makes recommendations by finding similar users and recommending the movies that they like. The social networking principle of homophily states that we associate with people who are similar to us, so our friends are likely to share our tastes.

6 Multiple Data Sources
Combining ratings with other data sources improves performance
Review text [McAuley & Leskovec, RecSys 2013] [Ling et al., RecSys 2014]
Tags and labels [Guy et al., SIGIR 2010]
Feedback [Sedhain et al., RecSys 2014]
(Figure: example tags – #cool #neat #ok #sucks.)
Other kinds of data that have been found to be useful include review text, e.g. McAuley & Leskovec and Ling et al.,

7 Multiple Data Sources
Combining ratings with other data sources improves performance
Review text [McAuley & Leskovec, RecSys 2013] [Ling et al., RecSys 2014]
Tags and labels [Guy et al., SIGIR 2010]
Feedback [Sedhain et al., RecSys 2014]
hashtags and other label information, e.g. Guy et al.,

8 Multiple Data Sources
Combining ratings with other data sources improves performance
Review text [McAuley & Leskovec, RecSys 2013] [Ling et al., RecSys 2014]
Tags and labels [Guy et al., SIGIR 2010]
Feedback [Sedhain et al., RecSys 2014]
and feedback on other items, such as likes on Facebook pages, e.g. Sedhain et al. Hybrid systems such as these are typically the most crucial in the cold-start setting, where the extra data sources compensate for the lack of available ratings data. For example, Sedhain et al. were able to leverage side information to successfully recommend in the extreme cold-start setting, where zero interaction data were available, resulting in a 3-fold improvement over naïve approaches which did not use the additional data sources.

9 Multiple Recommenders
Combining predictions of multiple recommenders also improves performance [Jahrer et al., KDD 2010] [Burke, In The Adaptive Web, 2007]
"Predictive accuracy is substantially improved when blending multiple predictors" – [Bell et al., The BellKor Solution to the Netflix Prize, 2007]
Ensemble methods, which combine the predictions of multiple recommender algorithms, are also known to improve performance. The most famous example of this is the Netflix Prize competition. The winners of the competition, Bell et al. of BellKor's Pragmatic Chaos, stated that "predictive accuracy is substantially improved when blending multiple predictors," and this was one of the biggest lessons learned from the contest. Ensemble methods for recommender systems have been studied much further in the literature, e.g. by Jahrer et al., and a summary of ensemble approaches is provided in a book chapter by Burke.

10 Desiderata for Hybrid Systems
To get the best performance, we should make use of all available data sources and algorithms
We need a framework that is:
General: combines arbitrary data modalities; combines multiple recommenders; problem- and data-agnostic
Extensible: to new information sources/recommenders
Scalable: to large data sets
The take-away lesson from all of these papers is simple: if we have data, we ought to use it! In order to get the best performance on our recommendation tasks, we need a hybrid system that can make use of any and all data sources that we have available, including the predictions of any other algorithms that we can execute. We need a framework that is general in the sense that it can combine arbitrary data modalities and the predictions of multiple other recommender systems, and is not tied to a single problem or dataset.

11 Desiderata for Hybrid Systems
To get the best performance, we should make use of all available data sources and algorithms
We need a framework that is:
General: combines arbitrary data modalities; combines multiple recommenders; problem- and data-agnostic
Extensible: to new information sources/recommenders
Scalable: to large data sets
The system needs to be extensible so that it can make use of new information sources or algorithms as they become available, since we can never know what other types of data might be around the corner.

12 Desiderata for Hybrid Systems
To get the best performance, we should make use of all available data sources and algorithms
We need a framework that is:
General: combines arbitrary data modalities; combines multiple recommenders; problem- and data-agnostic
Extensible: to new information sources/recommenders
Scalable: to large data sets
Finally, to be practical in real-world applications, the system also needs to scale to the large data sets that we routinely encounter in an industry setting.

13 General Hybrid Recommenders in the Literature
Existing hybrid systems, though powerful, typically fall short on either generality, extensibility, or scalability
Often combine collaborative and/or content-based methods with each other or just one other data modality (cf. previous slides)
Some systems can leverage heterogeneous data [Gemmell et al. 2012, Burke et al. 2014, Yu et al. 2014]
Probabilistic graphical modeling approaches are typically more general, less scalable
Bayesian networks [de Campos et al., IJAR 51(7) 2010]
Markov logic networks [Hoxha & Rettinger, ICMLA 2013]
There are many powerful techniques for hybrid recommendation which have been shown to benefit prediction using additional data; however, these systems typically fall short on at least one of our desiderata: generality, extensibility, or scalability. Many hybrid recommenders, such as those discussed in the previous slides, combine collaborative filtering with only simple content features, or just one other data modality. Some systems can leverage heterogeneous data types, although they are not extensible to new data types that do not fit into their data paradigm.

14 General Hybrid Recommenders in the Literature
Existing hybrid systems, though powerful, typically fall short on either generality, extensibility, or scalability
Often combine collaborative and/or content-based methods with each other or just one other data modality (cf. previous slides)
Some systems can leverage heterogeneous data [Gemmell et al. 2012, Burke et al. 2014, Yu et al. 2014]
Probabilistic graphical modeling approaches are typically more general, less scalable
Bayesian networks [de Campos et al., IJAR 51(7) 2010]
Markov logic networks [Hoxha & Rettinger, ICMLA 2013]
On the other hand, probabilistic graphical modeling approaches are typically more general than other hybrid models, but suffer in terms of scalability when it comes to practical real-world recommendation tasks. It has been proposed to use either directed Bayesian networks or undirected models in the form of Markov logic networks for hybrid recommendation. These models are expected to be computationally challenging in the general case, as inference in these formalisms is NP-hard.

15 Our Approach
We propose HyPER: Hybrid Probabilistic Extensible Recommender
A general, extensible, scalable recommender framework
Leverages advances in statistical relational learning: probabilistic soft logic [Bach et al., UAI 2013, ArXiv 2015]
Inspired by recent work in drug-target interaction prediction [Fakhraei et al., Transactions on Computational Biology and Bioinformatics 11(5) 2014]
To address this challenge, we propose a hybrid framework which we call HyPER: the Hybrid Probabilistic Extensible Recommender. HyPER is a general, extensible, and scalable recommender framework, which we accomplish by leveraging advances in statistical relational learning. In particular, we make use of an SRL system called probabilistic soft logic, or PSL, from Lise Getoor's group at the University of Maryland, and now the University of California, Santa Cruz. HyPER is inspired by recent related work in predicting drug-target interaction, which is another bipartite prediction setting with some similarities to recommendation. That study, due to my colleagues Fakhraei et al., also leveraged probabilistic soft logic, and our approach can be viewed as both extending and adapting this work to the recommendation domain.

16 Hybrid Modeling with HyPER
(Diagram: Data Source → Recommender → Predicted Ratings.)
While a typical recommender system uses data from a single data source to build a recommendation model and predict ratings,

17 Hybrid Modeling with HyPER
(Diagram: Data Source 1, Data Source 2, …, Data Source N → Recommender → Predicted Ratings.)
our proposed HyPER approach can leverage arbitrary data sources,

18 Hybrid Modeling with HyPER
(Diagram: Data Sources 1…N and Recommenders 1…M feeding into HyPER → Predicted Ratings.)
as well as multiple input recommender algorithms, and aggregate them into a final unified prediction of unobserved ratings.

19 HyPER: High-Level Approach
User-item ratings viewed as a weighted bipartite graph
Build hybrid model by adding links to encode additional information: multiple user and item similarities, social information, …
Predict ratings by reasoning over the graph, via a graphical model
At a high level, our approach with HyPER is as follows. We begin by viewing user-item ratings as a weighted bipartite graph, as depicted in the figure on the right. Here, users and items are nodes in the graph, and edges constitute rating interactions, with each edge carrying a weight corresponding to the rating given by the user to the item.

20 HyPER: High-Level Approach
User-item ratings viewed as a weighted bipartite graph
Build hybrid model by adding links to encode additional information: multiple user and item similarities, social information, …
Predict ratings by reasoning over the graph, via a graphical model
We build a hybrid model by adding further links to this graph to encode additional information, such as multiple user and item similarity measures, social information, demographic information, and any other data sources that are available to us.

21 HyPER: High-Level Approach
User-item ratings viewed as a weighted bipartite graph
Build hybrid model by adding links to encode additional information: multiple user and item similarities, social information, …
Predict ratings by reasoning over the graph, via a graphical model
Finally, we predict the ratings by reasoning over this extended graph, by way of inference in a graphical model corresponding to this graph. Scalability is achieved by using a particularly scalable class of graphical models, called hinge-loss Markov random fields, which I will introduce soon.

22 Extended Recommendation Graph
We begin with our bipartite graph of ratings, shown by the green edges in the figure.

23 Extended Recommendation Graph
To combine user-user neighborhood-based techniques, we add edges into the graph between neighbors for any and all available user-user similarity measures. This may also include social network friendship links, if they are available.

24 Extended Recommendation Graph
Similarly, we incorporate item-item neighborhood-based techniques by including edges in the graph for neighbors according to all available item-item similarity measures.

25 Extended Recommendation Graph
Each additional data source is encoded by extra nodes in the graph, which are linked to the users and items that they correspond to.

26 Extended Recommendation Graph
Finally, each input base recommender system in our hybrid/ensemble provides predictions on the rating edges. We can view each base recommender as a node in the graph, with hypergraph edges added to our extended recommendation graph for each prediction that the base recommender makes.

27 Modeling and Reasoning over the Graph
Hinge-loss Markov random fields (HL-MRFs) [Bach et al., UAI 2013]
Exact, efficient, and scalable inference
Continuous random variables
Models defined by PSL programs
Probabilistic Soft Logic (PSL) [Bach et al., ArXiv 2015]
Statistical relational learning system
Logical probabilistic programming interface
Templating language for HL-MRFs
Once we've defined our recommendation graph, we need to reason over it to make predictions. We define a graphical model over this graph using a special kind of graphical model called a hinge-loss Markov random field, due to Bach et al. of Lise Getoor's research group, formerly at the University of Maryland. The formalism of hinge-loss Markov random fields, or HL-MRFs, allows exact, efficient, and scalable MAP inference. HL-MRFs are defined over continuous random variables, such as ratings. Another advantage of these models is that they can be easily specified in an intuitive modeling language, called probabilistic soft logic, or PSL.

28 Modeling and Reasoning over the Graph
Hinge-loss Markov random fields (HL-MRFs) [Bach et al., UAI 2013]
Exact, efficient, and scalable inference
Continuous random variables
Models defined by PSL programs
Probabilistic Soft Logic (PSL) [Bach et al., ArXiv 2015]
Statistical relational learning system
Logical probabilistic programming interface
Templating language for HL-MRFs
PSL is a statistical relational learning system with a logical probabilistic programming language interface. A PSL program is a template for HL-MRFs – after instantiating, or grounding, a PSL program on a given dataset, the result is a hinge-loss MRF graphical model.

29 Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1 Hinge-loss MRFs are conditional random field models over continuous random variables between 0 and 1.

30 Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1
Feature functions are hinge-loss functions
Their feature functions are hinge-loss functions, which you may be familiar with from SVMs and rectified linear units in neural networks.

31 Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1
Feature functions are hinge-loss functions
A hinge loss is a max of a linear function and 0.

32 Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1
Feature functions are hinge-loss functions
(Figure: the linear function inside the hinge is labeled.)

33 Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1
Feature functions are hinge-loss functions
This results in a "hinge-shaped" function, which consists of a linear piece that gets truncated at 0.

34 Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1
Feature functions are hinge-loss functions
In HL-MRFs, we can optionally square the hinge-loss potential, which leads to a curved hinge.

35 Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1
Feature functions are hinge-loss functions
Hinge losses encode the distance to satisfaction for each instantiated rule
Hinge-loss potentials encode rules from probabilistic soft logic by outputting the distance to satisfaction for each logical rule – in other words, how badly is the rule violated? Probability in the MRF model is thereby determined by the weighted distance to satisfaction of the rules in the system.
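Putting these slides together, the density the speaker is describing can be reconstructed as follows (the exact notation is my assumption, following Bach et al.):

$$P(\mathbf{Y} \mid \mathbf{X}) \propto \exp\Big(-\sum_{j} w_j \,\phi_j(\mathbf{Y}, \mathbf{X})\Big), \qquad \phi_j(\mathbf{Y}, \mathbf{X}) = \big(\max\{\ell_j(\mathbf{Y}, \mathbf{X}),\, 0\}\big)^{p_j},$$

where each $\ell_j$ is a linear function of the continuous variables, $p_j \in \{1, 2\}$ selects the plain or squared hinge, and $w_j \ge 0$ is the rule weight.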

36 Efficient Inference in HL-MRFs
Energy function is convex, so we can find a global MAP state
The alternating direction method of multipliers (ADMM) is used for efficient and scalable inference
Because the energy function of HL-MRFs is convex, it has a single global optimum, and we can perform MAP inference exactly by solving a convex optimization problem. We can also scale up inference to large datasets by using the alternating direction method of multipliers (ADMM), which is an inherently parallelizable algorithm. The ADMM approach for inference in HL-MRFs was proposed by Bach et al., and uses an ADMM variant called consensus ADMM. Intuitively, the consensus ADMM algorithm makes local copies of each variable, then relaxes the constraint that the local copies are equal to the global solution. The local and global variables then "argue" with each other until the algorithm reaches a "consensus" at the optimal solution.
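To make the "local copies argue toward consensus" intuition concrete, here is a toy sketch of consensus ADMM on a one-dimensional problem with made-up data; it is my own illustration, not the PSL implementation (which solves hinge-loss subproblems over many variables in parallel):

```python
import numpy as np

def consensus_admm(a, w, rho=1.0, iters=100):
    """Minimize sum_i w_i * (x - a_i)^2 over one shared variable x,
    giving each term a local copy x_i that must agree with a global
    consensus variable z."""
    n = len(a)
    x = np.zeros(n)   # local copies, one per objective term
    u = np.zeros(n)   # scaled dual variables (disagreement "pressure")
    z = 0.0           # global consensus variable
    for _ in range(iters):
        # Local step: each term minimizes w_i*(x_i - a_i)^2
        # + (rho/2)*(x_i - z + u_i)^2, which has a closed form.
        x = (2 * w * a + rho * (z - u)) / (2 * w + rho)
        # Global step: consensus is the average of (x_i + u_i).
        z = np.mean(x + u)
        # Dual step: penalize disagreement between local and global.
        u = u + x - z
    return z

a = np.array([1.0, 2.0, 4.0])
w = np.array([1.0, 1.0, 2.0])
# Converges to the weighted mean (1*1 + 1*2 + 2*4) / 4 = 2.75.
print(consensus_admm(a, w))
```

Each local update touches only its own term, which is what makes the scheme parallelizable across ground potentials.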

37 Probabilistic Soft Logic
Statistical relational learning language
Uses first-order logical rules
Templates HL-MRFs
Probabilistic soft logic is a statistical relational learning language which uses first-order logical rules to template HL-MRFs. For example, the following rule expresses the intuition that if a user likes a genre, such as horror movies, then that user is likely to rate movies of that genre highly.
w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)

38 Probabilistic Soft Logic
Statistical relational learning language
Uses first-order logical rules
Templates HL-MRFs
PSL rules consist of predicates, such as LikesGenre, IsGenre, and Rating,
w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)

39 Probabilistic Soft Logic
Statistical relational learning language
Uses first-order logical rules
Templates HL-MRFs
which are connected to each other by logical operators (&& and →).
w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)

40 Probabilistic Soft Logic
Statistical relational learning language
Uses first-order logical rules
Templates HL-MRFs
Each first-order rule is given a weight w, which determines the importance of the rule relative to the other rules. Together, the rules define a PSL model.
w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)

41 Probabilistic Soft Logic
Converts rules to hinge-loss potentials
PSL program = rules + data
Open source:
LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
Each instantiated rule is converted into a hinge-loss potential, and together these potential functions define an HL-MRF model.

42 Probabilistic Soft Logic
Converts rules to hinge-loss potentials
PSL program = rules + data
Open source:
LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
max{LikesGenre(U, G) + IsGenre(M, G) - Rating(U, M) - 1, 0}
For example, this rule is converted into a potential function consisting of the max of 0 and LikesGenre(U, G) + IsGenre(M, G) - Rating(U, M) - 1.

43 Probabilistic Soft Logic
Converts rules to hinge-loss potentials
PSL program = rules + data
Open source:
LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
max{LikesGenre(U, G) + IsGenre(M, G) - Rating(U, M) - 1, 0}
This is a hinge-loss function which corresponds to a potential function in the hinge-loss MRF graphical model. A full PSL program contains the rules and the data, which allow us to instantiate the rules into a hinge-loss MRF model on which we can run inference. PSL is open-source software, and is available for free at the link on this slide.
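To make the conversion concrete, here is a tiny sketch (my own Python illustration, not PSL code) of the distance to satisfaction for one grounding of this rule under the Lukasiewicz relaxation; it is algebraically equivalent to the max expression on the slide:

```python
def rule_distance(likes_genre, is_genre, rating):
    """Distance to satisfaction of one grounding of
    LikesGenre(U,G) && IsGenre(M,G) -> Rating(U,M),
    where all truth values are continuous in [0, 1]."""
    # Lukasiewicz conjunction of the rule body:
    body = max(likes_genre + is_genre - 1.0, 0.0)
    # The implication body -> head is violated to the extent that
    # the body's truth value exceeds the head's; this equals the
    # slide's max{LikesGenre + IsGenre - Rating - 1, 0}.
    return max(body - rating, 0.0)

# The user likes horror (0.9), the movie is horror (1.0), but the
# candidate rating is low (0.3): the rule is violated by 0.6.
print(rule_distance(0.9, 1.0, 0.3))  # 0.6
```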

44 Recommendations with HyPER
Similar items get similar ratings from a user
e.g. cosine, adjusted cosine, Pearson, content
SimilarItems_sim(i1, i2) && Rating(u, i1) → Rating(u, i2)
(Figure: SimilarItems(i1, i2) link; Rating(u, i1) = 5; Rating(u, i2) = ?)
Having overviewed HyPER, hinge-loss MRFs, and probabilistic soft logic, I can now introduce the PSL rules we use to define our hybrid models. This slide shows a rule for combining item-item neighborhood methods with different similarity measures. As per standard item-item collaborative filtering, the intuition is that similar items will get similar ratings from a given user. We can plug in as many similarity measures as we'd like by creating a rule such as this one for each similarity, such as cosine similarity, adjusted cosine, Pearson correlation, or content similarity. Intuitively, the rule states that IF items i1 and i2 are similar, for a given similarity measure, AND user u rated item i1 highly, THEN user u will rate item i2 highly. The figure shows this reasoning in terms of the recommendation graph – the predicted rating on the hypotenuse is inferred by propagating the rating at the top along the similarity link on the right. Note that PSL is effective for combining similarities, as it was originally designed for this task: before it was called probabilistic soft logic, it was called probabilistic similarity logic.
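As one illustration of where the values behind a SimilarItems_sim predicate might come from (a sketch under my own naming, not HyPER's actual preprocessing), item-item cosine similarity can be computed directly from the ratings matrix:

```python
import numpy as np

def item_cosine_similarities(R):
    """Cosine similarity between item columns of a ratings matrix
    R (users x items), with 0 standing in for unrated entries."""
    norms = np.linalg.norm(R, axis=0)   # one norm per item column
    norms[norms == 0] = 1.0             # avoid division by zero
    S = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(S, 0.0)            # ignore self-similarity
    return S

R = np.array([[5.0, 4.0, 0.0],
              [3.0, 0.0, 4.0],
              [4.0, 5.0, 5.0]])
print(item_cosine_similarities(R).round(2))
```

The resulting values lie in [0, 1] for non-negative ratings, so they can be used directly as soft truth values for the similarity predicate.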

45 Recommendations with HyPER
Similar users give similar ratings to an item
e.g. cosine, Pearson
SimilarUsers_sim(u1, u2) && Rating(u1, i) → Rating(u2, i)
(Figure: SimilarUsers(u1, u2) link; Rating(u1, i) = 4; Rating(u2, i) = ?)
We can include similar rules to incorporate one or more user-user neighborhood relationships obtained with user-user similarities such as cosine similarity, or Pearson correlation. Intuitively, the rule states that IF users u1 and u2 are similar AND u1 rated i highly, THEN it is likely that u2 rated i highly as well. As we can see in the figure, once again the rating is propagated along the similarity link.

46 Recommendations with HyPER
Mean-centering priors
Additional data sources
Leveraging existing recommenders, e.g. matrix factorization, item-based
AverageUserRating(u) → Rating(u, i)
AverageItemRating(i) → Rating(u, i)
In neighborhood-based collaborative filtering, mean-centering corrections are frequently used to take into account the popularity of each item and the frugality of each user's ratings. We can encode this intuition in our HyPER models with PSL rules that penalize the difference from the mean rating of each user, and each item. The rules shown here penalize the rating being lower than the average for users and for items. We also include versions of these rules with the predicates negated (sketched below), which penalize the rating being higher than the average. We also include mean-centering rules across all ratings, which can be useful when we have no information about a user or an item.
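The negated rules are not shown on the slide, but under the Lukasiewicz semantics sketched earlier they plausibly take the following form (my reconstruction, not taken from the slides); negating both predicates flips the hinge so that ratings above the average are penalized:

!AverageUserRating(u) → !Rating(u, i)
!AverageItemRating(i) → !Rating(u, i)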

47 Recommendations with HyPER
Mean-centering priors
Social network links
Leveraging existing recommenders, e.g. matrix factorization, item-based
AverageUserRating(u) → Rating(u, i)
AverageItemRating(i) → Rating(u, i)
Friends(u1, u2) && Rating(u1, i) → Rating(u2, i)
Social network links between users can be incorporated similarly to user-user similarities. This rule states that IF users u1 and u2 are friends, AND u1 rated item i highly, THEN u2 is also likely to rate item i highly.

48 Recommendations with HyPER
Mean-centering priors
Social network links
Leveraging existing recommenders, e.g. matrix factorization, item-based
AverageUserRating(u) → Rating(u, i)
AverageItemRating(i) → Rating(u, i)
Friends(u1, u2) && Rating(u1, i) → Rating(u2, i)
RatingRecommender(u, i) → Rating(u, i)
We can also combine the predictions of existing recommender systems into a HyPER model, such as matrix factorization algorithms or the output of another item-based collaborative filtering method. These rules penalize the differences between the final predicted ratings and the ratings predicted by each base recommender. As with the mean-centering rules, we also include the negated version of these rules to penalize a difference in the other direction.

49 Recommendations with HyPER
Mean-centering priors
Social network links
Leveraging existing recommenders, e.g. matrix factorization, item-based
AverageUserRating(u) → Rating(u, i)
AverageItemRating(i) → Rating(u, i)
Friends(u1, u2) && Rating(u1, i) → Rating(u2, i)
RatingRecommender(u, i) → Rating(u, i)
Extensible to new data/algorithms – just add rules!
It's important to note that HyPER is highly extensible beyond the rules and data types that I have shown here. It's simple to include more data sources: just add rules! A practitioner can add custom rules to flexibly incorporate all of the data sources that are available in her specific recommendation setting. If a new data source becomes available, all she has to do is add another first-order PSL rule to include it, and then retrain the model.

50 Balancing the Rules
Balancing done through weights w_j
Higher w_j indicates a more important rule
Weight learning by approximating a gradient step in the conditional log-likelihood:
Training the model consists of learning weights w_j to balance the relative importance of each rule, i.e. each information source. A higher w_j indicates that rule j is more important. So if the rule for item-item similarity with a cosine similarity measure has a higher weight than the rule for user-level mean-centering, then item-item cosine similarity will essentially get a larger vote on the predicted rating. We perform weight learning via an approximation to gradient descent on the conditional log-likelihood function. The exact gradient step, shown below, is intractable due to the expectation. Following Bach et al., we approximate this by replacing the expectation with the relatively easily computed MAP state, as in the voted perceptron algorithm.
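The gradient itself did not survive the slide export; reconstructed from the surrounding description (standard for log-linear models, so the exact notation is my assumption), it is:

$$\frac{\partial \log P(\mathbf{Y} \mid \mathbf{X})}{\partial w_j} \;=\; \mathbb{E}_{\mathbf{w}}\big[\Phi_j(\mathbf{Y}, \mathbf{X})\big] \;-\; \Phi_j(\mathbf{Y}, \mathbf{X}),$$

where $\Phi_j$ sums the ground potentials instantiated from rule $j$. The intractable expectation is then approximated by evaluating $\Phi_j$ at the MAP state, as in the voted perceptron.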

51 Experimental Validation
Yelp academic dataset
~34k users, ~3.6k items, ~99k ratings
~81k friendships
514 business categories
Last.fm
~1.8k users, ~17k items, ~92k ratings
~12k friendships
~9.7k artist tags
Evaluation metrics: RMSE, MAE
We evaluated our HyPER framework on two commonly used recommender system datasets: the Yelp academic dataset and Last.fm. Yelp has 34,000 users and 3,600 items, with around 99,000 ratings. It also has social network friendship information, as well as categories for the businesses, which can be viewed as content information. Last.fm, on the other hand, has fewer users than items, with 1,800 users and 17,000 items, and around 92,000 ratings. It also has friendship links, as well as content information in the form of artist tags. We computed RMSE and MAE as evaluation metrics, using 5-fold cross-validation.

52 Baselines
Collaborative filtering systems
Item-based CF [Ning et al., In Recommender Systems Handbook, 2015]
Matrix factorization (MF) [Koren et al., IEEE Computer 42(8) 2009]
Bayesian probabilistic matrix factorization (BPMF) [Salakhutdinov & Mnih, ICML 2008]
Hybrid systems
Naïve hybrid (averaged predictions)
BPMF with social relations and content (BPMF-SRIC) [Liu et al., DSS 55(3) 2013]
Our baselines were item-based collaborative filtering, matrix factorization, and Bayesian probabilistic matrix factorization (BPMF). We considered user-based collaborative filtering, but it performed poorly, so we did not report those results. We also compared to hybrid recommender systems: a naïve hybrid computed by averaging the predictions, and a sophisticated hybrid baseline due to Liu et al., which incorporates social relations and content into BPMF.

53 HyPER vs. Baselines
HyPER outperforms all other models in both datasets
Results statistically significant
In the following slides, the left-hand figures, in blue, are for Yelp, and the right-hand figures, in orange, are for Last.fm. The HyPER model is always the right-most bar in the charts, and is indicated by the blue arrows. The plots show RMSE on the y-axes; the results for MAE are similar. Standard deviations are also shown around the top of each bar. First, we find that our HyPER model outperforms all other models on both Yelp and Last.fm, and these results are statistically significant.

54 HyPER Submodels: Mean-centering
HyPER combined model beats individual rules
We also studied the performance of HyPER more closely by examining the performance of submodels with each type of data. Here, we look at the results for each of the mean-centering rules in isolation, followed by a HyPER combination of these mean-centering rules. In both datasets the HyPER model which combines all mean-centering rules performs better than any one rule by itself. Interestingly, this is true despite the poor performance of some rules. In Yelp the user-average rule performed very poorly while the item-average rule performed well; the reverse was true in Last.fm. This is likely due in part to the different shapes of the ratings matrices between the two datasets. The overall average rating performed badly on Last.fm, which makes sense, as many people strongly dislike so-called "popular" music.

55 HyPER Submodels: User-based
HyPER combined model beats/matches best individual rules
Similar story for item-based, content & social
In another submodel experiment, these plots show the results for user-based collaborative filtering rules with different similarity measures, both individually and combined into a hybrid model using HyPER. The combined HyPER model beats, or at least matches, the best-performing individual user-based model on each dataset. Columns 3 and 4 in each plot correspond to similarities computed in the latent spaces generated by matrix factorization. These similarities performed better than similarities computed in ratings space, in the left two columns, which is quite interesting in itself. According to these results, the latent space was a better representation than the rating space for user-based neighborhood methods in these two datasets. The results are similar for item-based rules, as well as content and social rules, which I omit in the interest of time.

56 Combining the Baselines
HyPER can combine different recommenders effectively
Results statistically significant
Considering just the base methods, and a HyPER ensemble of those methods, we found that the HyPER ensemble performed better than any of the base methods by itself, and that this improvement was statistically significant.

57 HyPER (All Rules)
Combining all rules achieves the best performance in both datasets
Finally, we compared the full HyPER model with all rules to each of the submodels considered in the previous slides. Once again, the full model performs the best, indicating that HyPER can successfully combine and balance the different information sources in order to improve performance.

58 Scaling to Large Datasets
Parallel implementation for inference and learning based on ADMM [Bach et al., UAI 2013]
Scaling to big-data applications:
perform inference in parallel on densely connected subgraphs of the original graph
fully distributed implementation of ADMM
We implemented HyPER using the publicly available PSL software of Bach et al., which uses a multi-threaded, single-machine parallel implementation of ADMM. To scale this system to big-data applications, a simple approach would be to split the graph into densely connected subgraphs that each fit on a single machine, and execute inference independently for each subgraph. This could easily be performed with the current implementation of PSL. To avoid the approximation of independent subgraphs, a fully distributed implementation of ADMM can instead be used, similar to the one used by LinkedIn for their distributed logistic regression, which is implemented on Hadoop.

59 Conclusions
HyPER is a general-purpose, extensible framework for hybrid recommender systems
With HyPER, practitioners can define custom hybrid models for using all available data/algorithms, via logical rules in PSL
HyPER outperforms existing techniques on two popular datasets
To conclude, HyPER is a general-purpose, extensible framework for hybrid recommender systems. With HyPER, RecSys practitioners can define custom hybrid recommender models that use all available data sources and recommendation algorithms, via logical rules in the PSL language. It's simple to add new data types to a HyPER model as they become available: just add rules! HyPER also performs very well in practice, outperforming existing state-of-the-art techniques on two popular datasets.

60 Conclusions
HyPER is a general-purpose, extensible framework for hybrid recommender systems
With HyPER, practitioners can define custom hybrid models for using all available data/algorithms, via logical rules in PSL
HyPER outperforms existing techniques on two popular datasets
Thank you for your attention!

61 HyPER Submodels – Item-based, Content & Social

62 References
X. Ning, C. Desrosiers, and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, 2nd edition, Springer, 2015.
S. Fakhraei, B. Huang, L. Raschid, and L. Getoor. Network-based drug-target interaction prediction with probabilistic soft logic. Transactions on Computational Biology and Bioinformatics, 11(5), 2014.
J. Liu, C. Wu, and W. Liu. Bayesian probabilistic matrix factorization with social relations and item contents for recommendation. Decision Support Systems, 55(3), 2013.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, 2008.
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), 2009.
A. Gunawardana and C. Meek. A unified approach to building hybrid recommender systems. In RecSys, 2009.
R. Burke. Hybrid web recommender systems. In The Adaptive Web, Springer, 2007.
L. de Campos, J. Fernandez-Luna, J. Huete, and M. Rueda-Morales. Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks. International Journal of Approximate Reasoning, 51(7), 2010.
M. Jahrer, A. Toscher, and R. Legenstein. Combining predictions for accurate recommender systems. In KDD, 2010.

63 References
J. Hoxha and A. Rettinger. First-order probabilistic model for hybrid recommendations. In ICMLA, 2013.
S. H. Bach, B. Huang, B. London, and L. Getoor. Hinge-loss Markov random fields: Convex inference for structured prediction. In UAI, 2013.
S. H. Bach, M. Broecheler, B. Huang, and L. Getoor. Hinge-loss Markov random fields and probabilistic soft logic. ArXiv [cs.LG], 2015.
A. P. Forbes and M. Zhu. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. In RecSys, 2011.
J. Chen, G. Chen, H. Zhang, J. Huang, and G. Zhao. Social recommendation based on multi-relational analysis. In WI-IAT.
R. Burke, F. Vahedian, and B. Mobasher. Hybrid recommendation in heterogeneous networks. In User Modeling, Adaptation, and Personalization, Springer, 2014.
J. Gemmell, T. S., B. Mobasher, and R. Burke. Resource recommendation in social annotation systems: A linear-weighted hybrid approach. Journal of Computer and System Sciences, 78(4), 2012.
X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han. Personalized entity recommendation: A heterogeneous information network approach. In WSDM, 2014.
H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In WSDM, 2011.
J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In RecSys, 2013.
G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to recommend. In RecSys, 2014.
I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social media recommendation based on people and tags. In SIGIR, 2010.
S. Sedhain, S. Sanner, D. Braziunas, L. Xie, and J. Christensen. Social collaborative filtering for cold-start recommendations. In RecSys, 2014.

