Content-based Recommendation Systems Group: Tippy.

1 Content-based Recommendation Systems Group: Tippy

2 Group Members
- Nerin George: Goal Models + Presentation
- Deepan Murugan: Domain Models + Presentation
- Thach Tran: Strategies + Presentation

3 Outline
- Introduction
- Item Representation
- User Profiles
- Manual Recommendation Methods
- Learning a User Model
- Classification Learning Algorithms
- Decision Trees and Rule Induction
- Nearest Neighbour Methods
- Conclusions
- Q & A

4 Introduction
- The WWW is growing exponentially, and many websites have become enormous in terms of size and complexity
- Users need help finding items that match their interests
- Recommendation
  - Content-based recommendation: recommend an item to a user based on a description of the item and a profile of the user's interests

5 Introduction
- Pazzani, M. J., & Billsus, D. (2007). Content-Based Recommendation Systems. Lecture Notes in Computer Science (4321).

6 Related Research
- Recommender systems
  - present items (e.g., movies, books, music, images, web pages, news) that are likely to be of interest to the user
  - compare the user's profile to some reference characteristics to predict whether the user would be interested in an unseen item
- Reference characteristics
  - Information about the unseen item: content-based approach
  - The user's social environment: collaborative filtering approach

7 Item Representation
- Items are stored in a database table
- Structured data
  - Small number of attributes
  - Each item is described by the same set of attributes
  - The set of values each attribute may take is known
- Straightforward to work with
- The user's profile contains positive ratings for 1001, 1002 and 1003
- Would the user be interested in, say, Oscar's (French cuisine, table service)?
  ID    Name            Cuisine  Service  Cost
  1001  Mike's Pizza    Italian  Counter  Low
  1002  Chris's Café    French   Table    Medium
  1003  Jacques Bistro  French   Table    High

8 Item Representation
- Information about an item could also be free text, e.g., a text description or review of a restaurant, or a news article
- Unstructured data
  - No attribute names with well-defined values
  - Natural language complexity
    - The same word can have different meanings
    - Different words can have the same meaning
- Structure must be imposed on free text before it can be used in a recommendation algorithm

9 TF*IDF Weighting
- First, stemming is applied to reduce words to their root forms
  - compute, computation, computer, computes, etc. are represented by one term
- Then a weight is computed for each term, representing the importance or relevance of that term

10 TF*IDF Weighting
- Term frequency $tf_{t,d}$ of a term $t$ in a document $d$
- Inverse document frequency $idf_t$ of a term $t$
- TF*IDF weighting
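
The formulas on this slide were not preserved in the transcript. As a hedged illustration only, the sketch below assumes one common variant, weight = tf × log(N / df), where N is the number of documents and df is the number of documents containing the term; it omits the length normalisation that some formulations add. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf(t, d) * log(N / df(t)), where N is the number of
    documents and df(t) is the number of documents that contain term t.
    """
    n_docs = len(documents)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical restaurant descriptions, already stemmed and tokenised.
docs = [
    ["pizza", "pasta", "pizza", "wine"],
    ["wine", "cheese", "bistro"],
    ["pizza", "wine", "counter"],
]
weights = tf_idf(docs)
```

Under this assumed weighting, a term that appears in every document ("wine" here) gets weight zero, while a term concentrated in one document ("pasta" in the first) gets the highest weight there.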

11 TF*IDF Weighting
- The terms with the highest weights
  - occur more often in that document than in other documents
  - are more central to the topic of the document
- Limitations
  - This method does not capture the context in which a word is used
  - e.g., in "This restaurant does not serve vegetarian dishes", the term "vegetarian" is still weighted as relevant even though the sentence negates it

12 User Profiles
- Most recommendation systems use a profile of the user's interests
- This profile consists of two main types of information
  - A model of the user's preferences, e.g., a function that, for any item, predicts the likelihood that the user is interested in that item
  - The user's interaction history, e.g., items viewed or purchased by the user, search queries, etc.

13 User Profiles
- The user's history is used as training data for a machine learning algorithm that creates a user model
- Manual recommending approaches
  - User customisation
    - Provide a check-box interface that lets users construct their own profiles of interests
    - A simple database matching process finds items that meet the specified criteria and recommends them to users
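
The database matching step in user customisation could look roughly like the sketch below. It assumes the ticked check boxes are stored as a mapping from attribute name to the set of accepted values; the function name `match_items` and this profile layout are hypothetical, and the restaurant rows reuse the table from slide 7.

```python
restaurants = [
    {"id": 1001, "name": "Mike's Pizza", "cuisine": "Italian",
     "service": "Counter", "cost": "Low"},
    {"id": 1002, "name": "Chris's Café", "cuisine": "French",
     "service": "Table", "cost": "Medium"},
    {"id": 1003, "name": "Jacques Bistro", "cuisine": "French",
     "service": "Table", "cost": "High"},
]


def match_items(items, criteria):
    """Return the items that satisfy every criterion the user selected.

    `criteria` maps an attribute name to the set of values the user ticked
    (an assumed representation of the check-box profile).
    """
    return [
        item for item in items
        if all(item[attr] in accepted for attr, accepted in criteria.items())
    ]


# Hypothetical profile: the user ticked "French" and "Table".
profile = {"cuisine": {"French"}, "service": {"Table"}}
matches = match_items(restaurants, profile)
```

The result is a plain filtered list in catalogue order: an item either matches or it does not, so nothing ranks one match above another.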

14 User Profiles
- Limitations
  - Require effort from users
  - Cannot cope with changes in the user's interests
  - Do not provide a way to order the recommended items

15 User Profiles
- Manual recommending approaches
  - Rule-based recommendation
    - The system has rules that recommend other products based on the user's history
    - e.g., a rule that recommends the sequel to a book or movie to customers who purchased the previous item in the series
    - Can capture common reasons for making recommendations
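
A sequel rule of the kind described above might be sketched as follows. The catalogue `SEQUEL_OF`, its titles, and the function name `recommend_sequels` are all hypothetical; a real system would read series information from its product database.

```python
# Hypothetical catalogue: maps an item to the next item in its series.
SEQUEL_OF = {
    "Star Quest I": "Star Quest II",
    "Star Quest II": "Star Quest III",
}


def recommend_sequels(purchase_history):
    """Rule: recommend the sequel of each purchased item not yet owned."""
    owned = set(purchase_history)
    recommendations = []
    for item in purchase_history:
        sequel = SEQUEL_OF.get(item)
        if sequel is None or sequel in owned:
            continue  # no sequel known, or the user already has it
        if sequel not in recommendations:
            recommendations.append(sequel)
    return recommendations
```

For example, `recommend_sequels(["Star Quest I"])` returns `["Star Quest II"]`, while a user who already owns the first two titles is offered only the third.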

16 Learning a User Model
- Creating a model of the user's preferences from the user's history is a form of classification learning
- The training data (i.e., the user's history) can be captured through explicit feedback (e.g., the user rates items) or by implicitly observing the user's interactions (e.g., a user buying an item and later returning it is a sign that the user does not like the item)
- Implicit methods can collect a large amount of data, but the data may contain noise; data collected through explicit methods contains little or no noise, but the amount collected may be limited

17 Learning a User Model
- Next, a number of classification learning algorithms are reviewed
- The main goal of these algorithms is to learn a function that models the user's interests
- Applying the function to a new item gives either the probability that the user will like the item or a numeric value indicating the degree of interest in it

18 Decision Trees and Rule Induction
- Given the history of the user's interests as training data, build a decision tree that represents the user's profile of interests
- Will the user like an inexpensive Mexican restaurant?
  Cuisine  Service  Cost  Rating
  Italian  Counter  Low   Negative
  French   Table    Med   Positive
  French   Counter  Low   Positive
  …        …        …     …
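
As a rough illustration of building a tree from rated items, the sketch below uses an ID3-style procedure (split on the attribute with the highest information gain). The slides do not say which tree-learning algorithm is meant, so ID3 is an assumption, as are the function names and the fallback to the majority class for attribute values never seen in training. Only the three visible rows of the table are used; the elided rows are left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def build_tree(rows, attributes, target="rating"):
    """Build an ID3-style decision tree from a list of dict rows."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return majority

    def gain(attr):
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [row[target] for row in rows if row[attr] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    remaining = [a for a in attributes if a != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value],
                          remaining, target)
        for value in {row[best] for row in rows}
    }
    return {"attribute": best, "branches": branches, "default": majority}


def classify(tree, item):
    """Walk the tree; unseen attribute values fall back to the majority."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(item[tree["attribute"]], tree["default"])
    return tree


history = [
    {"cuisine": "Italian", "service": "Counter", "cost": "Low", "rating": "Negative"},
    {"cuisine": "French", "service": "Table", "cost": "Med", "rating": "Positive"},
    {"cuisine": "French", "service": "Counter", "cost": "Low", "rating": "Positive"},
]
tree = build_tree(history, ["cuisine", "service", "cost"])
```

With only these three rows, cuisine separates the ratings perfectly, so it becomes the root. No Mexican restaurant appears in the data, so `classify` can only return the assumed majority-class fallback for that query, which shows how little a tree can say about attribute values it has never seen.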

19 Decision Trees and Rule Induction
- Well suited to structured data
- With unstructured data, the number of attributes becomes very large and, consequently, the tree becomes too large to perform well
- RIPPER: a rule induction algorithm based on the same principles that performs better when classifying text

20 Nearest Neighbour Methods
- Simply store all the training data in memory
- To classify a new item, compare it to all stored items using a similarity function and determine the nearest neighbour or the k nearest neighbours
- The class or numeric score of the previously unseen item can then be derived from the classes of its nearest neighbours

21 Nearest Neighbour Methods
[Figure: an unseen item to be classified, shown among positively rated and negatively rated items. With k = 3 the item is classified as negative; with k = 5, as positive.]
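
The effect of k can be reproduced with a small sketch. The points below are hypothetical, chosen so that the three nearest neighbours are mostly negative while the five nearest are mostly positive; majority voting and Euclidean distance are assumptions about how the neighbours are combined and compared.

```python
import math
from collections import Counter


def knn_classify(training, query, k):
    """Classify `query` by majority vote among its k nearest neighbours.

    `training` is a list of (point, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(training, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D items, listed with their distance from the query.
training = [
    ((1.0, 0.0), "negative"),  # distance 1.0
    ((0.0, 1.2), "negative"),  # distance 1.2
    ((1.5, 0.0), "positive"),  # distance 1.5
    ((0.0, 2.0), "positive"),  # distance 2.0
    ((2.2, 0.0), "positive"),  # distance 2.2
    ((5.0, 5.0), "negative"),  # far away
]
query = (0.0, 0.0)

label_k3 = knn_classify(training, query, 3)
label_k5 = knn_classify(training, query, 5)
```

Here `label_k3` is "negative" (two of the three closest items are negative) and `label_k5` is "positive" (three of the five closest are positive), so the choice of k changes the prediction.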

22 Nearest Neighbour Methods
- The similarity function depends on the type of data
  - Structured data: Euclidean distance metric
  - Unstructured data (i.e., free text): cosine similarity function

23 Euclidean Distance Metric
- Distance between A and B: $d(A, B) = \sqrt{(X_A - X_B)^2 + (Y_A - Y_B)^2 + (Z_A - Z_B)^2}$
  Item  Attr. X  Attr. Y  Attr. Z
  A     X_A      Y_A      Z_A
  B     X_B      Y_B      Z_B
- Attributes that are not measured quantitatively need to be labelled with numbers representing their categories
  - Cuisine attribute: 1 = French, 2 = Italian, 3 = Mexican
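
A minimal sketch of this encoding and distance, applied to the restaurants from slide 7, is given below. The cuisine codes follow the slide; the codes for service and cost, and the names `encode` and `euclidean`, are assumptions made for illustration.

```python
import math

# Cuisine codes as given on the slide.
CUISINE = {"French": 1, "Italian": 2, "Mexican": 3}
# Assumed codes: the slides do not specify these.
SERVICE = {"Counter": 1, "Table": 2}
COST = {"Low": 1, "Medium": 2, "High": 3}


def encode(item):
    """Map a restaurant's categorical attributes to a numeric vector."""
    return (CUISINE[item["cuisine"]], SERVICE[item["service"]], COST[item["cost"]])


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


mikes = {"cuisine": "Italian", "service": "Counter", "cost": "Low"}
chris = {"cuisine": "French", "service": "Table", "cost": "Medium"}
jacques = {"cuisine": "French", "service": "Table", "cost": "High"}

d_mikes_jacques = euclidean(encode(mikes), encode(jacques))  # sqrt(1 + 1 + 4)
d_chris_jacques = euclidean(encode(chris), encode(jacques))  # differs only in cost
```

One caveat of integer category codes is that they impose an ordering: with the codes above, Mexican is numerically further from French than Italian is, which may not reflect any real difference in similarity.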

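24 Cosine Similarity Function
- Vector space model
  - An item or document $d$ is represented as a vector of term weights, $\vec{d} = (w_{t_1,d}, \dots, w_{t_n,d})$
  - $w_{t,d}$ is the tf*idf weight of a term $t$ in a document $d$
- The similarity between two items can then be computed as the cosine of the angle between their vectors: $\cos(\vec{d_1}, \vec{d_2}) = \frac{\vec{d_1} \cdot \vec{d_2}}{\lVert \vec{d_1} \rVert \, \lVert \vec{d_2} \rVert}$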
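
The cosine of the angle between two term-weight vectors can be computed as in the sketch below. Storing each vector as a sparse `{term: weight}` dict and returning 0.0 for an all-zero vector are assumptions of this sketch, and the sample weights are hypothetical rather than real tf*idf values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for vectors with no weight
    return dot / (norm_u * norm_v)


# Hypothetical term weights for three documents.
doc_a = {"pizza": 3.0, "pasta": 4.0}
doc_b = {"pizza": 4.0, "pasta": 3.0}
doc_c = {"bistro": 2.0}

sim_ab = cosine_similarity(doc_a, doc_b)  # shared terms, different emphasis
sim_ac = cosine_similarity(doc_a, doc_c)  # no terms in common
```

Because the measure depends only on the angle, a long document and a short one with the same relative term weights score as identical, which is one reason it suits free text.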

25 Nearest Neighbour Methods
- Despite the simplicity of the algorithm, its performance has been shown to be competitive with more complex algorithms

26 Other Classification Learning Algorithms
- Relevance Feedback and Rocchio's Algorithm
- Linear Classifiers
- Probabilistic Methods and Naïve Bayes

27 Conclusions
- Content-based recommendation can only be effective in limited circumstances
  - It is not straightforward to recognise the subtleties in content
  - It depends entirely on previously selected items and therefore cannot make predictions about users' future interests
- These shortcomings can be addressed by collaborative filtering (CF) techniques
- CF is the dominant technique nowadays, thanks to the popularity of the Web 2.0/Social Web concept
- Many recommendation systems use a hybrid of content-based and collaborative filtering approaches

28 Summary
- Content-based Recommendation
- Item Representation
- User Profiles
- Manual Recommendation Methods
- Learning a User Model
- Decision Trees and Rule Induction
- Nearest Neighbour Methods

29 Q & A
