
Similar presentations
Recommender System A Brief Survey.

A Comparison of Implicit and Explicit Links for Web Page Classification Dou Shen 1 Jian-Tao Sun 2 Qiang Yang 1 Zheng Chen 2 1 Department of Computer Science.
Recommender Systems & Collaborative Filtering
Content-based Recommendation Systems
Filtering Semi-Structured Documents Based on Faceted Feedback Lanbo Zhang, Yi Zhang, Qianli Xing Information Retrieval and Knowledge Management (IRKM)
Text Categorization.
1 Evaluations in information retrieval. 2 Evaluations in information retrieval: summary The following gives an overview of approaches that are applied.
Introduction to Information Retrieval Outline ❶ Latent semantic indexing ❷ Dimensionality reduction ❸ LSI in information retrieval 1.
Traditional IR models Jian-Yun Nie.
Machine Learning: Intro and Supervised Classification
Improvements and extras Paul Thomas CSIRO. Overview of the lectures 1.Introduction to information retrieval (IR) 2.Ranked retrieval 3.Probabilistic retrieval.
Boolean and Vector Space Retrieval Models
Co-funded by the European Union Semantic CMS Community Content Management From free text input to automatic entity enrichment Copyright IKS Consortium.
Prediction Modeling for Personalization & Recommender Systems Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Psychological Advertising: Exploring User Psychology for Click Prediction in Sponsored Search Date: 2014/03/25 Author: Taifeng Wang, Jiang Bian, Shusen.
Classification Classification Examples
Information Extraction Lecture 7 – Linear Models (Basic Machine Learning) CIS, LMU München Winter Semester Dr. Alexander Fraser, CIS.
DECISION TREES. Decision trees  One possible representation for hypotheses.
Language Models Naama Kraus (Modified by Amit Gross) Slides are based on Introduction to Information Retrieval Book by Manning, Raghavan and Schütze.
Chapter 5: Introduction to Information Retrieval
Bag-of-Words Methods for Text Mining CSCI-GA.2590 – Lecture 2A
K nearest neighbor and Rocchio algorithm
Database Management Systems, R. Ramakrishnan1 Computing Relevance, Similarity: The Vector Space Model Chapter 27, Part B Based on Larson and Hearst’s slides.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Recommender systems Ram Akella February 23, 2011 Lecture 6b, i290 & 280I University of California at Berkeley Silicon Valley Center/SC.
Vector Space Model CS 652 Information Extraction and Integration.
Presented by Zeehasham Rasheed
Recommender systems Ram Akella November 26 th 2008.
Chapter 5: Information Retrieval and Web Search
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
Recommender Systems: An Introduction to the Art Tony White
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
APPLICATIONS OF DATA MINING IN INFORMATION RETRIEVAL.
Content-Based Recommendation Systems Michael J. Pazzani and Daniel Billsus Rutgers University and FX Palo Alto Laboratory By Vishal Paliwal.
Processing of large document collections Part 2 (Text categorization) Helena Ahonen-Myka Spring 2006.
Text mining.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
Bayesian Networks. Male brain wiring Female brain wiring.
Adaptive News Access Daniel Billsus Presented by Chirayu Wongchokprasitti.
Processing of large document collections Part 2 (Text categorization, term selection) Helena Ahonen-Myka Spring 2005.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Modern Information Retrieval: A Brief Overview By Amit Singhal Ranjan Dash.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Text Feature Extraction. Text Classification Text classification has many applications –Spam detection –Automated tagging of streams of news articles,
Giorgos Giannopoulos (IMIS/”Athena” R.C and NTU Athens, Greece) Theodore Dalamagas (IMIS/”Athena” R.C., Greece) Timos Sellis (IMIS/”Athena” R.C and NTU.
Chapter 6: Information Retrieval and Web Search
1 Computing Relevance, Similarity: The Vector Space Model.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
CPSC 404 Laks V.S. Lakshmanan1 Computing Relevance, Similarity: The Vector Space Model Chapter 27, Part B Based on Larson and Hearst’s slides at UC-Berkeley.
Data Management and Database Technologies 1 DATA MINING Extracting Knowledge From Data Petr Olmer CERN
Text mining. The Standard Data Mining process Text Mining Machine learning on text data Text Data mining Text analysis Part of Web mining Typical tasks.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Vector Space Models.
C.Watterscsci64031 Probabilistic Retrieval Model.
Exploring in the Weblog Space by Detecting Informative and Affective Articles Xiaochuan Ni, Gui-Rong Xue, Xiao Ling, Yong Yu Shanghai Jiao-Tong University.
Personalization Services in CADAL Zhang yin Zhuang Yuting Wu Jiangqin College of Computer Science, Zhejiang University November 19,2006.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
User Modeling and Recommender Systems: recommendation algorithms
Xiaoying Gao Computer Science Victoria University of Wellington COMP307 NLP 4 Information Retrieval.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Query Refinement and Relevance Feedback.
BAYESIAN LEARNING. 2 Bayesian Classifiers Bayesian classifiers are statistical classifiers, and are based on Bayes theorem They can calculate the probability.
1 Text Categorization  Assigning documents to a fixed set of categories  Applications:  Web pages  Recommending pages  Yahoo-like classification hierarchies.
Data Mining: Concepts and Techniques
Representation of documents and queries
Text Categorization Assigning documents to a fixed set of categories
Chapter 5: Information Retrieval and Web Search
Authors: Wai Lam and Kon Fan Low Announcer: Kyu-Baek Hwang
Presentation transcript:

- 1 -

- 2 - Content-based recommendation
While CF methods do not require any information about the items, it may be reasonable to exploit such information, e.g. to recommend fantasy novels to people who liked fantasy novels in the past.
What do we need?
–some information about the available items, such as the genre ("content")
–some sort of user profile describing what the user likes (the preferences)
The task:
–learn the user preferences
–locate/recommend items that are "similar" to the user preferences
–"show me more of the same that I've liked"

- 3 - What is the "content"?
Most CB-recommendation techniques have been applied to recommending text documents, such as web pages or newsgroup messages.
The content of items can also be represented as text documents, with textual descriptions of their basic characteristics.
–Structured: each item is described by the same set of attributes
–Unstructured: free-text description

Title | Genre | Author | Type | Price | Keywords
The Night of the Gun | Memoir | David Carr | Paperback | 29.90 | Press and journalism, drug addiction, personal memoirs, New York
The Lace Reader | Fiction, Mystery | Brunonia Barry | Hardcover | 49.90 | American contemporary fiction, detective, historical
Into the Fire | Romance, Suspense | Suzanne Brockmann | Hardcover | 45.90 | American fiction, murder, neo-Nazism

- 4 - Content representation and item similarities
Simple approach:
–compute the similarity of an unseen item with the user profile based on the keyword overlap (e.g. using the Dice coefficient)
–or use and combine multiple metrics

Item representation:
Title | Genre | Author | Type | Price | Keywords
The Night of the Gun | Memoir | David Carr | Paperback | 29.90 | Press and journalism, drug addiction, personal memoirs, New York
The Lace Reader | Fiction, Mystery | Brunonia Barry | Hardcover | 49.90 | American contemporary fiction, detective, historical
Into the Fire | Romance, Suspense | Suzanne Brockmann | Hardcover | 45.90 | American fiction, murder, neo-Nazism

User profile:
Title | Genre | Author | Type | Price | Keywords
… | Fiction | Brunonia Barry, Ken Follett | Paperback | 25.65 | Detective, murder, New York
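The keyword-overlap idea can be sketched in a few lines. The Dice coefficient mentioned on the slide is 2|A ∩ B| / (|A| + |B|); the keyword sets below are shortened, illustrative versions of the table entries:

```python
def dice_coefficient(keywords_a, keywords_b):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

# Illustrative keyword sets for "Into the Fire" and the user profile
book = {"american fiction", "murder", "neo-nazism"}
profile = {"detective", "murder", "new york"}

print(dice_coefficient(book, profile))  # one shared keyword: 2*1/(3+3) ≈ 0.333
```

One keyword ("murder") is shared, so the overlap score is 1/3; combining this with similarities on genre, author, etc. gives the "multiple metrics" variant.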

- 5 - Simple keyword representation has its problems, in particular when keywords are extracted automatically:
–not every word has similar importance
–longer documents have a higher chance of overlapping with the user profile
Standard measure: TF-IDF
–encodes text documents as weighted term vectors in a multi-dimensional Euclidean space
–TF: measures how often a term appears (density in a document)
 - assuming that important terms appear more often
 - normalization has to be done to take the document length into account
–IDF: aims to reduce the weight of terms that appear in all documents

- 6 - TF-IDF II
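The formula itself did not survive the transcript. As a sketch, one common TF-IDF variant (the exact normalization on the original slide may differ) weights a term by its length-normalized frequency times the log inverse document frequency, tf(t, d) · log(N / df(t)):

```python
import math

def tf_idf(term, doc, corpus):
    """One common TF-IDF variant: tf(t, d) * log(N / df(t)).

    tf is normalized by document length; df(t) is the number of
    documents in the corpus that contain t.
    """
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Tiny invented corpus of tokenized documents
corpus = [["caesar", "brutus", "caesar"],
          ["caesar", "antony"],
          ["mercy", "worser"]]
print(tf_idf("brutus", corpus[0], corpus))
```

Note how "caesar", which occurs in two of the three documents, gets a lower IDF than "brutus", which occurs in only one.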

- 7 - Example TF-IDF representation
Term-document table over the documents Antony and Cleopatra, Julius Caesar, The Tempest, Hamlet, Othello and Macbeth, for the terms Antony, Brutus, Caesar, Calpurnia, Cleopatra, mercy, worser (the numeric values did not survive the transcript).

- 8 - Example TF-IDF representation (continued)
Same term-document table over the Shakespeare plays and the terms Antony, Brutus, Caesar, Calpurnia, Cleopatra, mercy, worser (numeric values not preserved in the transcript).

- 9 - Improving the vector space model
Vectors are usually long and sparse.
Remove stop words:
–they appear in nearly all documents, e.g. "a", "the", "on", …
Use stemming:
–aims to replace variants of words by their common stem
–e.g. "went" → "go", "stemming" → "stem", …
Size cut-offs:
–only use the top n most representative words to remove "noise" from the data
–e.g. use the top 100 words
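The stop-word and stemming steps can be sketched as a small preprocessing pipeline. The stop-word list and the suffix-stripping "stemmer" below are deliberately naive placeholders; a real system would use a full stop list and e.g. the Porter stemmer:

```python
STOP_WORDS = {"a", "the", "on", "at", "of"}  # tiny illustrative list

def naive_stem(word):
    """Very naive suffix stripping, for illustration only."""
    for suffix in ("ming", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop stop words, then stem each remaining token."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]

print(preprocess("The stemming aims at replacing variants"))
# ['stem', 'aim', 'replac', 'variant']
```

The size cut-off would then keep only the n highest-weighted of these stems per document.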

- 10 - Improving the vector space model II
Use lexical knowledge, use more elaborate methods for feature selection:
–remove words that are not relevant in the domain
Detection of phrases as terms:
–more descriptive for a text than single words, e.g. "United Nations"
Limitations:
–semantic meaning remains unknown
–example: usage of a word in a negative context, "there is nothing on the menu that a vegetarian would like…"
 - the word "vegetarian" will receive a higher weight than desired
 - an unintended match with a user interested in vegetarian restaurants

- 11 - Cosine similarity
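The cosine formula on this slide did not survive the transcript; the standard definition, cos(u, v) = (u · v) / (|u| · |v|), applied to term-weight vectors, looks like this:

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|) for equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical TF-IDF vectors for a document and a user profile
doc = [0.5, 0.0, 1.2]
profile = [0.4, 0.3, 0.9]
print(cosine_similarity(doc, profile))
```

Because it normalizes by vector length, cosine similarity avoids the long-document bias of raw keyword overlap noted on slide 5.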

- 12 - Recommending items

- 13 - Recommending items
Retrieval quality depends on the individual's capability to formulate queries with the right keywords.
Query-based retrieval: Rocchio's method
–the SMART system: users are allowed to rate retrieved documents as relevant/irrelevant (feedback)
–the system then learns a prototype of relevant/irrelevant documents
–queries are then automatically extended with additional terms/weights from relevant documents

- 14 - Rocchio details
Document collections D+ (liked) and D- (disliked):
–calculate a prototype vector for each of these categories
Compute the modified query Q_{i+1} from the current query Q_i with

  Q_{i+1} = α·Q_i + β·(1/|D+|)·Σ_{d ∈ D+} d − γ·(1/|D-|)·Σ_{d ∈ D-} d

where α, β, γ are used to fine-tune the feedback:
–α: weight for the original query
–β: weight for positive feedback
–γ: weight for negative feedback
Often only positive feedback is used, as it is more valuable than negative feedback.
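One Rocchio update step can be sketched directly from the formula. The parameter values below are common illustrative defaults, not values taken from the slides, and vectors are plain lists of term weights:

```python
def rocchio_update(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """One Rocchio step: Q' = alpha*Q + beta*mean(D+) - gamma*mean(D-)."""
    def mean(docs):
        if not docs:
            return [0.0] * len(query)
        return [sum(d[i] for d in docs) / len(docs) for i in range(len(query))]

    pos, neg = mean(liked), mean(disliked)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]

q = [1.0, 0.0, 0.5]                      # current query vector
liked = [[0.0, 1.0, 1.0], [0.0, 1.0, 0.0]]   # D+ (rated relevant)
disliked = [[1.0, 0.0, 0.0]]                 # D- (rated irrelevant)
print(rocchio_update(q, liked, disliked))    # [0.85, 0.75, 0.875]
```

Terms frequent in liked documents gain weight (the second component rises from 0 to 0.75), while terms from disliked documents are damped.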

- 15 - Practical challenges of Rocchio's method

- 16 - Probabilistic methods
Recommendation as a classical text classification problem:
–long history of using probabilistic methods
Simple approach:
–2 classes: hot/cold
–simple Boolean document representation
–calculate the probability that a document is hot/cold based on Bayes' theorem

Doc-ID | recommender | intelligent | learning | school | Label
(training rows plus the document to classify, labeled "?", were shown as a table; the values did not survive the transcript)
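The hot/cold classifier over Boolean keyword features can be sketched as a Bernoulli naive Bayes model. The training rows below are invented, since the original table values were lost; the feature order follows the table header (recommender, intelligent, learning, school):

```python
from collections import Counter

def train_bernoulli_nb(docs, labels):
    """Bernoulli naive Bayes over Boolean features, Laplace-smoothed."""
    classes = Counter(labels)
    n, n_features = len(docs), len(docs[0])
    # P(feature_i = 1 | class), smoothed as (count + 1) / (N_c + 2)
    cond = {c: [(sum(d[i] for d, l in zip(docs, labels) if l == c) + 1)
                / (classes[c] + 2)
                for i in range(n_features)]
            for c in classes}
    prior = {c: classes[c] / n for c in classes}
    return prior, cond

def predict(doc, prior, cond):
    """Pick the class maximizing P(class) * prod P(feature | class)."""
    scores = {}
    for c in prior:
        p = prior[c]
        for x, pf in zip(doc, cond[c]):
            p *= pf if x else (1 - pf)
        scores[c] = p
    return max(scores, key=scores.get)

# invented rows: [recommender, intelligent, learning, school]
docs = [[1, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
labels = ["hot", "cold", "hot", "cold"]
prior, cond = train_bernoulli_nb(docs, labels)
print(predict([1, 0, 1, 0], prior, cond))  # hot
```

The "?" row of the original table would be classified exactly like the query vector in the last line.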

- 17 - Linear classifiers
Most learning methods aim to find the coefficients of a linear model.
–a simplified classifier with only two dimensions can be represented by a line separating the relevant from the non-relevant documents
Other linear classifiers:
–Naive Bayes classifier, Rocchio method, Widrow-Hoff algorithm, support vector machines
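Finding the coefficients of such a line can be illustrated with the simplest linear learner, a perceptron; the two-dimensional toy data below is invented (a document is "relevant" exactly when its first keyword occurs):

```python
def train_perceptron(docs, labels, epochs=20, lr=0.1):
    """Learn w, b so that sign(w . x + b) matches the labels (+1/-1)."""
    w = [0.0] * len(docs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(docs, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:  # mistake-driven update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

docs = [[1, 1], [1, 0], [0, 1], [0, 0]]
labels = [1, 1, -1, -1]  # relevant iff the first keyword occurs
w, b = train_perceptron(docs, labels)
print(all((1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1) == y
          for x, y in zip(docs, labels)))  # True
```

The learned (w, b) defines exactly the separating line from the slide; Widrow-Hoff and SVMs differ only in how the coefficients are fitted.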

- 18 - Improvements
Side note: conditional independence of events does in fact not hold.
–e.g. "New York", "Hong Kong"
–still, good accuracy can be achieved
The Boolean representation is simplistic.
–positional independence assumed
–keyword counts lost
More elaborate probabilistic methods:
–e.g., estimate the probability of term v occurring in a document of class C by the relative frequency of v in all documents of that class
Other linear classification algorithms (machine learning) can be used:
–support vector machines, …
Use other information retrieval methods (as used by search engines, …).

- 19 - Explicit decision models
Decision tree for recommendation problems:
–inner nodes are labeled with item features (keywords)
–used to partition the test examples: existence or non-existence of a keyword
–in the basic setting only two classes appear at the leaf nodes: interesting or not interesting
–a decision tree can be constructed automatically from training data
–works best with a small number of features
–use meta-features like author name, genre, … instead of a TF-IDF representation
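A tree of this kind is easy to sketch by hand: inner nodes test keyword existence and leaves carry the two classes. The tree structure and feature names below are invented for illustration (in practice the tree would be induced from training data):

```python
# Hand-built tree: inner nodes test keyword existence, leaves are classes.
tree = {
    "feature": "recommender",
    "yes": {"feature": "learning",
            "yes": "interesting",
            "no": "not interesting"},
    "no": "not interesting",
}

def classify(keywords, node):
    """Walk the tree, branching on existence of each node's keyword."""
    while isinstance(node, dict):
        node = node["yes"] if node["feature"] in keywords else node["no"]
    return node

print(classify({"recommender", "learning"}, tree))  # interesting
print(classify({"recommender"}, tree))              # not interesting
```

Because each path is a readable chain of keyword tests, the same structure directly yields the explanations mentioned on the next slide.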

- 20 - Explicit decision models II
Rule induction:
–built on the RIPPER algorithm
–good performance compared with other classification methods
 - elaborate post-pruning techniques of RIPPER
 - extension for classification takes document structure into account
Main advantages of these decision models:
–inferred decision rules serve as a basis for generating explanations for recommendations
–existing domain knowledge can be incorporated into the models

- 21 - On feature selection

- 22 - Limitations of content-based recommendation methods
Keywords alone may not be sufficient to judge the quality/relevance of a document or web page:
–up-to-dateness, usability, aesthetics, writing style
–content may be limited / too short
–content may not be automatically extractable (multimedia)
Ramp-up phase required:
–some training data is still required
–Web 2.0: use other sources to learn the user preferences
Overspecialization:
–algorithms tend to propose "more of the same"
–or: too similar news items

- 23 - Discussion & summary
In contrast to collaborative approaches, content-based techniques do not require a user community in order to work.
The presented approaches aim to learn a model of the user's interest preferences based on explicit or implicit feedback.
–deriving implicit feedback from user behavior can be problematic
Evaluations show that good recommendation accuracy can be achieved with the help of machine learning techniques.
–these techniques do not require a user community
The danger exists that recommendation lists contain too many similar items.
–all learning techniques require a certain amount of training data
–some learning methods tend to overfit the training data
Pure content-based systems are rarely found in commercial environments.

- 24 - Literature
[Michael Pazzani and Daniel Billsus 1997] Learning and revising user profiles: The identification of interesting web sites, Machine Learning 27 (1997), no. 3.
[Soumen Chakrabarti 2002] Mining the web: Discovering knowledge from hypertext data, Science & Technology Books, 2002.