Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle

Michel Capelle michelcapelle@gmail.com
Frederik Hogenboom fhogenboom@ese.eur.nl
Alexander Hogenboom hogenboom@ese.eur.nl
Flavius Frasincar frasincar@ese.eur.nl
Erasmus University Rotterdam, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Introduction (1)
Recommender systems help users to plough through a massive and increasing amount of information
Recommender systems:
–Content-based
–Collaborative filtering
–Hybrid
Content-based systems often make term-based comparisons between user profiles and items
Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by Salton and Buckley [1988]

Introduction (2)
One could take into account semantics:
–Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
  Measures (cosine) similarity using item and profile concept scores
  Reduces noise caused by non-meaningful terms
  Yields fewer terms to evaluate
  Allows for semantic features, e.g., synonyms
  Relies on a domain ontology
–Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
  Similar to CF-IDF
  Measures (cosine) similarity using item and profile synset scores
  Does not rely on a domain ontology
  Relies on a large semantic lexicon: WordNet
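The frequency-based recommenders above differ only in the unit that is counted (term, concept, or synset). A minimal sketch of the shared scheme follows, assuming a standard frequency × log-IDF weighting; the function names and the synset identifiers such as "turkey.n.01" are hypothetical labels chosen for illustration, not taken from the presentation.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Inverse document frequency per unit (term, concept, or synset id)."""
    n_docs = len(documents)
    doc_freq = Counter(unit for doc in documents for unit in set(doc))
    return {unit: math.log(n_docs / df) for unit, df in doc_freq.items()}


def weighted_vector(units, idf):
    """Relative frequency of each unit in one item, scaled by its IDF."""
    counts = Counter(units)
    total = sum(counts.values())
    return {u: (c / total) * idf.get(u, 0.0) for u, c in counts.items()}


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(u, 0.0) for u, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: each item is a list of disambiguated synset ids.
corpus = [
    ["turkey.n.01", "fly.v.01", "fly.v.01"],
    ["turkey.n.02", "market.n.01"],
    ["market.n.01", "fly.v.01"],
]
idf = idf_weights(corpus)
profile = weighted_vector(corpus[0], idf)
unrelated = weighted_vector(corpus[1], idf)
related = weighted_vector(corpus[2], idf)

print(cosine(profile, unrelated))  # no shared synsets
print(cosine(profile, related))    # shares "fly.v.01" with the profile
```

Feeding raw terms into the same functions gives a TF-IDF style comparison; feeding ontology concepts gives a CF-IDF style one.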

Introduction (3)
One could take into account semantics:
–Semantic Similarity (SS) recommender:
  Measures similarity between item and profile synsets
  Various similarity measures: Jiang & Conrath [1997], Leacock & Chodorow [1998], Lin [1998], Resnik [1995], Wu & Palmer [1994]
  Outperforms TF-IDF, CF-IDF, and SF-IDF
SS recommenders seem to be a good choice, but:
–No support for named entities (persons, companies, …)
–Many of these are used in texts, e.g., news

Introduction (4)
Hence, we propose the BingSS recommender:
–Bing:
  Identifies similarities of named entities
  Uses Bing page counts
  Bing offered a free API at the time of writing
–SS:
  Identifies similarities of known synsets
  Uses WordNet synsets
  Wu & Palmer similarity
Implementation in Ceryx (as a plug-in for the Hermes news processing framework [Frasincar et al., 2009])

Framework: User Profile
User profile consists of all read news items
Implicit preference for specific topics

Framework: Preprocessing
Before recommendations can be made, each news item is parsed:
–Tokenizer
–Sentence splitter
–Lemmatizer
–Part-of-Speech tagger

Framework: Synsets
We make use of the WordNet dictionary and word sense disambiguation (WSD)
Each word has a set of senses, and each sense has a set of semantically equivalent synonyms (a synset):
–Turkey:
  turkey, Meleagris gallopavo (animal)
  Turkey, Republic of Turkey (country)
  joker, turkey (annoying person)
  turkey, bomb, dud (failure)
–Fly:
  fly, aviate, pilot (operate airplane)
  flee, fly, take flight (run away)
Synsets are linked using semantic pointers
–Hypernym, hyponym, …

Framework: Bing
Bing similarity score is calculated by computing the pair-wise similarities between all named entities u and r in an unread document U and the user profile R
V is a vector with all combinations of named entities from U and R, and sim_PMI(u,r) is the Point-Wise Mutual Information co-occurrence measure for u and r
We only consider the top β Bing entity pairs with the highest similarity in V
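The slide does not spell out how PMI is computed from page counts, so the following is only a sketch under stated assumptions: it uses the textbook PMI definition log(p(u,r) / (p(u) p(r))), estimates each probability as a page count divided by a total number of indexed pages, returns 0.0 when a count is zero, and treats β as a number of pairs. The `page_count` callable, the entity names, and all counts are hypothetical stand-ins for a search API, not real data.

```python
import math
from itertools import product


def pmi(count_u, count_r, count_both, total_pages):
    """Point-wise mutual information estimated from page counts."""
    if min(count_u, count_r, count_both) <= 0:
        return 0.0  # assumption: undefined PMI is treated as no association
    p_u = count_u / total_pages
    p_r = count_r / total_pages
    p_both = count_both / total_pages
    return math.log(p_both / (p_u * p_r))


def top_entity_pairs(unread, profile, page_count, total_pages, beta):
    """Score every (u, r) combination and keep the beta highest-scoring pairs."""
    scored = [
        ((u, r), pmi(page_count(u), page_count(r), page_count(u, r), total_pages))
        for u, r in product(unread, profile)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:beta]


# Hypothetical page counts standing in for search engine results.
counts = {
    ("Microsoft",): 1000,
    ("Rotterdam",): 1000,
    ("Google",): 2000,
    ("Microsoft", "Google"): 500,
    ("Rotterdam", "Google"): 1,
}


def page_count(*names):
    return counts.get(names, 0)


best = top_entity_pairs(["Microsoft", "Rotterdam"], ["Google"], page_count, 1_000_000, beta=1)
print(best[0][0])  # the pair that co-occurs far more often than chance
```

With these counts, "Microsoft" and "Google" co-occur far more often than independence would predict, while "Rotterdam" and "Google" co-occur less often, so only the first pair survives the β = 1 cut.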

Framework: SS (1)
TF-IDF, CF-IDF, and SF-IDF use cosine similarity:
–Two vectors:
  User profile item scores
  News message item scores
–Measures the cosine of the angle between the vectors
Semantic Similarity (SS):
–Two vectors:
  User profile synsets
  News message synsets
–Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
–Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
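To make the path-based family concrete, here is a small sketch of the Wu & Palmer measure, 2 × depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor of the two synsets. It assumes a single-parent toy taxonomy with the root at depth 1; the `parent` table is hypothetical, and real WordNet synsets can have several hypernym paths, which this sketch ignores.

```python
def path_to_root(node, parent):
    """List of nodes from the given node up to the root (node first)."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def depth(node, parent):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node, parent))


def wu_palmer(a, b, parent):
    """Wu & Palmer similarity on a single-parent hypernym tree."""
    ancestors_b = set(path_to_root(b, parent))
    # Walking upward from a, the first node also above b is the deepest common ancestor.
    lcs = next(n for n in path_to_root(a, parent) if n in ancestors_b)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


# Hypothetical hypernym tree: child -> parent.
parent = {
    "entity": None,
    "animal": "entity",
    "bird": "animal",
    "dog": "animal",
    "turkey": "bird",
    "chicken": "bird",
}

print(wu_palmer("turkey", "chicken", parent))  # 0.75 (common ancestor "bird")
print(wu_palmer("turkey", "dog", parent))      # lower: common ancestor is "animal"
```

The information-content measures (Jiang & Conrath, Resnik, Lin) would additionally need corpus frequencies for each synset, which are outside this sketch.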

Framework: SS (2)
SS similarity score is calculated by computing the pair-wise similarities between all synsets u and r in an unread document U and the user profile R
W is a vector with all combinations of synsets from U and R that have a common Part-of-Speech, and sim(u,r) is any of the mentioned SS measures
We only consider the top β SS synset pairs with the highest similarity in W
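A rough sketch of this step follows. The pairing by common part of speech and the top-β selection come from the slide; averaging the retained scores, treating β as a number of pairs, and returning 0.0 when no pair qualifies are assumptions made for illustration. The synset labels and the lookup-table similarity are hypothetical placeholders for a real SS measure.

```python
from itertools import product


def ss_score(unread, profile, sim, beta):
    """Average of the beta highest similarities over same-POS synset pairs.

    unread and profile are lists of (synset_id, pos) tuples; sim is any
    function taking two synset ids and returning a similarity.
    """
    scores = [
        sim(u_id, r_id)
        for (u_id, u_pos), (r_id, r_pos) in product(unread, profile)
        if u_pos == r_pos  # only combinations with a common part of speech
    ]
    top = sorted(scores, reverse=True)[:beta]
    return sum(top) / len(top) if top else 0.0


# Hypothetical similarity table standing in for a measure such as Wu & Palmer.
table = {("turkey", "chicken"): 0.75, ("fly", "flee"): 0.5}


def toy_sim(a, b):
    return table.get((a, b), 0.0)


unread_item = [("turkey", "n"), ("fly", "v")]
user_profile = [("chicken", "n"), ("flee", "v")]

print(ss_score(unread_item, user_profile, toy_sim, beta=1))  # 0.75
print(ss_score(unread_item, user_profile, toy_sim, beta=2))  # 0.625
```

The noun-verb combinations are never scored, which keeps the number of pairwise comparisons down.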

Framework: BingSS
We take the weighted average of the Bing similarity score sim_Bing (based on page counts) and the SS score sim_SS:
sim_BingSS = α × sim_Bing + (1 − α) × sim_SS
where weight α is optimized during training
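As a small illustration of the combination step, the sketch below computes the weighted average and applies a cut-off. It assumes both input scores are already on a comparable scale (for example normalized to [0, 1]) and that an item is recommended when its score is at least the cut-off; neither detail is stated on the slide. The values α = 0.72 and cut-off 0.63 are the ones reported in the evaluation, while the two input scores are made up.

```python
def bing_ss(sim_bing, sim_ss, alpha):
    """Weighted average of the Bing and SS similarity scores."""
    return alpha * sim_bing + (1 - alpha) * sim_ss


def recommend(score, cutoff):
    """Assumption: recommend when the combined score reaches the cut-off."""
    return score >= cutoff


# Hypothetical input scores for one unread news item.
score = bing_ss(sim_bing=0.8, sim_ss=0.4, alpha=0.72)
print(recommend(score, cutoff=0.63))  # True
```

Setting alpha to 0 reduces this to the plain SS recommender, and setting it to 1 uses named entity similarities only.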

Implementation: Hermes
Hermes framework is utilized for building a news personalization service for RSS
Its implementation is the Hermes News Portal (HNP):
–Programmed in Java
–Uses OWL / SPARQL / Jena / GATE / WordNet

Implementation: Ceryx
Ceryx is a plug-in for HNP
Uses WordNet / Stanford POS Tagger / JAWS Lemmatizer / Lesk WSD / Alias-I LingPipe 4.1.0 Named Entity Recognizer / Bing API 2.0
Main focus is on recommendation support
User profiles are constructed
Computes SS and BingSS

Evaluation (1)
Experiment:
–We evaluate 100 news items on their correspondence with 8 topics (USA, Microsoft or competitors, Google or competitors, financial markets, …)
–User profile: all articles that are related to each of the topics
–Ceryx computes SS and BingSS with various cut-offs
–Measurements:
  Accuracy
  Precision
  Recall
  Specificity
  F1-measure
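For reference, the five measurements can be derived from the counts of a confusion matrix using their standard definitions, as sketched below. The counts in the example are hypothetical and are not taken from the experiment; returning 0.0 for an empty denominator is a convenience assumption.

```python
def evaluation_measures(tp, fp, tn, fn):
    """Standard classification measures from confusion-matrix counts."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical outcome for 100 news items on a single topic.
measures = evaluation_measures(tp=30, fp=20, tn=40, fn=10)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Raising the cut-off typically trades recall for precision and specificity, since fewer items are recommended.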

Evaluation (2)

Evaluation (3)
Results:
–Optimized cut-off values are 0.49 (SS) and 0.63 (BingSS)
–BingSS recommendation outperforms SS recommendation on accuracy, precision, specificity, and F1
–This comes at the cost of a reduced recall
–For BingSS, named entity similarities are more important than synset similarities (α = 0.72 vs. 0.28)

Measure      SS      BingSS
Accuracy     64.2%   73.1%
Precision    44.0%   54.0%
Recall       73.1%   62.9%
Specificity  60.2%   77.4%
F1-measure   54.3%   58.1%

Conclusions
Semantics-based recommendation can be performed by means of synsets from a semantic lexicon (SS)
Named entities are not included, but can be considered through search page counts (BingSS)
BingSS outperforms SS, and named entities are considered to be more important than synsets
Future work:
–Also include page counts for synsets
–Apply named entity page counts to other methods, e.g., TF-IDF, CF-IDF, or SF-IDF

Questions
