Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle

Semantic News Recommendation Using WordNet and Bing Similarities
28th Symposium On Applied Computing 2013 (SAC 2013)
March 21, 2013
Michel Capelle, Frederik Hogenboom, Alexander Hogenboom, Flavius Frasincar
Erasmus University Rotterdam, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Introduction (1)
Recommender systems help users plough through a massive and increasing amount of information
Recommender systems:
– Content-based
– Collaborative filtering
– Hybrid
Content-based systems often make term-based comparisons between user profiles and items
Common measure: Term Frequency – Inverse Document Frequency (TF-IDF), as proposed by Salton and Buckley [1988]
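As a reference point for the semantic variants that follow, here is a minimal Python sketch of TF-IDF weighting with cosine similarity between a user profile and a news item. It assumes one common TF-IDF variant (length-normalized term frequency times ln(N/df)); Salton and Buckley describe several weighting schemes. The toy corpus and the names `tf_idf_vector` and `cosine` are illustrative and not taken from Hermes or Ceryx.

```python
import math
from collections import Counter


def tf_idf_vector(doc_terms, corpus):
    """Weight each term of doc_terms by (count / doc length) * ln(N / df).

    corpus is a list of documents, each a list of terms; df counts the
    documents that contain the term at least once.
    """
    doc_sets = [set(d) for d in corpus]
    counts = Counter(doc_terms)
    vector = {}
    for term, count in counts.items():
        df = sum(1 for d in doc_sets if term in d)
        idf = math.log(len(corpus) / df) if df else 0.0
        vector[term] = (count / len(doc_terms)) * idf
    return vector


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = [["google", "search", "ads"],
          ["microsoft", "windows", "search"],
          ["markets", "stocks", "fall"]]
profile = tf_idf_vector(["google", "search"], corpus)
item = tf_idf_vector(["microsoft", "search"], corpus)
print(round(cosine(profile, item), 3))
```

The profile and the item share only "search", which appears in two of the three documents and therefore gets a low IDF, so the score is low (about 0.12).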

Introduction (2)
One could take semantics into account:
– Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
    – Measures (cosine) similarity using item and profile concept scores
    – Reduces noise caused by non-meaningful terms
    – Yields fewer terms to evaluate
    – Allows for semantic features, e.g., synonyms
    – Relies on a domain ontology
– Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
    – Similar to CF-IDF
    – Measures (cosine) similarity using item and profile synset scores
    – Does not rely on a domain ontology
    – Relies on a large semantic lexicon: WordNet
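The main difference between TF-IDF and CF-IDF/SF-IDF is which units are counted. A minimal sketch, assuming a hypothetical `LEXICON` dict in place of a real domain ontology (CF-IDF) or WordNet synset lookup (SF-IDF): terms are first mapped to concept identifiers, and TF-IDF and cosine similarity are then computed over those identifiers. It illustrates two of the listed properties: unmapped terms are dropped, and synonyms ("msft"/"microsoft", "shares"/"stocks") land on the same dimension.

```python
import math
from collections import Counter

# Hypothetical lexicon mapping surface terms to concept (or synset) identifiers.
# In CF-IDF this role is played by a domain ontology; in SF-IDF by WordNet
# synsets obtained after word sense disambiguation.
LEXICON = {
    "microsoft": "Microsoft", "msft": "Microsoft",
    "google": "Google", "alphabet": "Google",
    "stocks": "Stock", "shares": "Stock",
}


def to_concepts(terms):
    """Map terms to concepts; terms without a concept are dropped as noise."""
    return [LEXICON[t] for t in terms if t in LEXICON]


def cf_idf(doc_concepts, corpus_concepts):
    """TF-IDF computed over concept identifiers instead of raw terms."""
    counts = Counter(doc_concepts)
    vector = {}
    for concept, count in counts.items():
        df = sum(1 for d in corpus_concepts if concept in d)
        idf = math.log(len(corpus_concepts) / df) if df else 0.0
        vector[concept] = (count / len(doc_concepts)) * idf
    return vector


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


corpus = [set(to_concepts(d)) for d in (["microsoft", "shares", "rise"],
                                        ["google", "search"],
                                        ["weather", "rain"])]
profile = cf_idf(to_concepts(["msft", "shares", "today"]), corpus)
item = cf_idf(to_concepts(["microsoft", "stocks"]), corpus)
print(cosine(profile, item))  # close to 1.0: the synonyms map to the same concepts
```

The profile and the item share no surface terms, so plain TF-IDF would score them 0; at the concept level they coincide.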

Introduction (3)
One could take semantics into account:
– Semantic Similarity (SS) recommender:
    – Measures similarity between item and profile synsets
    – Various similarity measures: Jiang & Conrath [1997], Leacock & Chodorow [1998], Lin [1998], Resnik [1995], Wu & Palmer [1994]
    – Outperforms TF-IDF, CF-IDF, and SF-IDF
SS recommenders seem to be a good choice, but:
– They offer no support for named entities (persons, companies, …)
– Named entities occur frequently in texts, e.g., news

Introduction (4)
Hence, we propose the BingSS recommender:
– Bing: identifies similarities of named entities
    – Uses Bing page counts
    – Bing offered a free API at the time of writing
– SS: identifies similarities of known synsets
    – Uses WordNet synsets
    – Uses Wu & Palmer similarity
Implementation in Ceryx, a plug-in for the Hermes news processing framework [Frasincar et al., 2009]

Framework: User Profile
The user profile consists of all read news items
– Reading them expresses an implicit preference for specific topics

Framework: Preprocessing
Before recommendations can be made, each news item is parsed by a:
– Tokenizer
– Sentence splitter
– Lemmatizer
– Part-of-Speech tagger
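The preprocessing chain can be sketched in Python as below. This is a toy stand-in, not Ceryx's pipeline: sentence splitting and tokenization use simple regular expressions, and the hypothetical `LEMMAS` and `POS_TAGS` dicts replace a real lemmatizer and Part-of-Speech tagger (the implementation slide lists JAWS and the Stanford POS Tagger). Sentences are split before tokenizing here purely for simplicity.

```python
import re

# Hypothetical lookup tables standing in for a real lemmatizer and
# Part-of-Speech tagger; they only cover the words in the example below.
LEMMAS = {"shares": "share", "rose": "rise", "markets": "market"}
POS_TAGS = {"microsoft": "NNP", "shares": "NNS", "rose": "VBD",
            "markets": "NNS", "followed": "VBD"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Words, numbers, and single punctuation marks become separate tokens."""
    return re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?|[^\sA-Za-z\d]", sentence)


def preprocess(text):
    """Return one list per sentence of (token, lemma, POS tag) triples."""
    sentences = []
    for sentence in split_sentences(text):
        triples = []
        for token in tokenize(sentence):
            key = token.lower()
            triples.append((token, LEMMAS.get(key, key), POS_TAGS.get(key, "UNK")))
        sentences.append(triples)
    return sentences


for sentence in preprocess("Microsoft shares rose. Markets followed."):
    print(sentence)
```

The lemma and POS tag attached to each token are what later steps need to look up WordNet synsets for the right word form and part of speech.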

Framework: Synsets
We make use of the WordNet dictionary and word sense disambiguation (WSD)
Each word has a set of senses, and each sense has a set of semantically equivalent synonyms (a synset):
– Turkey:
    – turkey, Meleagris gallopavo (animal)
    – Turkey, Republic of Turkey (country)
    – joker, turkey (annoying person)
    – turkey, bomb, dud (failure)
– Fly:
    – fly, aviate, pilot (operate airplane)
    – flee, fly, take flight (run away)
Synsets are linked using semantic pointers
– Hypernym, hyponym, …
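Word sense disambiguation picks one of these senses for each occurrence of a word. The implementation slide lists Lesk WSD; below is a minimal sketch of the simplified Lesk variant (choose the sense whose gloss and synonyms overlap most with the surrounding words) applied to the "turkey" senses above. The glosses, stop-word list, and sense identifiers are hypothetical and are not WordNet's actual entries.

```python
import re

# Toy sense inventory modelled on the "turkey" example; glosses are
# hypothetical paraphrases, not WordNet's actual gloss text.
SENSES = {
    "turkey": [
        ("turkey.animal", ["turkey", "meleagris gallopavo"], "large bird farmed for its meat"),
        ("turkey.country", ["turkey", "republic of turkey"], "country in asia and europe with capital ankara"),
        ("turkey.person", ["joker", "turkey"], "annoying or foolish person"),
        ("turkey.failure", ["turkey", "bomb", "dud"], "event or product that fails completely"),
    ],
}

STOPWORDS = {"a", "an", "the", "in", "of", "for", "its", "and", "or", "with", "that", "to", "was"}


def content_words(text):
    """Lower-cased alphabetic words minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, context):
    """Return the sense id whose gloss + synonyms overlap most with the context."""
    context_words = content_words(context)
    best_id, best_overlap = None, -1
    for sense_id, synonyms, gloss in SENSES[word]:
        signature = content_words(gloss)
        for synonym in synonyms:
            signature |= content_words(synonym)
        signature.discard(word)  # the target word itself never discriminates
        overlap = len(signature & context_words)
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_id, best_overlap = sense_id, overlap
    return best_id


print(simplified_lesk("turkey", "Turkey's capital Ankara hosted the summit."))
print(simplified_lesk("turkey", "The farm raised the bird for its meat."))
```

The first call overlaps on "capital" and "ankara" and the second on "bird" and "meat", so they resolve to the country and animal senses respectively.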

Framework: Bing
The Bing similarity score is computed from the pairwise similarities between all named entities $u$ in an unread document $U$ and $r$ in the user profile $R$:
– $V$ is a vector with all combinations of named entities from $U$ and $R$
– $\mathrm{sim}_{PMI}(u,r)$ is the Pointwise Mutual Information (PMI) co-occurrence measure for $u$ and $r$, based on Bing page counts
We only consider the top $\beta$ Bing entity pairs with the highest similarity in $V$
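The slide does not spell out how page counts become $\mathrm{sim}_{PMI}$ or how the top-$\beta$ pairs are combined. A minimal sketch, assuming the standard PMI estimate from page counts (each probability approximated as a page count divided by the number of indexed pages) and assuming the score is the mean over the top-$\beta$ pairs. The page counts are hypothetical numbers supplied in dicts; no Bing API is called, and scoring never-co-occurring pairs as 0 is also an assumption.

```python
import math


def pmi(count_u, count_r, count_ur, n_pages):
    """Pointwise mutual information estimated from page counts.

    p(x) is approximated by count(x) / n_pages, giving
    log(p(u, r) / (p(u) * p(r))).
    """
    if count_u == 0 or count_r == 0 or count_ur == 0:
        # Assumption: pairs that never co-occur score 0 instead of -inf.
        return 0.0
    p_u, p_r, p_ur = count_u / n_pages, count_r / n_pages, count_ur / n_pages
    return math.log(p_ur / (p_u * p_r))


def bing_similarity(unread, profile, counts, pair_counts, n_pages, beta):
    """Mean PMI over the top-beta entity pairs (u, r), u in unread, r in profile.

    counts[e] is the page count of entity e; pair_counts[frozenset((u, r))]
    is the page count of a query containing both entities.
    """
    scores = []
    for u in unread:
        for r in profile:
            joint = counts[u] if u == r else pair_counts.get(frozenset((u, r)), 0)
            scores.append(pmi(counts[u], counts[r], joint, n_pages))
    top = sorted(scores, reverse=True)[:beta]
    return sum(top) / len(top) if top else 0.0


N_PAGES = 1_000_000  # hypothetical index size
counts = {"Google": 10_000, "Microsoft": 8_000, "Ankara": 2_000}
pair_counts = {frozenset(("Google", "Microsoft")): 2_000}
print(bing_similarity(["Microsoft"], ["Google", "Ankara"], counts, pair_counts, N_PAGES, beta=1))
```

Here "Microsoft" and "Google" co-occur 25 times more often than independence would predict, so the top pair scores ln 25 (about 3.22); with beta=2 the zero-scored "Ankara" pair would halve the result.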

Framework: SS (1)
TF-IDF, CF-IDF, and SF-IDF use cosine similarity:
– Two vectors:
    – User profile item scores
    – News message item scores
– Measures the cosine of the angle between the vectors
Semantic Similarity (SS):
– Two vectors:
    – User profile synsets
    – News message synsets
– Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
– Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
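Wu & Palmer, the path-based measure BingSS uses for synsets, can be sketched on a small taxonomy. This uses one common formulation, 2·depth(LCS) / (depth(s1) + depth(s2)), where LCS is the deepest common hypernym and the root has depth 1. The hypernym tree below is hypothetical and single-inheritance; real WordNet synsets can have several hypernyms.

```python
# Hypothetical hypernym tree: each synset maps to its single parent (None = root).
HYPERNYM = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "bird": "animal",
    "turkey": "bird",
    "chicken": "bird",
    "dog": "animal",
    "location": "entity",
    "country": "location",
}


def path_to_root(synset):
    """Synsets from `synset` up to the root, inclusive."""
    path = []
    while synset is not None:
        path.append(synset)
        synset = HYPERNYM[synset]
    return path


def depth(synset):
    return len(path_to_root(synset))  # the root has depth 1


def wu_palmer(s1, s2):
    """2 * depth(LCS) / (depth(s1) + depth(s2)), LCS = deepest shared ancestor."""
    ancestors2 = set(path_to_root(s2))
    # Walking up from s1, the first ancestor shared with s2 is the deepest one.
    lcs = next(a for a in path_to_root(s1) if a in ancestors2)
    return 2 * depth(lcs) / (depth(s1) + depth(s2))


print(wu_palmer("turkey", "chicken"))  # 0.8: siblings under "bird"
print(wu_palmer("turkey", "country"))  # 0.25: only "entity" in common
```

Unlike cosine similarity, this gives a non-zero score to two different synsets when they sit close together in the hierarchy.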

Framework: SS (2)
The SS similarity score is computed from the pairwise similarities between all synsets $u$ in an unread document $U$ and $r$ in the user profile $R$:
– $W$ is a vector with all combinations of synsets from $U$ and $R$ that share a common Part-of-Speech
– $\mathrm{sim}(u,r)$ is any of the SS measures mentioned above
We only consider the top $\beta$ SS synset pairs with the highest similarity in $W$
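Restricting $W$ to same-POS pairs and keeping the top-$\beta$ pairs can be sketched as follows, assuming that the retained pair similarities are averaged (the slide does not state the aggregation). `toy_sim` is a hypothetical stand-in for any of the SS measures, and the synset identifiers are illustrative strings.

```python
def ss_score(unread, profile, sim, beta):
    """Mean similarity of the top-beta synset pairs that share a Part-of-Speech.

    unread and profile are lists of (synset, pos) tuples; sim is any
    synset similarity function returning a float.
    """
    pair_scores = [sim(u, r)
                   for u, pos_u in unread
                   for r, pos_r in profile
                   if pos_u == pos_r]  # W only pairs synsets with a common POS
    top = sorted(pair_scores, reverse=True)[:beta]
    return sum(top) / len(top) if top else 0.0


def toy_sim(u, r):
    # Hypothetical measure: identical synsets score 1.0, any other pair 0.4.
    return 1.0 if u == r else 0.4


unread = [("turkey.n.01", "n"), ("fly.v.01", "v")]
profile = [("turkey.n.01", "n"), ("chicken.n.01", "n"), ("flee.v.01", "v")]
print(ss_score(unread, profile, toy_sim, beta=2))
```

Only three of the six pairs survive the POS filter (noun with noun, verb with verb); with beta=2 the score averages the best two of them, 1.0 and 0.4.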

Framework: BingSS
We take the weighted average of the Bing page-count score $\mathrm{sim}_{Bing}$ and the SS score $\mathrm{sim}_{SS}$:
$\mathrm{sim}_{BingSS} = \alpha \cdot \mathrm{sim}_{Bing} + (1 - \alpha) \cdot \mathrm{sim}_{SS}$
where weight $\alpha$ is optimized during training
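The slides do not say how $\alpha$ is optimized. A minimal sketch of one option, a grid search over $\alpha \in [0, 1]$ that maximizes F1 on labelled training items at a fixed cut-off; both the grid search and the F1 objective are assumptions, and the training tuples are hypothetical. It also assumes both scores are on comparable scales; raw PMI values are not bounded to [0, 1], so some normalization would be needed in practice.

```python
def bing_ss(sim_bing, sim_ss, alpha):
    """Weighted average of the Bing and SS scores."""
    return alpha * sim_bing + (1 - alpha) * sim_ss


def f1(predicted, actual):
    """F1-measure from parallel lists of predicted and true boolean labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tune_alpha(train, cutoff, steps=100):
    """Grid search over alpha in [0, 1] that maximizes F1 on training data.

    train is a list of (sim_bing, sim_ss, is_relevant) tuples; an item is
    recommended when its BingSS score reaches the cut-off.
    """
    actual = [relevant for _, _, relevant in train]
    best_alpha, best_f1 = 0.0, -1.0
    for i in range(steps + 1):
        alpha = i / steps
        predicted = [bing_ss(b, s, alpha) >= cutoff for b, s, _ in train]
        score = f1(predicted, actual)
        if score > best_f1:  # keep the smallest alpha among ties
            best_alpha, best_f1 = alpha, score
    return best_alpha, best_f1


# Hypothetical training data in which the Bing score separates the classes.
train = [(0.9, 0.1, True), (0.8, 0.2, True), (0.1, 0.9, False), (0.2, 0.8, False)]
print(tune_alpha(train, cutoff=0.5))
```

In this toy data only the Bing score is informative, so the search settles on an $\alpha$ above 0.5, which mirrors the direction of the reported weighting (named entities weighted more heavily than synsets).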

Implementation: Hermes
The Hermes framework is used to build a news personalization service for RSS feeds
Its implementation is the Hermes News Portal (HNP):
– Programmed in Java
– Uses OWL / SPARQL / Jena / GATE / WordNet

Implementation: Ceryx
Ceryx is a plug-in for HNP
Uses WordNet / Stanford POS Tagger / JAWS Lemmatizer / Lesk WSD / Alias-I LingPipe Named Entity Recognizer / Bing API 2.0
Main focus is on recommendation support:
– Constructs user profiles
– Computes SS and BingSS scores

Evaluation (1)
Experiment:
– We evaluate 100 news items on their correspondence with 8 topics (USA, Microsoft or competitors, Google or competitors, financial markets, …)
– User profile: for each topic, all articles related to that topic
– Ceryx computes SS and BingSS with various cut-off values
– Measurements:
    – Accuracy
    – Precision
    – Recall
    – Specificity
    – F1-measure
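The measures listed above follow from the confusion matrix obtained by recommending every item whose score reaches the cut-off. A minimal sketch with hypothetical scores and relevance labels; `evaluate` is an illustrative name, not a Ceryx function. Sweeping the cut-off, as the experiment does, shows that raising it can only lower recall and can only raise specificity.

```python
def evaluate(scores, relevant, cutoff):
    """Confusion-matrix measures for recommending every item with score >= cutoff."""
    predicted = [s >= cutoff for s in scores]
    pairs = list(zip(predicted, relevant))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical similarity scores for five news items and their true labels.
scores = [0.9, 0.8, 0.6, 0.3, 0.1]
relevant = [True, True, False, True, False]
for cutoff in (0.2, 0.5, 0.7):
    print(cutoff, evaluate(scores, relevant, cutoff))
```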

Evaluation (2)

Evaluation (3)
Results:
– Optimized cut-off values are 0.49 (SS) and 0.63 (BingSS)
– BingSS recommendation outperforms SS recommendation on accuracy, precision, specificity, and F1-measure
– This comes at the cost of reduced recall
– For BingSS, named entity similarities are more important than synset similarities (α = 0.72 vs. 0.28)

Measure       SS      BingSS
Accuracy      64.2%   73.1%
Precision     44.0%   54.0%
Recall        73.1%   62.9%
Specificity   60.2%   77.4%
F1-measure    54.3%   58.1%

Conclusions
Semantics-based recommendation can be performed by means of synsets from a semantic lexicon (SS)
Named entities are not covered by SS, but can be taken into account through search engine page counts (BingSS)
BingSS outperforms SS on all measures except recall, and named entity similarities receive a higher weight than synset similarities
Future work:
– Also include page counts for synsets
– Apply named entity page counts to other methods, e.g., TF-IDF, CF-IDF, or SF-IDF

Questions