Presentation is loading. Please wait.

Presentation is loading. Please wait.

News Recommendation with CF-IDF+

Similar presentations


Presentation on theme: "News Recommendation with CF-IDF+"— Presentation transcript:

1 News Recommendation with CF-IDF+
Emma de Koning Frederik Hogenboom Flavius Frasincar Erasmus University Rotterdam PO Box 1738, NL-3000 DR Rotterdam, the Netherlands June 13, 2018 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

2 Introduction (1) Recommender systems help users to plough through a massive and increasing amount of information Recommender systems: Content-based Collaborative filtering Hybrid Content-based systems are often term-based Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by [Salton and Buckley, 1988] 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

3 Introduction (2) TF-IDF:
Preprocessed documents (stop words removal and stemming) For each term, it takes into consideration: The importance in a single document The inverse of the general importance within a set of documents TF-IDF performance tends to decrease as documents get larger The red, purple, and blue terms are important, whereas the yellow, green, and pink terms are irrelevant 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

4 Introduction (3) Utilizing concepts instead of terms:
Reduces noise caused by non-meaningful terms Yields less terms to evaluate Allows for semantic features, e.g., synonyms Therefore, in 2011 we proposed Concept Frequency – Inverse Document Frequency (CF-IDF), showing an improvement over regular TF-IDF [Goossen et al., 2011] The black concepts are important, while the brown and beige concepts are irrelevant 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

5 Introduction (4) Research has shown that relationships like synonymy and hyponymy provide structure and improved interpretability Hence, we coin CF-IDF+, which additionally accounts for semantic relationships CF-IDF+ is implemented in Ceryx (an extension for Hermes [Frasincar et al., 2009], a news processing framework) Results are evaluated in comparison with TF-IDF and CF-IDF The black concepts have become less important due to their relationship with beige concepts 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

6 Introduction (5) Earlier work has been done:
CF-IDF-like methods: [Baziz et al., 2005], [Yan and Li, 2007] Frameworks: OntoSeek [Guarino et al., 1999], Quickstep [Middleton et al., 2004], [Cantador et al., 2008] Although some work shows overlap: Methods are not thoroughly compared with TF-IDF Often, WSD and synonym handling is lacking 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

7 TF-IDF Term Frequency: the occurrence of a term ti in a document dj, i.e., Inverse Document Frequency: the occurrence of a term ti in a set of documents D, i.e., And hence 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

8 CF-IDF Concept Frequency: the occurrence of a concept ci in a document dj, i.e., Inverse Document Frequency: the occurrence of a concept ci in a set of documents D, i.e., And hence 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

9 CF-IDF+ Each concept c from set C has a set of related concepts R(c)
A related concept r is associated with a weight wr We focus on domain ontologies, and identify 3 different weights for superclasses, subclasses, and domain relationships For concept ci and related concept ri  R(ci) with weight wr in document dj  D, CF-IDF+ is computed as 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

10 Recommendations Ontology contains a set of concepts and relations
User profile consists of (a subset of) these concepts and relations Each concept and relation is associated with all news articles Each article is represented as: TF-IDF: a set containing all terms CF-IDF: a set containing all concepts CF-IDF+: a set containing all concepts and related concepts Then, for each article, weights are calculated Weights of a new article are compared to the user profile using cosine similarity 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

11 Implementation: Hermes
Hermes is used for building a news personalization service Its implementation Hermes News Portal (HNP): Is ontology-based Is programmed in Java Uses OWL / SPARQL / Jena / GATE / WordNet Input: RSS feeds of news items Internal processing: Classification News querying Output: news items 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

12 Implementation: Ceryx (1)
Ceryx is a plug-in for HNP Main focus is on recommendation support User profiles are constructed TF-IDF (using a stemmer as proposed in [Krovetz, 1993]), CF-IDF, and CF-IDF+ recommendation calculations (using Lesk Word Sense Disambiguation [Jensen and Boss, 2008]) can be performed 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

13 Implementation: Ceryx (2)
30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

14 Evaluation (1) Experiment: Cut-off values: {0, 0.01, 0.02, …, 1}
For each cut-off value, relationship weights are optimized to maximize F1-scores: Subclass relations receive low weights (too specific) Superclass relations receive higher weights (somewhat generic) Domain relations receive highest weights (just about right) Experts 3 Profiles 8 News items 100 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

15 Evaluation (2) CF-IDF+ has a significantly higher recall than CF-IDF and TF-IDF, at the cost of a slightly reduced precision Overall, F1-scores for CF-IDF+ are higher at high cut-off values, which are the tougher nuts to crack! 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

16 Evaluation (3) CF-IDF+ consistently has a higher classification power (Kappa statistic) than CF-IDF CF-IDF+ mostly outperforms TF-IDF in the higher regions, yet stays behind for low cut-off values 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

17 Conclusions For strict recommendation settings, CF-IDF+ outperforms CF-IDF and TF-IDF significantly Especially classification power and recall show vast improvements, at the cost of a slight loss in precision Hence, using key concepts and semantic relations instead of analyzing all terms could be beneficial for recommender systems Future work: Invest in a more fine-grained weight learning procedure Include a larger collection of relationships Use other large ontological resources like Evaluate on a larger set of news items 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

18 Questions 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

19 References (1) Baziz, M., Boughanem, M., Traboulsi, S.: A Concept-Based Approach for Indexing Documents in IR. In: Actes du XXIIIème Congrès Informatique des Organisations et Systèmes d'Information et de Décision (INFORSID 2005). pp HERMES Science Publications (2005) Cantador, I., Bellogín, A., Castells, P.: A Semantic Web Approach to Recommending News. In: 5th International Conference on Adaptive Hypermedia and Adaptive Web-Based Systems (AH 2008). pp Springer-Verlag, Berlin, Heidelberg (2008) Frasincar, F., Borsje, J., Levering, L.: A Semantic Web-Based Approach for Building Personalized News Services. International Journal of E-Business Research 5(3), (2009) 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

20 References (2) Goossen, F., IJntema, W., Frasincar. F., Hogenboom, F., Kaymak, U.: News Personalization using the CF-IDF Semantic Recommender. In: 1st International Conference on Web Intelligence, Mining and Semantics (WIMS 2011). ACM (2011) Guarino, N., Masolo, C., Vetere, G.: OntoSeek: Content-Based Access to the Web. IEEE Intelligent Systems 14(3), (1999) Jensen, A.S., Boss, N.S.: Textual Similarity: Comparing Texts in order to Discover How Closely They Discuss the Same Topics. Bachelor’s thesis, Technical University of Denmark (2008) Krovetz, R.: Viewing Morphology as an Inference Process. In: 26th ACM Conference on Research and Development in Information Retrieval (SIGIR 1993). pp ACM (1993) 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)

21 References (3) Middleton, S.E., Roure, D.D., Shadbolt, N.R.: Ontology-Based Recommender Systems. In: Handbook on Ontologies, pp International Handbooks on Information Systems, Springer (2004) Salton, G., Buckley, C.: Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management 24(5), (1988) Yan, L., Li, C.: A Novel Semantic-based Text Representation Method for Improving Text Clustering. In: 3rd Indian International Conference on Artificial Intelligence (IICAI 2007). pp (2007) 30th International Conference on Advanced Information Systems Engineering (CAISE 2018)


Download ppt "News Recommendation with CF-IDF+"

Similar presentations


Ads by Google