Profiling users with tag networks in diffusion-based personalized recommendation Jin Mao Postdoc, School of Information, University of Arizona Oct. 30, 2015
Outline Introduction Tag-based User Profiling Diffusion-based Personalized Recommendation Experiments Discussion
Introduction Personalized Information Recommendation Recommender System (Resnick and Varian, 1997): a subclass of information filtering system that seeks to predict the 'rating' or 'preference' that a user would give to an item.
Introduction Social Tagging System A social tagging system consists of users, resources, social tags and the associations among them created by tagging behaviour (Hotho et al., 2006)
Introduction Tags Personal collections Implicit rating or voting behaviour against items (Liang et al., 2008)
Tag-based User Profiling Personalization User profiling aims to understand the personal interests of users and describe those interests with some kind of representation (Liang, 2010). Explicit: demographic information Implicit: click streams, customer purchase logs content behaviour Tag
Tag-based User Profiling Methods The common way to construct a tag-based user profile is to obtain a representation of a tag vector or a set of tags. Selecting suitable candidate tags and inferring their weights are the major focus. Vector Space Model (Vallet et al., 2009; Bogers and Bosch, 2009) Tag Set (Yeung et al., 2008) Connect personal tag vocabularies to common folksonomy (Wetzker et al., 2010), or canonical ontologies (Movahedian and Khayyambashi, 2014; Han et al., 2010; Hsu, 2012)
Tag-based User Profiling Co-occurrence approach It is more reasonable to profile users’ interests with tags in combination rather than independently (Michlmayr and Cayzer, 2007). Weighted tag network How to infer the weight of tags?
Diffusion-based Personalized Recommendation Main Steps Construct the user tag network Infer the weights of tags Generate the scores of recommended items via a diffusion process
Diffusion-based Personalized Recommendation User tag network Tripartite Graph
Diffusion-based Personalized Recommendation User tag network
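The slide's figure is not preserved, so the following is only a minimal sketch of one way a user tag network could be built from co-occurrence. It assumes that two tags are linked when the same user assigns both to the same item, and that the edge weight counts how often that happens. The function name `build_user_tag_network` and the sample data are hypothetical and do not come from the talk.

```python
from collections import Counter
from itertools import combinations


def build_user_tag_network(user_posts):
    """Build a weighted co-occurrence network for ONE user.

    user_posts maps each item to the set of tags this user gave it.
    Returns (nodes, edges): tag usage counts and co-occurrence counts.
    Assumption: an edge links two tags used together on the same item.
    """
    nodes = Counter()
    edges = Counter()
    for tags in user_posts.values():
        nodes.update(tags)
        # Sort so each undirected pair has a single canonical key.
        for a, b in combinations(sorted(tags), 2):
            edges[(a, b)] += 1
    return nodes, edges


# Hypothetical tagging history of a single user.
posts = {
    "item1": {"rock", "indie"},
    "item2": {"rock", "indie", "live"},
    "item3": {"jazz"},
}
nodes, edges = build_user_tag_network(posts)
```

A tag used alone (here `jazz`) becomes an isolated node, which is one reason the networks of light taggers can be very small.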
Diffusion-based Personalized Recommendation Inferring tag weights
Diffusion-based Personalized Recommendation Inferring tag weights
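The formulas on these slides are not preserved. As a rough illustration of PageRank-style node weighting on a weighted tag network, the sketch below runs a plain power iteration in which the initial tag weights serve as the teleport distribution. This is an assumed formulation, not necessarily the one used in the talk; `pagerank_tag_weights`, its parameters and the sample data are hypothetical.

```python
def pagerank_tag_weights(nodes, edges, init, damping=0.85, iters=100):
    """Refine initial tag weights with a PageRank-style power iteration.

    nodes: iterable of tags; edges: {(tag_a, tag_b): weight}, undirected;
    init: {tag: initial weight}, normalised here into the teleport vector.
    Assumption: isolated tags hand their mass back via the teleport vector.
    """
    nodes = list(nodes)
    total_init = sum(init[t] for t in nodes)
    teleport = {t: init[t] / total_init for t in nodes}

    # Symmetric weighted adjacency and node strength (sum of edge weights).
    adj = {t: {} for t in nodes}
    for (a, b), w in edges.items():
        adj[a][b] = w
        adj[b][a] = w
    strength = {t: sum(adj[t].values()) for t in nodes}

    rank = dict(teleport)
    for _ in range(iters):
        dangling = sum(rank[t] for t in nodes if strength[t] == 0)
        new_rank = {}
        for t in nodes:
            inflow = sum(rank[n] * w / strength[n] for n, w in adj[t].items())
            new_rank[t] = ((1 - damping) * teleport[t]
                           + damping * (inflow + dangling * teleport[t]))
        rank = new_rank
    return rank


# Hypothetical network: usage counts act as the initial weights.
tags = ["rock", "indie", "live", "jazz"]
links = {("indie", "rock"): 2, ("indie", "live"): 1, ("live", "rock"): 1}
initial = {"rock": 2, "indie": 2, "live": 1, "jazz": 1}
weights = pagerank_tag_weights(tags, links, initial)
unrefined = pagerank_tag_weights(tags, links, initial, damping=0.0)
```

In this sketch a damping factor of 0 returns the normalised initial weights unchanged, so the damping factor controls how far the network structure is allowed to reshape the profile.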
Diffusion-based Personalized Recommendation Diffusion process Tag-item bipartite graph An edge represents the usage of the tag to annotate the item in the collection. The weights denote the frequency of the usage. The tag-item bipartite graph manifests the opinion of other users in the system.
Diffusion-based Personalized Recommendation Diffusion process
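The diffusion equations on this slide are not preserved. A minimal sketch of a one-step diffusion over the tag-item bipartite graph is given below, under the assumption that each profile tag spreads its weight to the items it annotates in proportion to usage frequency. The function `diffuse_scores`, the `exclude` parameter and the sample data are hypothetical.

```python
def diffuse_scores(tag_weights, tag_item_freq, exclude=()):
    """Spread each tag's weight to items along weighted tag-item edges.

    tag_weights: {tag: weight} from the user profile.
    tag_item_freq: {tag: {item: times the tag annotates the item}}.
    exclude: items to leave out, e.g. ones the user already collected.
    Assumption: a tag splits its weight proportionally to edge frequency.
    """
    scores = {}
    for tag, weight in tag_weights.items():
        items = tag_item_freq.get(tag, {})
        total = sum(items.values())
        if total == 0:
            continue  # tag unused in the collection: nothing to diffuse
        for item, freq in items.items():
            if item in exclude:
                continue
            scores[item] = scores.get(item, 0.0) + weight * freq / total
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical profile and collection-wide tag usage.
profile = {"rock": 0.6, "jazz": 0.4}
usage = {"rock": {"a": 3, "b": 1}, "jazz": {"b": 1, "c": 1}}
ranked = diffuse_scores(profile, usage)
ranked_new = diffuse_scores(profile, usage, exclude={"a"})
```

Because the edge frequencies come from all users, items that many people annotate with a user's heavily weighted tags rise to the top of that user's list.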
Experiments Dataset The HetRec 2011 conference
Collection   Users   Items    # of unique Tags   Tag Assignments   Avg Tag Assignments per User   Std.dev
Last.fm      1,892   12,523   9,749              186,725           98.56                          224.53
MovieLens    2,113   5,908    9,079              47,957            22.70                          169.95
Delicious    1,867   69,223   40,897             437,593           234.38                         192.39
Experiments Dataset Statistics about the user tag networks of the three datasets
Collection   Min Nodes   Max Nodes   Median Nodes   Avg Edges   Avg Density
Last.fm      1           50          12             101.74      0.399
MovieLens                1660        2              21.30       0.301
Delicious                739         105            550.35      0.087
The user tag networks in Last.fm and MovieLens are denser than those in Delicious.
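The slides do not state how density is computed. Assuming the usual definition for an undirected simple graph (the share of possible tag pairs that are actually linked), a single network's density could be obtained as in this short sketch; `network_density` is a hypothetical helper.

```python
def network_density(num_nodes, num_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1)).

    Assumption: a network with fewer than two nodes has density 0.
    """
    if num_nodes < 2:
        return 0.0
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))


# A 4-tag network with 3 co-occurrence edges out of 6 possible pairs.
example_density = network_density(4, 3)
```

Averaging this value over all users of a collection would give a figure comparable in kind to the Avg Density column.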
Experiments Evaluation Metrics P@N Recall Inter-diversity (Zhou et al., 2008): measures the personalization Baseline model The tag-based collaborative filtering (Tso-Sutter et al., 2008)
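The three metrics can be sketched as follows. The exact definitions used in the experiments are not given on the slide, so this assumes the common forms: precision and recall over a top-N list, and inter-diversity as the mean pairwise Hamming distance between users' top-N lists (1 minus the shared fraction). All function names and sample lists are hypothetical.

```python
from itertools import combinations


def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommended items that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n


def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for item in ranked[:n] if item in relevant) / len(relevant)


def inter_diversity(rec_lists, n):
    """Mean over user pairs of 1 - |overlap of top-n lists| / n."""
    pairs = list(combinations(rec_lists, 2))
    if not pairs:
        raise ValueError("need at least two recommendation lists")
    distances = [1 - len(set(a[:n]) & set(b[:n])) / n for a, b in pairs]
    return sum(distances) / len(pairs)


# Hypothetical top-3 lists for three users.
lists = [["a", "b", "c"], ["a", "d", "e"], ["f", "g", "h"]]
p = precision_at_n(lists[0], {"a", "c", "z"}, 3)
r = recall_at_n(lists[0], {"a", "c", "z"}, 3)
div = inter_diversity(lists, 3)
```

An inter-diversity near 1 means users receive almost entirely different lists, which is why it is read as a measure of personalization.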
Experiments Results The performance of different tag-aware recommendation methods
Table 3. The performance of the diffusion-based methods and the tag-based collaborative filtering (TBCF: tag-based collaborative filtering; TNDR_PR: the diffusion-based method using PageRank node weighting approach; TNDR_HITS: the diffusion-based method using HITS node weighting approach.)
Collection Metric TBCF TNDR_PR TNDR_HITS
Last.fm: P@5 3.020% 3.126%† 3.182%†; P@10 2.387% 2.394%† 2.421%†; Recall 1.648% 1.667%†; IntDiv 83.820% 93.046%† 92.149%†
MovieLens: 1.322% 1.930%†* 1.629%† 0.979% 1.429%†* 1.701% 2.424%†* 88.030% 97.831%† 97.783%†
Delicious: 0.579% 0.277% 0.202% 0.428% 0.195% 0.287% 0.131% 0.135% 97.100% 99.729%† 99.591%†
Figures with † denote improvements over the baseline model. Stars indicate statistically significant improvements over the baseline model according to the Wilcoxon test at the level of 0.05.
Precision, recall and personalization generally improve on Last.fm and MovieLens; on Delicious only personalization (IntDiv) improves.
Experiments Results The influence of the refinement process
Table 4. The performance of different diffusion-based methods (TNDR_ILW: the diffusion-based method in which the TF-IUF approach is used to initialize tag weights; TNDR_1: the diffusion-based method in which tag weights are initialized as 1.)
Collections Metrics TNDR_ILW TNDR_ILW_PR TNDR_ILW_HITS TNDR_1 TNDR_1_PR TNDR_1_HITS
Last.fm: P@5 2.861% 3.126%†* 3.182%†* 2.903% 3.224%†* 3.168%†; P@10 2.268% 2.394%†* 2.421%† 2.331% 2.394%† 2.435%†; R 1.562% 1.648%† 1.667%† 1.605% 1.677%†; IntDiv 94.472% 93.046% 92.149% 92.397% 91.825% 91.639%
MovieLens: 1.830% 1.930%† 1.629% 1.529% 1.729%† 1.579%† 1.391% 1.429%† 1.203% 1.291%† 1.316%† 2.360% 2.424%† 2.041% 2.190%† 2.233%† 97.849% 97.831% 97.783% 97.738% 97.746%† 97.720%
Delicious: 0.264% 0.277%† 0.202% 0.252%† 0.195% 0.145% 0.170%† 0.195%† 0.135% 0.131% 0.097% 0.114%† 0.131%† 99.750% 99.729% 99.591% 99.666% 99.628% 99.525%
Figures with † denote improvements over the baseline model. Stars indicate statistically significant improvements over the baseline model according to the Wilcoxon test at the level of 0.05.
The refinement methods can be regarded as a necessary process for the proposed diffusion-based methods.
Experiments Results The influence of the damping factor The proper value of the damping factor depends on the collection. To recommend diverse items, only the initial tag weights should be applied.
Discussion Implication The relationships among tags Apply the network theory in personalized recommendation Practical guidance: Effectiveness A real-time dynamic Different recommendation strategies
Discussion Limitation Our method achieves improved performance in datasets with denser user tag networks generalization: in-depth case analysis of user examples Different node weighting methods Larger datasets
Discussion Future study Further explore the relationships of tags in network theory More statistical properties (such as degree centrality, betweenness centrality and community structure) of user tag networks
ACKNOWLEDGEMENT Mao, J., Lu, K., Li, G., & Yi, M. (2015). Profiling users with tag networks in diffusion-based personalized recommendation. Journal of Information Science, 0165551515603321. We thank the anonymous reviewers for their comments that have contributed to important improvements of the paper. We also thank Professor Dietmar Wolfram for his help.
Thank you!