
Hybrid Content and Tag-based Profiles for Recommendation in Collaborative Tagging Systems
Latin American Web Conference, IEEE Computer Society, 2008
Presenter: Ying-Ying, Chen

Outline
Introduction
Related Works
Approach
– Hybrid User Profiles
– Content-Based User Profiles
– Linking Tags to User Interests
Experimental Results
Conclusions and Comments
2009/10/23 Speaker: Ying-ying, Chen

Introduction
User profiling is an essential component of personal information agents and of recommendation systems in general. Content-based recommendation approaches rely on profiles collected by observing the user's browsing history or the documents the user has read.

Introduction
Recently, collaborative or social tagging sites have achieved widespread success on the Web. On these sites, users annotate resources with freely chosen keywords, or tags; the resulting structure is commonly known as a folksonomy. The activities users carry out in social tagging systems, including posting resources and assigning tags to them, have become a novel source of information about user interests.

Introduction
This paper proposes to integrate content-based profiles, which represent long-term user interests gathered by recommenders through observation of browsing activities, with tag-based profiles, which are acquired by capturing the user's interaction with one or more collaborative tagging systems. The resulting hybrid profiles can be exploited to help users find resources, people, or tags within social tagging systems.

Related Works
– Vector of weighted tags: a vector of weighted tags is obtained from the frequency with which each tag occurs in the resources a user tagged, and it is used to rank Web search results according to their similarity to this tag vector.
– TBProfile: also uses a weighted vector of tags to represent user interests, but tag weights are based on inverse user frequency.
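The frequency-based tag vector and the similarity ranking described above can be sketched in Python as follows. This is a minimal illustration, not the cited systems' code: the function names are hypothetical, cosine similarity is assumed as the similarity measure, and the inverse-user-frequency form used in `iuf_weight` (tf × log(|U| / n_t), where n_t is the number of users who used tag t) is an assumption rather than TBProfile's exact formula.

```python
import math
from collections import Counter


def tag_vector(user_posts):
    """Build a tag-frequency vector from the (resource, tags) pairs a user posted."""
    counts = Counter()
    for _resource, tags in user_posts:
        counts.update(tags)
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_results(profile, results):
    """Sort search results, given as (name, tag list), by similarity to the profile."""
    scored = [(name, cosine(profile, Counter(tags))) for name, tags in results]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def iuf_weight(profile, all_profiles):
    """Assumed TBProfile-style reweighting: tf * log(|U| / n_t)."""
    n_users = len(all_profiles)
    weights = {}
    for tag, tf in profile.items():
        n_t = max(1, sum(1 for p in all_profiles if tag in p))  # guard against n_t = 0
        weights[tag] = tf * math.log(n_users / n_t)
    return weights
```

Tags used by every user get an inverse-user-frequency weight of zero, which is the intuition behind down-weighting very common tags that lose specificity.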

Related Works
Using a single vector of weighted tags has some drawbacks:
– More frequent tags lose specificity.
– A single vector or tag cloud cannot capture diverse interests spanning different domains.
Proposed alternatives:
– Graph-based clustering [Au Yeung et al.]
– Multiple tag clouds

Related Works
A number of problems result from the free-form nature of tagging:
– Ambiguity
– Synonymy
Proposed solution: contextualize tags based on knowledge of the user's information preferences.

Approach – Hybrid User Profiles
Folksonomies are the primary structure underlying collaborative tagging systems. A folksonomy can be defined as a tuple $F := (U, T, R, Y, \prec)$, where:
– $U$: users, $T$: tags, $R$: resources
– $Y \subseteq U \times T \times R$: the user-based assignment of tags to resources, a ternary relation
– $\prec \subseteq U \times T \times T$: a user-specific sub-tag/super-tag relation

Approach – Hybrid User Profiles
The collection of all tag assignments of a single user constitutes a personomy:
$P_u := (T_u, R_u, I_u, \prec_u)$
with $I_u := \{(t, r) \in T \times R \mid (u, t, r) \in Y\}$, $T_u := \pi_1(I_u)$, $R_u := \pi_2(I_u)$, and $\prec_u := \{(t_1, t_2) \in T \times T \mid (u, t_1, t_2) \in \prec\}$, where $\pi_1$ and $\pi_2$ project $I_u$ onto its tags and its resources, respectively.
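The personomy definition translates almost directly into set comprehensions. The sketch below assumes $Y$ is given as a set of (user, tag, resource) triples and $\prec$ as a set of (user, sub-tag, super-tag) triples; the `Personomy` type and `personomy` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Personomy(NamedTuple):
    tags: set         # T_u = pi_1(I_u)
    resources: set    # R_u = pi_2(I_u)
    assignments: set  # I_u: (tag, resource) pairs of user u
    subtag: set       # prec_u: (sub-tag, super-tag) pairs of user u


def personomy(u, Y, sub_super):
    """Restrict the folksonomy relations Y and prec to a single user u."""
    I_u = {(t, r) for (v, t, r) in Y if v == u}
    T_u = {t for (t, _r) in I_u}   # first projection of I_u
    R_u = {r for (_t, r) in I_u}   # second projection of I_u
    prec_u = {(t1, t2) for (v, t1, t2) in sub_super if v == u}
    return Personomy(T_u, R_u, I_u, prec_u)
```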

Approach Overview

Approach – Content-Based User Profiles
WebDCC (Web Document Conceptual Clustering)
– Input: Web pages
– Output: a hierarchy of concepts, i.e. the user profile
– Instances are represented using the bag-of-words approach to document representation.
– It builds a hierarchy of concepts in which each node is a concept and the leaves are clusters.
– A category is any set of instances, and a concept is the internal representation of a category.

Approach – Content-Based User Profiles
User Profile

Approach – Content-Based User Profiles
Agents capture experiences regarding user interests, such as Web pages a user read or bookmarked for future reading, news items the user read, etc. Experiences are vector representations of information items based on the vector space model:
$d_i = \{(t_1, w_1), \ldots, (t_m, w_m)\}$
where $w_j$ is the weight of term $t_j$ in experience $d_i$.
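A vector-space representation of an experience can be sketched as a dict from term to weight. The slides do not say which weighting scheme is used, so this sketch assumes length-normalized raw term frequency with a simple lowercase tokenizer; `experience_vector` is a hypothetical name.

```python
import math
import re
from collections import Counter


def experience_vector(text, stopwords=frozenset()):
    """Map a document to {term: weight} using length-normalized term frequency (assumed scheme)."""
    terms = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stopwords]
    counts = Counter(terms)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}
```

Normalizing to unit length makes dot products between experiences equal to their cosine similarity, which keeps the later similarity thresholds in a comparable range.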

Approach – Content-Based User Profiles
Hierarchies of concepts produced by this algorithm are classification trees:
– Root → most general concept
– Terminal concept → cluster
WebDCC integrates classification and learning by sorting each experience through the concept hierarchy while simultaneously updating the hierarchy.

Approach – Content-Based User Profiles
The hierarchy consists of a number of concepts $C = \{c_1, c_2, \ldots, c_n\}$. To automatically assign experiences to concepts, each concept has a description given by a set of weighted terms $c_i = \{(t_1, w_1), \ldots, (t_m, w_m)\}$, where $w_j$ is the weight associated with term $t_j$ in category $c_i$. This description constitutes a linear classifier for the category.
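Because each concept description is a weighted term set, applying it as a linear classifier amounts to a dot product with the experience vector. A minimal sketch, assuming both are stored as {term: weight} dicts; `linear_score` and `best_concept` are hypothetical helpers.

```python
def linear_score(concept, experience):
    """Linear classifier output: sum of products of matching term weights."""
    return sum(w * experience.get(t, 0.0) for t, w in concept.items())


def best_concept(concepts, experience):
    """Name of the concept whose description scores highest on the experience."""
    return max(concepts, key=lambda name: linear_score(concepts[name], experience))
```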

Approach – Content-Based User Profiles
WebDCC aims to obtain a hierarchical set of linear classifiers, each based on a set of relevant features. This goal is achieved by combining:
– a feature selection algorithm, to choose the appropriate terms at each node of the tree
– a supervised learning algorithm, to construct a classifier for that node

Feature selection algorithm
A feature selection threshold is defined in the [0, 1] range such that a feature is selected only if its weight is higher than this threshold. A simple and effective way to weight terms is the document frequency $DF(t_k)$, the number of instances in which term $t_k$ occurs.
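Document-frequency feature selection can be sketched as below. Since the threshold lies in [0, 1] while raw $DF(t_k)$ is a count, this sketch assumes DF is normalized by the number of instances before comparison; `select_features` is a hypothetical name.

```python
from collections import Counter


def select_features(instances, threshold):
    """Keep terms whose normalized document frequency DF(t) / n exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    df = Counter()
    for terms in instances:
        df.update(set(terms))  # count each term at most once per instance
    n = len(instances)
    return {t for t, count in df.items() if count / n > threshold}
```

Raising the threshold keeps only terms shared by most instances at a node, which yields fewer but more general features.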

Supervised learning algorithm
Each node in the hierarchy acts as a linear classifier, represented by a prototype that is compared with the resource to be classified. Notation:
– $p_{c_i}$: the prototype of category $c_i$
– $d$: the documents belonging to category $c_i$
A resource is classified into a category if its similarity to the category prototype exceeds a minimum value.
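Prototype-based classification can be sketched as follows. The prototype formula is not reproduced in the transcript, so this sketch assumes $p_{c_i}$ is the mean of the category's document vectors and that similarity is cosine; `prototype`, `cosine`, and `classify` are hypothetical names.

```python
import math


def prototype(docs):
    """Assumed prototype p_ci: the mean of the category's document vectors."""
    proto = {}
    for d in docs:
        for t, w in d.items():
            proto[t] = proto.get(t, 0.0) + w / len(docs)
    return proto


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(resource, categories, min_sim):
    """(category, similarity) pairs above min_sim, most similar first."""
    scores = [(name, cosine(prototype(docs), resource)) for name, docs in categories.items()]
    return sorted((s for s in scores if s[1] > min_sim), key=lambda s: s[1], reverse=True)
```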

Approach – Content-Based User Profiles
Given a cluster $s_j^i$ belonging to category $c_i$, composed of the vector representations of a set of documents, the centroid vector $p_{s_j^i}$ is defined as follows:

Approach – Content-Based User Profiles
As a result of comparing the resource with the prototypes, the resource is assigned to the cluster with the closest centroid below category $c_i$. For example, for the categories $C = \{C_{sport}, C_{politics}\}$, applying the Classify function to each of them might return $\{(C_{sport}, 0.97), (C_{politics}, 0.14)\}$.

Approach – Content-Based User Profiles
This assignment is made provided that the similarity is higher than a minimum similarity threshold $\delta$. Experiences that are not similar enough to any existing centroid according to this threshold cause the creation of new singleton clusters.
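The assignment rule with threshold $\delta$ and singleton creation can be sketched as below. It assumes the clusters under one category are lists of {term: weight} vectors, that centroids are member means, and that similarity is cosine; `centroid`, `cosine`, and `assign` are hypothetical names.

```python
import math


def centroid(vectors):
    """Mean vector of a cluster's members (assumed centroid definition)."""
    c = {}
    for v in vectors:
        for t, w in v.items():
            c[t] = c.get(t, 0.0) + w / len(vectors)
    return c


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def assign(experience, clusters, delta):
    """Add the experience to the closest cluster if similarity > delta, else open a singleton.

    Mutates `clusters` and returns the index of the cluster that received the experience.
    """
    best, best_sim = None, -1.0
    for i, members in enumerate(clusters):
        sim = cosine(centroid(members), experience)
        if sim > best_sim:
            best, best_sim = i, sim
    if best is not None and best_sim > delta:
        clusters[best].append(experience)
        return best
    clusters.append([experience])  # not similar enough to any centroid: new singleton cluster
    return len(clusters) - 1
```

A higher $\delta$ sends more experiences into new singleton clusters, which matches the later observation that larger thresholds produce smaller clusters.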

Approach – Content-Based User Profiles
Cluster cohesiveness is computed over the members of a cluster $s_r$ of size $n_r$.
If the cohesiveness value is higher than a threshold $\phi$, a new concept is created. Otherwise, the hierarchy is not updated.
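The cohesiveness formula itself is not reproduced in the transcript. As a hedged stand-in, the sketch below assumes cohesiveness is the average pairwise cosine similarity over the $n_r$ members of $s_r$, and it treats clusters with fewer than two members as having no measurable cohesiveness; `cohesiveness` and `creates_concept` are hypothetical names.

```python
import math
from itertools import combinations


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cohesiveness(members):
    """Assumed measure: mean cosine similarity over all distinct member pairs."""
    n_r = len(members)
    if n_r < 2:
        return 0.0  # a singleton gives no evidence of a coherent concept
    pairs = list(combinations(members, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def creates_concept(members, phi):
    """True when the cluster is cohesive enough (> phi) to become a new concept."""
    return cohesiveness(members) > phi
```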

Approach – Linking Tags to User Interests
To build hybrid profiles, the categories representing user interests in content-based profiles are populated with the tags users frequently assign to resources in those categories. Tagged resources therefore first have to be categorized according to the current representation of user interests given by the interest hierarchy.

Approach – Linking Tags to User Interests
For each cluster in the hierarchy, a set of the most frequently used tags is extracted to represent the tag assignment preferences for the experiences or resources belonging to that cluster. The set of tags related to a cluster $s_j^i$ within category $c_i$ can be defined within the personomy $P_u$ as:
$T_{s_j^i} = \{t \in T \mid (t, r) \in I_u \wedge r \in s_j^i\}$

Approach – Linking Tags to User Interests
The tag frequency of a tag $t$ in $T_{s_j^i}$ is the number of times the tag was used to tag resources belonging to the cluster:
$|\{r \in R \mid (t, r) \in I_u \wedge r \in s_j^i\}|$
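Computing $T_{s_j^i}$ with its tag frequencies, and then taking the most frequent tags, can be sketched as follows, assuming $I_u$ is a set of distinct (tag, resource) pairs and a cluster is given as a set of resource identifiers. Because the pairs are distinct, counting them per tag gives exactly the number of cluster resources tagged with $t$. The function names are hypothetical.

```python
from collections import Counter


def cluster_tag_frequencies(I_u, cluster):
    """tag -> |{r in cluster : (tag, r) in I_u}| for the tags in T_{s_j^i}."""
    return Counter(t for (t, r) in I_u if r in cluster)


def top_tags(I_u, cluster, n):
    """The n most frequently used tags in the cluster."""
    return [t for t, _count in cluster_tag_frequencies(I_u, cluster).most_common(n)]
```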

Experimental Results
Experiments were performed using data collected from the del.icio.us social bookmarking system.


Experimental Results
For a given user $u \in U$ and a given resource $r \in R$, a tag recommender system tries to find a set of tags $\tilde{T}(u, r) \subseteq T$ for the user to annotate the resource.
– Training set: 80% of the total tagged bookmarks
– Testing set: the remaining 20%

Experimental Results
The quality of a given list of top-N recommendations was evaluated by the number of hits:
– The number of hits is the number of tag assignments in the test set that were also present in the top-N recommended tags.
– N is the total number of recommendations.
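Counting hits against a top-N list can be sketched as below. The exact hit-rate formula is not reproduced in the transcript, so dividing the hit count by N is an assumption; `hits` and `hit_rate` are hypothetical names, and test-set assignments are given as a list of tags for one user and resource.

```python
def hits(recommended, test_assignments, n):
    """Number of test-set tag assignments found among the top-n recommended tags."""
    top_n = set(recommended[:n])
    return sum(1 for tag in test_assignments if tag in top_n)


def hit_rate(recommended, test_assignments, n):
    """Hits normalized by the number of recommendations n (assumed normalization)."""
    return hits(recommended, test_assignments, n) / n
```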

Experimental Results
High hit-rate values indicate that the algorithm was able to predict the assignments in the test sets of the corresponding users.
– $\tilde{T}(u, r)$: the set of recommended tags
– $tags(u, r)$: the set of real tags assigned by the user to the resource

Experimental Results
The F-measure was used to combine precision and recall values:
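Precision, recall, and the F-measure for one (user, resource) pair can be sketched as follows, assuming the usual set-based definitions (precision = $|\tilde{T}(u, r) \cap tags(u, r)| / |\tilde{T}(u, r)|$, recall = the same intersection over $|tags(u, r)|$) and the balanced F1 form $F = 2PR / (P + R)$. The function name is hypothetical.

```python
def precision_recall_f1(recommended, real):
    """recommended = T~(u, r), real = tags(u, r); F is the assumed balanced F1."""
    rec, rel = set(recommended), set(real)
    hit = len(rec & rel)
    p = hit / len(rec) if rec else 0.0
    r = hit / len(rel) if rel else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Recommending more tags can only keep or raise recall while usually lowering precision, which is why a combined measure is reported.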

Experimental Results
Precision increases as the similarity threshold $\delta$ grows, since clusters become smaller and recommendations are based on fewer but highly similar resources. Conversely, recall tends to decrease, since smaller clusters offer less tag diversity. The best hit-rate values are found in the interval $0.1 \le \delta \le 0.3$, which is also where the best balance between precision and recall is attained for most users.

Experimental Results
Hybrid profiles were compared with tag recommendation based on two approaches commonly used in folksonomies:
– Most popular tags by user (MPTU): tags are sorted by their frequency of occurrence in the user's resources, and the top-N tags are used as recommendations. This corresponds to a tag-based profile consisting of a single vector of tags.
– Most popular tags by resource (MPTR): based on collective knowledge, i.e. the tags assigned to the resource by all users, rather than on the individual user alone.
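The two popularity baselines can be sketched in a few lines. This assumes $I_u$ is an iterable of (tag, resource) pairs and $Y$ an iterable of (user, tag, resource) triples; `mptu` and `mptr` are hypothetical names, and ties are broken by first occurrence, as `Counter.most_common` does.

```python
from collections import Counter


def mptu(I_u, n):
    """Most popular tags by user: the n tags the user applied most often."""
    return [t for t, _count in Counter(t for t, _r in I_u).most_common(n)]


def mptr(Y, resource, n):
    """Most popular tags by resource: the n tags most often assigned to `resource` by any user."""
    return [t for t, _count in Counter(t for _u, t, r in Y if r == resource).most_common(n)]
```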

Experimental Results
Recommendations based on hybrid profiles consistently reached higher hit-rates than the approaches based on tag popularity.

Experimental Results
The differences in performance between hybrid profiles and MPTU and MPTR, tested with a paired two-tailed t-test, were statistically significant at $\alpha = 0.05$, with p-values of 0.0119 and 0.0001 respectively.
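The paired t statistic behind such a comparison can be computed from per-user score differences, as in the sketch below; the scores in the test are made-up illustrative values, not the paper's data, and `paired_t` is a hypothetical name. The two-tailed p-value then comes from the t distribution with $n - 1$ degrees of freedom; SciPy's `scipy.stats.ttest_rel` returns both the statistic and the two-sided p-value.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched per-user scores a and b.

    Requires at least two pairs whose differences are not all identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1
```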

Conclusions and Comments
Experimental results show that hybrid profiles are able to outperform two commonly used recommendation methods based on tag popularity.
Future work
– Non-obviousness
– Discriminating power
Comments
– The experimental sample is too small.
– The possibility of tag-based profiles
