Exploring Online Social Activities for Adaptive Search Personalization
CIKM’10
Advisor: Jia Ling, Koh
Speaker: SHENG HONG, CHUNG
Outline
Introduction
System Design
– Overview
– User Interest Profile
– Search Result Personalization
– Adaptive Adjustment
Evaluation
Conclusion
Introduction
Social networks have experienced explosive growth in the past few years.
Online social activities carry valuable information about users’ backgrounds and interests.
How do we choose the right sources?
– Availability
– Privacy
– Accuracy
Introduction
This work proposes a personalization framework that infers users’ interests and preferences from their public activities on a variety of online social systems.
– Retrieve information and create an interest profile for each user.
– Personalize search results based on the interest profile.
– Automatically adjust the weights of the different information sources.
[Figure: system overview. The social activity of each system user feeds a user interest profile, which is used for personalization and adaptive adjustment.]
System Design: Overview
User Interest Profile
– Create an interest profile for each user
Receiving a query from a user
– The search engine returns a number of webpages
– Retrieve the interest vector from the interest profile
– Compute an interest score based on how well each webpage matches the user’s interests
Search Result Personalization
– Combine both scores (relevance score and interest score) into a final score
Adaptive Adjustment
– Personalization degree
– The weights of the different social information sources
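[Figure: personalization pipeline. The user's query goes to the search engine, which returns a ranked search result (Webpage1 to Webpage5) with relevance scores. The interest vector (keyword t, score s) from the user interest profile is compared with each webpage by cosine similarity to give an interest score. The two scores are combined for personalization.]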
User Interest Profile
Three parts:
– Creating interest vectors
– Combining interest vectors
– Updating interest vectors
Definitions:
– A user interest profile is represented as $\{V, W, p\}$
– $V = \{v_1, \dots, v_k\}$: a set of interest vectors
– $W = \{w_1, \dots, w_k\}$: the weights of the corresponding interest vectors
– $p$: a real number called the personalization degree
Example: a system user with three information sources (1. Facebook, 2. Twitter, 3. Bookmarks) has the profile $\{V, W, p\}$ with $V = \{v_1, v_2, v_3\}$
– $v_1$: user information from Facebook
– $v_2$: user information from Twitter
– $v_3$: user information from Bookmarks
– $w_1$ to $w_3$: the corresponding weights
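The profile {V, W, p} can be pictured as a small data structure. The following is only a sketch: the class name `InterestProfile`, its field names, the example weights, and the initial personalization degree of 0.5 are all assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# An interest vector maps a keyword t to its score s (normalized into [0, 1]).
InterestVector = Dict[str, float]


@dataclass
class InterestProfile:
    """Hypothetical container for the profile {V, W, p}."""
    vectors: List[InterestVector] = field(default_factory=list)  # V
    weights: List[float] = field(default_factory=list)           # W
    degree: float = 0.5  # p; the starting value 0.5 is an assumption

    def add_source(self, vector: InterestVector, weight: float) -> None:
        """Register one information source together with its weight."""
        self.vectors.append(vector)
        self.weights.append(weight)


# Illustrative data only: three sources, as in the example above.
profile = InterestProfile()
profile.add_source({"rice": 1.0, "noodle": 0.5}, weight=0.4)  # v1: Facebook
profile.add_source({"rice": 0.5}, weight=0.3)                 # v2: Twitter
profile.add_source({"spaghetti": 1.0}, weight=0.3)            # v3: Bookmarks
```

Keeping V and W as parallel lists mirrors the notation on the slide, where each w_i belongs to the v_i with the same index.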
Creating interest vectors
There are different ways to create a vector, depending on the information source.
Text resources:
– Keywords: the most important keywords
– Score: the number of texts that contain the keyword
Tag-based resources:
– Keywords: tags are treated as keywords
– Score: the number of people who have tagged the user with the keyword
For each user, the scores are normalized into [0,1].
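The two ways of building a vector might look as follows. This is a sketch under assumptions: the slides do not say how the "most important keywords" are chosen (here they are simply passed in), nor which normalization is used (here every score is divided by the largest one). All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale scores into [0, 1] by dividing by the largest score (assumed scheme)."""
    if not counts:
        return {}
    top = max(counts.values())
    return {k: v / top for k, v in counts.items()}


def vector_from_texts(texts: Iterable[str], keywords: Iterable[str]) -> Dict[str, float]:
    """Text resources: score of a keyword = number of texts that contain it."""
    wanted = [k.lower() for k in keywords]
    counts: Counter = Counter()
    for text in texts:
        words = set(text.lower().split())
        for k in wanted:
            if k in words:
                counts[k] += 1
    return normalize(dict(counts))


def vector_from_tags(tag_counts: Dict[str, int]) -> Dict[str, float]:
    """Tag-based resources: score of a tag = number of people who applied it."""
    return normalize(dict(tag_counts))


# Illustrative data only.
blog_vector = vector_from_texts(
    ["rice and noodle", "rice bowl", "spaghetti night", "rice again"],
    keywords=["rice", "noodle", "spaghetti"],
)
tag_vector = vector_from_tags({"search": 4, "python": 2})
```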
Combining interest vectors
Example: a system user with sources 1. Facebook, 2. Twitter, 3. Bookmarks and profile $\{V, W, p\}$
Keywords: Rice(4), Noodle(2), Spaghetti(2)
$T_1$ : { Rice, Noodle, Spaghetti }
s(t) = 4* * *0.4 = 3.2
Updating interest vectors
Periodically crawl new data from the social systems
Integrate the new information
Add new social information sources
– Add a new interest vector and make use of the new data
Give higher probability to new data
Search Result Personalization
Relevance score
– The search engine returns a ranked list of webpages
– The kth webpage in the result list receives the relevance score $g_r(x) = 1 / (1 + k)$
Interest score
– $g_i(x)$: the cosine similarity between the word vector of the webpage and the overall interest vector
Final score
– $g_f(x) = g_r(x) \cdot (1 - p) + g_i(x) \cdot p$
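The three scores can be combined in a few lines. This is a minimal sketch, assuming that rank positions k start at 1, that word vectors and the interest vector are keyword-to-score dictionaries, and that the function names `cosine` and `personalize` are free to choose; none of these details are stated on the slide.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalize(results: List[Tuple[str, Vector]], interest: Vector,
                p: float) -> List[Tuple[str, float]]:
    """Re-rank results given in search-engine order by the final score."""
    scored = []
    for k, (url, words) in enumerate(results, start=1):  # k = 1 is assumed for the top hit
        g_r = 1.0 / (1.0 + k)            # relevance score
        g_i = cosine(words, interest)    # interest score
        g_f = g_r * (1.0 - p) + g_i * p  # final score
        scored.append((url, g_f))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative data only.
results = [("page-a", {"car": 1.0}), ("page-b", {"rice": 1.0})]
interest = {"rice": 1.0}
ranked = personalize(results, interest, p=0.5)
unpersonalized = personalize(results, interest, p=0.0)
```

With p = 0 the original search-engine order is kept, and larger values of p let the interest score pull matching pages upward, which is why p is called the personalization degree.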
Adaptive adjustment
Adjusting the personalization degree
– $S_u$: the set of search results actually clicked by user u
– $L_g$: the original list of results returned by the search engine
– $L_p$: the final list of results returned by the personalized search system
NDCG: Normalized Discounted Cumulative Gain
Calculate two values: NDCG($L_g$, $S_u$) and NDCG($L_p$, $S_u$)
– x: the number of top results of $L_g$ or $L_p$ that are considered
– $r_i$: 0 or 1
– $r_i = 1$: the ith element of L is in S
– $r_i = 0$: otherwise
Example with x = 3:
L_g   L_p   S_u
A     A     A
D     B     B
F     G     C
NDCG($L_g$, $S_u$) = $Z_3$ ( 1/… ) = $Z_3$
NDCG($L_p$, $S_u$) = $Z_3$ ( 1/1 + 1/log … ) = 1… * $Z_3$
Since NDCG($L_p$, $S_u$) > NDCG($L_g$, $S_u$), the personalization degree is increased.
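The comparison in this example can be replayed in code. The sketch below rests on several assumptions, because the NDCG formula did not survive on the slide: it uses the common discount 1 / log2(1 + i) with binary gains, takes the normalizer Z_x to be the reciprocal of the ideal DCG, and moves p by a fixed step of 0.05 in either direction. The function names and the step size are hypothetical.

```python
import math
from typing import List, Set


def dcg(ranked: List[str], clicked: Set[str], x: int) -> float:
    """Discounted gain over the top x results; r_i = 1 if the ith result was clicked."""
    return sum(1.0 / math.log2(1 + i)
               for i, item in enumerate(ranked[:x], start=1)
               if item in clicked)


def ndcg(ranked: List[str], clicked: Set[str], x: int) -> float:
    """DCG divided by the best achievable DCG (assumed meaning of Z_x)."""
    ideal_hits = min(x, len(clicked))
    ideal = sum(1.0 / math.log2(1 + i) for i in range(1, ideal_hits + 1))
    return dcg(ranked, clicked, x) / ideal if ideal else 0.0


def adjust_degree(p: float, l_g: List[str], l_p: List[str],
                  s_u: Set[str], x: int, step: float = 0.05) -> float:
    """Raise p when the personalized list scores higher, lower it when it scores lower."""
    n_g, n_p = ndcg(l_g, s_u, x), ndcg(l_p, s_u, x)
    if n_p > n_g:
        p += step
    elif n_p < n_g:
        p -= step
    return min(1.0, max(0.0, p))  # keep p inside [0, 1]


# The lists from the example above.
l_g = ["A", "D", "F"]
l_p = ["A", "B", "G"]
s_u = {"A", "B", "C"}
new_p = adjust_degree(0.5, l_g, l_p, s_u, x=3)
```

Under these assumptions the personalized list earns credit for both A and B while the original list earns credit only for A, so p moves up.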
Adaptive adjustment
Adjusting source weights
– $S_u$: the set of search results actually clicked by user u
– $v_i$: the interest vector of the ith information source
Example: $S_u$ = {A, B, C}, with $v_1$: Facebook and $v_2$: Twitter
$h_1(v_1, S_u) = \cos(v_1, A) + \cos(v_1, B) + \cos(v_1, C)$
$h_2(v_2, S_u) = \cos(v_2, A) + \cos(v_2, B) + \cos(v_2, C)$
The average of h = $(h_1 + h_2) / 2$
Which of $h_1$ and $h_2$ is greater than the average of h?
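The per-source score h and the comparison with the average can be sketched as below. The slide only asks which h_i exceeds the average, so the actual update rule here is an assumption: sources above the average get their weight multiplied by (1 + step), sources below it by (1 - step), and the weights are then rescaled to sum to 1. Function names and the step of 0.1 are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def source_scores(vectors: List[Vector], clicked_pages: List[Vector]) -> List[float]:
    """h_i = sum of cos(v_i, page) over all clicked pages in S_u."""
    return [sum(cosine(v, page) for page in clicked_pages) for v in vectors]


def adjust_weights(weights: List[float], scores: List[float],
                   step: float = 0.1) -> List[float]:
    """Assumed rule: boost sources above the average h, damp those below, renormalize."""
    if not scores:
        return list(weights)
    avg = sum(scores) / len(scores)
    raw = []
    for w, h in zip(weights, scores):
        if h > avg:
            raw.append(w * (1.0 + step))
        elif h < avg:
            raw.append(w * (1.0 - step))
        else:
            raw.append(w)
    total = sum(raw)
    return [w / total for w in raw] if total else list(weights)


# Illustrative data only: v1 matches the clicked pages, v2 does not.
v1, v2 = {"rice": 1.0}, {"car": 1.0}
clicked = [{"rice": 1.0}, {"rice": 1.0, "noodle": 1.0}, {"noodle": 1.0}]  # A, B, C
h = source_scores([v1, v2], clicked)
new_weights = adjust_weights([0.5, 0.5], h)
```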
Evaluation
Experiment data
– Blogs
– Social bookmarks
– Mutual tags
208 users, each with
– At least 10 blogs
– Tags from at least 10 people
– At least 20 bookmarked webpages
Evaluation Method and Metrics
Use 25% of the bookmarks to create the interest profile
The other 75% form the testing corpus
For the ith user $u_i$, randomly choose 30 words
For each chosen word t, a search query consisting of t is issued on behalf of $u_i$
$L_t[1, k]$ is the list of the top k results returned by the search system
$S_t$ is the set of webpages that have been tagged with t by $u_i$
Evaluation Method and Metrics
Compute the average recall over the 30 search queries issued for $u_i$
Improvement percentage
– $r_a$ and $r_b$ are the average recalls of approaches A and B
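The metrics can be sketched as follows. Both formulas are assumptions, since the slide equations were lost: recall is taken in its usual form, the share of S_t that appears in L_t[1, k], and the improvement percentage is taken as the relative gain of r_a over r_b. The function names are hypothetical.

```python
from typing import List, Set, Tuple


def recall_at_k(results: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant pages S_t found among the top k results L_t[1, k]."""
    if not relevant:
        return 0.0
    return len(set(results[:k]) & relevant) / len(relevant)


def average_recall(queries: List[Tuple[List[str], Set[str]]], k: int) -> float:
    """Mean recall over all queries issued for one user (30 in the experiment)."""
    if not queries:
        return 0.0
    return sum(recall_at_k(res, rel, k) for res, rel in queries) / len(queries)


def improvement_percentage(r_a: float, r_b: float) -> float:
    """Assumed definition: relative gain of approach A over approach B, in percent."""
    return (r_a - r_b) / r_b * 100.0


# Illustrative data only: two queries for one user.
queries = [
    (["p1", "p2", "p3"], {"p1", "p3", "p7", "p9"}),
    (["p4", "p5", "p6"], {"p5"}),
]
r = average_recall(queries, k=3)
gain = improvement_percentage(0.3, 0.2)
```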
Experimental Results
Personalization vs. non-personalization
Experimental Results
Active users vs. less active users
Experimental Results
Multiple sources vs. single source
Experimental Results
Effectiveness of adaptation
– Personalization degree adjustment (PDA)
– Source weight initialization (SWI)
– Source weight adjustment (SWA)
Conclusion
Propose a personalization framework
– Infer users’ preferences from their activities on a variety of online social systems
– Create user interest profiles
– Integrate information from different information sources
– Personalize search results
– Adapt the personalization degree and the source weights