How are web search queries distributed? Adapted from Damon Horowitz’s talk slides. Web search works well! Web search is a good start; more effort needed, possibly on top. Based on opinion of friends.
Social Search – General Remarks
Search in text corpora (IR).
Search in a linked environment (authority, hub, PageRank).
What if my social context should impact search results?
◦ E.g., users in a social network (SN) post reviews/ratings on items.
◦ “Items” = anything they want to talk about, share their opinions on with their friends, and implicitly recommend (or recommend against).
Social Search – The Problem & Issues Search results for a user should be influenced by how the user and the user’s friends rated the items, in addition to quality of match as determined by IR methods and/or by PageRank-like methods. Transitive friends’ ratings may matter too, up to some distance. Users may just comment on an item w/o explicitly rating it.
More Issues Factoring in transitive friends is somewhat similar to the Katz measure: the longer the geodesic from u to v, the less important v’s rating is to u. Trust may be a factor. There is a vast literature on trust computation. May need to analyze opinions (text) and translate them into strength (score) and polarity (good or bad?).
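A minimal sketch of the distance-decay idea, assuming an unweighted friendship graph stored as an adjacency dict. The decay factor `beta`, the cutoff `max_dist`, and all function names are illustrative assumptions, not from the slides. Note that Katz proper sums over all paths; this sketch uses only the geodesic (shortest-path) distance.

```python
from collections import deque

def friend_weights(graph, u, beta=0.5, max_dist=3):
    """Weight each user v reachable from u by beta ** dist(u, v), found by BFS."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == max_dist:
            continue  # do not expand beyond the cutoff distance
        for y in graph.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return {v: beta ** d for v, d in dist.items() if v != u}

def social_rating(graph, ratings, u, item, beta=0.5, max_dist=3):
    """Weighted average of the ratings of `item` by u's (transitive) friends."""
    weights = friend_weights(graph, u, beta, max_dist)
    num = den = 0.0
    for v, w in weights.items():
        if item in ratings.get(v, {}):
            num += w * ratings[v][item]
            den += w
    return num / den if den else None  # None: no friend rated the item
```

With `beta < 1`, a direct friend's rating counts more than a friend-of-friend's, and users beyond `max_dist` are ignored.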
Other approaches
Google Social Search
◦ Search for “barcelona” returns results including the searcher’s friends’ blogs.
◦ Relevant users need to connect their Facebook, Twitter, ... accounts to their Google profile.
◦ Of particular value when searching for local resources such as shows and restaurants.
But this does not use user-generated content for ranking.
Aardvark – part of Google Labs that was shut down – is an interesting approach to social search.
There are other companies such as sproose.com. See Wikipedia for a list and check them out. (sproose seems to take reviews into account in ranking.) [defunct now?]
Some papers develop notions of SocialRank, UserRank, FolkRank, similar to PageRank (see references in Schenkel et al. 2008 [details later]).
Part I based on: Damon Horowitz and Sepandar D. Kamvar. The Anatomy of a Large-Scale Social Search Engine. WWW 2010.
The Aardvark Approach Classic web search – roots in IR; authority centric: return most relevant docs as answers to a search query. Alternative paradigm: consult the village’s wise people. Web search – keyword based. Social search – natural language; social intimacy/trust instead of authority. E.g.: what’s a good bakery in the Mag Mile area in Chicago? What’s a good handyman, who is not too expensive, is punctual, and honest? These queries are normally handled offline, by asking real people. Social search seeks to bring them online. Note: long, subjective, and contextualized queries. Can you think of similar systems that already exist? Hint: what do you do when you encounter difficulties with a new computer, system, software, tool?
Index what?
User’s existing social habitat – LinkedIn (LI), Facebook (FB) contacts; common groups such as school attended, employer, …; can invite additional contacts.
Topics/areas of expertise: learned from
◦ Self-declaration
◦ Peer endorsement (à la LI)
◦ Activities on LI, FB, Twitter, etc.
◦ Activities (asking/answering [or not] questions) on Aardvark.
Forward Index: user (id), topics of expertise sorted by strength, answer quality, response time, …
Inverted Index: for each topic, list of users sorted on expertise, plus answer quality, response time, etc.
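A small sketch of the two indexes, assuming each user profile is just a dict of topic → expertise strength. The function name and data layout are illustrative assumptions; answer quality and response time are left out to keep it short.

```python
from collections import defaultdict

def build_indexes(profiles):
    """profiles: {user: {topic: expertise_strength}} -> (forward, inverted)."""
    forward = {}
    inverted = defaultdict(list)
    for user, topics in profiles.items():
        # Forward index: topics of one user, strongest first.
        forward[user] = sorted(topics.items(), key=lambda kv: -kv[1])
        for topic, strength in topics.items():
            inverted[topic].append((user, strength))
    # Inverted index: users per topic, strongest first.
    for topic in inverted:
        inverted[topic].sort(key=lambda kv: -kv[1])
    return forward, dict(inverted)
```

The inverted index is what a router would consult at query time: given the topics of a question, it yields candidate answerers already sorted by expertise.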
Query Life Cycle [Figure: diagram with components Transport Layer, Routing Engine, Conversation Manager]
Query Answering Model Prob. that u_i is an expert in topic t. Prob. that question q is in topic t. Prob. that u_i can successfully answer a question from u_j; usually based on strength of social connections/trust etc. Prob. that u_i can successfully answer question q from u_j. All this is fine. But it’s important to engage a large number of high-quality question askers and answerers to make and keep the system useful.
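A sketch of how these probabilities can be combined, assuming p(u_i | q) = Σ_t p(u_i | t) · p(t | q) and s(u_i, u_j, q) = p(u_i | u_j) · p(u_i | q), which is the combination I believe the Horowitz and Kamvar paper uses (check the paper for the exact form). The dict layouts and function names are illustrative assumptions.

```python
def answer_score(p_expert, p_topic, p_social, u_i, u_j):
    """s(u_i, u_j, q) = p(u_i | u_j) * sum_t p(u_i | t) * p(t | q).

    p_expert: {(user, topic): p(user | topic)}
    p_topic:  {topic: p(topic | q)} for the question at hand
    p_social: {(answerer, asker): p(answerer | asker)}
    """
    p_ui_given_q = sum(p_expert.get((u_i, t), 0.0) * p_tq
                       for t, p_tq in p_topic.items())
    return p_social.get((u_i, u_j), 0.0) * p_ui_given_q

def rank_answerers(p_expert, p_topic, p_social, asker, candidates):
    """Candidates (excluding the asker) sorted by decreasing score."""
    scored = [(answer_score(p_expert, p_topic, p_social, u, asker), u)
              for u in candidates if u != asker]
    return [u for _, u in sorted(scored, key=lambda su: -su[0])]
```

The topic term depends on the question but not on the asker; the social term depends on the asker but not on the question, so it can be precomputed per pair of users.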
Question Analysis
Semi-automated:
◦ Soft classification into topics.
◦ Filter out non-questions, inappropriate and trivial questions.
◦ KeywordMatchTopicMapper maps keywords/terms in the question to topics in user profiles.
◦ TaxonomyTopicMapper places the question on a taxonomy covering popular topics.
◦ LocationMatching.
Human judges assign scores to topics (evaluation).
Overall ranking Aggregation of three kinds of scores: ◦ Topic expertise. ◦ Social proximity/match between asker and answerer. ◦ Availability of answerer (can be learned from online activity patterns, load, etc.) Answerers contacted in priority order. Variety of devices supported. See paper for more details and for experimental results.
Social Wisdom for Search and Recommendation
Ralf Schenkel et al. IEEE Data Eng. Bull., June 2008.
Expand scope of RecSys by storing (in a relational DB) other info:
Users(username, location, gender, ...)
Friendships(user1, user2, ftype, fstrength)
Documents(docid, description, ...)
Linkage(doc1, doc2, ltype, lweight)
Tagging(user, doc, tag, tweight)
Ontology(tag1, tag2, otype, oweight)
Rating(user, doc, assessment)
Just modeling/scoring aspects; scalability ignored for now.
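To make the schema concrete, here is a sketch using Python's built-in sqlite3 module. The column types, the in-memory database, and the example query `friend_tag_counts` are assumptions for illustration; only the table and column names come from the slide, and the elided columns ("...") are left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Users(username TEXT PRIMARY KEY, location TEXT, gender TEXT);
CREATE TABLE Friendships(user1 TEXT, user2 TEXT, ftype TEXT, fstrength REAL);
CREATE TABLE Documents(docid TEXT PRIMARY KEY, description TEXT);
CREATE TABLE Linkage(doc1 TEXT, doc2 TEXT, ltype TEXT, lweight REAL);
CREATE TABLE Tagging(user TEXT, doc TEXT, tag TEXT, tweight REAL);
CREATE TABLE Ontology(tag1 TEXT, tag2 TEXT, otype TEXT, oweight REAL);
CREATE TABLE Rating(user TEXT, doc TEXT, assessment REAL);
"""

def make_db():
    """Create an empty in-memory database with the seven relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def friend_tag_counts(conn, user, tag):
    """Per document: sum of fstrength * tweight over friends of `user` who used `tag`."""
    rows = conn.execute(
        """SELECT t.doc, SUM(f.fstrength * t.tweight)
           FROM Friendships f JOIN Tagging t ON t.user = f.user2
           WHERE f.user1 = ? AND t.tag = ?
           GROUP BY t.doc
           ORDER BY 2 DESC""",
        (user, tag))
    return rows.fetchall()
```

The join of Friendships with Tagging is the relational counterpart of weighting each friend's tagging activity by the strength of the friendship.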
Friendship types and search modes
Social – computed from the explicit social graph, say using inverse distance. Could be based on other measures such as Katz.
Spiritual – derived from overlap in activities (ratings, reviews, tagging, ...).
Global – all users given equal weight = 1/|U|.
All measures are normalized so that the weights on all outgoing edges from a user sum to 1.
Combos possible:
$$F(u, u') = a\,F_{so}(u, u') + b\,F_{sp}(u, u') + c\,F_{gl}(u, u'), \quad a + b + c = 1.$$
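A sketch of the mixed friendship strength, assuming raw (unnormalized) social and spiritual weights are given as nested dicts. The function names and data layout are illustrative assumptions.

```python
def normalize(weights):
    """Scale one user's outgoing edge weights so that they sum to 1."""
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()} if total else {}

def mixed_friendship(f_social, f_spiritual, users, u, a, b, c):
    """F(u, v) = a*F_so(u, v) + b*F_sp(u, v) + c*F_gl(u, v) for every v in users."""
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("a + b + c must equal 1")
    so = normalize(f_social.get(u, {}))
    sp = normalize(f_spiritual.get(u, {}))
    gl = 1.0 / len(users)  # global mode: every user weighs 1/|U|
    return {v: a * so.get(v, 0.0) + b * sp.get(v, 0.0) + c * gl for v in users}
```

Because each component sums to 1 over u's outgoing edges, the mix also sums to 1, provided u has at least one social and one spiritual neighbor whenever a or b is nonzero.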
Scoring documents for tags – digress into BM25
BM25 – state-of-the-art IR model.
$$\mathrm{score}(D, t_i) = \mathrm{idf}(t_i) \cdot \frac{(k_1 + 1)\,\mathrm{tf}(D, t_i)}{\mathrm{tf}(D, t_i) + k_1 \left(1 - b + b \cdot \frac{\mathrm{len}(D)}{\mathrm{avgdl}}\right)}$$
$$\mathrm{idf}(t_i) = \log \frac{\#\mathrm{docs} - n(t_i) + 0.5}{n(t_i) + 0.5}$$
$k_1$, $b$: tunable parameters. tf = term frequency, idf = inverse document frequency, avgdl = average document length, $n(t_i)$ = #docs containing $t_i$.
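The per-term BM25 score above, written out as a sketch. The default values k1 = 1.2 and b = 0.75 are common choices in practice, not values from the slides; the function names are illustrative.

```python
import math

def bm25_idf(n_docs, n_t):
    """idf(t) = log((#docs - n(t) + 0.5) / (n(t) + 0.5))."""
    return math.log((n_docs - n_t + 0.5) / (n_t + 0.5))

def bm25_term_score(tf, doc_len, avgdl, n_docs, n_t, k1=1.2, b=0.75):
    """Score of one document for one term.

    tf: frequency of the term in the document; doc_len: document length;
    avgdl: average document length; n_t: number of docs containing the term.
    """
    length_norm = 1.0 - b + b * doc_len / avgdl
    return bm25_idf(n_docs, n_t) * (k1 + 1.0) * tf / (tf + k1 * length_norm)
```

The score saturates as tf grows (it approaches idf · (k1 + 1)), and documents longer than average are penalized through the length normalization term. Note that idf turns negative for terms that occur in more than half of the documents.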
Adapt to social search
$$s_u(d, t) = \frac{(k_1 + 1) \cdot |U| \cdot sf_u(d, t)}{k_1 + |U| \cdot sf_u(d, t)} \cdot \mathrm{idf}(t)$$
$$\mathrm{idf}(t) = \log \frac{|D| - df(t) + 0.5}{df(t) + 0.5}$$
$$sf_u(d, t) = \sum_{v \in U} F_u(v) \cdot tf_v(d, t)$$
|U| = #users, |D| = #docs, df(t) = #docs tagged t.
BTW, when we say docs, think items!
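A sketch of the social score, assuming the friendship weights F_u(v) for the searcher u are given as a dict and tag usage as a dict keyed by (user, doc, tag). The data layout, function names, and the default k1 = 1.2 are illustrative assumptions.

```python
import math

def social_frequency(friend_weight, tag_freq, d, t):
    """sf_u(d, t) = sum over v of F_u(v) * tf_v(d, t).

    friend_weight: {v: F_u(v)} for a fixed searcher u
    tag_freq: {(v, d, t): number of times v tagged d with t}
    """
    return sum(w * tag_freq.get((v, d, t), 0) for v, w in friend_weight.items())

def social_idf(n_docs, df_t):
    """idf(t) = log((|D| - df(t) + 0.5) / (df(t) + 0.5))."""
    return math.log((n_docs - df_t + 0.5) / (df_t + 0.5))

def social_score(friend_weight, tag_freq, d, t, n_users, n_docs, df_t, k1=1.2):
    """s_u(d, t): BM25-style saturation applied to |U| * sf_u(d, t)."""
    x = n_users * social_frequency(friend_weight, tag_freq, d, t)
    return (k1 + 1.0) * x / (k1 + x) * social_idf(n_docs, df_t)
```

In the global mode (F_u(v) = 1/|U| for all v), |U| · sf_u(d, t) is just the total number of times d was tagged with t, so the score falls back to a user-independent one.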
Tag expansion
Sometimes (often?) users may use related tags: e.g., tag an automobile as “Ferrari” and as “car”.
$$tsim(t, t') = P[t \mid t'] = \frac{df(t \,\&\, t')}{df(t')}$$
(Note: there is an error in the paper here.)
Then
$$sf^*_u(d, t) = \max_{t' \in T} \; tsim(t, t') \cdot sf_u(d, t')$$
Plug in $sf^*_u(d, t)$ in place of $sf_u(d, t)$ and we are all set.
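A sketch of tag expansion, assuming `doc_tags` maps each document to the set of tags it carries and `sf` holds the social frequencies sf_u(d, t) of one fixed searcher u. Both layouts and the function names are illustrative assumptions.

```python
def tsim(doc_tags, t, t2):
    """P[t | t2] = df(t & t2) / df(t2), with df counted over documents."""
    df_t2 = sum(1 for tags in doc_tags.values() if t2 in tags)
    df_both = sum(1 for tags in doc_tags.values() if t in tags and t2 in tags)
    return df_both / df_t2 if df_t2 else 0.0

def expanded_sf(doc_tags, sf, d, t):
    """sf*_u(d, t) = max over t' in T of tsim(t, t') * sf_u(d, t').

    sf: {(d, t): sf_u(d, t)} for a fixed searcher u.
    """
    all_tags = set().union(*doc_tags.values()) | {t}
    return max(tsim(doc_tags, t, t2) * sf.get((d, t2), 0.0) for t2 in all_tags)
```

For example, if every document tagged “ferrari” is also tagged “car”, then tsim("car", "ferrari") = 1, so a document that the searcher's friends tagged only as “ferrari” still scores fully for the query tag “car”.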
Socially aware Tag Expansion
Who tagged the documents and what is the strength of their connection to u?
$$tsim_u(t, t') = \sum_{v \in U} F_u(v) \cdot \frac{df_v(t \,\&\, t')}{df_v(t')}$$
Score for a query:
$$s^*_u(d, t_1, \ldots, t_n) = \sum_{i=1}^{n} s^*_u(d, t_i)$$
Experiments – see paper: librarything.com, mixed results. Measured improvement in precision@top-10 and NDCG@top-10.
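A sketch of the socially aware variant, assuming `user_doc_tags` records, per user, the tags that user put on each document. Users who never used t' are skipped (their term is treated as 0), which is an assumption about how to handle the 0/0 case; names and layouts are illustrative.

```python
def social_tsim(friend_weight, user_doc_tags, t, t2):
    """tsim_u(t, t') = sum over v of F_u(v) * df_v(t & t') / df_v(t').

    friend_weight: {v: F_u(v)} for a fixed searcher u
    user_doc_tags: {v: {doc: set of tags v assigned to doc}}
    """
    total = 0.0
    for v, w in friend_weight.items():
        docs = user_doc_tags.get(v, {})
        df_t2 = sum(1 for tags in docs.values() if t2 in tags)
        if df_t2 == 0:
            continue  # v never used t'; contributes nothing (assumption)
        df_both = sum(1 for tags in docs.values() if t in tags and t2 in tags)
        total += w * df_both / df_t2
    return total

def query_score(term_score, d, query_tags):
    """s*_u(d, t_1..t_n) = sum over i of s*_u(d, t_i).

    term_score: callable (d, t) -> per-tag score for the fixed searcher u.
    """
    return sum(term_score(d, t) for t in query_tags)
```

Tag co-occurrence among close friends now counts more than co-occurrence among distant users, so the same pair of tags can be strongly related for one searcher and unrelated for another.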
Lessons and open challenges Socializing search across the board is a bad idea. Need to understand which kinds of queries benefit from which settings (a, b, c values). Examples below. 1. Queries with a global information need: perform best when a = b = 0 (i.e., c = 1); e.g., “Houdini”, “search engines”, “English grammar”; fairly precise queries; reasonably clear what the quality results are.
Lessons & Challenges (contd.) 2. Queries with a subjective taste (a social aspect): perform best when a≈1; e.g., “wizard”; produces a large number of results but user may like only particular types of novels such as “Lord of the Rings”; the tag “wizard” may be globally infrequent but frequent among user’s friends. 3. Queries with a spiritual information need: perform best when b ≈ 1; e.g., “Asia travel guide”; very general, need to make full use of users similar (in taste) to searcher. (Think recommendations.)
Lessons & Challenges (contd.) 4. Queries with a mixed information need: perform best when a ≈ b ≈ 0.5; e.g., “mystery magic”. Challenges: The above is an ad hoc classification. Need more thorough studies and deeper insights. Can the system “learn” the correct setting (a, b, c values) for a user or for a group? The usual scalability challenges: see following references. Project opportunity here.
Follow-up Reading (Efficiency)
S. Amer-Yahia, M. Benedikt, P. Bohannon. Challenges in Searching Online Communities. IEEE Data Eng. Bull. 30(2), 2007.
R. Schenkel, T. Crecelius, M. Kacimi, S. Michel, T. Neumann, J.X. Parreira, G. Weikum. Efficient Top-k Querying over Social-Tagging Networks. SIGIR 2008.
M.V. Vieira, B.M. Fonseca, R. Damazio, P.B. Golgher, D. de Castro Reis, B. Ribeiro-Neto. Efficient Search Ranking in Social Networks. CIKM 2007.
Follow-up Reading (Temporal Evolution, Events, Networks, ...)
N. Bansal, N. Koudas. Searching the Blogosphere. WebDB 2007.
M. Dubinko, R. Kumar, J. Magnani, J. Novak, P. Raghavan, A. Tomkins. Visualizing Tags over Time. ACM Transactions on the Web 1(2), 2007.
S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei. Optimizing Web Search Using Social Annotations. WWW 2007.
A. Das Sarma, A. Jain, C. Yu. Dynamic Relationship and Event Discovery. WSDM 2011.
S. Amer-Yahia, M. Benedikt, L. Lakshmanan, J. Stoyanovich. Efficient Network-aware Search in Collaborative Tagging Sites. VLDB 2008.
We will revisit social search later in your talks.