# Query Suggestion Using Hitting Time Qiaozhu Mei, Dengyong Zhou, Kenneth Church University of Illinois at Urbana-Champaign Microsoft Research, Redmond.


Motivating Examples
- Query: MSG (a sports center? a food additive?)
- 1. Difficult for a user to express an information need
- 2. Difficult for a search engine to infer the information need
- Query suggestions: express the information need accurately and make it easy to infer

Motivating Examples (Cont.)
- Query: welcome to the hotel california
- Suggestions: hotel california eagles; hotel california; hotel california band; hotel california by the eagles; hotel california song; lyrics of hotel california; listen hotel california; eagle

Motivating Examples: Personalization
- Query: MSR
- Possible meanings: Mountain Safety Research; Metropolis Street Racer; Molten salt reactor; Mars Sample Return; Magnetic Stripe Reader; …
- The user is actually looking for Microsoft Research…

Research Questions
- How can we generate query suggestions in a principled way?
- Can we generate personalized query suggestions using the same method?
- Can this method be generalized to other search-related tasks?

Rest of This Talk
- Random Walk, Hitting Time, and Bipartite Graph
- Generating Query Suggestion
- Personalized Query Suggestion
- Experiments
- Discussion and Summary

Random Walk and Hitting Time
- Figure: a random walk at vertex i moves to vertex j with P = 0.7 and to vertex k with P = 0.3; A is the set of target vertices
- Hitting time $T^A$: the first time that the random walk is at a vertex in A
- Mean hitting time $h_i^A$: the expectation of $T^A$ given that the walk starts from vertex i

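Computing Hitting Time
- $T^A$: the first time that the random walk is at a vertex in A
- $h_i^A$: the expectation of $T^A$ given that the walk starts from vertex i
- Iterative computation, for the example graph where i moves to j with probability 0.7 and to k with probability 0.3: $h_i^A = 0.7\,h_j^A + 0.3\,h_k^A + 1$
- Boundary condition: $h_i^A = 0$ for every vertex i that is already in A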
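The single-vertex equation on this slide is one instance of a general recurrence. The sketch below writes it out, assuming $p_{ij}$ denotes the probability of stepping from vertex i to vertex j and $t$ counts iterations (neither symbol appears on the slide):

```latex
h_i^A = 0 \quad \text{for } i \in A, \qquad
h_i^A = 1 + \sum_{j} p_{ij}\, h_j^A \quad \text{for } i \notin A,

h_i^A(t+1) = 1 + \sum_{j} p_{ij}\, h_j^A(t), \qquad h_i^A(0) = 0 .
```

Setting $p_{ij} = 0.7$ and $p_{ik} = 0.3$ in the second expression recovers the slide's example, $h_i^A = 0.7\,h_j^A + 0.3\,h_k^A + 1$.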

Bipartite Graph and Hitting Time
- Bipartite graph: edges between V1 and V2; no edge inside V1 or V2; edges are weighted; e.g., V1 = queries, V2 = URLs
- Figure: a weighted bipartite graph over V1 and V2 (example edge weight w(i, j) = 3) is converted to a directed graph with transition probabilities; one group can even be collapsed
- Expected proximity of query i to query A: the hitting time of i → A, $h_i^A$
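One way to read "convert to a directed graph, even collapse one group" is as a two-step walk: query to URL, then URL back to query. The sketch below assumes transition probabilities proportional to edge weight, which is the standard random walk on a weighted graph; the function name and the toy click counts are illustrative, not taken from the slides.

```python
from collections import defaultdict


def query_to_query_transitions(clicks):
    """Collapse a weighted query-URL bipartite graph into a walk over queries.

    clicks: dict mapping (query, url) -> edge weight w(query, url).
    Returns {query_i: {query_j: p_ij}} where
    p_ij = sum over urls k of [w(i, k) / d_i] * [w(k, j) / d_k].
    """
    query_degree = defaultdict(float)   # d_i: total weight on query i
    url_degree = defaultdict(float)     # d_k: total weight on url k
    queries_of_url = defaultdict(list)  # url -> [(query, weight), ...]
    for (query, url), w in clicks.items():
        query_degree[query] += w
        url_degree[url] += w
        queries_of_url[url].append((query, w))

    transitions = defaultdict(lambda: defaultdict(float))
    for (qi, url), w_ik in clicks.items():
        step_to_url = w_ik / query_degree[qi]
        for qj, w_kj in queries_of_url[url]:
            transitions[qi][qj] += step_to_url * (w_kj / url_degree[url])
    return {q: dict(row) for q, row in transitions.items()}


# Hypothetical click counts, for illustration only.
toy_clicks = {
    ("aa", "www.aa.com"): 300,
    ("american airline", "www.aa.com"): 15,
    ("aa", "www.theaa.com"): 100,
}
toy_transitions = query_to_query_transitions(toy_clicks)
```

Each row of the result sums to 1, so it can be used directly as the transition matrix of a random walk over queries only.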

Generate Query Suggestion
- Figure: query-URL click graph. Queries: aa (marked T), american airline, mexicana. URLs: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, en.wikipedia.org/wiki/Mexicana. Example edge weights: 300, 15
- Construct a (kNN) subgraph from the query log data (with a predefined number of queries/URLs)
- Compute transition probabilities p(i → j)
- Compute hitting time $h_i^A$
- Rank candidate queries using $h_i^A$
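The last two steps of this slide can be sketched as follows, assuming the query-level transition probabilities are already available. The function name, the iteration count, and the toy probabilities are all hypothetical; the sketch truncates the hitting-time recurrence after a fixed number of iterations and ranks candidates by ascending value, since a smaller hitting time means the candidate is closer to the target.

```python
def rank_suggestions(transitions, target, iterations=10, top_k=5):
    """Rank queries by truncated hitting time to `target` (smaller is closer).

    transitions: {query_i: {query_j: p_ij}} for a random walk over queries.
    """
    h = {q: 0.0 for q in transitions}  # start from h_i = 0 for every query
    for _ in range(iterations):
        updated = {}
        for qi, row in transitions.items():
            if qi == target:
                updated[qi] = 0.0  # the walk has already arrived
            else:
                updated[qi] = 1.0 + sum(
                    p * h.get(qj, 0.0) for qj, p in row.items()
                )
        h = updated
    candidates = [(q, t) for q, t in h.items() if q != target]
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:top_k]


# Hypothetical transition probabilities, for illustration only.
toy = {
    "aa": {"aa": 0.6, "american airline": 0.3, "theaa": 0.1},
    "american airline": {"aa": 0.8, "american airline": 0.2},
    "theaa": {"aa": 0.5, "theaa": 0.5},
    "mexicana": {"american airline": 0.5, "mexicana": 0.5},
}
ranked = rank_suggestions(toy, target="aa")
```

In this toy graph "mexicana" reaches "aa" only through "american airline", so it ends up with the largest hitting time and the lowest rank.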

Intuition: why does it work?
- A URL is close to a query if freq(q, url) dominates the number of clicks on this URL (most people use q to access the URL)
- A query is close to the target query if it is close to many URLs that are close to the target query

Personalized Query Suggestion
- Queries are ambiguous
- Different users → different information needs → different query suggestions
- Simple approach: build the graph and compute hitting time based solely on the user's history
- Data sparseness: e.g., you cannot see a query as a suggestion if you never used it
- Alternative: modify the bipartite graph instead of rebuilding it all

Personalize the Bipartite Graph
- Figure: query-URL graph. Queries: aa (marked T), american airline, alcoholics anonymous, plus a pseudo query "aa + user" (marked P). URLs: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, www.alcoholics-anonymous.org, en.wikipedia.org/wiki/Alcoholics_Anonymous
- Introduce a pseudo (personalized) query
- Reweight edges using personalized probabilities
- Key: how to compute these probabilities? From w(url, user, query), but the data are sparse, so compute a smoothed p(Url | User, Query)
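The slide does not say how the pseudo query's edges are weighted, so the following is only one possible reading. It assumes the pseudo node gets an edge to each URL with weight equal to the smoothed p(Url | User, Query) scaled by the original query's total weight; that scaling, the function name, and the toy numbers are assumptions made for illustration.

```python
def add_pseudo_query(clicks, query, user, p_url_given_user_query):
    """Return a copy of `clicks` extended with a pseudo query for (query, user).

    clicks: {(query, url): weight} for the global query-URL graph.
    p_url_given_user_query: {url: smoothed p(url | user, query)}.
    Assumption: each new edge weight is the personalized probability times the
    original query's total weight, so new and existing edges are comparable.
    """
    pseudo = f"{query} + {user}"
    total = sum(w for (q, _), w in clicks.items() if q == query)
    personalized = dict(clicks)  # leave the global graph untouched
    for url, prob in p_url_given_user_query.items():
        if prob > 0:
            personalized[(pseudo, url)] = prob * total
    return personalized, pseudo


# Hypothetical global counts and personalized probabilities.
global_clicks = {
    ("aa", "www.aa.com"): 300,
    ("aa", "www.alcoholics-anonymous.org"): 100,
}
user_probs = {"www.aa.com": 0.1, "www.alcoholics-anonymous.org": 0.9}
graph, pseudo_node = add_pseudo_query(global_clicks, "aa", "user_1", user_probs)
```

Suggestions for this user would then be ranked by hitting time to the pseudo node rather than to the plain query, which is why only a few edges change and the rest of the graph is reused.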

Personalization with Backoff (Mei and Church 08)
- Backoff classes by IP prefix: 156.111.188.243 → 156.111.188.* → 156.111.*.* → 156.*.*.* → *.*.*.*
- Full personalization: sparse data!
- No personalization: lose the opportunity
- Personalization with backoff: we don't have enough data for everyone, so back off to classes of users (e.g., by IP)
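A minimal sketch of backoff over IP prefixes, assuming linear interpolation across the five classes shown on the slide. The interpolation weights, the renormalization over classes that have data, the function names, and the toy counts are all assumptions; the slide only states that sparse users back off to coarser classes.

```python
def backoff_classes(ip):
    """List an IPv4 address followed by its progressively coarser classes."""
    parts = ip.split(".")
    return [".".join(parts[:4 - k] + ["*"] * k) for k in range(5)]


def smoothed_click_prob(url, query, ip, counts, lambdas):
    """Interpolate p(url | class, query) over the backoff classes of `ip`.

    counts: {(ip_class, query): {url: click count}}
    lambdas: five interpolation weights, finest class first (hypothetical).
    Classes with no observations are skipped and the weights of the remaining
    classes are renormalized (an assumption of this sketch).
    """
    prob, used = 0.0, 0.0
    for ip_class, lam in zip(backoff_classes(ip), lambdas):
        url_counts = counts.get((ip_class, query), {})
        total = sum(url_counts.values())
        if total > 0:
            prob += lam * url_counts.get(url, 0) / total
            used += lam
    return prob / used if used > 0 else 0.0


# Hypothetical counts: one user's history plus the global population.
toy_counts = {
    ("156.111.188.243", "msr"): {"research.microsoft.com": 2},
    ("*.*.*.*", "msr"): {"research.microsoft.com": 10, "example.com/msr-gear": 90},
}
toy_lambdas = [0.5, 0.1, 0.1, 0.1, 0.2]
p = smoothed_click_prob(
    "research.microsoft.com", "msr", "156.111.188.243", toy_counts, toy_lambdas
)
```

Here the user's own two clicks pull the estimate well above the global rate of 10%, while the global class still contributes when the user's history is thin.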

Experiments
- Query suggestion using query logs
  - Commercial search engine log (1.5 years)
  - 637 million queries; 585 million URLs
  - Query-click bipartite graph
- Author/keyword suggestion using DBLP
  - Titles and authors from DBLP
  - 110k papers, 580k authors
  - Coauthor graph, keyword graph, author-keyword bipartite graph
- Baselines: nearest neighbor; personalized PageRank

Result: Query Suggestion (Query = friends)
- Hitting time: wikipedia friends; friends tv show wikipedia; friends home page; friends warner bros; the friends series; friends official site; friends(1994)
- Google: friendship; friends poem; friendster; friends episode guide; friends scripts; how to make friends; true friends
- Yahoo: secret friends; friends reunited; hide friends; hi 5 friends; find friends; poems for friends; friends quotes

Result: Query Suggestion (II)
- Query = aa
  - Yahoo: aa route planner; aa route finder; aa airlines; aa meetings; aa autoroute; aa road map
  - Live: aa route finder; aa route planner; aa airlines; american airlines; aa meeting; aa road map
  - Hitting time: alcoholics anonymous; automobile association; theaa; american airlines; american air; american airline ticket reservation
- Query = ranknet
  - Hitting time: learning to rank; ndcg measure; ir ndcg; lambdarank; Chris burges; pairwise test

Results: Personalized Query Suggestion (Query = msr)
- No personalization: mountian safety research; msrcorp; msr outdoor equipment; msr camp stoves; msr snowshoes; msr racing
- Personalized: Microsoft research; research; what is research; research website; microsoft research and development; yahoo research labs

Result: Author Suggestion (Query = Jon Kleinberg)
- Hitting time: Aleksandrs Slivkins; Mark Sandler; Tom Wexler; Lars Backstrom; Elliot Anshelevich; Xiangyang Lan
  - Favors students, especially current students
- Nearest neighbor (personalized PageRank is similar): Prabhakar Raghavan; Eva Tardos; Daniel P. Huttenlocher; David Kempe; Amit Kumar; Andrew Tomkins
  - Famous researchers + former students

Result: Keyword Suggestion
- Query = olap: Dimension updates; OLAP data; OLAP cubes; OLAP queries; View size; Hierarchical cluster
- Query = social network: Knowledge collaboration; Community structure; Resource organization; Information kiosks; Efficient searching; Network extraction
- Query = pagerank: Pagerank computation; Ranking systems; Pagerank approximation; Incremental computations; Web spam; Iterative computation

Result: Keyword Suggestion for Author
- Query = Jiawei Han
  - Baselines: mining; data; frequent; Efficient; pattern; data mining
  - Hitting time: large databases; frequent pattern; sequential pattern; pattern mining; frequent; multi dimensional
- Query = Michael I. Jordan
  - Baselines: learning; statistical; kernel; markov; inference; model
  - Hitting time: Dirichlet process; approximate inference; dirichlet; mean field; supervised learning; graphic models

Discussions
- Hitting time effectively boosts infrequent queries, whereas nearest neighbor and personalized PageRank favor frequent queries
- Fast convergence: a few iterations on a subgraph recover most of the value
- No parameter to tune
- Can be generalized to many other tasks (on different graphs)

Ranking on Query Log Graph and Search Tasks
- Query → Query: query suggestion
- Url → Url: finding related pages (e.g., www.cs.jhu.edu/~brill → research.microsoft.com/users/brill)
- IP → IP: finding similar users
- Url → Query: annotation, summarization, ads term
- Query → Url: search
- IP, Query → Url: personalized search
- IP, Query → Query: personalized query suggestion
- Many other opportunities!

Summary
- Generate query suggestions using hitting time on the query-click graph
- Personalized query suggestion
- Generalizable to other search tasks
- Future work
  - Different types of graphs, e.g., query sessions
  - Combine with other features
  - Large-scale evaluation

Thanks!
