Query Suggestion Using Hitting Time
Qiaozhu Mei, Dengyong Zhou, Kenneth Church
University of Illinois at Urbana-Champaign; Microsoft Research, Redmond

Motivating Examples
Query: MSG (sports center? food additive?)
1. It is difficult for a user to express an information need.
2. It is difficult for a search engine to infer an information need.
Query suggestions: accurate in expressing the information need; easy to infer the information need from.

Motivating Examples (Cont.)
Query: welcome to the hotel california
Suggestions:
- hotel california eagles
- hotel california
- hotel california band
- hotel california by the eagles
- hotel california song
- lyrics of hotel california
- listen hotel california
- eagle

Motivating Examples: Personalization
Query: MSR
- Mountain safety research
- Metropolis Street Racer
- Molten salt reactor
- Mars Sample Return
- Magnetic Stripe Reader
- …
Actually looking for Microsoft Research…

Research Questions
- How can we generate query suggestions in a principled way?
- Can we generate personalized query suggestions using the same method?
- Can this method be generalized to other search-related tasks?

Rest of This Talk
- Random Walk, Hitting Time, and Bipartite Graph
- Generating Query Suggestion
- Personalized Query Suggestion
- Experiments
- Discussion and Summary

Random Walk and Hitting Time
[Figure: a random walk over vertices i, j, k and a target set A, with edges labeled P = 0.7 and P = 0.3]
Hitting time
– $T^A$: the first time that the random walk is at a vertex in A
Mean hitting time
– $h_i^A$: expectation of $T^A$ given that the walk starts from vertex i

Computing Hitting Time
[Figure: the same random walk over vertices i, j, k and target set A, with h = 0 marked at A]
- $T^A$: the first time that the random walk is at a vertex in A
- $h_i^A$: expectation of $T^A$ given that the walk starts from vertex i
- Iterative computation: $h_i^A = 0.7\,h_j^A + 0.3\,h_k^A + 1$
- Apparently, $h_i^A = 0$ for those i in A
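
The iterative computation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the standard recurrence $h_i^A = 1 + \sum_j p(i \to j)\,h_j^A$ for i outside A and $h_i^A = 0$ inside A. Only the 0.7 and 0.3 probabilities out of vertex i come from the slide; the other transition probabilities in the toy graph are hypothetical.

```python
def hitting_time(P, A, iterations=100):
    """Iteratively estimate the mean hitting time to the vertex set A.

    P maps each vertex to a dict {neighbor: transition probability}.
    A is the set of target vertices. Every neighbor must also be a key of P.
    """
    h = {v: 0.0 for v in P}          # start from h = 0 everywhere
    for _ in range(iterations):
        new_h = {}
        for i in P:
            if i in A:
                new_h[i] = 0.0       # a walk that starts in A has already hit A
            else:
                # one step, plus the expected remaining time from the next vertex
                new_h[i] = 1.0 + sum(p * h[j] for j, p in P[i].items())
        h = new_h
    return h


# Toy graph: only the 0.7 / 0.3 split out of "i" is taken from the slide.
P = {
    "i": {"j": 0.7, "k": 0.3},
    "j": {"A": 1.0},                 # hypothetical
    "k": {"A": 0.5, "i": 0.5},       # hypothetical
    "A": {"A": 1.0},                 # target vertex
}
h = hitting_time(P, {"A"})
print(h)
```

Because every update only needs the previous values of the neighbors, the iteration can be stopped after a fixed number of rounds, which yields a truncated hitting time.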

Bipartite Graph and Hitting Time
Expected proximity of query i to the query A: hitting time of i → A, $h_i^A$
Bipartite graph:
- Edges between $V_1$ and $V_2$
- No edge inside $V_1$ or $V_2$
- Edges are weighted
- e.g., $V_1$ = query; $V_2$ = Url
[Figure: a bipartite graph between $V_1$ and $V_2$ with vertices A, i, j, k and weighted edges w(i, j) (example weights 7 and 1), repeated across three panels]
The bipartite graph can be converted to a directed graph, or even collapsed onto one group.
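
One common way to turn the weighted bipartite graph into a random walk is to normalize each vertex's edge weights by its weighted degree, and to collapse the URL side by taking two steps at a time (query → URL → query). The formulas below are a sketch under that assumption; the symbol $d$ for weighted degree does not appear on the slide.

```latex
\begin{align*}
d_i &= \sum_{k \in V_2} w(i, k), \qquad d_k = \sum_{j \in V_1} w(k, j) \\
p(i \to k) &= \frac{w(i, k)}{d_i} \qquad (i \in V_1,\ k \in V_2) \\
p(i \to j) &= \sum_{k \in V_2} \frac{w(i, k)}{d_i} \cdot \frac{w(k, j)}{d_k} \qquad (i, j \in V_1)
\end{align*}
```

The last line gives a directed graph over queries only: the URLs no longer appear as vertices, but they still determine how probability mass flows between queries.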

Generate Query Suggestion
[Figure: a Query-Url bipartite graph with a node labeled T; example queries: aa, american airline, mexiana; example URLs: planner_main.jsp, en.wikipedia.org/wiki/Mexicana]
1. Construct a (kNN) subgraph from the query log data (of a predefined number of queries/URLs)
2. Compute transition probabilities p(i → j)
3. Compute hitting time $h_i^A$
4. Rank candidate queries using $h_i^A$
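
The steps above can be sketched end to end in Python. This is an illustrative sketch, not the authors' code: it skips the kNN subgraph construction and assumes the click log is already small, it assumes two-step query → URL → query transitions, and the function name `suggest`, the click counts, and the URL labels `u1`, `u2`, `u3` are all hypothetical.

```python
from collections import defaultdict


def suggest(clicks, target, iterations=50, top_n=3):
    """Rank candidate queries by (truncated) hitting time to the target query.

    clicks maps (query, url) pairs to click counts (the edge weights).
    """
    q_deg = defaultdict(float)       # weighted degree of each query
    u_deg = defaultdict(float)       # weighted degree of each url
    by_query = defaultdict(dict)
    by_url = defaultdict(dict)
    for (q, u), w in clicks.items():
        q_deg[q] += w
        u_deg[u] += w
        by_query[q][u] = w
        by_url[u][q] = w

    # Transition probabilities between queries, collapsing the url side.
    P = {}
    for q in by_query:
        row = defaultdict(float)
        for u, w_qu in by_query[q].items():
            for q2, w_uq in by_url[u].items():
                row[q2] += (w_qu / q_deg[q]) * (w_uq / u_deg[u])
        P[q] = dict(row)

    # Iterate the hitting-time recurrence a fixed number of times.
    h = {q: 0.0 for q in P}
    for _ in range(iterations):
        h = {
            q: 0.0 if q == target
            else 1.0 + sum(p * h[q2] for q2, p in P[q].items())
            for q in P
        }

    # Smaller hitting time means closer to the target query.
    candidates = [q for q in P if q != target]
    return sorted(candidates, key=lambda q: h[q])[:top_n]


# Hypothetical click counts, for illustration only.
clicks = {
    ("aa", "u1"): 8,
    ("american airlines", "u1"): 10,
    ("aa", "u2"): 2,
    ("alcoholics anonymous", "u2"): 5,
    ("weather", "u3"): 9,
}
print(suggest(clicks, "aa"))
```

A query that never reaches the target (here `weather`) keeps accumulating one step per iteration, so it ends up at the bottom of the ranking.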

Intuition
Why does it work?
– A URL is close to a query if freq(q, url) dominates the number of clicks on this URL (most people use q to access the URL)
– A query is close to the target query if it is close to many URLs that are close to the target query

Personalized Query Suggestion
- Queries are ambiguous
- Different users → different information needs → different query suggestions
- Simple approach: build the graph and compute hitting time based solely on the user's history
- Data sparseness
– E.g., you cannot see a query if you have never used it
- Alternative: modify the bipartite graph instead of rebuilding it all

Personalize the Bipartite Graph
[Figure: a Query-Url bipartite graph with a node labeled T; queries: aa, american airline, alcoholics anonymous; example URL: en.wikipedia.org/wiki/Alcoholics_Anonymous; pseudo query P = aa + user]
- Introduce a pseudo (personalized) query
- Reweight edges using personalized probabilities
- Key: how to compute the personalized probabilities
– From w(url, user, query): sparse data!
– Compute a smoothed p(Url | User, Query)

Personalization with Backoff (Mei and Church 08)
[Figure: a hierarchy of IP-address classes, from a full address through progressively wildcarded prefixes to 156.*.*.* and *.*.*.*]
- Full personalization: sparse data!
- No personalization: lose the opportunity
- Personalization with backoff: we don't have enough data for everyone!
- Back off to classes of users (e.g., IP)
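
One simple way to read "backoff to classes of users" is linear interpolation over IP-prefix classes: estimate p(Url | class, Query) for each class and mix the estimates with fixed weights. The sketch below assumes that reading. The helper names, the equal weights, the IP address, the URL labels, and the click counts are all hypothetical, and the per-class counts are assumed to have been aggregated beforehand.

```python
def ip_classes(ip):
    """List backoff classes for an IPv4 address, most specific first.

    For a.b.c.d this gives: a.b.c.d, a.b.c.*, a.b.*.*, a.*.*.*, *.*.*.*
    """
    parts = ip.split(".")
    return [".".join(parts[:k] + ["*"] * (4 - k)) for k in range(4, -1, -1)]


def smoothed_prob(url, ip, query, counts, lambdas):
    """Interpolated estimate of p(url | user, query), with the user given by ip.

    counts maps (ip_class, query) to a dict {url: clicks}.
    lambdas holds one weight per backoff class (assumed to sum to 1).
    Classes with no data contribute nothing and are not renormalized.
    """
    total = 0.0
    for lam, cls in zip(lambdas, ip_classes(ip)):
        clicks = counts.get((cls, query), {})
        n = sum(clicks.values())
        if n > 0:
            total += lam * clicks.get(url, 0) / n
    return total


# Hypothetical, pre-aggregated click counts for the query "msr".
counts = {
    ("10.1.2.3", "msr"): {"u_research": 1},
    ("10.1.*.*", "msr"): {"u_research": 3, "u_outdoor": 1},
    ("*.*.*.*", "msr"): {"u_research": 1, "u_outdoor": 3},
}
lambdas = [0.2, 0.2, 0.2, 0.2, 0.2]  # hypothetical equal weights
p_research = smoothed_prob("u_research", "10.1.2.3", "msr", counts, lambdas)
p_outdoor = smoothed_prob("u_outdoor", "10.1.2.3", "msr", counts, lambdas)
print(p_research, p_outdoor)
```

In this toy data the global class prefers `u_outdoor`, but the user's own history and network neighborhood pull the smoothed estimate toward `u_research`. The smoothed estimate stays nonzero for URLs the individual user never clicked.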

Experiments
Query suggestion using query logs
– Commercial search engine log (1.5 years)
– 637 million queries; 585 million URLs
– Query-click bipartite graph
Author/keyword suggestion using DBLP
– Titles and authors from DBLP
– 110k papers, 580k authors
– Coauthor graph, keyword graph, author-keyword bipartite graph
Baselines: nearest neighbor; personalized PageRank

Result: Query Suggestion
Query = friends
Hitting time: wikipedia friends; friends tv show wikipedia; friends home page; friends warner bros; the friends series; friends official site; friends(1994)
Google: friendship; friends poem; friendster; friends episode guide; friends scripts; how to make friends; true friends
Yahoo: secret friends; friends reunited; hide friends; hi 5 friends; find friends; poems for friends; friends quotes

Result: Query Suggestion (II)
Query = aa
Yahoo: aa route planner; aa route finder; aa airlines; aa meetings; aa autoroute; aa road map
Live: aa route finder; aa route planner; aa airlines; american airlines; aa meeting; aa road map
Hitting time: alcoholics anonymous; automobile association; theaa; american airlines; american air; american airline ticket reservation
Query = ranknet
Hitting time: learning to rank; ndcg measure; ir ndcg; lambdarank; Chris burges; pairwise test

Results: Personalized Query Suggestion
Query = msr
No personalization: mountian safety research; msrcorp; msr outdoor equipment; msr camp stoves; msr snowshoes; msr racing
Personalized: Microsoft research; research; what is research; research website; microsoft research and development; yahoo research labs

Result: Author Suggestion
Query = Jon Kleinberg
Hitting time: Aleksandrs Slivkins; Mark Sandler; Tom Wexler; Lars Backstrom; Elliot Anshelevich; Xiangyang Lan
– Favors students, especially current students
Nearest neighbor (personalized PageRank is similar): Prabhakar Raghavan; Eva Tardos; Daniel P. Huttenlocher; David Kempe; Amit Kumar; Andrew Tomkins
– Famous researchers + former students

Result: Keyword Suggestion
Query = olap: Dimension updates; OLAP data; OLAP cubes; OLAP queries; View size; Hierarchical cluster
Query = social network: Knowledge collaboration; Community structure; Resource organization; Information kiosks; Efficient searching; Network extraction
Query = pagerank: Pagerank computation; Ranking systems; Pagerank approximation; Incremental computations; Web spam; Iterative computation

Result: Keyword Suggestion for Author
Query = Jiawei Han
Baselines: mining; data; frequent; Efficient; pattern; data mining
Hitting time: large databases; frequent pattern; sequential pattern; pattern mining; frequent; multi dimensional
Query = Michael I. Jordan
Baselines: learning; statistical; kernel; markov; inference; model
Hitting time: Dirichlet process; approximate inference; dirichlet; mean field; supervised learning; graphic models

Discussions
- Hitting time effectively boosts infrequent queries
– Nearest neighbor and personalized PageRank favor frequent queries
- Fast convergence: a few iterations on a subgraph get most of the value
- No parameters to tune
- Can be generalized to many other tasks (on different graphs)

Ranking on Query Log Graph and Search Tasks
[Figure: query log graph linking IP, Query, and Url nodes; example URL node: research.microsoft.com/users/brill]
- Query → Query: query suggestion
- Url → Url: finding related pages
- IP → IP: finding similar users
- Url → Query: annotation, summarization, ads term
- Query → Url: search
- IP, Query → Url: personalized search
- IP, Query → Query: personalized query suggestion
Many other opportunities!

Summary
- Generate query suggestions using hitting time on the query-click graph
- Personalized query suggestion
- Generalizable to other search tasks
- Future work:
– Different types of graphs: e.g., query sessions
– Combine with other features
– Large-scale evaluation

Thanks!