Context-aware Query Suggestion by Mining Click-through and Session Data
Authors: H. Cao et al., KDD 08
Presented by Shize Su

Outline
- Introduction
- Framework of the Proposed Method
- Mining Query Concepts
- Concept Sequence Suffix Tree
- Experimental Evaluation
- Summary

Introduction
What is query suggestion in a search engine?
- Guess the user's search intent from the user query, then suggest queries that may better describe the user's information need
Why is query suggestion important?
- Is it easy for users to issue an appropriate query? No!
- It is a "bottleneck issue" of search engine usability (Google, Yahoo, Bing, Baidu, etc.)

Introduction
Major existing approaches (using search log data):
- Approach I: cluster queries using clicked-URL data to find similar queries
- Approach II: mine pairs of queries that are adjacent or co-occur in the same query session
[Fig. 1: An example of search log data]

Introduction
Key limitations:
- None of them is context-aware: they do not consider the immediately preceding queries as context.
- The clustering algorithms do not scale well to very large data, e.g., 1.8 billion queries (151 million unique) and 2.6 billion clicked URLs (114 million unique).
An example: the query "apple" on its own vs. "steve jobs" followed by "apple". What is the user's search intent?

Proposed Method Framework
Key steps:
- Capture the context: the concept sequence
- Quickly find the queries that many users ask in that context
[Framework figure: clustering queries; concept sequence suffix tree]

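Mining Query Concepts
An example of click-through bipartite data from a search log: [figure]
Each query $q_i$ is represented as an L2-normalized vector $\vec{q_i}$ whose $j$-th dimension corresponds to URL $u_j$:
$\vec{q_i}[j] = \frac{w_{ij}}{\sqrt{\sum_{k} w_{ik}^2}}$ if edge $e_{ij}$ exists, and $0$ otherwise, where $w_{ij}$ is the weight (click count) of edge $e_{ij}$.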
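
A minimal Python sketch of this representation, assuming click counts serve as the edge weights $w_{ij}$, vectors are stored sparsely as dicts keyed by URL, and Euclidean distance is used to compare queries. `query_vectors` and `distance` are hypothetical helper names, not taken from the paper.

```python
import math
from collections import defaultdict


def query_vectors(edges):
    """Build an L2-normalized sparse vector for every query.

    edges: iterable of (query, url, clicks) triples from the click-through
    bipartite; the click count is used as the edge weight w_ij (assumption).
    Returns {query: {url: weight}}; absent URLs have weight 0.
    """
    raw = defaultdict(dict)
    for query, url, clicks in edges:
        raw[query][url] = raw[query].get(url, 0) + clicks
    vectors = {}
    for query, weights in raw.items():
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:  # skip queries whose edges all have zero weight
            vectors[query] = {url: w / norm for url, w in weights.items()}
    return vectors


def distance(u, v):
    """Euclidean distance between two sparse vectors."""
    dims = set(u) | set(v)
    return math.sqrt(sum((u.get(d, 0.0) - v.get(d, 0.0)) ** 2 for d in dims))
```

For instance, a query clicked 3 times on one URL and 4 times on another gets the weights 0.6 and 0.8, so every vector has unit length regardless of how popular the query is.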

Mining Query Concepts
Key challenges in clustering queries:
- The search log click-through bipartite can be huge, e.g., 151 million unique queries
- The number of clusters is unknown
- The query vectors have extremely high dimensionality: 114 million unique URLs
- Search logs grow dynamically
Existing query clustering algorithms:
- Hierarchical agglomerative method
- DBSCAN method (Wen, WWW'01)
- K-means, etc.

Mining Query Concepts
Proposed clustering method: [algorithm figure]

Mining Query Concepts
For each query $q$:
- Step 1: find the cluster $C$ closest to $q$ among the clusters obtained so far
- Step 2: compute the diameter of $C \cup \{q\}$
- Step 3: if the diameter is at most a threshold $D_{max}$, assign $q$ to $C$; otherwise, create a new cluster containing only $q$
The method is quite efficient:
- It needs only one scan of the queries
- It can run efficiently on a PC with 2 GB of main memory

Mining Query Concepts
Tricks for improving algorithm efficiency:
- A dimension array data structure used in Step 1 (exploits the sparsity of the data)
- Prune edges with low weights
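
The two tricks might look like this in Python, as a sketch under assumptions: `DimensionArray` keeps, for each URL dimension, the ids of the clusters whose centroid is non-zero on it, and `prune_edges` drops edges whose click count falls below a hypothetical absolute threshold `min_clicks` (the slide does not say how the pruning threshold is chosen). Since all weights are non-negative, a unit-length centroid that shares no URL with $q$ sits at the largest possible distance, $\sqrt{2}$, from $q$, so when the candidate set is non-empty the closest cluster in Step 1 is among the candidates.

```python
from collections import defaultdict


def prune_edges(edges, min_clicks):
    """Keep (query, url, clicks) edges whose weight reaches min_clicks (hypothetical threshold)."""
    return [(q, u, c) for q, u, c in edges if c >= min_clicks]


class DimensionArray:
    """For each URL dimension, the ids of clusters whose centroid is non-zero there."""

    def __init__(self):
        self.lists = defaultdict(set)

    def add(self, cluster_id, vector):
        """Record the non-zero dimensions of a vector that joined cluster_id."""
        for dim in vector:
            self.lists[dim].add(cluster_id)

    def candidates(self, query_vector):
        """Ids of clusters sharing at least one non-zero dimension with the query."""
        found = set()
        for dim in query_vector:
            found |= self.lists.get(dim, set())
        return found
```

Because a query typically clicks only a handful of URLs, the candidate set is usually far smaller than the full list of clusters.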

Concept Sequence Suffix Tree
Extract query session data:
- Take each individual user's behavior (query/click) data
- Segment it into sessions (a new session starts when the time interval exceeds 30 minutes)
- Discard the click event data
[Fig: An example of search log data]
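
Session segmentation can be sketched as below, assuming each user's events are (timestamp, kind, value) tuples and that the 30-minute gap is measured between consecutive events of any kind (queries or clicks); `split_sessions` is a hypothetical helper. Clicks only contribute to measuring the gaps and are then discarded, leaving sequences of queries.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def split_sessions(events, gap=SESSION_GAP):
    """events: (timestamp, kind, value) tuples for ONE user, kind 'query' or 'click'.

    Returns a list of query sessions, each a list of query strings.
    """
    sessions, current, last_time = [], [], None
    for ts, kind, value in sorted(events, key=lambda e: e[0]):
        # A silence longer than `gap` closes the current session.
        if last_time is not None and ts - last_time > gap:
            if current:
                sessions.append(current)
            current = []
        last_time = ts
        if kind == "query":
            current.append(value)  # click events are not kept in the session
    if current:
        sessions.append(current)
    return sessions
```

For example, queries at 9:00 and 9:05 followed by one at 9:50 yield two sessions.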

Concept Sequence Suffix Tree
Concept sequence suffix tree: a structure used to efficiently find the queries that many users ask in a given context (concept sequence).
[Fig: An example]

Concept Sequence Suffix Tree
Algorithm to build the concept sequence suffix tree:
1) Map each training session $qs = q_1 \ldots q_l$ to a concept sequence $cs = c_1 \ldots c_l$
2) Enumerate the subsequences of each $cs$ (distributed, with MapReduce)
3) Get all frequent concept subsequences
4) Organize these into a concept sequence suffix tree
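
Steps 1 to 3 can be sketched in Python with the MapReduce job simulated locally: `map_session` plays the role of the map step and the `Counter` aggregation the reduce step. Assumptions not stated on the slide: subsequences are contiguous, a query with no concept splits the session, and `min_support` is a hypothetical absolute frequency threshold.

```python
from collections import Counter
from itertools import chain


def map_session(session, query_to_concept):
    """'Map' step: emit every contiguous concept subsequence of one session.

    A query with no concept splits the session (assumption), so no
    subsequence spans it.
    """
    runs, run = [], []
    for query in session:
        concept = query_to_concept.get(query)
        if concept is None:
            if run:
                runs.append(run)
            run = []
        else:
            run.append(concept)
    if run:
        runs.append(run)
    for run in runs:
        for i in range(len(run)):
            for j in range(i + 1, len(run) + 1):
                yield tuple(run[i:j])


def frequent_concept_sequences(sessions, query_to_concept, min_support):
    """'Reduce' step, run locally: count subsequences and keep the frequent ones."""
    counts = Counter(chain.from_iterable(
        map_session(s, query_to_concept) for s in sessions))
    return {seq: n for seq, n in counts.items() if n >= min_support}
```

In a real deployment the map output would be keyed by subsequence so that the counting is spread across reducers.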

Concept Sequence Suffix Tree
Algorithm for organizing frequent concept sequences into a concept sequence suffix tree: [algorithm figure]

Concept Sequence Suffix Tree
Organizing into the concept sequence suffix tree:
1) Start from the root node (the empty sequence) and scan through all frequent concept subsequences $cs = c_1 \ldots c_l$
2) For each $cs$, first find the node $cn$ corresponding to $c_1 \ldots c_{l-1}$; if $cn$ does not exist, create it
3) Update the list of candidate concepts of $cn$ if $c_l$ is among the top $K$ candidates so far ($K$ is a specified threshold, e.g., $K = 5$)
4) The representative queries of the top $K$ candidate concepts are the candidate suggestions for the sequence $c_1 \ldots c_{l-1}$
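
A sketch of the organizing step, assuming the frequent subsequences arrive as a dict from concept tuples to frequencies (a hypothetical input format) and that candidates are ranked by the frequency of $cs$. A node is reached from the root by reading its sequence backwards, so the parent of the node for $c_1 \ldots c_l$ is the node for its suffix $c_2 \ldots c_l$. `Node`, `find_node`, `build_suffix_tree`, and `ranked_candidates` are illustrative names.

```python
class Node:
    def __init__(self):
        self.children = {}    # prepended concept -> child Node
        self.candidates = {}  # next concept -> frequency (at most K entries)


def find_node(root, seq):
    """Return the node for concept sequence seq, creating missing nodes.

    The path from the root reads seq backwards (most recent concept first).
    """
    node = root
    for concept in reversed(seq):
        node = node.children.setdefault(concept, Node())
    return node


def build_suffix_tree(frequent, k=5):
    """frequent: {concept sequence tuple: frequency}; keeps top-k candidates per node."""
    root = Node()
    for cs, freq in frequent.items():
        prefix, last = cs[:-1], cs[-1]
        cand = find_node(root, prefix).candidates
        if len(cand) < k:
            cand[last] = freq
        else:
            weakest = min(cand, key=cand.get)
            if freq > cand[weakest]:  # c_l beats the current k-th candidate
                del cand[weakest]
                cand[last] = freq
    return root


def ranked_candidates(node):
    """Candidate concepts of a node, most frequent first."""
    return sorted(node.candidates, key=lambda c: -node.candidates[c])
```

Length-1 sequences land at the root, whose candidates are simply the most frequent concepts overall.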

Concept Sequence Suffix Tree
Review an example of a concept sequence suffix tree: [figure]

Concept Sequence Suffix Tree
Online query suggestion algorithm: [algorithm figure]

Concept Sequence Suffix Tree
For a query sequence $qs = q_1 \ldots q_l$:
- Map it to a concept sequence: if some $q_i$ is a new query (it belongs to no concept), stop mapping and return the concept sequence corresponding to $q_{i+1} \ldots q_l$
- Search the tree to find the longest matched subsequence of the form $c_j \ldots c_l$
- Use the candidate suggestions for $c_j \ldots c_l$ as the query suggestions for $qs$
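
The online lookup can be sketched as below on a made-up toy tree. Assumptions: each node already stores the ranked representative queries of its candidate concepts, child keys are chosen so that the path from the root reads the concept sequence backwards (most recent concept first), and nothing is suggested when not even the last query maps to a concept. `Node`, `map_to_concepts`, and `suggest` are hypothetical names.

```python
class Node:
    def __init__(self, candidates=(), children=None):
        self.candidates = list(candidates)  # ranked representative queries
        self.children = children or {}      # prepended concept -> child Node


def map_to_concepts(queries, query_to_concept):
    """Map q_1..q_l to concepts; stop at the first new query met from the end,
    keeping only the concepts of the queries after it."""
    concepts = []
    for q in reversed(queries):
        c = query_to_concept.get(q)
        if c is None:
            break
        concepts.append(c)
    concepts.reverse()
    return concepts


def suggest(root, queries, query_to_concept):
    """Suggestions from the longest matched suffix c_j..c_l of the concept sequence."""
    cs = map_to_concepts(queries, query_to_concept)
    node, best = root, []
    for concept in reversed(cs):  # walk down: c_l, then c_{l-1}, ...
        node = node.children.get(concept)
        if node is None:
            break
        if node.candidates:  # deepest node with candidates = longest match
            best = node.candidates
    return best
```

With a toy tree where the context `C_jobs` followed by `C_apple` has the candidate "apple inc" and `C_apple` alone has "iphone", the sequence ["steve jobs", "apple"] yields ["apple inc"], while ["apple"] alone yields the generic candidates.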


Experimental Evaluation
Training data:
- Search log of a commercial search engine (Bing) in the US
- 1.8 billion queries (151 million unique), 2.6 billion URL clicks (115 million unique), 840 million sessions
Baseline algorithms:
- Adjacency: given $q_1 \ldots q_l$, rank queries by how often they immediately follow $q_l$
- N-Gram: given $q_1 \ldots q_l$, rank queries by how often they immediately follow the whole sequence $q_1 \ldots q_l$
Test data:
- Test-0: 1000 randomly selected single-query sessions
- Test-1: 1000 randomly selected multi-query sessions
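
The two baselines can be sketched as follows, assuming training sessions are sequences of query strings; the function names and the `top_k` parameter are hypothetical.

```python
from collections import Counter


def adjacency_suggest(sessions, context, top_k=5):
    """Rank queries by how often they immediately follow the last query of context."""
    last = context[-1]
    counts = Counter()
    for s in sessions:
        for prev, nxt in zip(s, s[1:]):
            if prev == last:
                counts[nxt] += 1
    return [q for q, _ in counts.most_common(top_k)]


def ngram_suggest(sessions, context, top_k=5):
    """Rank queries by how often they immediately follow the whole context."""
    ctx = list(context)
    n = len(ctx)
    counts = Counter()
    for s in map(list, sessions):
        for i in range(len(s) - n):
            if s[i:i + n] == ctx:
                counts[s[i + n]] += 1
    return [q for q, _ in counts.most_common(top_k)]
```

The contrast shows the trade-off the experiments measure: Adjacency ignores everything before $q_l$, while N-Gram uses the full context but finds no match whenever that exact query sequence never occurred in training.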

Experimental Results
Coverage of suggestion:
[Fig: The coverage of the three methods on (a) Test-0 and (b) Test-1]

Experimental Results
Quality of suggestion (relevance grades collected from 10 judges):
[Fig: The quality of the three methods on (a) Test-0 and (b) Test-1]

Summary
Three things to know:
- Some basics of query suggestion using search logs
- The proposed efficient query clustering algorithm for search-log click-through bipartite data
- The proposed efficient context-aware query suggestion method using the concept sequence suffix tree
Hint: a "concept"-level N-gram with variable length N + a structure for efficient search

Thank You!