Context-aware Query Suggestion by Mining Click-through and Session Data. Authors: H. Cao et al., KDD '08. Presented by Shize Su


Outline
• Introduction
• Framework of the Proposed Method
• Mining Query Concepts
• Concept Sequence Suffix Tree
• Experimental Evaluation
• Summary

Introduction
What is query suggestion in a search engine?
• Guess the user's search intent from the user query, then suggest queries that better describe the user's information need.
Why is query suggestion important?
• Is it easy to issue an appropriate query? No!
• It is a "bottleneck issue" of search engine usability (Google, Yahoo, Bing, Baidu, etc.).

Introduction
Major existing approaches (using search log data):
• Approach I: clustering queries by their clicked-URL data to find similar queries.
• Approach II: mining pairs of queries that are adjacent or co-occur in the same query session.
Fig. 1: An example of search log data

Introduction
Key limitations:
• None of these approaches is context-aware: they do not consider the immediately preceding queries as context.
• The clustering algorithms do not scale well to very large data, e.g., 1.8 billion queries (151 million unique) and 2.6 billion clicked URLs (114 million unique).
An example query sequence: "apple", "steve jobs", "apple". What is the user's search intent?

Proposed Method Framework
Key steps:
• Capture the context: the concept sequence.
• Quickly find the queries that many users ask in that context.
[Framework figure components: clustering queries; concept sequence suffix tree]

Mining Query Concepts
An example of click-through bipartite data from a search log: [figure]
For each query $q_i$: an $L_2$-normalized vector $\vec{q_i}$ over the clicked URLs.
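A minimal sketch of how such vectors could be built, assuming each edge of the click-through bipartite is given as a hypothetical `(query, url, clicks)` triple and that the weight of URL $j$ in $\vec{q_i}$ is the click count $w_{ij}$ divided by $\sqrt{\sum_k w_{ik}^2}$. Vectors are kept as sparse dicts because each query touches only a tiny fraction of the URLs.

```python
import math
from collections import defaultdict


def build_query_vectors(click_edges):
    """Return {query: {url: L2-normalized weight}} from (query, url, clicks) triples.

    The triple format is an assumption made for this illustration.
    """
    raw = defaultdict(dict)
    for query, url, clicks in click_edges:
        # accumulate clicks in case the same edge appears more than once
        raw[query][url] = raw[query].get(url, 0) + clicks
    vectors = {}
    for query, weights in raw.items():
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm == 0:
            continue  # a query with no clicks has no usable vector
        vectors[query] = {url: w / norm for url, w in weights.items()}
    return vectors


edges = [("apple", "apple.com", 3), ("apple", "wikipedia.org/Apple", 4), ("ipod", "apple.com", 5)]
vecs = build_query_vectors(edges)
```

Here "apple" gets weights 0.6 and 0.8 (3/5 and 4/5), so its vector has unit length.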

Mining Query Concepts
Key challenges in clustering queries:
• The search-log click-through bipartite can be huge: e.g., 151 million unique queries.
• The number of clusters is unknown.
• The query vectors have extremely high dimensionality: 114 million unique URLs.
• Search logs grow dynamically.
Existing query clustering algorithms:
• Hierarchical agglomerative method
• DBSCAN method (Wen, WWW'01)
• K-means, etc.

Mining Query Concepts
Proposed clustering method: [algorithm figure]

Mining Query Concepts
For each query $q$:
• Step 1: find the closest cluster $C$ to $q$ among the clusters obtained so far.
• Step 2: compute the diameter of the cluster $C \cup \{q\}$.
• Step 3: if the diameter is at most $D_{max}$, assign $q$ to $C$; otherwise, create a new cluster containing only $q$.
The method is quite efficient:
• It needs only one scan of the queries.
• It runs efficiently on a PC with 2 GB of main memory.
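The three steps can be sketched as a single pass over the queries. This is an illustration under stated assumptions, not the authors' code: "closest" is taken to mean Euclidean distance to the cluster centroid, the diameter is assumed to be the average pairwise distance $D = \sqrt{\sum_i \sum_j \|\vec{q_i}-\vec{q_j}\|^2 / (|C|(|C|-1))}$, and `d_max` plays the role of $D_{max}$. Each cluster stores only a linear sum and a sum of squared norms, which is enough to evaluate the diameter of $C \cup \{q\}$ without revisiting earlier members, because $\sum_{i,j}\|\vec{q_i}-\vec{q_j}\|^2 = 2n\sum_i\|\vec{q_i}\|^2 - 2\|\sum_i \vec{q_i}\|^2$.

```python
import math


class Cluster:
    """Summary statistics of a cluster of sparse vectors ({dimension: value})."""

    def __init__(self):
        self.n = 0
        self.linear_sum = {}    # sparse sum of the member vectors
        self.square_sum = 0.0   # sum of squared norms of the members
        self.members = []

    def add(self, name, vec):
        self.n += 1
        for d, v in vec.items():
            self.linear_sum[d] = self.linear_sum.get(d, 0.0) + v
        self.square_sum += sum(v * v for v in vec.values())
        self.members.append(name)

    def distance_to_centroid(self, vec):
        dims = set(vec) | set(self.linear_sum)
        return math.sqrt(sum(
            (vec.get(d, 0.0) - self.linear_sum.get(d, 0.0) / self.n) ** 2 for d in dims))

    def diameter_with(self, vec):
        """Average pairwise distance of C ∪ {q}, computed from the summary statistics."""
        n = self.n + 1
        ls = dict(self.linear_sum)
        for d, v in vec.items():
            ls[d] = ls.get(d, 0.0) + v
        ss = self.square_sum + sum(v * v for v in vec.values())
        pair_sum = 2 * n * ss - 2 * sum(v * v for v in ls.values())
        return math.sqrt(max(pair_sum, 0.0) / (n * (n - 1)))


def cluster_queries(vectors, d_max):
    """One scan over {query: sparse vector}; d_max is the diameter threshold."""
    clusters = []
    for name, vec in vectors.items():
        # Step 1: closest existing cluster (brute force here)
        best = min(clusters, key=lambda c: c.distance_to_centroid(vec), default=None)
        # Steps 2-3: join it only if the enlarged cluster stays compact enough
        if best is not None and best.diameter_with(vec) <= d_max:
            best.add(name, vec)
        else:
            new_cluster = Cluster()
            new_cluster.add(name, vec)
            clusters.append(new_cluster)
    return clusters
```

Keeping summaries instead of member lists for the distance computations is what lets a one-pass method like this fit in a fixed memory budget; the `members` list is kept here only so the result can be inspected.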

Mining Query Concepts
Tricks to improve efficiency:
• A dimension array data structure speeds up Step 1 by exploiting the sparsity of the data.
• Prune edges with low weights.
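A minimal sketch of both tricks, under assumptions: the dimension array is modeled as an index from each URL dimension to the ids of clusters whose centroid is non-zero on that dimension, so Step 1 only examines clusters that share at least one clicked URL with the query. A cluster sharing no dimension has a zero dot product with the query, and this sketch simply treats it as a non-candidate. The pruning threshold `min_weight` is hypothetical; the slide does not give a value.

```python
from collections import defaultdict


def prune_edges(vec, min_weight):
    """Drop low-weight dimensions (bipartite edges) from a sparse vector."""
    return {d: w for d, w in vec.items() if w >= min_weight}


class DimensionArray:
    """For every dimension, the set of cluster ids whose centroid is non-zero on it."""

    def __init__(self):
        self.index = defaultdict(set)

    def register(self, cluster_id, vec):
        # call whenever a vector is added to the cluster with this id
        for d in vec:
            self.index[d].add(cluster_id)

    def candidates(self, vec):
        """Cluster ids sharing at least one non-zero dimension with vec."""
        found = set()
        for d in vec:
            found |= self.index.get(d, set())  # .get avoids creating empty entries
        return found


da = DimensionArray()
da.register(0, {"u1": 0.5})
da.register(1, {"u2": 1.0})
```

With sparse vectors, the work per query then depends on the query's few non-zero dimensions rather than on the total number of clusters.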

Concept Sequence Suffix Tree
Extract query session data:
• Take each individual user's behavior (query/click) data.
• Segment it into sessions (a new session starts after a time interval > 30 min).
• Discard the click events.
Fig: An example of search log data
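The session extraction can be sketched as follows, assuming a hypothetical log schema of `(user_id, timestamp, kind, value)` tuples where `kind` is `"query"` or `"click"`. As an assumption of this sketch, the 30-minute gap is measured between consecutive events of any kind, and click events are dropped only after segmentation.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)


def split_sessions(events):
    """Return a list of query sessions (lists of query strings) across all users."""
    by_user = {}
    for user, ts, kind, value in sorted(events, key=lambda e: (e[0], e[1])):
        by_user.setdefault(user, []).append((ts, kind, value))
    sessions = []
    for user_events in by_user.values():
        current, last_ts = [], None
        for ts, kind, value in user_events:
            # a gap longer than 30 minutes closes the current session
            if last_ts is not None and ts - last_ts > SESSION_GAP:
                if current:
                    sessions.append(current)
                current = []
            last_ts = ts
            if kind == "query":  # click events are discarded
                current.append(value)
        if current:
            sessions.append(current)
    return sessions


t0 = datetime(2008, 1, 1, 9, 0)
log = [
    ("u1", t0, "query", "apple"),
    ("u1", t0 + timedelta(minutes=1), "click", "apple.com"),
    ("u1", t0 + timedelta(minutes=5), "query", "steve jobs"),
    ("u1", t0 + timedelta(hours=2), "query", "ipod"),
]
```

On this log the two-hour pause splits the user's activity into two sessions: `["apple", "steve jobs"]` and `["ipod"]`.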

Concept Sequence Suffix Tree
Concept sequence suffix tree: a structure used to efficiently find the queries that many users ask in a given context (concept sequence).
Fig: An example

Concept Sequence Suffix Tree
Algorithm to build the concept sequence suffix tree:
1) Map the training session data to concept sequences.
2) Enumerate the subsequences of each concept sequence (distributed, MapReduce).
3) Get all frequent concept subsequences.
4) Organize them into the concept sequence suffix tree.
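Steps 1 to 3 can be sketched on a single machine, with the per-session loop standing in for the map phase and the final filter for the reduce phase. Assumptions of this sketch: `query_to_concept` is a hypothetical dict produced by the clustering step, queries without a concept are skipped, "subsequence" means a run of consecutive concepts, and `min_support` is a hypothetical frequency threshold.

```python
from collections import Counter


def mine_frequent_subsequences(sessions, query_to_concept, min_support):
    """Return {concept tuple: frequency} for contiguous subsequences with frequency >= min_support."""
    counts = Counter()
    for session in sessions:  # "map": handle one session at a time
        seq = [query_to_concept[q] for q in session if q in query_to_concept]
        for i in range(len(seq)):
            for j in range(i + 1, len(seq) + 1):
                counts[tuple(seq[i:j])] += 1
    # "reduce": the summed counts per subsequence, filtered by support
    return {cs: n for cs, n in counts.items() if n >= min_support}


q2c = {"apple": "C_apple", "steve jobs": "C_jobs", "ipod": "C_ipod"}
train = [["steve jobs", "apple"], ["steve jobs", "apple"], ["apple", "ipod"]]
frequent = mine_frequent_subsequences(train, q2c, min_support=2)
```

In a distributed setting, each mapper would emit `(subsequence, 1)` pairs and reducers would sum them; the counting logic stays the same.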

Concept Sequence Suffix Tree
Algorithm for organizing frequent concept subsequences into the concept sequence suffix tree: [algorithm figure]

Concept Sequence Suffix Tree
Organizing into the concept sequence suffix tree:
1) Start from the root node (empty sequence) and scan through all frequent concept subsequences $cs$.
2) For each $cs = c_1 \ldots c_{l-1} c_l$, first find the node corresponding to $c_1 \ldots c_{l-1}$; if that node doesn't exist, create it.
3) Update that node's list of candidate concepts if $c_l$ is among the top $K$ candidates so far ($K$ is a specified threshold, e.g., $K = 5$).
4) The representative queries of the top $K$ candidate concepts are the candidate suggestions for the sequence $c_1 \ldots c_{l-1}$.
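Steps 1 to 3 can be sketched as below, taking frequent subsequences with their frequencies as input. Assumptions of this sketch: the path from the root spells the context backwards ($c_{l-1}$, then $c_{l-2}$, ...), so every suffix of a context is an ancestor node; creating the path with `setdefault` also creates any missing intermediate nodes; and the top-$K$ list is kept by evicting the least frequent candidate. Mapping candidate concepts to representative queries (step 4) is left out.

```python
class Node:
    def __init__(self):
        self.children = {}    # concept -> Node, extending the context one step further back
        self.candidates = {}  # next concept -> frequency (top K only)


def find_or_create(root, context):
    """Node for the context c_1 ... c_{l-1}; the root-to-node path reads it backwards."""
    node = root
    for concept in reversed(context):
        node = node.children.setdefault(concept, Node())
    return node


def build_csst(frequent, k=5):
    """frequent: {concept tuple: frequency}, e.g. the output of a mining step."""
    root = Node()
    for cs, freq in frequent.items():
        context, last = cs[:-1], cs[-1]
        node = find_or_create(root, context)
        node.candidates[last] = freq
        if len(node.candidates) > k:
            # evict the least frequent candidate so only the top K survive
            weakest = min(node.candidates, key=node.candidates.get)
            del node.candidates[weakest]
    return root


tree = build_csst({("J",): 2, ("J", "A"): 2, ("A",): 3, ("X", "J", "A"): 2})
```

In this example the node reached by `J` suggests `A` after the context "J", and the deeper node `J` then `X` suggests `A` after the longer context "X J".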

Concept Sequence Suffix Tree
Review: an example of a concept sequence suffix tree [figure]

Concept Sequence Suffix Tree
Online query suggestion algorithm: [algorithm figure]

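Concept Sequence Suffix Tree
For a query sequence $qs = q_1 \ldots q_l$:
• Map it to a concept sequence $cs = c_1 \ldots c_l$, working backwards from $q_l$; if some $q_i$ is a new query, stop mapping and return the concept sequence corresponding to $q_{i+1} \ldots q_l$.
• Search the tree to find the longest matched subsequence of the form $c_j \ldots c_l$.
• Use the candidate suggestions for $c_j \ldots c_l$ as the query suggestions for $qs$.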
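The online step can be sketched as follows. Assumptions of this sketch: tree nodes are laid out so that the path from the root spells the context backwards (so walking $c_l, c_{l-1}, \ldots$ extends the matched suffix $c_j \ldots c_l$), each node stores a ranked list of candidate concepts, and `representative` is a hypothetical dict from a concept to its representative query. If the deepest matched node has no candidates, the sketch falls back to the deepest matched node that does.

```python
class Node:
    def __init__(self, candidates=None):
        self.children = {}                  # concept -> Node (one step further back in the context)
        self.candidates = candidates or []  # ranked candidate concepts


def map_to_concepts(queries, query_to_concept):
    """Map q_1..q_l backwards from q_l; stop at the first query without a concept."""
    concepts = []
    for q in reversed(queries):
        if q not in query_to_concept:
            break
        concepts.append(query_to_concept[q])
    concepts.reverse()
    return concepts


def suggest(root, queries, query_to_concept, representative):
    concepts = map_to_concepts(queries, query_to_concept)
    node, best = root, root
    for c in reversed(concepts):  # c_l, c_{l-1}, ...: longest matched suffix wins
        if c not in node.children:
            break
        node = node.children[c]
        if node.candidates:
            best = node
    return [representative[c] for c in best.candidates]


root = Node()
root.children["C_jobs"] = Node(["C_apple"])
apple = Node(["C_ipod"])
root.children["C_apple"] = apple
apple.children["C_jobs"] = Node(["C_mac"])  # context "steve jobs" -> "apple"
q2c = {"steve jobs": "C_jobs", "apple": "C_apple"}
rep = {"C_apple": "apple", "C_ipod": "ipod", "C_mac": "macbook"}
```

With this toy tree, "apple" alone yields `["ipod"]`, while "steve jobs" followed by "apple" matches the longer suffix and yields `["macbook"]`, which is the context effect the method is after.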

Concept Sequence Suffix Tree
Review: an example of a concept sequence suffix tree [figure]

Experimental Evaluation
Training data:
• Search log of a commercial search engine (Bing) in the US
• 1.8 billion queries (151 million unique), 2.6 billion URL clicks (115 million unique URLs), 840 million sessions
Baseline algorithms:
• Adjacency: given $q_1 \ldots q_l$, rank each candidate query $q$ by the frequency with which $q$ immediately follows $q_l$ in the training sessions.
• N-Gram: given $q_1 \ldots q_l$, rank each candidate query $q$ by the frequency of the sequence $q_1 \ldots q_l q$ in the training sessions.
Test data:
• Test-0: 1,000 randomly selected single-query sessions
• Test-1: 1,000 randomly selected multi-query sessions
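The two baselines can be sketched with plain counters, assuming training sessions are lists of query strings and that "frequency of $q_1 \ldots q_l q$" counts occurrences of that consecutive run anywhere in a session. The function names and the cutoff `k` are hypothetical.

```python
from collections import Counter, defaultdict


def train_baselines(sessions):
    """Count next-query frequencies for the Adjacency and N-Gram baselines."""
    adjacency = defaultdict(Counter)  # q_l -> queries that immediately follow it
    ngram = defaultdict(Counter)      # (q_i, ..., q_j) -> queries that follow the whole run
    for s in sessions:
        for i in range(len(s)):
            for j in range(i + 1, len(s)):
                ngram[tuple(s[i:j])][s[j]] += 1
                if j == i + 1:
                    adjacency[s[i]][s[j]] += 1
    return adjacency, ngram


def suggest_adjacency(adjacency, qs, k=5):
    """Uses only the last query q_l as context."""
    return [q for q, _ in adjacency.get(qs[-1], Counter()).most_common(k)]


def suggest_ngram(ngram, qs, k=5):
    """Uses the entire sequence q_1 ... q_l as context."""
    return [q for q, _ in ngram.get(tuple(qs), Counter()).most_common(k)]


train = [["a", "b"]] * 3 + [["a", "c"], ["x", "a", "c"]]
adj, ng = train_baselines(train)
```

Adjacency ignores the earlier query "x" and ranks "b" first, whereas N-Gram only answers when the exact sequence was seen in training, which is why its coverage drops for long contexts.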

Experimental Results
Coverage of suggestions:
Fig: The coverage of the three methods on (a) Test-0 and (b) Test-1

Experimental Results
Quality of suggestions (relevance grades collected from 10 judges):
Fig: The quality of the three methods on (a) Test-0 and (b) Test-1

Summary
Three things to know:
• Some basics of query suggestion using search logs
• The proposed efficient query clustering algorithm for search-log click-through bipartite data
• The proposed efficient context-aware query suggestion method using the concept sequence suffix tree
Hint: a "concept"-level N-gram with variable length N, plus a structure for efficient search

Thank You!
