Context-aware Query Suggestion by Mining Click-through and Session Data
Authors: H. Cao et al., KDD '08
Presented by Shize Su

2 Outline
- Introduction
- Framework of the Proposed Method
- Mining Query Concepts
- Concept Sequence Suffix Tree
- Experimental Evaluation
- Summary

3 Introduction
What is query suggestion in a search engine?
- Guess the user's search intent from the user's query, then suggest queries that better describe the user's information need.
Why is query suggestion important?
- Is it easy for users to issue an appropriate query? No!
- It is a “bottleneck issue” of search engine usability (Google, Yahoo, Bing, Baidu, etc.).

4 Introduction
Major existing approaches (using search log data):
- Approach I: cluster queries by their clicked URLs to find similar queries.
- Approach II: mine pairs of queries that are adjacent or co-occur in the same query session.
Fig. 1: An example of search log data
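To make Approach II concrete, here is a minimal Python sketch, assuming each session is already a list of query strings. The function `mine_query_pairs` is hypothetical: it counts ordered adjacent pairs and, separately, unordered pairs that co-occur anywhere in a session (each pair counted once per session, which is an assumption of this sketch).

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def mine_query_pairs(sessions: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count adjacent query pairs and within-session co-occurring pairs."""
    adjacent: Counter = Counter()
    co_occur: Counter = Counter()
    for session in sessions:
        # Ordered pairs (a, b) where b is issued right after a.
        for a, b in zip(session, session[1:]):
            if a != b:
                adjacent[(a, b)] += 1
        # Unordered pairs appearing anywhere in the same session, once per session.
        for a, b in combinations(sorted(set(session)), 2):
            co_occur[(a, b)] += 1
    return adjacent, co_occur


adjacent, co_occur = mine_query_pairs([["apple", "steve jobs", "ipod"],
                                       ["apple", "steve jobs"]])
```

Neither count looks further back than a single query, which is the lack of context the next slide points out.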

5 Introduction
Key limitations:
- None of these approaches is context-aware: they do not consider the immediately preceding queries as context.
- The clustering algorithms do not scale well to very large data, e.g., a log with 1.8 billion queries (151 million unique) and 2.6 billion clicked URLs (114 million unique).
An example: “apple” on its own vs. “steve jobs” followed by “apple”. What is the user's search intent?

6 Framework of the Proposed Method
Key steps:
- Capture the context as a concept sequence (by clustering queries into concepts).
- Quickly find the queries that many users ask in that context (using a concept sequence suffix tree).

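7 Mining Query Concepts
Fig.: An example of click-through bipartite data from a search log
For each query $q_i$: an $L_2$-normalized vector $\vec{q_i}$ with one dimension per URL $u_j$, where $\vec{q_i}[j] = w_{ij} / \sqrt{\sum_{k} w_{ik}^2}$ if there is an edge $e_{ij}$ with click count $w_{ij}$ between $q_i$ and $u_j$, and $0$ otherwise.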
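A minimal sketch of the per-query vector and the Euclidean distance between two such vectors, assuming each query's click counts arrive as a `dict` from URL to count with at least one click. The helper names `click_vector` and `distance` are hypothetical; sparse dictionaries stand in for the huge URL space.

```python
import math
from typing import Dict


def click_vector(clicks: Dict[str, int]) -> Dict[str, float]:
    """L2-normalize one query's click counts (URL -> number of clicks)."""
    norm = math.sqrt(sum(c * c for c in clicks.values()))
    return {url: c / norm for url, c in clicks.items()}


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two sparse vectors; missing URLs count as 0."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


v = click_vector({"u1": 3, "u2": 4})  # 3-4-5 triangle: entries 0.6 and 0.8
```

Two queries whose clicks fall on disjoint URL sets end up at distance $\sqrt{2}$, the maximum for unit vectors with non-negative entries.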

8 Mining Query Concepts
Key challenges in clustering queries:
- The search log click-through bipartite graph can be huge: e.g., 151 million unique queries.
- The number of clusters is unknown.
- The query vectors have extremely high dimensionality: 114 million unique URLs.
- Search logs grow dynamically.
Existing query clustering algorithms:
- Hierarchical agglomerative clustering
- DBSCAN (Wen et al., WWW’01)
- K-means, etc.

9 Mining Query Concepts
Proposed clustering method (algorithm shown as a figure)

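10 Mining Query Concepts
For each query $q$:
- Step 1: find the cluster $C$ closest to $q$ among the clusters obtained so far.
- Step 2: compute the diameter of $C \cup \{q\}$, where the diameter of a cluster $C$ is $D = \sqrt{\frac{\sum_{i=1}^{|C|}\sum_{j=1}^{|C|} \|\vec{q_i} - \vec{q_j}\|^2}{|C|(|C|-1)}}$.
- Step 3: (1) if the diameter is at most the threshold $D_{max}$, assign $q$ to $C$; (2) otherwise, create a new cluster containing only $q$.
The method is quite efficient:
- It needs only one scan of the queries.
- It runs efficiently on a PC with 2 GB of main memory.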
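The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: vectors are sparse dicts, Step 1 is a plain linear scan over clusters, and the diameter of $C \cup \{q\}$ is computed from running sums via the identity $\sum_{i,j}\|\vec{q_i}-\vec{q_j}\|^2 = 2n\sum_i\|\vec{q_i}\|^2 - 2\|\sum_i \vec{q_i}\|^2$, so a cluster never has to revisit its members. `Cluster`, `cluster_queries`, and `d_max` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse click vector: URL -> weight


class Cluster:
    """Keeps only the running sums needed for the centroid and the diameter."""

    def __init__(self) -> None:
        self.members: List[str] = []
        self.vec_sum: Vector = {}   # sum of member vectors
        self.sq_norm_sum = 0.0      # sum of squared member norms

    def add(self, query: str, vec: Vector) -> None:
        self.members.append(query)
        for url, w in vec.items():
            self.vec_sum[url] = self.vec_sum.get(url, 0.0) + w
        self.sq_norm_sum += sum(w * w for w in vec.values())

    def distance_to_centroid(self, vec: Vector) -> float:
        n = len(self.members)
        keys = self.vec_sum.keys() | vec.keys()
        return math.sqrt(sum((self.vec_sum.get(k, 0.0) / n - vec.get(k, 0.0)) ** 2
                             for k in keys))

    def diameter_with(self, vec: Vector) -> float:
        """Diameter of C ∪ {q}, computed from the running sums."""
        n = len(self.members) + 1
        total = dict(self.vec_sum)
        for url, w in vec.items():
            total[url] = total.get(url, 0.0) + w
        sq_norms = self.sq_norm_sum + sum(w * w for w in vec.values())
        pair_sum = 2 * n * sq_norms - 2 * sum(w * w for w in total.values())
        return math.sqrt(max(pair_sum, 0.0) / (n * (n - 1)))


def cluster_queries(queries: List[Tuple[str, Vector]], d_max: float) -> List[List[str]]:
    clusters: List[Cluster] = []
    for query, vec in queries:  # a single scan over the queries
        # Step 1: closest existing cluster (linear scan in this sketch).
        best = min(clusters, key=lambda c: c.distance_to_centroid(vec), default=None)
        # Steps 2-3: join it if the diameter stays within d_max, else start a new cluster.
        if best is not None and best.diameter_with(vec) <= d_max:
            best.add(query, vec)
        else:
            fresh = Cluster()
            fresh.add(query, vec)
            clusters.append(fresh)
    return [c.members for c in clusters]
```

Keeping only a member list, a vector sum, and a sum of squared norms per cluster is what lets one scan suffice; the linear scan in Step 1 is the part the efficiency tricks on the next slide target.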

11 Mining Query Concepts
Tricks to improve the algorithm's efficiency:
- A dimension array data structure used in Step 1 (exploiting the sparsity of the data)
- Pruning edges with low weights
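One way to picture the two tricks, as a hedged sketch: the dimension array is modeled here as an inverted index from each URL to the ids of clusters whose centroids are non-zero on that URL, so Step 1 only compares a query against clusters sharing at least one clicked URL with it. Skipping the other clusters is a heuristic of this sketch; the slide only says the structure exploits sparsity. `DimensionArray`, `prune_edges`, and the `min_count` threshold are hypothetical, since the slide does not give the pruning rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def prune_edges(clicks: Dict[str, int], min_count: int) -> Dict[str, int]:
    """Drop click edges whose weight is below a (hypothetical) threshold."""
    return {url: c for url, c in clicks.items() if c >= min_count}


class DimensionArray:
    """Inverted index: URL dimension -> ids of clusters with a non-zero centroid entry there."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[int]] = defaultdict(set)

    def add(self, cluster_id: int, urls: Iterable[str]) -> None:
        # Call when a cluster's centroid gains non-zero weight on these URLs.
        for url in urls:
            self.index[url].add(cluster_id)

    def candidates(self, query_vec: Dict[str, float]) -> Set[int]:
        """Clusters sharing at least one URL with the query; this sketch scans only these."""
        found: Set[int] = set()
        for url in query_vec:
            found |= self.index.get(url, set())
        return found
```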

12 Concept Sequence Suffix Tree
Extracting query session data:
- Collect each individual user's behavior (query/click) data.
- Segment it into sessions (a new session starts after an interval of more than 30 minutes).
- Discard the click events, keeping only the query sequences.
Fig.: An example of search log data
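A minimal sketch of the session extraction, assuming each user's log is a list of `(timestamp, kind, value)` events and that the 30-minute gap is measured between consecutive events of that user (the slide does not say which events count). `segment_sessions` is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str, str]  # (timestamp, "query" or "click", query text or URL)


def segment_sessions(events: List[Event],
                     gap: timedelta = timedelta(minutes=30)) -> List[List[str]]:
    """Split one user's events into sessions and keep only the queries."""
    sessions: List[List[str]] = []
    current: List[str] = []
    last_time = None
    for ts, kind, value in sorted(events, key=lambda e: e[0]):
        # A silence longer than `gap` closes the current session.
        if last_time is not None and ts - last_time > gap and current:
            sessions.append(current)
            current = []
        last_time = ts
        if kind == "query":  # click events are discarded
            current.append(value)
    if current:
        sessions.append(current)
    return sessions
```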

13 Concept Sequence Suffix Tree
Concept sequence suffix tree: a structure used to efficiently find the queries that many users ask in a given context (concept sequence).
Fig.: An example

14 Concept Sequence Suffix Tree
Algorithm to build the concept sequence suffix tree:
1) Map each training session $qs = q_1 \ldots q_l$ to a concept sequence $cs = c_1 \ldots c_l$.
2) Enumerate the subsequences of each $cs$ (distributed, using map-reduce).
3) Get all frequent concept subsequences.
4) Organize these into a concept sequence suffix tree.
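Steps 1 to 3 can be sketched on a single machine, with a `collections.Counter` standing in for the distributed map-reduce count. Assumptions of this sketch: subsequences are contiguous, have length at least 2 (a context plus one following concept), are capped at a hypothetical `max_len`, and are kept when their count reaches a hypothetical `min_support`; every training query is assumed to have a concept in `query_to_concept`.

```python
from collections import Counter
from typing import Dict, List, Tuple


def frequent_concept_subsequences(sessions: List[List[str]],
                                  query_to_concept: Dict[str, str],
                                  min_support: int,
                                  max_len: int = 5) -> Dict[Tuple[str, ...], int]:
    counts: Counter = Counter()
    for session in sessions:
        # Step 1: map the session's queries to their concepts.
        concepts = [query_to_concept[q] for q in session]
        n = len(concepts)
        # Step 2: enumerate contiguous subsequences of length 2..max_len.
        for start in range(n):
            for end in range(start + 2, min(n, start + max_len) + 1):
                counts[tuple(concepts[start:end])] += 1
    # Step 3: keep only the frequent ones.
    return {cs: f for cs, f in counts.items() if f >= min_support}
```

In a map-reduce setting, the inner loops correspond to the map phase emitting `(subsequence, 1)` pairs and the counter update to the reduce phase.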

15 Concept Sequence Suffix Tree
Algorithm for organizing frequent subsequences into a concept sequence suffix tree (figure)

16 Concept Sequence Suffix Tree
Organizing frequent subsequences into the concept sequence suffix tree:
1) Start from the root node (the empty sequence) and scan through all frequent concept subsequences $cs$.
2) For each $cs = c_1 \ldots c_l$, first find the node $cn$ corresponding to $c_1 \ldots c_{l-1}$; if $cn$ does not exist, create it.
3) Update the list of candidate concepts of $cn$ if $c_l$ is among the top $K$ candidates so far ($K$ is a specified threshold, e.g., $K = 5$).
4) The representative queries of the top $K$ candidate concepts are the candidate suggestions for the sequence $c_1 \ldots c_{l-1}$.
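A minimal Python sketch of steps 1 to 3, assuming the frequent subsequences come as a dict from concept tuples to counts. In the suffix-tree layout assumed here, each node's parent represents its suffix, so the node for $c_1 \ldots c_{l-1}$ is reached from the root by walking $c_{l-1}, c_{l-2}, \ldots, c_1$, creating intermediate nodes on the way. `Node` and `build_suffix_tree` are hypothetical names, and candidate lists are kept as `(count, concept)` pairs, best first.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.candidates: List[Tuple[int, str]] = []  # (count, next concept), best first


def build_suffix_tree(frequent: Dict[Tuple[str, ...], int], k: int = 5) -> Node:
    root = Node()  # Step 1: the empty sequence
    for cs, count in frequent.items():
        context, nxt = cs[:-1], cs[-1]
        # Step 2: walk c_{l-1}, ..., c_1 from the root, creating missing nodes.
        node = root
        for concept in reversed(context):
            node = node.children.setdefault(concept, Node())
        # Step 3: keep only the top-k candidate concepts at this node.
        node.candidates.append((count, nxt))
        node.candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        del node.candidates[k:]
    return root
```

Step 4 happens at suggestion time: each surviving candidate concept is turned into its representative query.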

17 Concept Sequence Suffix Tree
Review: an example of a concept sequence suffix tree (figure)

18 Concept Sequence Suffix Tree
Online query suggestion algorithm (figure)

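19 Concept Sequence Suffix Tree
For a query sequence $qs = q_1 \ldots q_T$:
- Map it to a concept sequence $cs$, working backward from $q_T$: if $q_i$ is a new query (not in any concept), stop mapping and return the concept sequence corresponding to $q_{i+1} \ldots q_T$.
- Search the tree for the longest matched suffix of $cs$, i.e., the node $cn = c_j \ldots c_T$ with the smallest $j$.
- Use the candidate suggestions of $cn$ as the query suggestions for $qs$.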
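The online step can be sketched as follows, assuming the suffix-tree layout in which the node for $c_j \ldots c_T$ is reached from the root by following $c_T, c_{T-1}, \ldots, c_j$, and a hypothetical `representative` dict from each concept to its representative query. Mapping runs backward from $q_T$ and stops at the first unseen query; as an assumption of this sketch, the walk keeps the candidates of the deepest matched node that has any. `Node`, `map_to_concepts`, and `suggest` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    candidates: List[Tuple[int, str]] = field(default_factory=list)  # (count, concept)


def map_to_concepts(queries: List[str], query_to_concept: Dict[str, str]) -> List[str]:
    """Map q_1..q_T to concepts backward from q_T, stopping at the first new query."""
    concepts: List[str] = []
    for q in reversed(queries):
        if q not in query_to_concept:
            break
        concepts.append(query_to_concept[q])
    concepts.reverse()
    return concepts


def suggest(root: Node, queries: List[str], query_to_concept: Dict[str, str],
            representative: Dict[str, str]) -> List[str]:
    cs = map_to_concepts(queries, query_to_concept)
    node, best = root, []
    for concept in reversed(cs):  # follow c_T, c_{T-1}, ... from the root
        node = node.children.get(concept)
        if node is None:
            break
        if node.candidates:
            best = node.candidates  # deepest match with candidates so far
    return [representative[c] for _, c in best]
```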

20 Concept Sequence Suffix Tree
Review: an example of a concept sequence suffix tree (figure)

21 Experimental Evaluation
Training data:
- A search log from a commercial search engine (Bing) in the US
- 1.8 billion queries (151 million unique), 2.6 billion URL clicks (115 million unique), 840 million sessions
Baseline algorithms:
- Adjacency: given $q_1 \ldots q_T$, rank candidate queries by how often they immediately follow $q_T$ in the training sessions.
- N-Gram: given $q_1 \ldots q_T$, rank candidate queries by how often they immediately follow the whole sequence $q_1 \ldots q_T$.
Test set data:
- Test-0: 1,000 randomly selected single-query cases
- Test-1: 1,000 randomly selected multi-query cases
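The two baselines are simple enough to sketch directly. In this hedged version, `ngram_suggest` counts which query immediately follows the whole context $q_1 \ldots q_T$ anywhere in a training session, and `adjacency_suggest` is the special case that looks only at $q_T$; ties are broken alphabetically, and the function names and the `top_n` cutoff are assumptions.

```python
from collections import Counter
from typing import List, Sequence


def ngram_suggest(sessions: List[List[str]], context: Sequence[str],
                  top_n: int = 5) -> List[str]:
    """Rank queries by how often they immediately follow the full context."""
    n = len(context)
    target = list(context)
    counts: Counter = Counter()
    for session in sessions:
        for i in range(len(session) - n):
            if session[i:i + n] == target:
                counts[session[i + n]] += 1
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [query for query, _ in ranked[:top_n]]


def adjacency_suggest(sessions: List[List[str]], context: Sequence[str],
                      top_n: int = 5) -> List[str]:
    """Rank queries by how often they immediately follow the last query only."""
    return ngram_suggest(sessions, list(context)[-1:], top_n)
```

N-Gram needs an exact match of the full context, so it can return nothing when that context never occurred in training; Adjacency always matches on $q_T$ but ignores the earlier queries.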

22 Experimental Results
Coverage of suggestions:
Fig.: The coverage of the three methods on (a) Test-0 and (b) Test-1

23 Experimental Results
Quality of suggestions (relevance grades collected from 10 judges):
Fig.: The quality of the three methods on (a) Test-0 and (b) Test-1

24 Summary
Three things to know:
- Some basics of query suggestion using search logs
- The proposed efficient query clustering algorithm for search-log click-through bipartite data
- The proposed efficient context-aware query suggestion method using a concept sequence suffix tree
Hint: think of it as a “concept”-level N-gram with variable length N, plus a structure for efficient search.

25 Thank You!

