Context-Aware Query Classification Huanhuan Cao 1, Derek Hao Hu 2, Dou Shen 3, Daxin Jiang 4, Jian-Tao Sun 4, Enhong Chen 1 and Qiang Yang 2 1 University.

Context-Aware Query Classification Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang Microsoft Research Asia SIGIR.

Context-Aware Query Classification
Huanhuan Cao (1), Derek Hao Hu (2), Dou Shen (3), Daxin Jiang (4), Jian-Tao Sun (4), Enhong Chen (1) and Qiang Yang (2)
(1) University of Science and Technology of China, (2) Hong Kong University of Science and Technology, (3) Microsoft Corporation, (4) Microsoft Research Asia

Motivation
Understanding Web users' information needs is one of the most important problems in Web search. Such understanding can help improve the quality of many Web search services, such as:
– Ranking
– Online advertising
– Query suggestion, etc.

Challenges
The main challenges of query classification:
– Lack of feature information
– Ambiguity
– Multiple intents
The first problem has been studied widely:
– Query expansion by top search results
– Leveraging a web directory
However, the second and third problems are far from solved.

Why is context useful?
Given a query, context means the previous queries and clicked URLs in the same session. It is assumed that:
– Context has a semantic relation with the current query.
– Context may help to assign appropriate categories to the current query.
It therefore makes sense to exploit context to make the current query more specific.

Example
Query: Michael Jordan
– Context: Chicago Bulls, Basketball, NBA, …
– Context: Hierarchical Dirichlet Process, LDA, Graphical Model

Overview
– Problem statement
– Model query context by CRF
– Features of CRF
– Experiment
– Conclusion and future work

Problem Statement: Context
In a user search session, suppose the user has issued a series of queries $q_1 q_2 \dots q_{T-1}$ and clicked some of the returned URLs $U_1 U_2 \dots U_{T-1}$. If the user issues a query $q_T$ at time $T$, we call $q_1 q_2 \dots q_{T-1}$ and $U_1 U_2 \dots U_{T-1}$ the query context of $q_T$, and we call each $q_t$ ($t \in [1, T-1]$) a contextual query of $q_T$.

[Figure: Query Context. The queries Q_1, Q_2, Q_3, … and their clicked URLs U_1, U_2, U_3, … form the query context of Q_T.]
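The notion of query context can be made concrete with a small data structure. The following is a minimal sketch, not the authors' code: the `Step` class, the `split_context` helper, and the example URLs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    """One step of a search session: a query and the URLs clicked for it."""
    query: str
    clicked_urls: List[str] = field(default_factory=list)


def split_context(session: List[Step]) -> Tuple[List[Step], str]:
    """Split a session into (query context, current query q_T).

    The context is every step before the last one; the last query is q_T.
    """
    if not session:
        raise ValueError("session must contain at least one query")
    return session[:-1], session[-1].query


# Hypothetical session: two contextual queries followed by the query to classify.
session = [
    Step("chicago bulls", ["example.com/bulls"]),
    Step("nba", ["example.com/nba"]),
    Step("michael jordan"),
]
context, current = split_context(session)
```

Here `context` holds the two earlier steps and `current` is the query to be classified.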

Problem Statement: QC with context and Taxonomy
The objective of query classification (QC) with context is to classify a user query $q_T$ into a ranked list of $K$ categories $c_{T1}, c_{T2}, \dots, c_{TK}$, among $N_c$ categories $\{c_1, c_2, \dots, c_{N_c}\}$, given the context of $q_T$. A target taxonomy $\Upsilon$ is a tree of categories where $\{c_1, c_2, \dots, c_{N_c}\}$ are the leaf nodes of this tree.

Modeling Query Context by CRF
[Equation: CRF model of p(c|q)] where q represents $q_1 q_2 \dots q_t$
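The slide's equation did not survive in the transcript. For orientation, the textbook form of a linear-chain CRF is given here; it is an assumption that the slide used this form, and the symbols $\lambda_k$ (feature weights), $f_k$ (feature functions) and $Z(\mathbf{q})$ (normalizer) are standard notation rather than symbols taken from the source.

```latex
p(\mathbf{c} \mid \mathbf{q})
  = \frac{1}{Z(\mathbf{q})}
    \exp\Bigl( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(c_{t-1}, c_t, \mathbf{q}, t) \Bigr),
\qquad
Z(\mathbf{q})
  = \sum_{\mathbf{c}'}
    \exp\Bigl( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(c'_{t-1}, c'_t, \mathbf{q}, t) \Bigr)
```

Feature functions that ignore $c_{t-1}$ capture the relevance between a query and its label; those that depend on both $c_{t-1}$ and $c_t$ capture the relevance between adjacent labels.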

Why CRF?
The two main advantages of CRF are:
– 1) It can incorporate general feature functions to model the relation between observations and unobserved states;
– 2) It doesn't need prior knowledge of the type of conditional distribution.
Given 1), we can incorporate external web knowledge. Given 2), we need no assumptions about the type of p(c|q).

Features of CRF
When we use CRF to model query context, one of the most important parts is choosing effective feature functions. We should consider:
– Relevance between queries and category labels, to leverage the local information of queries;
– Relevance between adjacent labels, to leverage contextual information.

Relevance between queries and category labels
Term occurrence
– The terms of $q_t$ are obvious features for supporting $c_t$
– Due to the limited size of the training data, many useful terms indicating category information may not be covered.
General label confidence
– Leverage an external web directory such as Google Directory;
– where $M$ is the number of returned results and $M_{c_t,q_t}$ is the number of returned results with label $c_t$ after mapping.
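The general label confidence is defined in terms of M and M_{ct,qt}, but the formula itself is not preserved in the transcript. The sketch below assumes the natural reading, a simple ratio M_{ct,qt} / M; the function name and the example labels are hypothetical.

```python
from typing import Sequence


def general_label_confidence(result_labels: Sequence[str], category: str) -> float:
    """Fraction of returned directory results whose mapped label equals `category`.

    Assumes the confidence is the ratio M_{ct,qt} / M, which the slide does not
    state explicitly. Returns 0.0 when no results were returned.
    """
    m = len(result_labels)  # M: number of returned results
    if m == 0:
        return 0.0
    m_c = sum(1 for label in result_labels if label == category)  # M_{ct,qt}
    return m_c / m


# Hypothetical labels of four directory results for one query, after mapping.
labels = ["Sports", "Sports", "Computers", "Sports"]
confidence = general_label_confidence(labels, "Sports")
```

With three of the four results mapped to "Sports", the confidence for that category is 0.75.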

Relevance between queries and category labels
Click-aware label confidence
– Combine the click information with the knowledge of an external web directory;
– CConf($c_t$, $u_t$) can be calculated by multiple approaches.
– Here, we use a vector space model (VSM) to calculate the cosine similarity between the term vectors of $c_t$ and $u_t$
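The cosine similarity between two term vectors can be sketched as follows. This is a minimal illustration using raw term counts and whitespace tokenization; the slides do not say how terms are weighted or how the text of a category or a clicked URL is obtained, so both are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the raw term-count vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the terms the two vectors share.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical texts standing in for a category description and a clicked page.
score = cosine_similarity("travel guide hotels", "city travel guide")
```

In practice a TF-IDF weighting is a common alternative to raw counts.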

Relevance between Adjacent Labels
Direct relevance between adjacent labels
– Occurrence of adjacent label pair
– The weight implies how likely the two labels co-occur
Taxonomy based relevance between adjacent labels
– Limited by the sampling approach and the size of the training data, some reasonable adjacent label pairs may not occur proportionally, or may not occur at all.
– Consider indirect relevance between adjacent labels by considering the taxonomy.
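Both kinds of adjacent-label evidence can be illustrated with a short sketch. Counting label pairs follows the slide directly; the `share_parent` check is only one possible taxonomy-based signal and is an assumption, as are the "/"-separated label names.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple


def adjacent_pair_counts(label_sequences: Iterable[Sequence[str]]) -> Counter:
    """Count how often each ordered pair of adjacent labels occurs in labeled sessions."""
    counts: Counter = Counter()
    for labels in label_sequences:
        counts.update(zip(labels, labels[1:]))
    return counts


def parent(label: str, sep: str = "/") -> Optional[str]:
    """Parent category of a label such as 'Sports/NBA', or None for a top-level label."""
    head, found, _ = label.rpartition(sep)
    return head if found else None


def share_parent(label_a: str, label_b: str) -> bool:
    """True when both labels sit under the same parent node of the taxonomy."""
    parent_a, parent_b = parent(label_a), parent(label_b)
    return parent_a is not None and parent_a == parent_b


# Hypothetical labeled sessions (label names are illustrative only).
sessions = [
    ["Sports/Basketball", "Sports/NBA", "Sports/NBA"],
    ["Sports/Basketball", "Sports/NBA"],
]
counts = adjacent_pair_counts(sessions)
```

A pair that never occurs in the training sessions gets a count of zero, which is where a taxonomy-based signal such as `share_parent` can still indicate that two labels are related.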

Experiment
Data set:
– 10,000 randomly selected sessions from one day's search log of a commercial search engine.
– Three labelers first label all possible categories, using the KDDCUP'05 taxonomy, for each unique query in the training data.

Examples of multiple category queries
The large proportion of multiple-category queries shows how difficult QC is without context.

Label Sessions
The three human labelers are then asked to cross-label each session of the data set with a sequence of level-2 category labels. For each query, a labeler gives the most appropriate category label by considering:
– The query itself;
– The query context;
– The clicked URLs of the query.

Tested Approaches
Baselines:
– Non-context-aware baseline: Bridging classifier (BC) proposed by Shen et al.
– Naïve context-aware baseline: Collaborating classifier (CC). Combines a test query with the previous query and classifies the result with BC.
CRFs:
– CRF-B: CRF with basic features (term occurrence, general label confidence and direct relevance between adjacent labels)
– CRF-B-C: CRF with basic features + click-aware label confidence
– CRF-B-C-T: CRF with basic features + click-aware label confidence + taxonomy based relevance

Evaluation Metrics
Given a test session $q_1 q_2 \dots q_T$, we let $q_T$ be the test query and let the queries $q_1 q_2 \dots q_{T-1}$ and the corresponding clicked URL sets $U_1 U_2 \dots U_{T-1}$ be the query context. For $q_T$, we evaluate a tested approach by:
– Precision ($P$): $\delta(c_T \in C_{T,K}) / K$
– Recall ($R$): $\delta(c_T \in C_{T,K})$
– $F_1$ score ($F_1$): $2PR / (P + R)$
where $c_T$ is the ground-truth label and $C_{T,K}$ is the set of the top $K$ labels. $\delta(\cdot)$ is a Boolean function indicating whether its argument is true (= 1) or false (= 0).
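The three metrics can be computed for a single test query as in the sketch below. The function name and the example labels are hypothetical; defining F1 as 0 when both P and R are 0 is an assumption made to avoid division by zero, since the slide does not cover that case.

```python
from typing import Sequence, Tuple


def evaluate_top_k(true_label: str, ranked_labels: Sequence[str], k: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one test query given its top-k predicted labels."""
    if k <= 0:
        raise ValueError("k must be positive")
    # delta(c_T in C_{T,K}): 1 if the ground-truth label is among the top k, else 0.
    hit = 1.0 if true_label in ranked_labels[:k] else 0.0
    precision = hit / k
    recall = hit
    if precision + recall == 0.0:
        return precision, recall, 0.0  # assumed convention for the undefined case
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ranked prediction for one query.
ranked = ["Sports/NBA", "Sports/Basketball", "Computers/Hardware"]
p, r, f1 = evaluate_top_k("Sports/Basketball", ranked, k=3)
```

Because each query has a single ground-truth label, precision at K can be at most 1/K, so the scores of different approaches are comparable only at the same K.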

Overall results
1) The naïve context-aware baseline consistently outperforms the non-context-aware baseline.
2) CRFs consistently outperform the two baselines.
3) CRF-B-C-T > CRF-B-C > CRF-B: click information and taxonomy based relevance are useful.

Case study
– Context about travel
– Click a travel guide web page
– Give the most appropriate label in the first position

Efficiency of Our Approach
Offline training:
– Each iteration takes about 300 ms
– The time cost of training a CRF is acceptable
Online cost:
– Calculating features: label confidence

Conclusion and Future work
In this paper, we propose a novel approach to query classification that models query context via CRFs. Experiments on a real search log clearly show that our approach outperforms a non-context-aware baseline and a naïve context-aware baseline. The current approach cannot leverage contextual information for the queries at the beginning of sessions, which motivates our follow-up research on leveraging more contextual information from outside the session.

Thanks