Context-Aware Query Classification Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang Microsoft Research Asia SIGIR.

Context-Aware Query Classification Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, Qiang Yang Microsoft Research Asia SIGIR Summarized and presented by Sang-il Song, IDS Lab., Seoul National University

Copyright © 2010 by CEBT
Query Classification
- Query Classification (QC)
  - Understanding the user's search intent
  - Classifying user queries into predefined target categories
- Differences from traditional text classification
  - Queries are usually very short
  - Many queries are ambiguous, so a query may belong to multiple categories
- Approaches
  - Augmenting the queries with extra data (search results)
  - Leveraging unlabeled data to improve the accuracy of supervised learning
  - Expanding the training data by automatically labeling queries in click-through data via self-training
- These approaches do not consider the user's behavior history

Context Query Classification
- Motivating example
  - Query “Jaguar” without context: ambiguous whether the user is interested in “car” or “animal”
  - Query “jaguar” before “BMW”: clear that the user is interested in “car”
- Context information
  - Adjacent queries
  - Clicked URLs
- This paper models context information with a CRF

User Session
- User search session
  - A series of observations
  - Each observation consists of a query and the set of URLs the user clicked for that query
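
A minimal sketch of the session structure described on this slide, assuming a session is simply an ordered list of (query, clicked URLs) observations. The class names `Observation` and `Session`, the helper `context_of`, and the example.com URLs are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Observation:
    """One step of a session: a query plus the URLs clicked for it."""
    query: str
    clicked_urls: Set[str] = field(default_factory=set)


@dataclass
class Session:
    """An ordered series of observations issued by one user."""
    observations: List[Observation] = field(default_factory=list)

    def context_of(self, t: int) -> List[Observation]:
        """Return the observations that precede position t (0-based)."""
        return self.observations[:t]


# Hypothetical two-query session; the URLs are placeholders.
session = Session([
    Observation("BMW", {"https://example.com/bmw"}),
    Observation("jaguar", {"https://example.com/jaguar-cars"}),
])
```

Here `session.context_of(1)` returns the single "BMW" observation, which is the context available when classifying "jaguar"; the first query of a session has an empty context.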

Taxonomy
- Taxonomy
  - A tree of categories
  - Each node corresponds to a predefined category

Conditional Random Field
- Undirected graphical model
- Input sequence
- p_ij depends on the feature functions
- Motivation for using a CRF
  - Suitable for capturing context information
  - Does not need any prior knowledge
  - Flexible enough to incorporate richer features
[Figure: diagram of states s1 to s4 connected by transition probabilities p_ij]

Context-Aware QC with CRF
[Figure: example session with a Category Label layer. Text in figure: world cup, worldcup.fifa.com, fifa, fifa10.ea.com, fifa news, fifaworldcup.ea.com, soccer, game]

Conditional Probability
- Category label sequence
- Observation sequence
- Conditional probability
  - Z(o): normalization factor
- Potential function
  - f_k: feature function
  - λ_k: weight of f_k
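
For reference, the quantities listed on this slide fit the standard linear-chain CRF form, written out below. This is the textbook formulation, not a transcription of the slide: the symbols c = (c_1, ..., c_T) for the label sequence, o for the observation sequence, the dummy start label c_0, and the exact arguments of f_k are assumptions.

```latex
\[
p(\mathbf{c} \mid \mathbf{o})
  = \frac{1}{Z(\mathbf{o})} \prod_{t=1}^{T} \phi(c_{t-1}, c_t, \mathbf{o}),
\qquad
Z(\mathbf{o}) = \sum_{\mathbf{c}'} \prod_{t=1}^{T} \phi(c'_{t-1}, c'_t, \mathbf{o})
\]
\[
\phi(c_{t-1}, c_t, \mathbf{o})
  = \exp\Big( \sum_{k} \lambda_k \, f_k(c_{t-1}, c_t, \mathbf{o}) \Big)
\]
```

Z(o) sums the same product over every possible label sequence, so the probabilities of all label sequences for a given session add up to one.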

Training and Classification
- Training
  - Given training data
  - Objective: find a set of parameters that maximizes the conditional log-likelihood
- Classification
  - Inferring the category label c_t for the test query
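
In the usual notation for CRF training, the objective named on this slide can be written as below. The symbols N (number of labeled sessions), c^(i) and o^(i) (label and observation sequences of session i), and Λ (the set of weights) are assumptions made for this sketch.

```latex
\[
\Lambda^{*} = \arg\max_{\Lambda}
  \sum_{i=1}^{N} \log p\big(\mathbf{c}^{(i)} \mid \mathbf{o}^{(i)}; \Lambda\big),
\qquad
\Lambda = \{\lambda_k\}
\]
```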

Features
Type               | Feature                                            | What does it use?
Local feature      | Query terms                                        |
Local feature      | Pseudo feedback                                    | External Web directory
Local feature      | Implicit feedback                                  | External Web directory + click information
Contextual feature | Direct association between adjacent labels         | Previous labels
Contextual feature | Taxonomy-based association between adjacent labels | Taxonomy structure

Local Feature
- Query terms
  - Elementary feature
  - Too sparse: the training data cannot cover the terms sufficiently
- Pseudo feedback
  - Uses the top M results returned by an external Web directory
  - Maps each result's category label to a category in the target taxonomy
  - General label confidence: the number of returned search results for the query whose category labels, after mapping, are the given category
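
The pseudo-feedback score can be sketched as follows. The slide describes the score as a count; dividing by the number of returned results M to get a value between 0 and 1 is an assumption of this sketch, as are the function name `general_label_confidence` and the example directory labels.

```python
from collections import Counter
from typing import Dict, List


def general_label_confidence(
    result_labels: List[str],
    label_map: Dict[str, str],
) -> Dict[str, float]:
    """Score each target category by the share of top-M results mapped to it.

    result_labels: directory labels of the top M results for one query.
    label_map: mapping from directory labels to target-taxonomy categories.
    """
    m = len(result_labels)
    if m == 0:
        return {}
    # Results whose directory label has no mapping contribute to no category.
    mapped = [label_map[label] for label in result_labels if label in label_map]
    counts = Counter(mapped)
    # Assumption: normalize the count by M.
    return {category: n / m for category, n in counts.items()}


# Hypothetical directory labels for the top 4 results of the query "jaguar".
labels = ["Autos/Makes", "Autos/Dealers", "Science/Mammals", "Unmapped/Other"]
mapping = {
    "Autos/Makes": "Car",
    "Autos/Dealers": "Car",
    "Science/Mammals": "Animal",
}
conf = general_label_confidence(labels, mapping)
```

With these inputs, two of the four results map to "Car" and one maps to "Animal".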

Local Features (contd.)
- Implicit feedback
  - Similar to pseudo feedback, but uses click information
  - Click-based label confidence score
  - Calculation
    1. Using the Web directory, get the corresponding categories
    2. Obtain a document collection for each possible query
    3. Build a Vector Space Model for each category
    4. Use the cosine similarity between the term vector of the category and the snippets of the clicked URLs
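
Steps 3 and 4 can be illustrated with a plain bag-of-words model. This is a sketch under simplifying assumptions (raw term counts, whitespace tokenization, no TF-IDF weighting or stop-word removal); the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts using lowercase whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def click_based_confidence(
    category_docs: Dict[str, List[str]],
    clicked_snippets: List[str],
) -> Dict[str, float]:
    """Compare each category's term vector with the clicked snippets."""
    snippet_vec = term_vector(" ".join(clicked_snippets))
    return {
        category: cosine(term_vector(" ".join(docs)), snippet_vec)
        for category, docs in category_docs.items()
    }


# Toy document collections per category and one clicked snippet.
docs = {
    "Car": ["luxury car dealer sedan"],
    "Animal": ["big cat jungle predator"],
}
scores = click_based_confidence(docs, ["jaguar luxury sedan dealer"])
```

The snippet shares three of its four terms with the "Car" collection and none with "Animal", so the click information favors "Car".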

Contextual Features
- Direct association between adjacent labels
  - Uses the occurrence of a pair of labels
  - The higher the weight, the larger the probability that the previous label transits into the current label
- Taxonomy-based association between adjacent labels
  - Limited by the size of the training data, some transitions may not occur
  - Uses the structure of the taxonomy
  - The association between two sibling categories is stronger than that between two non-sibling categories
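
Both contextual signals can be sketched in a few lines, assuming a two-level taxonomy stored as a child-to-parent dictionary. The helper names are hypothetical, and the "Food & Cooking" entry is an illustrative sibling that does not come from the slides.

```python
from collections import Counter
from typing import Dict, List


def direct_association_counts(label_sequences: List[List[str]]) -> Counter:
    """Count how often each ordered pair of adjacent labels occurs."""
    counts: Counter = Counter()
    for labels in label_sequences:
        counts.update(zip(labels, labels[1:]))
    return counts


def are_siblings(a: str, b: str, parent: Dict[str, str]) -> bool:
    """True if a and b are distinct categories with the same parent."""
    return a != b and a in parent and parent.get(a) == parent.get(b)


# Child -> parent map for a tiny two-level taxonomy.
PARENT = {
    "Travel & Vacation": "Living",
    "Food & Cooking": "Living",  # illustrative sibling, not from the slides
    "Local & Regional": "Information",
}

# Label sequences of two hypothetical labeled sessions.
sequences = [
    ["Local & Regional", "Travel & Vacation"],
    ["Local & Regional", "Travel & Vacation", "Food & Cooking"],
]
pair_counts = direct_association_counts(sequences)
```

A transition that never appears in `pair_counts` gets no support from the direct association alone, which is where the sibling test can still supply a signal.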

Experimental Setup
- Target taxonomy: the taxonomy of ACM KDD Cup'05
  - 7 level-one categories
  - 67 level-two categories
- Data set
  - 10,000 sessions extracted from one day's search log
  - Each session contains at least two queries
  - Three human labelers labeled the queries of each session

Baseline
- Bridging classifier (BC)
  - Trains a classifier on an intermediate taxonomy
  - Bridges the queries and the target taxonomy in the online step of QC
  - Outperforms the winning approach in KDD Cup'05
- Collaborating classifier (CC)
  - A naïve context-aware approach
  - Defines the score function of query q and category c using BC
  - Uses the current query and the past query, together with the association between the previous category and the estimated category

Evaluation
- For a test query, there is a true category label
- The classification result is a set of the top K predicted category labels
- Recall
- Precision
- F1 score
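
One way to compute these measures is sketched below. It assumes exactly one true label per test query, counts a hit when that label appears among the top K predictions, and divides hits by n for recall and by n·K for precision. These definitions are assumptions of this sketch, and the function name `evaluate_top_k` is hypothetical.

```python
from typing import List, Tuple


def evaluate_top_k(
    true_labels: List[str],
    ranked_predictions: List[List[str]],
    k: int,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over n test queries at cutoff k."""
    n = len(true_labels)
    if n == 0 or k <= 0:
        raise ValueError("need at least one test query and k >= 1")
    # A hit means the true label is among the top-k predicted labels.
    hits = sum(
        1
        for truth, ranked in zip(true_labels, ranked_predictions)
        if truth in ranked[:k]
    )
    recall = hits / n
    precision = hits / (n * k)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Two hypothetical test queries with their ranked predictions.
truth = ["Car", "Animal"]
predicted = [["Car", "Animal"], ["Car", "Travel"]]
precision, recall, f1 = evaluate_top_k(truth, predicted, k=2)
```

Under these definitions, raising K can only keep or increase recall, while precision is penalized for every extra predicted label.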

Results
- CRF-B: CRF with basic features (query terms, general label confidence, and direct association between adjacent labels)
- CRF-B-C: CRF-B + click-based label confidence
- CRF-B-C-T: CRF-B-C + taxonomy-based association
[Figure: the average overall recall]

Results (contd.)
[Figure: the average overall F1 score]
[Figure: the average overall precision]

Case Study
- Without considering context, there are many possible search intents
  - General information about Santa Fe => Information\Local & Regional
  - Travel information about Santa Fe => Living\Travel & Vacation

Conclusions
- A novel approach for leveraging context information to classify queries by modeling search context with CRFs
- The approach consistently outperforms a non-context-aware baseline and a naïve context-aware baseline
  - This shows the effectiveness of context information

Discussions
- Experiments on a real data set clearly show that this approach outperforms the non-context-aware baseline
- The first-query problem
  - No search context can be found if the query is at the beginning of the session
- The experiments are too simple
  - Size of sessions
  - Height of the taxonomy

Q & A Thank you