Date: 2012/5/28 Source: Alexander Kotov et al. (CIKM'11) Advisor: Jia-ling Koh Speaker: Jiun-Jia Chiou Title: Interactive Sense Feedback for Difficult Queries




Similar presentations
Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Alexander Kotov and ChengXiang Zhai University of Illinois at Urbana-Champaign.
Learning to Suggest: A Machine Learning Framework for Ranking Query Suggestions Date: 2013/02/18 Author: Umut Ozertem, Olivier Chapelle, Pinar Donmez,
A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date : 2014/04/15 Source : KDD’13 Authors : Chi Wang, Marina Danilevsky, Nihit.
Automatically Acquiring a Semantic Network of Related Concepts Date: 2011/11/14 Source: Sean Szumlanski et al. (CIKM'10) Advisor: Jia-ling Koh Speaker:
DOMAIN DEPENDENT QUERY REFORMULATION FOR WEB SEARCH Date : 2013/06/17 Author : Van Dang, Giridhar Kumaran, Adam Troy Source : CIKM’12 Advisor : Dr. Jia-Ling.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Query Operations: Automatic Local Analysis. Introduction Difficulty of formulating user queries –Insufficient knowledge of the collection –Insufficient.
An investigation of query expansion terms Gheorghe Muresan Rutgers University, School of Communication, Information and Library Science 4 Huntington St.,
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
Evaluating the Contribution of EuroWordNet and Word Sense Disambiguation to Cross-Language Information Retrieval Paul Clough 1 and Mark Stevenson 2 Department.
Exploiting Wikipedia as External Knowledge for Document Clustering Sakyasingha Dasgupta, Pradeep Ghosh Data Mining and Exploration-Presentation School.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Minimal Test Collections for Retrieval Evaluation B. Carterette, J. Allan, R. Sitaraman University of Massachusetts Amherst SIGIR2006.
Short Text Understanding Through Lexical-Semantic Analysis
Focused Matrix Factorization for Audience Selection in Display Advertising BHARGAV KANAGAL, AMR AHMED, SANDEEP PANDEY, VANJA JOSIFOVSKI, LLUIS GARCIA-PUEYO,
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.
A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
Towards Natural Question-Guided Search Alexander Kotov ChengXiang Zhai University of Illinois at Urbana-Champaign.
Date: 2012/3/5 Source: Marcus Fontoura et al. (CIKM'11) Advisor: Jia-ling Koh Speaker: Jiun-Jia Chiou 1 Efficiently encoding term co-occurrences in inverted.
FINDING RELEVANT INFORMATION OF CERTAIN TYPES FROM ENTERPRISE DATA Date: 2012/04/30 Source: Xitong Liu (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling.
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval Ben Cartrette and Praveen Chandar Dept. of Computer and Information Science.
BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Wikipedia as Sense Inventory to Improve Diversity in Web Search Results Celina Santamaria, Julio Gonzalo, Javier Artiles nlp.uned.es UNED, c/Juan del Rosal,
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
1 Using The Past To Score The Present: Extending Term Weighting Models with Revision History Analysis CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG,
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Query Suggestion. n A variety of automatic or semi-automatic query suggestion techniques have been developed  Goal is to improve effectiveness by matching.
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
LOGO Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor : Dr. Koh Jia-Ling Speaker : Tu.
A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow University of Saarland,Germany.
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.
Information Retrieval using Word Senses: Root Sense Tagging Approach Sang-Bum Kim, Hee-Cheol Seo and Hae-Chang Rim Natural Language Processing Lab., Department.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Web Search and Text Mining Lecture 5. Outline Review of VSM More on LSI through SVD Term relatedness Probabilistic LSI.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
1 Evaluating High Accuracy Retrieval Techniques Chirag Shah,W. Bruce Croft Center for Intelligent Information Retrieval Department of Computer Science.
Improved Video Categorization from Text Metadata and User Comments ACM SIGIR 2011:Research and development in Information Retrieval - Katja Filippova -
Unsupervised Auxiliary Visual Words Discovery for Large-Scale Image Object Retrieval Yin-Hsi Kuo1,2, Hsuan-Tien Lin 1, Wen-Huang Cheng 2, Yi-Hsuan Yang.
Active Feedback in Ad Hoc IR Xuehua Shen, ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.
Learning to Estimate Query Difficulty Including Applications to Missing Content Detection and Distributed Information Retrieval Elad Yom-Tov, Shai Fine,
Compact Query Term Selection Using Topically Related Text Date : 2013/10/09 Source : SIGIR’13 Authors : K. Tamsin Maxwell, W. Bruce Croft Advisor : Dr.Jia-ling,
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang
The Loquacious ( 愛說話 ) User: A Document-Independent Source of Terms for Query Expansion Diane Kelly et al. University of North Carolina at Chapel Hill.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
Leveraging Knowledge Bases for Contextual Entity Exploration Categories Date:2015/09/17 Author:Joonseok Lee, Ariel Fuxman, Bo Zhao, Yuanhua Lv Source:KDD'15.
DISTRIBUTED INFORMATION RETRIEVAL Lee Won Hee.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
GENERATING RELEVANT AND DIVERSE QUERY PHRASE SUGGESTIONS USING TOPICAL N-GRAMS ELENA HIRST.
The Effect of Database Size Distribution on Resource Selection Algorithms Luo Si and Jamie Callan School of Computer Science Carnegie Mellon University.
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
Harbin Institute of Technology Information Retrieval Lab: HITIR's Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.
University of Seoul, Ubiquitous Sensor Network Lab: Query Dependent Pseudo-Relevance Feedback based on Wikipedia. School of Electronic, Electrical and Computer Engineering, USN Lab
2016/9/301 Exploiting Wikipedia as External Knowledge for Document Clustering Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou Proceeding.
Queensland University of Technology
and Knowledge Graphs for Query Expansion Saeid Balaneshinkordan
Compact Query Term Selection Using Topically Related Text
Topic: Semantic Text Mining
Presentation transcript:


Outline
- Introduction: query ambiguity
- Interactive sense feedback: sense detection, sense presentation
- Experiments: upper-bound performance, user study
- Conclusion

Ambiguity of query terms is a common cause of inaccurate retrieval results: query ambiguity is one of the main reasons for poor retrieval results, and difficult queries are often ambiguous. Ambiguous queries contain one or several polysemous terms. For example, for the query "cardinals", does the user mean birds, sports, or clergy? Fully automatic sense identification and disambiguation is a challenging task.

Existing solutions fall short. Diversification does not help when the target sense is a minority sense. Relevance feedback does not help when the top-ranked documents are irrelevant. Users submit ambiguous queries and spend time and effort perusing search results, not realizing that the sense of a polysemous term that they had in mind is not the most common sense in the collection being searched.

Can search systems improve the results for difficult queries by naturally leveraging user interaction to resolve lexical ambiguity? Search systems can infer the collection-specific senses of query terms. Ideally, sense suggestions are presented as clarification questions of the form "Did you mean [term] as [sense]?" Users can then leverage their intelligence and world knowledge to interpret the signals from the system and make the final decision.

Contributions:
- Compare sense detection algorithms based on their upper-bound retrieval performance and select the best-performing one.
- Propose several methods for concise representation of the discovered senses.
- Evaluate the effectiveness of each method with the actual retrieval performance of user sense selections.

Overview of the approach:
Step 1: construct a contextual term similarity matrix.
Step 2: construct a query term similarity graph.
Step 3: label and present the senses to the users.
Step 4: update the query language model using user feedback.

Step 1 algorithm: build S, a sparse matrix in which S_ij is the strength of semantic relatedness of the words w_i and w_j in a document collection C. Two weighting schemes are compared: mutual information (MI) and Hyperspace Analog to Language (HAL). For MI, X_w and X_v are binary variables indicating whether w or v is present or absent in a document.

The more two words tend to occur in the same documents, the more semantically related they are. Mutual information measures the strength of association between the two words and can be considered a measure of their semantic relatedness. The component probabilities, which sum to 1, are estimated from document counts over N documents:

p(X_w = 1) = c(X_w = 1) / N
p(X_w = 0) = 1 - p(X_w = 1)
p(X_v = 1) = c(X_v = 1) / N
p(X_v = 0) = 1 - p(X_v = 1)
p(X_w = 1, X_v = 1) = c(X_w = 1, X_v = 1) / N
p(X_w = 1, X_v = 0) = [c(X_w = 1) - c(X_w = 1, X_v = 1)] / N
p(X_w = 0, X_v = 1) = [c(X_v = 1) - c(X_w = 1, X_v = 1)] / N
p(X_w = 0, X_v = 0) = 1 - p(X_w = 1, X_v = 0) - p(X_w = 0, X_v = 1) - p(X_w = 1, X_v = 1)

Example with six documents: Word1 occurs in documents {D1, D3, D4, D6}; Word2 in {D2, D3, D4, D6}; Word3 in {D1, D3, D6}; Word4 in {D1, D3, D4, D6}.
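The MI computation can be sketched in a few lines. This is a toy illustration of document-level mutual information over the slide's example (the word and document names are the example's, the function name is an assumption, and it is not the paper's code):

```python
from math import log2

# Toy collection from the slide: the documents each word occurs in.
N_DOCS = 6
occurs = {
    "word1": {"D1", "D3", "D4", "D6"},
    "word2": {"D2", "D3", "D4", "D6"},
    "word3": {"D1", "D3", "D6"},
    "word4": {"D1", "D3", "D4", "D6"},
}

def mutual_information(w, v):
    """MI between the binary presence variables X_w and X_v,
    with probabilities estimated from document counts as above."""
    c11 = len(occurs[w] & occurs[v])          # both present
    c10 = len(occurs[w]) - c11                # only w present
    c01 = len(occurs[v]) - c11                # only v present
    c00 = N_DOCS - c11 - c10 - c01            # both absent
    p_w, p_v = len(occurs[w]) / N_DOCS, len(occurs[v]) / N_DOCS
    mi = 0.0
    for count, pw, pv in ((c11, p_w, p_v), (c10, p_w, 1 - p_v),
                          (c01, 1 - p_w, p_v), (c00, 1 - p_w, 1 - p_v)):
        p_joint = count / N_DOCS
        if p_joint > 0:  # 0 * log 0 is taken as 0
            mi += p_joint * log2(p_joint / (pw * pv))
    return mi

# word1 and word4 occur in exactly the same documents, so their MI
# exceeds that of word1 and word2, which share only three documents.
print(mutual_information("word1", "word4") > mutual_information("word1", "word2"))  # True
```
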

HAL (Hyperspace Analog to Language): constructing the HAL space for an n-term vocabulary involves traversing a sliding window of width w over each word in the corpus, ignoring punctuation, sentence, and paragraph boundaries. Words within the window before the center word contribute to the center word's co-occurrence counts, with closer words weighted more heavily. [Slide: the HAL space matrix H for the sentence "the effects of pollution on the population" with window size w = 10.]
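The sliding-window construction can be sketched as follows. This is a minimal illustration of the idea, not the paper's code; the weighting used here (a word at distance d from the center gets weight w - d + 1) is the common HAL formulation and is an assumption:

```python
from collections import defaultdict

def hal_space(tokens, w=10):
    """Sketch of HAL-space construction: slide a window of width w over
    the token stream; each word preceding the center word contributes a
    weight that decreases with its distance from the center."""
    h = defaultdict(lambda: defaultdict(int))
    for i, center in enumerate(tokens):
        for j in range(max(0, i - w), i):
            distance = i - j
            h[center][tokens[j]] += w - distance + 1
    return h

sentence = "the effects of pollution on the population".split()
h = hal_space(sentence, w=10)
print(h["pollution"]["of"])    # "of" directly precedes "pollution": weight 10
print(h["population"]["the"])  # "the" at distances 1 and 6: 10 + 5 = 15
```
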

Global co-occurrence matrix: produced by merging the row and column corresponding to each term in the HAL space matrix. Each term t corresponds to a row H_t = {(t_1, c_1), ..., (t_m, c_m)} of the global co-occurrence matrix: the number of co-occurrences of the term t with every other term in the vocabulary.

Each row is then normalized to obtain the contextual term similarity matrix for the collection. [Slide: the global co-occurrence counts and the resulting row-normalized similarity matrix over the terms "the", "effects", "of", "pollution", "on", "population"; e.g., the row for "effects" becomes 7/20, 0, 5/20, 4/20, 3/20, 1/20.]
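Row normalization can be sketched as follows (a hypothetical helper, not the paper's code): each term's co-occurrence counts are divided by the row total so that every row sums to 1.

```python
def contextual_similarity(global_cooc):
    """Row-normalize a global co-occurrence matrix (a dict of dicts of
    counts) into a contextual term similarity matrix."""
    sim = {}
    for term, row in global_cooc.items():
        total = sum(row.values())
        sim[term] = {other: count / total for other, count in row.items()}
    return sim

# Counts for one row of the slide's example ("effects"); row total is 20.
cooc = {"effects": {"the": 7, "of": 5, "pollution": 4, "on": 3, "population": 1}}
sim = contextual_similarity(cooc)
print(sim["effects"]["the"])  # 7/20 = 0.35
```
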

Step 2: for each query term, construct a query term similarity graph. Sense detection algorithm: cluster the term graph, treating each cluster as one sense. Two clustering algorithms are used: community clustering (CC) and clustering by committee (CBC).
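A much-simplified stand-in for this step: the paper clusters the term graph with CC or CBC, but the overall shape can be illustrated by thresholding the similarity matrix into a graph and treating connected components as candidate senses. The terms, scores, and threshold below are all made up for illustration:

```python
from collections import defaultdict

def term_graph(sim, term, threshold=0.1):
    """Nodes: terms contextually similar to the query term.
    Edges: pairs of nodes whose mutual similarity passes the threshold."""
    nodes = {u for u, s in sim.get(term, {}).items() if s >= threshold}
    edges = defaultdict(set)
    for u in nodes:
        for v in nodes:
            if u != v and sim.get(u, {}).get(v, 0.0) >= threshold:
                edges[u].add(v)
                edges[v].add(u)
    return nodes, edges

def detect_senses(nodes, edges):
    """Each connected component of the term graph is one candidate sense
    (a naive substitute for community clustering)."""
    seen, senses = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(edges[node] - seen)
        senses.append(component)
    return senses

# Hypothetical similarities around the ambiguous query term "cardinals".
sim = {
    "cardinals": {"bird": 0.3, "feather": 0.2, "baseball": 0.3, "inning": 0.2},
    "bird": {"feather": 0.4},
    "feather": {"bird": 0.4},
    "baseball": {"inning": 0.5},
    "inning": {"baseball": 0.5},
}
senses = detect_senses(*term_graph(sim, "cardinals"))
print(sorted(sorted(s) for s in senses))  # [['baseball', 'inning'], ['bird', 'feather']]
```
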

Step 3: label and present the senses to the users, either by showing the top k terms with the highest probability in the sense language model, or by selecting a small number of the most representative terms from the sense language model as a sense label.
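The top-k labeling variant can be sketched in one function; the sense language model below is made up for illustration:

```python
def label_sense(sense_lm, k=3):
    """Label a sense by the k terms with the highest p(term | theta_s)."""
    return [t for t, _ in sorted(sense_lm.items(), key=lambda kv: -kv[1])[:k]]

# Hypothetical sense LM for a "cancer treatment" sense.
sense_lm = {"treatment": 0.30, "therapy": 0.25, "patients": 0.20,
            "radiation": 0.15, "drug": 0.10}
print(label_sense(sense_lm, k=3))  # ['treatment', 'therapy', 'patients']
```
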

Step 4: update the query language model using user feedback: the language model of the sense selected by the user (in the slide's example, the sense "Fiscal") is used to adjust the term probabilities of the query language model.
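One plausible form of this update is linear interpolation of the query language model with the selected sense's language model. The interpolation weight alpha and the example distributions are assumptions for illustration, not values from the paper:

```python
def update_query_lm(query_lm, sense_lm, alpha=0.5):
    """Mix the original query LM with the LM of the user-selected sense:
    p'(w) = (1 - alpha) * p(w | q) + alpha * p(w | s)."""
    vocab = set(query_lm) | set(sense_lm)
    return {w: (1 - alpha) * query_lm.get(w, 0.0) + alpha * sense_lm.get(w, 0.0)
            for w in vocab}

query_lm = {"radio": 0.4, "waves": 0.3, "cancer": 0.3}
selected_sense_lm = {"cancer": 0.5, "treatment": 0.3, "therapy": 0.2}
updated = update_query_lm(query_lm, selected_sense_lm, alpha=0.5)
print(round(sum(updated.values()), 10))  # still a distribution: 1.0
```

Because both inputs are probability distributions, the interpolated model is again a distribution, regardless of alpha.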

KL-divergence retrieval model: each document D_i in the collection C = {D_1, ..., D_m} is scored by the KL divergence between the query language model theta_q and the document language model theta_Di:

D_KL(theta_q || theta_D) = sum over w of p(w | theta_q) * log( p(w | theta_q) / p(w | theta_D) )

Worked example (log base 2). Query term counts A:2, B:3, C:2, D:3 give theta_q = (A: 0.2, B: 0.3, C: 0.2, D: 0.3); document term counts A:2, B:2, C:1, D:3, E:2 give theta_D = (A: 0.2, B: 0.2, C: 0.1, D: 0.3, E: 0.2).
A: 0.2 * log(0.2/0.2) = 0
B: 0.3 * log(0.3/0.2) ≈ 0.175
C: 0.2 * log(0.2/0.1) = 0.2
D: 0.3 * log(0.3/0.3) = 0
E: contributes 0 (it does not occur in the query)
D_KL ≈ 0.375. Terms to which the document model assigns less probability than the query model increase the KL value; documents with smaller divergence match the query better.
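The worked example can be checked with a few lines (log base 2; this sketch assumes every query term has nonzero probability in the document model, whereas real systems smooth theta_D to avoid division by zero):

```python
from math import log2

def kl_divergence(theta_q, theta_d):
    """D_KL(theta_q || theta_D), summed over the query's vocabulary."""
    return sum(p * log2(p / theta_d[w]) for w, p in theta_q.items() if p > 0)

theta_q = {"A": 0.2, "B": 0.3, "C": 0.2, "D": 0.3}            # query counts 2, 3, 2, 3
theta_d = {"A": 0.2, "B": 0.2, "C": 0.1, "D": 0.3, "E": 0.2}  # doc counts 2, 2, 1, 3, 2
print(round(kl_divergence(theta_q, theta_d), 3))  # 0.375
```
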

Datasets: 3 TREC collections. Upper-bound experiments: try all detected senses for all query terms and study the potential of sense feedback for improving retrieval results. User study: present the labeled senses to the users and see whether users can recognize the best-performing sense; determine the retrieval performance of user-selected senses.

[Retrieval results table for difficult topics.] * indicates a statistically significant difference relative to KL (95% confidence level), according to the Wilcoxon signed-rank test; † indicates a statistically significant difference relative to KL-PF (95% confidence level), according to the same test.

Sense feedback (SF) improved more difficult queries than pseudo feedback (PF) on all datasets. [Table: for AP, ROBUST, and AQUAINT, the total number of queries, the counts of difficult (Diff) and normal (Norm) queries, and the counts of difficult (Diff+) and normal (Norm+) queries improved by PF and by SF.]

- Community clustering (CC) outperforms clustering by committee (CBC).
- HAL scores are more effective than mutual information (MI).
- Sense feedback performs better than PF on difficult query sets.

Sample senses discovered for the term "cancer" in the query "radio waves and brain cancer", using the community clustering algorithm in combination with the HAL scores: cancer research; different types of cancer; cancer treatment; cancer statistics in the US.

User study setup: 50 AQUAINT queries along with senses determined using CC and HAL. Senses were presented as: 1, 2, or 3 sense label terms produced by the labeling algorithm (LAB1, LAB2, LAB3), or the 3 or 10 terms with the highest probability p(term | theta_s) in the sense language model (SLM3, SLM10). From all senses of all query terms, users were asked to pick one sense using each of the sense presentation methods.

Users selected the optimal query term for disambiguation for more than half of the queries. [Table: percentage of users selecting the optimal sense of the optimal term for sense feedback (in boldface) and the optimal term but a suboptimal sense (in parentheses).]

Users' sense selections do not achieve the upper bound, but consistently improve over the baselines (0.2286).

Conclusion:
- Interactive sense feedback is a new alternative feedback method.
- The proposed methods for sense detection and representation are effective for both normal and difficult queries.
- Upper-bound performance is promising on all collections.
- User studies demonstrated that users can recognize the best-performing sense in over 50% of the cases, and that user-selected senses can effectively improve retrieval performance for difficult queries.

END

Supplementary material: Mean Average Precision (MAP)
Query q1: R_q1 = {d3, d56, d129}; the three relevant documents are ranked at 3, 8, and 15, so the precisions are 1/3, 2/8, 3/15 and Average Precision = (1/3 + 2/8 + 3/15) / 3 ≈ 0.26.
Query q2: R_q2 = {d3, d9, d25, d56, d123}; the five relevant documents are ranked at 1, 3, 6, 10, and 15, so the precisions are 1/1, 2/3, 3/6, 4/10, 5/15 and Average Precision = (1/1 + 2/3 + 3/6 + 4/10 + 5/15) / 5 = 0.58.
MAP = (0.26 + 0.58) / 2 = 0.42.
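The MAP arithmetic above can be reproduced directly (rank lists taken from the example; the helper name is ours):

```python
def average_precision(relevant_ranks, num_relevant):
    """Mean over all relevant documents of precision at each rank r
    where a relevant document appears (missed documents contribute 0)."""
    return sum((i + 1) / r for i, r in enumerate(relevant_ranks)) / num_relevant

ap1 = average_precision([3, 8, 15], 3)         # q1: relevant docs at ranks 3, 8, 15
ap2 = average_precision([1, 3, 6, 10, 15], 5)  # q2: relevant docs at ranks 1, 3, 6, 10, 15
print(round(ap1, 2), round(ap2, 2), round((ap1 + ap2) / 2, 2))  # 0.26 0.58 0.42
```
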