Interactive Sense Feedback for Difficult Queries
Alexander Kotov and ChengXiang Zhai, University of Illinois at Urbana-Champaign

Roadmap
- Query Ambiguity
- Interactive Sense Feedback
- Experiments: upper-bound performance, user study
- Summary
- Future work

Query ambiguity
Birds? Sports? Clergy?
- Ambiguous queries contain one or several polysemous terms.
- Query ambiguity is one of the main reasons for poor retrieval results (difficult queries are often ambiguous).
- Senses can be major or minor, depending on the collection.
- Automatic sense disambiguation has proved to be a very challenging fundamental problem in NLP and IR [Lesk 86, Sanderson 94].

Query ambiguity (example)
The query "cardinals" matches several senses in the collection: a baseball/college sports team and a bird, while the user's intent is Roman Catholic cardinals.

Query ambiguity
The top documents are irrelevant, so relevance feedback won't help. The target sense is a minority sense, so even diversification doesn't help.
Did you mean "cardinals" as a bird, a team, or clergy?
Can search systems improve the results for difficult queries by naturally leveraging user interaction to resolve lexical ambiguity?

Roadmap
- Query Ambiguity
- Interactive Sense Feedback
- Experiments: upper-bound performance, user study
- Summary
- Future work

Interactive Sense Feedback
Uses global analysis for sense identification:
- does not rely on retrieval results (can be used for difficult queries)
- identifies collection-specific senses and avoids the coverage problem
- identifies both majority and minority senses
- domain independent
Presents concise representations of senses to the users: eliminates the cognitive burden of scanning the results.
Allows the users to make the final disambiguation choice: leverages user intelligence to make the best choice.

Questions
- How can we automatically discover all the senses of a word in a collection?
- How can we present a sense concisely to a user?
- Is interactive sense feedback really useful?

Algorithm for Sense Feedback
1. Preprocess the collection to construct a |V| x |V| global term similarity matrix (each row holds all terms semantically related to one term in the vocabulary V).
2. For each query term, construct a term graph.
3. Cluster the term graph (each cluster corresponds to a sense).
4. Label the senses and present them to the users.
5. Update the query language model using the user's feedback.
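The five steps above can be sketched as follows. This is a minimal sketch, not the authors' implementation: `term_sim` stands in for the precomputed global similarity matrix, and a simple connected-components pass stands in for the community-clustering step described on the next slide; all names are illustrative.

```python
def sense_feedback(query_terms, term_sim, top_k=20):
    """Steps 2-3 of the slide: build a term graph per query term,
    then cluster it; each cluster is one candidate sense.

    term_sim: {term: {related_term: similarity}} rows of the
    precomputed global term similarity matrix (an assumed format).
    """
    senses = {}
    for t in query_terms:
        # Step 2: term graph over the top-k most similar terms.
        neighbors = sorted(term_sim.get(t, {}).items(),
                           key=lambda kv: kv[1], reverse=True)[:top_k]
        nodes = [w for w, _ in neighbors]
        edges = {w: {u for u in term_sim.get(w, {}) if u in nodes and u != w}
                 for w in nodes}
        # Step 3: cluster the graph (connected components as a stand-in
        # for community clustering).
        senses[t] = connected_components(nodes, edges)
    return senses

def connected_components(nodes, edges):
    seen, clusters = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(edges.get(v, ()))
        seen |= comp
        clusters.append(sorted(comp))
    return clusters
```

For an ambiguous term like "jaguar", the neighbors split into disconnected groups (car terms vs. animal terms), and each group becomes a candidate sense.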

Sense detection
Methods for term similarity matrix construction:
- Mutual Information (MI) [Church 89]
- Hyperspace Analog to Language (HAL) scores [Burgess 98]
Clustering algorithms:
- Community clustering (CC) [Clauset 04]
- Clustering by committee (CBC) [Pantel 02]
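As a concrete, hedged reading of the MI option, the sketch below computes pointwise mutual information between co-occurring terms. Co-occurrence is counted at the document level here for brevity (window-based counts are common in practice), and the function name and interface are illustrative, not from the paper.

```python
import math
from collections import Counter
from itertools import combinations

def mi_matrix(docs, min_count=1):
    """Build a symmetric term-term PMI matrix from a list of
    tokenized documents: PMI(a, b) = log(P(a, b) / (P(a) * P(b)))."""
    n = len(docs)
    tf, cf = Counter(), Counter()
    for doc in docs:
        terms = set(doc)                       # presence, not frequency
        tf.update(terms)
        cf.update(frozenset(p) for p in combinations(sorted(terms), 2))
    sim = {}
    for pair, c in cf.items():
        if c < min_count:
            continue
        a, b = tuple(pair)
        # P(a,b)=c/n, P(a)=tf[a]/n, P(b)=tf[b]/n  =>  log(c*n / (tf[a]*tf[b]))
        pmi = math.log((c * n) / (tf[a] * tf[b]))
        sim.setdefault(a, {})[b] = pmi
        sim.setdefault(b, {})[a] = pmi
    return sim
```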

Sense representation
Sort the terms in a cluster by the sum of the weights of their edges to neighbors.
Greedy algorithm for sense labeling:
While uncovered terms remain:
1. Select the uncovered term with the highest weight and add it to the set of sense labels.
2. Add the terms related to the selected term to the cover.
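The greedy covering loop above can be written directly. `cluster_weights` and `related` are assumed input formats for one sense cluster, not the paper's data structures, and `max_labels` is an illustrative cap corresponding to the LAB1/LAB2/LAB3 settings in the user study.

```python
def label_sense(cluster_weights, related, max_labels=3):
    """Greedy set-cover labeling: repeatedly pick the heaviest
    uncovered term as a label and mark its related terms as covered.

    cluster_weights: {term: sum of edge weights to its neighbors}
    related: {term: set of terms related to it within the cluster}
    """
    uncovered = set(cluster_weights)
    labels = []
    while uncovered and len(labels) < max_labels:
        best = max(uncovered, key=lambda t: cluster_weights[t])
        labels.append(best)
        uncovered -= {best} | related.get(best, set())
    return labels
```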

Roadmap
- Query Ambiguity
- Interactive Sense Feedback
- Experiments: upper-bound performance, user study
- Summary
- Future work

Experimental design
Datasets: 3 TREC collections: AP88-89, ROBUST04, and AQUAINT.
Upper-bound experiments: try all detected senses for all query terms and study the potential of sense feedback for improving retrieval results.
User study: present the labeled senses to the users and see whether users can recognize the best-performing sense; determine the retrieval performance of user-selected senses.

Upper-bound performance
- Community clustering (CC) outperforms clustering by committee (CBC).
- HAL scores are more effective than Mutual Information (MI).
- Sense feedback performs better than PRF on difficult query sets.

UB performance for difficult topics
Sense feedback outperforms PRF in terms of MAP (boldface = statistically significant (p < .05) w.r.t. KL; underline = w.r.t. KL-PF).
[Table: MAP of KL, KL-PF, and SF on AP88-89, ROBUST04, and AQUAINT; the numeric values were lost in transcription.]

UB performance for difficult topics
Sense feedback improved more difficult queries than PF in all datasets.
[Table: for AP, ROBUST, and AQUAINT, the total, difficult (Diff), and normal (Norm) query counts, and the numbers of difficult (Diff+) and normal (Norm+) queries improved by PF vs. SF; the numeric values were lost in transcription.]

User study
- 50 AQUAINT queries, with senses determined using CC and HAL.
- Senses presented as: 1, 2, or 3 sense-label terms produced by the labeling algorithm (LAB1, LAB2, LAB3), or the 3 and 10 highest-scoring terms from the sense language model (SLM3, SLM10).
- From all senses of all query terms, users were asked to pick one sense using each of the sense presentation methods.
- The query LM was updated with the LM of the selected sense, and retrieval results for the updated query were used for evaluation.
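The query-LM update in the last step can be sketched as a linear interpolation between the original query language model and the selected sense's language model. The slide does not specify the exact update rule, so the interpolation form and the weight `alpha` are assumptions for illustration.

```python
def update_query_lm(query_lm, sense_lm, alpha=0.5):
    """Mix the original query LM with the user-selected sense LM.

    query_lm, sense_lm: {term: probability}; alpha is an assumed
    interpolation weight. The result is renormalized to sum to 1.
    """
    terms = set(query_lm) | set(sense_lm)
    updated = {t: (1 - alpha) * query_lm.get(t, 0.0)
                  + alpha * sense_lm.get(t, 0.0)
               for t in terms}
    z = sum(updated.values())
    return {t: p / z for t, p in updated.items()}
```

With alpha = 0, the original query is kept unchanged; with alpha = 1, the query is replaced by the sense model, so intermediate values trade off the two.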

User study
Query #378: "euro opposition" (European Union? Currency?)

Sense 1          Sense 2           Sense 3
european  0.044  yen        0.056  exchange  0.08
eu        0.035  frankfurt  0.045  stock     0.075
union     0.035  germany    0.044  currency  0.07
economy   0.032  franc      0.043  price     0.06
country   0.032  pound      0.04   market    0.055

LAB1: [european] [yen] [exchange]
LAB2: [european union] [yen pound] [exchange currency]
LAB3: [european union country] [yen pound bc] [exchange currency central]
SLM3: [european eu union] [yen frankfurt germany] [exchange stock currency]

User study
Users selected the optimal query term for disambiguation for more than half of the queries; the quality of sense selections does not improve with more terms in the label.

         LAB1     LAB2     LAB3     SLM3     SLM10
USER 1   18 (56)  18 (60)  20 (64)  36 (62)  30 (60)
USER 2   24 (54)  18 (50)  12 (46)  20 (42)  24 (54)
USER 3   28 (58)  20 (50)  22 (46)  26 (48)  22 (50)
USER 4   18 (48)  18 (50)  18 (52)  20 (48)  28 (54)
USER 5   26 (64)  22 (60)  24 (58)  24 (56)  16 (50)
USER 6   22 (62)  26 (64)  26 (60)  28 (64)  30 (62)

User study
Users' sense selections do not achieve the upper bound, but consistently improve over the baselines (KL MAP = 0.0474; PF MAP = 0.0371). The quality of sense selections does not improve with more terms in the label.
[Table: per-user MAP for LAB1, LAB2, LAB3, SLM3, and SLM10; the numeric values were lost in transcription.]

Roadmap
- Query Ambiguity
- Interactive Sense Feedback
- Experiments: upper-bound performance, user study
- Summary
- Future work

Summary
- Interactive sense feedback as a new alternative feedback method.
- Proposed methods for sense detection and representation that are effective for both normal and difficult queries.
- Promising upper-bound performance on all collections.
- User studies demonstrated that users can recognize the best-performing sense in over 50% of the cases, and that user-selected senses can effectively improve retrieval performance for difficult queries.

Future work
- Further improve approaches to automatic sense detection and labeling (e.g., using Wikipedia).
- Implement and evaluate sense feedback in a search engine application as a complementary strategy to results diversification.