Presentation transcript:

Predicting Question Quality
Bruce Croft and Stephen Cronen-Townsend
University of Massachusetts Amherst

Topics
- Clarity applied to TREC QA questions
- Clarity applied to Web questions
- Clarity used to predict query expansion

Predicting Question Quality
What we actually predict is the quality of the retrieved passages (or documents).
Basic result: we can predict retrieval performance (with some qualifications).
- Works well on TREC ad-hoc queries
- Thresholds can be set automatically
- Works with most TREC QA question classes
For example:
- "Where was Tesla born?": clarity score 3.57
- "What is sake?": clarity score 1.28

Clarity Score Computation
[Slide diagram] Given a question Q: retrieve passages A ranked by P(A|Q), build a question-related language model from the top-ranked passages, and compute the divergence between that model and the collection model to obtain the clarity score. For "Where was Tesla born?", the slide lists the terms with the largest contributions: "nikola", "tesla", "born", "yugoslavia", "unit", "film". (A sketch of this computation follows below.)
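For concreteness, here is a minimal Python sketch of the clarity computation outlined in the diagram, following the standard formulation: the question-related model mixes smoothed passage models weighted by P(A|Q), and clarity is the KL divergence (in bits) from the collection model. The query-likelihood weighting, the Jelinek-Mercer smoothing parameter, and the helper names are illustrative assumptions, not details taken from the talk.

```python
import math
from collections import Counter, defaultdict

def passage_model(passage_tokens, coll_model, lam=0.6):
    """Smoothed unigram model P(w|A): passage counts mixed with the
    collection model (Jelinek-Mercer smoothing; lam = 0.6 is only an
    illustrative choice, not a value from the talk)."""
    counts = Counter(passage_tokens)
    total = sum(counts.values())
    return lambda w: lam * counts[w] / total + (1 - lam) * coll_model.get(w, 1e-9)

def clarity_score(question_tokens, ranked_passages, coll_model, top_k=100):
    """Clarity of a question given its ranked retrieved passages.

    ranked_passages: list of token lists, ordered by P(A|Q).
    coll_model: dict mapping vocabulary terms to P(w | collection).
    Returns the KL divergence (in bits) between the question-related
    model and the collection model, plus the per-term contributions.
    """
    top = ranked_passages[:top_k]
    models = [passage_model(p, coll_model) for p in top]

    # Estimate P(A|Q) by query likelihood under each passage model,
    # renormalized over the top-k passages.
    likelihoods = []
    for m in models:
        log_like = sum(math.log(m(w)) for w in question_tokens)
        likelihoods.append(math.exp(log_like))
    z = sum(likelihoods) or 1.0
    p_a_given_q = [l / z for l in likelihoods]

    # Question-related model: P(w|Q) = sum_A P(w|A) * P(A|Q)
    p_w_given_q = defaultdict(float)
    for m, w_a in zip(models, p_a_given_q):
        for term in coll_model:
            p_w_given_q[term] += w_a * m(term)

    # Clarity = KL( P(.|Q) || P(.|collection) ), in bits.
    contributions = {
        term: p * math.log2(p / coll_model[term])
        for term, p in p_w_given_q.items()
        if p > 0 and coll_model[term] > 0
    }
    return sum(contributions.values()), contributions
```

Sorting `contributions` by value recovers the kind of high-contribution terms shown on the slide ("nikola", "tesla", "born", ...).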

Predicting Ad-Hoc Performance
Correlations between clarity score and average precision for TREC queries; average precision vs. clarity plotted for 100 TREC title queries, with optimal and automatically set threshold values shown.
[Table: Collection, Queries, Num., R, P-Value for the AP and TREC collections; the numeric values were not preserved in this transcript.]

Passage-Based Clarity
Passages:
- Whole-sentence based, 250-character maximum (a sketch of this segmentation follows below)
- Taken from the top retrieved documents
- Passage models smoothed with all of TREC-9
Measuring performance:
- Average precision (rather than MRR)
- Top-ranked passages used to estimate clarity scores
- The top 100 passages give 99% of the maximum correlation
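As a rough illustration of the passage definition above, the sketch below splits a document into whole-sentence passages capped at 250 characters. The sentence splitter and the handling of over-long sentences are assumptions; the slide does not specify them.

```python
import re

def sentence_passages(doc_text, max_chars=250):
    """Split a document into whole-sentence passages of at most max_chars
    characters, following the description on the slide. The regex sentence
    splitter is a stand-in; the talk does not say which splitter was used."""
    sentences = re.split(r'(?<=[.!?])\s+', doc_text.strip())
    passages, current = [], ""
    for sent in sentences:
        if not current:
            current = sent
        elif len(current) + 1 + len(sent) <= max_chars:
            current = current + " " + sent
        else:
            passages.append(current)
            current = sent
    if current:
        passages.append(current)
    # A single sentence longer than max_chars is kept as its own passage here;
    # a real implementation would need a policy for that case.
    return passages
```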

Correlation by Question Type
[Table: number of questions, rank correlation (R) with average precision, and p-value for the question types Amount, Famous, Location, Person, Time, and Miscellaneous; the numeric values were not preserved in this transcript.]

Correlation Analysis
- Strong on average (R = 0.255, P = 10^-8)
- Allows prediction of question performance
- Challenging cases: Amount and Famous
General comments on difficulty:
- Questions were preselected to be good questions for the TREC QA track
- Questions are generally less ambiguous than short queries

Precision vs. Clarity (Location Questions)
[Scatter plot: average precision vs. clarity score for location questions, with labeled points for "Where was Tesla born?", "Where is Venezuela?", "What is the location of Rider College?", and "What was Poe's birthplace?"]

Predictive Mistakes
High clarity, low average precision:
- Answerless but coherent context
- "What was Poe's birthplace?": "birthplace" and "Poe" do not co-occur, so the candidate passages are bad; the variant "Where was Poe born?" performs well and predicts well
Low clarity, high average precision:
- Very rare; often only a few correct passages
- "What is the location of Rider College?": one passage contains the correct answer, which cannot increase language coherence among the passages, but it is ranked first, so average precision is 1
[Plot: average precision vs. clarity score with these examples marked.]

Challenging Types: Famous
- "Who is Zebulon Pike?": many correct answers decrease the clarity of a good ranked list
- "Define thalassemia.": passages using the term are highly coherent, but often do not define it
[Plot: average precision vs. clarity score with these two questions marked.]

Web Experiments
- 445 well-formed questions randomly chosen from the Excite log
- WT10g test collection
- Human-predicted quality values:
  - "Where can I purchase an inexpensive computer?": clarity 0.89, human predicted ineffective
  - "Where can I find the lyrics to Eleanor Rigby?": clarity 8.08, human predicted effective
- Result: clarity scores are significantly correlated with the human predictions

Distribution of Clarity Scores
[Table: number of questions, average clarity, and p-value for the classes "Predicted effective" and "Predicted ineffective"; the numeric values were not preserved in this transcript.]

Predicting When to Expand Questions
- Best simple strategy: always use expanded questions (e.g., always use relevance model retrieval)
- But some questions do not work well when expanded (the NRRC workshop is looking at this)
- Can clarity scores be used to predict which?
  - Initial idea: "Do ambiguous queries get worse when expanded?" Not always.
  - New idea: perform the expansion retrieval. "Can we use a modified clarity score to guess if the expansion helped?" Yes.

Using Clarity to Predict Expansion
- Evaluated using TREC ad-hoc data
- Choice: query-likelihood retrieval or relevance model retrieval
- Ranked-list clarity measures the coherence of the ranked list:
  - Mix documents according to their rank alone
  - For example: top 600 documents with linearly decreasing weights
- Compute the improvement in ranked-list clarity scores (a sketch follows below):
  - First thought: if the difference is positive, choose the relevance model results
  - Best approach: if the difference is higher than some threshold, choose the relevance model results
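The Python sketch below illustrates the ranked-list clarity idea and the threshold decision described above. The rank-only mixture with linearly decreasing weights over the top 600 documents follows the slide; the threshold value, function names, and input representation are assumptions for illustration, not the exact implementation behind the reported results.

```python
import math

def ranked_list_clarity(ranked_doc_models, coll_model, top_n=600):
    """Ranked-list clarity: mix the top-n document language models with
    weights that depend only on rank (linearly decreasing, as on the slide),
    then take the KL divergence (in bits) from the collection model.

    ranked_doc_models: list of dicts P(w|D), best document first.
    coll_model: dict P(w|collection).
    """
    docs = ranked_doc_models[:top_n]
    n = len(docs)
    raw = [n - i for i in range(n)]            # rank-only, linearly decreasing
    z = float(sum(raw)) or 1.0
    weights = [r / z for r in raw]

    # Rank-weighted mixture model over the ranked list.
    mixture = {}
    for model, w in zip(docs, weights):
        for term, p in model.items():
            mixture[term] = mixture.get(term, 0.0) + w * p

    return sum(
        p * math.log2(p / coll_model[term])
        for term, p in mixture.items()
        if p > 0 and coll_model.get(term, 0.0) > 0
    )

def choose_result_list(ql_results, rm_results, coll_model, threshold=0.1):
    """Decision rule from the slide: run both retrievals and keep the
    relevance-model (expanded) results only if they improve ranked-list
    clarity by more than a threshold. The value 0.1 is a placeholder,
    not a tuned threshold from the talk."""
    gain = (ranked_list_clarity(rm_results, coll_model)
            - ranked_list_clarity(ql_results, coll_model))
    return rm_results if gain > threshold else ql_results
```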

Clarity and Expansion Results
- Choosing when to expand with this method produces 51% of the optimal improvement for TREC-8
- Choosing when to expand has more impact in TREC-8, where expanded-query performance is more mixed (only marginally better, on average, than unexpanded)
- In TREC-7, only 4 queries perform really badly with the relevance model, and the clarity method predicts 2 of them
[Table: results for the baseline LM, the relevance model, clarity prediction (predict best), and the optimal choice (choose best) on TREC-7 and TREC-8; the numeric values were not preserved in this transcript.]

Predicting Expansion Improvements
[Scatter plot: change in average precision after expansion vs. original clarity score, with labeled points for the topics "killer bee attacks", "Legionnaires disease", "tourists, violence", "women clergy", "Stirling engine", and "cosmic events".]

Predicting Expansion Improvements
[Scatter plot: change in average precision after expansion vs. change in clarity (new ranked list minus old), for the same topics as above.]

Future Work
- Continue expansion experiments
  - with queries and questions
- Understanding the role of the corpus
  - predicting when coverage is inadequate
  - more experiments on the Web and heterogeneous collections
- Providing a Clarity tool
  - user interface or data for a QA system?
  - efficiency
- Better measures...