
1 Predicting Question Quality
Bruce Croft and Stephen Cronen-Townsend
University of Massachusetts Amherst

2 Topics
- Clarity applied to TREC QA questions
- Clarity applied to Web questions
- Clarity used to predict query expansion

3 Predicting Question Quality
Actually predicting quality of retrieved passages (or documents).
Basic result: we can predict retrieval performance (with some qualifications):
- Works well on TREC ad-hoc queries
- Thresholds can be set automatically
- Works with most TREC QA question classes
For example:
- "Where was Tesla born?" Clarity score 3.57
- "What is sake?" Clarity score 1.28

4 Clarity Score Computation
[Diagram] Question Q and collection text → retrieve → passages A ranked by P(A|Q) → estimate a question-related language model from the top passages → compute divergence from the collection model → clarity score.
Example top terms in the question model for "Where was Tesla born?": "nikola", "tesla", "born", "yugoslavia", "unit", "film".
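The divergence in the diagram is the KL divergence between the question-related language model and the collection model: Clarity(Q) = sum_w P(w|Q) log2[ P(w|Q) / P_coll(w) ], where P(w|Q) mixes the language models of the top-ranked passages weighted by their query likelihood. Below is a minimal sketch of that computation, assuming Jelinek-Mercer smoothing with an illustrative lambda; the smoothing details and function names are our assumptions, not taken from the slides.

```python
import math
from collections import Counter

def smoothed_model(tokens, coll_prob, lam=0.6):
    """P(w|D): document MLE mixed with the collection model.
    lam is an assumed smoothing weight; the slides do not specify one."""
    counts, n = Counter(tokens), len(tokens)
    return lambda w: lam * counts[w] / n + (1 - lam) * coll_prob(w)

def clarity_score(query_terms, top_passages, coll_prob):
    """KL divergence (base 2) between the question model and the
    collection model, estimated from the top retrieved passages."""
    models = [smoothed_model(p, coll_prob) for p in top_passages]
    # P(A|Q) taken proportional to the query likelihood P(Q|A)
    likes = [math.prod(m(t) for t in query_terms) for m in models]
    total = sum(likes)
    post = [l / total for l in likes]
    # Summing over the passage vocabulary approximates the full-vocabulary sum
    vocab = {t for p in top_passages for t in p}
    score = 0.0
    for w in vocab:
        p_wq = sum(pa * m(w) for pa, m in zip(post, models))  # P(w|Q)
        score += p_wq * math.log2(p_wq / coll_prob(w))
    return score
```

A high score means the top passages concentrate on vocabulary that is unusual for the collection (a coherent, focused topic); a score near zero means the retrieved language looks like the collection as a whole.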

5 Predicting Ad-Hoc Performance
Rank correlations of clarity score with average precision for TREC queries:

Collection   Queries    Num.   R       P-Value
AP88+89      101-200    100    0.368   1.2 × 10^-4
TREC-4       201-250     50    0.490   3.0 × 10^-4
TREC-5       251-300     50    0.459   6.5 × 10^-4
TREC-7       351-400     50    0.577   2.7 × 10^-5
TREC-8       401-450     50    0.494   2.7 × 10^-4
TREC 7+8     351-450    100    0.536   4.8 × 10^-8

[Figure] Average precision vs. clarity for 100 TREC title queries; optimal and automatic threshold values shown.

6 Passage-Based Clarity
Passages:
- Whole-sentence based, 250-character maximum (sketched below)
- Taken from the top retrieved documents
- Passage models smoothed with all of TREC-9
Measuring performance:
- Average precision (rather than MRR)
- Top-ranked passages used to estimate clarity scores
- Top 100 passages give 99% of the maximum correlation
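As a concrete reading of the first bullet, here is one way to form whole-sentence passages under the 250-character cap; the greedy packing strategy and the helper name are assumptions, since the slide fixes only the sentence basis and the maximum length.

```python
def sentence_passages(sentences, max_chars=250):
    """Pack whole sentences into passages of at most max_chars; a single
    sentence longer than max_chars becomes a passage on its own."""
    passages, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            passages.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        passages.append(current)
    return passages
```

These passages would then play the role of the documents in the clarity computation above, with the top 100 of them estimating the question model.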

7 Correlation by Question Type

Question Type    # of Qs   Rank Correlation (R)   P-Value
Amount              35     0.171                  0.16
Famous              76     0.148                  0.10
Location           100     0.308                  0.0011
Person              90     0.245                  0.010
Time                48     0.350                  0.0082
Miscellaneous      139     0.266                  0.00090

8 Correlation Analysis
Correlation is strong on average (R = 0.255, P = 10^-8), allowing prediction of question performance.
Challenging cases: Amount and Famous.
General comments on difficulty:
- Questions have been preselected to be good questions for the TREC QA track
- Questions are in general less ambiguous than short queries

9 Precision vs. Clarity (Location Qs)
[Scatter plot: average precision vs. clarity score for Location questions, with labeled examples "Where was Tesla born?", "Where is Venezuela?", "What is the location of Rider College?", and "What was Poe's birthplace?"]

10 Predictive Mistakes
High clarity, low average precision: answerless but coherent context.
- "What was Poe's birthplace?": "birthplace" and "Poe" do not co-occur, so the candidate passages are bad
- The variant "Where was Poe born?" performs well and predicts well
Low clarity, high average precision: very rare; often few correct passages.
- "What is the location of Rider College?": one passage contains the correct answer and cannot by itself increase the language coherence among the passages, yet it is ranked first, so average precision is 1

11 Challenging Types: Famous
- "Who is Zebulon Pike?": many correct answers decrease the clarity of a good ranked list
- "Define thalassemia.": passages using the term are highly coherent, but often do not define it
[Scatter plot: average precision vs. clarity score, with both questions labeled]

12 Web Experiments
- 445 well-formed questions randomly chosen from the Excite log
- WT10g test collection
- Human-predicted values of quality:
  - "Where can I purchase an inexpensive computer?" Clarity 0.89, human-predicted ineffective
  - "Where can I find the lyrics to Eleanor Rigby?" Clarity 8.08, human-predicted effective
- Result: clarity scores are significantly correlated with human predictions

13 Distribution of Clarity Scores

Class                   Number   Average Clarity   P-value
Predicted effective       223    2.03              0.00026
Predicted ineffective     222    1.81              0.00020

14 Predicting When to Expand Questions
The best simple strategy is to always use expanded questions (e.g., always use relevance-model retrieval).
But some questions do not work well when expanded (an NRRC workshop is looking at this).
Can clarity scores be used to predict which?
- Initial idea: "Do ambiguous queries get worse when expanded?" Not always.
- New idea: perform the expansion retrieval, then ask "Can we use a modified clarity score to guess whether the expansion helped?" Yes.

15 Using Clarity to Predict Expansion
- Evaluated using TREC ad-hoc data
- Choice: query-likelihood retrieval or relevance-model retrieval
- Ranked-list clarity measures the coherence of a ranked list (see the sketch below):
  - Mix documents according to their rank alone
  - For example: top 600 documents, linearly decreasing weights
- Compute the improvement in ranked-list clarity scores:
  - First thought: if the difference is positive, choose the relevance-model results
  - Best thought: if the difference is above some threshold, choose the relevance-model results
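A minimal sketch of the ranked-list variant and the threshold rule, reusing smoothed_model and math from the clarity sketch above. The linearly decreasing rank weights and the top-600 cutoff come from the slide; the default threshold of 0.0 is just the "first thought" rule, and the "best thought" would tune it on held-out queries.

```python
def ranked_list_clarity(ranked_docs, coll_prob, top_k=600):
    """Clarity of a ranked list: mix the top_k document models with weights
    that depend on rank alone (linearly decreasing), then take the KL
    divergence (base 2) from the collection model."""
    docs = ranked_docs[:top_k]
    models = [smoothed_model(d, coll_prob) for d in docs]
    raw = [len(docs) - i for i in range(len(docs))]  # rank-based weights
    total = sum(raw)
    weights = [r / total for r in raw]
    vocab = {t for d in docs for t in d}
    score = 0.0
    for w in vocab:
        p_w = sum(wt * m(w) for wt, m in zip(weights, models))
        score += p_w * math.log2(p_w / coll_prob(w))
    return score

def choose_results(ql_docs, rm_docs, coll_prob, threshold=0.0):
    """Keep the relevance-model ranking only when it improves ranked-list
    clarity by more than the threshold (0.0 is the 'first thought' rule)."""
    gain = (ranked_list_clarity(rm_docs, coll_prob)
            - ranked_list_clarity(ql_docs, coll_prob))
    return rm_docs if gain > threshold else ql_docs
```

Because the weights depend on rank alone, both ranked lists are scored on an equal footing, so the sign (or size) of the clarity gain can serve as the expansion switch.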

16 Clarity and Expansion Results
Choosing when to expand with this method produces 51% of the optimal improvement for TREC-8.
Choosing when to expand has more impact in TREC-8, where expanded-query performance is more mixed (only marginally better, on average, than unexpanded).
In TREC-7, only 4 queries perform really badly with the relevance model, and the clarity method predicts 2 of them.

Collection   Baseline LM   Relevance Model   Clarity Prediction (predict best)   Optimal (choose best)
TREC 7       0.188         0.237             0.239                               0.262
TREC 8       0.247         0.248             0.269                               0.289

17 Predicting Expansion Improvements
[Scatter plot: change in average precision vs. original clarity, with labeled queries "killer bee attacks", "Legionnaires disease", "tourists, violence", "women clergy", "Stirling engine", and "cosmic events"]

18 Predicting Expansion Improvements
[Scatter plot: change in average precision vs. change in clarity (new ranked list minus old), with the same labeled queries]

19 Future Work
- Continue expansion experiments, with queries and questions
- Understand the role of the corpus:
  - predicting when coverage is inadequate
  - more experiments on the Web and on heterogeneous collections
- Provide a Clarity tool:
  - user interface, or data for a QA system?
  - efficiency
- Better measures...

