Rutgers Information Interaction Lab at TREC 2005: Trying HARD
N.J. Belkin, M. Cole, J. Gwizdka, Y.-L. Li, J.-J. Liu, G. Muresan, D. Roussinov*, C.A. Smith, A. Taylor, X.-J. Yuan
Rutgers University; *Arizona State University
Our Major Goal
Clarification forms (CFs) are simulations of user-system interaction.
Users are unwilling to engage in explicit interaction unless the payoff is high and the interaction is understood as relevant.
Is explicit interaction worthwhile, and if so, under what circumstances?
General Approach to the Question
Use relatively “standard” interactive elicitation techniques to enhance/disambiguate the original query.
Compare results to the baseline.
Compare results to the baseline plus relatively “standard” non-interactive query enhancement techniques, in particular pseudo-relevance feedback (pseudo-rf).
Methods for Automatic Query Enhancement
Pseudo-relevance feedback (standard Lemur)
Language-modeling-based query expansion (clarity), derived from the collection
Web-based query expansion
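The pseudo-relevance feedback idea can be sketched as follows. This is a deliberately simplified stand-in, not Lemur's feedback algorithm (which the runs actually used): it assumes the top-ranked documents are relevant, counts how many of them contain each term, and returns the most frequent terms not already in the query. The function name, the token-list input format, and the k_docs/n_terms defaults are hypothetical.

```python
from collections import Counter


def pseudo_rf_terms(query_terms, ranked_docs, k_docs=10, n_terms=10):
    """Pick expansion terms from the top k_docs documents, assumed relevant.

    ranked_docs is a list of token lists, best-ranked first (hypothetical format).
    Terms already in the query are never returned.
    """
    counts = Counter()
    for doc in ranked_docs[:k_docs]:
        # count each term at most once per document
        counts.update(set(doc))
    query = set(query_terms)
    candidates = [(term, n) for term, n in counts.items() if term not in query]
    # most frequent in the feedback set first; ties broken alphabetically
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in candidates[:n_terms]]


# Toy example (made-up documents)
docs = [
    ["human", "smuggling", "border", "aliens"],
    ["smuggling", "border", "arrested"],
    ["border", "aliens", "haitians"],
]
example = pseudo_rf_terms(["human", "smuggling"], docs, k_docs=3, n_terms=2)
# example == ['border', 'aliens']
```

Real feedback implementations usually weight candidate terms (for example by term frequency and idf) rather than using a raw document count; the count keeps the sketch short.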
Methods for User-Based Query Enhancement
User selection of terms suggested by the “clarity” and web methods (user selection based on Koenemann & Belkin, 1996; Belkin et al., 2000)
Elicitation of extended information problem descriptions (elicitation based on Kelly, Dollu & Fu, 2004; 2005)
Hypotheses for Automatic Enhancement
H1: Query expansion using “clarity”-derived terms will improve performance over the baseline and over baseline + pseudo-rf.
H2: Query expansion using web-derived terms will improve performance over the baseline and over baseline + pseudo-rf.
H2b: Query expansion using both clarity- and web-derived terms will improve performance over the baseline and over baseline + pseudo-rf.
Hypotheses for User-Based Query Enhancement
H3: Query expansion with terms selected by the user from the clarity- and web-derived suggestions will improve performance over all other conditions.
H4: Query expansion using “problem statements” elicited from users will improve performance over the baseline and over baseline + pseudo-rf.
Hypothesis for When Elicitation Is Useful
H5: The effectiveness of query expansion using problem statements will be negatively correlated with query clarity.
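One way to check H5 is to compute, across topics, the correlation between baseline query clarity and the gain from expanding with problem-statement terms; H5 predicts a negative coefficient. The sketch below uses Pearson's r and a per-topic R-precision gain as the effectiveness measure. Both choices, the variable names, and the four data points are illustrative assumptions, not the study's actual analysis or data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-topic values, for illustration only (not TREC data)
baseline_clarity = [0.8, 1.2, 1.9, 2.5]
# gain = R-precision(baseline + CF2 terms) - R-precision(baseline)
cf2_gain = [0.06, 0.04, 0.02, 0.00]
r = pearson(baseline_clarity, cf2_gain)
supports_h5 = r < 0  # H5 predicts a negative correlation
```

A coefficient on its own says nothing about significance; a real analysis would also test whether r differs from zero.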
Query Run Designations
RUTGBL: Baseline query (title + description)
RUTGBF3: Baseline + pseudo-rf (Lemur)
RUTGWS1: Baseline + 0.1 × (web-suggested terms)
RUTGLS1: Baseline + 0.1 × (clarity-suggested terms)
RUTGAS1: Baseline + 0.1 × (all suggested terms)
RUTGUS1: Baseline + 0.1 × (terms selected by user)
RUTGUG1: Baseline + 0.1 × (user-generated terms)
RUTGALL: Baseline + all suggested terms and all user-generated terms
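The 0.1 weighting in these run designations can be read as a weighted sum in which the added terms, as a group, contribute one tenth as much as the original title + description terms. The sketch below renders such a query as a string modelled on the InQuery-style #wsum and #sum operators of Lemur's structured query language. The exact query syntax the runs used is not given here, so the operator choice and the helper name are assumptions.

```python
def weighted_query(baseline_terms, added_terms, added_weight=0.1):
    """Build 'baseline + added_weight * (added terms)' as a query string.

    Hypothetical helper; the #wsum/#sum operator syntax is assumed, modelled on
    the InQuery-style structured query language, not taken from the runs.
    """
    base = " ".join(baseline_terms)
    if not added_terms:
        # nothing to add: the plain baseline query (e.g. RUTGBL)
        return f"#sum( {base} )"
    extra = " ".join(added_terms)
    # the original terms keep weight 1.0; the added group is down-weighted
    return f"#wsum( 1.0 #sum( {base} ) {added_weight} #sum( {extra} ) )"


# Toy RUTGWS1-style query (title terms only, two web-suggested terms)
q = weighted_query(["human", "smuggling"], ["border", "aliens"])
# q == "#wsum( 1.0 #sum( human smuggling ) 0.1 #sum( border aliens ) )"
```

Changing added_weight (for example to 0.3, the weighting used for added terms in the RUTBE run) gives the kind of weight variation explored in the later weight-comparison slides.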
Identification of Suggested Terms
Clarity: compute query clarity for the topic baseline (Lemur QueryClarity); sort terms accordingly; choose the top ten.
Web: Navigation by Expansion (NBE), described on the following slides.
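Query clarity is usually defined as the Kullback-Leibler divergence between a query language model and the collection language model, Clarity(Q) = Σ_w P(w|Q) log2(P(w|Q) / P(w|C)). One natural reading of “sort terms accordingly” is to rank terms by their individual contribution to this sum. The sketch below does that, assuming both models are already estimated and that the collection model is smoothed so P(w|C) > 0. It is not Lemur's QueryClarity code, and the function name and exclude parameter are hypothetical.

```python
import math


def clarity_terms(p_q, p_c, top_k=10, exclude=()):
    """Return (clarity score, top_k terms ranked by clarity contribution).

    p_q: dict term -> P(w|Q), a query language model (assumed already estimated)
    p_c: dict term -> P(w|C), a smoothed collection model with P(w|C) > 0
    exclude: terms never suggested, e.g. the original query terms
    """
    contrib = {}
    for w, pq in p_q.items():
        if pq > 0:
            # this term's share of KL(P(.|Q) || P(.|C)), in bits
            contrib[w] = pq * math.log2(pq / p_c[w])
    score = sum(contrib.values())
    skip = set(exclude)
    ranked = sorted((w for w in contrib if w not in skip),
                    key=lambda w: (-contrib[w], w))
    return score, ranked[:top_k]


# Toy models, made up for illustration
p_q = {"smuggling": 0.5, "border": 0.3, "the": 0.2}
p_c = {"smuggling": 0.001, "border": 0.002, "the": 0.2}
score, terms = clarity_terms(p_q, p_c, top_k=2)
# terms == ['smuggling', 'border']; score is about 6.65 bits
```

A term as common in the query model as in the collection (here “the”) contributes nothing, which is why clarity-ranked terms tend to be topical.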
Example: Navigation by Expansion Paradigm (NBE)
Title: human smuggling
Description: Identify incidents of human smuggling
NBE-suggested terms: aliens, arrested, border, haitians, trafficked, undocumented
Navigation by Expansion Paradigm (NBE)
Step 1: Overview of the surroundings
– Produces words and phrases “clearly related” to the topic
– Internet mining: topic sent to Google
– Logistic regression on the “signal to noise” ratio:
  Signal = df(results) / #results
  Noise = df(web) / #web
  Pr = 1 - exp(-(signal/noise - 1)/a)
Step 2: Valid “moves” identified
– Related concepts from Step 1 that are present in AQUAINT and would affect search results if selected
– Impact estimate = P * df * idf
Step 3: Selected moves executed
– E.g., by query expansion: Score = original score + expansion score * expansion factor
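The three NBE steps can be sketched as three small scoring functions. The formulas for Pr, the impact estimate, and the expanded score follow the slide; the value of a, the idf form ln(N/df), clamping negative Pr to zero, and all function and parameter names are assumptions made for illustration.

```python
import math


def relatedness(df_results, n_results, df_web, n_web, a=1.0):
    """Step 1: Pr = 1 - exp(-(signal/noise - 1)/a), clamped at 0 (assumption).

    signal = df(results)/#results, noise = df(web)/#web; df_web must be > 0.
    The slide does not give a; 1.0 is a placeholder.
    """
    signal = df_results / n_results
    noise = df_web / n_web
    pr = 1.0 - math.exp(-(signal / noise - 1.0) / a)
    # a term no more frequent in the results than on the web gets no credit
    return max(pr, 0.0)


def impact(pr, df, n_docs):
    """Step 2: impact estimate = P * df * idf, assuming idf = ln(N/df).

    df and n_docs refer to the target collection (AQUAINT); a term absent
    from it (df == 0) is not a valid move.
    """
    if df <= 0:
        return 0.0
    return pr * df * math.log(n_docs / df)


def expanded_score(original_score, expansion_score, expansion_factor):
    """Step 3: original score + expansion score * expansion factor."""
    return original_score + expansion_score * expansion_factor
```

For example, a term found in 30 of 100 search results but in only 1,000 of 1,000,000 web pages gets Pr close to 1, while a term that is rarer in the results than on the web gets 0.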
“Combination” Run
Combining pseudo-rf with user-selected terms from CF1 (run RUTBE)
R-precision for RUTBE was substantially better than that of all other runs, but the run is not directly comparable, because it used a different ranking function (BM25) and a different weighting for added terms (0.3 rather than 0.1).
This is indicative of possible improvements.
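For reference, a BM25 ranking function with per-term query weights can be sketched as below, so that added terms count 0.3 as much as original ones. The k1 and b defaults and the idf variant are common textbook choices, not settings reported for the RUTBE run, and the function signature is hypothetical.

```python
import math


def bm25_score(doc_tf, doc_len, avg_len, n_docs, df, query_weights,
               k1=1.2, b=0.75):
    """BM25 score of one document for a weighted query (hypothetical signature).

    doc_tf: term -> frequency in the document
    df: term -> number of collection documents containing the term
    query_weights: term -> weight, e.g. 1.0 for original terms, 0.3 for added ones
    k1, b: common default values, assumed; not reported for the run
    """
    score = 0.0
    for term, qw in query_weights.items():
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue
        # Robertson/Sparck Jones-style idf; negative if df > n_docs / 2
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # document-length-normalised term frequency saturation
        norm = tf + k1 * (1 - b + b * doc_len / avg_len)
        score += qw * idf * tf * (k1 + 1) / norm
    return score


# Toy document: original term "smuggling", added term "border" at weight 0.3
doc_tf = {"smuggling": 2, "border": 1}
coll_df = {"smuggling": 10, "border": 50}
s = bm25_score(doc_tf, 100, 100, 1000, coll_df, {"smuggling": 1.0, "border": 0.3})
```

Because each term's contribution is simply multiplied by its query weight, an added term at 0.3 adds exactly 30% of what it would add as an original term.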
User Selection (CF1)
User Generation (CF2)
System Implementation
Lemur 3.1, 4.0, and 4.1, using StructQueryEval
Could we ask for somewhat more detailed documentation from the Lemur group?
Comparison to Other Sites
Measures: R-precision and MAP (mean and SD reported for each)
Rows: Overall Baseline median; RUTGBL; Overall Final median; RUTGALL
Recovered values (RUTGALL row): 0.299* **0.31
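R-precision and MAP, the measures compared here, follow standard definitions: R-precision is the precision of the top R documents, where R is the number of relevant documents for the topic, and MAP is the mean over topics of average precision. Official TREC figures are normally computed with trec_eval; the minimal sketch below only illustrates the definitions, and its function names are hypothetical.

```python
def r_precision(ranked, relevant):
    """Precision over the top R documents, R = number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def average_precision(ranked, relevant):
    """Sum of precision at each relevant retrieved document, divided by the
    total number of relevant documents (unretrieved ones contribute 0)."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP: mean of average precision over (ranked, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d5"}
# r_precision(ranked, relevant) == 2/3; average_precision(...) == (1 + 2/3) / 3
```

Note that d5, a relevant document never retrieved, lowers average precision but not the count of hits in the top R.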
R-Precision for Test Runs
Summary of Significant Differences, R-Prec.
Runs: BL, AS1, LS1, WS1, US1, UG1, BF3, ALL
AS1: *
LS1: * n/s ----
WS1: * n/s ----
US1: * n/s ----
UG1: * n/s ----
BF3: n/s n/s ----
ALL: n/s n/s ----
Varying Weights of Baseline Terms w.r.t. CF2 Terms
Varying Weights of CF2 Terms w.r.t. Baseline Terms
CF2 & Baseline Terms, Equal Weights
Columns: Run name; R-Precision (Mean, SD); Precision at 10 (Mean, SD); Mean Average Precision (Mean, SD)
RUTGBL Q * Q * Q ** **0.175
Q1Q2: 0.298* ** **0.190
Q1Q3: 0.313* *** **0.186
Q1Q2Q3: 0.314** *** **0.190
Results w.r.t. Hypotheses
H1, H2, H3, H4: weakly supported with respect to the baseline, but not with respect to pseudo-rf
H5: not supported
– No correlation between baseline query clarity and the effectiveness of expanding with CF2 terms
Discussion (1)
Both automatic and user-based query enhancement improved performance over the baseline, but not over pseudo-rf.
There were no significant differences in performance between any of the enhancement methods, except Q1 vs. Q1+Q3 (R-precision, vs )
Discussion (2)
There is some benefit both from automatic methods and from explicit interaction with the user; the latter requires some effort from the user beyond initial query formulation.
This interpretation of the results depends on the assumption that title + description queries are accurate simulations of user behavior.
(Tentative) Conclusions
The results indicate that invoking user interaction for query clarification is unlikely to be cost-effective.
An alternative might be to develop ways to encourage more elaborate query formulation in the first instance, enhanced with automatic methods.
Subsequent enhancement could use implicit sources of evidence rather than explicit questioning, requiring no additional effort from the user.