
1 Lower-Bounding Term Frequency Normalization. Yuanhua Lv and ChengXiang Zhai, University of Illinois at Urbana-Champaign. CIKM 2011 Best Student Award Paper. Speaker: Tom, Nov 8th, 2011

2 It is very difficult to improve retrieval models:
– BM25 [Robertson et al. 1994] (17 years)
– Pivoted length normalization (PIV) [Singhal et al. 1996] (15 years)
– Query likelihood with Dirichlet prior (DIR) [Ponte & Croft 1998; Zhai & Lafferty 2001] (10 years)
– PL2 [Amati & van Rijsbergen 2002] (9 years)
All these models remain strong baselines today after so many years!

3 1. Why does it seem to be so hard to beat these state-of-the-art retrieval models {BM25, PIV, DIR, PL2, …}? 2. Are they hitting the ceiling?

4 Key heuristic in all effective retrieval models: term frequency (TF) normalization by document length [Singhal et al. 96; Fang et al. 04]
BM25:
$$\mathrm{Score}(Q,D)=\sum_{t\in Q\cap D}\frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)}\cdot\frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)}\cdot\ln\frac{N+1}{df(t)}$$
DIR (query likelihood with Dirichlet prior):
$$\mathrm{Score}(Q,D)=\sum_{t\in Q\cap D}c(t,Q)\,\ln\!\left(1+\frac{c(t,D)}{\mu\,p(t|C)}\right)+|Q|\,\ln\frac{\mu}{|D|+\mu}$$
Each formula combines three components: term frequency c(t,D), document length |D|, and term discrimination (the IDF factor in BM25, p(t|C) in DIR). PIV and PL2 implement similar retrieval heuristics.
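For concreteness, here is a minimal executable sketch of the two scoring functions above, in the forms commonly published for BM25 [Robertson et al. 1994] and Dirichlet-prior query likelihood [Zhai & Lafferty 2001]; the parameter defaults (k1 = 1.2, b = 0.75, µ = 2000) are conventional choices, not values taken from the slides.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, n_docs, k1=1.2, b=0.75):
    """Contribution of one matched query term to the BM25 score of D."""
    idf = math.log((n_docs + 1) / df)                     # term discrimination
    norm = k1 * (1 - b + b * doc_len / avg_doc_len) + tf  # TF normalized by doc length
    return idf * (k1 + 1) * tf / norm

def dirichlet_score(query_tfs, doc_tfs, doc_len, p_coll, mu=2000.0):
    """Query likelihood with Dirichlet prior smoothing, in log form.
    p_coll maps each query term to its collection language model probability."""
    score = 0.0
    for t, qtf in query_tfs.items():
        tf = doc_tfs.get(t, 0)
        score += qtf * math.log(1 + tf / (mu * p_coll[t]))            # matched-term part
    score += sum(query_tfs.values()) * math.log(mu / (doc_len + mu))  # length part
    return score
```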

5 However, the component of TF normalization by document length is NOT lower-bounded properly, in either BM25 or DIR. When a document is very long, its score from matching a query term could be too small!

6 As a result, long documents could be overly penalized. Example: D2 matches the query term while D1 does not, yet Score_PL2(D2) < Score_PL2(D1) and Score_DIR(D2) < Score_DIR(D1).
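The inversion for DIR is easy to reproduce numerically. In the toy check below, the collection probability and document lengths are made-up illustrative values: a 50,000-word document that contains the query term once scores below a 100-word document that does not contain it at all.

```python
import math

def dir_score(tf, doc_len, p_coll, mu=2000.0):
    # single-term query: matched-term part + length-normalization part
    return math.log(1 + tf / (mu * p_coll)) + math.log(mu / (doc_len + mu))

p_w = 1e-4                        # collection probability of query term w (illustrative)
s_d1 = dir_score(0, 100, p_w)     # D1: short, w absent
s_d2 = dir_score(1, 50_000, p_w)  # D2: very long, w present once

print(f"S(D1) = {s_d1:.3f}")      # about -0.049
print(f"S(D2) = {s_d2:.3f}")      # about -1.466, i.e. S(D2) < S(D1)
```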

7 Empirical evidence: long documents are indeed overly penalized. Prob. of relevance/retrieval: the probability of a randomly selected relevant/retrieved document having a certain document length [Singhal et al. 96]. [Figure: probability of relevance vs. probability of retrieval plotted against document length; for long documents the retrieval curve falls below the relevance curve.]

8 White-box testing: functionality analysis of retrieval models. Bug: TF normalization is not lower-bounded properly, and long documents are overly penalized. Are these retrieval models sharing this bug because they all violate some necessary retrieval heuristics? Can we formally capture these necessary heuristics?

9 Two novel heuristics for regulating the interactions between TF and document length:
– LB1: There should be a sufficiently large gap between the presence and absence of a query term. Document length normalization should not cause a very long document with a non-zero TF to receive a score too close to, or even lower than, a short document with a zero TF.
– LB2: A short document that only covers a very small subset of the query terms should not easily dominate a very long document that contains many distinct query terms.

10 Lower-bounding constraint 1 (LB1): Occurrence > Non-Occurrence. Setup: query Q = {w}; D1 contains w but not q; D2 contains both w and q; and Score(Q, D1) = Score(Q, D2). Then for the expanded query Q' = {w, q}, LB1 requires Score(Q', D1) < Score(Q', D2).

11 Lower-bounding constraint 2 (LB2): First Occurrence > Repeated Occurrence. Setup: query Q = {q1, q2}; D1 and D2 each contain q1, and Score(Q, D1) = Score(Q, D2). D1' extends D1 with another occurrence of q1, while D2' extends D2 with a first occurrence of q2. Then LB2 requires Score(Q, D1') < Score(Q, D2').

12 BM25 satisfies LB1 but violates LB2. LB1 is satisfied unconditionally. LB2 is equivalent to a bound (given the parameter ranges k1 > 0 and 0 < b < 1) that long documents tend to violate; large b or k1 violates LB2 easily.
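The violation can be probed numerically without knowing the exact bound. The sketch below constructs the LB2 tie from slide 11 for plain BM25, assuming q1 and q2 are equally discriminative (so the IDF factors cancel) and allowing a fractional TF in the long document to force the tie; all lengths and parameter values are illustrative.

```python
K1, B, AVDL = 1.2, 0.75, 100.0   # illustrative BM25 parameters

def tf_norm(tf, dl):
    """BM25 TF component, normalized by document length."""
    K = K1 * (1 - B + B * dl / AVDL)
    return (K1 + 1) * tf / (K + tf)

def lb2_gap(len1=100, len2=10_000):
    # choose tf of q1 in D2 so that tf_norm(t2, len2) == tf_norm(1, len1)
    t2 = (1 - B + B * len2 / AVDL) / (1 - B + B * len1 / AVDL)
    d1_prime = tf_norm(2, len1 + 1)                          # D1' = D1 + one more q1
    d2_prime = tf_norm(t2, len2 + 1) + tf_norm(1, len2 + 1)  # D2' = D2 + first q2
    return d2_prime - d1_prime                               # LB2 demands > 0

print(lb2_gap(len2=200))      # positive: LB2 satisfied when D2 is short
print(lb2_gap(len2=10_000))   # negative: LB2 violated when D2 is long
```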

13 DIR satisfies LB2 but violates LB1. The condition equivalent to LB2 is satisfied unconditionally. LB1, however, is equivalent to a condition that long documents tend to violate; large µ or non-discriminative query terms violate LB1 easily.

14 No retrieval model satisfies both constraints:

Model   LB1   LB2   Parameter and/or query restrictions
BM25    Yes   No    b and k1 should not be too large
PIV     Yes   No    s should not be too large
PL2     No    No    c should not be too small
DIR     No    Yes   µ should not be too large; query terms should be discriminative

Can we "fix" this problem for all the models in a general way?

15 Solution: a general approach to lower-bounding TF normalization. The score of a document D from matching a query term t factors into a term discrimination component td(t) and a TF/length normalization component F(c(t,D), |D|). The BM25 and DIR instantiations of these components were shown on slide 4; PIV and PL2 also have their corresponding components.

16 Solution: a general approach to lower-bounding TF normalization (cont.). Objective: an improved normalization F' that is properly lower-bounded but does not hurt other retrieval heuristics. A heuristic solution: add a positive constant δ to the normalization component whenever the term is present, i.e., F'(c(t,D), |D|) = F(c(t,D), |D|) + δ for c(t,D) > 0 (any scaling factor l can be absorbed into δ). F' then satisfies all retrieval heuristics that are satisfied by F.
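In code, the general fix is just a wrapper around the existing normalization component; a minimal sketch, assuming F takes the raw TF and document length:

```python
def lower_bounded(F, delta):
    """Wrap a TF normalization component F(tf, doc_len) so that every matched
    term contributes at least delta; unmatched terms (tf = 0) still get 0."""
    def F_plus(tf, doc_len):
        return F(tf, doc_len) + delta if tf > 0 else 0.0
    return F_plus

# hypothetical usage: bm25_tf_plus = lower_bounded(bm25_tf, delta=1.0)
```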

17 Example: BM25+, a lower-bounded version of BM25.
BM25:
$$\sum_{t\in Q\cap D}\frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)}\cdot\frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)}\cdot\ln\frac{N+1}{df(t)}$$
BM25+:
$$\sum_{t\in Q\cap D}\frac{(k_3+1)\,c(t,Q)}{k_3+c(t,Q)}\cdot\left[\frac{(k_1+1)\,c(t,D)}{k_1\left(1-b+b\frac{|D|}{avdl}\right)+c(t,D)}+\delta\right]\cdot\ln\frac{N+1}{df(t)}$$
BM25+ incurs almost no additional computational cost. Similarly, we can also improve PIV, DIR, and PL2, leading to PIV+, DIR+, and PL2+ respectively.
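A runnable sketch of a full BM25+ scorer follows; it mirrors the formula above, with δ added to the TF normalization component of every matched term (δ = 0 recovers plain BM25). Variable names and parameter defaults are illustrative.

```python
import math

def bm25_plus(query_tfs, doc_tfs, doc_len, avg_doc_len, df, n_docs,
              k1=1.2, b=0.75, k3=1000.0, delta=1.0):
    """df maps each term to its document frequency in the collection."""
    score = 0.0
    for t, qtf in query_tfs.items():
        tf = doc_tfs.get(t, 0)
        if tf == 0:
            continue                              # delta applies only to matched terms
        idf = math.log((n_docs + 1) / df[t])      # term discrimination
        tf_norm = (k1 + 1) * tf / (k1 * (1 - b + b * doc_len / avg_doc_len) + tf)
        qtf_norm = (k3 + 1) * qtf / (k3 + qtf)    # query-side TF component
        score += qtf_norm * (tf_norm + delta) * idf
    return score
```

The only extra work per matched term is one addition, which is why the computational cost is essentially unchanged.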

18 BM25+ can satisfy both LB1 and LB2. Like BM25, BM25+ satisfies LB1 unconditionally. LB2 can also be satisfied unconditionally if δ is large enough (condition shown on slide). Experiments shown later confirm that setting δ = 1.0 works very well.
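Re-running the LB2 probe from slide 12 with the lower bound in place shows the fix directly: δ cancels when constructing the tie, and δ = 1.0 flips the ordering back in favor of the long document. As before, q1 and q2 are assumed equally discriminative and all numbers are illustrative.

```python
K1, B, AVDL = 1.2, 0.75, 100.0   # illustrative BM25 parameters

def tf_norm_plus(tf, dl, delta):
    K = K1 * (1 - B + B * dl / AVDL)
    return (K1 + 1) * tf / (K + tf) + (delta if tf > 0 else 0.0)

def lb2_gap(delta, len1=100, len2=10_000):
    t2 = (1 - B + B * len2 / AVDL) / (1 - B + B * len1 / AVDL)   # forces the tie on Q
    d1_prime = tf_norm_plus(2, len1 + 1, delta)                  # D1' = D1 + one more q1
    d2_prime = tf_norm_plus(t2, len2 + 1, delta) + tf_norm_plus(1, len2 + 1, delta)
    return d2_prime - d1_prime                                   # LB2 demands > 0

print(lb2_gap(delta=0.0))   # negative: plain BM25 violates LB2 on the long D2
print(lb2_gap(delta=1.0))   # positive: BM25+ with delta = 1.0 restores the ordering
```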

19 The proposed approach can fix or alleviate the problem of all these retrieval models:

Current models   LB1   LB2     Improved models   LB1          LB2
BM25             Yes   No      BM25+             Yes          Yes
PIV              Yes   No      PIV+              Yes          Yes
PL2              No    No      PL2+              Yes          Yes
DIR              No    Yes     DIR+              Alleviated   Yes

20 Experiment setup:
– Standard TREC document collections. Web: WT2G, WT10G, and Terabyte; News: Robust04
– Standard TREC query sets. Short (the title field), e.g., "Iraq foreign debt reduction"; Verbose (the description field), e.g., "Identify any efforts, proposed or undertaken, by world governments to seek reduction of Iraq's foreign debt"
– 2-fold cross-validation for parameter tuning
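The tuning protocol is simple enough to sketch. In the snippet below, `evaluate(param, topics)`, standing in for running retrieval on a topic set and computing an effectiveness measure such as MAP, is hypothetical:

```python
def two_fold_tune(topics, param_grid, evaluate):
    """Tune a parameter on one half of the topics, test on the other,
    and report the average test score over the two folds."""
    half = len(topics) // 2
    folds = [topics[:half], topics[half:]]
    test_scores = []
    for train, held_out in [(folds[0], folds[1]), (folds[1], folds[0])]:
        best = max(param_grid, key=lambda p: evaluate(p, train))  # tune
        test_scores.append(evaluate(best, held_out))              # test
    return sum(test_scores) / len(test_scores)
```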

21 BM25+ improves over BM25 significantly (superscripts 1/2/3/4 indicate significance at the 0.05/0.02/0.01/0.001 levels). δ = 1.0 works well, confirming the constraint analysis. BM25+ performs better on Web data than on News data, and better on verbose queries than on short ones; why? [Results tables for short and verbose queries omitted.]

22 BM25 overly penalizes long documents more seriously for verbose queries. The bound in the condition under which BM25 satisfies LB2 is monotonically decreasing in b and k1, and the optimal settings of b and k1 are larger for verbose queries, so LB2 is violated more easily there.

23 The improvement indeed comes from alleviating the problem of overly penalizing long documents. [Figure: probability-of-retrieval curves against document length for BM25 and BM25+, on short and verbose queries.]

24 DIR+ improves over DIR significantly (superscripts 1/2/3/4 indicate significance at the 0.05/0.02/0.01/0.001 levels). Fixing δ = 0.05 works very well, and DIR+ performs better on verbose than on short queries; why? DIR can only satisfy LB1 if µ is not too large, and the optimal µ settings shown on the slide explain the difference. [Results tables for short and verbose queries omitted.]

25 PL2+ improves over PL2 significantly (superscripts 1/2/3/4 indicate significance at the 0.05/0.02/0.01/0.001 levels). Fixing δ = 0.8 works very well, and PL2+ performs better on verbose than on short queries. Regarding the optimal settings of c: the smaller c is, the more dangerous, i.e., the more easily PL2 violates the constraints. [Results tables for short and verbose queries omitted.]

26 PIV+ works as we expected: it does not consistently outperform PIV (superscript 1 indicates significance at the 0.05 level). This is fine, since PIV can satisfy LB2 if s is not too large, and the optimal settings of s are very small.

27 1. Why does it seem to be so hard to beat these state-of-the-art retrieval models {BM25, PIV, DIR, PL2, …}? Because, until now, we weren't able to figure out their deficiency analytically. 2. Are they hitting the ceiling? No, they haven't hit the ceiling yet!

28 Conclusions:
– Reveal a common deficiency of current retrieval models
– Propose two novel formal constraints (LB1 and LB2)
– Show that current retrieval models do not satisfy both constraints, and that retrieval performance tends to be poor if either constraint is violated
– Develop a general and efficient solution, shown analytically to fix or alleviate the problem in current retrieval models
– Demonstrate the effectiveness of the proposed algorithms across different collections for different types of queries

29 Our models {BM25+, DIR+, PL2+} can potentially replace the current state-of-the-art retrieval models {BM25, DIR, PL2}.

30 Future work. This work has demonstrated the power of axiomatic analysis for fixing deficiencies of retrieval models. Are there any other deficiencies in current retrieval models? If so, can we solve them with axiomatic analysis? Can we go beyond bag-of-words with constraint analysis? Can we find a comprehensive set of constraints that is sufficient for deriving a unique (optimal) retrieval function?

31 Thanks!

32 Sensitivity of δ in BM25+ [figure]

