
1 A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval
Chengxiang Zhai, John Lafferty
School of Computer Science, Carnegie Mellon University

2 Research Questions
General: What role is smoothing playing in the language modeling approach?
Specific:
– Is the good performance due to smoothing?
– How sensitive is retrieval performance to smoothing?
– Which smoothing method is the best?
– How do we set smoothing parameters?

3 Outline
A General Smoothing Scheme and TF-IDF Weighting
Three Smoothing Methods
Experiments and Results

4 Retrieval as Language Model Estimation
Document ranking based on query likelihood (Ponte & Croft 98, Miller et al. 99, Berger & Lafferty 99, Hiemstra 2000, etc.)
Retrieval problem → estimation of p(w_i|d), the document language model
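
[The following is a minimal sketch, not the authors' Lemur implementation, of query-likelihood scoring: each document is ranked by log p(q|d) under its (smoothed) unigram language model. The function and argument names are illustrative assumptions.]

    import math

    def query_likelihood(query_terms, doc_lm):
        # doc_lm(w) returns p(w|d) for this document's (smoothed) language model.
        # Without smoothing, a query word unseen in d gives p(w|d) = 0, so the
        # whole query likelihood collapses to zero (motivating the next slide).
        return sum(math.log(doc_lm(w)) for w in query_terms)

    def rank(query_terms, doc_lms):
        # doc_lms: mapping from document id to that document's language model.
        return sorted(doc_lms, key=lambda d: query_likelihood(query_terms, doc_lms[d]), reverse=True)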

5 Why Smoothing?
Zero probability
– If w does not occur in d, then p(w|d) = 0, and any query with word w will have a zero probability.
Estimation inaccuracy
– A document is a very small sample of words, and the maximum likelihood estimate will be inaccurate.

6 Language Model Smoothing (Illustration)
[Figure: P(w) vs. w, comparing the maximum likelihood estimate with a smoothed LM (linear interpolation)]

7 A General Smoothing Scheme
All smoothing methods try to
– discount the probability of words seen in a document
– re-allocate the extra probability so that unseen words will have a non-zero probability
Most use a reference model (collection language model) to discriminate unseen words:
p(w|d) = p_s(w|d) if w is seen in d (the discounted ML estimate), and α_d · p(w|C) otherwise (the collection language model), where α_d is the mass reserved for unseen words.
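
[A minimal sketch of the general scheme above, under the convention that seen words get a discounted ML estimate p_s(w|d) and the discounted mass α_d is re-allocated to unseen words in proportion to the collection model p(w|C). The argument names are illustrative.]

    def general_smoothing(w, counts, p_s, alpha_d, p_c):
        # counts: c(w; d) for words seen in d; p_s(w): discounted ML estimate;
        # alpha_d: probability mass reserved for unseen words; p_c: p(w|C).
        if counts.get(w, 0) > 0:
            return p_s(w)            # discounted ML estimate for seen words
        return alpha_d * p_c[w]      # re-allocated mass via the collection model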

8 Smoothing & TF-IDF Weighting
Plugging the general smoothing scheme into the query likelihood retrieval formula, we obtain
log p(q|d) = Σ_{w∈q, c(w;d)>0} log [ p_s(w|d) / (α_d p(w|C)) ] + n log α_d + Σ_{w∈q} log p(w|C)
– The first sum acts like TF weighting (via p_s(w|d)) combined with IDF weighting (dividing by p(w|C)).
– n log α_d gives doc length normalization (a long doc is expected to have a smaller α_d).
– The last sum does not depend on d, so it can be ignored for ranking.
Smoothing with p(w|C) → TF-IDF + length normalization.
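
[For completeness, a short derivation of the decomposition above, assuming the general scheme from the previous slide (p_s is the discounted seen-word estimate, α_d the unseen-word mass, n = |q|); written in LaTeX for precision:]

\[
\begin{aligned}
\log p(q|d) &= \sum_{w \in q} \log p(w|d) \\
&= \sum_{w \in q,\, c(w;d) > 0} \log p_s(w|d) \;+ \sum_{w \in q,\, c(w;d) = 0} \log\bigl(\alpha_d\, p(w|C)\bigr) \\
&= \sum_{w \in q,\, c(w;d) > 0} \log \frac{p_s(w|d)}{\alpha_d\, p(w|C)} \;+\; n \log \alpha_d \;+ \sum_{w \in q} \log p(w|C)
\end{aligned}
\]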

9 Three Smoothing Methods
Simplified Jelinek-Mercer: shrink uniformly toward p(w|C)
– p_λ(w|d) = (1 − λ) p_ml(w|d) + λ p(w|C)
Dirichlet prior (Bayesian): assume μ · p(w|C) pseudo counts
– p_μ(w|d) = (c(w;d) + μ p(w|C)) / (|d| + μ)
Absolute discounting: subtract a constant δ from each seen-word count
– p_δ(w|d) = max(c(w;d) − δ, 0) / |d| + (δ |d|_u / |d|) p(w|C), where |d|_u is the number of unique terms in d
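
[A minimal sketch (not the Lemur code) of the three estimators for a single word: c_wd is c(w;d), dlen is |d|, n_unique is the number of unique terms in d, and p_c is p(w|C). The default parameter values are illustrative, not the tuned settings from the experiments.]

    def jelinek_mercer(c_wd, dlen, p_c, lam=0.5):
        # p(w|d) = (1 - lambda) * p_ml(w|d) + lambda * p(w|C)
        return (1 - lam) * (c_wd / dlen) + lam * p_c

    def dirichlet_prior(c_wd, dlen, p_c, mu=2000):
        # p(w|d) = (c(w;d) + mu * p(w|C)) / (|d| + mu)
        return (c_wd + mu * p_c) / (dlen + mu)

    def absolute_discounting(c_wd, dlen, n_unique, p_c, delta=0.7):
        # p(w|d) = max(c(w;d) - delta, 0)/|d| + (delta * |d|_unique / |d|) * p(w|C)
        return max(c_wd - delta, 0.0) / dlen + (delta * n_unique / dlen) * p_c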

10 Experiments
Collections: FBIS, LA, FT, TREC8 (Disk 4 & 5 minus CR, ~2GB), Small WEB (~2GB)
Queries: TREC 351–400 (Title + Long), TREC 401–450 (Title + Long)
18 collection/query combinations

11 Results
Performance is sensitive to smoothing
Type of queries makes a difference!
– More smoothing is needed for long queries than title queries
– Precision is more sensitive to smoothing for long queries
– Dirichlet prior is the best for title queries
– Jelinek-Mercer is most sensitive to the length/type of queries

12 Figure Explanation
[Figure: x-axis is the smoothing parameter (e.g., λ, μ, or δ), where larger values (toward 1.0 for λ) mean more smoothing; y-axis is avg. precision; annotations mark the optimal parameter setting and the optimal range]
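
[A minimal sketch of how such a sensitivity curve can be produced: sweep the smoothing parameter and record the average precision at each setting. run_retrieval is a hypothetical stand-in for indexing, retrieval, and evaluation (e.g., with Lemur plus trec_eval), not an actual API.]

    def sensitivity_curve(param_values, run_retrieval):
        # run_retrieval(param) -> avg. precision over the query set (assumed).
        return [(p, run_retrieval(p)) for p in param_values]

    # Example sweep over the Jelinek-Mercer lambda:
    # curve = sensitivity_curve([i / 10 for i in range(1, 10)], run_retrieval)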

13 Title queries vs. Long queries (Jelinek-Mercer on FBIS, FT, and LA)
[Figure: sensitivity curves with the optimal λ marked separately for title queries and long queries]

14 Per-query Optimal range of λ (JM on TREC8)
– Title queries: wide range, flat curve, less sensitive
– Long queries: narrow range, peaked curve, more sensitive

15 More on Precision Sensitivity
[Figure: precision sensitivity curves for Dirichlet prior and absolute discounting on the small and large DBs, with annotations marking the optimal settings, more smoothing, and flatter curves]

16 Comparison of Three Methods

17 A Possible Explanation of Observations
The Dual Role of Smoothing
– Estimation role: accurate estimation of p(w|d)
– Query modeling role: generation of common/non-informative words in the query
Title queries have few (if any) non-informative words, so
– Performance is affected primarily by the estimation role of smoothing
– They need less smoothing

18 A Possible Explanation (cont.)
Long queries have more non-informative words, so
– Performance is affected by both roles of smoothing
– They need more smoothing (the extra smoothing is for query modeling)
Dirichlet is best for title queries because it is good at the estimation role.
JM does not perform as well on title queries but much better on long queries, because it is good at the query modeling role but not as good at the estimation role.

19 The Lemur Toolkit
Language Modeling and Information Retrieval Toolkit
Under development at CMU and UMass
All experiments reported here were run using Lemur
http://www.cs.cmu.edu/~lemur
Contact us if you are interested in using it

20 Conclusions and Future Work
Smoothing → TF-IDF + doc length normalization
Retrieval performance is sensitive to smoothing
Sensitivity depends on query type
– More sensitive for long queries than for title queries
– More smoothing is needed for long queries
All three methods can perform well when optimized
– Dirichlet prior is especially good for title queries
– Both Dirichlet prior and JM are good for long queries
– Absolute discounting has a relatively stable optimal setting

21 Conclusions and Future Work (cont.)
Smoothing plays two different roles
– Better estimation of p(w|d)
– Generation of common/non-informative words in the query
Future work
– More evaluation (types of queries, smoothing methods)
– De-couple the dual role of smoothing (e.g., a two-stage smoothing strategy)
– Train query-specific smoothing parameters with past relevance judgments and other data (e.g., position selection translation model)

22 The End Thank you!

23 Dirichlet Prior is good for title queries; JM gains most from long queries
[Figure: results chart annotated "Dirichlet is the best" (title queries) and "JM gains most" (long queries)]

24 Avg. Pr., Pr@10doc and Pr@20doc
[Table: average precision, precision at 10 documents, and precision at 20 documents]

25 JM is most sensitive to query length

26 Backoff version of all three methods
– Adopt the same discounting of p_ml(w|d)
– Let the collection model p(w|C) affect ONLY the unseen words, NOT the seen words
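
[A minimal sketch contrasting the interpolation and backoff strategies; p_seen maps the words seen in d to their discounted probabilities and p_c is the collection model p(w|C) over the whole vocabulary. The normalization below follows the generic backoff recipe and is an assumption, not necessarily the paper's exact notation.]

    def interpolated(w, p_seen, p_c, alpha_d):
        # Collection model contributes to every word, seen or unseen.
        return p_seen.get(w, 0.0) + alpha_d * p_c[w]

    def backoff(w, p_seen, p_c):
        # Collection model affects ONLY unseen words: seen words keep just the
        # discounted estimate, and the leftover mass is spread over unseen
        # words in proportion to p(w|C), renormalized to sum to one.
        if w in p_seen:
            return p_seen[w]
        leftover_mass = 1.0 - sum(p_seen.values())
        unseen_collection_mass = 1.0 - sum(p_c[v] for v in p_seen)
        return leftover_mass * p_c[w] / unseen_collection_mass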

27 Recall Sensitivity Pattern is Similar
[Figure: recall sensitivity curves for Jelinek-Mercer, Dirichlet prior, and absolute discounting on the small and large DBs]

28 Interpolation vs. Backoff (JM on FBIS)
[Figure: sensitivity curves for the interpolation and backoff versions of JM]

29 Dirichlet is the best for title queries

30 Interpolation vs. Backoff (Dir. & A.D. on FBIS)
[Figure: sensitivity curves for the interpolation and backoff versions of Dirichlet prior and absolute discounting]

31 Precision Increase & Pattern Shift on Long Queries (λ = 0.3)
[Figure: sensitivity curves for title queries, long queries with single smoothing, and long queries with 2-stage smoothing]

32 A 2-stage Smoothing Strategy
De-couple the two roles
– Let Dirichlet prior play the estimation role
– Let Jelinek-Mercer play the query modeling role
[Diagram: p_ml(w|d) is smoothed with the doc background p(w|C) via a Dirichlet prior, then combined via Jelinek-Mercer with the query background model p(w|Q) to produce the probability of query word w]
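
[A minimal sketch of the two-stage combination sketched above: Dirichlet smoothing with the collection background first (estimation role), then Jelinek-Mercer interpolation with a query background model (query modeling role). p_q_bg stands in for the query background p(w|Q); the default parameter values are illustrative.]

    def two_stage(c_wd, dlen, p_c, p_q_bg, mu=2000, lam=0.3):
        # Stage 1 (estimation role): Dirichlet prior with the collection model p(w|C).
        p_dir = (c_wd + mu * p_c) / (dlen + mu)
        # Stage 2 (query modeling role): Jelinek-Mercer mix with the query background.
        return (1 - lam) * p_dir + lam * p_q_bg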

33 Effect of 2-stage Smoothing
– Improve performance on long queries
– Reveal a consistent sensitivity pattern of precision to the Dirichlet prior μ
– Achieve better precision with more meaningful parameter settings

34 Better Precision & More Meaningful Parameter Setting

35 Title queries vs. Long queries (Jelinek-Mercer on TREC7, TREC8, and WEB)
[Figure: sensitivity curves annotated "large optimal λ", "more smoothing", "flatter", "less sensitive"]

