
1 Language Modeling Approaches for Information Retrieval Rong Jin

2 A Probabilistic Framework for Information Retrieval  Documents d1 … d1000, query q: 'bush Kerry'. Which documents match the query? Estimate some statistics θ for each document; estimate the likelihood p(q|θ)

3 A Probabilistic Framework for Information Retrieval  Three fundamental questions: What statistics θ should be chosen to describe the characteristics of documents? How do we estimate these statistics? How do we compute the likelihood of generating queries given the statistics θ?

4 Unigram Language Model  Probabilities for single words p(w): θ = {p(w) for any word w in vocabulary V}  Estimating a unigram language model by simple counting: given a document d, count the term frequency c(w,d) for each word w; then p(w) = c(w,d)/|d|
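
A minimal sketch of this counting estimate in Python (the whitespace tokenization and the toy document are illustrative assumptions, not from the slides):

```python
from collections import Counter

def unigram_mle(document: str) -> dict[str, float]:
    """Maximum-likelihood unigram model: p(w) = c(w, d) / |d|."""
    tokens = document.lower().split()      # naive whitespace tokenization (assumption)
    counts = Counter(tokens)               # term frequencies c(w, d)
    return {w: c / len(tokens) for w, c in counts.items()}

# Toy example
model = unigram_mle("bush kerry debate bush election")
print(model["bush"])   # 2 / 5 = 0.4
```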

5 Statistical Inference  C1: h, h, h, h, t, h → bias b1 = 5/6  C2: t, t, h, t, h, h → bias b2 = 1/2  C3: t, h, t, t, t, h → bias b3 = 1/3  Why does counting provide a good estimate of the coin bias?

6 Maximum Likelihood Estimation (MLE)  Observation o = {o1, o2, …, on}  Maximum likelihood estimation: b* = argmax_b Pr(o|b)  E.g.: o = {h, h, h, t, h, h} → Pr(o|b) = b^5(1 − b)
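
A short worked version of the maximum-likelihood step for this example (the calculus is standard; it is not written out in the transcript):

```latex
\log \Pr(o \mid b) = 5 \log b + \log(1 - b), \qquad
\frac{d}{db} \log \Pr(o \mid b) = \frac{5}{b} - \frac{1}{1 - b} = 0
\;\Longrightarrow\; b^{*} = \frac{5}{6}
```

which matches the counting estimate b1 = 5/6 from the previous slide.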

7 Unigram Language Model  Observation: d = {tf1, tf2, …, tfn}  Unigram language model θ = {p(w1), p(w2), …, p(wn)}  Maximum likelihood estimation: p(wi) = tfi / (tf1 + tf2 + … + tfn) = tfi / |d|

8 Maximum A Posteriori Estimation  Consider a special case: we only toss each coin twice  C1: h, t → b1 = 1/2  C2: h, h → b2 = 1  C3: t, t → b3 = 0 ?  The MLE estimate is poor when the number of observations is small. This is called the "sparse data" problem!

9 Solutions to the Sparse Data Problem  Shrinkage  Maximum a posteriori (MAP) estimation  Bayesian approach

10 Shrinkage: Jelinek-Mercer Smoothing  Linearly interpolate between the document language model (estimated from the individual document) and the collection language model (estimated from the corpus): p(w|d) = λ·p_ml(w|d) + (1 − λ)·p(w|c), where 0 < λ < 1 is a smoothing parameter
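
A minimal sketch of Jelinek-Mercer smoothing under these definitions (the function name, argument layout, and λ = 0.5 default are illustrative assumptions):

```python
def jelinek_mercer(word, doc_counts, doc_len, coll_counts, coll_len, lam=0.5):
    """p(w|d) = lam * p_ml(w|d) + (1 - lam) * p(w|c), with 0 < lam < 1."""
    p_doc = doc_counts.get(word, 0) / doc_len      # estimate from the individual document
    p_coll = coll_counts.get(word, 0) / coll_len   # estimate from the corpus
    return lam * p_doc + (1 - lam) * p_coll
```

Words unseen in the document still receive a nonzero probability through the collection term.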

11 Smoothing & TF-IDF Weighting  Are they totally unrelated?

12 Smoothing & TF-IDF Weighting  Expanding the smoothed query likelihood yields one factor similar to TF.IDF weighting and another factor that is irrelevant to (does not depend on) the documents

13 Maximum A Posteriori Estimation  Introduce a prior on b: most coins are more or less unbiased  A Dirichlet prior on b

14 Maximum A Posteriori Estimation  Observation o = {o1, o2, …, on}  Maximum a posteriori estimation: b* = argmax_b Pr(b|o) = argmax_b Pr(o|b)·Pr(b)

17 Maximum A Posteriori Estimation  Observation o = {o1, o2, …, on}  Maximum a posteriori estimation: the prior contributes pseudo counts (or pseudo experiments) to the estimate
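
A sketch of how pseudo counts arise, using a Dirichlet (Beta) prior on the coin bias with hyper-parameters α_h and α_t (this notation is my assumption; the slide's formula is not preserved): with n_h observed heads and n_t tails,

```latex
b^{*} = \arg\max_{b} \Pr(o \mid b)\,\Pr(b)
      = \frac{n_h + \alpha_h - 1}{n_h + n_t + \alpha_h + \alpha_t - 2}
```

so the hyper-parameters act like extra heads and tails added to the observed counts, which is exactly the pseudo-experiment reading above.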

18 Dirichlet Prior  Given a distribution p = (p1, p2, …, pn), a Dirichlet distribution for p is defined as Dir(p; α1, …, αn) ∝ p1^(α1−1) · p2^(α2−1) ··· pn^(αn−1)  The αi are called hyper-parameters

19 Dirichlet Prior  Full Dirichlet distribution: Dir(p; α1, …, αn) = [Γ(α1 + … + αn) / (Γ(α1)···Γ(αn))] · p1^(α1−1) ··· pn^(αn−1), where Γ(x) is the gamma function

20 Dirichlet Prior  The Dirichlet is a distribution over distributions  The prior knowledge about the distribution p is encoded in the hyper-parameters  The maximum point of the Dirichlet distribution is at pi = (αi − 1)/(α1 + α2 + … + αn − n), so pi ∝ αi − 1; choosing αi = c·pi + 1 puts the maximum at p  Example: prior knowledge that most coins are fair, b = 1 − b = 1/2, gives α1 = α2 = c/2 + 1

21 Unigram Language Model  Simple counting leads to zero probabilities for unseen words  Introduce Dirichlet priors to smooth the language model  How do we construct the Dirichlet prior?

22 Dirichlet Prior for Unigram LM  Prior for what distribution? θd = {p(w1|d), p(w2|d), …, p(wn|d)}  How do we determine the appropriate values for the hyper-parameters αi?

23 Determine Hyper-parameters  The most likely language model under the Dirichlet distribution has p(wi|θd) ∝ αi  What is the most likely p(wi|θd) without looking into the content of the document d?

24 Determine Hyper-parameters  The most likely p(wi|θd) without looking into the content of the document d is the unigram probability of the collection: θc = {p(w1|c), p(w2|c), …, p(wn|c)}  So what is the appropriate value for αi?
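
A sketch of the choice this suggests, combining the rule αi = c·pi + 1 from slide 20 with the collection model (writing s for the constant, to match the pseudo document length used on later slides; the exact value is not given in the transcript):

```latex
\alpha_i = s \, p(w_i \mid c) + 1
```

With this prior, the most likely language model before seeing the document's content is the collection model θc.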

27 Dirichlet Prior for Unigram LM  MAP estimation of the best unigram language model  Solution: see the sketch after slide 31 below

30 Dirichlet Prior for Unigram LM  MAP estimation of the best unigram language model  Solution: the prior contributes a pseudo term frequency for each word

31 Dirichlet Prior for Unigram LM  MAP estimation of the best unigram language model  Solution: the prior also contributes a pseudo document length
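
A sketch of the resulting MAP solution with this prior (the formula itself is lost from the transcript, so this is the standard Dirichlet-smoothed estimate implied by the pseudo-count annotations):

```latex
p(w_i \mid d) = \frac{tf(w_i, d) + s\, p(w_i \mid c)}{|d| + s}
```

Here s·p(wi|c) is the pseudo term frequency added for word wi, and s is the pseudo document length added to |d|.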

32 Dirichlet Smoothed Unigram LM  What does p(w|d) look like if s is small?  What does p(w|d) look like if s is large?

33 Dirichlet Smoothed Unigram LM  If s is small, p(w|d) stays close to the document's maximum likelihood estimate; if s is large, p(w|d) is pulled toward the collection model p(w|c)

34 Dirichlet Smoothed Unigram LM  No more zero probabilities: any word seen in the collection now gets a nonzero probability in every document

35 Dirichlet Smoothed Unigram LM  Step 1: compute the collection-based unigram language model by simple counting  Step 2: for each document dk, compute its smoothed unigram language model as in the MAP solution above

37 Dirichlet Smoothed Unigram LM  For a given query q = {tf1(q), tf2(q), …, tfn(q)}, compute the likelihood p(q|d) for each document d: the larger the likelihood, the more relevant the document is to the query
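
A minimal end-to-end sketch of the two steps plus query-likelihood ranking (the tokenizer, the default s = 2000, and the toy corpus are illustrative assumptions):

```python
import math
from collections import Counter

def dirichlet_rank(query: str, docs: list[str], s: float = 2000.0) -> list[tuple[int, float]]:
    """Rank documents by Dirichlet-smoothed query log-likelihood."""
    doc_tokens = [d.lower().split() for d in docs]

    # Step 1: collection-based unigram language model by simple counting
    coll = Counter(t for toks in doc_tokens for t in toks)
    coll_len = sum(coll.values())
    p_c = {w: c / coll_len for w, c in coll.items()}

    q_counts = Counter(query.lower().split())
    scores = []
    for k, toks in enumerate(doc_tokens):
        counts, dlen = Counter(toks), len(toks)
        log_lik = 0.0
        for w, qtf in q_counts.items():
            # Step 2: smoothed model p(w|d) = (tf(w,d) + s*p(w|c)) / (|d| + s)
            p_wd = (counts.get(w, 0) + s * p_c.get(w, 0.0)) / (dlen + s)
            # Guard for query words unseen anywhere in the collection
            log_lik += qtf * math.log(p_wd if p_wd > 0 else 1e-12)
        scores.append((k, log_lik))

    # The larger the likelihood, the more relevant the document
    return sorted(scores, key=lambda kv: kv[1], reverse=True)

print(dirichlet_rank("bush kerry", ["bush kerry debate", "stock market news today"]))
```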

38 Smoothing & TF-IDF Weighting  Are they totally unrelated?

39 Smoothing & TF-IDF Weighting

42 Document normalization

43 Smoothing & TF-IDF Weighting  The remaining term weight behaves like TF.IDF
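
A sketch of the decomposition slides 38–43 appear to walk through (the algebra is not preserved in the transcript; this is the standard analysis of the Dirichlet-smoothed query log-likelihood): up to a document-independent constant,

```latex
\log p(q \mid d) \;=\;
  \sum_{w \in q \cap d} tf(w, q)\,
    \log\!\left(1 + \frac{tf(w, d)}{s\, p(w \mid c)}\right)
  \;+\; |q| \log \frac{s}{|d| + s}
  \;+\; \text{const}
```

The first sum rewards term frequency in the document and down-weights common words through p(w|c), an IDF-like effect; the last term depends only on the document length, i.e. document normalization.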

44 Shrinkage vs. Dirichlet Smoothing  Both JM smoothing and Dirichlet smoothing linearly interpolate between the document language model and the collection language model  The linear weight λ is a constant for JM smoothing; it is document dependent for Dirichlet smoothing
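
A short check of that claim (standard algebra, not spelled out in the transcript): the Dirichlet-smoothed estimate can be rewritten as an interpolation whose weight depends on the document length,

```latex
\frac{tf(w, d) + s\, p(w \mid c)}{|d| + s}
  = \lambda_d\, p_{\mathrm{ml}}(w \mid d) + (1 - \lambda_d)\, p(w \mid c),
\qquad \lambda_d = \frac{|d|}{|d| + s}
```

so long documents trust their own counts more, while short documents lean on the collection model; JM smoothing instead uses one fixed λ for every document.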

45 Current Probabilistic Framework for Information Retrieval  Documents d1 … d1000, query q: 'bush Kerry'. Estimate some statistics θ for each document; estimate the likelihood p(q|θ)

46 Current Probabilistic Framework for Information Retrieval  Documents d1 … d1000, query q: 'bush Kerry'. Estimated models θ1, θ2, …, θ1000, one per document; estimate the likelihood p(q|θ)

47 Current Probabilistic Framework for Information Retrieval  Query q: 'bush Kerry' is scored against the point estimates θ1, θ2, …, θ1000 by estimating the likelihood p(q|θ)

48 Bayesian Approach  Documents d1 … d1000, query q: 'bush Kerry'. Estimating some statistics θ for each document and the likelihood p(q|θ) as before, but we need to consider the uncertainty in model inference

49 Bayesian Approach  Document d, candidate models θ1, θ2, …, θn; each model explains the document with probability p(d|θi) and would generate the query q with probability p(q|θi)

51 Bayesian Approach  Document d, candidate models θ1, θ2, …, θn, with p(d|θi) and p(q|θi) as above  Assume that p(d) and p(θi) follow uniform distributions
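
A sketch of where the uniform-distribution assumption leads (the combination step is implied rather than written out in the transcript): averaging the query likelihood over the candidate models,

```latex
p(q \mid d) = \sum_i p(q \mid \theta_i)\, p(\theta_i \mid d)
            = \sum_i p(q \mid \theta_i)\, \frac{p(d \mid \theta_i)\, p(\theta_i)}{p(d)}
            \;\propto\; \sum_i p(q \mid \theta_i)\, p(d \mid \theta_i)
```

where the last step uses the assumption that p(d) and p(θi) are uniform, so ranking only requires the two likelihoods p(d|θi) and p(q|θi).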

