Information Retrieval Models: Probabilistic Models


1 Information Retrieval Models: Probabilistic Models
ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign

2 Modeling Relevance: Roadmap for Retrieval Models
Relevance (subject to the relevance constraints of [Fang et al. 04]) can be modeled along three main lines:
- Similarity of representations (Rep(q), Rep(d)): different rep & similarity, e.g., vector space model (Salton et al., 75) and prob. distr. model (Wong & Yao, 89)
- Probability of relevance P(r=1|q,d), r ∈ {0,1}: regression model (Fuhr 89); learning to rank (Joachims 02, Burges et al. 05); generative models, via doc generation (classical prob. model, Robertson & Sparck Jones, 76; divergence from randomness, Amati & Rijsbergen 02) or query generation (LM approach, Ponte & Croft, 98; Lafferty & Zhai, 01a)
- Probabilistic inference P(d → q) or P(q → d): prob. concept space model (Wong & Yao, 95); different inference system, e.g., inference network model (Turtle & Croft, 91)

3 The Basic Question
What is the probability that THIS document is relevant to THIS query? Formally: 3 random variables, query Q, document D, and relevance R ∈ {0,1}. Given a particular query q and a particular document d, what is p(R=1|Q=q,D=d)?

4 Probability of Relevance
Three random variables: query Q, document D, relevance R ∈ {0,1}. Goal: rank D based on P(R=1|Q,D). Evaluate P(R=1|Q,D)? Actually, we only need to compare P(R=1|Q,D1) with P(R=1|Q,D2), i.e., rank documents. There are several different ways to refine P(R=1|Q,D).

5 Probabilistic Retrieval Models: Intuitions
Suppose we have a large number of relevance judgments (e.g., clickthroughs: “1” = clicked; “0” = skipped). We can then score documents based on estimates such as P(R=1|Q1,D1)=1/2, P(R=1|Q1,D2)=2/2, P(R=1|Q1,D3)=0/2. (The slide shows a table of Query (Q), Doc (D), Rel (R) tuples from such a log.) What if we don’t have a (sufficient) search log? We can approximate p(R=1|Q,D)! Different assumptions lead to different models.
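As a minimal sketch (not part of the slides) of how such counts turn into relevance estimates, assuming a click log of (query, doc, clicked) tuples:

    from collections import defaultdict

    def estimate_relevance(log):
        # P(R=1|Q,D) estimated as (#clicks) / (#impressions) for each (query, doc) pair.
        clicks, impressions = defaultdict(int), defaultdict(int)
        for query, doc, clicked in log:
            impressions[(query, doc)] += 1
            clicks[(query, doc)] += clicked
        return {pair: clicks[pair] / impressions[pair] for pair in impressions}

    # Matches the numbers above: D1 clicked 1 of 2 times, D2 2 of 2, D3 0 of 2 for Q1.
    log = [("Q1", "D1", 1), ("Q1", "D1", 0),
           ("Q1", "D2", 1), ("Q1", "D2", 1),
           ("Q1", "D3", 0), ("Q1", "D3", 0)]
    print(estimate_relevance(log))  # {('Q1', 'D1'): 0.5, ('Q1', 'D2'): 1.0, ('Q1', 'D3'): 0.0}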

6 Refining P(R=1|Q,D) Method 1: conditional models
Basic idea: relevance depends on how well a query matches a document. Define features on Q x D, e.g., # matched terms, the highest IDF of a matched term, doc length, or even score(Q,D) given any other retrieval function, … Then P(R=1|Q,D)=g(f1(Q,D), f2(Q,D), …, fn(Q,D), θ). Use training data (known relevance judgments) to estimate the parameter θ, then apply the model to rank new documents. Will be covered more later under learning to rank.
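As a hedged sketch of such a conditional model (not from the slides): choose g to be a logistic function over a few illustrative features, with theta learned from judged (Q,D) pairs.

    import math

    def features(query_terms, doc_terms, idf):
        matched = set(query_terms) & set(doc_terms)
        return [len(matched),                                           # number of matched terms
                max((idf.get(t, 0.0) for t in matched), default=0.0),   # highest IDF of a matched term
                math.log(1 + len(doc_terms))]                           # (log) document length

    def p_relevant(query_terms, doc_terms, idf, theta, bias=0.0):
        # g(f1,...,fn, theta): a logistic link squashing a weighted feature sum into [0,1].
        s = bias + sum(w * f for w, f in zip(theta, features(query_terms, doc_terms, idf)))
        return 1.0 / (1.0 + math.exp(-s))

    # theta and bias would be fit on training data with known relevance judgments
    # (e.g., by logistic regression); documents are then ranked by p_relevant.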

7 Refining P(R=1|Q,D) Method 2: generative models
Basic idea: define P(Q,D|R), then compute the odds O(R=1|Q,D) using Bayes’ rule. Special cases: document “generation”, P(Q,D|R)=P(D|Q,R)P(Q|R); query “generation”, P(Q,D|R)=P(Q|D,R)P(D|R). (In document generation, the factor P(Q|R) does not depend on D and is ignored for ranking D.)
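Spelling out the Bayes-rule step as a sketch (same notation as above):

\[
O(R=1\mid Q,D)
  = \frac{P(R=1\mid Q,D)}{P(R=0\mid Q,D)}
  = \frac{P(Q,D\mid R=1)\,P(R=1)}{P(Q,D\mid R=0)\,P(R=0)}
  \stackrel{\text{doc gen.}}{=}
    \frac{P(D\mid Q,R=1)}{P(D\mid Q,R=0)}
    \cdot \underbrace{\frac{P(Q\mid R=1)\,P(R=1)}{P(Q\mid R=0)\,P(R=0)}}_{\text{independent of } D}
\]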

8 Document Generation
Compare a model of relevant docs for Q with a model of non-relevant docs for Q. Assume independent attributes A1…Ak (why?). Let D=d1…dk, where di ∈ {0,1} is the value of attribute Ai (similarly Q=q1…qk).
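Under this independence assumption, the document-generation odds factor over the attributes (a sketch, using the pi and qi notation defined on the next slide):

\[
\frac{P(D\mid Q,R=1)}{P(D\mid Q,R=0)}
  = \prod_{i=1}^{k}\frac{P(A_i=d_i\mid Q,R=1)}{P(A_i=d_i\mid Q,R=0)}
  = \prod_{i:\,d_i=1}\frac{p_i}{q_i}\;\prod_{i:\,d_i=0}\frac{1-p_i}{1-q_i}
\]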

9 Robertson-Sparck Jones Model (Robertson & Sparck Jones 76)
(RSJ model) Two parameters for each term Ai: pi = P(Ai=1|Q,R=1), the prob. that term Ai occurs in a relevant doc; qi = P(Ai=1|Q,R=0), the prob. that term Ai occurs in a non-relevant doc. How to estimate the parameters? Suppose we have relevance judgments; the parameters can then be estimated from smoothed counts, where the “+0.5” and “+1” smoothing terms can be justified by Bayesian estimation.
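A minimal sketch of estimating pi and qi with the “+0.5”/“+1” smoothing and using them to score a document (the exact counting is an assumption, not copied from the slide); assuming pi = qi for terms not in the query, the score reduces to a sum of weights over matched query terms:

    import math

    def rsj_term_weight(rel_with, rel_total, nonrel_with, nonrel_total):
        # rel_with: # relevant docs containing the term; rel_total: # relevant docs
        # (and likewise for the non-relevant counts).
        p_i = (rel_with + 0.5) / (rel_total + 1.0)        # smoothed P(Ai=1|Q,R=1)
        q_i = (nonrel_with + 0.5) / (nonrel_total + 1.0)  # smoothed P(Ai=1|Q,R=0)
        return math.log(p_i * (1 - q_i) / (q_i * (1 - p_i)))

    def rsj_score(query_terms, doc_terms, weights):
        # Sum the weights of the query terms that occur in the document.
        return sum(weights[t] for t in set(query_terms) & set(doc_terms) if t in weights)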

10 RSJ Model: No Relevance Info (Croft & Harper 79)
How to estimate the parameters? Suppose we do not have relevance judgments. We then assume pi to be a constant and estimate qi by assuming all documents to be non-relevant. N: # documents in the collection; ni: # documents in which term Ai occurs.
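A minimal sketch of the resulting weight (holding pi at 0.5 and using +0.5 smoothing are common choices, assumed here rather than read off the slide):

    import math

    def idf_like_weight(N, n_i):
        # N: # documents in the collection; n_i: # documents containing term Ai.
        # With pi held constant at 0.5, the pi part of the RSJ weight cancels, and the
        # qi part, estimated over the whole collection, leaves an IDF-like weight.
        return math.log((N - n_i + 0.5) / (n_i + 0.5))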

11 RSJ Model: Summary
The most important classic probabilistic IR model. Uses only term presence/absence, and is thus also referred to as the Binary Independence Model; essentially Naïve Bayes for doc ranking. Most natural for relevance/pseudo feedback. Without relevance judgments, the model parameters must be estimated in an ad hoc way. Performance isn’t as good as a tuned VS model.

12 Improving RSJ: Adding TF
Basic doc. generation model: let D=d1…dk, where dk is the frequency count of term Ak. This leads to a 2-Poisson mixture model. Many more parameters to estimate! (How many exactly?)
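For reference, a sketch of the 2-Poisson mixture (the symbols α, λ1, λ2 are illustrative): each term's frequency is drawn from an “elite” or a “non-elite” Poisson component, and a separate set of parameters is needed for the relevant and the non-relevant class of each term.

\[
P(TF_i = tf \mid Q, R)
  = \alpha\,\frac{e^{-\lambda_1}\lambda_1^{tf}}{tf!}
  + (1-\alpha)\,\frac{e^{-\lambda_2}\lambda_2^{tf}}{tf!}
\]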

13 BM25/Okapi Approximation (Robertson et al. 94)
Idea: approximate p(R=1|Q,D) with a simpler function that shares similar properties. Observations: log O(R=1|Q,D) is a sum of term weights Wi; Wi = 0 if TFi = 0; Wi increases monotonically with TFi; Wi has an asymptotic limit. The simple function is shown below.
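A common form with exactly these properties (a sketch; the constant k1 controls how quickly the weight saturates):

\[
W_i \;\propto\; \frac{(k_1+1)\,TF_i}{k_1 + TF_i},
\]
which is 0 when TF_i = 0, increases monotonically in TF_i, and approaches the asymptote k_1 + 1.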

14 Adding Doc. Length & Query TF
Incorporating doc length. Motivation: the 2-Poisson model assumes equal document length. Implementation: “carefully” penalize long docs. Incorporating query TF. Motivation: appears to be not well justified. Implementation: a similar TF transformation. The final formula is called BM25, achieving top TREC performance.
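A common way to write these two refinements (a sketch; b ∈ [0,1] controls the strength of length normalization, |D| is the document length, avdl the average document length, and QTF_i the query term frequency):

\[
W_i \;\propto\;
  \frac{(k_1+1)\,TF_i}{k_1\bigl((1-b) + b\,\tfrac{|D|}{avdl}\bigr) + TF_i}
  \cdot
  \frac{(k_3+1)\,QTF_i}{k_3 + QTF_i}
\]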

15 The BM25 Formula “Okapi TF/BM25 TF”
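Putting the pieces together, a hedged, self-contained sketch of Okapi BM25 scoring (one common parameterization; the exact constants and form on the slide may differ):

    import math
    from collections import Counter

    def bm25_score(query, doc, docs, k1=1.2, b=0.75, k3=8.0):
        # query, doc: lists of terms; docs: the whole collection as a list of term lists.
        N = len(docs)
        avgdl = sum(len(d) for d in docs) / N
        df = Counter()                                           # document frequency of each term
        for d in docs:
            df.update(set(d))
        tf_d, tf_q = Counter(doc), Counter(query)
        score = 0.0
        for t in set(query):
            if tf_d[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))    # RSJ-style IDF
            norm = k1 * (1 - b + b * len(doc) / avgdl)           # doc length normalization
            w_doc = tf_d[t] * (k1 + 1) / (tf_d[t] + norm)        # saturating doc TF
            w_query = tf_q[t] * (k3 + 1) / (tf_q[t] + k3)        # saturating query TF
            score += idf * w_doc * w_query
        return score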

16 Extensions of “Doc Generation” Models
Capture term dependence (Rijsbergen & Harper 78). Alternative ways to incorporate TF (Croft 83, Kalt 96). Feature/term selection for feedback (Okapi’s TREC reports). Estimation of the relevance model based on pseudo feedback, to be covered later [Lavrenko & Croft 01].

17 What You Should Know
Basic idea of probabilistic retrieval models. How to use Bayes’ rule to derive a general document-generation retrieval model. How to derive the RSJ retrieval model (i.e., the binary independence model). Assumptions that have to be made in order to derive the RSJ model. The two-Poisson model and the heuristics introduced to derive BM25.

