
1 Probabilistic Information Retrieval Part II: In Depth Alexander Dekhtyar Department of Computer Science University of Maryland

2 In this part
• Probability Ranking Principle
  – simple case
  – case with retrieval costs
• Binary Independence Retrieval (BIR)
  – estimating the probabilities
• Binary Independence Indexing (BII)
  – dual to BIR

3 The Basics
• Bayesian probability formulas
• Odds (see the sketch below):
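The formulas for this slide were images in the original deck and did not survive transcription. A minimal sketch of the two ingredients named here, in the notation used by the later slides, is:

```latex
% Bayes' rule, written for relevance (R) and non-relevance (NR):
p(R \mid x) = \frac{p(x \mid R)\, p(R)}{p(x)}, \qquad
p(NR \mid x) = \frac{p(x \mid NR)\, p(NR)}{p(x)}

% Odds of an event A:
O(A) = \frac{p(A)}{p(\overline{A})} = \frac{p(A)}{1 - p(A)}
```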

4 The Basics
• Document relevance: p(R|q,d), the probability that document d is relevant to query q
• Note: see the sketch below
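The two formulas on this slide were also images. A plausible reconstruction, consistent with how relevance is used on the following slides, treats relevance and non-relevance of a document for a query as complementary events:

```latex
p(R \mid q, d) \in [0, 1], \qquad p(NR \mid q, d) = 1 - p(R \mid q, d)
```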

5 Probability Ranking Principle
• Simple case: no selection costs.
• x is relevant iff p(R|x) > p(NR|x) (Bayes' Decision Rule)
• PRP in action: rank all documents by p(R|x).
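Restating the decision rule in symbols (not from the original images; it follows directly from the two bullets above):

```latex
\text{retrieve } x \iff p(R \mid x) > p(NR \mid x) = 1 - p(R \mid x)
\iff p(R \mid x) > 1/2
```

Under the PRP, documents are then returned in decreasing order of p(R|x).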

6 Probability Ranking Principle
• More complex case: retrieval costs.
  – C: cost of retrieval of a relevant document
  – C': cost of retrieval of a non-relevant document
  – let d be a document
• Probability Ranking Principle: if the condition sketched below holds for all d' not yet retrieved, then d is the next document to be retrieved
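The retrieval-cost criterion itself was an image in the original slide. The standard cost-based statement of the PRP, using the costs C and C' defined above, compares expected retrieval costs:

```latex
C \cdot p(R \mid d) + C' \cdot \bigl(1 - p(R \mid d)\bigr)
\;\le\;
C \cdot p(R \mid d') + C' \cdot \bigl(1 - p(R \mid d')\bigr)
```

That is, the next document retrieved is the one with the smallest expected retrieval cost among the documents not yet retrieved.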

7 Next: Binary Independence Model

8 Binary Independence Model
• Traditionally used in conjunction with PRP
• “Binary” = Boolean: documents are represented as binary vectors of terms:
  – x = (x_1, …, x_n)
  – x_i = 1 iff term i is present in document x
• “Independence”: terms occur in documents independently
• Different documents can be modeled as the same vector.

9 Binary Independence Model
• Queries: binary vectors of terms
• Given query q:
  – for each document d, need to compute p(R|q,d)
  – replace with computing p(R|q,x), where x is the vector representing d
• Interested only in ranking
• Will use odds (see the sketch below):
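The odds expression referred to here was an image; the standard BIM decomposition, obtained by applying Bayes' rule to both the numerator and the denominator, is:

```latex
O(R \mid q, x)
  = \frac{p(R \mid q, x)}{p(NR \mid q, x)}
  = \frac{p(R \mid q)}{p(NR \mid q)} \cdot \frac{p(x \mid R, q)}{p(x \mid NR, q)}
  = O(R \mid q) \cdot \frac{p(x \mid R, q)}{p(x \mid NR, q)}
```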

10 Binary Independence Model
Using the independence assumption (see the sketch below): of the two factors above, one is constant for each query and the other needs estimation. So:
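The equations on this slide were images. Under the term-independence assumption, the ratio that needs estimation factors over individual terms (a standard step in this derivation):

```latex
\frac{p(x \mid R, q)}{p(x \mid NR, q)}
  = \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
\quad\Longrightarrow\quad
O(R \mid q, x)
  = O(R \mid q) \cdot \prod_{i=1}^{n} \frac{p(x_i \mid R, q)}{p(x_i \mid NR, q)}
```

Here O(R|q) is the factor that is constant for each query; the product over terms is the quantity that needs estimation.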

11 Binary Independence Model
Since x_i is either 0 or 1, the product splits over terms with x_i = 1 and terms with x_i = 0.
Let p_i and r_i denote the per-term occurrence probabilities in relevant and non-relevant documents (see the sketch below).
Assume p_i = r_i for all terms not occurring in the query (q_i = 0).
Then...
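Written out (the symbols p_i and r_i are the conventional names for these quantities; the original slide's own notation did not survive transcription), the split and the definitions are:

```latex
O(R \mid q, x)
  = O(R \mid q)
    \cdot \prod_{i:\, x_i = 1} \frac{p(x_i = 1 \mid R, q)}{p(x_i = 1 \mid NR, q)}
    \cdot \prod_{i:\, x_i = 0} \frac{p(x_i = 0 \mid R, q)}{p(x_i = 0 \mid NR, q)}

p_i = p(x_i = 1 \mid R, q), \qquad r_i = p(x_i = 1 \mid NR, q)

% Assumption for terms not in the query (q_i = 0):   p_i = r_i
```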

12 Binary Independence Model
The odds are regrouped in two steps (sketched below): first into a product over all matching terms times a product over the non-matching query terms, and then into a product over all matching terms times a product over all query terms.
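The two-line equation those labels annotate was an image; its standard form, using p_i and r_i as above, is:

```latex
O(R \mid q, x)
  = O(R \mid q)
    \cdot \underbrace{\prod_{i:\, x_i = q_i = 1} \frac{p_i}{r_i}}_{\text{all matching terms}}
    \cdot \underbrace{\prod_{i:\, x_i = 0,\, q_i = 1} \frac{1 - p_i}{1 - r_i}}_{\text{non-matching query terms}}
  = O(R \mid q)
    \cdot \underbrace{\prod_{i:\, x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}}_{\text{all matching terms}}
    \cdot \underbrace{\prod_{i:\, q_i = 1} \frac{1 - p_i}{1 - r_i}}_{\text{all query terms}}
```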

13 Binary Independence Model
In the last expression, the product over all query terms is constant for each query; the product over the matching terms is the only quantity to be estimated for rankings. Its logarithm is the Retrieval Status Value (see below):
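The Retrieval Status Value formula was an image; in its standard form (this also defines the c_i coefficients used on the next slides):

```latex
RSV = \log \prod_{i:\, x_i = q_i = 1} \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
    = \sum_{i:\, x_i = q_i = 1} \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
    = \sum_{i:\, x_i = q_i = 1} c_i,
\qquad
c_i = \log \frac{p_i (1 - r_i)}{r_i (1 - p_i)}
```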

14 Binary Independence Model
It all boils down to computing the RSV. So, how do we compute the c_i's from our data?

15 Binary Independence Model
Estimating the RSV coefficients. For each term i, look at the following contingency table of document counts (reconstructed below), read off the estimates, and add 0.5 to every expression.
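The table and the estimate formulas on this slide were images. A standard reconstruction is the Robertson/Spärck Jones contingency table; the symbols s, S, n, N below are assumed names, not taken from the original:

```
               relevant      non-relevant        total
x_i = 1            s             n - s              n
x_i = 0          S - s      (N - S) - (n - s)     N - n
total              S             N - S              N
```

```latex
p_i \approx \frac{s}{S}, \qquad
r_i \approx \frac{n - s}{N - S}, \qquad
c_i \approx \log \frac{s / (S - s)}{(n - s) / (N - n - S + s)}
```

Adding 0.5 to every count, as the slide instructs, gives the smoothed coefficient:

```latex
c_i \approx \log \frac{(s + 0.5)\,(N - n - S + s + 0.5)}{(S - s + 0.5)\,(n - s + 0.5)}
```

The same computation as a small Python sketch (function names are illustrative, not from the slides):

```python
import math

def rsv_coefficient(s, S, n, N):
    """Smoothed estimate of c_i = log [ p_i (1 - r_i) / ( r_i (1 - p_i) ) ]
    from a per-term contingency table of document counts:
      s -- relevant documents containing term i
      S -- relevant documents in total
      n -- documents containing term i
      N -- documents in total
    Adding 0.5 to every count is the smoothing step from the slide."""
    return math.log(((s + 0.5) * (N - n - S + s + 0.5)) /
                    ((S - s + 0.5) * (n - s + 0.5)))

def rsv(doc_terms, query_terms, stats):
    """Retrieval Status Value of a document: the sum of c_i over terms
    present in both the document and the query.
    `stats` maps a term to its (s, S, n, N) counts."""
    return sum(rsv_coefficient(*stats[t])
               for t in (doc_terms & query_terms) if t in stats)
```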

16 PRP and BIR: The lessons
• Getting reasonable approximations of probabilities is possible.
• Simple methods work only with restrictive assumptions:
  – term independence
  – terms not in the query do not affect the outcome
  – Boolean representation of documents/queries
  – document relevance values are independent
• Some of these assumptions can be removed

17 Next: Binary Independence Indexing

18 Binary Independence Indexing vs. Binary Independence Retrieval
BIR:
• many documents, one query
• Bayesian probability (see the sketch below)
• varies: document representation
• constant: query (representation)
BII:
• one document, many queries
• Bayesian probability (see the sketch below)
• varies: query
• constant: document
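The two probability formulas in this comparison were images. A plausible reading, consistent with the BIR derivation above and with the quantity p(q|x,R) introduced on the next slide, is a pair of Bayes'-rule decompositions that differ only in what is held fixed:

```latex
% BIR: query fixed, document representation x varies
p(R \mid q, x) = \frac{p(x \mid q, R)\, p(R \mid q)}{p(x \mid q)}

% BII: document representation x fixed, query q varies
p(R \mid q, x) = \frac{p(q \mid x, R)\, p(R \mid x)}{p(q \mid x)}
```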

19 Binary Independence Indexing
• “Learning” from queries
  – more queries: better results
• p(q|x,R): the probability that query q would have been asked, given that document x had been deemed relevant
• The rest of the framework is similar to BIR

20 Binary Independence Indexing: Key Assumptions
• Term occurrence in queries is conditionally independent (see the sketch below)
• Relevance of document representation x w.r.t. query q depends only on the terms present in the query (q_i = 1)
• For each term i not used in representation x of document d (x_i = 0):
  – only positive occurrences of terms count
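The formula attached to the first assumption was an image; conditional independence of query-term occurrences is standardly written as:

```latex
p(q \mid x, R) = \prod_{i=1}^{n} p(q_i \mid x, R)
```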

21 Binary Independence Indexing
Applying the assumptions above to the ranking formula (a sketch follows): one factor is constant for the given document, and another group of factors is equal to 1 and therefore drops out.
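The derivation on this slide was an image, so the following is only a hedged sketch of its likely shape: Bayes' rule with the document representation x held fixed, followed by the conditional-independence assumption from the previous slide. The original slide may group the factors differently.

```latex
p(R \mid q, x)
  = \frac{p(q \mid x, R)\, p(R \mid x)}{p(q \mid x)}
  = \frac{p(R \mid x)}{p(q \mid x)} \cdot \prod_{i=1}^{n} p(q_i \mid x, R)
```

Here p(R|x) is the factor that is constant for the given document, and under the remaining assumptions the factors for terms outside the query plausibly reduce to 1, leaving a product over the query's terms only.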

