Modern Information Retrieval, Chapter 2: Modeling. The probabilistic model.


1 Modern Information Retrieval Chapter 2 Modeling

2 Probabilistic model
• the appearance or absence of an index term in a document is interpreted as evidence either that the document is relevant or that it is irrelevant to a query
• goal: establish a weight for each term

3 Consider a collection of $N$ documents
• $R$ of the $N$ documents are relevant
• $R_t$ of the $R$ relevant documents contain term $t$
• $f_t$ of the $N$ documents contain $t$
• these values can be obtained from a training set with relevance judgments
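As a rough illustration of how the four counts might be read off a judged training set, here is a minimal Python sketch. The function name `term_counts`, the representation of documents as sets of index terms, and the toy data are all assumptions made for this example; none of them come from the slides.

```python
def term_counts(docs, relevant_ids, term):
    """Return (N, R, R_t, f_t) for one term.

    docs: dict mapping a document id to its set of index terms (assumed layout).
    relevant_ids: set of ids of documents judged relevant.
    """
    N = len(docs)                      # size of the collection
    R = len(relevant_ids)              # number of relevant documents
    # f_t: documents (relevant or not) that contain the term
    f_t = sum(1 for terms in docs.values() if term in terms)
    # R_t: relevant documents that contain the term
    R_t = sum(1 for doc_id in relevant_ids if term in docs[doc_id])
    return N, R, R_t, f_t


# Hypothetical toy training set with relevance judgments.
docs = {
    1: {"retrieval", "model"},
    2: {"retrieval"},
    3: {"boolean"},
    4: {"model", "vector"},
}
relevant_ids = {1, 2}

print(term_counts(docs, relevant_ids, "retrieval"))
```

Binary term presence is used here because the model only looks at whether a term appears in a document, not how often.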

4

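5 Computing probabilities ($\bar{t}$ denotes the absence of term $t$)
• $\Pr[\text{relevant} \mid t] = R_t / f_t$
• $\Pr[\text{irrelevant} \mid t] = (f_t - R_t) / f_t$
• $\Pr[\text{relevant} \mid \bar{t}] = (R - R_t) / (N - f_t)$
• $\Pr[\text{irrelevant} \mid \bar{t}] = (N - f_t - (R - R_t)) / (N - f_t)$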
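The four conditional probabilities can be sketched in a few lines of Python. The function name `term_probabilities` and the dictionary keys are assumptions chosen for this example; the counts are those of the first worked example given later in this presentation (N=20, R=13, R_t=11, f_t=12). Exact fractions are used so that the complementary pairs can be seen to sum to 1.

```python
from fractions import Fraction


def term_probabilities(N, R, R_t, f_t):
    """Conditional relevance probabilities given presence or absence of a term."""
    return {
        # documents that contain t
        "rel_given_t": Fraction(R_t, f_t),
        "irrel_given_t": Fraction(f_t - R_t, f_t),
        # documents that do not contain t (there are N - f_t of them)
        "rel_given_not_t": Fraction(R - R_t, N - f_t),
        "irrel_given_not_t": Fraction(N - f_t - (R - R_t), N - f_t),
    }


p = term_probabilities(N=20, R=13, R_t=11, f_t=12)
for name, value in p.items():
    print(name, value)
```

Note that the sketch does not guard against f_t = 0 or f_t = N, where a denominator vanishes; a real implementation would need to decide how to handle those cases.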

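6 Computing the weight $W_t$ for term $t$
\[
W_t = \frac{\Pr[\text{relevant} \mid t] \cdot \Pr[\text{irrelevant} \mid \bar{t}]}{\Pr[\text{irrelevant} \mid t] \cdot \Pr[\text{relevant} \mid \bar{t}]}
    = \frac{\dfrac{R_t}{f_t} \cdot \dfrac{N - f_t - (R - R_t)}{N - f_t}}{\dfrac{f_t - R_t}{f_t} \cdot \dfrac{R - R_t}{N - f_t}}
    = \frac{R_t / (R - R_t)}{(f_t - R_t) / (N - f_t - (R - R_t))}
\]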
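To see that the ratio of probabilities and the simplified count-based expression agree, the following Python sketch computes the weight both ways. The function names and the counts (N=10, R=4, R_t=3, f_t=5) are hypothetical, picked only for illustration.

```python
from fractions import Fraction


def weight_from_probabilities(N, R, R_t, f_t):
    """W_t as the odds of relevance with t divided by the odds without t."""
    odds_with_t = Fraction(R_t, f_t) / Fraction(f_t - R_t, f_t)
    odds_without_t = (Fraction(R - R_t, N - f_t)
                      / Fraction(N - f_t - (R - R_t), N - f_t))
    return odds_with_t / odds_without_t


def weight_simplified(N, R, R_t, f_t):
    """W_t after cancelling the common factors f_t and N - f_t."""
    return Fraction(R_t, R - R_t) / Fraction(f_t - R_t, N - f_t - (R - R_t))


# Hypothetical counts, not taken from the slides.
counts = dict(N=10, R=4, R_t=3, f_t=5)
print(weight_from_probabilities(**counts), weight_simplified(**counts))
```

Both functions raise `ZeroDivisionError` when one of the four cells is empty (for example R_t = R or f_t = R_t); the slides do not say how such cases should be treated.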

7 Interpreting $W_t$
• $W_t > 1$ indicates that the appearance of $t$ suggests the document is relevant
• $W_t < 1$ indicates that the appearance of $t$ suggests the document is irrelevant
• $W_t = 1$ indicates that $t$ is neutral
• example: $N = 20$, $R = 13$, $R_t = 11$, $f_t = 12$ gives $W_t = 33$
• example: $N = 20$, $R = 13$, $R_t = 4$, $f_t = 7$ gives $W_t \approx 0.59$
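The two examples on this slide can be checked numerically. The sketch below uses the count-based form of the weight, $W_t = \frac{R_t/(R-R_t)}{(f_t-R_t)/(N-f_t-(R-R_t))}$; the helper names `term_weight` and `interpret` are assumptions made for this illustration.

```python
def term_weight(N, R, R_t, f_t):
    """Count-based form of the term weight W_t."""
    return (R_t / (R - R_t)) / ((f_t - R_t) / (N - f_t - (R - R_t)))


def interpret(w, tol=1e-9):
    """Map a weight to the reading given on the slide."""
    if abs(w - 1.0) <= tol:
        return "neutral"
    return "supports relevance" if w > 1.0 else "supports irrelevance"


examples = [
    dict(N=20, R=13, R_t=11, f_t=12),
    dict(N=20, R=13, R_t=4, f_t=7),
]
for counts in examples:
    w = term_weight(**counts)
    print(f"{w:.2f}", interpret(w))
```

The first example evaluates to 33 and the second to about 0.59, matching the values on the slide.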

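8 Document weight (these readings presumably apply on a logarithmic scale, where $W_t < 1$ maps to a negative value and $W_t = 1$ maps to zero)
• a negative weight indicates that the document is predicted to be irrelevant
• a zero weight indicates that the document is neutral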

9 Comparison
• the Boolean model is the weakest of the three models because it offers no partial matching
• the vector model and the probabilistic model are comparable, although the vector model is more popular
• term frequency is not considered in the probabilistic model

