A Markov Random Field Model for Term Dependencies. Donald Metzler, W. Bruce Croft. Presented by Chia-Hao Lee.


1 A Markov Random Field Model for Term Dependencies
Donald Metzler, W. Bruce Croft
Presented by Chia-Hao Lee

2 Outline
Introduction
Model
–Overview
–Variants
–Potential Functions
–Training
Experimental Results
Conclusions

3 Introduction
There is a rich history of statistical models for information retrieval, including the binary independence model (BIM), language modeling, the inference network model, and so on. It is well known that dependencies exist between terms in a collection of text. For example, within a SIGIR proceedings, occurrences of certain pairs of terms are correlated, such as information and retrieval.

4 Introduction
Unfortunately, estimating statistical models for general term dependencies is infeasible due to data sparsity. For this reason, most retrieval models assume some form of independence exists between terms. Most past work on modeling term dependencies has focused on phrases/proximity or term co-occurrences, and most of these models only consider dependencies between pairs of terms. Several recent studies have examined term dependence models for the language modeling framework.

5 Model
Markov random fields (MRFs), also called undirected graphical models, are commonly used in the statistical machine learning domain to succinctly model joint distributions. We use MRFs to model the joint distribution over queries Q and documents D, parameterized by Λ.

6 Model
A Markov random field is constructed from a graph G. The nodes in the graph represent random variables, and the edges define the independence semantics between the random variables. In this model, we assume G consists of query nodes q_i and a document node D, such as the graphs in the figure. The joint distribution is

P_Λ(Q, D) = (1 / Z_Λ) ∏_{c ∈ C(G)} ψ(c; Λ)

where C(G) is the set of cliques in G, each ψ(c; Λ) is a non-negative potential function over clique configurations parameterized by Λ, and Z_Λ = Σ_{Q,D} ∏_{c ∈ C(G)} ψ(c; Λ) normalizes the distribution.

7 Model
For ranking purposes we compute the posterior

P_Λ(D | Q) = P_Λ(Q, D) / P_Λ(Q)

which is rank-equivalent to Σ_{c ∈ C(G)} log ψ(c; Λ). As noted above, all potential functions must be non-negative, and are most commonly parameterized as

ψ(c; Λ) = exp[ λ_c f(c) ]

where f(c) is a real-valued feature function over clique values and λ_c is the weight given to that particular feature function.

8 Model
Substituting this back, we end up with the following ranking function:

P_Λ(D | Q) is rank-equivalent to Σ_{c ∈ C(G)} λ_c f(c)

To utilize the model, the following steps must be taken for each query Q:
–Construct a graph representing the query term dependencies to model
–Define a set of potential functions over the cliques of this graph
–Rank documents in descending order of P_Λ(D | Q)

9 Model
We now describe and analyze three variants of the MRF model, each with different underlying dependence assumptions:
–Full independence (FI)
–Sequential dependence (SD)
–Full dependence (FD)

10 Model
The full independence variant makes the assumption that query terms q_i are independent given some document D. The likelihood of a query term occurring is not affected by the occurrence of any other query term; more succinctly, P(q_i | D, q_{j≠i}) = P(q_i | D). The sequential dependence variant assumes a dependence between neighboring query terms. Formally, this assumption states that q_i is independent of q_j given D only for nodes q_j that are not adjacent to q_i.

11 Model
In the full dependence variant, all query terms are in some way dependent on each other. Graphically, a query of length n translates into the complete graph K_{n+1}, which includes edges from each query node to the document node D.
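The clique structures of the three variants can be enumerated programmatically. This is a minimal illustrative sketch, not the paper's implementation: `cliques` is a hypothetical helper that returns, for each variant, the sets of query terms forming a clique together with the document node D.

```python
from itertools import combinations

def cliques(query_terms, variant):
    """Enumerate the query-term sets that form cliques with D.

    FI: each term alone; SD: terms plus adjacent pairs;
    FD: terms plus every subset of two or more terms.
    """
    terms = list(query_terms)
    sets = [(t,) for t in terms]  # 2-cliques {q_i, D} exist in all variants
    if variant == "SD":
        sets += [tuple(terms[i:i + 2]) for i in range(len(terms) - 1)]
    elif variant == "FD":
        for k in range(2, len(terms) + 1):
            sets += list(combinations(terms, k))
    return sets
```

For a three-term query, FI yields 3 cliques, SD adds the 2 adjacent pairs, and FD adds all 4 subsets of size two or more.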

12 Model
The potential functions ψ play a very important role in how accurate our approximation of the true joint distribution is. For example, consider a document D on the topic of information retrieval. Using the sequential dependence variant, we would expect ψ(information, retrieval, D) > ψ(information, assurance, D), as the terms information and retrieval are much more "compatible" with the topicality of document D than the terms information and assurance.

13 Model
Since documents are ranked by Equation 1, it is also important that the potential functions can be computed efficiently. Based on these criteria and previous research on phrases and term dependence, we focus on three types of potential functions. These potential functions attempt to abstract the idea of term co-occurrence.

14 Model
Since potentials are over cliques in the graph, we now proceed to enumerate all of the possible ways graph cliques are formed in our model and how potential functions are defined for each. The simplest type of clique that can appear in our graph is a 2-clique consisting of an edge between a query term and the document D.

15 Model
In keeping with simple-to-compute measures, we define this potential as:

ψ(q_i, D) = exp[ λ_T log( (tf_{q_i,D} + μ · cf_{q_i} / |C|) / (|D| + μ) ) ]

This is a smoothed language modeling estimate of P(q_i | D), where tf_{w,D} is the number of times term w occurs in document D, cf_w is the number of times term w occurs in the entire collection, |D| is the total number of terms in document D, and |C| is the length of the collection.
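The smoothed estimate above is a Dirichlet-prior language model. A minimal sketch of the log feature, assuming the raw counts are supplied by the caller; the default μ = 2500 is a common choice, not a value fixed by the slide:

```python
import math

def f_T(tf, doc_len, cf, coll_len, mu=2500.0):
    """Dirichlet-smoothed log feature:
    log[(tf_{w,D} + mu * cf_w / |C|) / (|D| + mu)]."""
    return math.log((tf + mu * cf / coll_len) / (doc_len + mu))
```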

16 Model
Next, we consider cliques that contain two or more query terms. For example, in the query train station security measures, if any of the sub-phrases train station, train station security, station security measures, or security measures appears in a document, then there is strong evidence in favor of relevance.

17 Model
Therefore, for every clique that contains a contiguous set of two or more terms q_i, …, q_{i+k} and the document node D, we apply the following "ordered" potential function:

ψ_O(c) = exp[ λ_O log( (tf_{#1(q_i…q_{i+k}),D} + μ · cf_{#1(q_i…q_{i+k})} / |C|) / (|D| + μ) ) ]

where tf_{#1(q_i…q_{i+k}),D} is the number of times the exact phrase q_i…q_{i+k} occurs in document D, cf_{#1(q_i…q_{i+k})} is the number of times it occurs in the entire collection, |D| is the total number of terms in document D, and |C| is the length of the collection.
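The only new quantity here is the exact-phrase count tf_{#1(...),D}. A straightforward, unoptimized way to compute it over a tokenized document; an illustrative helper rather than the paper's implementation:

```python
def exact_phrase_count(doc_tokens, phrase):
    """Count occurrences of the exact contiguous phrase in a token list."""
    phrase = list(phrase)
    k = len(phrase)
    return sum(1 for i in range(len(doc_tokens) - k + 1)
               if doc_tokens[i:i + k] == phrase)
```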

18 Model
Although the occurrence of contiguous sets of query terms provides strong evidence of relevance, the occurrence of non-contiguous sets of query terms can also provide valuable evidence. In the previous example, documents containing the terms train and security within some short proximity of one another also provide additional evidence towards relevance.

19 Model
For our purposes, we construct an "unordered" potential function over cliques that consist of sets of two or more query terms and the document node D. Such potential functions have the following form:

ψ_U(c) = exp[ λ_U log( (tf_{#uwN(q_i…q_j),D} + μ · cf_{#uwN(q_i…q_j)} / |C|) / (|D| + μ) ) ]

where tf_{#uwN(q_i…q_j),D} is the number of times the terms q_i, …, q_j appear, ordered or unordered, within a window of N terms in document D, cf_{#uwN(q_i…q_j)} is the corresponding count in the entire collection, |D| is the total number of terms in document D, and |C| is the length of the collection.
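The unordered count tf_{#uwN(...),D} can be approximated by sliding a window over the document and checking whether all terms fall inside it. This is a simplified sketch of my own; production implementations (e.g. Indri's #uwN operator) count minimal matches more carefully:

```python
def unordered_window_count(doc_tokens, terms, window):
    """Approximate #uwN: count non-overlapping spans of `window`
    tokens that contain every term in `terms`, in any order."""
    needed = set(terms)
    count, i, n = 0, 0, len(doc_tokens)
    while i < n:
        if needed <= set(doc_tokens[i:i + window]):
            count += 1
            i += window  # skip past this match to avoid double counting
        else:
            i += 1
    return count
```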

20 Model
Using these potential functions, we derive the following specific ranking function:

P_Λ(D | Q) is rank-equivalent to λ_T Σ_{c∈T} f_T(c) + λ_O Σ_{c∈O} f_O(c) + λ_U Σ_{c∈O∪U} f_U(c)

where T is the set of 2-cliques containing a query term and D, O is the set of cliques containing D and contiguous query terms, U is the set of cliques containing D and non-contiguous query terms, and f_T, f_O, and f_U are the log term, ordered-phrase, and unordered-window features defined above.
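Given precomputed feature values for each clique set, the final score is just a weighted sum. A minimal sketch; the default weights (0.85, 0.10, 0.05) are the settings the paper reports as working well for the sequential dependence variant:

```python
def mrf_score(f_T_vals, f_O_vals, f_U_vals, lambdas=(0.85, 0.10, 0.05)):
    """Rank-equivalent score: lambda_T * sum(f_T) + lambda_O * sum(f_O)
    + lambda_U * sum(f_U) over the cliques of the chosen variant."""
    lt, lo, lu = lambdas
    return lt * sum(f_T_vals) + lo * sum(f_O_vals) + lu * sum(f_U_vals)
```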

21 Experimental Results
We make use of the Associated Press and Wall Street Journal sub-collections of TREC, which are small homogeneous collections, and two web collections, WT10g and GOV2, which are considerably larger and less homogeneous.

22 Experimental Results: Full independence

23 Experimental Results: Sequential dependence

24 Experimental Results: Full dependence

25 Conclusions
In this paper, we develop a general term dependence model that can make use of arbitrary text features. Three variants of the model are described, each capturing different dependencies between query terms.

26 Markov Random Fields
Let X_1, …, X_n be random variables taking values in some finite set S, and let G = (N, E) be a finite graph with node set N = {1, …, n}, whose elements will sometimes be called sites. For a set A ⊆ N, let ∂A denote its neighbor (or boundary) set: all elements of N \ A that have a neighbor in A. For w ∈ S^N, let w_A denote the restriction of w to A. The random variables are said to define a Markov random field if, for any vector w:

P(X_A = w_A | X_{N\A} = w_{N\A}) = P(X_A = w_A | X_{∂A} = w_{∂A})

27 Potentials
A potential is a function indexed by subsets of N on the space S^N. We will write potentials as V_A(w) for A ⊆ N, w ∈ S^N. Given a full set of potentials, the energy of a configuration w will be defined as:

U(w) = Σ_{A⊆N} V_A(w)

Using the energy, we can define a probability measure P from a set of potentials by:

P(w) = exp(−U(w)) / Z, where Z = Σ_{w′∈S^N} exp(−U(w′))
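For small state spaces, the Gibbs measure defined above can be computed by brute-force enumeration. An illustrative sketch under my own naming (not the slide's): each potential is a function of the full configuration, and Z is obtained by summing over every configuration in S^N:

```python
import math
from itertools import product

def gibbs_measure(potentials, sites, values):
    """P(w) = exp(-U(w)) / Z with U(w) = sum of the potentials at w,
    computed by enumerating every configuration w in S^N."""
    def energy(w):
        return sum(V(w) for V in potentials)
    configs = [dict(zip(sites, vals))
               for vals in product(values, repeat=len(sites))]
    Z = sum(math.exp(-energy(w)) for w in configs)
    return {tuple(w[s] for s in sites): math.exp(-energy(w)) / Z
            for w in configs}
```

With all potentials zero the energy is constant and the measure is uniform, which makes a convenient sanity check.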

