# An Overview of Bayesian Network-based Retrieval Models

An Overview of Bayesian Network-based Retrieval Models. Juan Manuel Fernández Luna, Departamento de Informática, Universidad de Jaén (jmfluna@ujaen.es). Department of Computing Science, University of Glasgow, October 21st, 2002.

Layout: Introduction; Introduction to Belief Networks; Bayesian Network-based IR Models (Inference Network Model, Belief Network Model, Bayesian Network Retrieval Model); Relevance Feedback; Other applications; Bibliography.

Introduction. Information retrieval is an uncertain process:

1. Query and document characterizations are incomplete.
2. The query is a vague description of the user's information need.
3. Computing the degree of relevance suffers from 1 and 2 and, in addition, from (A) the different representations that a concept may have and (B) the fact that these concepts are not independent of one another.

Introduction. Probabilistic models tried to overcome these problems. Researchers turned their attention to belief networks in order to apply them to IR because they perform well on real problems characterised by uncertainty.

Introduction to Belief Networks. Belief networks are graphical models able to represent and efficiently manipulate n-dimensional probability distributions. The knowledge obtained from a problem is encoded in a belief network by means of its qualitative and quantitative components:

Qualitative part: a directed acyclic graph, $G = (V, E)$, where (1) $V$ (nodes) are random variables and (2) $E$ (arcs) are (in)dependence relationships.

Quantitative part: a set of conditional distributions, (1) drawn from the graph structure, (2) representing the strength of the relationships, and (3) stored in each node. A belief network whose conditional distributions are conditional probability distributions is a Bayesian network.

Taking these (in)dependences into account, the joint probability distribution can be recovered from the network: $p(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} p(X_i \mid Pa(X_i))$, $Pa(X_i)$ being the set of parents of the variable $X_i$. This expression implies an important saving in storage space.
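To make the storage saving concrete, the sketch below counts independent parameters for binary variables, once for an unrestricted joint table and once for the factorised form in which each node stores only its distribution given its parents. The four-node graph and the function names are hypothetical, chosen only for illustration.

```python
def full_joint_size(n_vars: int) -> int:
    """Independent entries in an unrestricted joint table over n binary variables."""
    return 2 ** n_vars - 1


def factorised_size(parents: dict[str, list[str]]) -> int:
    """Independent entries when each binary node stores p(X_i | Pa(X_i)):
    one free probability per configuration of the node's parents."""
    return sum(2 ** len(pa) for pa in parents.values())


# Hypothetical DAG (not from the slides): A -> B, A -> C, (B, C) -> D.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

print(full_joint_size(len(parents)))  # 15
print(factorised_size(parents))       # 9
```

The gap widens quickly: the joint table grows exponentially in the number of variables, while the factorised form grows exponentially only in the largest parent set.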

Construction: manual, using an expert's knowledge, or automatic, by means of a learning algorithm. Inference: given a set of evidence, E, obtain the probability with which a variable takes a certain value. For example, p(S=T | W=T) = 0.430 and p(R=T | W=T) = 0.708.
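The two posteriors quoted above can be reproduced by brute-force enumeration. The network itself is not preserved in the transcript, so the sketch below assumes the textbook four-node "sprinkler" network (Cloudy, Sprinkler, Rain, WetGrass) with its usual conditional probability tables; that choice is an assumption, made because these tables yield exactly the quoted values.

```python
from itertools import product

# Assumed CPTs (not in the transcript): C -> S, C -> R, (S, R) -> W, all binary.
P_C = 0.5                                   # p(C=T)
P_S = {True: 0.1, False: 0.5}               # p(S=T | C)
P_R = {True: 0.8, False: 0.2}               # p(R=T | C)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}   # p(W=T | S, R)


def bern(p_true: float, value: bool) -> float:
    """Probability of `value` for a binary variable with p(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(c: bool, s: bool, r: bool, w: bool) -> float:
    """Joint probability as the product of each node given its parents."""
    return (bern(P_C, c) * bern(P_S[c], s)
            * bern(P_R[c], r) * bern(P_W[(s, r)], w))


def posterior(query: str, evidence: dict[str, bool]) -> float:
    """p(query=True | evidence), summing the joint over all assignments."""
    names = ("C", "S", "R", "W")
    num = den = 0.0
    for values in product([True, False], repeat=4):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue  # assignment contradicts the evidence
        p = joint(*values)
        den += p
        if world[query]:
            num += p
    return num / den


print(round(posterior("S", {"W": True}), 3))  # 0.43
print(round(posterior("R", {"W": True}), 3))  # 0.708
```

Enumeration is exponential in the number of variables, which is why dedicated propagation algorithms matter for networks of realistic size.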

Bayesian Network-based IR Models: Inference Network Model; Belief Network Model; Peter Bruza's Index Expression Belief Network; Maria Indrawan et al.'s model; Bayesian Network Retrieval Model.

Inference Network Model. [Figure: network with document nodes $d_1, d_2, \ldots, d_{j-1}, d_j$, nodes $r_1, r_2, r_3, \ldots, r_m$, query nodes $q_1, q_2$ and the node inn, quantified by link matrices.] Inference: instantiate each document, $d_j$, and compute $p(\mathrm{inn} \mid d_j)$.

Belief Network Model. [Figure: network with a query node $Q$, term nodes $t_1, t_2, t_3, \ldots, t_m$ and document nodes $d_1, d_2, \ldots, d_{j-1}, d_j$.] Evaluating all $2^M$ assignments is unfeasible, so the probabilities are defined in such a way that only one configuration is evaluated.

Bayesian Network Retrieval Model. Guidelines to build the BNR model:

1. There are strong relationships between a document and the terms that index it.
2. Relationships between documents are only present by means of the terms that index them.
3. Documents are conditionally independent given the terms by which they were indexed.

Term subnetwork: each term variable $T_i$ takes its values in $\{\neg t_i, t_i\}$. Document subnetwork: each document variable $D_j$ takes its values in $\{\neg d_j, d_j\}$.

Simple Bayesian Network Retrieval Model: all the terms are independent of one another. [Figure: term subnetwork with nodes $T_1, \ldots, T_6$ and document subnetwork with nodes $D_1, \ldots, D_4$.]

Probability distributions. Term nodes: $p(t_i) = 1/M$, $p(\neg t_i) = 1 - p(t_i)$. Document nodes: $p(D_j \mid Pa(D_j))$, $\forall D_j$. But if a document has been indexed by 30 terms, we need to estimate and store $2^{30}$ probabilities, which is a problem.

Solution: probability functions, which compute $p(d_j \mid pa(D_j))$ when it is needed instead of storing it, $pa(D_j)$ being a configuration of the parents of $D_j$.
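The formula shown on this slide is not preserved in the transcript. As an illustration of the idea only, the sketch below assumes a weighted-sum form: each parent term carries a non-negative weight, the weights of a document sum to at most 1, and the probability of relevance is the sum of the weights of the terms that are relevant in the given configuration. The weights, term names and the function `doc_prob` are hypothetical.

```python
import math


def doc_prob(weights: dict[str, float], relevant_terms: set[str]) -> float:
    """Assumed probability function: p(d_j | pa(D_j)) is the sum of the
    weights of the parent terms that are relevant in the configuration."""
    return sum(w for term, w in weights.items() if term in relevant_terms)


# Hypothetical weights for a document indexed by three terms (sum <= 1).
w_j = {"bayes": 0.5, "network": 0.3, "retrieval": 0.2}
assert math.isclose(sum(w_j.values()), 1.0)

# One configuration of the parents: "bayes" and "network" relevant.
print(doc_prob(w_j, {"bayes", "network"}))

# Storage for a document indexed by 30 terms: one weight per term
# instead of one probability per parent configuration.
print(30, "weights versus", 2 ** 30, "table entries")
```

Under this assumption the storage cost is linear in the number of indexing terms, and any entry of the conditional table can be computed on demand.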

Retrieval:

1. Instantiate every query term, $T_Q \in Q$, to relevant.
2. Run a propagation algorithm in the network.
3. Rank the documents according to $p(d_j \mid Q)$, $\forall D_j$.

Problem: because of the great number of nodes and the cycles in the graph, general-purpose propagation algorithms cannot be applied for efficiency reasons.

Solution: taking advantage of the kind of probability function used and of the topology, propagation is replaced by the evaluation of the probability function in each document node.
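A minimal sketch of what such an evaluation could look like in the simple model, under two stated assumptions: the document probability function is a weighted sum over its indexing terms, and, because terms are independent, a term's posterior is 1 if it is a query term and stays at its prior $1/M$ otherwise. With a weighted sum, taking the expectation over term configurations reduces to $\sum_i w_{ij}\, p(t_i \mid Q)$, so no general propagation is needed. The collection, weights and the function `rank` are hypothetical.

```python
import math


def rank(docs: dict[str, dict[str, float]], query: set[str], n_terms: int):
    """Score each document as sum_i w_ij * p(t_i | Q) and sort by score.

    Assumes p(t_i | Q) = 1 for query terms and 1/M for all other terms.
    """
    prior = 1.0 / n_terms
    scores = {
        name: sum(w * (1.0 if term in query else prior)
                  for term, w in weights.items())
        for name, weights in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical collection with M = 4 distinct terms.
docs = {
    "D1": {"bayes": 0.6, "network": 0.4},
    "D2": {"network": 0.5, "image": 0.5},
}
ranking = rank(docs, query={"bayes"}, n_terms=4)
for name, score in ranking:
    print(name, round(score, 3))
```

Each document is scored in time proportional to the number of terms that index it, which is what makes the evaluation efficient.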

Result: an efficient and exact propagation. Query term frequencies can also be included.

Removing the term independence restriction: we are interested in representing the main relationships among terms in the collection. The term subnetwork is a polytree. Why? There is a set of efficient learning and propagation algorithms available for this topology.

[Figure: term subnetwork with nodes $T_1, \ldots, T_6$ and document subnetwork with nodes $D_1, \ldots, D_4$.]

Probability distributions. Marginal distributions (root term nodes): $p(t_i) = 1/M$, $p(\neg t_i) = 1 - p(t_i)$ ($M$ being the number of terms in the collection).

Conditional distributions (document nodes): probability functions. Conditional distributions (term nodes with parents): based on Jaccard's coefficient.
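As a reminder of the measure mentioned here, Jaccard's coefficient between two terms can be computed from the sets of documents that contain each of them. This is only a sketch of the coefficient itself; how the model maps it onto $p(t_i \mid pa(T_i))$ is not preserved in the transcript, and the postings below are made-up data.

```python
def jaccard(docs_a: set[int], docs_b: set[int]) -> float:
    """Jaccard coefficient: size of the intersection over size of the union."""
    union = docs_a | docs_b
    if not union:
        return 0.0  # convention for two empty sets
    return len(docs_a & docs_b) / len(union)


# Hypothetical postings: document identifiers in which each term occurs.
postings = {"bayesian": {1, 2, 3}, "network": {2, 3, 4}}

print(jaccard(postings["bayesian"], postings["network"]))  # 0.5
```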

Retrieval: with every query term $T_q \in Q$ instantiated to relevant, what is $p(d_j \mid Q)$? Because of the complexity of the whole network, an exact propagation algorithm cannot be run. Solution: propagation + evaluation.

Propagation: running Pearl's exact propagation algorithm in the polytree (term subnetwork), $p(t_i \mid Q)$, $\forall T_i$, are computed. Evaluation: a probability function is evaluated in the document subnetwork, computing $p(d_j \mid Q)$, $\forall D_j$, incorporating $p(t_i \mid Q)$.

Adding document relationships. Given a document, $D_j$:

1. Compute $p(d_j \mid d_i)$, $\forall D_i$.
2. Select those documents with the greatest probability of relevance with respect to $D_j$.
3. Link $D_j$ with all these documents.

But instead of linking the documents within the document subnetwork, a second document layer is used. [Figure: term subnetwork, a first document layer with nodes $D_1, \ldots, D_7$ and a second document layer with nodes $D'_1, \ldots, D'_7$.]

Advantages of this topology: (1) we don't have to re-estimate the probability distributions in the document nodes; (2) propagation reduces to the evaluation of a probability function in the second document layer, which is efficient.

Retrieval: (1) compute $p(d_j \mid Q)$, $\forall D_j$ (first document layer); (2) compute $p(d'_j \mid Q)$, $\forall D'_j$ (second document layer), where $S_j$ is a normalising constant.
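The second-layer formula is not preserved in the transcript. One form consistent with the description, given here purely as an assumption, is a normalised weighted average of first-layer scores: $p(d'_j \mid Q) = \frac{1}{S_j} \sum_{D_i \in Pa(D'_j)} p(d_j \mid d_i)\, p(d_i \mid Q)$ with $S_j = \sum_{D_i \in Pa(D'_j)} p(d_j \mid d_i)$. The function name and all numbers below are hypothetical.

```python
import math


def second_layer(first_layer: dict[str, float],
                 links: dict[str, dict[str, float]]) -> dict[str, float]:
    """Assumed second-layer evaluation.

    first_layer: p(d_i | Q) for every first-layer document.
    links[j][i]: p(d_j | d_i) for each first-layer parent D_i of D'_j.
    """
    result = {}
    for dj, parents in links.items():
        s_j = sum(parents.values())  # normalising constant S_j
        result[dj] = sum(p * first_layer[di] for di, p in parents.items()) / s_j
    return result


# Hypothetical first-layer scores and document-to-document strengths.
first = {"D1": 0.7, "D2": 0.25, "D3": 0.1}
links = {
    "D1": {"D1": 1.0, "D2": 0.5},
    "D2": {"D2": 1.0, "D1": 0.5},
    "D3": {"D3": 1.0},
}
second = second_layer(first, links)
for name, score in second.items():
    print(name, round(score, 3))
```

In this toy example D2 moves up from 0.25 to 0.4 because it is linked to the highly ranked D1, which is the intended effect of adding document relationships.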

Reducing the propagation time in the term subnetwork:

1. Representing only the best relationships among terms.
2. Modifying Pearl's propagation algorithm.
3. Changing the term subnetwork topology.

1. Representing only the best term relationships. Problems: automatically learning the relationships among terms may produce some relationships that are not strong enough, which could damage retrieval effectiveness; and if the number of terms is very high, the learning stage could be time-consuming.

Solution: selection of the best terms. [Figure: the terms of the collection pass through a classification algorithm that separates selected from non-selected terms; the selected terms go on to polytree learning.]

Advantages: reduction of learning time; representation of the best relationships among terms; faster propagation.

Classification algorithm: K-means with Euclidean distance. Objects: terms. Attributes: term discrimination value (tdv) and inverse document frequency (idf). Classes: good terms (higher tdv and medium-high idf) and the rest of the terms.
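The classification step can be sketched with a plain two-cluster K-means over (tdv, idf) pairs. The term values, the initial centroids and the rule "the good cluster is the one whose centroid has the higher tdv" are all assumptions made for this illustration; the slides do not give the data or the initialisation.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain K-means with Euclidean distance; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(len(centroids)),
                      key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of the members (keep the old centroid if empty).
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels, centroids


# Hypothetical terms with made-up (tdv, idf) attributes scaled to [0, 1].
terms = {
    "bayesian": (0.9, 0.7), "polytree": (0.8, 0.6),
    "the": (0.05, 0.0), "model": (0.1, 0.2), "retrieval": (0.15, 0.1),
}
names = list(terms)
labels, centroids = kmeans([terms[n] for n in names],
                           centroids=[(1.0, 1.0), (0.0, 0.0)])

# Assumed rule: the "good terms" cluster has the higher mean tdv.
good = max(range(len(centroids)), key=lambda c: centroids[c][0])
selected = [n for n, lab in zip(names, labels) if lab == good]
print(selected)  # ['bayesian', 'polytree']
```

Only the selected terms would then be passed on to polytree learning; the rest stay in the model as isolated term nodes or are handled separately, depending on the design.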

2. Modifying Pearl's algorithm. In large polytrees, the beliefs of a great number of terms, those furthest from the query terms, will not be updated after propagating. So why keep the propagation algorithm running?

Radial propagation, $r = 2$.
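The slide's figure is not preserved. One plausible reading, stated here as an assumption, is that propagation is restricted to the term nodes lying within $r$ arcs of a query term, ignoring arc direction. The sketch below only computes that set of nodes with a breadth-first search; it does not implement Pearl's message passing, and the small polytree is hypothetical.

```python
from collections import deque


def nodes_within_radius(neighbours: dict[str, list[str]],
                        sources: list[str], r: int) -> dict[str, int]:
    """Breadth-first search on the undirected skeleton of the polytree.

    Returns the distance (in arcs) of every node within radius r of a source.
    """
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue  # do not expand beyond the radius
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical polytree skeleton: T1 - T2 - T3 - T4 - T5, plus T3 - T6.
neighbours = {
    "T1": ["T2"], "T2": ["T1", "T3"], "T3": ["T2", "T4", "T6"],
    "T4": ["T3", "T5"], "T5": ["T4"], "T6": ["T3"],
}
reached = nodes_within_radius(neighbours, sources=["T1"], r=2)
print(sorted(reached))  # ['T1', 'T2', 'T3']
```

Terms outside the returned set would simply keep their prior beliefs, which is the saving the previous slide argues for.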

Linear propagation.

3. Changing the term subnetwork topology. In certain cases, the polytree topology of the term subnetwork may not be appropriate, even when the term selection approach is used. An alternative topology is two term layers, (1) preserving the accuracy of the term relationships represented in the graph and (2) providing an efficient inference mechanism.

[Figure: two term layers, with nodes $T_1, \ldots, T_7$ and $T'_1, \ldots, T'_7$, and the document subnetwork.]

Relationships are captured using the co-occurrences among terms. The probability of relevance in the second term layer is computed from these co-occurrences.

Relevance Feedback in B.N. Models. Inference and Belief Network Models: modifying link matrices and adding new links (and, in the second, also new document nodes). Bayesian Network Retrieval Model: inclusion of new evidence, obtained from the inspection of the document ranking, using partial evidences. Advantage: neither graph structure modification nor probability matrix re-estimation is needed.

Other applications: indexing, hypertext, user profiling, WWW, structured documents, image retrieval, document classification, filtering.

Bibliography

- Bruza, P. & van der Gaag, L.C. (1996). Index Expression Belief Network for Information Disclosure. International Journal of Expert Systems. 7(2), 107-138.
- de Campos, L.M.; Fernández-Luna, J.M. & Huete, J.F. (2000). Building Bayesian network-based information retrieval systems. Proc. of the 2nd LUMIS Workshop. 543-550.
- de Campos, L.M.; Fernández-Luna, J.M. & Huete, J.F. (2001). Relevance Feedback in the Bayesian Network Retrieval Model: An Approach Based on Term Instantiation. Lecture Notes in Computer Science. 2189, 13-23.
- de Campos, L.M.; Fernández-Luna, J.M. & Huete, J.F. (2001). Document Instantiation for Relevance Feedback in the Bayesian Network Retrieval Model. Proceedings of the SIGIR01 Workshop on Mathematical and Formal Models in Information Retrieval. 10-18.
- de Campos, L.M.; Fernández-Luna, J.M. & Huete, J.F. (2002). A Layered Bayesian Network Model for Document Retrieval. Proceedings of the ECIR2002 Colloquium. Lecture Notes in Computer Science, 2291, 169-182.

- de Campos, L.M.; Fernández-Luna, J.M. & Huete, J.F. (2002). Reducing term to term relationships in an extended Bayesian network retrieval model. Proceedings of the Ninth International IPMU Conference (Information Processing and Management of Uncertainty in Knowledge-based Systems), Vol. 2, 1195-1202 (ISBN Vol. 2: 2-9516453-2-5). ESIA – Université de Savoie (Editor).
- de Campos, L.M.; Fernández-Luna, J.M. & Huete, J.F. Two terms layer: An alternative topology for representing term relationships in the Bayesian Network Retrieval Model. Electronic Proceedings of the Seventh Online World Conference on Soft Computing in Industrial Applications (wsc7.ugr.es).
- de Campos, L.M.; Fernández-Luna, J.M. & Huete, J.F. (2002). Reducing Propagation Effort in Large Polytrees: An Application to Information Retrieval. To appear in Proceedings of the Workshop on Probabilistic and Graphical Models. Cuenca (Spain).
- Crestani, F., Lalmas, M., van Rijsbergen, C.J. & Campbell, L. (1998). Is this Document Relevant?… Probably: A Survey of Probabilistic Models in Information Retrieval. ACM Computing Surveys. 30(4), 528-552.

- Fernández-Luna, J.M. (2001). Modelos de Recuperación de Información basados en Redes de Creencia. Ph.D. Thesis (in Spanish). University of Granada.
- Frisse, M. & Cousins, S.B. (1989). Information Retrieval from Hypertext: Update on the Dynamic Medical Handbook Project. Proceedings of the Hypertext89 Conference. 199-212.
- Ghazfan, D., Indrawan, M. & Srinivasan, B. (1996). Towards meaningful Bayesian networks. IPMU96 Conference. 841-846.
- Haines, D. & Croft, W.B. (1983). Relevance Feedback and Inference Networks. 20th ACM-SIGIR Conference. 119-128.
- Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. San Mateo, California.
- Reis, I. (2000). Bayesian Networks for Information Retrieval. Ph.D. Thesis. Universidad Federal de Minas Gerais.
- van Rijsbergen, C.J. (1971). Information Retrieval. 2nd Edition. Butterworths.
- van Rijsbergen, C.J., Harper, D.J. & Porter, M.F. (1981). The selection of good search terms. Information Processing & Management. 17, 77-91.

- Sahami, M. (1998). Using Machine Learning to Improve Information Access. Ph.D. Thesis. Stanford University.
- Savoy, J. & Desbois, D. (1991). Information Retrieval in Hypertext Systems: An Approach using Bayesian Networks. Electronic Publishing. 42(2), 87-108.
- Turtle, H.R. & Croft, W.B. (1991). Evaluation of an Inference Network-based Retrieval Model. Information Systems. 9(3), 189-224.
- Turtle, H.R. & Croft, W.B. (1997). Uncertainty in Information Systems. In Uncertainty Management in Information Systems: From Needs to Solutions. Kluwer Academic. 189-224.
- Tzeras, K. & Hartman, S. (1993). Automatic Indexing Based on Bayesian Inference Networks. 16th ACM-SIGIR Conference. 22-35.

The end... Thank you very much
