BASICS + EXACT AND APPROXIMATE INFERENCE

1 BASICS + EXACT AND APPROXIMATE INFERENCE
Pedro Larrañaga Computational Intelligence Group Departamento de Inteligencia Artificial Universidad Politécnica de Madrid Bayesian Networks: From Theory to Practice International Black Sea University Autumn School on Machine Learning 3-11 October 2019, Tbilisi, Georgia

2 Basics of Bayesian networks
Reasoning under uncertainty Conditional independence Building BNs u-separation Bayesian networks: formal definition

3 Reasoning under uncertainty
Advantages of BNs: explicit representation of uncertain knowledge; graphical, intuitive, close to a world representation; they deal with uncertainty for reasoning and decision-making; founded on probability theory, they provide clear semantics and a sound theoretical foundation; they manage many variables; both data and experts can be used to construct the model; they are under huge current development. They support the expert; they do not try to replace him.

4 Reasoning under uncertainty
Modularity: the joint probability distribution (global model) is specified via marginal and conditional distributions (local models), taking into account the conditional independence relationships among the variables. This modularity provides easy maintenance, reduces the number of parameters needed for the global model (so estimation/elicitation is easier), reduces storage needs, and enables efficient reasoning (inference).

5 Conditional independence
The joint probability distribution. Dealing with a joint distribution over n diseases D1,…,Dn and m symptoms S1,…,Sm means representing P(D1,…,Dn,S1,…,Sm), which takes 2^(n+m) - 1 parameters. E.g., for m=30 and n=10 we would need 2^40 - 1 ≈ 10^12 parameters. That is complete dependence: intractable in practice.

6 Conditional independence
Independence. Under mutual independence we only need to specify P(X1),…,P(Xn): n parameters (linear) instead of 2^n - 1 (exponential). Unfortunately, mutual independence rarely holds in most domains. Fortunately, there are usually some conditional independences, and we can exploit them (in both representation and inference).

7 Conditional independence
Conditional independence. Marginal independence generalizes to (sets of) variables: X and Y, two disjoint sets of variables, are conditionally independent given a third disjoint set Z iff P(x | y, z) = P(x | z) for all possible values x, y, z (whenever P(y, z) > 0). Intuitively, once Z = z is known, learning Y = y does not change the probability of x. Notation: X ⊥ Y | Z.

8 Joint distribution factorized
Chain rule and factorization via conditional independence: by the chain rule, P(x1,…,xn) = P(x1) P(x2 | x1) ⋯ P(xn | x1,…,x(n-1)). The conditional independences among the variables let us drop variables from the conditioning sets, further factorizing the JPD into smaller local distributions.

9 BNs Informal definition: 2 components in a BN
Informal definition, 2 components in a BN. Qualitative part: a directed acyclic graph (DAG), where nodes = variables and arcs = direct dependence relations (the absence of an arc indicates the absence of a direct dependence; there may still be indirect dependences and independences). Arcs do not necessarily express causality. Quantitative part: a set of conditional probabilities that determine a unique JPD.

10 BNs: nodes
Relative to a target node, the remaining nodes are classified as parents, ancestors, children, descendants, or the rest; the target node together with its parents forms its family.

11 BNs: arcs (types of independence)
Independences in a BN: a BN represents a set of independences. Distinguish: basic independences, which we must take care to verify when constructing the net; and derived independences, which follow from the basic ones via the properties of independence relations and are checked by means of the u-separation criterion.

12 Basic independences
Basic independence, the Markov condition: each variable Xi is conditionally independent of its non-descendants, given its parents Pa(Xi).

13 Basic independences Example
Fever is conditionally independent of Jaundice, given Malaria and Flu.

14 Quantitative part Factorizing the JPD
Factorizing the JPD. Now, for the quantitative part of the net, specify the JPD intelligently, using the chain rule and the Markov condition. Let X1,…,Xn be an ancestral ordering (parents appear before their children in the sequence); one always exists in a DAG. Using that ordering in the chain rule, {X(i-1),…,X1} contains only non-descendants of Xi, so the Markov condition gives P(xi | x(i-1),…,x1) = P(xi | pa(xi)).

15 Quantitative part
Factorizing the JPD. Therefore we can recover the JPD through the factorization P(x1,…,xn) = ∏i P(xi | pa(xi)), storing only the local distributions at each node. MODEL CONSTRUCTION IS EASIER: fewer parameters to assign, and they are assigned more naturally. Inference is easier too.

16 Quantitative part
With all binary variables (five nodes A, E, B, W, N): 2^5 - 1 = 31 probabilities to specify the JPD, vs. 10 with the factorization in the BN.
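To make the count concrete, here is a minimal Python sketch of the factored joint for a five-node binary network of this shape; I am assuming the arcs Earthquake, Burglary → Alarm; Alarm → WatsonCalls; Earthquake → News, and all CPT numbers below are invented for illustration.

```python
# A minimal sketch of the factored JPD for the five-node binary network
# E(arthquake), B(urglary) -> A(larm); A -> W(atsonCalls); E -> N(ews).
# All CPT numbers below are invented for illustration.

p_E = 0.01                                   # P(E=1): 1 parameter
p_B = 0.02                                   # P(B=1): 1 parameter
p_A = {(0, 0): 0.001, (0, 1): 0.90,          # P(A=1 | e, b): 4 parameters
       (1, 0): 0.30,  (1, 1): 0.95}
p_W = {0: 0.05, 1: 0.80}                     # P(W=1 | a): 2 parameters
p_N = {0: 0.001, 1: 0.40}                    # P(N=1 | e): 2 parameters -> 10 in total

def bern(p, value):
    """P(X=value) for a binary X with P(X=1) = p."""
    return p if value == 1 else 1.0 - p

def joint(e, b, a, w, n):
    """P(e,b,a,w,n) = P(e) P(b) P(a|e,b) P(w|a) P(n|e)."""
    return (bern(p_E, e) * bern(p_B, b) * bern(p_A[(e, b)], a)
            * bern(p_W[a], w) * bern(p_N[e], n))

# Any of the 2**5 = 32 joint probabilities from only 10 stored numbers:
print(joint(1, 0, 1, 1, 0))
```

Any entry of the 32-entry JPD is recovered by multiplying five local entries, while only 1 + 1 + 4 + 2 + 2 = 10 numbers are stored.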

17 Quantitative part BN Alarm for monitoring ICU patients
The BN Alarm for monitoring ICU patients: 2^37 probabilities for the JPD vs. 509 in the BN.

18 Independences derived from u-separation
u-separation, checked in three steps (see the sketch below): obtain the minimal subgraph containing X, Y, Z and their ancestors (the ancestral graph); moralize it (add a link between parents with children in common) and remove the direction of the arcs; then Z u-separates X and Y whenever Z intercepts all paths between X and Y in the resulting undirected graph.
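The three steps translate directly into code. A sketch, assuming the DAG is stored as a dict mapping each node to the set of its parents (the representation and function names are my own choices):

```python
# A sketch of the u-separation test above. The DAG is assumed to be stored
# as {node: set_of_parents}; names and representation are my own choices.
from collections import deque

def ancestral(dag, nodes):
    """The given nodes together with all their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def u_separated(dag, X, Y, Z):
    """True if Z u-separates X from Y (X, Y, Z disjoint sets of nodes)."""
    # 1. Restrict to the ancestral graph of X, Y and Z.
    keep = ancestral(dag, set(X) | set(Y) | set(Z))
    # 2. Moralize (marry parents with a common child) and drop arc directions.
    adj = {v: set() for v in keep}
    for child in keep:
        parents = dag[child]
        for p in parents:
            adj[child].add(p)
            adj[p].add(child)
        for p in parents:
            for q in parents:
                if p != q:
                    adj[p].add(q)
    # 3. Z blocks all paths iff removing Z disconnects X from Y.
    seen, queue = set(X), deque(X)
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr in Y:
                return False
            if nbr not in seen and nbr not in Z:
                seen.add(nbr)
                queue.append(nbr)
    return True
```

The final step uses the fact that Z intercepts all paths between X and Y exactly when removing Z disconnects them, so a breadth-first search that refuses to enter Z suffices.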

19 Independences derived from u-separation
[Figure: example graphs over nodes R, S, T, W, Y, Z. Are the blue nodes u-separated by the red ones? Is W ⊥ S | {Y, Z}? Is W ⊥ T | Y?]

20 Joining the two parts
Theorem [Verma and Pearl '90; Neapolitan '90]: let P be a probability distribution of the variables in V and G = (V, E) a DAG. (G, P) holds the Markov condition iff, for disjoint sets X, Y, Z, u-separation of X and Y by Z in G implies the conditional independence of X and Y given Z in P. Thus graph G represents all the dependences of P, while some independences of P may not be identified by u-separation in G.

21 Definition of BN
Formal definition: let P be a JPD over V = {X1,…,Xn}. A BN is a tuple (G, P), where G = (V, E) is a DAG such that: each node of G represents a variable of V; the Markov condition holds; and each node has an associated local probability distribution P(Xi | Pa(Xi)) (the quantitative part) such that, taking an ancestral ordering, P factorizes as the product of these local distributions, and u-separated variables in the graph are independent in P.

22 Definition of BN A property
A property: a set of nodes that makes X conditionally independent of the rest of the network. A node is conditionally independent of all other nodes in the BN given its parents, its children and its children's parents: its Markov blanket.
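A short sketch of extracting a Markov blanket from a DAG stored as {node: set of parents}. The example arcs below are my reading of the slides' medical example (ExoticTrip → Malaria; Malaria, Flu → Fever; Malaria → Jaundice; Flu → Aches), not copied from the figure.

```python
# A small sketch: the Markov blanket of a node, for a DAG stored as
# {node: set_of_parents}. The example arcs are my reading of the slides'
# medical example, not copied from the figure.

def markov_blanket(dag, x):
    """Parents, children and children's parents of x."""
    children = {v for v, pa in dag.items() if x in pa}
    coparents = set().union(set(), *(dag[c] for c in children))
    return (dag[x] | children | coparents) - {x}

dag = {"ExoticTrip": set(), "Flu": set(),
       "Malaria": {"ExoticTrip"},
       "Fever": {"Malaria", "Flu"},
       "Jaundice": {"Malaria"},
       "Aches": {"Flu"}}
print(markov_blanket(dag, "Malaria"))
# {'ExoticTrip', 'Fever', 'Jaundice', 'Flu'}
```

Under these assumed arcs the result matches the independence stated on the next slide: Malaria is separated from Aches by its Markov blanket.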

23 Definition of BN
Malaria is conditionally independent of Aches given ExoticTrip, Jaundice, Fever and Flu.

24 What about continuous variables?
GAUSSIAN BNs: all variables are continuous and all conditional distributions are (linear) Gaussians, which together define the JPD (a multivariate Gaussian). Other options with closed-form inference: MTEs (mixtures of truncated exponentials) and MoPs (mixtures of polynomials).
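As a quick illustration of the linear Gaussian assumption, a tiny sketch (structure and coefficients are invented): each variable is a linear function of its parents plus Gaussian noise, and the resulting joint distribution is multivariate normal.

```python
# A tiny sketch of a linear Gaussian BN over X1 -> X2, (X1, X2) -> X3.
# Structure and coefficients are invented for illustration.
import random

def sample_case():
    x1 = random.gauss(0.0, 1.0)                   # X1 ~ N(0, 1)
    x2 = random.gauss(2.0 + 0.5 * x1, 0.5)        # X2 | x1 ~ N(2 + 0.5*x1, 0.5**2)
    x3 = random.gauss(1.0 - x1 + 2.0 * x2, 1.0)   # X3 | x1, x2 ~ N(1 - x1 + 2*x2, 1)
    return x1, x2, x3
```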

25 What about dynamic systems?
DYNAMIC BNs: time slices (each with an identical BN) connected by transition arcs pointing toward the future. Components: a prior BN, a transition BN, and the unrolled network. They rely on the stationarity and first-order Markov assumptions.

26 Building a BN Expert /from data /both
Expert / from data / both. Manually, with the aid of an expert in the domain: model the causal mechanisms (causal graph + probabilities → Bayesian net); building in the causal direction yields simpler and more efficient BNs. Learning from a database: database + learning algorithm → Bayesian net. Or a combination (experts → structure; database → probabilities).

27 Building a BN: summary

28 Building a BN: summary (cont.)

29 Building a BN: summary (cont.)

30 Building a BN Example: Asia BN [Lauritzen & Spiegelhalter’88]

31 Building a BN

32 Textbooks More recent:
A. Darwiche (2009). Modeling and Reasoning with Bayesian Networks. Cambridge University Press.
U. Kjaerulff, A. Madsen (2008). Bayesian Networks and Influence Diagrams: A Guide to Construction and Analysis. Springer.
D. Koller, N. Friedman (2009). Probabilistic Graphical Models. The MIT Press.
T. Koski, J. Noble (2009). Bayesian Networks: An Introduction. Wiley.

33 Inference in Bayesian networks
Types of queries. Exact inference: brute-force computation; variable elimination algorithm; message passing algorithm. Approximate inference: probabilistic logic sampling.

34 Example: Asia BN [Lauritzen & Spiegelhalter’88]

35 P(X)?

36 P(X|Smoker=yes)?

37 P(X|Asia=yes,Smoker=yes)?

38 P(X|Asia=yes,Smoker=yes,Dyspnea=yes)?

39 Types of queries Queries: posterior probabilities
Given some evidence e (a set of observations), compute the posterior probability of one or more target variables X: P(X | e). Other names: probability propagation, belief updating, belief revision. [Figure: the Earthquake, Burglary, Alarm, WatsonCalls, News network with a query node highlighted.]

40 Types of queries Semantically, for any kind of reasoning
Semantically, this covers any kind of reasoning. Predictive (deductive) reasoning, also called causal inference: predict effects, e.g. P(Symptoms | Disease); the target variable is usually a descendant of the evidence. Diagnostic reasoning (diagnostic inference): diagnose the causes, e.g. P(Disease | Symptoms); the target variable is usually an ancestor of the evidence.

41 Types of queries More queries: maximum a posteriori (MAP)
More queries: maximum a posteriori (MAP), the most likely configurations (abductive inference), i.e. the event that best explains the evidence. Total abduction: search for the most likely configuration of all the unobserved variables; in general it cannot be computed component-wise with max P(xi | e), as checked below. Partial abduction: search over a subset of the unobserved variables (the explanation set). MAP is used, for instance, in classification: find the most likely label given the evidence.
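A tiny numeric check of the remark that MAP cannot be computed component-wise; the joint distribution below is made up for the purpose.

```python
# A made-up joint over two binary variables where the MAP configuration
# differs from the pair of component-wise maxima.
p = {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.0}

map_config = max(p, key=p.get)                               # (1, 0)
m1 = {v: p[(v, 0)] + p[(v, 1)] for v in (0, 1)}              # P(X1): {0: 0.6, 1: 0.4}
m2 = {v: p[(0, v)] + p[(1, v)] for v in (0, 1)}              # P(X2): {0: 0.7, 1: 0.3}
componentwise = (max(m1, key=m1.get), max(m2, key=m2.get))   # (0, 0)
print(map_config, componentwise)  # (1, 0) vs (0, 0): they disagree
```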

42 Exact inference [Pearl’88; Lauritzen & Spiegelhalter’88]
Brute-force computation of P(X | e). First consider P(Xi), without observed evidence e: conceptually simple but computationally complex. For a BN with n variables, each with its local distribution P(Xj | Pa(Xj)), the brute-force approach computes P(xi) = Σ_{all other variables} ∏_j P(xj | pa(xj)). But this amounts to computing the JPD, often very inefficient and even computationally intractable. CHALLENGE: without computing the JPD, exploit the factorization encoded by the BN and the distributive law (local computations).

43 Exact inference Improving brute-force
Improving brute force: use the JPD factorization and the distributive law. Computed naively, the query needs a table with 32 entries (the JPD, if the five variables are binary).

44 Exact inference Improving brute-force
Arrange the computations effectively by moving some of the summations (over X5 and X3, then over X4) inside the products: the biggest table created has 8 entries, like those of the BN.

45 Exact inference
Variable elimination algorithm. Wanted: P(Xi | e) for ONE target variable Xi. Start from a list with all the functions (local distributions) of the problem, and select an elimination order σ of all variables except Xi. For each Xk in σ, with F the set of functions in the list that involve Xk: eliminate Xk, i.e. compute f' = Σ_xk ∏_{f ∈ F} f (combine all the functions that contain the variable and marginalize Xk out); delete F from the list; add f' to the list. Output: the combination (multiplication) of all functions in the current list, normalized. Repeat the algorithm for each target variable. A sketch follows.
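A compact Python sketch of this loop over discrete factors. My simplifying assumptions: all variables are binary, a factor is a pair (variables_tuple, {assignment_tuple: value}), and evidence handling is omitted.

```python
# A compact sketch of variable elimination over discrete binary factors.
from itertools import product

def multiply(f, g):
    fv, ft = f
    gv, gt = g
    vars_ = tuple(dict.fromkeys(fv + gv))        # ordered union of the scopes
    table = {}
    for assign in product((0, 1), repeat=len(vars_)):
        env = dict(zip(vars_, assign))
        table[assign] = (ft[tuple(env[v] for v in fv)]
                         * gt[tuple(env[v] for v in gv)])
    return vars_, table

def sum_out(f, x):
    fv, ft = f
    keep = tuple(v for v in fv if v != x)
    table = {}
    for assign, val in ft.items():
        key = tuple(a for v, a in zip(fv, assign) if v != x)
        table[key] = table.get(key, 0.0) + val
    return keep, table

def variable_elimination(factors, order):
    """Eliminate the variables in `order`; return the normalized result."""
    factors = list(factors)
    for x in order:
        involved = [f for f in factors if x in f[0]]   # functions mentioning x
        if not involved:
            continue
        factors = [f for f in factors if x not in f[0]]
        combined = involved[0]
        for f in involved[1:]:
            combined = multiply(combined, f)           # combine ...
        factors.append(sum_out(combined, x))           # ... and marginalize x out
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    z = sum(result[1].values())                        # normalize
    return result[0], {k: v / z for k, v in result[1].items()}
```

Note how only the factors that mention the current variable are combined, which is exactly the local computation that makes VE cheaper than summing the full JPD.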

46 Exact inference Example with Asia network
Example with the Asia network: Visit to Asia (A), Smoking (S), Lung Cancer (L), Tuberculosis (T), Tuberculosis or Lung Cancer (E), Bronchitis (B), X-Ray (X), Dyspnea (D).

47 Exact inference Brute-force approach Compute P(D) by brute-force:
Compute P(D) by brute force: P(d) = Σ_{a,s,t,l,e,b,x} P(a) P(s) P(t | a) P(l | s) P(b | s) P(e | t, l) P(x | e) P(d | e, b). The complexity is exponential in the number of variables: the sum ranges over all joint configurations of their states.
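A sketch of this computation. Here `cpt` is a stand-in of my own: cpt[V] maps a tuple of parent values to [P(V=0 | parents), P(V=1 | parents)], with the actual numbers to be taken from the published network.

```python
# Brute-force P(D) for the Asia structure: sum the factorized joint over
# all 2**7 configurations of the remaining variables. `cpt` is a stand-in
# whose numbers would come from the actual network.
from itertools import product

def asia_joint(a, s, t, l, e, b, x, d, cpt):
    """P(a)P(s)P(t|a)P(l|s)P(b|s)P(e|t,l)P(x|e)P(d|e,b)."""
    return (cpt["A"][()][a] * cpt["S"][()][s]
            * cpt["T"][(a,)][t] * cpt["L"][(s,)][l] * cpt["B"][(s,)][b]
            * cpt["E"][(t, l)][e] * cpt["X"][(e,)][x] * cpt["D"][(e, b)][d])

def p_dyspnea(d, cpt):
    """P(D=d), obtained by enumerating every other variable."""
    return sum(asia_joint(a, s, t, l, e, b, x, d, cpt)
               for a, s, t, l, e, b, x in product((0, 1), repeat=7))
```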

48 Exact inference
When a summation is pushed inside the products, the intermediate factor it creates is not necessarily a probability term.

49 Exact inference

50 Exact inference Variable elimination algorithm
With a good arrangement, the biggest factor here has size 8. Variable elimination performs local computations (thanks to moving the summations). The elimination ordering matters: finding an optimal (minimum-cost) ordering is NP-hard [Arnborg et al. '87], so heuristics are used to find good sequences. Complexity is exponential in the maximum number of variables appearing in a factor of the summation.

51 Exact inference Message passing algorithm
Message passing algorithm: it operates by passing messages among the nodes of the network; nodes act as processors that receive, calculate and send information. Clique tree propagation is based on the same principle as VE, but with a sophisticated caching strategy that makes it possible to compute the posterior distributions of all variables in twice the time needed to compute that of one single variable.

52 Exact inference Basic operations for a node
Basic operations for a node. Ask-info(i, j): node i asks node j for information; it does so for all its neighbors j, and they do the same until there are no more nodes to ask. Send-message(i, j): each node sends a message back to the node that asked it, until the target node is reached; a message is defined over the intersection of the domains of the functions fi and fj. Finally, the target node combines all the received information with its own local information and marginalizes over the target variable, all computed locally.

53 Exact inference
[Figure: the procedure for target X2, showing the Ask / CollectEvidence message flow.]

54 Exact inference
Computing the probabilities P(Xi | e) of all (unobserved) variables at a time: we could perform the previous process once per node, but many messages would be repeated. Instead, use two rounds of messages, as sketched below: select a node as root (or pivot); collect evidence from the leaves toward the root, as in VE; distribute evidence from the root back toward the leaves; then calculate the marginal distribution at each node by local computation from its incoming messages. This algorithm never constructs tables larger than those in the BN.
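Structurally, the two rounds look like this sketch over a tree stored as {node: list of neighbors}. What a message actually contains (combine the incoming messages with the local function and marginalize onto the shared variables) is abstracted into a `send_message` callback, which is my own framing.

```python
# A structural sketch of the two sweeps on a tree {node: list_of_neighbors}.

def collect(tree, node, parent, send_message, messages):
    """First sweep: messages flow from the leaves toward the root."""
    for nbr in tree[node]:
        if nbr != parent:
            collect(tree, nbr, node, send_message, messages)
            messages[(nbr, node)] = send_message(nbr, node, messages)

def distribute(tree, node, parent, send_message, messages):
    """Second sweep: messages flow from the root back toward the leaves."""
    for nbr in tree[node]:
        if nbr != parent:
            messages[(node, nbr)] = send_message(node, nbr, messages)
            distribute(tree, nbr, node, send_message, messages)

def two_sweeps(tree, root, send_message):
    messages = {}
    collect(tree, root, None, send_message, messages)
    distribute(tree, root, None, send_message, messages)
    return messages   # every node now holds messages from all its neighbors
```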

55 Exact inference
Message passing algorithm. [Figure: a tree over nodes 1 to 8 with a chosen root node, showing the CollectEvidence and DistributeEvidence sweeps.]

56 Exact inference Networks with loops
Networks with loops: if the net is not a polytree, the algorithm does not work; requests and messages go around a cycle indefinitely. Alternatives?

57 Exact inference Alternative: clustering methods
Alternative: clustering methods. Steps of the JUNCTION TREE CLUSTERING ALGORITHM: moralize the BN; triangulate the moral graph and obtain the cliques; create the junction tree and its separators; compute the new parameters. These steps form the COMPILATION; the message passing algorithm then runs on the junction tree.

58 Approximate inference
Why? Because exact inference is intractable (NP-complete) in large (40+ variables) and densely connected BNs: the cliques of the junction tree algorithm, or the intermediate factors of the VE algorithm, grow in size and generate an exponential blow-up in the number of computations performed. Stochastic simulation is used to find approximate answers.

59 Approximate inference
Stochastic simulation uses the network to generate a large number of cases (full instantiations) from the network distribution. P(Xi | e) is estimated from these cases by counting observed frequencies in the samples. By the law of large numbers, the estimate converges to the exact probability as more cases are generated.

60 Approximate inference
Probabilistic logic sampling [Henrion '88], a forward sampling algorithm. Given an ancestral ordering of the nodes (parents before children), sample each variable once its parents have been sampled, i.e. from the root nodes down to the leaves, using its conditional probability given the already-generated values of its parents. When all the nodes have been visited we have one case, an instantiation of all the nodes in the BN. Repeat, and use the observed frequencies to estimate P(Xi | e), as in the sketch below.
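A sketch of the estimator, assuming binary variables, `order` an ancestral ordering, `parents[v]` a tuple of v's parents, and cpt[v][parent_values] = P(v = 1 | parents). Cases incompatible with the evidence are discarded, the usual rejection step for conditioning with this forward sampler.

```python
# A sketch of probabilistic logic sampling with rejection of cases that
# contradict the evidence. Binary variables are assumed.
import random

def logic_sampling(order, parents, cpt, evidence, target, n_samples=100_000):
    counts = {0: 0, 1: 0}
    for _ in range(n_samples):
        case = {}
        for node in order:                              # roots first, leaves last
            pa = tuple(case[p] for p in parents[node])  # parents already sampled
            case[node] = 1 if random.random() < cpt[node][pa] else 0
        if all(case[v] == val for v, val in evidence.items()):
            counts[case[target]] += 1                   # keep compatible cases only
    n = counts[0] + counts[1]
    return {v: c / n for v, c in counts.items()} if n else None
```

As the slide notes, the law of large numbers guarantees convergence of these frequencies to P(target | evidence), though rejection becomes wasteful when the evidence is unlikely.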

61 Software www.hugin.com/

62 Software See “Applications overview”

63 Software

64 Software

65 Software code.google.com/p/bnt/

66 Software (UNED)

67 Software reasoning.cs.ucla.edu/samiam/

68 Software www.r-project.org/
Learning: bnlearn, deal, pcalg, catnet, mugnet, bnclassify. Inference: gRbase, gRain.


