1 PatReco: Bayesian Networks. Alexandros Potamianos, Dept. of ECE, Tech. Univ. of Crete, Fall 2004-2005

2 Definitions
- Bayesian networks consist of nodes and (usually directed) arcs
- Nodes, or states, represent a classification class or in general an event, and are described by a pdf
- Arcs represent relations between nodes, e.g., cause and effect or time sequence
- Two nodes that are connected only through a third node are conditionally independent given that node (unless the middle node is a common effect of the two; see slide 5)

3 When to use Bayesian nets
- Bayesian networks (or inference networks) are statistical models used for classification (or, in general, pattern recognition) problems where there are dependencies among classes, e.g., time dependencies or cause-and-effect dependencies

4 Conditional Independence
- Full independence between A and B: P(A|B) = P(A), or equivalently P(A,B) = P(A) P(B)
- Conditional independence of A and B given C: P(A|B,C) = P(A|C), or equivalently P(A,B|C) = P(A|C) P(B|C)
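To make the second definition concrete, here is a minimal numeric check in Python. The joint table over binary A, B, C is hypothetical and is built as P(C) P(A|C) P(B|C), so the factorization P(A,B|C) = P(A|C) P(B|C) should (and does) hold for every value combination:

```python
import itertools

# Hypothetical CPTs over binary variables (values chosen arbitrarily).
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_a_given_c[c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # p_b_given_c[c][b]

# Joint built as P(C) P(A|C) P(B|C), so A and B are independent given C.
joint = {(a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
         for a, b, c in itertools.product([0, 1], repeat=3)}

for c in [0, 1]:
    pc = sum(v for (_, _, cc), v in joint.items() if cc == c)
    for a, b in itertools.product([0, 1], repeat=2):
        p_ab = joint[(a, b, c)] / pc                        # P(A=a, B=b | C=c)
        p_a = sum(joint[(a, bb, c)] for bb in [0, 1]) / pc  # P(A=a | C=c)
        p_b = sum(joint[(aa, b, c)] for aa in [0, 1]) / pc  # P(B=b | C=c)
        assert abs(p_ab - p_a * p_b) < 1e-12
print("P(A,B|C) = P(A|C) P(B|C) holds for every value combination")
```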

5 Conditional Independence
Three basic three-node topologies (shown as diagrams on the original slide):
- Chain A → B → C: A and C are independent given B, i.e., P(C|B,A) = P(C|B)
- Common cause A → B, A → C: B and C are independent given A, i.e., P(B,C|A) = P(B|A) P(C|A)
- Common effect A → B ← C: A and C are dependent given B; P(A,C|B) cannot be reduced!
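The third case is the counterintuitive one, so a small numeric illustration may help. The setup is hypothetical: A and C are independent fair coins and B = A OR C, a common-effect structure; conditioning on B = 1 makes A and C dependent ("explaining away"):

```python
import itertools

# A and C are independent fair coins; B is their common effect (B = A or C).
joint = {}
for a, c in itertools.product([0, 1], repeat=2):
    b = a | c
    joint[(a, b, c)] = 0.25  # P(A=a) * P(C=c) = 0.5 * 0.5

p_b1 = sum(v for (a, b, c), v in joint.items() if b == 1)               # 0.75
p_a1 = sum(v for (a, b, c), v in joint.items() if a == 1 and b == 1) / p_b1
p_c1 = sum(v for (a, b, c), v in joint.items() if c == 1 and b == 1) / p_b1
p_a1c1 = joint.get((1, 1, 1), 0.0) / p_b1

# 1/3 vs 4/9: P(A,C|B) != P(A|B) P(C|B), so A and C are dependent given B.
print(p_a1c1, p_a1 * p_c1)
```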

6 Three problems
1. Probability computation (use independence)
2. Training/parameter estimation: maximum likelihood (ML) if everything is observable; expectation maximization (EM) if there is missing data
3. Inference (testing): diagnosis P(cause|effect) (bottom-up) and prediction P(effect|cause) (top-down)

7 Probability Computation
For a Bayesian network that consists of N nodes:
1. Compute P(n_1, n_2, ..., n_N) using the chain rule, starting from the "last/bottom" node and working your way up: P(n_1, n_2, ..., n_N) = P(n_N | n_1, n_2, ..., n_{N-1}) P(n_{N-1} | n_1, n_2, ..., n_{N-2}) ... P(n_2 | n_1) P(n_1)
2. Identify conditional independence conditions from the Bayesian network topology
3. Simplify the conditional probabilities using the independence conditions
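One way to see what step 3 buys is to count free parameters for binary nodes. A back-of-the-envelope sketch, reading the parent sets off the simplified factorization of the next slide:

```python
# Parent sets read off the simplified factorization of the next slide:
# P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C)
parents = {"C": [], "S": ["C"], "R": ["C"], "W": ["S", "R"]}

full_joint = 2 ** len(parents) - 1                       # 2^4 - 1 = 15 free parameters
factorized = sum(2 ** len(p) for p in parents.values())  # 1 + 2 + 2 + 4 = 9
print(full_joint, factorized)
```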

8 Probability Computation
Example network with nodes C, S, R, W and arcs C → S, C → R, S → W, R → W:
Chain rule: P(C,S,R,W) = P(W|C,S,R) P(S|C,R) P(R|C) P(C)
Independent (from the topology): W, C given (S,R); S, R given C
Dependent: S, R given W
Simplified: P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C)
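As a sketch, here is the simplified factorization turned into code. The structure and factorization are the slide's; all CPT values below are hypothetical, since the slide gives no numbers (the letters suggest the classic cloudy/sprinkler/rain/wet-grass example):

```python
# Hypothetical CPTs; the factorization P(W|S,R) P(S|C) P(R|C) P(C) is the slide's.
p_c1 = 0.5                                   # P(C=1)
p_s_given_c = {0: 0.5, 1: 0.1}               # P(S=1 | C=c)
p_r_given_c = {0: 0.2, 1: 0.8}               # P(R=1 | C=c)
p_w_given_sr = {(0, 0): 0.0, (0, 1): 0.9,
                (1, 0): 0.9, (1, 1): 0.99}   # P(W=1 | S=s, R=r)

def bern(p1, x):
    """P(X=x) for a binary variable with P(X=1) = p1."""
    return p1 if x == 1 else 1.0 - p1

def joint(c, s, r, w):
    """P(C=c, S=s, R=r, W=w) via the simplified factorization."""
    return (bern(p_c1, c) * bern(p_s_given_c[c], s)
            * bern(p_r_given_c[c], r) * bern(p_w_given_sr[(s, r)], w))

print(joint(c=1, s=0, r=1, w=1))  # 0.5 * 0.9 * 0.8 * 0.9 = 0.324
```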

9 Probability Computation
- There are general algorithms for identifying cliques in a Bayesian net
- Cliques are islands of conditional dependence, i.e., terms in the probability computation that cannot be further reduced
- The cliques of the example network: (S,C), (W,S,R), (R,C)

10 Training/Parameter Estimation
- Instead of estimating the joint pdf of the whole network, the joint pdf of each of the cliques is estimated
- For example, if the network joint pdf is P(C,S,R,W) = P(W|S,R) P(S|C) P(R|C) P(C), then instead of computing P(C,S,R,W) we compute each of P(W|S,R), P(S|C), P(R|C), P(C) for all possible values of W, S, R, C (much simpler)

11 Training/Parameter Estimation
- For fully observable data and discrete probabilities, compute maximum likelihood (ML) estimates of the parameters, e.g., for discrete probabilities:
P_ML(W=1|S=1,R=0) = counts(W=1,S=1,R=0) / counts(W=*,S=1,R=0)
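A minimal sketch of this estimator in code, assuming the data arrives as fully observed (W, C, S, R) tuples like those in the example on the next slide:

```python
def ml_estimate(data, w, s, r):
    """P_ML(W=w | S=s, R=r) from fully observed (W, C, S, R) tuples."""
    den = sum(1 for (wi, _, si, ri) in data if si == s and ri == r)
    num = sum(1 for (wi, _, si, ri) in data if wi == w and si == s and ri == r)
    return num / den if den > 0 else float("nan")
```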

12 Training/Parameter Estimation
- Example: the following observation tuples are given for (W,C,S,R): (1,0,1,0), (0,0,1,0), (1,1,1,0), (0,1,1,0), (1,0,1,0), (0,1,0,0), (1,0,0,1), (0,1,1,1), (1,1,1,0)
- Using maximum likelihood estimation: P_ML(W=1|S=1,R=0) = #(1,*,1,0) / #(*,*,1,0) = 4/6 ≈ 0.67
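A quick, self-contained check of the count arithmetic in Python (the nine tuples are exactly the ones listed above):

```python
data = [(1, 0, 1, 0), (0, 0, 1, 0), (1, 1, 1, 0), (0, 1, 1, 0), (1, 0, 1, 0),
        (0, 1, 0, 0), (1, 0, 0, 1), (0, 1, 1, 1), (1, 1, 1, 0)]  # (W, C, S, R)

num = sum(1 for w, c, s, r in data if w == 1 and s == 1 and r == 0)  # #(1,*,1,0)
den = sum(1 for w, c, s, r in data if s == 1 and r == 0)             # #(*,*,1,0)
print(num, den, num / den)  # 4 6 0.666...
```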

13 Training/Parameter Estimation
- When data is unobservable or missing, the EM algorithm is employed
- There are efficient implementations of the EM algorithm for Bayesian nets that operate on the clique network
- When the topology of the Bayesian network is not known, structural EM can be used

14 Inference
- There are two types of inference (testing): diagnosis P(cause|effect) (bottom-up) and prediction P(effect|cause) (top-down)
- Once the parameters of the network are estimated, the joint network pdf can be evaluated for ALL possible network values
- Inference is simply probability computation using the network pdf

15 Inference
- For example: P(W=1|C=1) = P(W=1,C=1) / P(C=1), where
P(W=1,C=1) = Σ_{R,S} P(W=1, C=1, R, S)
P(C=1) = Σ_{W,R,S} P(W, C=1, R, S)
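In code, this is two brute-force sums over the joint pdf. The sketch below reuses the hypothetical CPTs from the slide-8 sketch earlier:

```python
import itertools

# Hypothetical CPTs (same as the slide-8 sketch).
p_c1 = 0.5
p_s_given_c = {0: 0.5, 1: 0.1}
p_r_given_c = {0: 0.2, 1: 0.8}
p_w_given_sr = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}

def bern(p1, x):
    return p1 if x == 1 else 1.0 - p1

def joint(c, s, r, w):
    return (bern(p_c1, c) * bern(p_s_given_c[c], s)
            * bern(p_r_given_c[c], r) * bern(p_w_given_sr[(s, r)], w))

# P(W=1, C=1): sum out R and S.  P(C=1): sum out W, R, and S.
p_w1_c1 = sum(joint(1, s, r, 1) for s, r in itertools.product([0, 1], repeat=2))
p_c1_marg = sum(joint(1, s, r, w)
                for s, r, w in itertools.product([0, 1], repeat=3))
print(p_w1_c1 / p_c1_marg)  # ~0.745 with these hypothetical numbers
```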

16 Inference
- Efficient algorithms exist for performing inference in large networks; they operate on the clique network
- Inference is often posed as a probability maximization problem, e.g., what is the most probable cause or effect? argmax_W P(W|C=1)
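Continuing with the same hypothetical numbers, the maximization view just picks the value of W with the largest posterior; note that the normalizer P(C=1) can be dropped, since it does not change the argmax:

```python
# Hypothetical posterior values, e.g., as computed in the previous sketch.
posterior = {0: 0.255, 1: 0.745}     # posterior[w] = P(W=w | C=1)
w_star = max(posterior, key=posterior.get)
print(w_star)  # 1
```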

17 Continuous Case
- In our examples the network nodes represented discrete events (states or classes)
- Network nodes often hold continuous variables (observations), e.g., length or energy
- For the continuous case, parametric pdfs are introduced and their parameters are estimated using ML (observable data) or EM (hidden data)
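A minimal sketch for the continuous case, assuming a node is modeled with a Gaussian pdf fit by ML from (hypothetical) observed samples:

```python
import math

samples = [4.2, 5.1, 4.8, 5.5, 4.9]  # hypothetical continuous observations

mu = sum(samples) / len(samples)                          # ML estimate of the mean
var = sum((x - mu) ** 2 for x in samples) / len(samples)  # ML (biased) variance

def gaussian_pdf(x):
    """N(x; mu, var) with the ML-estimated parameters."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

print(mu, var, gaussian_pdf(5.0))
```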

18 Some Applications
- Medical diagnosis
- Computer problem diagnosis (MS)
- Markov chains
- Hidden Markov models (HMMs)

19 Conclusions
- Bayesian networks are used to represent dependencies between classes
- The network topology defines conditional independence conditions that simplify the modeling and computation of the network pdf
- Three problems: probability computation, estimation/training, and inference/testing

