
1 Belief Networks CS121 – Winter 2003

2 Other Names Bayesian networks Probabilistic networks Causal networks

3 Probabilistic Belief There are several possible worlds that are indistinguishable to an agent given some prior evidence. The agent believes that a logic sentence B is True with probability p and False with probability 1-p; B is called a belief. In the frequency interpretation of probabilities, this means that the agent believes that the fraction of possible worlds that satisfy B is p. The distribution (p, 1-p) is the strength of B.

4 Problem At a certain time t, the KB of an agent is some collection of beliefs. At time t the agent's sensors make an observation that changes the strength of one of its beliefs. How should the agent update the strengths of its other beliefs?

5 Toothache Example A certain dentist is only interested in two things about any patient: whether he has a toothache and whether he has a cavity. Over years of practice, she has constructed the following joint distribution:

           Toothache   ¬Toothache
Cavity       0.04         0.06
¬Cavity      0.01         0.89

6 Toothache Example Using the joint distribution, the dentist can compute the strength of any logic sentence built with the propositions Toothache and Cavity. In particular, this distribution implies that the prior probability of Toothache is 0.05:
P(T) = P((T ∧ C) ∨ (T ∧ ¬C)) = P(T ∧ C) + P(T ∧ ¬C) = 0.04 + 0.01 = 0.05

           Toothache   ¬Toothache
Cavity       0.04         0.06
¬Cavity      0.01         0.89
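To make the arithmetic concrete, here is a minimal sketch (plain Python; the dictionary layout is my own choice, not from the slides) that stores the joint distribution and recovers P(T) by marginalization:

```python
# Joint distribution over (Cavity, Toothache), copied from the table above.
# Keys are (cavity, toothache) truth-value pairs.
joint = {
    (True, True): 0.04, (True, False): 0.06,
    (False, True): 0.01, (False, False): 0.89,
}

# P(Toothache): sum over all worlds in which Toothache is True.
p_toothache = sum(p for (c, t), p in joint.items() if t)
print(p_toothache)  # 0.05 (up to float rounding)
```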

7 New Evidence She now makes an observation E that indicates that a specific patient x has a high probability (0.8) of having a toothache, but that is not directly related to whether he has a cavity.

           Toothache   ¬Toothache
Cavity       0.04         0.06
¬Cavity      0.01         0.89

8 Adjusting Joint Distribution She now makes an observation E that indicates that a specific patient x has a high probability (0.8) of having a toothache, but that is not directly related to whether he has a cavity. She can use this additional information to create a joint distribution (specific to x) conditioned on E, by keeping the same probability ratios between Cavity and ¬Cavity within each column. The probability of Cavity, which was 0.1, is now (knowing E) 0.6526.

           Toothache|E   ¬Toothache|E
Cavity|E      0.64          0.0126
¬Cavity|E     0.16          0.1874
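Continuing the sketch above, the same update (rescale each Toothache column so its total becomes 0.8 or 0.2, preserving the within-column ratios) reproduces the adjusted table:

```python
# Rescale the joint so that P(Toothache | E) = 0.8, keeping the
# Cavity / ¬Cavity ratios inside each column unchanged.
p_t_given_e = 0.8
p_t = sum(p for (c, t), p in joint.items() if t)  # 0.05
joint_given_e = {
    (c, t): p * (p_t_given_e / p_t if t
                 else (1 - p_t_given_e) / (1 - p_t))
    for (c, t), p in joint.items()
}
p_cavity_given_e = sum(p for (c, t), p in joint_given_e.items() if c)
print(round(p_cavity_given_e, 4))  # 0.6526
```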

9 Corresponding Calculus P(C|T) = P(C ∧ T)/P(T) = 0.04/0.05

           Toothache   ¬Toothache
Cavity       0.04         0.06
¬Cavity      0.01         0.89

10 Corresponding Calculus P(C|T) = P(C ∧ T)/P(T) = 0.04/0.05
P(C ∧ T|E) = P(C|T,E) P(T|E) = P(C|T) P(T|E), since C and E are independent given T.

           Toothache|E   ¬Toothache|E
Cavity|E      0.04          0.06
¬Cavity|E     0.01          0.89

11 Corresponding Calculus P(C|T) = P(C ∧ T)/P(T) = 0.04/0.05
P(C ∧ T|E) = P(C|T,E) P(T|E) = P(C|T) P(T|E) = (0.04/0.05) × 0.8 = 0.64

           Toothache|E   ¬Toothache|E
Cavity|E      0.64          0.0126
¬Cavity|E     0.16          0.1874

12 Generalization n beliefs X1, …, Xn. The joint distribution can be used to update probabilities when new evidence arrives. But: the joint distribution contains 2^n probabilities, and useful independence is not made explicit.

13 Purpose of Belief Networks Facilitate the description of a collection of beliefs by making explicit causality relations and conditional independence among beliefs. Provide a more efficient way (than joint distribution tables) to update belief strengths when new evidence is observed.

14 Alarm Example Five beliefs:
A: Alarm
B: Burglary
E: Earthquake
J: JohnCalls
M: MaryCalls

15 A Simple Belief Network A directed acyclic graph (DAG) whose nodes are beliefs: Burglary → Alarm and Earthquake → Alarm (causes), Alarm → JohnCalls and Alarm → MaryCalls (effects). Intuitive meaning of an arrow from x to y: "x has direct influence on y".

16 Assigning Probabilities to Roots [Same network as above.] P(B) = 0.001, P(E) = 0.002.

17 Conditional Probability Tables [Same network, with P(B) = 0.001 and P(E) = 0.002.]

B  E  P(A|B,E)
T  T    0.95
T  F    0.94
F  T    0.29
F  F    0.001

Size of the CPT for a node with k (binary) parents: 2^k rows.
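As a quick illustration of the 2^k count, each CPT row corresponds to one truth assignment of the node's parents (a minimal sketch, not from the slides):

```python
from itertools import product

k = 2  # e.g., Alarm's parents: Burglary and Earthquake
rows = list(product((True, False), repeat=k))
print(len(rows))  # 2**k == 4 rows: TT, TF, FT, FF
```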

18 Conditional Probability Tables [Same network and tables as above, adding the CPTs for the leaves:]

A  P(J|A)      A  P(M|A)
T   0.90       T   0.70
F   0.05       F   0.01

19 What the BN Means [Same network and CPTs as above.] The BN defines the full joint distribution:
P(x1, x2, …, xn) = Π i=1,…,n P(xi | Parents(Xi))

20 Calculation of Joint Probability [Same network and CPTs as above.]
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
= 0.9 × 0.7 × 0.001 × 0.999 × 0.998 = 0.00062
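The factorization translates directly into code. Here is a sketch with the CPTs stored as Python dicts (the variable and function names are my own, not from the slides):

```python
# CPTs from the slides; Booleans index the truth values of the parents.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=true | A)
P_M = {True: 0.70, False: 0.01}  # P(M=true | A)

def joint(b, e, a, j, m):
    # P(B=b, E=e, A=a, J=j, M=m) as the product over the network's CPTs.
    p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
    p_j = P_J[a] if j else 1.0 - P_J[a]
    p_m = P_M[a] if m else 1.0 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

print(joint(False, False, True, True, True))  # ~0.00062, as on the slide
```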

21 What The BN Encodes [Same network.] Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm. The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm. For example, John does not observe any burglaries directly.

22 What The BN Encodes [Same network.] Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm. The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm. For instance, the reasons why John and Mary may not call if there is an alarm are unrelated. Note that these reasons could be other beliefs in the network; the probabilities summarize these non-explicit beliefs.

23 Inference In BN Set E of evidence variables that are observed with a new probability distribution, e.g., {JohnCalls, MaryCalls}. Query variable X, e.g., Burglary, for which we would like to know the posterior probability distribution P(X|E), i.e., the distribution conditioned on the observations made.

M  J  P(B|M,J)
T  T     ?
T  F     ?
F  T     ?
F  F     ?
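Reusing the joint() function from the sketch above, the query P(B | J, M) can be answered by brute-force enumeration of the hidden variables (the ~0.284 result is the standard value for this network):

```python
from itertools import product

def query_B(j, m):
    # P(B=true | J=j, M=m): sum the joint over the hidden variables
    # E and A for each value of B, then normalize.
    score = {b: sum(joint(b, e, a, j, m)
                    for e, a in product((True, False), repeat=2))
             for b in (True, False)}
    return score[True] / (score[True] + score[False])

print(round(query_B(True, True), 3))  # 0.284: a burglary is still unlikely
```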

24 Inference Patterns Diagnostic (from effects to causes, e.g., from JohnCalls to Burglary), causal (from causes to effects), intercausal (between causes of a common effect, e.g., between Burglary and Earthquake given Alarm), and mixed. Basic use of a BN: given new observations, compute the new strengths of some (or all) beliefs. Other use: given the strength of a belief, which observation should we gather to make the greatest change in this belief's strength?

25 Applications http://excalibur.brc.uconn.edu/~baynet/researchApps.html Medical diagnosis, e.g., lymph-node diseases. Fraud/uncollectible debt detection. Troubleshooting of hardware/software systems.

26 Neural Networks CS121 – Winter 2003

27 Function-Learning Formulation Goal function f. Training set: (xi, f(xi)), i = 1, …, n. Inductive inference: find a function h that fits the points well. Issues: representation, incremental learning. Neural nets are one such representation.

28 Unit (Neuron) [Diagram: inputs x0, …, xn with weights wi feed a summation Σ and an activation function g, producing the output y.]
y = g(Σ i=1,…,n wi xi), with, e.g., the sigmoid g(u) = 1/[1 + exp(-λu)]
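A minimal sketch of one such unit (plain Python; the function and argument names are mine):

```python
import math

def unit(weights, inputs, lam=1.0):
    # One neuron: weighted sum of the inputs, passed through a sigmoid.
    u = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-lam * u))

# Example: two real inputs plus a constant x0 = 1 acting as a bias.
print(round(unit([-1.5, 1.0, 1.0], [1.0, 1.0, 1.0]), 3))  # 0.622
```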

29 Particular Case: Perceptron [Same unit diagram.] y = g(Σ i=1,…,n wi xi) [Figure: a set of + and - examples separated by a straight line: the two classes are linearly separable.]

30 Particular Case: Perceptron [Same unit diagram.] y = g(Σ i=1,…,n wi xi) [Figure: another arrangement of + and - examples, marked "?": is it linearly separable?]
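The slides do not give the training procedure, but the classic perceptron learning rule (w ← w + α(t - y)x on a threshold unit) fits here; a sketch under that assumption:

```python
def train_perceptron(examples, n_inputs, alpha=0.1, epochs=50):
    # Classic perceptron rule: w += alpha * (target - output) * x.
    # Each example is (inputs, target) with inputs[0] == 1 as the bias input.
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, t in examples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            w = [wi + alpha * (t - y) * xi for wi, xi in zip(w, x)]
    return w

# Learn the (linearly separable) OR function; x[0] = 1 is the bias input.
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = train_perceptron(data, 3)
print([1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
       for x, _ in data])  # [0, 1, 1, 1]
```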

31 Neural Network Network of interconnected neurons. [Diagram: two units, with the output of one feeding an input of the other.] Acyclic (feed-forward) vs. recurrent networks.

32 Two-Layer Feed-Forward Neural Network [Diagram: inputs → hidden layer → output layer.]

33 Backpropagation (Principle) New example: Y^k = f(x^k). Error function: E(w) = ||y^k - Y^k||^2, where y^k is the network's output. Gradient-descent update: w_ij(k) = w_ij(k-1) - α ∂E/∂w_ij. Backpropagation: update the weights of the inputs to the last layer, then the weights of the inputs to the previous layer, etc.
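A compact sketch of this procedure (not from the slides: numpy, a four-unit hidden layer, and XOR as the training data are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# XOR: not linearly separable, so it needs the hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)  # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)  # hidden -> output
alpha = 0.5                                      # learning rate

for _ in range(20000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: gradient of E(w) = sum ||y - Y||^2,
    # computed for the last layer first, then the previous layer.
    d_out = 2 * (y - Y) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= alpha * h.T @ d_out; b2 -= alpha * d_out.sum(axis=0)
    W1 -= alpha * X.T @ d_hid; b1 -= alpha * d_hid.sum(axis=0)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2).ravel())
# typically ~[0, 1, 1, 0] after training
```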

34 Issues How to choose the size and structure of the network? If the network is too large, there is a risk of over-fitting (data caching). If the network is too small, the representation may not be rich enough. Role of representation: e.g., learning the concept of an odd number.

35 What is AI? Discipline that systematizes and automates intellectual tasks to create machines that:
Act like humans      Act rationally
Think like humans    Think rationally

36 What Have We Learned? A collection of useful methods. Connections between fields. The relation between high-level (e.g., logic) and low-level (e.g., neural networks) representations. The impact of hardware. What is intelligence? Our techniques are better than our understanding.

