Slide 1: Chapter 12 Probabilistic Reasoning and Bayesian Belief Networks

Slide 2: Chapter 12 Contents
- Probabilistic Reasoning
- Joint Probability Distributions
- Bayes' Theorem
- Simple Bayesian Concept Learning
- Bayesian Belief Networks
- The Noisy-∨ Function
- Bayes' Optimal Classifier
- The Naïve Bayes Classifier
- Collaborative Filtering

Slide 3: Probabilistic Reasoning
- Probabilities are expressed in a notation similar to that of predicates in FOPC:
  - P(S) = 0.5
  - P(T) = 1
  - P(¬(A ∧ B) ∨ C) = 0.2
- 1 = certain; 0 = certainly not.

Slide 4: Conditional Probability
- Conditional probability refers to the probability of one thing given that we already know another to be true:
  P(B|A) = P(A ∧ B) / P(A)
- This states the probability of B, given A.

Slide 5: Conditional Probability
- Note that P(A|B) ≠ P(B|A).
- For example, suppose P(R ∧ S) = 0.01, P(S) = 0.1, and P(R) = 0.7.
- Then P(R|S) = P(R ∧ S) / P(S) = 0.01 / 0.1 = 0.1, while P(S|R) = P(R ∧ S) / P(R) = 0.01 / 0.7 ≈ 0.014.

Slide 6: Conditional Probability
- Two further identities are useful when working with probabilities:
  P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
  P(A ∧ B) = P(A) × P(B), if A and B are independent events.
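
A quick Python check of these identities, using the numbers from slide 5 (the variable names are mine):

    # Numbers from slide 5: P(R ∧ S) = 0.01, P(S) = 0.1, P(R) = 0.7
    p_r_and_s = 0.01
    p_s = 0.1
    p_r = 0.7

    # Conditional probability: P(X|Y) = P(X ∧ Y) / P(Y)
    p_r_given_s = p_r_and_s / p_s      # ≈ 0.1
    p_s_given_r = p_r_and_s / p_r      # ≈ 0.014
    print(p_r_given_s, p_s_given_r)    # shows that P(R|S) != P(S|R)

    # Sum rule: P(R ∨ S) = P(R) + P(S) - P(R ∧ S)
    p_r_or_s = p_r + p_s - p_r_and_s   # ≈ 0.79
    print(p_r_or_s)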

Slide 7: Joint Probability Distributions
- A joint probability distribution represents the combined probabilities of two or more variables:

           B       ¬B
    A      0.11    0.63
    ¬A     0.09    0.17

  (The remaining entry, P(¬A ∧ ¬B) = 0.17, follows because the four probabilities must sum to 1.)
- This table shows, for example, that P(A ∧ B) = 0.11 and P(¬A ∧ B) = 0.09.
- Using this, we can calculate P(A):
  P(A) = P(A ∧ B) + P(A ∧ ¬B) = 0.11 + 0.63 = 0.74
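
The same marginalisation can be done programmatically. A minimal Python sketch, assuming P(¬A ∧ ¬B) = 0.17 so that the table sums to 1:

    # Joint probability distribution over A and B (slide 7).
    joint = {
        (True,  True):  0.11,   # P(A ∧ B)
        (True,  False): 0.63,   # P(A ∧ ¬B)
        (False, True):  0.09,   # P(¬A ∧ B)
        (False, False): 0.17,   # P(¬A ∧ ¬B), inferred so the entries sum to 1
    }

    # Marginalise out B to obtain P(A)
    p_a = sum(p for (a, b), p in joint.items() if a)
    print(p_a)   # ≈ 0.74, matching the slide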

Slide 8: Bayes' Theorem
- Bayes' theorem lets us calculate a conditional probability:
  P(B|A) = P(A|B) × P(B) / P(A)
- P(B) is the prior probability of B.
- P(B|A) is the posterior probability of B.

Slide 9: Bayes' Theorem (Derivation)
- P(A ∧ B) = P(A|B) P(B)   (for dependent events)
- P(A ∧ B) = P(B ∧ A) = P(B|A) P(A)
- Hence P(A|B) P(B) = P(B|A) P(A), and so:
  P(B|A) = P(A|B) P(B) / P(A)

Slide 10: Simple Bayesian Concept Learning (1)
- P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
- Let us suppose we have a set of hypotheses H1 … Hn. For each Hi:
  P(Hi|E) = P(E|Hi) P(Hi) / P(E)
- Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.

Slide 11: Simple Bayesian Concept Learning (2)
- In fact, this can be simplified. Since P(E) is independent of Hi, it has the same value for each hypothesis.
- Hence, it can be ignored, and we can find the hypothesis with the highest value of:
  P(E|Hi) P(Hi)
- We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|Hi). This is the likelihood of E given Hi.
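
A minimal Python sketch of choosing the most probable hypothesis this way; the hypotheses and numbers below are made-up assumptions for illustration:

    # Pick the maximum a posteriori (MAP) hypothesis: argmax over P(E|Hi) * P(Hi).
    priors      = {"H1": 0.3, "H2": 0.5, "H3": 0.2}   # P(Hi), illustrative values
    likelihoods = {"H1": 0.8, "H2": 0.1, "H3": 0.4}   # P(E|Hi), illustrative values

    map_hypothesis = max(priors, key=lambda h: likelihoods[h] * priors[h])
    print(map_hypothesis)   # "H1", since 0.8 * 0.3 = 0.24 is the largest product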

Slide 12: Example
- If you have a cold (B), there is an 80% chance you have a high temperature (A): P(A|B) = 0.8.
- Suppose 1 in 10,000 people have a cold, and 1 in 1,000 have a high temperature:
  P(B) = 0.0001, P(A) = 0.001
- P(B|A) = P(A|B) × P(B) / P(A) = 0.8 × 0.0001 / 0.001 = 0.08
- So there is an 8% chance that you have a cold, given that you have a high temperature.
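
A one-line check of this calculation in Python (the variable names are mine):

    # Bayes' theorem: P(cold | high temp) = P(high temp | cold) * P(cold) / P(high temp)
    p_temp_given_cold = 0.8      # P(A|B)
    p_cold = 0.0001              # P(B): 1 in 10,000
    p_temp = 0.001               # P(A): 1 in 1,000

    p_cold_given_temp = p_temp_given_cold * p_cold / p_temp
    print(p_cold_given_temp)     # ≈ 0.08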

Slide 13: Bayesian Belief Networks (1)
- A belief network shows the dependencies between a group of variables.
- Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
- In the example network, C and D are dependent on A; D and E are dependent on B.
- The Bayesian belief network has probabilities associated with each link, e.g. P(C|A) = 0.2 and P(C|¬A) = 0.4.

Slide 14: Bayesian Belief Networks (2)
- A complete set of probabilities for this belief network might be:
  - P(A) = 0.1
  - P(B) = 0.7
  - P(C|A) = 0.2
  - P(C|¬A) = 0.4
  - P(D|A ∧ B) = 0.5
  - P(D|A ∧ ¬B) = 0.4
  - P(D|¬A ∧ B) = 0.2
  - P(D|¬A ∧ ¬B) = 0.0001
  - P(E|B) = 0.2
  - P(E|¬B) = 0.1

Slide 15: Bayesian Belief Networks (3)
- We can now calculate joint probabilities using conditional probabilities:
  P(A,B,C,D,E) = P(E|A,B,C,D) × P(A,B,C,D)
- In fact, we can simplify this, since there are no dependencies between certain pairs of variables (between E and A, for example). Hence:
  P(A,B,C,D,E) = P(A) × P(B) × P(C|A) × P(D|A ∧ B) × P(E|B)
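
A short Python sketch of this factorisation, using the probabilities from slide 14 (the function and variable names are mine):

    # Conditional probability tables from slide 14
    p_a = 0.1
    p_b = 0.7
    p_c_given_a  = {True: 0.2, False: 0.4}                  # keyed by the value of A
    p_d_given_ab = {(True, True): 0.5, (True, False): 0.4,
                    (False, True): 0.2, (False, False): 0.0001}
    p_e_given_b  = {True: 0.2, False: 0.1}

    def prob(x, p):
        """Probability that a Boolean variable takes value x, given P(variable = True) = p."""
        return p if x else 1.0 - p

    def joint(a, b, c, d, e):
        """P(A=a, B=b, C=c, D=d, E=e) using the network factorisation above."""
        return (prob(a, p_a) *
                prob(b, p_b) *
                prob(c, p_c_given_a[a]) *
                prob(d, p_d_given_ab[(a, b)]) *
                prob(e, p_e_given_b[b]))

    print(joint(True, True, True, True, True))   # P(A ∧ B ∧ C ∧ D ∧ E) ≈ 0.0014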

Slide 16: Example
- P(C) = 0.2 (go to college)
- P(S) = 0.8 if C, 0.2 if not C (study)
- P(P) = 0.6 if C, 0.5 if not C (party)
- P(F) = 0.9 if P, 0.7 if not P (fun)
[Network diagram over the nodes C, S, P, E, F: C → S, C → P; S and P → E; P → F]

Slide 17: Example 2
- Conditional probability table for exam success E, given S (study) and P (party):

    S      P      P(E)
    true   true   0.6
    true   false  0.9
    false  true   0.1
    false  false  0.2

Slide 18: Example 3
- P(C, S, ¬P, E, ¬F) = P(C) × P(S|C) × P(¬P|C) × P(E|S ∧ ¬P) × P(¬F|¬P)
                     = 0.2 × 0.8 × 0.4 × 0.9 × 0.3
                     = 0.01728
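
The same calculation in Python, using the conditional probabilities from slides 16 and 17 (the names are mine):

    # College network from slides 16-18: C -> S, C -> P, (S, P) -> E, P -> F
    p_college = 0.2
    p_study_given_c = {True: 0.8, False: 0.2}
    p_party_given_c = {True: 0.6, False: 0.5}
    p_exam_given_sp = {(True, True): 0.6, (True, False): 0.9,
                       (False, True): 0.1, (False, False): 0.2}
    p_fun_given_p   = {True: 0.9, False: 0.7}

    def prob(x, p):
        return p if x else 1.0 - p

    def joint(c, s, p, e, f):
        return (prob(c, p_college) *
                prob(s, p_study_given_c[c]) *
                prob(p, p_party_given_c[c]) *
                prob(e, p_exam_given_sp[(s, p)]) *
                prob(f, p_fun_given_p[p]))

    # P(C, S, ¬P, E, ¬F) from slide 18
    print(joint(True, True, False, True, False))   # ≈ 0.01728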

Slide 19: Bayes' Optimal Classifier
- A system that uses Bayes' theory to classify data.
- We have a piece of data y, and are seeking the correct hypothesis from H1 … H5, each of which assigns a classification to y.
- The probability that y should be classified as cj is:
  P(cj | x1, …, xn) = Σ (i = 1 to m) P(cj | Hi) P(Hi | x1, …, xn)
- x1 to xn are the training data, and m is the number of hypotheses.
- This method provides the best possible classification for a piece of data.
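
A minimal Python sketch of the idea; the hypotheses, their posterior probabilities, and the classes they predict are made-up assumptions:

    # Bayes optimal classification: for each class cj, sum P(cj|Hi) * P(Hi|data)
    # over all hypotheses Hi, then pick the class with the largest total.

    # Posterior probability of each hypothesis given the training data (illustrative).
    posterior = {"H1": 0.4, "H2": 0.3, "H3": 0.3}

    # Here each hypothesis deterministically predicts one class, so P(class|Hi) is 1 or 0.
    predicts = {"H1": "positive", "H2": "negative", "H3": "negative"}

    classes = {"positive", "negative"}
    scores = {c: sum(p for h, p in posterior.items() if predicts[h] == c)
              for c in classes}

    print(max(scores, key=scores.get))   # "negative": 0.6 beats "positive": 0.4

Note that the MAP hypothesis H1 predicts "positive", yet the optimal classification is "negative", because the combined weight of the other hypotheses is larger.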

Slide 20: The Naïve Bayes Classifier (1)
- A vector of data (d1, …, dn) is assigned a single classification ci, based on P(ci | d1, …, dn).
- The classification with the highest posterior probability is chosen.
- The hypothesis which has the highest posterior probability is the maximum a posteriori, or MAP, hypothesis. In this case, we are looking for the MAP classification.
- Bayes' theorem is used to find the posterior probability:
  P(ci | d1, …, dn) = P(d1, …, dn | ci) × P(ci) / P(d1, …, dn)

Slide 21: The Naïve Bayes Classifier (2)
- Since P(d1, …, dn) is a constant, independent of ci, we can eliminate it, and simply aim to find the classification ci for which the following is maximised:
  P(d1, …, dn | ci) × P(ci)
- We now assume that all the attributes d1, …, dn are independent, so P(d1, …, dn | ci) can be rewritten as:
  P(d1 | ci) × P(d2 | ci) × … × P(dn | ci)
- The classification for which this is highest is chosen to classify the data.
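
A small Python sketch of a naïve Bayes classifier over Boolean attributes, with Laplace smoothing added (not mentioned in the slides); the training data and attribute names are made-up assumptions:

    # Naive Bayes: choose the class ci maximising P(ci) * product over j of P(dj | ci),
    # with the probabilities estimated from training counts.
    training = [
        ({"runny_nose": True,  "fever": True},  "cold"),
        ({"runny_nose": True,  "fever": False}, "cold"),
        ({"runny_nose": False, "fever": False}, "healthy"),
        ({"runny_nose": False, "fever": True},  "healthy"),
        ({"runny_nose": False, "fever": False}, "healthy"),
    ]

    def classify(instance):
        classes = {c for _, c in training}
        scores = {}
        for c in classes:
            rows = [attrs for attrs, cls in training if cls == c]
            score = len(rows) / len(training)              # prior P(ci)
            for attr, value in instance.items():           # product of P(dj | ci)
                matches = sum(1 for r in rows if r[attr] == value)
                score *= (matches + 1) / (len(rows) + 2)   # Laplace smoothing
            scores[c] = score
        return max(scores, key=scores.get)

    print(classify({"runny_nose": True, "fever": True}))   # "cold"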

Slide 22: Collaborative Filtering
- A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.
- For example, if we know that Anne and Bob both like A, B, and C, and that Anne likes D, then we guess that Bob would also like D.
- The required probabilities can be calculated using decision trees.
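
The slide's decision-tree formulation is not reproduced here; as a rough sketch of the underlying idea, the following Python estimates P(likes D | likes C) from a made-up table of users' known interests:

    # Estimate P(user likes item | user likes given) by counting users,
    # then use it to decide whether to recommend the item.
    likes = {
        "Anne":  {"A", "B", "C", "D"},
        "Bob":   {"A", "B", "C"},
        "Carol": {"A", "D"},
        "Dave":  {"B"},
    }

    def p_likes_given(item, given):
        """P(likes item | likes given), estimated from the table above."""
        having_given = [u for u, s in likes.items() if given in s]
        if not having_given:
            return 0.0
        return sum(1 for u in having_given if item in likes[u]) / len(having_given)

    # How likely is a user who likes C (such as Bob) to also like D?
    print(p_likes_given("D", "C"))   # 0.5: one of the two C-likers also likes D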

