CHAPTER 3: BAYESIAN DECISION THEORY. Making Decisions Under Uncertainty. Based on E. Alpaydın, Introduction to Machine Learning, © The MIT Press, 2004 (V1.1).


1 CHAPTER 3: BAYESIAN DECISION THEORY

2 Making Decisions Under Uncertainty
- Use Bayes' rule to calculate the probabilities of the classes
- Make a rational decision among multiple actions to minimize expected risk
- Learn association rules from data

3 Unobservable Variables
- Tossing a coin is, as far as we can observe, a completely random process: we cannot predict the outcome
- We can only talk about the probability that the outcome of the next toss will be heads or tails
- If we had access to extra knowledge (the exact composition of the coin, its initial position, the force applied, etc.), the exact outcome of the toss could be predicted

4 Unobservable Variables
- The unobservable variables are the extra knowledge that we do not have access to
- Coin toss: the only observable variable is the outcome of the toss
- x = f(z), where z denotes the unobservables, x the observable, and f is a deterministic function

5 Bernoulli Random Variable
- The result of tossing a coin is in {Heads, Tails}
- Define a random variable X in {1, 0}
- p_o is the probability of heads: P(X = 1) = p_o and P(X = 0) = 1 - P(X = 1) = 1 - p_o
- Suppose we are asked to predict the next toss
- If we know p_o, we predict heads when p_o > 1/2 and tails otherwise
- Choosing the more probable case minimizes the probability of error, which is 1 - p_o when we predict heads
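
A minimal sketch of this decision rule; the value p_o = 0.6 below is an arbitrary example, not taken from the slides:

```python
# Minimal sketch of the Bernoulli decision rule: predict the more probable
# outcome and report the resulting probability of error.
def predict_toss(p_heads):
    """Return the prediction and its probability of error, given p_o = P(X = 1)."""
    if p_heads > 0.5:
        return "heads", 1.0 - p_heads
    return "tails", p_heads

prediction, p_error = predict_toss(0.6)   # p_o = 0.6 is an assumed example value
print(prediction, p_error)                # heads 0.4
```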

6 Estimation
- What if we do not know P(X)?
- We want to estimate it from a given sample X
- This is the realm of statistics
- The sample X is generated from the probability distribution of the observables x^t
- We want to build an approximator p(x) to that distribution using the sample X
- In the coin toss example, the sample is the outcomes of the past N tosses, and the distribution is characterized by the single parameter p_o

7 Parameter Estimation
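
The estimator itself is not reproduced in this transcript; as a sketch, the usual estimate of p_o is the proportion of heads in the sample (the sample below is invented):

```python
# Estimate p_o as the fraction of heads observed in the past N tosses.
sample = [1, 0, 1, 1, 0, 1, 0, 1]    # invented outcomes, 1 = heads, 0 = tails
p_o_hat = sum(sample) / len(sample)  # sample proportion of heads
print(p_o_hat)                       # 0.625
```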

8 Classification
- Credit scoring: two classes, high-risk and low-risk customers
- Decide based on observable information: income and savings
- We have reason to believe that these two variables give us an idea about the credibility of a customer
- Represent them by two random variables X_1 and X_2
- We cannot observe a customer's intentions and moral codes
- We can observe the credibility of past customers
- Credibility is a Bernoulli random variable C conditioned on X = [X_1, X_2]^T
- Suppose we know P(C | X_1, X_2)

9 Classification
- Assume we know P(C | X_1, X_2)
- A new application arrives with X_1 = x_1, X_2 = x_2; the task is to decide high-risk or low-risk based on P(C | x_1, x_2)

10 Classification
- Similar to the coin toss, but here C is conditioned on two observable variables x = [x_1, x_2]^T
- The problem: calculate P(C | x)
- Use Bayes' rule

11 Conditional Probability
- Probability of A (the point falls inside A) given that we know B happened (the point is inside B)
- P(A | B) = P(A ∩ B) / P(B)

12 Bayes' Rule
- P(A | B) = P(A ∩ B) / P(B)
- P(B | A) = P(A ∩ B) / P(A), so P(A ∩ B) = P(B | A) P(A)
- Therefore P(A | B) = P(B | A) P(A) / P(B)
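
A quick numerical check of this identity; the probabilities below are arbitrary illustrative values:

```python
# Verify that P(A|B) computed from the definition equals P(B|A) * P(A) / P(B).
p_a = 0.3            # assumed P(A)
p_b = 0.4            # assumed P(B)
p_b_given_a = 0.5    # assumed P(B|A)

p_a_and_b = p_b_given_a * p_a    # P(A and B) = 0.15
p_a_given_b = p_a_and_b / p_b    # definition of conditional probability = 0.375
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b)               # 0.375
```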

13 Bayes' Rule
P(C | x) = p(x | C) P(C) / p(x)   (posterior = likelihood × prior / evidence)
- Prior P(C = 1): the probability that a customer is high-risk, regardless of x
- It is the knowledge we have about the value of C before looking at the observables x

14 Bayes' Rule
P(C | x) = p(x | C) P(C) / p(x)   (posterior = likelihood × prior / evidence)
- Likelihood p(x | C): the probability that an event belonging to class C has the observation x
- p(x_1, x_2 | C = 1) is the probability that a high-risk customer has X_1 = x_1 and X_2 = x_2

15 Bayes' Rule
P(C | x) = p(x | C) P(C) / p(x)   (posterior = likelihood × prior / evidence)
- Evidence p(x): the probability that the observation x is seen at all, regardless of whether the example is positive or negative

16 Bayes' Rule
P(C | x) = p(x | C) P(C) / p(x)   (posterior = likelihood × prior / evidence)

17 Bayes' Rule for Classification
- Assume we know the prior, the evidence, and the likelihood
- We will learn how to estimate them from data later
- Plug them into Bayes' formula to obtain P(C | x)
- Choose C = 1 if P(C = 1 | x) > P(C = 0 | x), and C = 0 otherwise
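
A minimal sketch of this two-class rule for a single observation x; the prior and likelihood values are invented for illustration:

```python
# Two-class Bayes decision at one observation x, using assumed numbers.
prior = {1: 0.3, 0: 0.7}           # assumed P(C=1), P(C=0)
likelihood = {1: 0.08, 0: 0.02}    # assumed p(x|C=1), p(x|C=0) at this x

evidence = sum(likelihood[c] * prior[c] for c in (0, 1))              # p(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}  # Bayes' rule

decision = 1 if posterior[1] > posterior[0] else 0
print(posterior, decision)         # posteriors sum to 1; here class 1 is chosen
```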

18 Bayes' Rule: K > 2 Classes
P(C_i | x) = p(x | C_i) P(C_i) / p(x), with p(x) = Σ_k p(x | C_k) P(C_k)

19 Bayes' Rule
- When deciding for a specific input x, the evidence p(x) is the same for all classes
- So we do not need it to compare posteriors: choose the class that maximizes p(x | C_i) P(C_i)
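
A sketch of the K-class decision without normalizing by the evidence; the class priors and likelihood values are invented:

```python
# K-class decision: argmax over prior * likelihood; the common factor p(x) cancels.
priors = [0.5, 0.3, 0.2]           # assumed P(C_k) for K = 3 classes
likelihoods = [0.01, 0.05, 0.04]   # assumed p(x|C_k) at the query x

scores = [p * l for p, l in zip(priors, likelihoods)]
best_class = max(range(len(scores)), key=lambda k: scores[k])
print(scores, best_class)          # [0.005, 0.015, 0.008] -> class 1
```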

20 Losses and Risks
- Decisions and errors are not equally good or costly
- Actions: α_i is the action of assigning the input to class i
- Loss of taking action α_i when the true state is C_k: λ_ik
- Expected risk (Duda and Hart, 1973): R(α_i | x) = Σ_k λ_ik P(C_k | x); choose the action with minimum risk
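
A sketch of choosing the minimum-risk action for one input; the loss matrix and posterior values are invented:

```python
# Expected risk R(a_i|x) = sum_k lambda_ik * P(C_k|x); pick the action with minimum risk.
loss = [[0.0, 10.0],    # assumed loss matrix: row i = action, column k = true class
        [1.0,  0.0]]
posterior = [0.8, 0.2]  # assumed P(C_0|x), P(C_1|x)

risk = [sum(loss[i][k] * posterior[k] for k in range(2)) for i in range(2)]
best_action = min(range(2), key=lambda i: risk[i])
print(risk, best_action)  # [2.0, 0.8] -> action 1, despite class 0 being more probable
```

Note how the asymmetric loss overrides the most probable class; with 0/1 loss the two decisions would coincide.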

21 Losses and Risks: 0/1 Loss
- λ_ik = 0 if i = k and 1 otherwise (all errors are equally costly)
- Then R(α_i | x) = Σ_{k ≠ i} P(C_k | x) = 1 - P(C_i | x)
- For minimum risk, choose the most probable class

22 Losses and Risks: Reject
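
The equations for this slide are not in the transcript. In the standard formulation, a reject action with loss 0 < λ < 1 is added (correct decisions cost 0, errors cost 1), and we choose the most probable class only if its posterior exceeds 1 - λ, rejecting otherwise; a sketch under that assumption:

```python
# Reject option (assumed standard formulation): choose the most probable class
# only if its posterior exceeds 1 - lambda_reject; otherwise reject.
def decide_with_reject(posterior, lambda_reject=0.2):
    best = max(range(len(posterior)), key=lambda k: posterior[k])
    if posterior[best] > 1.0 - lambda_reject:
        return best
    return "reject"

print(decide_with_reject([0.55, 0.45]))  # 'reject' (0.55 is not above 0.8)
print(decide_with_reject([0.90, 0.10]))  # 0
```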

23 Discriminant Functions
- Define a function g_i(x) for each class: the "goodness" of selecting class C_i given the observables x
- Choose C_i if g_i(x) = max_k g_k(x)
- Setting g_i(x) = -R(α_i | x), the maximum discriminant corresponds to the minimum conditional risk

24 Decision Regions
The discriminant functions divide the feature space into K decision regions R_1, ..., R_K, where R_i = {x : g_i(x) = max_k g_k(x)}

25 K = 2 Classes
- Define a single discriminant g(x) = g_1(x) - g_2(x) and choose C_1 if g(x) > 0, C_2 otherwise
- Log odds: log [P(C_1 | x) / P(C_2 | x)]
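
A sketch of the two-class decision via the log odds; the posterior value is invented:

```python
import math

# Decide between C_1 and C_2 using the sign of the log odds of the posteriors.
p_c1_given_x = 0.7                    # assumed P(C_1|x)
p_c2_given_x = 1.0 - p_c1_given_x     # P(C_2|x)

log_odds = math.log(p_c1_given_x / p_c2_given_x)
choice = "C1" if log_odds > 0 else "C2"
print(round(log_odds, 3), choice)     # 0.847 C1
```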

26 Correlation and Causality

27 Interaction Between Variables
- We have many variables that we want to represent and analyze
- We are interested in the structure of dependency among them, and in conditional and marginal probabilities
- Use a graphical representation: nodes (events) and arcs (dependencies)
- Numbers attached to the nodes and arcs specify the conditional probabilities

28 Causes and Bayes' Rule
Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?
P(R | W) = P(W | R) P(R) / P(W)   (the causal direction is rain → wet grass; Bayes' rule inverts the arc for the diagnostic direction)
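
A sketch of this diagnostic inference for the two-node network rain → wet grass; the probability table is not in the transcript, so the values below (P(R) = 0.4, P(W | R) = 0.9, P(W | ~R) = 0.2) are assumed for illustration:

```python
# Diagnostic inference in the single-cause network rain -> wet grass.
p_rain = 0.4                 # assumed P(R)
p_wet_given_rain = 0.9       # assumed P(W|R)
p_wet_given_no_rain = 0.2    # assumed P(W|~R)

p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)  # P(W) = 0.48
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet                    # Bayes' rule
print(round(p_rain_given_wet, 2))                                       # 0.75
```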

29 Causal vs Diagnostic Inference
Causal inference: if the sprinkler is on, what is the probability that the grass is wet?
P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S)
         = P(W | R, S) P(R) + P(W | ~R, S) P(~R)   (R and S are independent)
         = 0.95 × 0.4 + 0.9 × 0.6 = 0.92

30 Causal vs Diagnostic Inference
Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on?
P(S | W) = 0.35 > 0.2 = P(S), so observing wet grass raises the probability that the sprinkler is on
P(S | R, W) = 0.21
Explaining away: knowing that it has rained decreases the probability that the sprinkler is on.
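
A sketch that reproduces the numbers quoted on slides 29-30. The full table P(W | R, S) is not in the transcript, so the entries not quoted above (P(W | R, ~S) = 0.90, P(W | ~R, ~S) = 0.10) and the priors P(R) = 0.4, P(S) = 0.2 are assumptions chosen to be consistent with the quoted results:

```python
# Sprinkler/rain/wet-grass network: reproduce P(S|W) ~ 0.35 and P(S|R,W) ~ 0.21.
p_r, p_s = 0.4, 0.2                                   # assumed priors
p_w = {(True, True): 0.95, (True, False): 0.90,       # P(W=1 | rain, sprinkler)
       (False, True): 0.90, (False, False): 0.10}     # entries partly assumed

def joint(r, s, w):
    """P(R=r, S=s, W=w) under the network factorization P(R) P(S) P(W|R,S)."""
    pr = p_r if r else 1 - p_r
    ps = p_s if s else 1 - p_s
    pw = p_w[(r, s)] if w else 1 - p_w[(r, s)]
    return pr * ps * pw

p_sw = sum(joint(r, True, True) for r in (True, False))                      # P(S, W)
p_w1 = sum(joint(r, s, True) for r in (True, False) for s in (True, False))  # P(W)
print(round(p_sw / p_w1, 2))                          # P(S|W)   = 0.35

p_rw = sum(joint(True, s, True) for s in (True, False))                      # P(R, W)
print(round(joint(True, True, True) / p_rw, 2))       # P(S|R,W) = 0.21
```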

31 Bayesian Networks: Causes

32 Bayesian Nets: Local Structure
- We only need to specify the interactions between neighboring events: each node's probability is conditioned only on its parents in the graph

33 Bayesian Networks: Classification
Bayes' rule inverts the arc from class to input: diagnostic inference gives P(C | x) = p(x | C) P(C) / p(x)

34 Naive Bayes' Classifier
Given the class C, the inputs x_j are independent:
p(x | C) = p(x_1 | C) p(x_2 | C) ... p(x_d | C)
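
A minimal sketch of a naive Bayes decision with binary inputs; all probability values are invented for illustration:

```python
# Naive Bayes with d = 2 binary inputs: p(x|C) factorizes across features.
priors = {0: 0.6, 1: 0.4}                       # assumed P(C)
p_feature_on = {0: [0.2, 0.5], 1: [0.7, 0.9]}   # assumed P(x_j = 1 | C) per feature j

def class_scores(x):
    """Unnormalized posteriors P(C) * prod_j p(x_j|C) for a binary feature vector x."""
    scores = {}
    for c, prior in priors.items():
        likelihood = 1.0
        for j, xj in enumerate(x):
            pj = p_feature_on[c][j]
            likelihood *= pj if xj == 1 else 1 - pj
        scores[c] = prior * likelihood
    return scores

scores = class_scores([1, 1])
print(scores, max(scores, key=scores.get))      # class 1 wins for x = [1, 1]
```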

35 Naive Bayes' Classifier
- Why "naive"? It ignores any dependency among the inputs
- Example: a dependency between "baby food" and "diapers"
- A hidden variable ("has children") could explain such a dependency
- Insert the corresponding nodes/arcs and estimate their values from data

36 Association Rules
- Association rule: X → Y
- Support(X → Y): P(X, Y), the fraction of customers who bought both X and Y
- Confidence(X → Y): P(Y | X) = P(X, Y) / P(X)
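
A sketch computing support and confidence from a small, invented transaction list; it also previews the next slide's point that high confidence can come with negligible support:

```python
# Support and confidence of the rule chips -> beer over invented market-basket data.
transactions = [
    {"chips", "beer"},
    {"beer", "diapers"},
    {"beer"},
    {"milk", "diapers"},
]

def support(x, y):
    """P(X, Y): fraction of all transactions containing both items."""
    return sum(1 for t in transactions if x in t and y in t) / len(transactions)

def confidence(x, y):
    """P(Y|X): among transactions containing X, the fraction that also contain Y."""
    with_x = [t for t in transactions if x in t]
    return sum(1 for t in with_x if y in t) / len(with_x)

print(support("chips", "beer"))     # 0.25
print(confidence("chips", "beer"))  # 1.0, but based on a single chips purchase
```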

37 Association Rules
- Suppose only one customer bought chips, and that same customer also bought beer
- Then the confidence of the rule chips → beer is P(beer | chips) = 1
- But its support is tiny; support is what indicates statistical significance

