INTRODUCTION TO Machine Learning


1 INTRODUCTION TO Machine Learning
Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004

2 CHAPTER 3: Bayesian Decision Theory

3 Probability
Tossing a coin is a random process because we cannot predict the outcome of any single toss, heads or tails; that is why we toss coins, buy lottery tickets, or get insurance.

4 Probability and Inference
Result of tossing a coin is ∈ {Heads, Tails}
Random variable X ∈ {1, 0}
Bernoulli: P{X = x} = p_o^x (1 − p_o)^(1−x)
Sample: X = {x^t}, t = 1, ..., N
Estimation: p_o = #{Heads}/#{Tosses} = Σ_t x^t / N
Prediction of next toss: Heads if p_o > ½, Tails otherwise
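A minimal sketch in Python of the estimate-and-predict rule above; the sample of tosses is made up for illustration.

```python
# Maximum-likelihood estimate of the Bernoulli parameter p_o from a sample of
# coin tosses (1 = Heads, 0 = Tails), and the prediction rule for the next toss.
sample = [1, 0, 1, 1, 0, 1, 0, 1]    # illustrative tosses, not from the slides

p_hat = sum(sample) / len(sample)    # p_o ≈ #{Heads} / #{Tosses}
prediction = "Heads" if p_hat > 0.5 else "Tails"
print(f"estimated p_o = {p_hat:.2f}, predict next toss: {prediction}")
```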

5 Classification
Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.
Input: x = [x_1, x_2]^T, Output: C ∈ {0, 1}
Prediction: choose C = 1 if P(C = 1 | x_1, x_2) > 0.5, and C = 0 otherwise

6 Bayes’ Rule
P(C | x) = P(C) p(x | C) / p(x)   (posterior = prior × likelihood / evidence)

7 Prior Probability, Class Likelihood, Evidence, Posterior Probability
Prior probability P(C = 1): the probability that a customer is high-risk before seeing x; P(C = 0) + P(C = 1) = 1.
Class likelihood P(x | C): the conditional probability that an event belonging to class C has the associated observation value x.
Evidence P(x): the marginal probability that an observation x is seen, regardless of whether it is a positive or negative example; P(x) = P(x | C = 1) P(C = 1) + P(x | C = 0) P(C = 0).
Posterior probability: posterior = prior × likelihood / evidence.
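A short sketch of Bayes' rule for the two-class credit-scoring example, posterior = prior × likelihood / evidence; the prior and likelihood values are invented for illustration, not taken from the book.

```python
# Bayes' rule for the two-class credit-scoring problem:
# posterior = prior * likelihood / evidence. Values are illustrative only.
prior = {1: 0.3, 0: 0.7}             # P(C=1) high-risk, P(C=0) low-risk
likelihood = {1: 0.08, 0: 0.02}      # p(x | C) for one observed (income, savings)

evidence = sum(likelihood[c] * prior[c] for c in (0, 1))             # P(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}

print(posterior)                                  # the two posteriors sum to 1
print("predict:", "high-risk" if posterior[1] > 0.5 else "low-risk")
```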

8 Bayes’ Rule
P(C | x) = P(C) p(x | C) / p(x), and the posteriors sum to one: P(C = 0 | x) + P(C = 1 | x) = 1

9 Bayes’ Rule: K > 2 Classes
P(C_i | x) = p(x | C_i) P(C_i) / p(x) = p(x | C_i) P(C_i) / Σ_k p(x | C_k) P(C_k)
with P(C_i) ≥ 0 and Σ_i P(C_i) = 1
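A minimal sketch of the same computation for K > 2 classes with made-up priors and likelihoods: normalize p(x | C_k) P(C_k) by the evidence and pick the largest posterior.

```python
# Bayes' classifier for K > 2 classes: compute every posterior for one
# observation x and choose the class with the largest one. Values are made up.
priors = [0.5, 0.3, 0.2]             # P(C_k), non-negative and summing to 1
likelihoods = [0.05, 0.10, 0.40]     # p(x | C_k) evaluated at the observed x

evidence = sum(p * l for p, l in zip(priors, likelihoods))           # p(x)
posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]

best = max(range(len(posteriors)), key=lambda k: posteriors[k])
print(posteriors, "-> choose class", best)
```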

10 Bayes’ Classifier
Choose the class with the highest posterior: choose C_i if P(C_i | x) = max_k P(C_k | x)

11 Losses and Risks
Actions: α_i. Loss of α_i when the state is C_k: λ_ik.
Expected risk (Duda and Hart, 1973): R(α_i | x) = Σ_k λ_ik P(C_k | x)
Choose α_i if R(α_i | x) = min_k R(α_k | x)
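A small sketch of minimum-risk decision making with an explicit loss matrix; the posteriors and losses are illustrative, and a reject row is included to anticipate the reject slide below.

```python
# Minimum-risk decision: given posteriors P(C_k | x) and a loss matrix
# loss[i][k] = lambda_ik, choose the action with the smallest expected risk
# R(alpha_i | x) = sum_k lambda_ik P(C_k | x). Values are illustrative.
posteriors = [0.2, 0.8]               # P(C_0 | x), P(C_1 | x)

loss = [
    [0.0, 1.0],                       # action: choose C_0 (0/1 loss)
    [1.0, 0.0],                       # action: choose C_1 (0/1 loss)
    [0.3, 0.3],                       # action: reject, constant loss 0.3
]

risks = [sum(l * p for l, p in zip(row, posteriors)) for row in loss]
best = min(range(len(risks)), key=lambda i: risks[i])
print(risks, "-> take action", best)  # here: choose C_1 (risk 0.2 < 0.3)
```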

12 Losses and Risks: 0/1 Loss
λ_ik = 0 if i = k, 1 if i ≠ k
R(α_i | x) = Σ_{k ≠ i} P(C_k | x) = 1 − P(C_i | x)
For minimum risk, choose the most probable class

13 Losses and Risks: Reject
λ_ik = 0 if i = k, λ if i = K + 1 (reject), 1 otherwise, with 0 < λ < 1
R(α_{K+1} | x) = λ and R(α_i | x) = 1 − P(C_i | x)
Choose C_i if P(C_i | x) > P(C_k | x) for all k ≠ i and P(C_i | x) > 1 − λ; reject otherwise

14 Discriminant Functions
g_i(x), i = 1, ..., K: choose C_i if g_i(x) = max_k g_k(x)
g_i(x) can be −R(α_i | x), P(C_i | x), or p(x | C_i) P(C_i)
K decision regions R_1, ..., R_K, where R_i = {x : g_i(x) = max_k g_k(x)}

15 K = 2 Classes
Dichotomizer (K = 2) vs. polychotomizer (K > 2)
g(x) = g_1(x) − g_2(x): choose C_1 if g(x) > 0, C_2 otherwise
Log odds: log [P(C_1 | x) / P(C_2 | x)]

16 Utility Theory
Probability of state k given evidence x: P(S_k | x). Utility of α_i when the state is k: U_ik.
Expected utility: EU(α_i | x) = Σ_k U_ik P(S_k | x)
Choose α_i if EU(α_i | x) = max_j EU(α_j | x)

17 Value of Information
Expected utility using x only: EU(x) = max_i Σ_k U_ik P(S_k | x)
Expected utility using x and a new feature z: EU(x, z) = max_i Σ_k U_ik P(S_k | x, z)
z is useful if EU(x, z) > EU(x)
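A sketch of the expected-utility rule and the value-of-information test from the two slides above, for a single observed value of the extra feature z; all probabilities and utilities are invented for illustration.

```python
# Expected-utility decision making and the value-of-information test.
# utility[i][k] = U_ik: utility of action alpha_i when the true state is S_k.
utility = [[100.0, -20.0],        # act (e.g. grant the credit)
           [0.0,     0.0]]        # do not act

def expected_utility(posteriors):
    # EU(x) = max_i sum_k U_ik P(S_k | x)
    return max(sum(u * p for u, p in zip(row, posteriors)) for row in utility)

eu_x  = expected_utility([0.6, 0.4])   # posterior over states given x only
eu_xz = expected_utility([0.9, 0.1])   # posterior given x and an observed z

print(eu_x, eu_xz)
print("z is useful" if eu_xz > eu_x else "z adds nothing")
```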

18 Bayesian Networks
Also called belief networks or probabilistic networks, they are graphical models for representing the interactions between variables visually. A Bayesian network is composed of nodes and arcs between nodes. Each node corresponds to a random variable X and carries a probability value for that variable, P(X).

19 Bayesian Networks
If there is a directed arc from node X to node Y, this indicates that X has a direct influence on Y. This influence is specified by the conditional probability P(Y | X). The network is a directed acyclic graph (DAG); that is, there are no cycles.

20 Bayesian Networks
Also known as graphical models or probabilistic networks.
Nodes are hypotheses (random variables), and the probability of a node corresponds to our belief in the truth of the hypothesis.
Arcs are direct influences between hypotheses.
The structure is represented as a directed acyclic graph (DAG).
The parameters are the conditional probabilities on the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996).

21 Causes and Bayes’ Rule
Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?
P(R | W) = P(W | R) P(R) / P(W) = P(W | R) P(R) / [P(W | R) P(R) + P(W | ~R) P(~R)]
The arc from rain to wet grass is causal; reasoning in the opposite direction, from effect back to cause, is diagnostic.

22 Causal vs. Diagnostic Inference
Causal inference: if the sprinkler is on, what is the probability that the grass is wet?
P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S) = P(W | R, S) P(R) + P(W | ~R, S) P(~R) = 0.92

23 Diagnostic Inference
If the grass is wet, what is the probability that the sprinkler is on? P(S | W) = ?

24 P(S | W)
P(S | W) = P(W | S) P(S) / P(W) = 0.92 × 0.2 / 0.52 = 0.35
where P(W) = P(W | R, S) P(R, S) + P(W | ~R, S) P(~R, S) + P(W | R, ~S) P(R, ~S) + P(W | ~R, ~S) P(~R, ~S)
= P(W | R, S) P(R) P(S) + P(W | ~R, S) P(~R) P(S) + P(W | R, ~S) P(R) P(~S) + P(W | ~R, ~S) P(~R) P(~S) = 0.52

25 P(S | R, W)
P(S | R, W) = P(W | R, S) P(S | R) / P(W | R) = P(W | R, S) P(S) / P(W | R) = 0.21 < P(S | W) = 0.35
Explaining away: given that we know it rained, the probability that the sprinkler caused the wet grass decreases. Knowing that the grass is wet, rain and sprinkler become dependent.
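A brute-force check of the numbers in slides 22 to 25. The conditional probability tables are not in this transcript, so the values below are assumed; they are one set of CPTs consistent with the 0.92, 0.52, 0.35, and 0.21 quoted above.

```python
from itertools import product

# Enumeration-based inference in the Rain/Sprinkler/WetGrass network, using
# CPT values that reproduce the 0.92, 0.52, 0.35 and 0.21 quoted in the slides
# (the figure with the tables is not in the transcript, so they are assumed).
P_R = {True: 0.4, False: 0.6}                      # P(Rain)
P_S = {True: 0.2, False: 0.8}                      # P(Sprinkler)
P_W = {(True, True): 0.95, (True, False): 0.90,    # P(Wet | Rain, Sprinkler)
       (False, True): 0.90, (False, False): 0.10}

def joint(r, s, w):
    """P(R=r, S=s, W=w) from the factorization P(R) P(S) P(W | R, S)."""
    pw = P_W[(r, s)] if w else 1.0 - P_W[(r, s)]
    return P_R[r] * P_S[s] * pw

def prob(query, given=lambda r, s, w: True):
    """P(query | given) by enumerating all 8 joint configurations."""
    num = den = 0.0
    for r, s, w in product([True, False], repeat=3):
        if given(r, s, w):
            den += joint(r, s, w)
            if query(r, s, w):
                num += joint(r, s, w)
    return num / den

print(prob(lambda r, s, w: w, lambda r, s, w: s))          # P(W|S)   = 0.92
print(prob(lambda r, s, w: w))                             # P(W)     = 0.52
print(prob(lambda r, s, w: s, lambda r, s, w: w))          # P(S|W)   ≈ 0.35
print(prob(lambda r, s, w: s, lambda r, s, w: r and w))    # P(S|R,W) ≈ 0.21
```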

26 Bayesian Networks: Causes
Causal inference: P(W | C) = P(W | R, S) P(R, S | C) + P(W | ~R, S) P(~R, S | C) + P(W | R, ~S) P(R, ~S | C) + P(W | ~R, ~S) P(~R, ~S | C),
using the fact that P(R, S | C) = P(R | C) P(S | C).
Diagnostic inference: P(C | W) = ?

27 Bayesian Nets: Local Structure
P(F | C) = ?

28 Bayesian Networks: Inference
P(C, S, R, W, F) = P(C) P(S | C) P(R | C) P(W | R, S) P(F | R)
P(C, F) = Σ_S Σ_R Σ_W P(C, S, R, W, F)
P(F | C) = P(C, F) / P(C)
Not efficient! Belief propagation (Pearl, 1988); junction trees (Lauritzen and Spiegelhalter, 1988)
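A sketch of the inefficient enumeration the slide warns about: compute P(F | C) by summing the factored joint over S, R, and W. Every CPT value here is hypothetical; only the summation pattern is the point.

```python
from itertools import product

# Exact inference by enumeration for the factorization on the slide above:
# P(C,S,R,W,F) = P(C) P(S|C) P(R|C) P(W|R,S) P(F|R).
# Every CPT value is hypothetical; only the "sum out S, R, W" pattern matters.
P_C = {1: 0.5, 0: 0.5}                             # P(C = c)
P_S_given_C = {1: 0.1, 0: 0.5}                     # P(S = 1 | C = c)
P_R_given_C = {1: 0.8, 0: 0.1}                     # P(R = 1 | C = c)
P_W_given_RS = {(1, 1): 0.95, (1, 0): 0.90,
                (0, 1): 0.90, (0, 0): 0.10}        # P(W = 1 | R = r, S = s)
P_F_given_R = {1: 0.7, 0: 0.05}                    # P(F = 1 | R = r)

def bern(p_one, value):
    # P(X = value) for a binary variable with P(X = 1) = p_one
    return p_one if value == 1 else 1.0 - p_one

def joint(c, s, r, w, f):
    return (P_C[c]
            * bern(P_S_given_C[c], s)
            * bern(P_R_given_C[c], r)
            * bern(P_W_given_RS[(r, s)], w)
            * bern(P_F_given_R[r], f))

# P(C=1, F=1) = sum over S, R, W of the joint; then P(F=1|C=1) = P(C,F)/P(C).
p_cf = sum(joint(1, s, r, w, 1) for s, r, w in product([0, 1], repeat=3))
print(p_cf / P_C[1])    # cost grows exponentially with the summed-out variables
```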

29 Bayesian Networks: Classification
Bayes’ rule inverts the arc (diagnostic inference): P(C | x) = P(C) p(x | C) / p(x)

30 Naive Bayes’ Classifier
Given C, the x_j are independent: p(x | C) = p(x_1 | C) p(x_2 | C) ... p(x_d | C)
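A minimal naive Bayes classifier for binary features, multiplying the per-feature likelihoods p(x_j | C) as on the slide (in log space for numerical stability); the toy training set and query point are made up.

```python
import math

# Naive Bayes for binary features: p(x | C) is taken to be the product of the
# per-feature probabilities p(x_j | C). The (features, class) training data is
# made up purely for illustration.
train = [([1, 0, 1], 1), ([1, 1, 1], 1),
         ([0, 0, 1], 0), ([0, 1, 0], 0), ([0, 0, 0], 0)]

classes = {c for _, c in train}
prior, feat_prob = {}, {}
for c in classes:
    rows = [x for x, y in train if y == c]
    prior[c] = len(rows) / len(train)
    # Laplace-smoothed estimate of P(x_j = 1 | C = c) for each feature j
    feat_prob[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                    for j in range(len(rows[0]))]

def predict(x):
    # Choose the class maximizing log P(C) + sum_j log p(x_j | C); the log
    # keeps the product of many small likelihoods from underflowing.
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(p if xj == 1 else 1 - p)
            for p, xj in zip(feat_prob[c], x))
    return max(classes, key=score)

print(predict([1, 0, 0]))   # classify an unseen feature vector
```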

31 Influence Diagrams
An influence diagram extends a Bayesian network with three node types: chance nodes (random variables), decision nodes (actions we can take), and a utility node (the expected utility of the outcome).

32 Association Rules
Association rule: X → Y
Support(X → Y): P(X, Y) = #{customers who bought X and Y} / #{customers}
Confidence(X → Y): P(Y | X) = P(X, Y) / P(X) = #{customers who bought X and Y} / #{customers who bought X}
Apriori algorithm (Agrawal et al., 1996)
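A small sketch of computing the support and confidence of a rule X → Y from a transaction database; the transactions are made up for illustration.

```python
# Support and confidence of an association rule X -> Y, computed from a toy
# transaction database; the transactions are made up for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset, i.e. P(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"milk"}, {"bread"}
rule_support = support(X | Y)                   # P(X, Y)
rule_confidence = support(X | Y) / support(X)   # P(Y | X)
print(rule_support, rule_confidence)            # 0.6 and 0.75 for this toy data
```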

