CHAPTER 3: BAYESIAN DECISION THEORY. Making Decisions Under Uncertainty. Based on E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press (V1.1)

CHAPTER 3: BAYESIAN DECISION THEORY

Making Decisions Under Uncertainty
- Use Bayes' rule to calculate the probabilities of the classes
- Make rational decisions among multiple actions so as to minimize expected risk
- Learn association rules from data

Unobservable Variables
- Tossing a coin is, as far as we can observe, a completely random process; we cannot predict the outcome
- We can only talk about the probability that the outcome of the next toss will be heads or tails
- If we had access to extra knowledge (exact composition of the coin, initial position, applied force, etc.), the exact outcome of the toss could be predicted

Unobservable Variables
- The unobservable variables are the extra knowledge that we do not have access to
- Coin toss: the only observable variable is the outcome of the toss
- x = f(z), where z denotes the unobservable variables and x the observable ones
- f is a deterministic function

Bernoulli Random Variable
- The result of tossing a coin is in {Heads, Tails}
- Define a random variable X ∈ {1, 0}
- p_o is the probability of heads
- P(X = 1) = p_o and P(X = 0) = 1 − P(X = 1) = 1 − p_o
- Suppose we are asked to predict the next toss
- If we know p_o, we predict heads if p_o > 1/2, tails otherwise
- Choosing the more probable case minimizes the probability of error, which is 1 − p_o
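A minimal Python sketch of this decision rule, assuming a hypothetical p_o = 0.7: always predicting the more probable outcome gives an error rate close to 1 − p_o.

import random

p_o = 0.7                                  # assumed probability of heads (X = 1)
N = 100_000

prediction = 1 if p_o > 0.5 else 0         # always predict the more probable outcome

errors = 0
for _ in range(N):
    x = 1 if random.random() < p_o else 0  # simulate one toss
    if x != prediction:
        errors += 1

print(errors / N)                          # close to 1 - p_o = 0.3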

Estimation
- What if we do not know P(X)?
- We want to estimate it from the given data (sample) X; this is the realm of statistics
- The sample X is generated from the probability distribution of the observables x^t
- We want to build an approximator p̂(x) using the sample X
- In the coin-toss example: the sample is the outcomes of the past N tosses, and the distribution is characterized by the single parameter p_o

Parameter Estimation
- Sample: X = {x^t}, t = 1, ..., N, the outcomes of the past N tosses
- Estimate: p̂_o = (Σ_t x^t) / N = #{tosses with outcome heads} / #{tosses}
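A minimal sketch of this estimator, assuming the sample is given as a list of 0/1 outcomes:

def estimate_p(sample):
    """Estimate p_o as the fraction of tosses that came up heads (1)."""
    return sum(sample) / len(sample)

sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]   # hypothetical outcomes of 10 past tosses
print(estimate_p(sample))                 # 0.7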

Classification
- Credit scoring: two classes, high risk and low risk
- Decide based on observable information: income and savings
- We have reason to believe that these two variables give us an idea about the credibility of a customer
- Represent them by two random variables X_1 and X_2
- We cannot observe a customer's intentions and moral codes
- We can observe the credibility of past customers
- Credibility is a Bernoulli random variable C conditioned on the observables X = [X_1, X_2]^T
- Suppose we know P(C | X_1, X_2)

Classification
- Assume we know P(C | X_1, X_2)
- For a new application with X_1 = x_1 and X_2 = x_2: choose C = 1 (high risk) if P(C = 1 | x_1, x_2) > 0.5, and C = 0 (low risk) otherwise
- The probability of error is then 1 − max(P(C = 1 | x_1, x_2), P(C = 0 | x_1, x_2))

Classification
- Similar to the coin toss, but here C is conditioned on two observable variables x = [x_1, x_2]^T
- The problem: calculate P(C | x)
- Use Bayes' rule

Conditional Probability
- Probability of A (the point falls inside A) given that B happens (the point is inside B)
- P(A | B) = P(A ∩ B) / P(B)

Bayes' Rule
- P(A | B) = P(A ∩ B) / P(B)
- P(B | A) = P(A ∩ B) / P(A)  =>  P(A ∩ B) = P(B | A) P(A)
- Therefore P(A | B) = P(B | A) P(A) / P(B)

Bayes' Rule
P(C | x) = p(x | C) P(C) / p(x)   (posterior = likelihood × prior / evidence)
- Prior P(C = 1): the probability that a customer is high risk, regardless of x
- It is the knowledge we have about the value of C before looking at the observables x

Bayes' Rule
P(C | x) = p(x | C) P(C) / p(x)   (posterior = likelihood × prior / evidence)
- Likelihood p(x | C): the probability that an example belonging to class C has the observation value x
- p(x_1, x_2 | C = 1) is the probability that a high-risk customer has X_1 = x_1 and X_2 = x_2

Bayes' Rule
P(C | x) = p(x | C) P(C) / p(x)   (posterior = likelihood × prior / evidence)
- Evidence p(x): the probability that the observation x is seen at all, regardless of whether the example is positive or negative
- p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)

Bayes' Rule
posterior = likelihood × prior / evidence
P(C | x) = p(x | C) P(C) / p(x)
P(C = 0) + P(C = 1) = 1
p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0)
P(C = 0 | x) + P(C = 1 | x) = 1

Bayes' Rule for Classification
- Assume we know the prior, the likelihood, and the evidence
- We will learn how to estimate them from data later
- Plug them into Bayes' formula to obtain P(C | x)
- Choose C = 1 if P(C = 1 | x) > P(C = 0 | x), and C = 0 otherwise
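A sketch of this two-class decision rule for the credit-scoring example; the prior and likelihood values are made up for illustration.

# Hypothetical values; in practice they are estimated from data
prior = {1: 0.3, 0: 0.7}            # P(C = 1), P(C = 0)
likelihood = {1: 0.08, 0: 0.02}     # p(x | C = 1), p(x | C = 0) for one applicant's x

evidence = sum(likelihood[c] * prior[c] for c in (0, 1))               # p(x)
posterior = {c: likelihood[c] * prior[c] / evidence for c in (0, 1)}   # P(C | x)

decision = 1 if posterior[1] > posterior[0] else 0
print(posterior, decision)          # posterior[1] ≈ 0.63 -> classify as high risk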

Bayes' Rule: K > 2 Classes
P(C_i | x) = p(x | C_i) P(C_i) / p(x) = p(x | C_i) P(C_i) / Σ_k p(x | C_k) P(C_k)
with priors P(C_i) ≥ 0 and Σ_i P(C_i) = 1
Choose C_i if P(C_i | x) = max_k P(C_k | x)
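A sketch of the K-class rule with three classes; the priors and likelihood values are hypothetical.

priors      = [0.5, 0.3, 0.2]       # P(C_k)
likelihoods = [0.01, 0.05, 0.02]    # p(x | C_k) for a given x

evidence = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]

chosen = max(range(len(posteriors)), key=lambda k: posteriors[k])   # argmax_k P(C_k | x)
print(posteriors, chosen)           # class 1 has the largest posterior (0.625)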

Bayes' Rule
- When deciding on a specific input x, the evidence p(x) is the same for all classes
- So we do not need it to compare posteriors: choose C_i with the largest p(x | C_i) P(C_i)

Losses and Risks
- Decisions/errors are not equally good or costly
- Action α_i: assigning the input to class i
- Loss of taking action α_i when the true state is C_k: λ_ik
- Expected risk (Duda and Hart, 1973): R(α_i | x) = Σ_k λ_ik P(C_k | x)
- Choose the action with minimum expected risk: α_i such that R(α_i | x) = min_k R(α_k | x)
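A sketch of the minimum-risk decision with a hypothetical asymmetric loss matrix; the large loss for missing class 1 makes action 1 preferable even though class 0 is more probable.

posteriors = [0.6, 0.4]             # P(C_0 | x), P(C_1 | x)
loss = [[0.0, 10.0],                # λ_0k: choosing class 0 costs 10 if the truth is class 1
        [1.0,  0.0]]                # λ_1k: choosing class 1 costs 1 if the truth is class 0

risks = [sum(loss[i][k] * posteriors[k] for k in range(len(posteriors)))
         for i in range(len(loss))]                 # R(α_i | x)

best_action = min(range(len(risks)), key=lambda i: risks[i])
print(risks, best_action)           # risks = [4.0, 0.6] -> choose action 1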

Losses and Risks: 0/1 Loss
- λ_ik = 0 if i = k, and 1 otherwise: all errors are equally costly
- R(α_i | x) = Σ_{k ≠ i} P(C_k | x) = 1 − P(C_i | x)
- For minimum risk, choose the most probable class

Losses and Risks: Reject
- Add an extra action of reject (doubt), α_{K+1}, with loss 0 < λ < 1
- λ_ik = 0 if i = k, λ if i = K + 1, and 1 otherwise
- R(α_{K+1} | x) = λ and R(α_i | x) = 1 − P(C_i | x)
- Choose C_i if P(C_i | x) > P(C_k | x) for all k ≠ i and P(C_i | x) > 1 − λ; otherwise reject
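A sketch of the reject rule under 0/1 loss, with a hypothetical reject loss λ = 0.2:

def decide_with_reject(posteriors, reject_loss=0.2):
    """Return the chosen class index, or 'reject' if no posterior exceeds 1 - λ."""
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    if posteriors[best] > 1.0 - reject_loss:
        return best
    return "reject"

print(decide_with_reject([0.9, 0.1]))     # 0
print(decide_with_reject([0.55, 0.45]))   # 'reject': not confident enough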

Discriminant Functions
- Define a function g_i(x) for each class: the "goodness" of selecting class C_i given the observables x
- For example g_i(x) = −R(α_i | x); with 0/1 loss, g_i(x) = P(C_i | x) or g_i(x) = p(x | C_i) P(C_i)
- Choose C_i if g_i(x) = max_k g_k(x): the maximum discriminant corresponds to the minimum conditional risk

Decision Regions
The discriminants divide the feature space into K decision regions R_1, ..., R_K, where R_i = {x | g_i(x) = max_k g_k(x)}

K = 2 Classes
- Use a single discriminant g(x) = g_1(x) − g_2(x): choose C_1 if g(x) > 0, and C_2 otherwise
- Log odds: g(x) = log [P(C_1 | x) / P(C_2 | x)]; choose C_1 if the log odds are positive
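A sketch of the two-class log-odds discriminant, for a hypothetical posterior P(C_1 | x) = 0.8:

import math

def log_odds(p_c1_given_x):
    """g(x) = log [ P(C_1 | x) / P(C_2 | x) ] for a two-class problem."""
    return math.log(p_c1_given_x / (1.0 - p_c1_given_x))

p = 0.8                                   # hypothetical P(C_1 | x)
g = log_odds(p)
print(g, "C1" if g > 0 else "C2")         # g ≈ 1.39 > 0 -> choose C_1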

Correlation and Causality

Interaction Between Variables
- We have many variables that we want to represent and analyze
- We care about the structure of the dependencies among them
- Conditional and marginal probabilities
- Use a graphical representation
- Nodes (events) and arcs (dependencies)
- Numbers attached to nodes and arcs (conditional probabilities)

Causes and Bayes' Rule
Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?
P(R | W) = P(W | R) P(R) / P(W) = P(W | R) P(R) / [P(W | R) P(R) + P(W | ~R) P(~R)]
The causal knowledge P(W | R) is inverted to obtain the diagnostic probability P(R | W).

Causal vs Diagnostic Inference
Causal inference: if the sprinkler is on, what is the probability that the grass is wet?
P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S)
         = P(W | R, S) P(R) + P(W | ~R, S) P(~R)   (R and S are independent in this network)
         = 0.95 × 0.4 + 0.90 × 0.6 = 0.92

Causal vs Diagnostic Inference
Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on?
P(S | W) = 0.35 > P(S) = 0.2, so observing wet grass raises the probability that the sprinkler is on
P(S | R, W) = 0.21
Explaining away: knowing that it has rained decreases the probability that the sprinkler is on.
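A sketch that reproduces these figures by brute-force enumeration over the rain (R) / sprinkler (S) / wet-grass (W) network; the probability-table values below are assumptions chosen to be consistent with the 0.92, 0.35, and 0.21 quoted on the slides.

from itertools import product

# Assumed probability tables
P_R = {True: 0.4, False: 0.6}
P_S = {True: 0.2, False: 0.8}
P_W = {(True, True): 0.95, (True, False): 0.90,     # P(W = 1 | R, S)
       (False, True): 0.90, (False, False): 0.10}

def joint(r, s, w):
    pw = P_W[(r, s)]
    return P_R[r] * P_S[s] * (pw if w else 1.0 - pw)

def prob(query, **given):
    """P(query | given), by enumerating all assignments of (R, S, W)."""
    num = den = 0.0
    for r, s, w in product([True, False], repeat=3):
        world = {"R": r, "S": s, "W": w}
        if all(world[k] == v for k, v in given.items()):
            p = joint(r, s, w)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

print(round(prob({"W": True}, S=True), 2))           # 0.92  causal inference
print(round(prob({"S": True}, W=True), 2))           # 0.35  diagnostic inference
print(round(prob({"S": True}, R=True, W=True), 2))   # 0.21  explaining away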

Bayesian Networks: Causes

Bayesian Nets: Local Structure
- We only need to specify the interactions between neighboring (directly connected) events
- The full joint distribution factorizes into local conditional probabilities: P(X_1, ..., X_d) = Π_i P(X_i | parents(X_i))

Bayesian Networks: Classification
Diagnostic inference: P(C | x)
Bayes' rule inverts the arc: P(C | x) = p(x | C) P(C) / p(x)

Naive Bayes' Classifier
Given C, the inputs x_j are independent:
p(x | C) = p(x_1 | C) p(x_2 | C) ... p(x_d | C)
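A minimal sketch of a naive Bayes classifier with two binary features for the credit-scoring example; the priors and per-feature probabilities are made up for illustration.

prior = {0: 0.7, 1: 0.3}                  # P(C = 0) low risk, P(C = 1) high risk
feature_prob = {0: [0.6, 0.5],            # P(x_j = 1 | C = 0) for features income > Q1, savings > Q2
                1: [0.2, 0.1]}            # P(x_j = 1 | C = 1)

def posterior(x):
    """P(C | x) under the naive assumption p(x | C) = p(x_1 | C) p(x_2 | C)."""
    scores = {}
    for c in prior:
        p = prior[c]
        for j, xj in enumerate(x):
            pj = feature_prob[c][j]
            p *= pj if xj == 1 else 1.0 - pj    # multiply in p(x_j | C)
        scores[c] = p
    evidence = sum(scores.values())
    return {c: s / evidence for c, s in scores.items()}

print(posterior([1, 1]))   # high income and savings -> P(C = 0 | x) ≈ 0.97
print(posterior([0, 0]))   # low income and savings  -> P(C = 1 | x) ≈ 0.61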

Naïve Bayes' Classifier
- Why "naïve"? It ignores possible dependencies among the inputs
- Example: dependency between "baby food" and "diapers"
- A hidden variable ("have children") can explain such a dependency
- Insert a node and arcs for it and estimate their values from data

Association Rules
- Association rule: X → Y
- Support(X → Y): P(X, Y) = #{customers who bought X and Y} / #{customers}
- Confidence(X → Y): P(Y | X) = P(X, Y) / P(X) = #{customers who bought X and Y} / #{customers who bought X}
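A sketch that computes support and confidence from a small made-up list of market baskets:

transactions = [                # hypothetical shopping baskets
    {"chips", "beer"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
]

def support(x, y):
    """P(X, Y): fraction of baskets containing both items."""
    return sum(1 for t in transactions if x in t and y in t) / len(transactions)

def confidence(x, y):
    """P(Y | X): among baskets containing X, the fraction that also contain Y."""
    with_x = [t for t in transactions if x in t]
    return sum(1 for t in with_x if y in t) / len(with_x)

print(support("chips", "beer"))      # 0.25
print(confidence("chips", "beer"))   # 1.0, but based on a single basket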

Association Rules
- Suppose only one customer bought chips, and that same customer also bought beer
- Then P(Beer | Chips) = 1, but the support of the rule is tiny
- Support shows statistical significance; confidence alone is not enough