
1 Bayesian Decision Theory Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia, National Taiwan University

2 Basic Assumptions The decision problem is posed in probabilistic terms All of the relevant probability values are known

3 State of Nature State of nature ω – A priori probability (prior) P(ω) – Decision rule to judge just one fish: decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

4 Class-Conditional Probability Density

5 Bayes Formula P(ωj|x) = p(x|ωj)P(ωj)/p(x), where the evidence p(x) = Σj p(x|ωj)P(ωj)

6 Posterior Probabilities

7 Bayes Decision Rule Probability of error P(error|x) = min[P(ω1|x), P(ω2|x)] Bayes decision rule: decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2
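The rule above can be sketched in a few lines. This is a minimal illustration, not from the original slides; the function names and the example numbers are my own.

```python
def posterior(priors, likelihoods):
    """Bayes formula: P(w_j|x) = p(x|w_j) P(w_j) / p(x),
    with the evidence p(x) as the normalizing sum."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

def bayes_decide(priors, likelihoods):
    """Decide the class with the largest posterior; the probability of
    error at this x is one minus that maximum posterior."""
    post = posterior(priors, likelihoods)
    i = max(range(len(post)), key=post.__getitem__)
    return i, 1.0 - post[i]
```

For example, with priors (2/3, 1/3) and class-conditional densities evaluated at some x, `bayes_decide` returns the minimum-error decision and P(error|x).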

8 Bayes Decision Theory (1/3) Categories ω1, …, ωc Actions α1, …, αa Loss functions λ(αi|ωj) Feature vector x

9 Bayes Decision Theory (2/3) Bayes formula P(ωj|x) = p(x|ωj)P(ωj)/p(x) Conditional risk R(αi|x) = Σj λ(αi|ωj)P(ωj|x)

10 Bayes Decision Theory (3/3) Decision function α(x) assumes one of the values α1, …, αa Overall risk R = ∫ R(α(x)|x) p(x) dx Bayes decision rule: compute the conditional risk R(αi|x) for every action, then select the action for which it is minimum

11 Two-Category Classification Conditional risks R(α1|x) = λ11P(ω1|x) + λ12P(ω2|x) and R(α2|x) = λ21P(ω1|x) + λ22P(ω2|x) Decision rule: decide ω1 if (λ21 − λ11)p(x|ω1)P(ω1) > (λ12 − λ22)p(x|ω2)P(ω2) Likelihood ratio: decide ω1 if p(x|ω1)/p(x|ω2) > [(λ12 − λ22)/(λ21 − λ11)]·[P(ω2)/P(ω1)]
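The likelihood-ratio test can be written directly from the losses and priors. A small sketch (my own helper, not from the slides), with zero-one losses as defaults:

```python
def decide_omega1(px_w1, px_w2, prior1, prior2,
                  l11=0.0, l12=1.0, l21=1.0, l22=0.0):
    """Two-category rule: decide w1 when the likelihood ratio
    p(x|w1)/p(x|w2) exceeds the threshold fixed by losses and priors:
    [(l12 - l22)/(l21 - l11)] * [P(w2)/P(w1)]."""
    ratio = px_w1 / px_w2
    threshold = (l12 - l22) / (l21 - l11) * (prior2 / prior1)
    return ratio > threshold
```

With zero-one losses the threshold reduces to P(ω2)/P(ω1), so a larger prior on ω2 demands stronger evidence before deciding ω1.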

12 Minimum-Error-Rate Classification If action αi is taken and the true state is ωj, then the decision is correct if i = j and in error if i ≠ j Error rate (the probability of error) is to be minimized Symmetrical or zero-one loss function: λ(αi|ωj) = 0 if i = j, 1 if i ≠ j Conditional risk R(αi|x) = 1 − P(ωi|x)

13 Minimum-Error-Rate Classification

14 Mini-max Criterion To perform well over a range of prior probabilities Minimize the maximum possible overall risk –So that the worst risk for any value of the priors is as small as possible

15 Mini-maximizing Risk

16 Searching for Mini-max Boundary

17 Neyman-Pearson Criterion Minimize the overall risk subject to a constraint Example –Minimize the total risk subject to a fixed bound on one type of error (e.g., a fixed false-alarm rate)

18 Discriminant Functions A classifier assigns x to class ωi if gi(x) > gj(x) for all j ≠ i, where gi(x), i = 1, …, c, are called discriminant functions A discriminant function for a Bayes classifier: gi(x) = −R(αi|x) Two discriminant functions for minimum-error-rate classification: gi(x) = P(ωi|x), or gi(x) = ln p(x|ωi) + ln P(ωi)
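The log form gi(x) = ln p(x|ωi) + ln P(ωi) is convenient to implement. A minimal sketch for univariate Gaussian class-conditional densities (the means, variances, and priors below are illustrative):

```python
import math

def gaussian_log_pdf(x, mu, sigma):
    # ln of the univariate normal density N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def classify(x, mus, sigmas, priors):
    """Minimum-error-rate discriminants g_i(x) = ln p(x|w_i) + ln P(w_i);
    assign x to the class with the largest g_i."""
    g = [gaussian_log_pdf(x, m, s) + math.log(p)
         for m, s, p in zip(mus, sigmas, priors)]
    return max(range(len(g)), key=g.__getitem__)
```

Because ln is monotonic, maximizing gi(x) in this form gives the same decision as maximizing the posterior P(ωi|x).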

19 Discriminant Functions

20 Two-Dimensional Two-Category Classifier

21 Dichotomizers Place a pattern in one of only two categories –cf. Polychotomizers More common to define a single discriminant function g(x) = g1(x) − g2(x) and decide ω1 if g(x) > 0 Some particular forms: g(x) = P(ω1|x) − P(ω2|x), or g(x) = ln[p(x|ω1)/p(x|ω2)] + ln[P(ω1)/P(ω2)]

22 Univariate Normal PDF p(x) = [1/(√(2π)σ)] exp[−(x − μ)²/(2σ²)]

23 Distribution with Maximum Entropy and Central Limit Theorem Entropy for a discrete distribution: H = −Σx P(x) log2 P(x) Entropy for a continuous distribution: H = −∫ p(x) ln p(x) dx Central limit theorem –The aggregate effect of the sum of a large number of small, independent random disturbances will lead to a Gaussian distribution
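The discrete entropy is easy to compute directly; a small sketch (my own helper, using bits, i.e. log base 2):

```python
import math

def entropy_discrete(probs):
    """H = -sum_x P(x) log2 P(x), in bits; terms with P(x) = 0
    contribute nothing (the 0 log 0 = 0 convention)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A uniform distribution over 2^k outcomes gives exactly k bits, the maximum possible for that support; a point mass gives zero.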

24 Multivariate Normal PDF p(x) = [(2π)^(d/2)|Σ|^(1/2)]⁻¹ exp[−½(x − μ)ᵗΣ⁻¹(x − μ)] μ: d-component mean vector Σ: d-by-d covariance matrix
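The density can be evaluated directly from the mean vector and covariance matrix. A minimal sketch (my own helper; in practice a library routine such as `scipy.stats.multivariate_normal` would be used):

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """p(x) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp(-1/2 (x-mu)^T Sigma^-1 (x-mu)).
    Solves Sigma u = (x - mu) rather than forming the inverse explicitly."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)   # squared Mahalanobis distance
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(sigma) ** -0.5
    return norm * np.exp(-0.5 * quad)
```

For d = 1 with unit variance this reduces to the univariate normal, and at the mean it equals the normalization constant.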

25 Linear Combination of Gaussian Random Variables

26 Whitening Transform Aw = ΦΛ^(−1/2) Φ: matrix whose columns are the orthonormal eigenvectors of Σ Λ: diagonal matrix of the corresponding eigenvalues Whitening transform: AwᵗΣAw = I
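The transform Aw = ΦΛ^(−1/2) follows directly from the eigendecomposition of Σ. A minimal sketch, assuming Σ is symmetric positive definite (so `eigh` applies and the eigenvalues are positive):

```python
import numpy as np

def whitening_transform(sigma):
    """A_w = Phi Lambda^(-1/2): Phi holds the orthonormal eigenvectors
    of Sigma, Lambda the corresponding eigenvalues.  Applying A_w makes
    the transformed covariance the identity: A_w^T Sigma A_w = I."""
    eigvals, phi = np.linalg.eigh(sigma)      # Sigma = Phi Lambda Phi^T
    return phi @ np.diag(eigvals ** -0.5)
```

After whitening, the hyperellipsoidal support region of the Gaussian becomes spherical, which is what makes the later σ²I case so simple.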

27 Bivariate Gaussian PDF

28 Mahalanobis Distance Squared Mahalanobis distance r² = (x − μ)ᵗΣ⁻¹(x − μ) Volume of the hyperellipsoids of constant Mahalanobis distance r
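The squared distance is a single quadratic form. A small sketch (my own helper), which reduces to squared Euclidean distance when Σ = I:

```python
import numpy as np

def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance r^2 = (x-mu)^T Sigma^-1 (x-mu);
    uses a linear solve instead of an explicit matrix inverse."""
    diff = x - mu
    return float(diff @ np.linalg.solve(sigma, diff))
```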

29 Discriminant Functions for Normal Density

30 Case 1: Σi = σ²I

31 Decision Boundaries

32 Decision Boundaries when P(ωi) = P(ωj)

33 Decision Boundaries when P(ωi) and P(ωj) are unequal

34 Case 2: Σi = Σ

35 Decision Boundaries

36 Decision Boundaries

37 Case 3: Σi = arbitrary

38 Decision Boundaries for One-Dimensional Case

39 Decision Boundaries for Two-Dimensional Case

40 Decision Boundaries for Three-Dimensional Case (1/2)

41 Decision Boundaries for Three-Dimensional Case (2/2)

42 Decision Boundaries for Four Normal Distributions

43 Example: Decision Regions for Two-Dimensional Gaussian Data

44 Example: Decision Regions for Two-Dimensional Gaussian Data

45 Bayes Decision Compared with Other Decision Strategies

46 Multicategory Case Probability of being correct P(correct) = Σi ∫Ri p(x|ωi)P(ωi) dx Bayes classifier maximizes this probability by choosing the regions Ri so that the integrand is maximal for all x –No other partitioning can yield a smaller probability of error

47 Error Bounds for Normal Densities Full calculation of the error probability is difficult for the Gaussian case –Especially in high dimensions –Discontinuous nature of the decision regions Upper bound on the error can be obtained for two-category case –By approximating the error integral analytically

48 Chernoff Bound P(error) ≤ P(ω1)^β P(ω2)^(1−β) ∫ p(x|ω1)^β p(x|ω2)^(1−β) dx, for 0 ≤ β ≤ 1

49 Bhattacharyya Bound P(error) ≤ √(P(ω1)P(ω2)) e^(−k(1/2)): the Chernoff bound evaluated at β = 1/2
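For two Gaussian classes the exponent has the standard closed form k(1/2) = (1/8)(μ2−μ1)ᵗ[(Σ1+Σ2)/2]⁻¹(μ2−μ1) + (1/2) ln(|(Σ1+Σ2)/2| / √(|Σ1||Σ2|)). A sketch of the bound (my own helper, following that formula):

```python
import numpy as np

def bhattacharyya_bound(mu1, sigma1, mu2, sigma2, p1, p2):
    """P(error) <= sqrt(P1 P2) exp(-k(1/2)) for two Gaussian classes."""
    avg = 0.5 * (sigma1 + sigma2)
    diff = mu2 - mu1
    k = (0.125 * diff @ np.linalg.solve(avg, diff)
         + 0.5 * np.log(np.linalg.det(avg)
                        / np.sqrt(np.linalg.det(sigma1) * np.linalg.det(sigma2))))
    return np.sqrt(p1 * p2) * np.exp(-k)
```

When the two densities coincide, k(1/2) = 0 and the bound is √(P1P2) (0.5 for equal priors); the bound shrinks as the means separate.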

50 Chernoff Bound and Bhattacharyya Bound

51 Example: Error Bounds for Gaussian Distribution

52 Example: Error Bounds for Gaussian Distribution Bhattacharyya bound –k(1/2) = –P(error) < Chernoff bound –by numerical searching Error rate by numerical integration –Impractical for higher dimensions

53 Signal Detection Theory Internal signal in the detector x –Has mean μ2 when the external signal (pulse) is present –Has mean μ1 when the external signal is not present –p(x|ωi) ~ N(μi, σ²)

54 Signal Detection Theory

55 Four Probabilities Hit: P(x > x*|x ∈ ω2) False alarm: P(x > x*|x ∈ ω1) Miss: P(x < x*|x ∈ ω2) Correct reject: P(x < x*|x ∈ ω1)
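Under the Gaussian model for x, each of the four probabilities is a normal tail area at the threshold x*. A minimal sketch using the complementary error function (the example numbers in the test are illustrative):

```python
import math

def gauss_tail(x_star, mu, sigma):
    """P(x > x*) for x ~ N(mu, sigma^2), via the error function."""
    return 0.5 * math.erfc((x_star - mu) / (sigma * math.sqrt(2)))

def detection_probs(x_star, mu1, mu2, sigma):
    """Returns (hit, false alarm, miss, correct reject) at threshold x*."""
    hit = gauss_tail(x_star, mu2, sigma)           # P(x > x* | w2)
    false_alarm = gauss_tail(x_star, mu1, sigma)   # P(x > x* | w1)
    return hit, false_alarm, 1 - hit, 1 - false_alarm
```

Sweeping x* and plotting hit against false alarm traces out the ROC curve of the next slide.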

56 Receiver Operating Characteristic (ROC)

57 Bayes Decision Theory: Discrete Features

58 Independent Binary Features

59 Discriminant Function
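For independent binary features, the discriminant is linear in x: g(x) = Σi wi xi + w0 with wi = ln[pi(1−qi)/(qi(1−pi))] and w0 = Σi ln[(1−pi)/(1−qi)] + ln[P(ω1)/P(ω2)], where pi = P(xi=1|ω1) and qi = P(xi=1|ω2). A sketch of that form (the sample values pi = 0.8, qi = 0.5 in the test are illustrative):

```python
import math

def binary_discriminant(x, p, q, prior1, prior2):
    """Linear discriminant for independent binary features:
    g(x) = sum_i w_i x_i + w0; decide w1 when g(x) > 0."""
    w = [math.log(pi * (1 - qi) / (qi * (1 - pi))) for pi, qi in zip(p, q)]
    w0 = (sum(math.log((1 - pi) / (1 - qi)) for pi, qi in zip(p, q))
          + math.log(prior1 / prior2))
    return sum(wi * xi for wi, xi in zip(w, x)) + w0
```

Each weight wi measures how strongly feature i favors ω1: it is positive when pi > qi and zero when the feature is uninformative.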

60 Example: Three-Dimensional Binary Data

61 Example: Three-Dimensional Binary Data

62 Illustration of Missing Features

63 Decision with Missing Features

64 Noisy Features

65 Example of Statistical Dependence and Independence

66 Example of Causal Dependence State of an automobile –Temperature of engine –Pressure of brake fluid –Pressure of air in the tires –Voltages in the wires –Oil temperature –Coolant temperature –Speed of the radiator fan

67 Bayesian Belief Nets (Causal Networks)

68 Example: Belief Network for Fish

69 Simple Belief Network 1

70 Simple Belief Network 2

71 Use of Bayes Belief Nets Seek to determine some particular configuration of other variables –Given the values of some of the variables (evidence) Determine values of several query variables (x) given the evidence of all other variables (e)

72 Example

73 Example

74 Naïve Bayes’ Rule (Idiot Bayes’ Rule) When the dependency relationships among the features are unknown, we generally take the simplest assumption –Features are conditionally independent given the category –Often works quite well
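Under the conditional-independence assumption the posterior factors as P(ωk|x) ∝ P(ωk) Πi P(xi|ωk). A minimal sketch for binary features (the helper name and the probability table in the test are my own, for illustration):

```python
import math

def naive_bayes_classify(x, cond_probs, priors):
    """Naive Bayes for binary features: score each class by
    ln P(w_k) + sum_i ln P(x_i|w_k), assuming conditional independence,
    and return the index of the best-scoring class.
    cond_probs[k][i] = P(x_i = 1 | w_k)."""
    scores = []
    for k, prior in enumerate(priors):
        s = math.log(prior)
        for i, xi in enumerate(x):
            p = cond_probs[k][i]
            s += math.log(p if xi else 1 - p)
        scores.append(s)
    return max(range(len(scores)), key=scores.__getitem__)
```

Working in log space avoids underflow when many features are multiplied together.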

75 Applications in Medical Diagnosis Uppermost nodes represent a fundamental biological agent –Such as the presence of a virus or bacteria Intermediate nodes describe disease –Such as flu or emphysema Lowermost nodes describe the symptoms –Such as high temperature or coughing A physician enters measured values into the net and finds the most likely disease or cause

76 Compound Bayesian Decision