1 Bayesian Decision Theory Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia, National Taiwan University
2 Basic Assumptions The decision problem is posed in probabilistic terms All of the relevant probability values are known
3 State of Nature State of nature ω – A priori probability (prior) P(ωi) – Decision rule to judge just one fish: decide ω1 if P(ω1) > P(ω2); otherwise decide ω2
4 Class-Conditional Probability Density
5 Bayes Formula P(ωj|x) = p(x|ωj)P(ωj) / p(x), where the evidence p(x) = Σj p(x|ωj)P(ωj)
6 Posterior Probabilities
7 Bayes Decision Rule Probability of error P(error|x) = min[P(ω1|x), P(ω2|x)] Bayes decision rule: decide ω1 if P(ω1|x) > P(ω2|x); otherwise decide ω2
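The posterior computation and decision rule on this slide can be sketched in a few lines of Python. The function and argument names below are illustrative, not from the slides; the likelihoods are assumed to be the class-conditional densities already evaluated at the observed x.

```python
def bayes_decide(priors, likelihoods):
    """Pick the class with the largest posterior P(w_j | x).

    priors[j]      -- P(w_j)
    likelihoods[j] -- p(x | w_j) evaluated at the observed x
    """
    # evidence p(x) = sum_j p(x | w_j) P(w_j)
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    # posteriors by Bayes formula
    posteriors = [p * l / evidence for p, l in zip(priors, likelihoods)]
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    return best, posteriors
```

For equal priors the decision reduces to picking the larger likelihood, as the slide's two-category rule suggests.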
8 Bayes Decision Theory (1/3) Categories ω1, …, ωc Actions α1, …, αa Loss functions λ(αi|ωj) Feature vector x
9 Bayes Decision Theory (2/3) Bayes formula P(ωj|x) = p(x|ωj)P(ωj)/p(x) Conditional risk R(αi|x) = Σj λ(αi|ωj) P(ωj|x)
10 Bayes Decision Theory (3/3) Decision function α(x) assumes one of the values α1, …, αa Overall risk R = ∫ R(α(x)|x) p(x) dx Bayes decision rule: compute the conditional risk R(αi|x), then select the action αi for which R(αi|x) is minimum
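The Bayes decision rule above, minimizing the conditional risk, is a one-liner once the posteriors are known. A minimal sketch with illustrative names:

```python
def min_risk_action(loss, posteriors):
    """Conditional risk R(a_i | x) = sum_j loss[i][j] * P(w_j | x);
    return the index of the action that minimizes it.

    loss[i][j] -- lambda(a_i | w_j), loss for action i when truth is w_j
    """
    risks = [sum(l * p for l, p in zip(row, posteriors)) for row in loss]
    best = min(range(len(risks)), key=lambda i: risks[i])
    return best, risks
```

With the zero-one loss this reduces to choosing the class with the largest posterior.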
11 Two-Category Classification Conditional risk R(αi|x) = λi1 P(ω1|x) + λi2 P(ω2|x) Decision rule: decide ω1 if (λ21 − λ11) P(ω1|x) > (λ12 − λ22) P(ω2|x) Likelihood ratio: decide ω1 if p(x|ω1)/p(x|ω2) > [(λ12 − λ22)/(λ21 − λ11)] · P(ω2)/P(ω1)
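The likelihood-ratio form of the two-category rule can be sketched directly; names are illustrative, and the likelihood values are assumed precomputed at the observed x.

```python
def two_category_decide(lx1, lx2, p1, p2, loss):
    """Decide w1 when p(x|w1)/p(x|w2) exceeds the threshold
    (l12 - l22)/(l21 - l11) * P(w2)/P(w1).

    lx1, lx2   -- p(x|w1), p(x|w2) at the observed x
    p1, p2     -- priors P(w1), P(w2)
    loss[i][j] -- loss for deciding w_{i+1} when the truth is w_{j+1}
    """
    threshold = (loss[0][1] - loss[1][1]) / (loss[1][0] - loss[0][0]) * (p2 / p1)
    return 1 if lx1 / lx2 > threshold else 2
```

With zero-one loss and equal priors the threshold is 1, i.e. pick the larger likelihood.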
12 Minimum-Error-Rate Classification If action αi is taken and the true state is ωj, then the decision is correct if i = j and in error if i ≠ j Error rate (the probability of error) is to be minimized Symmetrical or zero-one loss function: λ(αi|ωj) = 0 if i = j, 1 if i ≠ j Conditional risk R(αi|x) = 1 − P(ωi|x)
13 Minimum-Error-Rate Classification
14 Mini-max Criterion To perform well over a range of prior probabilities Minimize the maximum possible overall risk – So that the worst risk for any value of the priors is as small as possible
15 Mini-maximizing Risk
16 Searching for Mini-max Boundary
17 Neyman-Pearson Criterion Minimize the overall risk subject to a constraint Example – Minimize the total risk subject to a fixed bound on the risk for one of the classes
18 Discriminant Functions A classifier assigns x to class ωi if gi(x) > gj(x) for all j ≠ i, where gi(x), i = 1, …, c, are called discriminant functions A discriminant function for a Bayes classifier: gi(x) = −R(αi|x) Two discriminant functions for minimum-error-rate classification: gi(x) = P(ωi|x) and gi(x) = ln p(x|ωi) + ln P(ωi)
19 Discriminant Functions
20 Two-Dimensional Two-Category Classifier
21 Dichotomizers Place a pattern in one of only two categories – cf. Polychotomizers More common to define a single discriminant function g(x) ≡ g1(x) − g2(x); decide ω1 if g(x) > 0 Some particular forms: g(x) = P(ω1|x) − P(ω2|x) and g(x) = ln[p(x|ω1)/p(x|ω2)] + ln[P(ω1)/P(ω2)]
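The log-ratio form of the single discriminant can be sketched as follows; names are illustrative, and the likelihoods are assumed evaluated at the observed x.

```python
import math

def dichotomizer(lx1, lx2, p1, p2):
    """g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)].
    Decide w1 if g(x) > 0, otherwise w2."""
    return math.log(lx1 / lx2) + math.log(p1 / p2)
```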
22 Univariate Normal PDF
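The univariate Gaussian density referenced on this slide is easy to write out explicitly (a standard formula, shown here as a small helper):

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) evaluated at x:
    (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```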
23 Distribution with Maximum Entropy and Central Limit Theorem Entropy for discrete distribution Entropy for continuous distribution Central limit theorem – The aggregate effect of the sum of a large number of small, independent random disturbances leads to a Gaussian distribution
24 Multivariate Normal PDF μ: d-component mean vector Σ: d-by-d covariance matrix
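For the bivariate case (d = 2) the density can be written out in pure Python, which makes the roles of the mean vector and covariance matrix explicit. This is a toy sketch, not from the slides; it also exposes the squared Mahalanobis distance used in slide 28.

```python
import math

def mvn_pdf_2d(x, mu, cov):
    """Bivariate Gaussian density at x, for mean mu and 2x2 covariance cov."""
    dx = [x[0] - mu[0], x[1] - mu[1]]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    # inverse of the 2x2 covariance matrix
    inv = [[cov[1][1] / det, -cov[0][1] / det],
           [-cov[1][0] / det, cov[0][0] / det]]
    # squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)
    r2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * r2) / (2.0 * math.pi * math.sqrt(det))
```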
25 Linear Combination of Gaussian Random Variables
26 Whitening Transform Φ: matrix whose columns are the orthonormal eigenvectors of Σ Λ: diagonal matrix of the corresponding eigenvalues Whitening transform: Aw = ΦΛ^(−1/2)
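For a 2x2 symmetric covariance the eigendecomposition has a closed form, so the whitening transform Aw = ΦΛ^(−1/2) can be sketched without a linear-algebra library. This is an illustrative toy for d = 2 only.

```python
import math

def whitening_2x2(cov):
    """Return A_w = Phi * Lambda^{-1/2} for a 2x2 symmetric covariance,
    so that A_w^T * cov * A_w = I (identity)."""
    a, b, d = cov[0][0], cov[0][1], cov[1][1]
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr / 4.0 - det)
    l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc     # eigenvalues, l1 >= l2
    if abs(b) > 1e-12:
        v1, v2 = (l1 - d, b), (l2 - d, b)         # unnormalized eigenvectors
    elif a >= d:
        v1, v2 = (1.0, 0.0), (0.0, 1.0)           # already diagonal
    else:
        v1, v2 = (0.0, 1.0), (1.0, 0.0)
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    phi = [[v1[0] / n1, v2[0] / n2],
           [v1[1] / n1, v2[1] / n2]]              # columns = eigenvectors
    return [[phi[0][0] / math.sqrt(l1), phi[0][1] / math.sqrt(l2)],
            [phi[1][0] / math.sqrt(l1), phi[1][1] / math.sqrt(l2)]]
```

Applying Aw to data drawn from N(μ, Σ) yields unit covariance, which is what "whitening" means.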
27 Bivariate Gaussian PDF
28 Mahalanobis Distance Squared Mahalanobis distance r² = (x − μ)ᵀ Σ⁻¹ (x − μ) Volume of the hyperellipsoids of constant Mahalanobis distance r
29 Discriminant Functions for Normal Density
30 Case 1: Σi = σ²I
31 Decision Boundaries
32 Decision Boundaries when P(ωi) = P(ωj)
33 Decision Boundaries when P(ωi) and P(ωj) are unequal
34 Case 2: Σi = Σ
35 Decision Boundaries
36 Decision Boundaries
37 Case 3: Σi = arbitrary
38 Decision Boundaries for One-Dimensional Case
39 Decision Boundaries for Two-Dimensional Case
40 Decision Boundaries for Three-Dimensional Case (1/2)
41 Decision Boundaries for Three-Dimensional Case (2/2)
42 Decision Boundaries for Four Normal Distributions
43 Example: Decision Regions for Two-Dimensional Gaussian Data
44 Example: Decision Regions for Two-Dimensional Gaussian Data
45 Bayes Decision Compared with Other Decision Strategies
46 Multicategory Case Probability of being correct Bayes classifier maximizes this probability by choosing the regions so that the integrand is maximal for all x –No other partitioning can yield a smaller probability of error
47 Error Bounds for Normal Densities Full calculation of the error probability is difficult for the Gaussian case –Especially in high dimensions –Discontinuous nature of the decision regions Upper bound on the error can be obtained for two-category case –By approximating the error integral analytically
48 Chernoff Bound
49 Bhattacharyya Bound
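For two univariate Gaussians the Bhattacharyya bound has a closed form, sketched below with illustrative names. The Chernoff bound would replace the fixed β = 1/2 with a numerical search over β, as slides 50–52 discuss.

```python
import math

def bhattacharyya_bound(mu1, var1, mu2, var2, p1, p2):
    """Upper bound on P(error) for two univariate Gaussians:
    k(1/2) = (mu2-mu1)^2 / (4*(var1+var2))
             + 0.5 * ln( ((var1+var2)/2) / sqrt(var1*var2) )
    P(error) <= sqrt(P(w1) * P(w2)) * exp(-k(1/2))."""
    vbar = (var1 + var2) / 2.0
    k = (mu2 - mu1) ** 2 / (8.0 * vbar) \
        + 0.5 * math.log(vbar / math.sqrt(var1 * var2))
    return math.sqrt(p1 * p2) * math.exp(-k)
```

Identical class densities give the trivial bound 1/2 with equal priors; well-separated means drive the bound toward zero.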
50 Chernoff Bound and Bhattacharyya Bound
51 Example: Error Bounds for Gaussian Distribution
52 Example: Error Bounds for Gaussian Distribution Bhattacharyya bound – k(1/2) and the resulting bound on P(error) Chernoff bound – found by numerical search Error rate by numerical integration – Impractical for higher dimensions
53 Signal Detection Theory Internal signal in the detector x – Has mean μ2 when the external signal (pulse) is present – Has mean μ1 when the external signal is not present – p(x|ωi) ~ N(μi, σ²)
54 Signal Detection Theory
55 Four Probabilities Hit: P(x > x* | x ∈ ω2) False alarm: P(x > x* | x ∈ ω1) Miss: P(x < x* | x ∈ ω2) Correct reject: P(x < x* | x ∈ ω1)
56 Receiver Operating Characteristic (ROC)
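One point on the ROC curve is a (false-alarm, hit) pair for a given threshold x*; sweeping x* traces the whole curve, and the discriminability d' = |μ2 − μ1|/σ controls how far the curve bows above the diagonal. A small sketch with illustrative names, using Gaussian tail probabilities via the error function:

```python
import math

def roc_point(x_star, mu1, mu2, sigma):
    """(false alarm, hit) for threshold x* when p(x|w_i) ~ N(mu_i, sigma^2)."""
    def tail(mu):  # P(x > x* | mean mu), Gaussian tail via erf
        return 0.5 * (1.0 - math.erf((x_star - mu) / (sigma * math.sqrt(2.0))))
    return tail(mu1), tail(mu2)
```

When x* sits midway between the two means, symmetry gives hit = 1 − false alarm.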
57 Bayes Decision Theory: Discrete Features
58 Independent Binary Features
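For independent binary features the minimum-error discriminant is linear in the features, g(x) = Σi wi xi + w0. A sketch with illustrative names, where p[i] and q[i] denote P(xi = 1 | ω1) and P(xi = 1 | ω2):

```python
import math

def binary_discriminant(x, p, q, prior1, prior2):
    """Linear discriminant for independent binary features:
    w_i = ln[ p_i (1 - q_i) / (q_i (1 - p_i)) ],
    w_0 = sum_i ln[(1 - p_i)/(1 - q_i)] + ln[P(w1)/P(w2)].
    Decide w1 if g(x) > 0."""
    g = math.log(prior1 / prior2)
    for xi, pi, qi in zip(x, p, q):
        g += xi * math.log(pi * (1.0 - qi) / (qi * (1.0 - pi)))
        g += math.log((1.0 - pi) / (1.0 - qi))
    return g
```

A feature with pi = qi carries no information and contributes weight zero.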
59 Discriminant Function
60 Example: Three-Dimensional Binary Data
61 Example: Three-Dimensional Binary Data
62 Illustration of Missing Features
63 Decision with Missing Features
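When a feature is missing, the Bayes approach marginalizes it out of the joint density before normalizing over classes. A toy discrete sketch (the table layout and names are illustrative, not from the slides):

```python
def posterior_missing(joint, x_good):
    """Posteriors P(w_c | x_good) when feature x2 is missing.

    joint[c][x1][x2] -- joint probability P(w_c, x1, x2) over two
                        discrete features (toy table)
    x_good           -- the observed value of x1; x2 is marginalized out
    """
    scores = [sum(joint[c][x_good]) for c in range(len(joint))]
    total = sum(scores)
    return [s / total for s in scores]
```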
64 Noisy Features
65 Example of Statistical Dependence and Independence
66 Example of Causal Dependence State of an automobile – Temperature of engine – Pressure of brake fluid – Pressure of air in the tires – Voltages in the wires – Oil temperature – Coolant temperature – Speed of the radiator fan
67 Bayesian Belief Nets (Causal Networks)
68 Example: Belief Network for Fish
69 Simple Belief Network 1
70 Simple Belief Network 2
71 Use of Bayes Belief Nets Seek to determine some particular configuration of other variables – Given the values of some of the variables (evidence) Determine values of several query variables (x) given the evidence of all other variables (e)
72 Example
73 Example
74 Naïve Bayes’ Rule (Idiot Bayes’ Rule) When the dependency relationships among the features are unknown, we generally make the simplest assumption – Features are conditionally independent given the category – Often works quite well
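The conditional-independence assumption turns the class posterior into a product of per-feature terms. A minimal sketch over discrete features (table layout and names are illustrative):

```python
def naive_bayes_posteriors(priors, cond, x):
    """Posteriors under the naive Bayes assumption.

    priors[c]     -- P(w_c)
    cond[c][i][v] -- P(x_i = v | w_c), features conditionally
                     independent given the class
    x             -- observed discrete feature values
    """
    scores = []
    for c, prior in enumerate(priors):
        s = prior
        for i, v in enumerate(x):
            s *= cond[c][i][v]   # multiply per-feature likelihoods
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]
```

In practice the products are usually computed as sums of logs to avoid underflow with many features.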
75 Applications in Medical Diagnosis Uppermost nodes represent a fundamental biological agent –Such as the presence of a virus or bacteria Intermediate nodes describe disease –Such as flu or emphysema Lowermost nodes describe the symptoms –Such as high temperature or coughing A physician enters measured values into the net and finds the most likely disease or cause
76 Compound Bayesian Decision