1 Bayesian Classification

2 Bayesian Classification: Why?
A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
It is based on Bayes theorem.
A simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable to decision tree and selected neural network classifiers.
Even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

3 Bayes Theorem
Let X be the data record (case) whose class label is unknown. Let H be some hypothesis, such as "data record X belongs to a specified class C."
For classification, we want to determine P(H|X), the probability that the hypothesis H holds given the observed data record X. P(H|X) is the posterior probability of H conditioned on X. For example, it is the probability that a fruit is an apple, given that it is red and round.
In contrast, P(H) is the prior probability, or a priori probability, of H. In this example, P(H) is the probability that any given data record is an apple, regardless of how the data record looks.
The posterior probability P(H|X) is based on more information (such as background knowledge) than the prior probability P(H), which is independent of X.

4 Bayesian Classification: Simple introduction…
Similarly, P(X|H) is the posterior probability of X conditioned on H. That is, it is the probability that X is red and round given that we know X is an apple.
P(X) is the prior probability of X, i.e., the probability that a data record from our set of fruits is red and round.
Bayes theorem is useful in that it provides a way of calculating the posterior probability P(H|X) from P(H), P(X), and P(X|H):
P(H|X) = P(X|H) P(H) / P(X)
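This formula translates directly into code. Below is a minimal sketch; the prior, likelihood, and evidence numbers for the apple example are made-up illustrations, not values from the slides:

```python
def posterior(prior, likelihood, evidence):
    """Bayes theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood * prior / evidence

# Hypothetical numbers for the apple example (assumptions, not from the slides):
# P(H)   = 0.30  -> 30% of the fruit records are apples
# P(X|H) = 0.80  -> 80% of apples are red and round
# P(X)   = 0.40  -> 40% of all fruit records are red and round
print(posterior(prior=0.30, likelihood=0.80, evidence=0.40))  # 0.6
```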

5 Bayes Classifier
A probabilistic framework for solving classification problems.
Conditional probability: P(C|A) = P(A, C) / P(A) and P(A|C) = P(A, C) / P(C)
Bayes theorem: P(C|A) = P(A|C) P(C) / P(A)

6 Example of Bayes Theorem
Given:
A doctor knows that meningitis causes a stiff neck 50% of the time.
The prior probability of any patient having meningitis is 1/50,000.
The prior probability of any patient having a stiff neck is 1/20.
If a patient has a stiff neck, what is the probability that he/she has meningitis?
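The transcript shows the question but not the arithmetic; a short sketch of the computation, using only the numbers given above:

```python
# P(M|S) = P(S|M) * P(M) / P(S), with the values from the slide.
p_s_given_m = 0.5        # meningitis causes stiff neck 50% of the time
p_m = 1 / 50_000         # prior probability of meningitis
p_s = 1 / 20             # prior probability of stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)       # 0.0002
```

So even with a stiff neck, the probability of meningitis is only 0.0002, because the prior P(M) is so small.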

7 Bayesian Classifiers
Consider each attribute and the class label as random variables.
Given a record with attributes (A1, A2, …, An), the goal is to predict the class C.
Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An).
Can we estimate P(C | A1, A2, …, An) directly from the data?

8 Bayesian Classifiers
Approach: compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes theorem:
P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)
Choose the value of C that maximizes P(C | A1, A2, …, An). Since the denominator is the same for all classes, this is equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C).
How do we estimate P(A1, A2, …, An | C)?

9 Naïve Bayes Classifier
Assume independence among the attributes Ai when the class is given:
P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
We can then estimate P(Ai | Cj) for all Ai and Cj from the training data.
A new point is classified as Cj if P(Cj) Π P(Ai | Cj) is maximal over all classes.
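A minimal sketch of this decision rule for categorical attributes; the toy data set, function names, and the absence of smoothing are illustrative assumptions, not part of the slides:

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Estimate P(Cj) and P(Ai | Cj) by simple counting (no smoothing)."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # cond[(i, value, c)] = number of class-c records with attribute i == value
    cond = defaultdict(int)
    for record, c in zip(records, labels):
        for i, value in enumerate(record):
            cond[(i, value, c)] += 1
    likelihoods = {k: v / class_counts[k[2]] for k, v in cond.items()}
    return priors, likelihoods

def classify(record, priors, likelihoods):
    """Return the class maximizing P(Cj) * product_i P(Ai | Cj)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(record):
            score *= likelihoods.get((i, value, c), 0.0)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Illustrative toy data: attributes are (color, shape); labels are fruit names.
records = [("red", "round"), ("red", "round"), ("yellow", "long"), ("red", "long")]
labels = ["apple", "apple", "banana", "banana"]
priors, likelihoods = train_naive_bayes(records, labels)
print(classify(("red", "round"), priors, likelihoods))  # apple
```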

10 How to Estimate Probabilities from Data?
For continuous attributes:
Discretize the range into bins: one ordinal attribute per bin (this violates the independence assumption).
Two-way split: (A < v) or (A > v); choose only one of the two splits as a new attribute.
Probability density estimation: assume the attribute follows a normal distribution, and use the data to estimate the parameters of the distribution (e.g., mean μ and standard deviation σ). Once the probability distribution is known, it can be used to estimate the conditional probability:
P(Ai | cj) = (1 / (√(2π) σ)) exp(−(Ai − μ)² / (2σ²))
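A sketch of the density-estimation option, assuming one normal distribution per (attribute, class) pair; the helper names and sample values are illustrative:

```python
import math

def gaussian_params(values):
    """Estimate mean and standard deviation from one class's attribute values."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var)

def gaussian_density(x, mean, std):
    """Normal density N(x; mean, std), used as the estimate of P(Ai | c)."""
    coeff = 1.0 / (math.sqrt(2 * math.pi) * std)
    return coeff * math.exp(-((x - mean) ** 2) / (2 * std ** 2))

# Illustrative values: incomes (in $1000s) of the training records in class c.
incomes_in_class_c = [125.0, 100.0, 70.0, 120.0, 60.0, 220.0, 75.0]
mean, std = gaussian_params(incomes_in_class_c)
print(gaussian_density(120.0, mean, std))  # density used as P(income = 120 | c)
```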

11 Naïve Bayes Classifier: Example 1
[The original slide shows a table of animal attributes; the table is not preserved in this transcript.]
A: attributes; M: mammals; N: non-mammals.
Since P(A|M) P(M) > P(A|N) P(N), the record is classified as a mammal.

12 Naïve Bayes (Summary)
Robust to isolated noise points.
Handles missing values by ignoring the instance during probability estimate calculations.
Robust to irrelevant attributes.
The independence assumption may not hold for some attributes:
- It makes the computation possible.
- It yields optimal classifiers when satisfied.
- But it is seldom satisfied in practice, as attributes (variables) are often correlated.

