
1 BAYESIAN LEARNING

2 Bayesian Classifiers
Bayesian classifiers are statistical classifiers based on Bayes' theorem. They calculate the probability that a given sample belongs to a particular class.

3 Bayesian learning algorithms are among the most practical approaches to certain types of learning problems. In many cases their results are comparable to the performance of other classifiers, such as decision trees and neural networks.

4 Bayes' Theorem
Let X be a data sample, e.g. a red and round fruit.
Let H be some hypothesis, such as that X belongs to a specified class C (e.g. X is an apple).
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the observed data sample X.

5 5 Prior & Posterior Probability The probability P(H) is called the prior probability of H, i.e the probability that any given data sample is an apple, regardless of how the data sample looks The probability P(H|X) is called posterior probability. It is based on more information, then the prior probability P(H) which is independent of X BAYESIAN LEARNING www.CyberEdgeSolz.com/ex/study/ml

6 Bayes' Theorem
Bayes' theorem provides a way of calculating the posterior probability:
P(H|X) = P(X|H) P(H) / P(X)
P(X|H) is the probability of X given H (the probability that X is red and round given that X is an apple).
P(X) is the prior probability of X (the probability that a data sample is red and round).
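A minimal Python sketch of this calculation, with all probability values assumed purely for illustration:

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
# All probability values below are assumptions chosen for illustration.
p_h = 0.30          # P(apple): prior probability that a fruit is an apple
p_x_given_h = 0.70  # P(red and round | apple)
p_x = 0.35          # P(red and round): prior probability of the evidence

p_h_given_x = p_x_given_h * p_h / p_x  # posterior P(apple | red and round)
print(p_h_given_x)  # ~0.6
```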

7 Naïve (Simple) Bayesian Classification
Studies comparing classification algorithms have found that the simple Bayesian classifier is comparable in performance with decision tree and neural network classifiers. It works as follows:
1. Each data sample is represented by an n-dimensional feature vector X = (x1, x2, …, xn), depicting n measurements made on the sample for the n attributes A1, A2, …, An respectively.

8 Naïve (Simple) Bayesian Classification
2. Suppose there are m classes C1, C2, …, Cm. Given an unknown data sample X (i.e. one with no class label), the classifier predicts that X belongs to the class having the highest posterior probability given X. Thus, X is assigned to Ci if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
This is called the Bayes decision rule.
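As a sketch, the decision rule is simply an argmax over the class posteriors (the class names and posterior values below are assumed numbers):

```python
# Bayes decision rule: assign X to the class Ci with the highest
# posterior P(Ci|X). The posterior values here are assumed for illustration.
posteriors = {"apple": 0.60, "tomato": 0.25, "cherry": 0.15}
predicted = max(posteriors, key=posteriors.get)
print(predicted)  # apple
```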

9 Naïve (Simple) Bayesian Classification
3. By Bayes' theorem, P(Ci|X) = P(X|Ci) P(Ci) / P(X). As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be calculated. The class prior probabilities may be estimated by P(Ci) = si / s, where si is the number of training samples of class Ci and s is the total number of training samples. If the class prior probabilities are equal (or not known and thus assumed to be equal), then we need to calculate only P(X|Ci).
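A small sketch of this prior estimate, assuming a toy list of training labels:

```python
from collections import Counter

# Estimate class priors P(Ci) = si / s from training labels.
# The label list is an assumed toy sample.
labels = ["apple", "apple", "apple", "tomato", "tomato"]
counts = Counter(labels)   # si for each class Ci
s = len(labels)            # s: total number of training samples
priors = {c: s_i / s for c, s_i in counts.items()}
print(priors)  # {'apple': 0.6, 'tomato': 0.4}
```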

10 Naïve (Simple) Bayesian Classification
4. Given data sets with many attributes, it would be extremely expensive computationally to compute P(X|Ci) directly. For example, assuming the attributes colour and shape are Boolean, we would need to store 4 probabilities for the category apple:
P(¬red ∧ ¬round | apple)
P(¬red ∧ round | apple)
P(red ∧ ¬round | apple)
P(red ∧ round | apple)
If there are 6 Boolean attributes, we would need to store 2^6 probabilities.
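The following sketch just enumerates the attribute combinations to make the 2^6 figure concrete:

```python
from itertools import product

# With n Boolean attributes, the full joint distribution P(X|Ci) needs
# one stored probability per attribute combination: 2**n of them.
n = 6
combinations = list(product([False, True], repeat=n))
print(len(combinations))  # 64 = 2**6 probabilities per class
```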

11 Naïve (Simple) Bayesian Classification
In order to reduce computation, the naïve assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another given the class label of the sample (i.e. we assume that there are no dependence relationships among the attributes).

12 Naïve (Simple) Bayesian Classification
Thus we assume that
P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci)
Example: P(colour ∧ shape | apple) = P(colour | apple) P(shape | apple)
For 6 Boolean attributes, we would have only 12 probabilities to store instead of 2^6 = 64. Similarly, for 6 three-valued attributes, we would have 18 probabilities to store instead of 3^6 = 729.
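A minimal sketch of the factorized computation, with per-attribute probabilities assumed for illustration:

```python
import math

# Naive assumption: P(X|Ci) factorizes into a product of per-attribute terms.
# The per-attribute probabilities below are assumed values.
p_given_apple = {"colour=red": 0.7, "shape=round": 0.9}
p_x_given_apple = math.prod(p_given_apple.values())
print(p_x_given_apple)  # ~0.63 = 0.7 * 0.9
```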

13 Naïve (Simple) Bayesian Classification
The probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) can be estimated from the training samples. For an attribute Ak, which can take on the values x1k, x2k, … (e.g. colour = red, green, …):
P(xk|Ci) = sik / si
where sik is the number of training samples of class Ci having the value xk for Ak, and si is the number of training samples belonging to Ci.
E.g. P(red | apple) = 7/10 if 7 out of 10 apples are red.
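A toy sketch of this counting estimate, assuming 10 apple samples of which 7 are red:

```python
# Estimate P(xk|Ci) = sik / si by counting training samples.
# Toy data (assumed): 10 apples, 7 of them red.
samples = [("apple", "red")] * 7 + [("apple", "green")] * 3

s_i = sum(1 for cls, _ in samples if cls == "apple")                      # si
s_ik = sum(1 for cls, col in samples if cls == "apple" and col == "red")  # sik
print(s_ik / s_i)  # 0.7, i.e. P(red | apple)
```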

14 Naïve (Simple) Bayesian Classification
Example: training samples (class label: buys_computer)

RID  age     income  student  credit_rating  buys_computer
1    <=30    high    no       fair           no
2    <=30    high    no       excellent      no
3    31..40  high    no       fair           yes
4    >40     medium  no       fair           yes
5    >40     low     yes      fair           yes
6    >40     low     yes      excellent      no
7    31..40  low     yes      excellent      yes
8    <=30    medium  no       fair           no
9    <=30    low     yes      fair           yes
10   >40     medium  yes      fair           yes
11   <=30    medium  yes      excellent      yes
12   31..40  medium  no       excellent      yes
13   31..40  high    yes      fair           yes
14   >40     medium  no       excellent      no

15 Naïve (Simple) Bayesian Classification
Example: Let C1 = the class buys_computer = yes and C2 = the class buys_computer = no.
The unknown sample: X = {age <= 30, income = medium, student = yes, credit_rating = fair}
The prior probability of each class can be computed as:
P(buys_computer = yes) = 9/14 = 0.643
P(buys_computer = no) = 5/14 = 0.357

16 Naïve (Simple) Bayesian Classification
Example: To compute P(X|Ci) we compute the following conditional probabilities:
P(age <= 30 | buys_computer = yes) = 2/9 = 0.222
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667
P(age <= 30 | buys_computer = no) = 3/5 = 0.600
P(income = medium | buys_computer = no) = 2/5 = 0.400
P(student = yes | buys_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400

17 Naïve (Simple) Bayesian Classification
Example: Using the above probabilities we obtain:
P(X | buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
P(X | yes) P(yes) = 0.044 × 0.643 = 0.028
P(X | no) P(no) = 0.019 × 0.357 = 0.007
Hence the naïve Bayesian classifier predicts that the student will buy a computer, because 0.028 > 0.007.
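A sketch that reproduces this computation end to end, hard-coding the training table from slide 14 (variable names are illustrative):

```python
from collections import Counter
import math

# Training table from slide 14: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")  # the unknown sample X

class_counts = Counter(row[-1] for row in data)
scores = {}
for c, s_i in class_counts.items():
    rows = [row for row in data if row[-1] == c]
    prior = s_i / len(data)             # P(Ci) = si / s
    likelihood = math.prod(             # P(X|Ci) = prod_k P(xk|Ci)
        sum(1 for row in rows if row[k] == x[k]) / s_i
        for k in range(len(x))
    )
    scores[c] = prior * likelihood      # P(X|Ci) P(Ci)

print(scores)                       # {'no': ~0.007, 'yes': ~0.028}
print(max(scores, key=scores.get))  # yes
```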

18 Naïve (Simple) Bayesian Classification
An Example: Learning to classify text
- Instances (training samples) are text documents
- Classification labels can be like/dislike, etc.
- The task is to learn from these training examples to predict the class of unseen documents
Design issue:
- How to represent a text document in terms of attribute values

19 Naïve (Simple) Bayesian Classification
One approach:
- The attributes are the word positions
- The value of an attribute is the word found in that position
Note that the number of attributes may be different for each document.
We calculate the prior probabilities of the classes from the training samples. The probability of each word occurring in each position is also calculated, e.g. P("The" in first position | like document).
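A rough sketch of this position-based representation (the sentence is an assumed example):

```python
# First approach: attribute ai is the word found at position i.
# The document below is an assumed toy example.
doc = "the quick brown fox"
attributes = dict(enumerate(doc.split()))
print(attributes[0])  # 'the' -- the value of the "first position" attribute
# P("the" in first position | like document) would then be estimated as the
# fraction of 'like' training documents whose first word is "the".
```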

20 Naïve (Simple) Bayesian Classification
Second approach: The frequency with which a word occurs is counted, irrespective of the word's position. Note that here also the number of attributes may be different for each document. The estimated probabilities are then of the form P("The" | like document).
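A minimal sketch of the frequency-based estimate, assuming two toy 'like' documents:

```python
from collections import Counter

# Second approach (bag of words): count each word irrespective of position.
# The 'like' documents below are assumed toy examples.
like_docs = ["great movie I like it", "I like the acting"]

like_counts = Counter(w for d in like_docs for w in d.split())
total_like_words = sum(like_counts.values())

# P("like" | like document), estimated by relative frequency:
print(like_counts["like"] / total_like_words)  # 2/9 ~ 0.222
```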

21 Naïve (Simple) Bayesian Classification
Results: An algorithm based on the second approach was applied to the problem of classifying newsgroup articles.
- 20 newsgroups were considered
- 1,000 articles were collected from each newsgroup (20,000 articles in total)
- The naïve Bayes algorithm was applied using two-thirds of these articles as training samples
- Testing was done over the remaining third

22 Naïve (Simple) Bayesian Classification
Results:
- Given 20 newsgroups, we would expect random guessing to achieve a classification accuracy of 5%
- The accuracy achieved by this program was 89%

23 Naïve (Simple) Bayesian Classification
Minor variant: the algorithm used only a subset of the words appearing in the documents, as sketched below.
- The 100 most frequent words were removed (these include words such as "the" and "of")
- Any word occurring fewer than 3 times was also removed
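A sketch of this vocabulary pruning, where the function name and toy word list are assumptions:

```python
from collections import Counter

# Vocabulary pruning used by the variant: drop the top_k most frequent
# words and any word occurring fewer than min_count times.
def prune_vocabulary(all_words, top_k=100, min_count=3):
    counts = Counter(all_words)
    most_frequent = {w for w, _ in counts.most_common(top_k)}
    return {w for w, c in counts.items()
            if c >= min_count and w not in most_frequent}

words = "the cat sat the cat the dog".split()  # counts: the=3, cat=2, sat=1, dog=1
print(prune_vocabulary(words, top_k=1, min_count=2))  # {'cat'}
```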

24 Reference
T. Mitchell, Machine Learning, Chapter 6.

