
BAYESIAN LEARNING

2 Bayesian Classifiers
Bayesian classifiers are statistical classifiers based on Bayes' theorem. They can calculate the probability that a given sample belongs to a particular class.

3 Bayesian learning algorithms are among the most practical approaches to certain types of learning problems. Their results are in many cases comparable to the performance of other classifiers, such as decision trees and neural networks.

4 Bayes' Theorem
Let X be a data sample, e.g. a red and round fruit.
Let H be some hypothesis, such as that X belongs to a specified class C (e.g. X is an apple).
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the observed data sample X.

5 Prior & Posterior Probability
The probability P(H) is called the prior probability of H, i.e. the probability that any given data sample is an apple, regardless of how the data sample looks.
The probability P(H|X) is called the posterior probability. It is based on more information than the prior probability P(H), which is independent of X.

6 Bayes' Theorem
Bayes' theorem provides a way of calculating the posterior probability:
P(H|X) = P(X|H) P(H) / P(X)
P(X|H) is the posterior probability of X given H (the probability that X is red and round given that X is an apple).
P(X) is the prior probability of X (the probability that a data sample is red and round).
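As a quick numeric sketch of the formula, the following Python snippet plugs in invented probabilities for the apple example (all three input values are assumptions, chosen only to make the arithmetic concrete):

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
# All input probabilities below are invented for illustration.

p_h = 0.3          # P(H): prior probability that a fruit is an apple
p_x_given_h = 0.8  # P(X|H): probability of red-and-round, given an apple
p_x = 0.4          # P(X): prior probability that a fruit is red and round

p_h_given_x = p_x_given_h * p_h / p_x
print(p_h_given_x)  # ~0.6: a red, round fruit is an apple with probability 0.6
```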

7 Naïve (Simple) Bayesian Classification
Studies comparing classification algorithms have found that the simple Bayesian classifier is comparable in performance with decision tree and neural network classifiers. It works as follows:
1. Each data sample is represented by an n-dimensional feature vector X = (x1, x2, …, xn), depicting n measurements made on the sample from n attributes A1, A2, …, An respectively.

8 Naïve (Simple) Bayesian Classification
2. Suppose that there are m classes C1, C2, …, Cm. Given an unknown data sample X (i.e. one having no class label), the classifier will predict that X belongs to the class having the highest posterior probability given X. Thus X is assigned to Ci if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
This is called the Bayes decision rule.
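In code, the decision rule is simply an argmax over the per-class posteriors; a minimal sketch, in which the class names and probability values are hypothetical:

```python
# Bayes decision rule: pick the class Ci with the highest posterior P(Ci|X).

def bayes_decision(posteriors):
    """posteriors: dict mapping class label -> P(Ci|X). Returns the argmax."""
    return max(posteriors, key=posteriors.get)

# Hypothetical posteriors for a red, round fruit:
print(bayes_decision({"apple": 0.7, "orange": 0.2, "pear": 0.1}))  # apple
```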

9 Naïve (Simple) Bayesian Classification
3. We have P(Ci|X) = P(X|Ci) P(Ci) / P(X). As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be calculated.
The class prior probabilities may be estimated by P(Ci) = si / s, where si is the number of training samples of class Ci and s is the total number of training samples.
If the class prior probabilities are equal (or are not known and thus assumed to be equal), then we need to calculate only P(X|Ci).
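Estimating the priors from labelled data takes only a few lines; the label list below is a made-up stand-in for a real training set:

```python
from collections import Counter

# Estimate class priors P(Ci) = si / s from training labels (invented data).
labels = ["apple", "apple", "orange", "apple", "orange"]

s = len(labels)
priors = {c: s_i / s for c, s_i in Counter(labels).items()}
print(priors)  # {'apple': 0.6, 'orange': 0.4}
```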

10 Naïve (Simple) Bayesian Classification
4. Given data sets with many attributes, it would be extremely computationally expensive to compute P(X|Ci). For example, assuming the attributes colour and shape to be Boolean, we need to store 4 probabilities for the category apple:
P(¬red ∧ ¬round | apple)
P(¬red ∧ round | apple)
P(red ∧ ¬round | apple)
P(red ∧ round | apple)
If there are 6 attributes and they are Boolean, then we need to store 2^6 = 64 probabilities.

11 Naïve (Simple) Bayesian Classification
In order to reduce computation, the naïve assumption of class conditional independence is made. This presumes that the values of the attributes are conditionally independent of one another, given the class label of the sample (i.e. we assume that there are no dependence relationships among the attributes).

12 Naïve (Simple) Bayesian Classification
Thus we assume that
P(X|Ci) = ∏_{k=1}^{n} P(xk|Ci)
Example: P(colour ∧ shape | apple) = P(colour | apple) P(shape | apple)
For 6 Boolean attributes, we would have only 12 probabilities to store instead of 2^6 = 64. Similarly, for 6 three-valued attributes, we would have 18 probabilities to store instead of 3^6 = 729.
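The product above translates directly into code. Here is a minimal sketch in which the per-attribute probability tables are invented placeholders:

```python
import math

# Naive assumption: P(X|Ci) is the product of per-attribute probabilities.
def likelihood(x, ci, cond_probs):
    """P(X|Ci) = prod_k P(xk|Ci); cond_probs[ci][k] maps value -> probability."""
    return math.prod(cond_probs[ci][k][xk] for k, xk in enumerate(x))

# Hypothetical tables for two attributes (colour, shape) of class "apple":
cond_probs = {
    "apple": [{"red": 0.7, "green": 0.3}, {"round": 0.9, "oblong": 0.1}],
}
print(likelihood(("red", "round"), "apple", cond_probs))  # 0.7 * 0.9 ≈ 0.63
```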

13 Naïve (Simple) Bayesian Classification
The probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) can be estimated from the training samples as follows.
For an attribute Ak which can take on the values x1k, x2k, … (e.g. colour = red, green, …):
P(xk|Ci) = sik / si
where sik is the number of training samples of class Ci having the value xk for Ak, and si is the number of training samples belonging to Ci.
e.g. P(red|apple) = 7/10 if 7 out of 10 apples are red
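The counting estimate sik / si can be implemented in a few lines; the tiny training set below is invented purely to exercise the counts:

```python
from collections import Counter, defaultdict

# Estimate P(xk|Ci) = sik / si from (attribute vector, class) pairs.
# The training samples below are invented for illustration.
samples = [
    (("red", "round"), "apple"),
    (("red", "round"), "apple"),
    (("green", "round"), "apple"),
    (("red", "oblong"), "pear"),
]

class_counts = Counter(label for _, label in samples)  # si per class
value_counts = defaultdict(Counter)                    # sik per (class, attr)
for x, label in samples:
    for k, xk in enumerate(x):
        value_counts[(label, k)][xk] += 1

def cond_prob(xk, k, ci):
    """P(xk|Ci) for attribute index k."""
    return value_counts[(ci, k)][xk] / class_counts[ci]

print(cond_prob("red", 0, "apple"))  # 2/3: two of the three apples are red
```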

14 Naïve (Simple) Bayesian Classification
Example: (the training-data table shown on this slide is not reproduced in the transcript)

15 Naïve (Simple) Bayesian Classification
Example: Let C1 = the class buy_computer = yes and C2 = the class buy_computer = no.
The unknown sample: X = {age ≤ 30, income = medium, student = yes, credit_rating = fair}
The prior probability of each class can be computed as:
P(buy_computer = yes) = 9/14 = 0.643
P(buy_computer = no) = 5/14 = 0.357

16 Naïve (Simple) Bayesian Classification
Example: To compute P(X|Ci), we compute the following conditional probabilities: (the table of per-attribute conditional probabilities shown on this slide is not reproduced in the transcript)

17 Naïve (Simple) Bayesian Classification
Example: Using the above probabilities, we obtain P(X|C1) P(C1) and P(X|C2) P(C2). The naïve Bayesian classifier predicts that the student will buy a computer, because P(X|C1) P(C1) > P(X|C2) P(C2).

18 Naïve (Simple) Bayesian Classification
An Example: Learning to classify text
- Instances (training samples) are text documents
- Classification labels can be like/dislike, etc.
- The task is to learn from these training examples to predict the class of unseen documents
Design issue:
- How to represent a text document in terms of attribute values

19 Naïve (Simple) Bayesian Classification
One approach:
- The attributes are the word positions
- The value of an attribute is the word found in that position
Note that the number of attributes may be different for each document.
We calculate the prior probabilities of the classes from the training samples. The probability of each word appearing in each position is also calculated, e.g. P("The" in first position | like document).

20 Naïve (Simple) Bayesian Classification
Second approach: the frequency with which a word occurs is counted, irrespective of the word's position.
Note that here also the number of attributes may be different for each document.
The probabilities are now of the form P("The" | like document).
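This frequency-based (bag-of-words) representation is what most off-the-shelf naive Bayes text classifiers implement. As a sketch, the snippet below uses scikit-learn's CountVectorizer and MultinomialNB on a few invented like/dislike documents; it illustrates the approach and is not the experiment described on the next slide:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented training documents and labels, purely for illustration.
docs = ["great movie, liked it a lot", "boring plot, disliked it",
        "liked the acting", "disliked everything about it"]
labels = ["like", "dislike", "like", "dislike"]

vectorizer = CountVectorizer()        # word counts; positions are ignored
X = vectorizer.fit_transform(docs)
clf = MultinomialNB().fit(X, labels)  # multinomial naive Bayes over counts

print(clf.predict(vectorizer.transform(["really liked it"])))  # ['like']
```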

21 Naïve (Simple) Bayesian Classification
Results: An algorithm based on the second approach was applied to the problem of classifying articles from newsgroups.
- 20 newsgroups were considered
- 1,000 articles from each newsgroup were collected (20,000 articles in total)
- The naïve Bayes algorithm was applied using 2/3 of these articles as training samples
- Testing was done on the remaining 1/3

22 Naïve (Simple) Bayesian Classification
Results:
- Given 20 newsgroups, we would expect random guessing to achieve a classification accuracy of 5%
- The accuracy achieved by this program was 89%

23 Naïve (Simple) Bayesian Classification
Minor variant: the algorithm used only a subset of the words occurring in the documents.
- The most frequent words (such as "the" and "of") were removed
- Any word occurring fewer than 3 times was also removed
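Such vocabulary pruning is easy to sketch. In the snippet below, the token list and the top-2 cutoff for frequent words are placeholders (the slide does not say how many frequent words were dropped):

```python
from collections import Counter

# Vocabulary pruning: drop the most frequent words and words seen < 3 times.
# The token list and the top-2 cutoff are invented placeholders.
corpus_tokens = (["the"] * 4 + ["of"] * 3 + ["bayes"] * 3
                 + ["theorem"] * 3 + ["prior"])

counts = Counter(corpus_tokens)
most_frequent = {w for w, _ in counts.most_common(2)}  # {'the', 'of'}
vocabulary = {w for w, n in counts.items()
              if n >= 3 and w not in most_frequent}
print(vocabulary)  # {'bayes', 'theorem'} (set order may vary)
```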

24 Reference
T. Mitchell, Machine Learning, Chapter 6.