Bayesian Classification. Bayesian Classification: Why? A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.

Slides:



Advertisements
Similar presentations
Naïve Bayes. Bayesian Reasoning Bayesian reasoning provides a probabilistic approach to inference. It is based on the assumption that the quantities of.
Advertisements

Classification. Introduction A discriminant is a function that separates the examples of different classes. For example – IF (income > Q1 and saving >Q2)
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
LECTURE 11: BAYESIAN PARAMETER ESTIMATION
Ai in game programming it university of copenhagen Statistical Learning Methods Marco Loog.
Data Mining Classification: Naïve Bayes Classifier
Data Mining Classification: Alternative Techniques
Overview Full Bayesian Learning MAP learning
1 Chapter 12 Probabilistic Reasoning and Bayesian Belief Networks.
Bayesian Classification and Bayesian Networks
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Sample Midterm question. Sue want to build a model to predict movie ratings. She has a matrix of data, where for M movies and U users she has collected.
Lecture outline Classification Naïve Bayes classifier Nearest-neighbor classifier.
Introduction to Bayesian Learning Ata Kaban School of Computer Science University of Birmingham.
Thanks to Nir Friedman, HU
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
CS Bayesian Learning1 Bayesian Learning. CS Bayesian Learning2 States, causes, hypotheses. Observations, effect, data. We need to reconcile.
Jeff Howbert Introduction to Machine Learning Winter Classification Bayesian Classifiers.
1 Naïve Bayes A probabilistic ML algorithm. 2 Axioms of Probability Theory All probabilities between 0 and 1 True proposition has probability 1, false.
Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.
DATA MINING : CLASSIFICATION. Classification : Definition  Classification is a supervised learning.  Uses training sets which has correct answers (class.
Simple Bayesian Classifier
Bayesian Networks. Male brain wiring Female brain wiring.
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines.
1 Data Mining Lecture 5: KNN and Bayes Classifiers.
1 Lecture Notes 16: Bayes’ Theorem and Data Mining Zhangxi Lin ISQS 6347.
Naive Bayes Classifier
Naïve Bayes Classifier. Bayes Classifier l A probabilistic framework for classification problems l Often appropriate because the world is noisy and also.
Aprendizagem Computacional Gladys Castillo, UA Bayesian Networks Classifiers Gladys Castillo University of Aveiro.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
Bayesian Classifier. 2 Review: Decision Tree Age? Student? Credit? fair excellent >40 31…40
1 A Bayesian statistical method for particle identification in shower counters IX International Workshop on Advanced Computing and Analysis Techniques.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Statistical Inference (By Michael Jordon) l Bayesian perspective –conditional perspective—inferences.
Classification Techniques: Bayesian Classification
Supervised Learning Approaches Bayesian Learning Neural Network Support Vector Machine Ensemble Methods Adapted from Lecture Notes of V. Kumar and E. Alpaydin.
Uncertainty in Expert Systems
Bayesian Classification Using P-tree  Classification –Classification is a process of predicting an – unknown attribute-value in a relation –Given a relation,
Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.
1 Chapter 12 Probabilistic Reasoning and Bayesian Belief Networks.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 07: BAYESIAN ESTIMATION (Cont.) Objectives:
Bayesian Classification
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
Classification & Prediction — Continue—. Overfitting in decision trees Small training set, noise, missing values Error rate decreases as training set.
DATA MINING LECTURE 10b Classification k-nearest neighbor classifier
Classification Today: Basic Problem Decision Trees.
1 Machine Learning: Lecture 6 Bayesian Learning (Based on Chapter 6 of Mitchell T.., Machine Learning, 1997)
Chapter 6. Classification and Prediction Classification by decision tree induction Bayesian classification Rule-based classification Classification by.
Fall 2004, CIS, Temple University CIS527: Data Warehousing, Filtering, and Mining Lecture 8 Alternative Classification Algorithms Lecture slides taken/modified.
Bayesian Learning. Bayes Classifier A probabilistic framework for solving classification problems Conditional Probability: Bayes theorem:
Naïve Bayes Classifier April 25 th, Classification Methods (1) Manual classification Used by Yahoo!, Looksmart, about.com, ODP Very accurate when.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Classification COMP Seminar BCB 713 Module Spring 2011.
BAYESIAN LEARNING. 2 Bayesian Classifiers Bayesian classifiers are statistical classifiers, and are based on Bayes theorem They can calculate the probability.
Naive Bayes Classifier. REVIEW: Bayesian Methods Our focus this lecture: – Learning and classification methods based on probability theory. Bayes theorem.
Bayesian Classification 1. 2 Bayesian Classification: Why? A statistical classifier: performs probabilistic prediction, i.e., predicts class membership.
Naive Bayes Classifier
Data Mining Classification: Alternative Techniques
Bayesian Classification
Data Mining Lecture 11.
Classification Techniques: Bayesian Classification
Naïve Bayes CSC 600: Data Mining Class 19.
Bayesian Classification
Data Mining Classification: Alternative Techniques
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 8 —
LECTURE 07: BAYESIAN ESTIMATION
Machine Learning: Lecture 6
Machine Learning: UNIT-3 CHAPTER-1
Naïve Bayes CSC 576: Data Science.
Bayesian Classification
Presentation transcript:

Bayesian Classification

Bayesian Classification: Why? A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities It is based on the famous Bayes theorem. A simple Bayesian classifier, naïve Bayesian classifier, has comparable performance with decision tree and selected neural network classifiers Even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Let X be the data record (case) whose class label is unknown. Let H be some hypothesis, such as "data record X belongs to a specified class C.“ For classification, we want to determine P (H|X) – the probability that the hypothesis H holds, given the observed data record X. P (H|X) is the posterior probability of H conditioned on X. For example, the probability that a fruit is an apple, given the condition that it is red and round. In contrast, P(H) is the prior probability, or a priori probability, of H. In this example P(H) is the probability that any given data record is an apple, regardless of how the data record looks. The posterior probability, P (H|X), is based on more information (such as background knowledge) than the prior probability, P(H), which is independent of X. Bayes Theorem

Bayesian Classification: Simple introduction… Similarly, P (X|H) is posterior probability of X conditioned on H. That is, it is the probability that X is red and round given that we know that it is true that X is an apple. P(X) is the prior probability of X, i.e., it is the probability that a data record from our set of fruits is red and round. Bayes theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X), and P(X|H). Bayes theorem is P (H|X) = P(X|H) P(H) / P(X)

Bayes Classifier A probabilistic framework for solving classification problems Conditional Probability: Bayes theorem:

Example of Bayes Theorem Given: A doctor knows that meningitis causes stiff neck 50% of the time Prior probability of any patient having meningitis is 1/50,000 Prior probability of any patient having stiff neck is 1/20 If a patient has stiff neck, what’s the probability he/she has meningitis?

Bayesian Classifiers Consider each attribute and class label as random variables Given a record with attributes (A 1, A 2,…,A n ) Goal is to predict class C Specifically, we want to find the value of C that maximizes P(C| A 1, A 2,…,A n ) Can we estimate P(C| A 1, A 2,…,A n ) directly from data?

Bayesian Classifiers Approach: compute the posterior probability P(C | A 1, A 2, …, A n ) for all values of C using the Bayes theorem Choose value of C that maximizes P(C | A 1, A 2, …, A n ) Equivalent to choosing value of C that maximizes P(A 1, A 2, …, A n |C) P(C) How to estimate P(A 1, A 2, …, A n | C ) ?

Naïve Bayes Classifier Assume independence among attributes A i when class is given: P(A 1, A 2, …, A n |C) = P(A 1 | C j ) P(A 2 | C j )… P(A n | C j ) Can estimate P(A i | C j ) for all A i and C j. New point is classified to C j if P(C j )  P(A i | C j ) is maximal.

How to Estimate Probabilities from Data? For continuous attributes: Discretize the range into bins one ordinal attribute per bin violates independence assumption Two-way split: (A v) choose only one of the two splits as new attribute Probability density estimation: Assume attribute follows a normal distribution Use data to estimate parameters of distribution (e.g., mean and standard deviation) Once probability distribution is known, can use it to estimate the conditional probability P(A i |c) k

Naïve Bayes Classifier: Example1 A: attributes M: mammals N: non-mammals P(A|M)P(M) > P(A|N)P(N) => Mammals

Naïve Bayes (Summary ) Robust to isolated noise points Handle missing values by ignoring the instance during probability estimate calculations Robust to irrelevant attributes Independence assumption may not hold for some attributes It makes computation possible It yields optimal classifiers when satisfied But this is seldom satisfied in practice, as attributes (variables) are often correlated.