Classification Techniques: Bayesian Classification

Classification Techniques: Bayesian Classification Bamshad Mobasher DePaul University

Classification: 3-Step Process
1. Model construction (Learning):
   - Each record (instance, example) is assumed to belong to a predefined class, as determined by one of the attributes
   - This attribute is called the target attribute; its values are the class labels
   - The set of all instances used for learning the model is called the training set
2. Model evaluation (Accuracy):
   - Estimate the accuracy rate of the model on a test set
   - The known labels of the test instances are compared with the classes predicted by the model
   - The test set must be independent of the training set, otherwise the accuracy estimate will be overly optimistic (over-fitting)
3. Model use (Classification):
   - The model is used to classify unseen instances, i.e., to predict the value of the target attribute (class label) for new, unlabeled instances
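As a concrete sketch of these three steps, the snippet below uses scikit-learn's GaussianNB on the bundled iris data; the library, dataset, and 70/30 split are illustrative choices, not part of the slides.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 1. Model construction (learning): fit the classifier on a training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = GaussianNB().fit(X_train, y_train)

# 2. Model evaluation (accuracy): estimate the accuracy rate on an independent test set
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 3. Model use (classification): predict the class label of an unseen instance
print("predicted class:", model.predict(X_test[:1]))
```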

Classification Methods
- Decision tree induction
- Bayesian classification
- K-nearest neighbor
- Neural networks
- Support vector machines
- Association-based classification
- Genetic algorithms
- Many more, including ensemble methods

Bayesian Learning
- Bayes's theorem plays a critical role in probabilistic learning and classification
- Uses the prior probability of each class given no information about an item
- Classification produces a posterior probability distribution over the possible classes, given a description of an item
- The models are incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct, so prior knowledge can be combined with observed data
- Given a data instance X with an unknown class label, let H be the hypothesis that X belongs to a specific class C. The conditional probability of hypothesis H given observation X follows Bayes's theorem:
  Pr(H | X) = Pr(X | H) Pr(H) / Pr(X)
- Practical difficulty: requires initial knowledge of many probabilities

Basic Concepts In Probability I
- P(A | B) is the probability of A given B
- Assumes that B is all and only the information known
- Defined by:
  P(A | B) = P(A and B) / P(B)
- Bayes's rule is a direct corollary of this definition; it is often written in terms of a hypothesis H and evidence E:
  P(H | E) = P(E | H) P(H) / P(E)
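A tiny Python check of the definition and of Bayes's rule, with probability values made up for illustration:

```python
# Made-up probabilities, purely for illustration
p_b = 0.30          # P(B)
p_a = 0.25          # P(A)
p_a_and_b = 0.12    # P(A and B)

p_a_given_b = p_a_and_b / p_b          # definition: P(A | B) = P(A and B) / P(B)
p_b_given_a = p_a_given_b * p_b / p_a  # Bayes's rule: P(B | A) = P(A | B) P(B) / P(A)

print(p_a_given_b)  # ≈ 0.4
print(p_b_given_a)  # ≈ 0.48
```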

Basic Concepts In Probability II
- A and B are independent iff:
  P(A | B) = P(A) and P(B | A) = P(B)
- These two constraints are logically equivalent
- Therefore, if A and B are independent:
  P(A and B) = P(A | B) P(B) = P(A) P(B)
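A small numeric check, again with made-up values chosen so that A and B are independent:

```python
# Made-up values chosen so that A and B are independent
p_a, p_b = 0.5, 0.2
p_a_and_b = 0.1                 # equals P(A) * P(B)

p_a_given_b = p_a_and_b / p_b   # should equal P(A)
p_b_given_a = p_a_and_b / p_a   # should equal P(B)

print(abs(p_a_given_b - p_a) < 1e-12)       # True: P(A | B) = P(A)
print(abs(p_b_given_a - p_b) < 1e-12)       # True: P(B | A) = P(B)
print(abs(p_a_and_b - p_a * p_b) < 1e-12)   # True: P(A and B) = P(A) P(B)
```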

Bayesian Classification
- Let the set of classes be {c1, c2, ..., cn}
- Let E be the description of an instance (e.g., a vector representation)
- Determine the class of E by computing, for each class ci:
  P(ci | E) = P(E | ci) P(ci) / P(E)
- P(E) can be determined since the classes are complete and disjoint:
  P(E) = Σi P(E | ci) P(ci)
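A short sketch of this computation, assuming the priors and likelihoods for one instance E are already known (the numbers are illustrative):

```python
# Illustrative priors P(ci) and likelihoods P(E | ci) for a single instance E
priors = {"c1": 0.5, "c2": 0.3, "c3": 0.2}
likelihoods = {"c1": 0.02, "c2": 0.10, "c3": 0.05}

# P(E) = sum over i of P(E | ci) P(ci), since the classes are complete and disjoint
p_e = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior P(ci | E) = P(E | ci) P(ci) / P(E) for every class
posteriors = {c: likelihoods[c] * priors[c] / p_e for c in priors}
print(posteriors)                                          # sums to 1
print("predicted:", max(posteriors, key=posteriors.get))   # class with the largest posterior
```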

Bayesian Categorization (cont.)
- Need to know: the priors P(ci) and the conditionals P(E | ci)
- The priors P(ci) are easily estimated from data: if ni of the examples in D are in class ci, then P(ci) = ni / |D|
- Assume an instance is a conjunction of binary features/attributes:
  E = e1 and e2 and ... and em
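For example, a few lines of Python estimating the priors P(ci) from an illustrative list of training labels:

```python
from collections import Counter

# Illustrative training labels (the class of each example in D)
labels = ["yes", "no", "yes", "yes", "no", "yes"]

counts = Counter(labels)                                   # ni for each class ci
priors = {c: n / len(labels) for c, n in counts.items()}   # P(ci) = ni / |D|
print(priors)   # {'yes': 0.666..., 'no': 0.333...}
```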

Naïve Bayesian Classification
- Problem: there are too many possible feature combinations (exponential in m) to estimate all P(E | ci) directly
- If we assume the features/attributes of an instance are independent given the class ci (conditionally independent), then:
  P(E | ci) = P(e1 and e2 and ... and em | ci) = Πj P(ej | ci)
- Therefore, we only need to know P(ej | ci) for each feature and category
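A minimal sketch of the factored likelihood for one class, using illustrative per-feature probabilities (requires Python 3.8+ for math.prod):

```python
import math

# Illustrative P(ej = 1 | ci) for four binary features, for one class ci
p_true_given_c = [0.8, 0.1, 0.6, 0.3]
instance = [1, 0, 1, 1]   # observed binary feature values e1..e4

# P(E | ci) = product over j of P(ej | ci), using 1 - p when the feature is false
p_e_given_c = math.prod(p if e == 1 else 1 - p
                        for p, e in zip(p_true_given_c, instance))
print(p_e_given_c)   # 0.8 * 0.9 * 0.6 * 0.3 ≈ 0.1296
```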

Estimating Probabilities
- Normally, probabilities are estimated based on observed frequencies in the training data
- If D contains ni examples in class ci, and nij of these ni examples contain feature/attribute ej, then:
  P(ej | ci) = nij / ni
- If the feature is continuous-valued, P(ej | ci) is usually computed using a Gaussian distribution with mean μ and standard deviation σ estimated from the class-ci training examples:
  P(ej | ci) = (1 / (σ sqrt(2π))) exp(-(ej - μ)² / (2σ²))
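Both estimates in a few lines of Python; the counts and feature values are illustrative:

```python
import math

# Discrete feature: nij of the ni class-ci examples contain ej
ni, nij = 9, 2
print("P(ej | ci) =", nij / ni)   # 2/9 ≈ 0.22

# Continuous feature: fit a Gaussian to the feature's values observed in class ci
values = [66.0, 70.0, 68.0, 75.0, 72.0]   # illustrative readings
mu = sum(values) / len(values)
sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def gaussian(x, mu, sigma):
    # density of a normal distribution with mean mu and std dev sigma, evaluated at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

print("P(ej = 71 | ci) ~", gaussian(71.0, mu, sigma))
```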

Smoothing
- Estimating probabilities from small training sets is error-prone:
  - If, due only to chance, a rare feature ek is always false in the training data for class ci, then P(ek | ci) = 0
  - If ek then occurs in a test example E, the result is that P(E | ci) = 0 and hence P(ci | E) = 0
- To account for estimation from small samples, probability estimates are adjusted, or smoothed
- Laplace smoothing using an m-estimate assumes that each feature is given a prior probability p, assumed to have been observed in a "virtual" sample of size m:
  P(ej | ci) = (nij + m p) / (ni + m)
- For binary features, p is simply assumed to be 0.5
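The m-estimate in code form, with illustrative counts:

```python
def m_estimate(nij, ni, p=0.5, m=1.0):
    """Smoothed estimate of P(ej | ci): (nij + m*p) / (ni + m)."""
    return (nij + m * p) / (ni + m)

print(m_estimate(0, 5))   # ≈ 0.083: a never-observed feature no longer gets probability 0
print(m_estimate(3, 5))   # ≈ 0.583 instead of the raw 3/5 = 0.6
```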

Naïve Bayesian Classifier - Example
- Here we have two classes, C1 = "yes" (positive) and C2 = "no" (negative)
- Pr("yes") = instances with "yes" / all instances = 9/14
- If a new instance X has outlook = "sunny", then Pr(outlook = "sunny" | "yes") = 2/9 (since there are 9 "yes" (positive) instances, of which 2 have outlook = "sunny")
- Similarly, for humidity = "high", Pr(humidity = "high" | "no") = 4/5
- And so on.

Naïve Bayes (Example Continued)
- Now, given the training set, we can compute all the required probabilities
- Suppose we have a new instance X = <sunny, mild, high, true>. How should it be classified?
- Pr(X | "yes") = 2/9 · 4/9 · 3/9 · 3/9
- Similarly, Pr(X | "no") = 3/5 · 2/5 · 4/5 · 3/5

Naïve Bayes (Example Continued)
- To find out which class X belongs to, we maximize Pr(X | Ci) · Pr(Ci) for each class Ci (here "yes" and "no"), with X = <sunny, mild, high, true>:
  Pr(X | "no") · Pr("no") = (3/5 · 2/5 · 4/5 · 3/5) · 5/14 ≈ 0.04
  Pr(X | "yes") · Pr("yes") = (2/9 · 4/9 · 3/9 · 3/9) · 9/14 ≈ 0.007
- To convert these scores to probabilities, we normalize by dividing each by their sum:
  Pr("no" | X) = 0.04 / (0.04 + 0.007) ≈ 0.85
  Pr("yes" | X) = 0.007 / (0.04 + 0.007) ≈ 0.15
- Therefore the new instance X is classified as "no".
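The same calculation as a short script, reproducing the numbers above:

```python
# Likelihoods read off the training data (see the previous slides)
p_x_given_yes = (2/9) * (4/9) * (3/9) * (3/9)
p_x_given_no = (3/5) * (2/5) * (4/5) * (3/5)

# Multiply by the priors Pr("yes") = 9/14 and Pr("no") = 5/14
score_yes = p_x_given_yes * 9 / 14   # ≈ 0.007
score_no = p_x_given_no * 5 / 14     # ≈ 0.04

total = score_yes + score_no
print("Pr(yes | X) ≈", score_yes / total)   # ≈ 0.15
print("Pr(no  | X) ≈", score_no / total)    # ≈ 0.85  ->  classify X as "no"
```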

Text Naïve Bayes – Spam Example
- Training data: P(no) = 0.4, P(yes) = 0.6 (the table of training emails with binary term features t1, ..., t5 appears in the slide image and is not reproduced here)
- A new email x contains the terms t1, t4, t5, i.e., x = <1, 0, 0, 1, 1>
- Should it be classified as spam = "yes" or spam = "no"?
- We need to find P(yes | x) and P(no | x) ...

Text Naïve Bayes - Example
- New email x containing t1, t4, t5: x = <1, 0, 0, 1, 1>, with P(no) = 0.4 and P(yes) = 0.6
- P(yes | x) = [4/6 · (1 - 4/6) · (1 - 3/6) · 3/6 · 2/6] · P(yes) / P(x)
            = [0.67 · 0.33 · 0.5 · 0.5 · 0.33] · 0.6 / P(x)
            ≈ 0.0111 / P(x)
- P(no | x) = [1/4 · (1 - 2/4) · (1 - 2/4) · 3/4 · 2/4] · P(no) / P(x)
            = [0.25 · 0.5 · 0.5 · 0.75 · 0.5] · 0.4 / P(x)
            ≈ 0.0094 / P(x)
- To get actual probabilities we normalize, noting that P(yes | x) + P(no | x) must be 1:
  0.0111 / P(x) + 0.0094 / P(x) = 1, so P(x) = 0.0111 + 0.0094 = 0.0205
- So: P(yes | x) = 0.0111 / 0.0205 ≈ 0.54 and P(no | x) = 0.0094 / 0.0205 ≈ 0.46
- Therefore x is classified as spam = "yes"
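And the same calculation as a short script, using the per-term probabilities quoted above:

```python
p_yes, p_no = 0.6, 0.4

# Per-term probabilities from the training data; (1 - p) is used for terms absent from x
p_x_given_yes = (4/6) * (1 - 4/6) * (1 - 3/6) * (3/6) * (2/6)
p_x_given_no = (1/4) * (1 - 2/4) * (1 - 2/4) * (3/4) * (2/4)

score_yes = p_x_given_yes * p_yes   # ≈ 0.0111
score_no = p_x_given_no * p_no      # ≈ 0.0094

p_x = score_yes + score_no          # ≈ 0.0205
print("P(yes | x) ≈", score_yes / p_x)   # ≈ 0.54
print("P(no  | x) ≈", score_no / p_x)    # ≈ 0.46  ->  classify x as spam = "yes"
```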
