Naive Bayes Classifiers, an Overview. By Roozmehr Safi.

What is the Naive Bayes Classifier (NBC)? NBC is a probabilistic classification method. Classification (a.k.a. discrimination, or supervised learning) means assigning new cases to one of a set of pre-defined classes, given a sample of cases for which the true classes are known. NBC is one of the oldest and simplest classification methods.

Some NBC Applications: credit scoring, marketing, employee selection, image processing, speech recognition, search engines, and more.

How does NBC Work? NBC applies Bayes' theorem together with the (naive) assumption that the features are independent given the class. A more descriptive term for it would be "independent feature model".

How does NBC work, continued. Let X1, …, Xm denote our features (height, weight, foot size, …), let Y be the class number (1 for men, 2 for women), and let C be the number of classes (here 2). The problem consists of assigning the case (x1, …, xm) to the class c that maximizes P(Y=c | X1=x1, …, Xm=xm) over c = 1, …, C. Applying Bayes' rule gives: P(Y=c | X1=x1, …, Xm=xm) = P(X1=x1, …, Xm=xm | Y=c) P(Y=c) / P(X1=x1, …, Xm=xm). Under NB's assumption of conditional independence, P(X1=x1, …, Xm=xm | Y=c) is replaced by the product P(X1=x1 | Y=c) × … × P(Xm=xm | Y=c). Since the denominator is the same for every class, NB reduces the original problem to finding the class c that maximizes P(Y=c) × P(X1=x1 | Y=c) × … × P(Xm=xm | Y=c).

An example: P(Observed Height | Male) = a, P(Observed Weight | Male) = b, P(Observed Foot size | Male) = c, so P(Male | observed case) ∝ P(Male) × a × b × c. Likewise, P(Observed Height | Female) = d, P(Observed Weight | Female) = e, P(Observed Foot size | Female) = f, so P(Female | observed case) ∝ P(Female) × d × e × f. Pick the class whose score is larger.
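To make the arithmetic concrete, here is a minimal Python sketch of this height/weight/foot-size example. It assumes Gaussian class-conditional densities; the priors, means, and variances below are made-up illustrative numbers, not values from the slides.

```python
from math import pi, sqrt, exp

def gaussian_pdf(x, mean, var):
    """Gaussian density, used as the class-conditional likelihood P(X = x | Y)."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

# Made-up per-class priors, means, and variances standing in for values
# that would be estimated from training data: feature -> (mean, variance).
params = {
    "Male":   {"prior": 0.5, "height": (178.0, 35.0), "weight": (80.0, 90.0), "foot": (27.0, 1.0)},
    "Female": {"prior": 0.5, "height": (165.0, 30.0), "weight": (62.0, 70.0), "foot": (23.0, 1.0)},
}

def classify(height, weight, foot):
    """Score each class with P(class) * a * b * c and return the larger one."""
    scores = {}
    for cls, p in params.items():
        scores[cls] = (p["prior"]
                       * gaussian_pdf(height, *p["height"])
                       * gaussian_pdf(weight, *p["weight"])
                       * gaussian_pdf(foot, *p["foot"]))
    return max(scores, key=scores.get), scores

print(classify(172, 70, 26))  # pick the class whose score is larger
```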

NBC advantages: Despite its unrealistic independence assumption, NBC is remarkably successful even when independence is violated. Due to its simple structure, NBC is appealing when the set of variables is large. NBC also requires only a small amount of training data:
– It only needs to estimate the mean and variance of each variable within each class.
– There is no need to form a covariance matrix.
– It is computationally inexpensive.
A minimal fitting sketch is shown below.
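The sketch below is an illustration rather than part of the original slides (the function name and toy data are assumptions). It shows how little needs to be estimated: per-class priors plus one mean and one variance per feature, with no covariance matrix.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate everything Gaussian NB needs: per-class priors, and the mean and
    variance of each feature within each class (no covariance matrix)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = {
            "prior": len(Xc) / len(X),
            "mean": Xc.mean(axis=0),   # one mean per feature
            "var": Xc.var(axis=0),     # one variance per feature
        }
    return model

# Tiny example: rows are (height, weight, foot size); labels 1 = male, 2 = female.
X = [[180, 81, 27], [176, 78, 26], [164, 60, 23], [166, 63, 24]]
y = [1, 1, 2, 2]
print(fit_gaussian_nb(X, y))
```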

A Demonstration
Data: from an online B2B exchange (1,220 cases).
Purpose: to distinguish cheating sellers from good sellers.
Predictors:
– Member type: enterprise, personal, other
– Years since joining: 1 to 10 years
– Number of months since the last membership renewal
– Membership renewal duration
– Type of service bought: standard, limited edition, …
– Whether the member has a registered company
– Whether the company page is decorated
– Number of days the member logged in during the past 60 days
– Industry: production, distribution, investment, …
Target: to predict whether a seller is likely to cheat buyers, based on data from old sellers.

Issues involved: Probability distributions. With discrete (categorical) features, the probabilities can be estimated using frequency counts. With continuous features, one can assume a particular parametric form for the probability distribution (e.g., Gaussian). There is evidence that discretizing continuous data before applying NB is effective. Equal Frequency Discretization (EFD) divides the sorted values of a continuous variable into k equally populated bins.
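As a minimal sketch of EFD (the function name and sample heights below are illustrative assumptions, not from the slides), the bin edges can be taken as quantiles of the data so that each bin receives roughly the same number of cases:

```python
import numpy as np

def equal_frequency_bins(values, k=5):
    """Equal Frequency Discretization: split the sorted values of a continuous
    variable into k bins with (roughly) equal counts."""
    values = np.asarray(values, dtype=float)
    # Bin edges are the 0/k, 1/k, ..., k/k quantiles of the data.
    edges = np.quantile(values, np.linspace(0, 1, k + 1))
    # Map each value to a bin index 0..k-1 (clip so the maximum lands in the last bin).
    bins = np.clip(np.searchsorted(edges, values, side="right") - 1, 0, k - 1)
    return bins, edges

# Example: discretize eight heights into 4 equally populated bins.
heights = [150, 152, 160, 161, 170, 171, 180, 190]
bins, edges = equal_frequency_bins(heights, k=4)
print(bins)   # [0 0 1 1 2 2 3 3]
print(edges)  # quantile-based bin boundaries
```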

Issues involved: Zero probabilities. When a class and a feature value never occur together in the training set, a problem arises: assigning a probability of zero to one of the terms causes the whole product to evaluate to zero. The zero probability can instead be replaced by a small constant, such as 0.5/n, where n is the number of observations in the training set.
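A small sketch of this fix, assuming frequency-count estimation over a toy dataset (the helper name and the data are illustrative, not from the demonstration above):

```python
def conditional_prob(x_value, c, rows):
    """Estimate P(X = x_value | Y = c) from frequency counts.

    rows: list of (x, y) training pairs. A zero count is replaced by 0.5 / n,
    where n is the total number of training observations, as suggested above.
    """
    n = len(rows)
    in_class = [x for x, y in rows if y == c]
    count = sum(1 for x in in_class if x == x_value)
    if count == 0:
        return 0.5 / n   # small constant instead of an exact zero
    return count / len(in_class)

# Example: the value "other" never occurs with class 1 in this toy data.
data = [("enterprise", 1), ("personal", 1), ("enterprise", 1), ("other", 0), ("personal", 0)]
print(conditional_prob("other", 1, data))       # 0.5 / 5 = 0.1 instead of 0
print(conditional_prob("enterprise", 1, data))  # 2 / 3
```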

Issues involved: Missing values. In some applications, values are missing not at random, and their absence can itself be meaningful; in that case, missing values are treated as a separate category. If one does not want to treat missing values as a separate category, they should be handled before applying the classifier, either by imputing the missing values or by excluding the cases in which they occur.
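A brief illustration with pandas (the column and its values are made up): the first option keeps missing entries as their own category before counting frequencies, the second removes them before training.

```python
import pandas as pd

# Toy predictor column with missing entries (values are illustrative).
member_type = pd.Series(["enterprise", None, "personal", "other", None])

# Option 1: treat missing values as their own category.
as_category = member_type.fillna("missing")
print(as_category.value_counts())

# Option 2: drop (or impute) cases with missing values before applying NB.
complete_only = member_type.dropna()
print(complete_only.value_counts())
```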

Thank you