Lecture outline
– Classification
– Naïve Bayes classifier
– Nearest-neighbor classifier

Eager vs. lazy learners
– Eager learners: learn the model as soon as the training data becomes available.
– Lazy learners: delay model building until test data needs to be classified (e.g., the rote classifier, which memorizes the entire training data).

k-nearest neighbor classifiers
– The k-nearest neighbors of a record x are the k data points that have the smallest distances to x.

k-nearest neighbor classification
– Given a data record x, find its k closest training points.
  – Closeness: Euclidean, Hamming, or Jaccard distance.
– Determine the class of x from the classes in its neighbor list.
  – Majority vote.
  – Vote weighted by distance, e.g., with weight factor w = 1/d².
  – Probabilistic voting.
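To make the procedure concrete, here is a minimal k-NN sketch in Python (not from the original slides; function and variable names are illustrative). It uses Euclidean distance with either plain majority voting or 1/d² distance weighting:

```python
import math
from collections import Counter

def knn_classify(x, training_data, k=3, weighted=False):
    """Classify record x by majority (or distance-weighted) vote
    among its k nearest training points, using Euclidean distance."""
    # Sort training records (feature-vector, label) by distance to x.
    neighbors = sorted(training_data, key=lambda rec: math.dist(x, rec[0]))[:k]

    votes = Counter()
    for features, label in neighbors:
        d = math.dist(x, features)
        # Weighted voting: closer neighbors count more (w = 1/d^2);
        # the small epsilon guards against division by zero.
        votes[label] += 1.0 / (d ** 2 + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]

# Toy usage: two classes in 2-D.
train = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify((2, 2), train, k=3))  # -> "A"
```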

Characteristics of nearest-neighbor classifiers
– An instance of instance-based learning: no model is built (lazy learning).
  – Lazy learners: computational cost is incurred at classification time.
  – Eager learners: computational cost is incurred at model-building time.
– Decision trees try to find global models; k-NN takes only local information into account.
– k-NN classifiers depend heavily on the choice of proximity measure.

Bayes Theorem
– X, Y: random variables.
– Joint probability: Pr(X = x, Y = y).
– Conditional probability: Pr(Y = y | X = x).
– Relationship between the joint and conditional distributions: Pr(X, Y) = Pr(Y | X) Pr(X) = Pr(X | Y) Pr(Y).
– Bayes theorem: Pr(Y | X) = Pr(X | Y) Pr(Y) / Pr(X).
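A quick numerical sanity check of the identity (not from the original slides; the toy joint distribution is made up):

```python
# Toy joint distribution over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

pr_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
pr_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

# Directly from the joint: Pr(Y=1 | X=1) = Pr(X=1, Y=1) / Pr(X=1)
direct = joint[(1, 1)] / pr_x[1]

# Via Bayes theorem: Pr(Y=1 | X=1) = Pr(X=1 | Y=1) Pr(Y=1) / Pr(X=1)
via_bayes = (joint[(1, 1)] / pr_y[1]) * pr_y[1] / pr_x[1]

assert abs(direct - via_bayes) < 1e-12
print(direct)  # 0.4 / 0.7 ≈ 0.571
```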

Bayes Theorem for Classification
– X: attribute set; Y: class variable.
– Y depends on X in a non-deterministic way.
– We capture this dependence using Pr(Y | X), the posterior probability, as opposed to Pr(Y), the prior probability.

Building the Classifier
– Training phase: learn the posterior probabilities Pr(Y | X) for every combination of X and Y from the training data.
– Test phase: for a test record X', assign the class Y' that maximizes the posterior probability Pr(Y' | X').

Bayes Classification: Example
– X' = (HomeOwner = No, MaritalStatus = Married, AnnualIncome = 120K)
– Compute Pr(Yes | X') and Pr(No | X') and pick the class with the larger posterior.
– How can we compute these probabilities?

Computing posterior probabilities
– Bayes theorem: Pr(Y | X) = Pr(X | Y) Pr(Y) / Pr(X).
– Pr(X) is the same for every class, so it can be ignored when comparing posteriors.
– Pr(Y): estimated from the training data as the fraction of training records in each class.
– Pr(X | Y)?
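The resulting decision rule is a simple argmax; a minimal sketch (the names prior and likelihood are illustrative, and the tables they stand for would come from training):

```python
def map_class(x, classes, prior, likelihood):
    """Pick the class y maximizing Pr(Y=y) * Pr(X=x | Y=y).
    Pr(X=x) is omitted: it is identical for every candidate class."""
    return max(classes, key=lambda y: prior[y] * likelihood(x, y))
```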

Naïve Bayes Classifier
– Attribute set X = {X1, …, Xd} consists of d attributes.
– Conditional independence: X is conditionally independent of Y given Z when Pr(X | Y, Z) = Pr(X | Z), or equivalently Pr(X, Y | Z) = Pr(X | Z) × Pr(Y | Z).

Naïve Bayes Classifier
– Attribute set X = {X1, …, Xd} consists of d attributes.
– Naïve assumption: the attributes are conditionally independent given the class, so Pr(X | Y = y) = Pr(X1 | Y = y) × … × Pr(Xd | Y = y).
– To classify, choose the y that maximizes Pr(Y = y) × Pr(X1 | Y = y) × … × Pr(Xd | Y = y).

Conditional probabilities for categorical attributes
– For a categorical attribute Xi, Pr(Xi = xi | Y = y) is the fraction of training instances of class y that take value xi on the i-th attribute.
– Examples: Pr(HomeOwner = Yes | No) = 3/7, Pr(MaritalStatus = Single | Yes) = 2/3.
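A minimal sketch of estimating these fractions by counting and combining them with the naïve product rule (function and variable names are illustrative, not from the slides):

```python
from collections import Counter, defaultdict

def train_nb(records, labels):
    """Estimate Pr(Y) and Pr(Xi = xi | Y = y) from categorical data."""
    prior = {y: c / len(labels) for y, c in Counter(labels).items()}
    class_counts = Counter(labels)
    counts = defaultdict(int)
    for rec, y in zip(records, labels):
        for i, xi in enumerate(rec):
            counts[(i, xi, y)] += 1
    # cond[(i, xi, y)] = fraction of class-y records with value xi on attribute i
    cond = {key: c / class_counts[key[2]] for key, c in counts.items()}
    return prior, cond

def classify_nb(x, prior, cond):
    """Return the class maximizing Pr(Y=y) * prod_i Pr(Xi = xi | Y=y)."""
    def score(y):
        s = prior[y]
        for i, xi in enumerate(x):
            s *= cond.get((i, xi, y), 0.0)  # unseen value -> probability 0
        return s
    return max(prior, key=score)
```

Note the cond.get(..., 0.0): an attribute value never observed with a class zeroes out the whole product, which is exactly the problem the m-estimate correction below addresses.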

Estimating conditional probabilities for continuous attributes
– Discretization: turn the attribute into a categorical one by binning its range. But how should we discretize?
– A common alternative (used in the worked example below): assume the attribute is normally distributed within each class and estimate the mean and variance from the training data.
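Under that normal assumption, the class-conditional value for a continuous attribute is the Gaussian density; a small sketch (the mean and variance shown are assumed, illustrative estimates for one class):

```python
import math

def gaussian_density(x, mean, var):
    """Density of N(mean, var) at x, used in place of Pr(Xi = x | Y)
    for a continuous attribute under the normal assumption."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# E.g., with mean 110 and variance 2975 estimated from one class's
# AnnualIncome values (assumed numbers):
print(gaussian_density(120, mean=110, var=2975))  # ≈ 0.0072
```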

Naïve Bayes Classifier: Example
– X' = (HomeOwner = No, MaritalStatus = Married, Income = 120K)
– We need Pr(Y | X'), i.e., Pr(Y) × Pr(X' | Y) up to a constant factor.
– Y = No: Pr(X' | No) = Pr(HO = No | No) × Pr(MS = Married | No) × Pr(Inc = 120K | No) = 4/7 × 4/7 × 0.0072 ≈ 0.0024
– Y = Yes: Pr(X' | Yes) = Pr(HO = No | Yes) × Pr(MS = Married | Yes) × Pr(Inc = 120K | Yes) = 1 × 0 × 1.2×10⁻⁹ = 0

Naïve Bayes Classifier: Example (continued)
– X' = (HomeOwner = No, MaritalStatus = Married, Income = 120K)
– But Pr(X' | Y = Yes) = 0: a single zero conditional probability wipes out the entire product, so Yes could never be predicted.
– Correction (m-estimate): Pr(Xi = xi | Y = yj) = (nc + m·p) / (n + m), where
  – nc: number of training examples of class yj that take value xi
  – n: total number of training instances of class yj
  – m: equivalent sample size (balances the prior against the observed data)
  – p: user-specified parameter (prior estimate of the probability)
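A sketch of the correction (names are illustrative; with p = 1/v for an attribute with v values and m = v, it reduces to Laplace smoothing):

```python
def m_estimate(n_c, n, m, p):
    """Smoothed estimate of Pr(Xi = xi | Y = yj): n_c matches out of
    n class instances, blended with prior p of equivalent sample size m."""
    return (n_c + m * p) / (n + m)

# Pr(MaritalStatus = Married | Yes) was 0/3; assuming three possible
# marital-status values, m = 3 and a uniform prior p = 1/3 give:
print(m_estimate(0, 3, 3, 1/3))  # 1/6 ≈ 0.167
```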

Characteristics of Naïve Bayes Classifier
– Robust to isolated noise points: they are averaged out during probability estimation.
– Handles missing values: instances with missing values are ignored when estimating the probabilities.
– Robust to irrelevant attributes: if Xi is irrelevant, Pr(Xi | Y) is almost uniform across classes and does not affect the comparison.
– Correlated attributes violate the independence assumption and degrade the classifier's performance.