Lecture Slides for INTRODUCTION TO Machine Learning, Ethem Alpaydın © The MIT Press, 2004.

Presentation transcript:

Lecture Slides for INTRODUCTION TO Machine Learning, Ethem Alpaydın © The MIT Press

Outline
– Discriminant Functions
– Learning Association Rules
– Naïve Bayes Classifier
– Example: Play Tennis
– Relevant Issues
– Conclusions

What is a Discriminant Function?
For a classification problem, define for each class Ci a function gi(x) such that we choose Ci if gi(x) = max_k gk(x).

Discriminant Functions
The K discriminant functions divide the input space into K decision regions R1, ..., RK, where Ri = {x | gi(x) = max_k gk(x)}.

K = 2 Classes
With two classes a single discriminant suffices: g(x) = g1(x) − g2(x), and we choose C1 if g(x) > 0, C2 otherwise. A classifier for K = 2 is a dichotomizer; for K > 2, a polychotomizer. Using posteriors, g(x) can be taken as the log odds: log [P(C1|x) / P(C2|x)].
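As an illustration (a sketch not from the original slides), classification with discriminant functions is just an argmax over the gi; the linear form of g1 and g2 below and all numbers are assumptions for the example:

```python
def predict(x, discriminants):
    """Choose the class whose discriminant g_i(x) is largest."""
    scores = [g(x) for g in discriminants]
    return scores.index(max(scores))

# Illustrative linear discriminants g_i(x) = w_i . x + w_i0 (the linear form is an assumption)
def g1(x): return 1.0 * x[0] - 0.5 * x[1] + 0.2
def g2(x): return -0.3 * x[0] + 0.8 * x[1] - 0.1

x = (0.4, 0.9)
print(predict(x, [g1, g2]))      # 1, i.e. class C2 wins for this x
print("g(x) =", g1(x) - g2(x))   # K = 2 dichotomizer: choose C1 if g(x) > 0
```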

Problem: Association Rule Mining
Input: a set of transactions D.
Objective: generate all association rules whose support and confidence exceed the user-specified minimum support and minimum confidence; minimize computation time by pruning.
Constraint: items are kept in lexicographical order.
Example association rules: {Diaper} → {Beer}; {Milk, Bread} → {Eggs, Coke}; {Beer, Bread} → {Milk}.
Real-world applications: NCR (Teradata) does association rule mining for more than 20 large retail organizations, including Walmart; also used for pattern discovery in biological databases.

Association Rules
An association rule has the form X → Y.
Support(X → Y): the fraction of transactions that contain both X and Y, i.e. P(X, Y).
Confidence(X → Y): the fraction of transactions containing X that also contain Y, i.e. P(Y | X) = P(X, Y) / P(X).
Frequent itemsets and rules are found with the Apriori algorithm (Agrawal et al., 1996).
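As a concrete illustration (my own sketch, not part of the deck), support and confidence can be computed directly from a list of transactions; the toy database below is an assumption:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """support(X and Y) / support(X), i.e. the estimate of P(Y | X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

# Assumed toy transaction database
transactions = [{"Milk", "Bread", "Eggs"},
                {"Diaper", "Beer", "Bread"},
                {"Milk", "Diaper", "Beer"},
                {"Diaper", "Beer"}]
print(support({"Diaper", "Beer"}, transactions))        # 0.75
print(confidence({"Diaper"}, {"Beer"}, transactions))   # 1.0
```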

Apriori Algorithm: Breadth-First Search
(Figure: the itemset lattice, expanded level by level from the empty set {} through 1-itemsets such as {a}, {b} to 2-itemsets such as {a, b}, {a, d}, {b, d}, {c, d}.)

Apriori Algorithm: Example
If the minimum support is 50%, then {Shoes, Jacket} is the only 2-itemset that satisfies the minimum support. If the minimum confidence is 50%, the only two rules generated from this 2-itemset that have confidence greater than 50% are:
– Shoes → Jacket (support = 50%, confidence = 66%)
– Jacket → Shoes (support = 50%, confidence = 100%)

The Apriori Algorithm — Example
(Figure: starting from database D with minimum support 50%, each pass scans D to count a candidate set Ck and keeps the frequent itemsets Lk; the table shows C1 → L1 → C2 → L2 → C3 → L3.)
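A compact sketch of the level-wise Apriori search (my own simplification: candidate pruning by infrequent subsets is omitted, and the Shoes/Jacket transaction database is an assumed toy example consistent with the 50% minimum support above):

```python
def apriori(transactions, min_support=0.5):
    """Level-wise (breadth-first) search: frequent k-itemsets are built only from frequent (k-1)-itemsets."""
    n = len(transactions)

    def frequent_only(candidates):
        return {c for c in candidates
                if sum(c <= t for t in transactions) / n >= min_support}

    level = frequent_only({frozenset([i]) for t in transactions for i in t})   # L1
    frequent, k = [], 2
    while level:
        frequent.extend(sorted(level, key=sorted))
        # candidate generation: join pairs of frequent (k-1)-itemsets, keep the size-k unions
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = frequent_only(candidates)                                       # Lk
        k += 1
    return frequent

# Assumed toy database matching the Shoes/Jacket example above
db = [frozenset(t) for t in ({"Shoes", "Jacket"}, {"Shoes"}, {"Shoes", "Jacket"}, {"Jacket", "Hat"})]
print(apriori(db))   # frequent itemsets at 50% support: {Jacket}, {Shoes}, {Jacket, Shoes}
```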

Naive Bayes' Classifier
Given the class C, the attributes xj are assumed independent:
p(x|C) = p(x1|C) p(x2|C) ... p(xd|C)

Background
There are three ways to establish a classifier:
a) Model a classification rule directly. Examples: k-NN, decision trees, perceptron, SVM.
b) Model the probability of class membership given the input data. Example: a multi-layer perceptron with the cross-entropy cost.
c) Build a probabilistic model of the data within each class. Examples: naive Bayes, model-based classifiers.
a) and b) are examples of discriminative classification; c) is an example of generative classification; b) and c) are both examples of probabilistic classification.

Probability Basics
Prior, conditional, and joint probability:
– Prior probability: P(C)
– Conditional probability: P(x|C) = P(x, C) / P(C)
– Joint probability: P(x, C)
– Relationship: P(x, C) = P(x|C) P(C) = P(C|x) P(x)
– Independence: P(x|C) = P(x), equivalently P(x, C) = P(x) P(C)
Bayes rule: P(C|x) = P(x|C) P(C) / P(x), i.e. posterior = likelihood × prior / evidence.
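To make the identities concrete, here is a tiny numerical check (all numbers are assumed toy values, not from the slides) that the joint, conditional, and Bayes-rule relationships above agree:

```python
# Assumed toy numbers: two classes C in {0, 1} and one binary attribute x in {0, 1}
P_C = {0: 0.6, 1: 0.4}                               # prior P(C)
P_x_given_C = {(0, 0): 0.8, (1, 0): 0.2,             # conditional P(x | C)
               (0, 1): 0.3, (1, 1): 0.7}

# relationship: joint P(x, C) = P(x|C) P(C)
P_joint = {(x, c): P_x_given_C[(x, c)] * P_C[c] for x in (0, 1) for c in (0, 1)}

# evidence: P(x) = sum over C of P(x, C)
P_x = {x: P_joint[(x, 0)] + P_joint[(x, 1)] for x in (0, 1)}

# Bayes rule: P(C|x) = P(x|C) P(C) / P(x)
posterior = {(c, x): P_joint[(x, c)] / P_x[x] for x in (0, 1) for c in (0, 1)}
print(posterior[(1, 1)])   # P(C=1 | x=1) = 0.7*0.4 / (0.2*0.6 + 0.7*0.4) ≈ 0.7
```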

Probabilistic Classification
Establishing a probabilistic model for classification:
– Discriminative model: model the posterior P(C|X) directly.
– Generative model: model the class-conditional P(X|C) together with the prior P(C).
MAP classification rule:
– MAP: Maximum A Posteriori.
– Assign x to c* if P(C = c*|X = x) > P(C = c|X = x) for all c ≠ c*.
Generative classification with the MAP rule:
– Apply Bayes rule to convert the posterior: P(C|X) = P(X|C) P(C) / P(X) ∝ P(X|C) P(C).

Naïve Bayes
Bayes classification: P(C|X) ∝ P(X|C) P(C) = P(X1, ..., Xn|C) P(C).
Difficulty: learning the joint probability P(X1, ..., Xn|C).
Naïve Bayes classification:
– Assume all input attributes are conditionally independent given the class: P(X1, ..., Xn|C) = P(X1|C) · P(X2|C) · ... · P(Xn|C).
– MAP classification rule: assign x = (x1, ..., xn) to c* if [P(x1|c*) ... P(xn|c*)] P(c*) > [P(x1|c) ... P(xn|c)] P(c) for all c ≠ c*.

Naïve Bayes Algorithm (for discrete input attributes)
– Learning phase: given a training set S, estimate P(C = ci) for each class and P(Xj = xjk | C = ci) for every attribute value; the output is a set of conditional probability tables.
– Test phase: given an unknown instance X' = (a'1, ..., a'n), look up the tables and assign the label c* to X' if [P(a'1|c*) ... P(a'n|c*)] P(c*) > [P(a'1|c) ... P(a'n|c)] P(c) for all c ≠ c*.
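A minimal sketch of the learning phase (not from the slides): the conditional probability tables are just relative frequencies counted per attribute and class. The tiny two-attribute training set is an assumption for illustration.

```python
from collections import Counter, defaultdict

def learn_naive_bayes(examples):
    """Learning phase: estimate P(C) and P(Xj = a_jk | C) from (features, label) pairs."""
    n = len(examples)
    class_counts = Counter(label for _, label in examples)
    cond_counts = defaultdict(Counter)            # (attribute index j, class) -> value counts
    for features, label in examples:
        for j, value in enumerate(features):
            cond_counts[(j, label)][value] += 1
    prior = {c: k / n for c, k in class_counts.items()}
    cpt = {(j, c): {v: k / class_counts[c] for v, k in counts.items()}
           for (j, c), counts in cond_counts.items()}
    return prior, cpt

# Tiny assumed training set: (Outlook, Wind) -> Play
data = [(("Sunny", "Weak"), "No"), (("Overcast", "Weak"), "Yes"),
        (("Rain", "Strong"), "No"), (("Overcast", "Strong"), "Yes")]
prior, cpt = learn_naive_bayes(data)
print(prior)                 # {'No': 0.5, 'Yes': 0.5}
print(cpt[(0, "Yes")])       # {'Overcast': 1.0}, i.e. P(Outlook=Overcast | Yes) = 1
```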

Example: Play Tennis
(Table: the 14-example Play Tennis training set with attributes Outlook, Temperature, Humidity, Wind and the label PlayTennis.)

Example: Learning Phase
P(Play=Yes) = 9/14, P(Play=No) = 5/14

Outlook     | Play=Yes | Play=No
Sunny       | 2/9      | 3/5
Overcast    | 4/9      | 0/5
Rain        | 3/9      | 2/5

Temperature | Play=Yes | Play=No
Hot         | 2/9      | 2/5
Mild        | 4/9      | 2/5
Cool        | 3/9      | 1/5

Humidity    | Play=Yes | Play=No
High        | 3/9      | 4/5
Normal      | 6/9      | 1/5

Wind        | Play=Yes | Play=No
Strong      | 3/9      | 3/5
Weak        | 6/9      | 2/5

Example: Test Phase
Given a new instance x' = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong).
Look up the tables:
– P(Outlook=Sunny|Play=Yes) = 2/9, P(Outlook=Sunny|Play=No) = 3/5
– P(Temperature=Cool|Play=Yes) = 3/9, P(Temperature=Cool|Play=No) = 1/5
– P(Humidity=High|Play=Yes) = 3/9, P(Humidity=High|Play=No) = 4/5
– P(Wind=Strong|Play=Yes) = 3/9, P(Wind=Strong|Play=No) = 3/5
– P(Play=Yes) = 9/14, P(Play=No) = 5/14
MAP rule:
P(Yes|x') ∝ [P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)] P(Play=Yes) ≈ 0.0053
P(No|x') ∝ [P(Sunny|No) P(Cool|No) P(High|No) P(Strong|No)] P(Play=No) ≈ 0.0206
Since P(Yes|x') < P(No|x'), we label x' as "No".
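A short sketch (not part of the slides) that reproduces this test-phase computation directly from the learning-phase tables above:

```python
from functools import reduce

# Conditional probability tables and priors from the learning phase above
prior = {"Yes": 9/14, "No": 5/14}
cpt = {
    "Yes": {"Outlook=Sunny": 2/9, "Temperature=Cool": 3/9,
            "Humidity=High": 3/9, "Wind=Strong": 3/9},
    "No":  {"Outlook=Sunny": 3/5, "Temperature=Cool": 1/5,
            "Humidity=High": 4/5, "Wind=Strong": 3/5},
}

x_new = ["Outlook=Sunny", "Temperature=Cool", "Humidity=High", "Wind=Strong"]

# MAP rule: score(c) = P(c) * product over j of P(x_j | c); pick the class with the largest score
scores = {c: prior[c] * reduce(lambda acc, f: acc * cpt[c][f], x_new, 1.0) for c in prior}
print(scores)                        # {'Yes': ~0.0053, 'No': ~0.0206}
print(max(scores, key=scores.get))   # 'No'
```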


Relevant Issues
Violation of the independence assumption:
– For many real-world tasks, P(X1, ..., Xn|C) ≠ P(X1|C) ... P(Xn|C).
– Nevertheless, naïve Bayes works surprisingly well anyway.
Zero conditional probability problem:
– If no training example contains the attribute value Xj = a_jk with class c, then P(a_jk|c) = 0, and during testing the whole product P(x1|c) ... P(a_jk|c) ... P(xn|c) becomes 0.
– As a remedy, conditional probabilities are estimated with the m-estimate: P(a_jk|c) = (n_c + m p) / (n + m), where n is the number of training examples with class c, n_c the number of those with Xj = a_jk, p a prior estimate (usually p = 1/t for t values of Xj), and m the equivalent sample size (the weight of the prior).
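A sketch of the m-estimate remedy (the choice of m and of the prior p = 1/t below are assumptions for the example):

```python
def m_estimate(n_c, n, t, m=1.0):
    """Smoothed estimate of P(Xj = a_jk | C = c).

    n_c : number of training examples with Xj = a_jk and class c
    n   : number of training examples with class c
    t   : number of distinct values of attribute Xj (prior p = 1/t)
    m   : equivalent sample size (m = t together with p = 1/t gives Laplace smoothing)
    """
    p = 1.0 / t
    return (n_c + m * p) / (n + m)

# Outlook=Overcast never occurs together with Play=No in the 14 training examples:
print(m_estimate(0, 5, 3, m=3))   # 0.125 instead of 0, so the product no longer collapses
```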

Relevant Issues (continued)
Continuous-valued input attributes:
– An attribute may take numberless (continuous) values, so probability tables cannot be used.
– The conditional probability is instead modeled with the normal distribution: P(Xj|C = ci) = 1/(sqrt(2π) σ_ji) · exp(−(Xj − μ_ji)² / (2 σ_ji²)).
– Learning phase: for n attributes and L classes, output n × L normal distributions (μ_ji, σ_ji) together with the priors P(C = ci).
– Test phase: calculate the conditional probabilities of the unknown instance with the estimated normal distributions and apply the MAP rule to make a decision.
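A minimal sketch of the continuous case (the temperature readings are assumed sample data): estimate one normal distribution per attribute and class, then plug its density into the same MAP rule.

```python
import math

def fit_gaussian(values):
    """Estimate mean and standard deviation of one attribute within one class."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)

def normal_pdf(x, mu, sigma):
    """Estimated P(Xj = x | C = c) under the normal assumption."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Assumed temperature readings observed for each class (illustrative numbers)
temps = {"Yes": [21.0, 23.5, 22.0, 20.5], "No": [30.0, 31.5, 29.0]}
params = {c: fit_gaussian(v) for c, v in temps.items()}

x = 24.0
likelihoods = {c: normal_pdf(x, *params[c]) for c in params}
print(likelihoods)   # feed these into the MAP rule together with the class priors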

Conclusions
Naïve Bayes is based on the independence assumption:
– Training is very easy and fast: each attribute is considered separately within each class.
– Testing is straightforward: just look up the tables or compute conditional probabilities with the estimated normal distributions.
A popular generative model:
– Performance is competitive with most state-of-the-art classifiers even when the independence assumption is violated.
– Many successful applications, e.g., spam mail filtering.
– Apart from classification, naïve Bayes can do more…