
Data Mining: Practical Machine Learning Tools and Techniques
Chapter 4: Algorithms: The Basic Methods
Section 4.2: Statistical Modeling
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Algorithms: The Basic Methods
● Inferring rudimentary rules
● Naïve Bayes, probabilistic model
● Constructing decision trees
● Constructing rules
● Association rule learning
● Linear models
● Instance-based learning
● Clustering

Probabilistic Modeling
● “Opposite” of 1R: use all the attributes
● Two assumptions: attributes are
   Equally important
   Statistically independent (given the class value)
● I.e., knowing the value of one attribute says nothing about the value of another (if the class is known)
● The independence assumption is never correct!
● But … this scheme works well in practice

Probabilities for Weather Data

Counts and relative frequencies for each attribute value, given the class:

  Outlook          Yes    No
    Sunny          2/9    3/5
    Overcast       4/9    0/5
    Rainy          3/9    2/5
  Temperature
    Hot            2/9    2/5
    Mild           4/9    2/5
    Cool           3/9    1/5
  Humidity
    High           3/9    4/5
    Normal         6/9    1/5
  Windy
    False          6/9    2/5
    True           3/9    3/5
  Play             9/14   5/14

The underlying weather data:

  Outlook    Temp.  Humidity  Windy  Play
  Sunny      Hot    High      False  No
  Sunny      Hot    High      True   No
  Overcast   Hot    High      False  Yes
  Rainy      Mild   High      False  Yes
  Rainy      Cool   Normal    False  Yes
  Rainy      Cool   Normal    True   No
  Overcast   Cool   Normal    True   Yes
  Sunny      Mild   High      False  No
  Sunny      Cool   Normal    False  Yes
  Rainy      Mild   Normal    False  Yes
  Sunny      Mild   Normal    True   Yes
  Overcast   Mild   High      True   Yes
  Overcast   Hot    Normal    False  Yes
  Rainy      Mild   High      True   No
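As a side note (not part of the original slides), the count table above can be reproduced with a short Python sketch; the dataset literal and the variable names below are mine:

```python
from collections import Counter

# The 14 weather instances: (outlook, temp, humidity, windy, play)
data = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

# Count each attribute-value/class pair, e.g. ("outlook", "sunny", "yes") -> 2
attrs = ["outlook", "temp", "humidity", "windy"]
counts = Counter()
class_counts = Counter()
for *values, play in data:
    class_counts[play] += 1
    for attr, value in zip(attrs, values):
        counts[(attr, value, play)] += 1

print(counts[("outlook", "sunny", "yes")], "/", class_counts["yes"])  # 2 / 9
print(counts[("outlook", "sunny", "no")], "/", class_counts["no"])    # 3 / 5
```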

Probabilities for Weather Data
● A new day:

  Outlook  Temp.  Humidity  Windy  Play
  Sunny    Cool   High      True   ?

Likelihood of the two classes:
  For “yes” = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
  For “no” = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206
Conversion into a probability by normalization:
  P(“yes”) = 0.0053 / (0.0053 + 0.0206) = 0.205
  P(“no”) = 0.0206 / (0.0053 + 0.0206) = 0.795
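A minimal sketch of the same arithmetic in Python (the factors are read off the frequency table above; this code is not from the original slides):

```python
# Likelihoods for the new day (sunny, cool, high humidity, windy)
like_yes = 2/9 * 3/9 * 3/9 * 3/9 * 9/14   # ~0.0053
like_no  = 3/5 * 1/5 * 4/5 * 3/5 * 5/14   # ~0.0206

# Normalize so the two probabilities sum to one
p_yes = like_yes / (like_yes + like_no)   # ~0.205
p_no  = like_no  / (like_yes + like_no)   # ~0.795
print(round(p_yes, 3), round(p_no, 3))
```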

Bayes’ Rule
● Probability of hypothesis h given evidence e:

  P(h | e) = P(e | h) P(h) / P(e)

● A priori probability of h: P(h)
   Probability of the hypothesis before evidence is seen
● A posteriori probability of h: P(h | e)
   Probability of the hypothesis after evidence is seen

Thomas Bayes: born 1702 in London, England; died 1761 in Tunbridge Wells, Kent, England

Bayes’ Rule
● Probability of hypothesis h given evidence e:

  P(h | e) = P(e | h) P(h) / P(e)

● Independence assumption: the evidence splits into parts e1, …, en that are conditionally independent given h:

  P(h | e) = P(e1 | h) P(e2 | h) … P(en | h) P(h) / P(e)
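As an illustration (my own sketch, not from the slides), the rule with the independence assumption becomes a small scoring function; P(e) is dropped because it cancels when the scores are normalized:

```python
def unnormalized_posterior(evidence, h, cond_prob, prior):
    # P(e1|h) * P(e2|h) * ... * P(en|h) * P(h); the common factor
    # 1/P(e) is omitted since it disappears under normalization
    p = prior[h]
    for e in evidence:
        p *= cond_prob[(e, h)]
    return p

# Hypothetical two-attribute example using weather-table probabilities
prior = {"yes": 9/14, "no": 5/14}
cond_prob = {
    ("outlook=sunny", "yes"): 2/9, ("outlook=sunny", "no"): 3/5,
    ("windy=true", "yes"): 3/9,    ("windy=true", "no"): 3/5,
}
evidence = ["outlook=sunny", "windy=true"]
scores = {h: unnormalized_posterior(evidence, h, cond_prob, prior) for h in prior}
total = sum(scores.values())
print({h: round(s / total, 3) for h, s in scores.items()})
```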

Naïve Bayes for Classification
● Classification learning: what’s the probability of the class given an instance?
   Evidence e = instance
   Hypothesis h = class value for the instance
● Naïve assumption: evidence splits into parts (i.e., attributes) that are independent

Weather Data Example
● A new day (evidence e):

  Outlook  Temp.  Humidity  Windy  Play
  Sunny    Cool   High      True   ?

● Probability of class “yes”:

  P(yes | e) = P(Outlook = Sunny | yes) × P(Temp. = Cool | yes)
               × P(Humidity = High | yes) × P(Windy = True | yes)
               × P(yes) / P(e)
             = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / P(e)

The “Zero-frequency Problem”
● What if an attribute value doesn’t occur with every class value? (e.g., “Outlook = Overcast” never occurs with class “no”)
   The corresponding probability estimate will be zero!
   The a posteriori probability will then also be zero! (No matter how likely the other values are!)
● Remedy: add 1 to the count for every attribute value–class combination (Laplace estimator)
● Result: probabilities will never be zero! (This also stabilizes probability estimates.)
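A sketch of the Laplace estimator in Python (function and parameter names are mine):

```python
def laplace_estimate(count, class_count, n_values, alpha=1):
    # (count + alpha) / (class_count + alpha * n_values);
    # alpha = 1 gives the classic Laplace estimator
    return (count + alpha) / (class_count + alpha * n_values)

# An attribute value never seen with a class still gets probability > 0,
# e.g. Outlook = Overcast for class "no" (0 of 5, three possible values):
print(laplace_estimate(0, 5, 3))  # 1/8 = 0.125 instead of 0/5 = 0
```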

Modified Probability Estimates
● In some cases, adding a constant different from 1 might be more appropriate
● Example: attribute Outlook for class “yes”, with a constant μ split evenly across the three values:

  Sunny:    (2 + μ/3) / (9 + μ)
  Overcast: (4 + μ/3) / (9 + μ)
  Rainy:    (3 + μ/3) / (9 + μ)

● The weights don’t need to be equal, but they must sum to 1
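The same idea with an explicit prior weight p per value (sometimes called an m-estimate) might look like this sketch; the names are mine:

```python
def m_estimate(count, class_count, mu, p):
    # (count + mu * p) / (class_count + mu): p is the assumed prior
    # probability of this value; mu controls how strongly it is trusted
    return (count + mu * p) / (class_count + mu)

# Outlook for class "yes" with equal priors p = 1/3 and mu = 3
# reproduces the Laplace estimates (3/12, 5/12, 4/12):
for value, count in [("sunny", 2), ("overcast", 4), ("rainy", 3)]:
    print(value, round(m_estimate(count, 9, 3, 1/3), 3))
```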

Missing Values
● Training: the instance is not included in the frequency count for the attribute value–class combination
● Classification: the attribute is omitted from the calculation
● Example:

  Outlook  Temp.  Humidity  Windy  Play
  ?        Cool   High      True   ?

  Likelihood of “yes” = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
  Likelihood of “no” = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
  P(“yes”) = 0.0238 / (0.0238 + 0.0343) = 41%
  P(“no”) = 0.0343 / (0.0238 + 0.0343) = 59%
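A small sketch of how classification skips a missing attribute (the data layout and names are mine, not from the slides):

```python
# Attributes whose value is missing (None) are simply skipped, so only
# the observed attributes contribute factors beyond the class prior
new_day = {"outlook": None, "temp": "cool", "humidity": "high", "windy": True}

like_yes = 9/14
like_no = 5/14
cond = {  # (P(value|yes), P(value|no)) pairs read off the frequency table
    ("temp", "cool"): (3/9, 1/5),
    ("humidity", "high"): (3/9, 4/5),
    ("windy", True): (3/9, 3/5),
}
for attr, value in new_day.items():
    if value is None:
        continue  # omit missing attributes from the product
    p_yes, p_no = cond[(attr, value)]
    like_yes *= p_yes
    like_no *= p_no

print(round(like_yes, 4), round(like_no, 4))  # ~0.0238 and ~0.0343, i.e. 41% vs. 59%
```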

Numeric Attributes
● Usual assumption: attributes have a normal or Gaussian probability distribution (given the class)
● The probability density function for the normal distribution is defined by two parameters:
   Mean: μ
   Standard deviation: σ

  f(x) = 1 / (√(2π) σ) × e^( −(x − μ)² / (2σ²) )
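The density function in Python, as a direct transcription of the formula above (not from the slides):

```python
import math

def gaussian_density(x, mean, std):
    # f(x) = 1 / (sqrt(2*pi) * std) * exp(-(x - mean)**2 / (2 * std**2))
    return math.exp(-(x - mean) ** 2 / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Checks against the example density value on the next slide
print(round(gaussian_density(66, 73, 6.2), 4))  # 0.034
```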

Statistics for Weather Data
● Mean and standard deviation of the numeric attributes, per class:

  Temperature:  yes: μ = 73,   σ = 6.2     no: μ = 74.6,  σ = 7.9
  Humidity:     yes: μ = 79.1, σ = 10.2    no: μ = 86.2,  σ = 9.7

● Example density value:

  f(Temperature = 66 | yes) = 1 / (√(2π) × 6.2) × e^( −(66 − 73)² / (2 × 6.2²) ) = 0.0340

Classifying a New Day
● A new day:

  Outlook  Temp.  Humidity  Windy  Play
  Sunny    66     90        True   ?

  Likelihood of “yes” = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
  Likelihood of “no” = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
  P(“yes”) = 0.000036 / (0.000036 + 0.000108) = 25%
  P(“no”) = 0.000108 / (0.000036 + 0.000108) = 75%

● Missing values during training are not included in the calculation of the mean and standard deviation
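Putting the pieces together for this day, as a sketch; the density factors are the rounded values used on the slide:

```python
# Nominal factors come from the frequency table, numeric factors from
# the class-conditional Gaussian densities (rounded slide values)
like_yes = 2/9 * 0.0340 * 0.0221 * 3/9 * 9/14   # ~0.000036
like_no  = 3/5 * 0.0221 * 0.0381 * 3/5 * 5/14   # ~0.000108
print(round(like_yes / (like_yes + like_no), 2))  # 0.25
```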

Naïve Bayes: Discussion
● Naïve Bayes works surprisingly well (even if the independence assumption is clearly violated)
● Why? Because classification doesn’t require accurate probability estimates, as long as the maximum probability is assigned to the correct class
● However: adding too many redundant attributes will cause problems (e.g., identical attributes)
● Note also: many numeric attributes are not normally distributed
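For practical use, a library implementation is typical; below is a minimal scikit-learn sketch (assuming scikit-learn is installed; the toy numbers are illustrative, not from the slides):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy numeric instances: [temperature, humidity], with play labels
X = np.array([[83, 86], [70, 96], [68, 80], [85, 85], [80, 90]])
y = np.array(["yes", "yes", "yes", "no", "no"])

model = GaussianNB().fit(X, y)
print(model.predict_proba(np.array([[66, 90]])))  # P(no), P(yes) for a new day
```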