Presentation transcript:

A Gentle Introduction to Machine Learning and Data Mining for the Database Community
Dr Eamonn Keogh, University of California - Riverside
[Title slide art: a dendrogram over the country names Anguilla, Australia, St. Helena & Dependencies, South Georgia & South Sandwich Islands, U.K., Serbia & Montenegro (Yugoslavia), France, Niger, India, Ireland, Brazil]

The Classification Problem (informal definition): Given a collection of annotated data (in this case, five instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is. Katydid or Grasshopper?

For any domain of interest, we can measure features: thorax length, abdomen length, antennae length, mandible size, spiracle diameter, leg length.

We can store features in a database. [Table: My_Collection, with columns Insect ID, Abdomen Length, Antennae Length, and Insect Class; ten rows labeled Grasshopper or Katydid, plus a previously unseen instance whose class is ???????] The classification problem can now be expressed as: given a training database (My_Collection), predict the class label of a previously unseen instance.

[Scatter plot: Antenna Length vs. Abdomen Length, with the Grasshoppers and Katydids marked]

We will return to the previous slide in two minutes. In the meantime, we are going to play a quick game. I am going to show you some classification problems which were shown to pigeons! Let us see if you are as smart as a pigeon!

Pigeon Problem 1. Examples of class A; examples of class B.

Pigeon Problem 1. Examples of class A; examples of class B. What class is this object? What about this one, A or B?

Pigeon Problem 1. Examples of class A; examples of class B. This is a B! Here is the rule: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.

Pigeon Problem 2. Examples of class A; examples of class B. Even I know this one. Oh! This one's hard!

Pigeon Problem 2. Examples of class A; examples of class B. So this one is an A. The rule is as follows: if the two bars are equal sizes, it is an A; otherwise it is a B.

Pigeon Problem 3. Examples of class A; examples of class B. This one is really hard! What is this, A or B?

Pigeon Problem 3. Examples of class A; examples of class B. It is a B! The rule is as follows: if the square of the sum of the two bars is less than or equal to 100, it is an A; otherwise it is a B.

Why did we spend so much time with this game? Because we wanted to show that almost all classification problems have a geometric interpretation; check out the next three slides…

Pigeon Problem 1, plotted with Left Bar and Right Bar as the axes. Here is the rule again: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.

Pigeon Problem 2, plotted with Left Bar and Right Bar as the axes. Let me look it up… here it is… the rule is: if the two bars are equal sizes, it is an A; otherwise it is a B.

Pigeon Problem 3, plotted with Left Bar and Right Bar as the axes. The rule again: if the square of the sum of the two bars is less than or equal to 100, it is an A; otherwise it is a B.

[Scatter plot again: Antenna Length vs. Abdomen Length, Grasshoppers and Katydids]

We can "project" the previously unseen instance (Insect Class = ???????) into the same space as the database. We have now abstracted away the details of our particular problem; it will be much easier to talk about points in space. [Scatter plot: Antenna Length vs. Abdomen Length, Katydids and Grasshoppers, with the unseen instance projected in]

Simple Linear Classifier (R. A. Fisher): if the previously unseen instance is above the line, then the class is Katydid; else the class is Grasshopper.
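As a sketch of the idea in code: the decision rule is just the sign of a weighted sum of the features. The weights and bias below are made-up placeholders; in practice they would be fit to the training data.

```python
# A minimal sketch of a simple linear classifier's decision rule.
# The weights and bias are illustrative placeholders, not fitted values;
# a real classifier would learn them from the training database.

def classify(antenna_length, abdomen_length, w=(1.0, -1.0), b=0.0):
    """Return the class based on which side of the line the point falls."""
    score = w[0] * antenna_length + w[1] * abdomen_length + b
    return "Katydid" if score > 0 else "Grasshopper"

print(classify(antenna_length=7.0, abdomen_length=3.0))  # Katydid
print(classify(antenna_length=2.0, abdomen_length=8.0))  # Grasshopper
```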

Which of the "Pigeon Problems" can be solved by the Simple Linear Classifier? 1) Perfect. 2) Useless. 3) Pretty good. Problems that can be solved by a linear classifier are called linearly separable.

A dichotomous key as a decision tree:
Antennae shorter than body?
  Yes: Grasshopper
  No: Foretibia has ears?
    No: Camel Cricket
    Yes: 3 Tarsi?
      Yes: Cricket
      No: Katydid
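Written as code, a key like this is just nested if/else tests; the branch order below follows the tree as laid out above.

```python
# The dichotomous key above as nested if/else tests.
# Each argument is a boolean observation about a single insect.

def identify(antennae_shorter_than_body, foretibia_has_ears, has_three_tarsi):
    if antennae_shorter_than_body:
        return "Grasshopper"
    if not foretibia_has_ears:
        return "Camel Cricket"
    return "Cricket" if has_three_tarsi else "Katydid"

print(identify(antennae_shorter_than_body=False,
               foretibia_has_ears=True,
               has_three_tarsi=False))  # Katydid
```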

Naïve Bayes Classifier (Thomas Bayes). We will start off with a visual intuition, before looking at the math…

Remember this example? Let's get lots more data… [Scatter plot: Antenna Length vs. Abdomen Length, Grasshoppers and Katydids]

With a lot of data, we can build a histogram. Let us just build one for "Antenna Length" for now… [Histogram of Antenna Length for Katydids and Grasshoppers]

We can leave the histograms as they are, or we can summarize them with two normal distributions. Let us use two normal distributions for ease of visualization in the following slides…

p(c_j | d) = probability of class c_j, given that we have observed d. We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it? We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid? There is a formal way to discuss the most probable classification…

Antennae length is 3. The histogram heights at this point are 10 (Grasshopper) and 2 (Katydid), so: P(Grasshopper | 3) = 10 / (10 + 2) = 0.833; P(Katydid | 3) = 2 / (10 + 2) = 0.167.

Antennae length is 7. The heights here are 3 (Grasshopper) and 9 (Katydid), so: P(Grasshopper | 7) = 3 / (3 + 9) = 0.25; P(Katydid | 7) = 9 / (3 + 9) = 0.75.

Antennae length is 5. The heights here are 6 and 6, so: P(Grasshopper | 5) = 6 / (6 + 6) = 0.50; P(Katydid | 5) = 6 / (6 + 6) = 0.50.
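The three calculations above all follow the same pattern; a small sketch using the counts read off the histograms:

```python
# P(class | antennae length) from histogram counts, reproducing the
# three worked examples above.

def posterior(grasshopper_count, katydid_count):
    total = grasshopper_count + katydid_count
    return grasshopper_count / total, katydid_count / total

counts = {3: (10, 2), 7: (3, 9), 5: (6, 6)}  # length -> (grasshopper, katydid)
for length, (g, k) in counts.items():
    pg, pk = posterior(g, k)
    print(f"length {length}: P(Grasshopper) = {pg:.3f}, P(Katydid) = {pk:.3f}")
```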

Bayes Classifiers. That was a visual intuition for a simple case of the Bayes classifier, also called: Idiot Bayes, Naïve Bayes, Simple Bayes. We are about to see some of the mathematical formalisms, and more examples, but keep in mind the basic idea: find out the probability of the previously unseen instance belonging to each class, then simply pick the most probable class.

Bayes Classifiers. Bayesian classifiers use Bayes theorem, which says:

p(c_j | d) = p(d | c_j) p(c_j) / p(d)

p(c_j | d) = probability of instance d being in class c_j. This is what we are trying to compute.
p(d | c_j) = probability of generating instance d given class c_j. We can imagine that being in class c_j causes you to have feature d with some probability.
p(c_j) = probability of occurrence of class c_j. This is just how frequent the class c_j is in our database.
p(d) = probability of instance d occurring. This can actually be ignored, since it is the same for all classes.

Assume that we have two classes: c_1 = male and c_2 = female. We have a person whose sex we do not know, say "drew" or d. Classifying drew as male or female is equivalent to asking which is more probable: p(male | drew) or p(female | drew).

p(male | drew) = p(drew | male) p(male) / p(drew)

p(drew | male): what is the probability of being called "drew" given that you are a male? p(male): what is the probability of being a male? p(drew): what is the probability of being named "drew"? (Actually irrelevant, since it is the same for all classes.) (Note: "Drew" can be a male or female name, e.g., Drew Carey, Drew Barrymore.)

This is Officer Drew (who arrested me in 1997). Is Officer Drew a Male or Female? Luckily, we have a small database with names and sex. We can use it to apply Bayes rule: p(c_j | d) = p(d | c_j) p(c_j) / p(d).

Name     Sex
Drew     Male
Claudia  Female
Drew     Female
Drew     Female
Alberto  Male
Karin    Female
Nina     Female
Sergio   Male

From the table: p(drew | male) = 1/3, p(male) = 3/8; p(drew | female) = 2/5, p(female) = 5/8; and p(drew) = 3/8.

p(male | drew) = (1/3 × 3/8) / (3/8) = 1/3
p(female | drew) = (2/5 × 5/8) / (3/8) = 2/3

Officer Drew is more likely to be a Female.

Officer Drew IS a female! p(male | drew) = (1/3 × 3/8) / (3/8) = 1/3; p(female | drew) = (2/5 × 5/8) / (3/8) = 2/3.
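A few lines of code reproduce the calculation; the fractions are the ones read off the eight-row table above.

```python
# Bayes rule for the Officer Drew example:
# p(c | drew) = p(drew | c) p(c) / p(drew).

p_drew = 3 / 8  # three of the eight people in the database are named Drew

p_male_given_drew = (1 / 3) * (3 / 8) / p_drew    # = 1/3
p_female_given_drew = (2 / 5) * (5 / 8) / p_drew  # = 2/3

print(f"p(male | drew)   = {p_male_given_drew:.3f}")
print(f"p(female | drew) = {p_female_given_drew:.3f}")  # Female is more probable
```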

So far we have only considered Bayes classification when we have one attribute (the "antennae length", or the "name"). But we may have many features. How do we use all the features?

Name     Over 170 cm  Eye    Hair length  Sex
Drew     No           Blue   Short        Male
Claudia  Yes          Brown  Long         Female
Drew     No           Blue   Long         Female
Drew     No           Blue   Long         Female
Alberto  Yes          Brown  Short        Male
Karin    No           Blue   Long         Female
Nina     Yes          Brown  Short        Female
Sergio   Yes          Blue   Long         Male

To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate

p(d | c_j) = p(d_1 | c_j) × p(d_2 | c_j) × … × p(d_n | c_j)

That is, the probability of class c_j generating instance d equals the probability of class c_j generating the observed value for feature 1, multiplied by the probability of class c_j generating the observed value for feature 2, multiplied by…

Officer Drew is blue-eyed, over 170 cm tall, and has long hair.

p(officer drew | c_j) = p(over_170cm = yes | c_j) × p(eye = blue | c_j) × …

p(officer drew | Female) = 2/5 × 3/5 × …
p(officer drew | Male) = 2/3 × 2/3 × …
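A sketch of the full product for both classes; the first two factors are the ones shown above, and the hair-length factors (4/5 for Female, 1/3 for Male) are filled in from the same eight-row table.

```python
# Naive Bayes for Officer Drew (over 170 cm, blue eyes, long hair).
# Likelihoods are read off the eight-row table; p(d) is omitted because
# it is the same for both classes.

likelihood = {
    "Female": (2 / 5) * (3 / 5) * (4 / 5),  # over 170 cm, blue eyes, long hair
    "Male":   (2 / 3) * (2 / 3) * (1 / 3),
}
prior = {"Female": 5 / 8, "Male": 3 / 8}

score = {c: likelihood[c] * prior[c] for c in prior}
print(score)                      # Female's score is higher
print(max(score, key=score.get))  # Female
```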

The naive Bayes classifier is often represented as this type of graph… Note the direction of the arrows, which state that each class causes certain features, with a certain probability. [Diagram: a class node c_j with arrows down to p(d_1 | c_j), p(d_2 | c_j), …, p(d_n | c_j)]

Naïve Bayes is fast and space efficient. We can look up all the probabilities with a single scan of the database and store them in a (small) table…

Sex     Over 190 cm  p
Male    Yes          0.15
Male    No           0.85
Female  Yes          0.01
Female  No           0.99

Sex     Long hair    p
Male    Yes          0.05
Male    No           0.95
Female  Yes          0.70
Female  No           0.30

[Plus a table of the class priors for Male and Female]
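A sketch of how those tables can be built in one pass; the record list below is a hypothetical stand-in for the database rows.

```python
# Building naive Bayes lookup tables with a single scan of the database.
from collections import Counter, defaultdict

# Hypothetical database rows: (class label, {feature: value}).
records = [
    ("Male",   {"over_190cm": "No", "long_hair": "No"}),
    ("Female", {"over_190cm": "No", "long_hair": "Yes"}),
    ("Female", {"over_190cm": "No", "long_hair": "No"}),
    # ... one tuple per row in the real database
]

class_counts = Counter()
value_counts = defaultdict(Counter)  # (class, feature) -> Counter of values

for label, features in records:  # the single scan
    class_counts[label] += 1
    for feature, value in features.items():
        value_counts[(label, feature)][value] += 1

def p(value, feature, label):
    """Look up p(feature = value | class = label) from the stored counts."""
    counts = value_counts[(label, feature)]
    return counts[value] / sum(counts.values())

print(p("Yes", "long_hair", "Female"))  # 0.5 on this toy data
```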

Naïve Bayes is NOT sensitive to irrelevant features… Suppose we are trying to classify a person's sex based on several features, including eye color. (Of course, eye color is completely irrelevant to a person's gender.)

p(Jessica | c_j) = p(eye = brown | c_j) × p(wears_dress = yes | c_j) × …
p(Jessica | Female) = 9,000/10,000 × 9,975/10,000 × …
p(Jessica | Male) = 9,001/10,000 × 2/10,000 × …

The eye-color factors are almost the same! However, this assumes that we have good enough estimates of the probabilities, so the more data the better.

An obvious point: I have used a simple two-class problem, and two possible values for each example, for my previous examples. However, we can have an arbitrary number of classes, or feature values.

Animal  Mass > 10 kg  p
Cat     Yes           0.15
Cat     No            0.85
Dog     Yes           0.91
Dog     No            0.09
Pig     Yes           0.99
Pig     No            0.01

Animal  Color   p
Cat     Black   0.33
Cat     White   0.23
Cat     Brown   0.44
Dog     Black   0.97
Dog     White   0.03
Dog     Brown   0.90
Pig     Black   0.04
Pig     White   0.01
Pig     Brown   0.95

[Plus a table of the class priors for Cat, Dog, and Pig]

Naïve Bayesian Classifier. Problem! Naïve Bayes assumes independence of features… but attributes like the two below are clearly correlated: someone over 6 foot is far more likely to be over 200 pounds.

Sex     Over 6 foot  p
Male    Yes          0.15
Male    No           0.85
Female  Yes          0.01
Female  No           0.99

Sex     Over 200 pounds  p
Male    Yes              0.11
Male    No               0.80
Female  Yes              0.05
Female  No               0.95

Naïve Bayesian Classifier. Solution: consider the relationships between attributes…

Sex     Over 6 foot  p
Male    Yes          0.15
Male    No           0.85
Female  Yes          0.01
Female  No           0.99

Sex     Over 200 pounds          p
Male    Yes and Over 6 foot      0.11
Male    No and Over 6 foot       0.59
Male    Yes and NOT Over 6 foot  0.05
Male    No and NOT Over 6 foot   0.35
Female  Yes and Over 6 foot      0.01
…

Naïve Bayesian Classifier. Solution: consider the relationships between attributes… But how do we find the set of connecting arcs? Read Keogh, E. & Pazzani, M. (1999). Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches. In Uncertainty 99, 7th Int'l Workshop on AI and Statistics, Ft. Lauderdale, FL. Don't bother writing a reaction paper, but if we had a pop quiz…

Dear SIR, I am Mr. John Coleman and my sister is Miss Rose Colemen, we are the children of late Chief Paul Colemen from Sierra Leone. I am writing you in absolute confidence primarily to seek your assistance to transfer our cash of twenty one Million Dollars ($21,000,000.00) now in the custody of a private Security trust firm in Europe the money is in trunk boxes deposited and declared as family valuables by my late father as a matter of fact the company does not know the content as money, although my father made them to under stand that the boxes belongs to his foreign partner. …

This mail is probably spam. The original message has been attached along with this report, so you can recognize or block similar unwanted mail in future. See for more details.

Content analysis details: (12.20 points, 5 required)
NIGERIAN_SUBJECT2 (1.4 points) Subject is indicative of a Nigerian spam
FROM_ENDS_IN_NUMS (0.7 points) From: ends in numbers
MIME_BOUND_MANY_HEX (2.9 points) Spam tool pattern in MIME boundary
URGENT_BIZ (2.7 points) BODY: Contains urgent matter
US_DOLLARS_3 (1.5 points) BODY: Nigerian scam key phrase ($NN,NNN,NNN.NN)
DEAR_SOMETHING (1.8 points) BODY: Contains 'Dear (something)'
BAYES_30 (1.6 points) BODY: Bayesian classifier says spam probability is 30 to 40% [score: ]

Anguilla, Australia, St. Helena & Dependencies, South Georgia & South Sandwich Islands, U.K., Serbia & Montenegro (Yugoslavia), France, Niger, India, Ireland, Brazil. The dendrogram in nested-parenthesis form: (((((Australia: 0.27, ((Anguilla: 0.13, S.H.D.: 0.13), S.G.S.S.I.: 0.2)), U.K.: 0.4), (Yugoslavia: 0.47, France: 0.47)), ((Niger: 0.2, India: 0.2), Ireland: 0.47)), Brazil: 1)

A Demonstration of Hierarchical Clustering using String Edit Distance.

Pedro (Portuguese): Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian).
Cristovao (Portuguese): Christoph (German), Christophe (French), Cristobal (Spanish), Cristoforo (Italian), Kristoffer (Scandinavian), Krystof (Czech), Christopher (English).
Miguel (Portuguese): Michalis (Greek), Michael (English), Mick (Irish!).

[Dendrogram leaves: Piotr, Pyotr, Petros, Pietro, Pedro, Pierre, Piero, Peter, Peder, Peka, Peadar, Michalis, Michael, Miguel, Mick, Cristovao, Christopher, Christophe, Christoph, Crisdean, Cristobal, Cristoforo, Kristoffer, Krystof]
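A sketch of how such a demonstration can be reproduced, pairing a classic Levenshtein edit distance with SciPy's hierarchical clustering. The slide does not say which linkage the original demo used; average linkage is an assumption here.

```python
# Hierarchical clustering of names by string edit distance.
from itertools import combinations

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

names = ["Pedro", "Petros", "Peter", "Piotr", "Pyotr", "Pierre", "Pietro",
         "Piero", "Peder", "Peka", "Peadar", "Michael", "Michalis", "Miguel",
         "Mick", "Cristovao", "Christopher", "Christoph", "Kristoffer", "Krystof"]

# Condensed pairwise distance matrix, in the order linkage() expects.
distances = [edit_distance(a, b) for a, b in combinations(names, 2)]
tree = linkage(distances, method="average")

dendrogram(tree, labels=names)
plt.show()  # the Peter, Christopher, and Michael variants form three clusters
```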

Pedro (Portuguese/Spanish): Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian alternative), Petr (Czech), Pyotr (Russian). [Dendrogram leaves: Piotr, Pyotr, Petros, Pietro, Pedro, Pierre, Piero, Peter, Peder, Peka, Peadar]