1 So far we have only seen features that are real numbers.
But features could be: Boolean (Has wings?), Categorical (Green, Brown, Gray), etc. How do we handle such features? The good news is that we can always define some measure of "nearest" for nearest neighbor for essentially any kind of feature. Such measures are called distance measures (or sometimes, similarity measures).

2 Let us consider an example that uses Boolean features.
Features: Has wings? Has spur on front legs? Has cone-shaped head? length(antenna) > 1.5 × length(abdomen)? Under this representation, every insect is just a Boolean vector: Insect17 = {true, true, false, false} or Insect17 = {1,1,0,0}. Instead of the Euclidean distance, we can use the Hamming distance (or one of many other measures). The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Which insect is the nearest neighbor of Insect17 = {1,1,0,0}? Insect1 = {1,1,0,1}, Insect2 = {0,0,0,0}, Insect3 = {0,1,1,1}. It is Insect1, at Hamming distance 1, so here we would say Insect17 is in the blue class.
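The nearest-neighbor lookup above is only a few lines of code; a minimal sketch in Python (the insect names and vectors are taken from the example):

```python
def hamming(a, b):
    """Hamming distance: count positions where two equal-length vectors differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

insect17 = [1, 1, 0, 0]
database = {"Insect1": [1, 1, 0, 1],
            "Insect2": [0, 0, 0, 0],
            "Insect3": [0, 1, 1, 1]}

# Nearest neighbor under the Hamming distance
nearest = min(database, key=lambda name: hamming(insect17, database[name]))
print(nearest)  # Insect1, at distance 1
```

The same `min(..., key=...)` pattern works for any distance function, which is exactly the point of this slide.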

3 We can use the nearest neighbor algorithm with any distance/similarity function.
For example, is "Faloutsos" Greek or Irish? We could compare the name "Faloutsos" to a database of names using string edit distance: edit_distance(Faloutsos, Keogh) = 8, edit_distance(Faloutsos, Gunopulos) = 6. Hopefully, the similarity of the name (particularly the suffix) to other Greek names would mean the nearest neighbor is also a Greek name. ID / Name / Class: 1 Gunopulos (Greek), 2 Papadopoulos (Greek), 3 Kollios (Greek), 4 Dardanos (Greek), 5 Keogh (Irish), 6 Gough (Irish), 7 Greenhaugh (Irish), 8 Hadleigh (Irish). Specialized distance measures exist for DNA strings, time series, images, graphs, videos, sets, fingerprints, etc.

4 Edit Distance Example
How similar are the names "Peter" and "Piotr"? Assume the following cost function: Substitution 1 unit, Insertion 1 unit, Deletion 1 unit. Then D(Peter, Piotr) is 3: Peter → Piter (substitution, i for e) → Pioter (insertion, o) → Piotr (deletion, e). It is possible to transform any string Q into string C using only Substitution, Insertion and Deletion. Assume that each of these operators has a cost associated with it. The similarity between two strings can be defined as the cost of the cheapest transformation from Q to C. Note that for now we have ignored the issue of how we can find this cheapest transformation. (The slide also shows a family of variants of the name: Peter, Petros, Pietro, Pedro, Pierre, Piero, Piter, Pioter, Piotr, Pyotr.)
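The cheapest transformation the slide defers can be found with dynamic programming; a minimal sketch of unit-cost edit distance in Python:

```python
def edit_distance(q, c):
    """Levenshtein distance with unit-cost substitution, insertion, and deletion."""
    m, n = len(q), len(c)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # i deletions to reach the empty string
    for j in range(n + 1):
        d[0][j] = j                      # j insertions from the empty string
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if q[i - 1] == c[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (or match)
    return d[m][n]

print(edit_distance("Peter", "Piotr"))  # 3, matching the worked example
```

Each cell `d[i][j]` holds the cheapest cost of turning the first `i` characters of Q into the first `j` characters of C, so the answer falls out at `d[m][n]`.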

5 Decision Tree Classifier
[Figure: a scatter plot of Antenna Length vs. Abdomen Length (axes 1–10), partitioned by the tree. Inset: Ross Quinlan.]
Abdomen Length > 7.1? yes → Katydid; no → Antenna Length > 6.0? yes → Katydid; no → Grasshopper.

6 Decision Tree Classifier
Here is a different tree (exercise: draw out the full tree). In general, if we have n Boolean features, the number of possible trees is enormous (there are 2^(2^n) distinct Boolean functions on n features). This is both good and bad news. [Figure: the same Antenna Length vs. Abdomen Length scatter plot, split first on Antenna Length.]

7 Decision trees predate computers.
[Figure: a dichotomous key for insects, branching on "Antennae shorter than body?", "3 Tarsi?", and "Foretibia has ears?", with leaves Grasshopper, Cricket, Katydids, and Camel Cricket.]

8 Decision Tree Classification
A flow-chart-like tree structure: an internal node denotes a test on an attribute, a branch represents an outcome of the test, and leaf nodes represent class labels or class distributions. Decision tree generation consists of two phases. Tree construction: at the start, all the training examples are at the root; examples are then partitioned recursively based on selected attributes. Tree pruning: identify and remove branches that reflect noise or outliers. Use of a decision tree: to classify an unknown sample, test the attribute values of the sample against the decision tree.

9 How do we construct the decision tree?
Basic algorithm (a greedy algorithm): the tree is constructed in a top-down, recursive, divide-and-conquer manner. At the start, all the training examples are at the root. Attributes are categorical (if continuous-valued, they can be discretized in advance). Examples are partitioned recursively based on selected attributes; test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain). Conditions for stopping partitioning: all samples for a given node belong to the same class; there are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf); or there are no samples left.
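The greedy algorithm described above can be sketched in a few dozen lines. This is a hypothetical ID3-style illustration (function and variable names are ours, not from the slides), handling the three stopping conditions and the greedy information-gain split:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """Top-down, recursive, divide-and-conquer induction (greedy on information gain).
    rows: list of dicts of categorical attribute values; labels: class labels."""
    if len(set(labels)) == 1:              # all samples belong to one class
        return labels[0]
    if not attrs:                          # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(a):                           # information gain of splitting on a
        remainder = 0.0
        for v in set(r[a] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)            # greedy choice of test attribute
    branches = {}
    for v in set(r[best] for r in rows):
        keep = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = build_tree([rows[i] for i in keep],
                                 [labels[i] for i in keep],
                                 [a for a in attrs if a != best])
    return (best, branches)

# Tiny illustrative dataset
rows = [{"wings": "yes"}, {"wings": "no"}]
labels = ["Katydid", "Grasshopper"]
tree = build_tree(rows, labels, ["wings"])
```

A real implementation would add pruning and discretization of continuous attributes, as the slides discuss later.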

10 Information Gain as a Splitting Criterion
Select the attribute with the highest information gain (information gain is the expected reduction in entropy). Assume there are two classes, P and N, and let the set of examples S contain p elements of class P and n elements of class N. The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as I(p, n) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n)), where 0 log(0) is defined as 0.
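The definition translates directly into code; a small sketch (the function name `I` mirrors the slide's notation):

```python
from math import log2

def I(p, n):
    """Information needed to decide class membership of a set with p
    elements of class P and n of class N; 0*log(0) is taken as 0."""
    total = p + n
    return -sum((k / total) * log2(k / total) for k in (p, n) if k > 0)

print(I(6, 6))  # 1.0 bit: a 50/50 split is maximally uncertain
print(I(0, 4))  # 0.0 bits: a pure set needs no information
```

The guard `if k > 0` is exactly the slide's convention that 0 log(0) = 0.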

11

12 Information Gain in Decision Tree Induction
Assume that using attribute A, the current set will be partitioned into some number of child sets. The encoding information that would be gained by branching on A is Gain(A) = I(p, n) − E(A), where E(A) = Σi ((pi + ni)/(p + n)) · I(pi, ni) is the expected information of the child sets. Note: entropy is at its minimum (zero) when the collection of objects is completely homogeneous, i.e., all of one class.

13 Person / Hair Length / Weight / Age / Class
Homer 0" 250 36 M; Marge 10" 150 34 F; Bart 2" 90 10 M; Lisa 6" 78 8 F; Maggie 4" 20 1 F; Abe 1" 170 70 M; Selma 8" 160 41 F; Otto [hair length missing] 180 38 M; Krusty [hair length missing] 200 45 M; Comic 8" 290 38 ?

14 Let us try splitting on Hair Length
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Split: Hair Length <= 5? (yes / no)
Entropy(1F,3M) = -(1/4)log2(1/4) - (3/4)log2(3/4) = 0.8113
Entropy(3F,2M) = -(3/5)log2(3/5) - (2/5)log2(2/5) = 0.9710
Gain(Hair Length <= 5) = 0.9911 - (4/9 × 0.8113 + 5/9 × 0.9710) = 0.0911

15 Let us try splitting on Weight
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Split: Weight <= 160? (yes / no)
Entropy(4F,1M) = -(4/5)log2(4/5) - (1/5)log2(1/5) = 0.7219
Entropy(0F,4M) = -(0/4)log2(0/4) - (4/4)log2(4/4) = 0
Gain(Weight <= 160) = 0.9911 - (5/9 × 0.7219 + 4/9 × 0) = 0.5900

16 Let us try splitting on Age
Entropy(4F,5M) = -(4/9)log2(4/9) - (5/9)log2(5/9) = 0.9911
Split: Age <= 40? (yes / no)
Entropy(3F,3M) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
Entropy(1F,2M) = -(1/3)log2(1/3) - (2/3)log2(2/3) = 0.9183
Gain(Age <= 40) = 0.9911 - (6/9 × 1 + 3/9 × 0.9183) = 0.0183
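The three candidate splits can be checked with a few lines of Python, using the class counts from the three slides above:

```python
from math import log2

def entropy(p, n):
    """Entropy of a set with p females and n males (0*log(0) taken as 0)."""
    total = p + n
    return -sum((k / total) * log2(k / total) for k in (p, n) if k > 0)

root = entropy(4, 5)                                        # 4F, 5M at the root

gain_hair   = root - (4/9) * entropy(1, 3) - (5/9) * entropy(3, 2)
gain_weight = root - (5/9) * entropy(4, 1) - (4/9) * entropy(0, 4)
gain_age    = root - (6/9) * entropy(3, 3) - (3/9) * entropy(1, 2)

# Weight gives the largest reduction in entropy, so it is chosen first
print(round(gain_hair, 4), round(gain_weight, 4), round(gain_age, 4))
```

As the next slide concludes, Weight wins by a wide margin, which is why the tree splits on it first.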

17 Of the 3 features we had, Weight was best.
But while people who weigh over 160 are perfectly classified (as males), the under-160 people are not perfectly classified… so we simply recurse! Split: Weight <= 160? (yes / no). On the yes branch, this time we find that we can split on Hair Length (Hair Length <= 2?), and we are done!

18 We don't need to keep the data around, just the test conditions.
Weight <= 160? no → Male; yes → Hair Length <= 2? yes → Male; no → Female. How would these people be classified?

19 It is trivial to convert Decision Trees to rules…
Weight <= 160? no → Male; yes → Hair Length <= 2? yes → Male; no → Female.
Rules to Classify Males/Females: If Weight greater than 160, classify as Male; Else if Hair Length less than or equal to 2, classify as Male; Else classify as Female.
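Those rules translate directly into code; a minimal sketch in Python (the function name is ours):

```python
def classify(weight, hair_length):
    """Rule form of the learned tree: weight test first, then hair length."""
    if weight > 160:
        return "Male"
    elif hair_length <= 2:
        return "Male"
    else:
        return "Female"

print(classify(290, 8))   # Male  (Comic: weight over 160)
print(classify(150, 10))  # Female (under 160, long hair)
```

Note that the rule list preserves the tree's test order: once the weight rule fires, the hair-length rule is never consulted.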

20 Once we have learned the decision tree, we don’t even need a computer!
This decision tree is attached to a medical machine, and is designed to help nurses make decisions about what type of doctor to call. Decision tree for a typical shared-care setting applying the system for the diagnosis of prostatic obstructions.

21 Classification Problem: Fourth Amendment Cases before the Supreme Court I
The Fourth Amendment (Amendment IV) to the United States Constitution is the part of the Bill of Rights that prohibits unreasonable searches and seizures and requires any warrant to be judicially sanctioned and supported by probable cause. Suppose we have a 4th Amendment case, Keogh vs. State of California. Keogh argues that the search that found evidence of him taking bribes was an Unreasonable search (U), and the state argues that the search was Reasonable (R). The case is appealed to the Supreme Court; will the court decide U or R? We can use machine learning to try to predict their decision. What should the features be? Keogh vs. State of California = {0,1,1,0,0,0,1,0}
• If the search was conducted in a home.
• If the search was conducted in a business.
• If the search was conducted on one's person.
• If the search was conducted in a car.
• If the search was a full search, as opposed to a less extensive intrusion.
• If the search was conducted incident to arrest.
• If the search was conducted after a lawful arrest.
• If an exception to the warrant requirement existed (beyond that of search incident to a lawful arrest).
From: The Statistical Analysis of Judicial Decisions and Legal Rules with Classification Trees, Jonathan P. Kastellec.

22 Classification Problem: Fourth Amendment Cases before the Supreme Court II
The Supreme Court's search and seizure decisions, 1962–1984 terms. Keogh vs. State of California = {0,1,1,0,0,0,1,0}. U = Unreasonable, R = Reasonable.

23 Decision Tree for Supreme Court Justice Sandra Day O'Connor
We can also learn decision trees for individual Supreme Court members. Using similar decision trees for the other eight justices, these models correctly predicted the majority opinion in 75 percent of the cases, substantially outperforming the experts' 59 percent.

24 The worked examples we have seen were performed on small datasets.
However, with small datasets there is a great danger of overfitting the data… When you have few datapoints, there are many possible splitting rules that perfectly classify the data, but will not generalize to future datasets. For example, the rule "Wears green?" (Yes → Female, No → Male) perfectly classifies the data, but so does "Mother's name is Jacqueline?", and so does "Has blue shoes?"…

25 Avoid Overfitting in Classification
The generated tree may overfit the training data: too many branches, some of which may reflect anomalies due to noise or outliers, resulting in poor accuracy on unseen samples. There are two approaches to avoid overfitting. Prepruning: halt tree construction early, i.e., do not split a node if this would result in the goodness measure falling below a threshold (it is difficult to choose an appropriate threshold). Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees, then use a set of data different from the training data to decide which is the "best pruned tree".

26 Which of the "Pigeon Problems" can be solved by a Decision Tree?
[Figure: the Pigeon Problem scatter plots. The first yields a deep, bushy tree; the second is useless for a decision tree; the third shows that the decision tree has a hard time with correlated attributes.]

27 Advantages/Disadvantages of Decision Trees
Advantages: easy to understand (doctors love them!); easy to generate rules. Disadvantages: may suffer from overfitting; classifies by rectangular partitioning (so does not handle correlated features very well); can be quite large, so pruning is necessary; does not handle streaming data easily.

28 Grade distribution with approximate letter grade mappings
There were 38 questions, but I assume 37 is the max. The mean was 30.5, the std was 3.6. There were 3 perfect scores. [Figure: a histogram of scores from 18 to 38, with approximate letter-grade bands from C+ up to A+.]

29 Decision Tree Classifier
[Figure: the same Antenna Length vs. Abdomen Length scatter plot and tree as before: Abdomen Length > 7.1? yes → Katydid; no → Antenna Length > 6.0? yes → Katydid; no → Grasshopper. Inset: Ross Quinlan.]

30

31 How would we go about building a classifier for projectile points?

32 21225212: length = 3.10, width = 1.45
I. Location of maximum blade width: 1. Proximal quarter; 2. Secondmost proximal quarter; 3. Secondmost distal quarter; 4. Distal quarter.
II. Base shape: 1. Arc-shaped; 2. Normal curve; 3. Triangular; 4. Folsomoid.
III. Basal indentation ratio: 1. No basal indentation; 2. 0.90–0.99 (shallow); 3. 0.80–0.89 (deep).
IV. Constriction ratio: 1. 1.00; 2. 0.90–0.99; 3. 0.80–0.89; 4. 0.70–0.79; 5. 0.60–0.69; 6. 0.50–0.59.
V. Outer tang angle: 1. 93–115; 2. 88–92; 3. 81–87; 4. 66–88; 5. 51–65; 6. <50.
VI. Tang-tip shape: 1. Pointed; 2. Round; 3. Blunt.
VII. Fluting: 1. Absent; 2. Present.
VIII. Length/width ratio: 1. 1.00–1.99; 2. 2.00–2.99; 3. 3.00–3.99; 4. 4.00–4.99; 5. 5.00–5.99; 6. >6.00.
length/width ratio = 3.10 / 1.45 = 2.13

33 21225212
[Figure: the same attribute key as the previous slide, with a decision tree that tests Fluting? = TRUE?, Base Shape = 4?, and Length/width ratio = 2?, with leaves including Late Archaic and Mississippian.]

34 We could also use the Nearest Neighbor Algorithm.
21225212 = ? Labeled examples: 21265122 = Late Archaic, 14114214 = Transitional Paleo. [Figure: the unknown point among labeled neighbors: Transitional Paleo, Late Archaic, Woodland.]

35 Arrowhead Decision Tree
It might be better to use the shape directly in the decision tree… Lexiang Ye and Eamonn Keogh (2009), Time Series Shapelets: A New Primitive for Data Mining, SIGKDD 2009. [Figure: training data (subset) for Avonlea and Clovis arrowheads; a shapelet dictionary; and a decision tree splitting on shapelets I and II.] The shapelet decision tree classifier achieves an accuracy of 80.0%; the accuracy of the rotation-invariant one-nearest-neighbor classifier is 68.0%.

36 Naïve Bayes Classifier
Thomas Bayes. We will start off with a visual intuition, before looking at the math…

37 Grasshoppers vs. Katydids: remember this example? Let's get lots more data…
[Figure: the Antenna Length vs. Abdomen Length scatter plot, axes 1–10.]

38 With a lot of data, we can build a histogram.
Let us just build one for "Antenna Length" for now… [Figure: overlaid histograms of antenna length for Katydids and Grasshoppers.]

39 We can leave the histograms as they are, or we can summarize them with two normal distributions.
Let us use two normal distributions for ease of visualization in the following slides…

40 p(cj | d) = probability of class cj, given that we have observed d
We want to classify an insect we have found. Its antennae are 3 units long. How can we classify it? We can just ask ourselves: given the distributions of antennae lengths we have seen, is it more probable that our insect is a Grasshopper or a Katydid? There is a formal way to discuss the most probable classification… Antennae length is 3.

41 p(cj | d) = probability of class cj, given that we have observed d
P(Grasshopper | 3 ) = 10 / (10 + 2) = 0.833 P(Katydid | 3 ) = 2 / (10 + 2) = 0.166 10 2 3 Antennae length is 3

42 p(cj | d) = probability of class cj, given that we have observed d
P(Grasshopper | 7 ) = 3 / (3 + 9) = 0.250 P(Katydid | 7 ) = 9 / (3 + 9) = 0.750 9 3 7 Antennae length is 7

43 p(cj | d) = probability of class cj, given that we have observed d
P(Grasshopper | 5 ) = 6 / (6 + 6) = 0.500 P(Katydid | 5 ) = 6 / (6 + 6) = 0.500 6 6 5 Antennae length is 5

44 Bayes Classifiers
That was a visual intuition for a simple case of the Bayes classifier, also called: Idiot Bayes, Naïve Bayes, Simple Bayes. We are about to see some of the mathematical formalisms, and more examples, but keep in mind the basic idea: find out the probability of the previously unseen instance belonging to each class, then simply pick the most probable class.

45 Bayes Classifiers: Bayesian classifiers use Bayes theorem, which says
p(cj | d) = p(d | cj) p(cj) / p(d)
p(cj | d) = probability of instance d being in class cj; this is what we are trying to compute.
p(d | cj) = probability of generating instance d given class cj; we can imagine that being in class cj causes you to have feature d with some probability.
p(cj) = probability of occurrence of class cj; this is just how frequent the class cj is in our database.
p(d) = probability of instance d occurring; this can actually be ignored, since it is the same for all classes.

46 Assume that we have two classes: c1 = male, and c2 = female.
We have a person whose sex we do not know, say "drew" or d. Classifying drew as male or female is equivalent to asking: is it more probable that drew is male or female, i.e., which is greater, p(male | drew) or p(female | drew)? (Note: "Drew" can be a male or female name — consider Drew Barrymore and Drew Carey.)
p(male | drew) = p(drew | male) p(male) / p(drew)
What is the probability of being called "drew" given that you are a male? What is the probability of being a male? What is the probability of being named "drew"? (Actually irrelevant, since it is the same for all classes.)

47 p(cj | d) = p(d | cj) p(cj) / p(d)
This is Officer Drew (who arrested me in 1997). Is Officer Drew a Male or Female? Luckily, we have a small database with names and sex. We can use it to apply Bayes rule…
Name / Sex: Drew Male; Claudia Female; Drew Female; Drew Female; Alberto Male; Karin Female; Nina Female; Sergio Male.

48 p(cj | d) = p(d | cj) p(cj) / p(d)
Name / Sex: Drew Male; Claudia Female; Drew Female; Drew Female; Alberto Male; Karin Female; Nina Female; Sergio Male.
p(male | drew) = (1/3 × 3/8) / (3/8) = 0.125 / (3/8)
p(female | drew) = (2/5 × 5/8) / (3/8) = 0.250 / (3/8)
Officer Drew is more likely to be a Female.
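The arithmetic can be checked in a few lines of Python. The 8-person table below is the one implied by the slide's fractions (3 males, one named Drew; 5 females, two named Drew):

```python
# The 8-person database from the slide
data = [("Drew", "Male"), ("Claudia", "Female"), ("Drew", "Female"),
        ("Drew", "Female"), ("Alberto", "Male"), ("Karin", "Female"),
        ("Nina", "Female"), ("Sergio", "Male")]

def bayes_score(name, sex):
    """Numerator of Bayes rule: p(name | sex) * p(sex).
    p(name) is the same for both classes, so it can be ignored when comparing."""
    in_class = [n for n, s in data if s == sex]
    p_name_given_sex = in_class.count(name) / len(in_class)
    p_sex = len(in_class) / len(data)
    return p_name_given_sex * p_sex

score_male = bayes_score("Drew", "Male")      # (1/3) * (3/8) = 0.125
score_female = bayes_score("Drew", "Female")  # (2/5) * (5/8) = 0.250
```

Since 0.250 > 0.125, the classifier says Female — which, as the next slide reveals, is correct.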

49 Officer Drew IS a female!
p(male | drew) = (1/3 × 3/8) / (3/8) = 0.125 / (3/8)
p(female | drew) = (2/5 × 5/8) / (3/8) = 0.250 / (3/8)

50 p(cj | d) = p(d | cj) p(cj) / p(d)
So far we have only considered Bayes classification when we have one attribute (the "antennae length", or the "name"). But we may have many features. How do we use all the features?
Name / Over 170cm / Eye / Hair length / Sex: Drew No Blue Short Male; Claudia Yes Brown Long Female; Alberto, Karin, Nina, Sergio …

51 p(d|cj) = p(d1|cj) * p(d2|cj) * ….* p(dn|cj)
To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate p(d|cj) = p(d1|cj) * p(d2|cj) * ….* p(dn|cj) The probability of class cj generating instance d, equals…. The probability of class cj generating the observed value for feature 1, multiplied by.. The probability of class cj generating the observed value for feature 2, multiplied by..

52 p(d|cj) = p(d1|cj) * p(d2|cj) * ….* p(dn|cj)
To simplify the task, naïve Bayesian classifiers assume attributes have independent distributions, and thereby estimate p(d|cj) = p(d1|cj) * p(d2|cj) * ….* p(dn|cj) p(officer drew|cj) = p(over_170cm = yes|cj) * p(eye =blue|cj) * …. Officer Drew is blue-eyed, over 170cm tall, and has long hair p(officer drew| Female) = 2/5 * 3/5 * …. p(officer drew| Male) = 2/3 * 2/3 * ….
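The partial products on this slide can be written out directly. The conditional probabilities below are the ones shown on the slide; the trailing "…" terms for the remaining features are omitted here, so these are partial likelihoods, not final answers:

```python
# Class priors from the 8-person table
p_female, p_male = 5 / 8, 3 / 8

# Officer Drew: over_170cm = yes, eye = blue (values assumed from the slide)
partial_female = (2 / 5) * (3 / 5)   # p(over_170cm=yes | F) * p(eye=blue | F)
partial_male   = (2 / 3) * (2 / 3)   # p(over_170cm=yes | M) * p(eye=blue | M)
```

Under the naive independence assumption, each additional feature simply contributes one more factor to the product before the prior is applied.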

53 The Naïve Bayes classifier is often represented as this type of graph…
Note the direction of the arrows, which state that each class causes certain features, with a certain probability. [Figure: a node cj with arrows to p(d1|cj), p(d2|cj), …, p(dn|cj).]

54 Naïve Bayes is fast and space efficient.
We can look up all the probabilities with a single scan of the database and store them in a (small) table…
Sex / Over 190cm: Male: Yes 0.15, No 0.85; Female: Yes 0.01, No 0.99.
Sex / Long Hair: Male: Yes 0.05, No 0.95; Female: Yes 0.70, No 0.30.

55 Naïve Bayes is NOT sensitive to irrelevant features…
Suppose we are trying to classify a person's sex based on several features, including eye color. (Of course, eye color is completely irrelevant to a person's sex.)
p(Jessica | cj) = p(eye = brown | cj) × p(wears_dress = yes | cj) × …
p(Jessica | Female) = 9,000/10,000 × 9,975/10,000 × …
p(Jessica | Male) = 9,001/10,000 × 2/10,000 × …
The eye-color terms are almost the same, so they barely affect the decision! However, this assumes that we have good enough estimates of the probabilities, so the more data the better.

56 An obvious point: I have used a simple two-class problem, and two possible values for each feature, in my previous examples. However, we can have an arbitrary number of classes, or feature values.
Animal / Mass > 10kg: Cat: Yes 0.15, No 0.85; Dog: Yes 0.91, No 0.09; Pig: Yes 0.99, No 0.01.
Animal / Color: Cat: Black 0.33, White 0.23, Brown 0.44; Dog: 0.97, 0.03, 0.90; Pig: Black 0.04, White 0.01, Brown 0.95.
Animal (class priors): Cat, Dog, Pig.

57 Naïve Bayesian Classifier
Problem! Naïve Bayes assumes independence of features…
Sex / Over 6 foot: Male: Yes 0.15, No 0.85; Female: Yes 0.01, No 0.99.
Sex / Over 200 pounds: Male: Yes 0.11, No 0.80; Female: Yes 0.05, No 0.95.

58 Naïve Bayesian Classifier
Solution: consider the relationships between attributes…
Sex / Over 6 foot: Male: Yes 0.15, No 0.85; Female: Yes 0.01, No 0.99.
Sex / Over 200 pounds: Male: Yes and Over 6 foot 0.11; No and Over 6 foot 0.59; Yes and NOT Over 6 foot 0.05; No and NOT Over 6 foot 0.35; Female: 0.01 …

59 Naïve Bayesian Classifier
Solution: consider the relationships between attributes… But how do we find the set of connecting arcs? Eamonn J. Keogh, Michael J. Pazzani: Learning the Structure of Augmented Bayesian Classifiers. International Journal on Artificial Intelligence Tools 11(4) (2002).

60 The Naïve Bayesian Classifier has a piecewise quadratic decision boundary
[Figure: decision regions for Grasshoppers, Katydids, and Ants. Adapted from a slide by Ricardo Gutierrez-Osuna.]

61 Which of the "Pigeon Problems" can be solved by a decision tree?
[Figure: the three Pigeon Problem scatter plots again.]

62 One second of audio from our sensor.
The Common Eastern Bumble Bee (Bombus impatiens) takes about one tenth of a second to pass the laser. [Figure: a waveform (amplitude ±0.2) showing background noise, the bee beginning to cross the laser, and the bee having passed through the laser.]

63 One second of audio from the laser sensor.
Only Bombus impatiens (Common Eastern Bumble Bee) is in the insectary. [Figure: the waveform (background noise, bee begins to cross laser), and the single-sided amplitude spectrum of Y(t): a peak at 197 Hz, 60 Hz interference, and harmonics; x-axis Frequency (Hz), 100–1000.]

64 [Figure: three single-sided amplitude spectra, |Y(f)| vs. Frequency (Hz), 100–1000.]

65 [Figure: the same three amplitude spectra, |Y(f)| vs. Frequency (Hz), 100–1000.]

66 [Figure: distributions over Wing Beat Frequency (Hz), 100–700.]

67 Anopheles stephensi is a primary mosquito vector of malaria.
The yellow fever mosquito (Aedes aegypti) is a mosquito that can spread dengue fever, chikungunya, and yellow fever viruses. [Figure: wingbeat frequency distributions, 100–700 Hz.]

68 Anopheles stephensi: Female mean = 475, Std = 30. Aedes aegypti: Female mean = 567, Std = 43.
[Figure: the two distributions over 400–700 Hz, crossing near 517 Hz.] If I see an insect with a wingbeat frequency of 500, what is it?
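With the two normals summarized above, the question can be answered by comparing class-conditional densities; a minimal sketch (equal class priors assumed):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Class-conditional densities at 500 Hz, using the slide's means and stds
p_anopheles = normal_pdf(500, 475, 30)   # Anopheles stephensi females
p_aedes     = normal_pdf(500, 567, 43)   # Aedes aegypti females

# With equal priors, the larger density wins:
# at 500 Hz the insect is more likely Anopheles stephensi
```

Scanning `x` over the 400–700 Hz range, the class that wins flips near the crossing point of the two curves, which is exactly the decision boundary in the figure.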

69 Can we get more features?
What is the error rate? With the decision boundary at 517 Hz, 8.02% of the area under the red curve and 12.2% of the area under the pink curve fall on the wrong side of the boundary.

70 Circadian Features
Aedes aegypti (yellow fever mosquito). [Figure: activity over the day, from midnight to midnight (hours 12 and 24 marked), with dawn and dusk indicated.]

71 Suppose I observe an insect with a wingbeat frequency of 420 Hz. What is it?
[Figure: the wingbeat frequency distributions, 400–700 Hz.]

72 Suppose I observe an insect with a wingbeat frequency of 420 Hz at 11:00am. What is it?
[Figure: the wingbeat frequency distributions (400–700 Hz) alongside the circadian activity plot (midnight, noon, midnight).]

73 Suppose I observe an insect with a wingbeat frequency of 420 Hz at 11:00am. What is it?
p(Culex | [420Hz, 11:00am]) ∝ (6 / (6 + 6 + 0)) × (2 / (2 + 4 + 3)) = 0.111
p(Anopheles | [420Hz, 11:00am]) ∝ (6 / (6 + 6 + 0)) × (4 / (2 + 4 + 3)) = 0.222
p(Aedes | [420Hz, 11:00am]) ∝ (0 / (6 + 6 + 0)) × (3 / (2 + 4 + 3)) = 0.000
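This two-feature naive Bayes computation is easy to reproduce. The histogram counts below are a hypothetical reconstruction consistent with the probabilities on the slide (6/6/0 insects at 420 Hz; 2/4/3 active at 11:00am):

```python
# Counts read off the two histograms (assumed from the slide's figures)
at_420hz = {"Culex": 6, "Anopheles": 6, "Aedes": 0}   # insects seen at 420 Hz
at_11am  = {"Culex": 2, "Anopheles": 4, "Aedes": 3}   # insects active at 11:00am

def score(species):
    """Naive Bayes numerator: the two features treated as independent."""
    p_freq = at_420hz[species] / sum(at_420hz.values())
    p_time = at_11am[species] / sum(at_11am.values())
    return p_freq * p_time

best = max(at_420hz, key=score)   # Anopheles, with score 0.222
```

Note how the time-of-day feature breaks the 420 Hz tie between Culex and Anopheles, and rules out Aedes entirely.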

74 Blue Sky Ideas Once you have a classifier working, you begin to see new uses for them… Let us see some examples..

75 Capturing or killing individually targeted insects
Most efforts to capture or kill insects are "shotgun": many non-targeted insects (including beneficial ones) are killed or captured. In some cases the ratios are 1,000 to 1 (i.e., 1,000 non-targeted insects are affected for each one that was targeted). We believe our sensors allow an ultra-precise approach, with a ratio approaching 1 to 1. This has obvious implications for SIT/metagenomics.

76 Kill
It seems obvious you could kill a mosquito with a powerful enough laser and with enough time. But we need to do it fast, with as little power as possible. We have gotten this down to 1/20th of a second, and just 1 watt (and falling). The mosquitoes may survive the laser strike, but they cannot fly away (as was the case in the photo shown right: a Culex tarsalis, zoomed in after removing the wing). We are building a SIT "Hotel California" for female mosquitoes (you can check out any time you like, but you can never leave). Collaboration with UCR mechanical engineers Amir Rose and Dr. Guillermo Aguilar.

77 Capture
We envision building robotic traps that can be left in the field and programmed with different sampling missions. Such traps could be placed and retrieved by drones. Capturing live insects is important if you want to do metagenomics. Some examples of sampling missions…
Capture examples of gravid{Aedes aegypti}
Capture insects marked{ Cripple(left-C|right-S) }
Capture examples of insects that are NOT Anopheles AND have a wingbeat frequency > (to exclude bees, etc.)
Capture examples of any insects with a wingbeat frequency > 500, encountered between 4:00am and 4:10am
Capture examples of fed{Anopheles gambiae} OR fed{Anopheles quadriannulatus} OR fed{Anopheles melas}

78 Capture
About 10% of the insects captured by Venus fly traps are flying insects. We believe that we can build inexpensive mechanical traps that can capture sex/species-targeted insects.
Capture examples of gravid{Aedes aegypti}
Capture insects marked{ Cripple(left-C|right-S) }
Capture examples of insects that are NOT Anopheles AND have a wingbeat frequency > (to exclude bees, etc.)
Capture examples of any insects with a wingbeat frequency > 500, encountered between 4:00am and 4:10am
Capture examples of fed{Anopheles gambiae} OR fed{Anopheles quadriannulatus} OR fed{Anopheles melas}

79 Advantages/Disadvantages of Naïve Bayes
Advantages: fast to train (single scan); fast to classify; not sensitive to irrelevant features; handles real and discrete data; handles streaming data well. Disadvantages: assumes independence of features.

80 Summary of Classification
We have seen 4 major classification techniques: simple linear classifier, nearest neighbor, decision tree, and naïve Bayes. There are other techniques: neural networks, support vector machines, genetic algorithms… In general, there is no one best classifier for all problems. You have to consider what you hope to achieve, and the data itself… Let us now move on to the other classic problem of data mining and machine learning: clustering…

81 Dear SIR, I am Mr. John Coleman and my sister is Miss Rose Colemen, we are the children of late Chief Paul Colemen from Sierra Leone. I am writing you in absolute confidence primarily to seek your assistance to transfer our cash of twenty one Million Dollars ($21, ) now in the custody of a private Security trust firm in Europe the money is in trunk boxes deposited and declared as family valuables by my late father as a matter of fact the company does not know the content as money, although my father made them to under stand that the boxes belongs to his foreign partner.

82 This mail is probably spam.
The original message has been attached along with this report, so you can recognize or block similar unwanted mail in future. See for more details. Content analysis details: (12.20 points, 5 required)
NIGERIAN_SUBJECT2 (1.4 points) Subject is indicative of a Nigerian spam
FROM_ENDS_IN_NUMS (0.7 points) From: ends in numbers
MIME_BOUND_MANY_HEX (2.9 points) Spam tool pattern in MIME boundary
URGENT_BIZ (2.7 points) BODY: Contains urgent matter
US_DOLLARS_ (1.5 points) BODY: Nigerian scam key phrase ($NN,NNN,NNN.NN)
DEAR_SOMETHING (1.8 points) BODY: Contains 'Dear (something)'
BAYES_ (1.6 points) BODY: Bayesian classifier says spam probability is 30 to 40% [score: ]

