
Data Mining Classification: Alternative Techniques
Lecture Notes for Chapter 5, Introduction to Data Mining
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Alternative Techniques
- Rule-Based Classifier: classify records by using a collection of "if…then…" rules
- Instance-Based Classifiers

Rule-based Classifier (Example)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Application of Rule-Based Classifier
- A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
- The rule R1 covers a hawk → Bird
- The rule R3 covers the grizzly bear → Mammal

How Does a Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
- A lemur triggers rule R3, so it is classified as a mammal.
- A turtle triggers both R4 and R5.
- A dogfish shark triggers none of the rules.
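The vertebrate rules above can be sketched as a small first-match classifier. The attribute names, record values, and the first-match conflict-resolution strategy are illustrative assumptions, not part of the original slides:

```python
# Rules as (conditions, class) pairs; a rule covers a record when every
# condition attribute matches.
RULES = [
    ({"gives_birth": "no", "can_fly": "yes"}, "Birds"),           # R1
    ({"gives_birth": "no", "lives_in_water": "yes"}, "Fishes"),   # R2
    ({"gives_birth": "yes", "blood_type": "warm"}, "Mammals"),    # R3
    ({"gives_birth": "no", "can_fly": "no"}, "Reptiles"),         # R4
    ({"lives_in_water": "sometimes"}, "Amphibians"),              # R5
]

def classify(record):
    """Return the class of the first rule that covers the record, else None."""
    for conditions, label in RULES:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return None

lemur = {"gives_birth": "yes", "blood_type": "warm", "can_fly": "no",
         "lives_in_water": "no"}
turtle = {"gives_birth": "no", "can_fly": "no",
          "lives_in_water": "sometimes", "blood_type": "cold"}
dogfish = {"gives_birth": "yes", "blood_type": "cold", "can_fly": "no",
           "lives_in_water": "yes"}
```

Note how first-match ordering resolves the turtle's R4/R5 conflict in favor of R4, and the dogfish shark falls through all rules with no prediction.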

From Decision Trees to Rules
- Rules are mutually exclusive and exhaustive.
- The rule set contains as much information as the tree.

Rules Can Be Simplified
- Initial rule: (Refund = No) ∧ (Status = Married) → No
- Simplified rule: (Status = Married) → No

Instance-Based Classifiers
- Store the training records.
- Use the training records to predict the class label of unseen cases.

Instance-Based Classifiers
- Examples:
  - Rote-learner: memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly.
  - Nearest neighbor: uses the k "closest" points (nearest neighbors) for performing classification.

Nearest Neighbor Classifiers
- Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.
- Compute the distance from the test record to the training records, then choose the k "nearest" records.

Nearest-Neighbor Classifiers
- Requires three things:
  - The set of stored records
  - A distance metric to compute the distance between records
  - The value of k, the number of nearest neighbors to retrieve
- To classify an unknown record:
  - Compute its distance to the training records
  - Identify the k nearest neighbors
  - Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)

Definition of Nearest Neighbor
- The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Nearest Neighbor Classification
- Compute the distance between two points, e.g. the Euclidean distance: d(p, q) = √Σᵢ(pᵢ − qᵢ)²
- Determine the class from the nearest-neighbor list:
  - take the majority vote of the class labels among the k nearest neighbors, or
  - weigh each vote according to distance, e.g. with weight factor w = 1/d²
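A minimal k-NN sketch covering both voting schemes; the toy training points and function names are illustrative:

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, test_point, k=3, weighted=False):
    """train: list of (point, label) pairs.

    Uses a plain majority vote, or a 1/d^2 distance-weighted vote when
    weighted=True (an exact match, d = 0, simply gets a full vote here).
    """
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], test_point))[:k]
    votes = Counter()
    for point, label in neighbors:
        d = euclidean(point, test_point)
        votes[label] += 1.0 / (d * d) if (weighted and d > 0) else 1.0
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"),
         ((5.2, 4.8), "B"), ((4.9, 5.1), "B")]
```

With k = 3, a test point near the "A" cluster picks up two "A" neighbors and one distant "B", so the majority vote returns "A".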

Nearest Neighbor Classification…
- Choosing the value of k:
  - If k is too small, the classifier is sensitive to noise points.
  - If k is too large, the neighborhood may include points from other classes.

Nearest Neighbor Classification…
- Scaling issues:
  - Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
  - Example:
    - height of a person may vary from 1.5 m to 1.8 m
    - weight of a person may vary from 90 lb to 300 lb
    - income of a person may vary from $10K to $1M
  - Solution: normalize the vectors to unit length.
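A sketch of the scaling idea using the slide's height/weight/income example. The unit-length normalization is the slide's proposed solution; the min-max rescaling shown alongside it is a common alternative I've added for illustration, and all values are toy data:

```python
import math

records = [
    [1.5,  90, 10_000],    # height (m), weight (lb), income ($)
    [1.8, 300, 1_000_000],
    [1.7, 150, 50_000],
]

def min_max_scale(data):
    """Rescale each attribute (column) to [0, 1] so income cannot dominate."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in data]

def unit_length(vec):
    """Normalize a record vector to unit Euclidean length (the slide's fix)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

scaled = min_max_scale(records)
```

After min-max scaling, every attribute spans the same [0, 1] range, so Euclidean distances weigh height, weight, and income comparably.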

Nearest Neighbor Classification…
- k-NN classifiers are lazy learners:
  - They do not build models explicitly.
  - This is unlike eager learners such as decision tree induction and rule-based systems.
  - Classifying unknown records is relatively expensive.

Artificial Neural Networks (ANN)

Artificial Neural Networks (ANN)
- What is an ANN?

Artificial Neural Networks (ANN)
[Figure: a biological neuron (dendrites, cell body, nucleus, axon, synapse) shown next to an artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, bias b, and output y.]

Artificial Neural Networks (ANN)
- Output Y is 1 if at least two of the three inputs are equal to 1.
- Model: Y = I(0.3X₁ + 0.3X₂ + 0.3X₃ − 0.4 > 0), where I(·) = 1 if its argument is true and 0 otherwise.
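The "at least two of three inputs" function can be sketched as a single threshold unit. The weights (0.3 each) and threshold (0.4) follow the usual textbook illustration; any weights with 2w > t > w would work equally well:

```python
def perceptron(x1, x2, x3, w=0.3, t=0.4):
    """Fire (output 1) when the weighted input sum exceeds the threshold t."""
    s = w * x1 + w * x2 + w * x3
    return 1 if s - t > 0 else 0
```

Two active inputs give a sum of 0.6 > 0.4 (output 1), while a single active input gives 0.3 < 0.4 (output 0).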

Artificial Neural Networks (ANN)

Artificial Neural Networks (ANN)
- The model is an assembly of inter-connected nodes and weighted links.
- The output node sums up its input values according to the weights of its links.
- The sum is compared against some threshold t.
- Perceptron model: Y = I(Σᵢ wᵢXᵢ − t > 0), or equivalently Y = sign(Σᵢ wᵢXᵢ − t)

General Structure of an ANN
- Training an ANN means learning the weights of the neurons.

Algorithm for Learning an ANN
- Initialize the weights (w₀, w₁, …, w_k).
- Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples:
  - Objective function: E = Σᵢ [Yᵢ − f(w, Xᵢ)]²
  - Find the weights wᵢ that minimize the objective function, e.g. with the backpropagation algorithm (see lecture notes).
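A minimal sketch of weight learning for a single unit, using the classic perceptron update w ← w + λ(y − ŷ)x. This is a simple instance of the idea on this slide, not backpropagation itself (which generalizes such updates to multi-layer networks); the learning rate, epoch count, and target function are illustrative:

```python
def train_perceptron(data, lr=0.1, epochs=20):
    """Learn weights and bias so that the unit reproduces the class labels."""
    w = [0.0, 0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - y_hat                    # 0 when the prediction is right
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Target: output 1 when at least two of three inputs are 1 (linearly separable).
data = [((a, b_, c), 1 if a + b_ + c >= 2 else 0)
        for a in (0, 1) for b_ in (0, 1) for c in (0, 1)]
w, b = train_perceptron(data)
```

Because the target is linearly separable, the perceptron convergence theorem guarantees the loop settles on weights that classify all eight input patterns correctly.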

Support Vector Machines
- Find a linear hyperplane (decision boundary) that separates the data.

Support Vector Machines
- One possible solution

Support Vector Machines
- Another possible solution

Support Vector Machines
- Other possible solutions

Support Vector Machines
- Which one is better, B1 or B2?
- How do you define "better"?

Support Vector Machines
- Find the hyperplane that maximizes the margin → B1 is better than B2.

Support Vector Machines

Support Vector Machines
- We want to maximize the margin: 2/‖w‖
- This is equivalent to minimizing: L(w) = ‖w‖²/2
- Subject to the constraints: yᵢ(w·xᵢ + b) ≥ 1 for all training records (xᵢ, yᵢ)
- This is a constrained optimization problem:
  - Numerical approaches are used to solve it (e.g., quadratic programming).
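Quadratic programming is the standard way to solve this problem; as a rough, self-contained alternative, the sketch below minimizes the equivalent regularized hinge-loss objective λ‖w‖²/2 + Σᵢ max(0, 1 − yᵢ(w·xᵢ + b)) by subgradient descent on toy 2-D data. The hyperparameters and data points are illustrative assumptions:

```python
def train_linear_svm(data, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the hinge-loss form of the SVM objective."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            margin = y * (w[0] * x[0] + w[1] * x[1] + b)
            if margin < 1:
                # Point violates the margin: hinge term contributes y*x.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer is active: shrink w slightly.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

data = [((2.0, 2.0), 1), ((1.0, 2.0), 1),
        ((-1.0, -1.0), -1), ((-2.0, -1.0), -1)]
w, b = train_linear_svm(data)
```

On this separable toy set the learned hyperplane puts every training point on the correct side, i.e. yᵢ(w·xᵢ + b) > 0 for all i.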

Support Vector Machines
- What if the problem is not linearly separable?

Support Vector Machines
- What if the problem is not linearly separable?
  - Introduce slack variables ξᵢ ≥ 0.
  - Minimize: L(w) = ‖w‖²/2 + C Σᵢ ξᵢ
  - Subject to: yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ

Nonlinear Support Vector Machines
- What if the decision boundary is not linear?

Nonlinear Support Vector Machines
- Transform the data into a higher-dimensional space.
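A sketch of the transformation idea on 1-D toy data (all values illustrative): class +1 sits on both sides of class −1, so no single threshold on x separates them, but after mapping x → x² one threshold does:

```python
# 1-D points with labels: +1 far from the origin, -1 near it.
points = [(-3.0, 1), (-2.5, 1), (-0.5, -1), (0.4, -1), (2.6, 1), (3.1, 1)]

# Map each point into the transformed space x -> x^2.
transformed = [(x * x, y) for x, y in points]

def predict(x, threshold=4.0):
    """A linear rule in the transformed space: x^2 > threshold."""
    return 1 if x * x > threshold else -1
```

The boundary x² = 4 is linear in the transformed space but corresponds to the nonlinear boundary |x| = 2 in the original space, which is exactly the trick kernel SVMs exploit.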

Ensemble Methods
- Construct a set of classifiers from the training data.
- Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers.

General Idea

Why Does It Work?
- Suppose there are 25 base classifiers:
  - Each classifier has error rate ε = 0.35.
  - Assume the classifiers are independent.
  - The ensemble (majority vote) makes a wrong prediction only if at least 13 base classifiers are wrong: Σᵢ₌₁₃²⁵ C(25, i) εⁱ(1 − ε)²⁵⁻ⁱ ≈ 0.06
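The binomial calculation behind this argument can be checked directly: the majority vote of 25 independent classifiers errs only when 13 or more of them are wrong at once.

```python
import math

eps, n = 0.35, 25

# P(at least 13 of 25 independent classifiers are wrong), binomial tail.
p_ensemble_wrong = sum(math.comb(n, i) * eps**i * (1 - eps)**(n - i)
                       for i in range(13, n + 1))
# Roughly 0.06 -- far below the 0.35 error rate of any single classifier.
```

This is why independence of the base classifiers matters: the improvement comes entirely from errors not coinciding.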

Examples of Ensemble Methods
- How to generate an ensemble of classifiers?
  - Bagging
  - Boosting

Bagging
- Sampling with replacement.
- Build a classifier on each bootstrap sample.
- Each record has probability 1 − (1 − 1/n)ⁿ ≈ 0.632 of being selected for a given bootstrap sample of size n.
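Bootstrap sampling is easy to sketch and to sanity-check: a record misses a given bootstrap sample with probability (1 − 1/n)ⁿ ≈ e⁻¹ ≈ 0.368, so about 63.2% of the distinct records appear in each sample. The sample size and seed below are illustrative:

```python
import random

random.seed(0)
n = 10_000
records = list(range(n))

# One bootstrap sample: n draws with replacement from the n records.
bootstrap = random.choices(records, k=n)

# Fraction of distinct original records that made it into the sample.
unique_fraction = len(set(bootstrap)) / n   # close to 1 - 1/e, about 0.632
```

Each base classifier in bagging would then be trained on its own such sample, and the left-out ~37% can serve as an out-of-bag validation set.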

Boosting
- An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records:
  - Initially, all N records are assigned equal weights.
  - Unlike bagging, the weights may change at the end of each boosting round.

Boosting
- Records that are wrongly classified will have their weights increased.
- Records that are classified correctly will have their weights decreased.
- Example: record 4 is hard to classify. Its weight is increased, so it is more likely to be chosen again in subsequent rounds.

Example: AdaBoost
- Base classifiers: C₁, C₂, …, C_T
- Error rate: εᵢ = (1/N) Σⱼ₌₁ᴺ wⱼ δ(Cᵢ(xⱼ) ≠ yⱼ)
- Importance of a classifier: αᵢ = ½ ln((1 − εᵢ)/εᵢ)

Example: AdaBoost
- Weight update: wⱼ⁽ⁱ⁺¹⁾ = (wⱼ⁽ⁱ⁾/Zᵢ) · exp(−αᵢ) if Cᵢ(xⱼ) = yⱼ, and (wⱼ⁽ⁱ⁾/Zᵢ) · exp(αᵢ) otherwise, where Zᵢ is a normalization factor.
- If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/N and the resampling procedure is repeated.
- Classification: C*(x) = argmax_y Σᵢ αᵢ δ(Cᵢ(x) = y)
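One AdaBoost round, using the error-rate, importance, and weight-update formulas from the two slides above, can be sketched as follows (the four-record example and which record is misclassified are illustrative):

```python
import math

def adaboost_round(weights, correct):
    """weights: current record weights; correct: per-record booleans.

    Returns the renormalized weights and the classifier's importance alpha.
    """
    # Weighted error rate: total weight of the misclassified records.
    eps = sum(w for w, ok in zip(weights, correct) if not ok)
    # Importance of this classifier.
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Boost misclassified weights, shrink the correct ones.
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    z = sum(new)                       # normalization factor Z
    return [w / z for w in new], alpha

weights = [0.25, 0.25, 0.25, 0.25]
new_weights, alpha = adaboost_round(weights, [True, True, True, False])
```

With ε = 0.25, α = ½ ln 3 ≈ 0.549; the lone misclassified record's weight rises from 0.25 to 0.5, so the next round focuses on it, while the weights still sum to 1.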

Illustrating AdaBoost
- Data points for training, with equal initial weights for each data point.

Illustrating AdaBoost
