Slide 1: Data Mining Classification: Alternative Techniques. Lecture Notes for Chapter 5 of Introduction to Data Mining by Tan, Steinbach, Kumar (4/18/2004).

Slide 2: Alternative Techniques
- Rule-Based Classifier: classify records by using a collection of “if…then…” rules
- Instance-Based Classifiers

Slide 3: Rule-Based Classifier (Example)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Slide 4: Application of Rule-Based Classifier
- A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
- The rule R1 covers a hawk => Bird
- The rule R3 covers the grizzly bear => Mammal

Slide 5: How Does a Rule-Based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
- A lemur triggers rule R3, so it is classified as a mammal.
- A turtle triggers both R4 and R5.
- A dogfish shark triggers none of the rules.
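A minimal Python sketch of the covering/triggering logic above. The dictionary encoding of the rules and the example records are assumptions made for illustration; this is not code from the textbook.

```python
# A sketch of rule-based classification for the vertebrate example.
# Each rule is (conditions, class); a rule covers a record when every
# attribute test in its conditions is satisfied.
RULES = [
    ({"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),
    ({"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),
    ({"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),
    ({"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),
    ({"Live in Water": "sometimes"},                "Amphibians"),
]

def triggered_rules(record):
    """Return the classes of all rules whose conditions the record satisfies."""
    return [label for conds, label in RULES
            if all(record.get(attr) == val for attr, val in conds.items())]

# A lemur gives birth and is warm-blooded: only R3 fires -> Mammals.
lemur = {"Give Birth": "yes", "Blood Type": "warm", "Can Fly": "no",
         "Live in Water": "no"}
print(triggered_rules(lemur))   # ['Mammals']

# A turtle fires both R4 and R5, so a conflict-resolution strategy
# (e.g., ordered rules) is needed; a dogfish shark fires no rule at all.
turtle = {"Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes",
          "Blood Type": "cold"}
print(triggered_rules(turtle))  # ['Reptiles', 'Amphibians']
```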

Slide 6: From Decision Trees to Rules
- Each root-to-leaf path of the tree yields one rule, so the rules are mutually exclusive and exhaustive.
- The rule set contains as much information as the tree.

Slide 7: Rules Can Be Simplified
- Initial rule: (Refund = No) ∧ (Status = Married) → No
- Simplified rule: (Status = Married) → No

Slide 8: Instance-Based Classifiers
- Store the training records.
- Use the training records to predict the class label of unseen cases.

Slide 9: Instance-Based Classifiers
- Examples:
  - Rote-learner: memorizes the entire training data and performs classification only if the attributes of the record exactly match one of the training examples.
  - Nearest neighbor: uses the k “closest” points (nearest neighbors) to perform classification.

Slide 10: Nearest Neighbor Classifiers
- Basic idea: if it walks like a duck and quacks like a duck, then it’s probably a duck.
- To classify a test record, compute its distance to the training records and choose the k “nearest” ones.

Slide 11: Nearest-Neighbor Classifiers
- Requires three things:
  - The set of stored records
  - A distance metric to compute the distance between records
  - The value of k, the number of nearest neighbors to retrieve
- To classify an unknown record:
  - Compute its distance to the other training records
  - Identify the k nearest neighbors
  - Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)

Slide 12: Definition of Nearest Neighbor
- The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Slide 13: Nearest Neighbor Classification
- Compute the distance between two points, e.g., the Euclidean distance d(p, q) = sqrt(Σ_i (p_i − q_i)²).
- Determine the class from the nearest-neighbor list:
  - Take the majority vote of the class labels among the k nearest neighbors.
  - Optionally weigh each vote according to distance, using the weight factor w = 1/d².
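The distance computation and voting schemes on slides 11 through 13 can be sketched as follows. The toy training points are made up; the 1/d² weighting follows the slide's weight factor.

```python
import math
from collections import Counter

def knn_predict(train, test_point, k=3, weighted=False):
    """Classify test_point from (point, label) training pairs using
    Euclidean distance and either a plain majority vote or a vote
    weighted by w = 1 / d**2 (slide 13)."""
    nearest = sorted((math.dist(x, test_point), y) for x, y in train)[:k]
    votes = Counter()
    for d, label in nearest:
        votes[label] += 1.0 / max(d, 1e-12) ** 2 if weighted else 1.0
    return votes.most_common(1)[0][0]

# Toy 2-D data: class A near the origin, class B farther away.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
         ((6.0, 5.0), "B"), ((7.0, 6.0), "B"), ((6.5, 4.5), "B")]
print(knn_predict(train, (2.0, 2.0), k=3))                 # -> A
print(knn_predict(train, (5.0, 4.0), k=3, weighted=True))  # -> B
```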

Slide 14: Nearest Neighbor Classification (cont.)
- Choosing the value of k:
  - If k is too small, the classifier is sensitive to noise points.
  - If k is too large, the neighborhood may include points from other classes.

Slide 15: Nearest Neighbor Classification (cont.)
- Scaling issues:
  - Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
  - Example:
    - height of a person may vary from 1.5 m to 1.8 m
    - weight of a person may vary from 90 lb to 300 lb
    - income of a person may vary from $10K to $1M
  - Solution: normalize the vectors to unit length.
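The slide's remedy is to normalize the vectors to unit length; another common fix, sketched below under the assumption that all attributes are numeric, is min-max scaling, which rescales each attribute to [0, 1] so that no single attribute dominates the Euclidean distance.

```python
def min_max_scale(records):
    """Rescale each attribute (column) to [0, 1] so that a wide-range
    attribute such as income does not dominate the distance measure."""
    cols = list(zip(*records))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi))
            for row in records]

# (height in m, weight in lb, income in $), using the ranges from the slide.
people = [(1.5, 90, 10_000), (1.8, 300, 1_000_000), (1.7, 160, 55_000)]
for row in min_max_scale(people):
    print(row)
```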

Slide 16: Nearest Neighbor Classification (cont.)
- k-NN classifiers are lazy learners:
  - They do not build models explicitly, unlike eager learners such as decision tree induction and rule-based systems.
  - Classifying unknown records is therefore relatively expensive.

Slide 17: Artificial Neural Networks (ANN)

Slide 18: Artificial Neural Networks (ANN)
- What is an ANN?

Slide 19: Artificial Neural Networks (ANN)
[Figure: a biological neuron (dendrites, cell body, nucleus, axon, synapse) alongside an artificial neuron, where inputs x1, x2, x3 with weights w1, w2, w3 and bias b produce the output y.]

Slide 20: Artificial Neural Networks (ANN)
- Example: output Y is 1 if at least two of the three inputs are equal to 1.

Slide 21: Artificial Neural Networks (ANN)

Slide 22: Artificial Neural Networks (ANN)
- The model (the perceptron model) is an assembly of inter-connected nodes and weighted links.
- The output node sums each of its input values according to the weights of its links and compares the result against some threshold t: Y = 1 if Σ_i w_i x_i > t, else Y = 0 (equivalently, Y = sign(Σ_i w_i x_i − t)).
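A sketch of the perceptron computation just described, applied to the three-input example from slide 20. The particular weights (0.3 each) and threshold (t = 0.4) are an assumption; they are simply one choice that realizes "Y = 1 iff at least two inputs are 1".

```python
from itertools import product

def perceptron(x, w, t):
    """Weighted sum of the inputs compared against a threshold t:
    output 1 if sum_i w_i * x_i > t, else 0 (step activation)."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > t else 0

# One set of weights that realizes "Y = 1 iff at least two inputs are 1".
w, t = (0.3, 0.3, 0.3), 0.4
for x in product([0, 1], repeat=3):
    print(x, "->", perceptron(x, w, t))
```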

Slide 23: General Structure of an ANN
- Training an ANN means learning the weights of the neurons.

Slide 24: Algorithm for Learning an ANN
- Initialize the weights (w0, w1, ..., wk).
- Adjust the weights so that the outputs of the ANN are consistent with the class labels of the training examples.
  - Objective function: the sum of squared errors, E = Σ_i [y_i − f(w, x_i)]².
  - Find the weights w_i that minimize this objective function, e.g., with the backpropagation algorithm (see lecture notes).
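Backpropagation itself is more than a few lines, but the simpler perceptron learning rule illustrates the same idea of adjusting weights until the outputs agree with the class labels. This sketch is not from the lecture notes; the learning rate and epoch count are arbitrary assumptions.

```python
def train_perceptron(data, lr=0.1, epochs=100):
    """Perceptron learning rule: after each example, move the weights by
    lr * (y - y_hat) * x so the output moves toward the class label.
    Backpropagation generalizes this adjustment to multilayer networks."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - y_hat
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learn the "at least two of three inputs are 1" function from slide 20.
data = [((x1, x2, x3), int(x1 + x2 + x3 >= 2))
        for x1 in (0, 1) for x2 in (0, 1) for x3 in (0, 1)]
w, b = train_perceptron(data)
print(w, b)
print(all((1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0) == y
          for x, y in data))   # True: the learned weights fit the training data
```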

Slide 25: Support Vector Machines
- Find a linear hyperplane (decision boundary) that will separate the data.

Slide 26: Support Vector Machines
- One possible solution.

Slide 27: Support Vector Machines
- Another possible solution.

Slide 28: Support Vector Machines
- Other possible solutions.

Slide 29: Support Vector Machines
- Which one is better, B1 or B2?
- How do you define "better"?

Slide 30: Support Vector Machines
- Find the hyperplane that maximizes the margin => B1 is better than B2.

Slide 31: Support Vector Machines

Slide 32: Support Vector Machines
- We want to maximize the margin, 2 / ||w||.
- This is equivalent to minimizing L(w) = ||w||² / 2.
- Subject to the constraints y_i (w · x_i + b) ≥ 1 for every training point (x_i, y_i), with y_i ∈ {−1, +1}.
- This is a constrained optimization problem; numerical approaches (e.g., quadratic programming) are used to solve it.
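Assuming scikit-learn is available, this optimization is what SVC with a linear kernel solves; a very large C approximates the hard-margin problem. The toy points below are made up, not from the slides.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable groups of points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 6.0], [6.5, 4.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Minimize ||w||^2 / 2 subject to y_i (w . x_i + b) >= 1; the large C
# makes the soft-margin problem behave like the hard-margin one.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```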

Slide 33: Support Vector Machines
- What if the problem is not linearly separable?

Slide 34: Support Vector Machines
- What if the problem is not linearly separable?
  - Introduce slack variables ξ_i ≥ 0.
  - Minimize L(w) = ||w||² / 2 + C Σ_i ξ_i.
  - Subject to y_i (w · x_i + b) ≥ 1 − ξ_i.

Slide 35: Nonlinear Support Vector Machines
- What if the decision boundary is not linear?

Slide 36: Nonlinear Support Vector Machines
- Transform the data into a higher-dimensional space where the classes become linearly separable.
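A small scikit-learn sketch (an assumption, not from the slides): on circular data no straight line separates the classes, but an RBF kernel, which implicitly maps the data into a higher-dimensional space, fits the boundary well.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Class depends on distance from the origin, so the true boundary is a circle.
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)   # implicit higher-dimensional mapping
print("linear accuracy:", linear.score(X, y))
print("rbf accuracy:   ", rbf.score(X, y))   # close to 1.0
```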

Slide 37: Ensemble Methods
- Construct a set of classifiers from the training data.
- Predict the class label of previously unseen records by aggregating the predictions made by the multiple classifiers.

Slide 38: General Idea

Slide 39: Why Does It Work?
- Suppose there are 25 base classifiers:
  - Each classifier has error rate ε = 0.35.
  - Assume the classifiers are independent.
  - The ensemble makes a wrong prediction only when a majority (13 or more) of the base classifiers are wrong: Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^(25−i) ≈ 0.06, much lower than the error rate of any single base classifier.
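The binomial sum above can be checked directly; like the slide, this assumes the 25 base classifiers err independently.

```python
from math import comb

# Probability that a majority (>= 13) of 25 independent base classifiers,
# each with error rate 0.35, are wrong at the same time.
eps, n = 0.35, 25
p_ensemble_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i)
                       for i in range(13, n + 1))
print(round(p_ensemble_wrong, 3))  # ~0.06
```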

Slide 40: Examples of Ensemble Methods
- How to generate an ensemble of classifiers?
  - Bagging
  - Boosting

Slide 41: Bagging
- Sample the training data with replacement (bootstrap samples).
- Build a classifier on each bootstrap sample.
- Each record has probability 1 − (1 − 1/n)^n (about 0.632 for large n) of being selected for a given bootstrap sample.
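A quick empirical check of the bootstrap-inclusion probability (a sketch, not textbook code): draw bootstrap samples of size n with replacement and measure the fraction of distinct records in each.

```python
import random

random.seed(0)
n, trials = 1000, 200
avg_included = 0.0
for _ in range(trials):
    sample = [random.randrange(n) for _ in range(n)]  # one bootstrap sample
    avg_included += len(set(sample)) / n              # fraction of distinct records
avg_included /= trials
print(round(avg_included, 3))   # ~0.632 = 1 - (1 - 1/n)**n for large n

# Bagging then builds one classifier per bootstrap sample and combines
# their predictions, e.g., by majority vote.
```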

Slide 42: Boosting
- An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records.
  - Initially, all N records are assigned equal weights.
  - Unlike bagging, the weights may change at the end of each boosting round.

Slide 43: Boosting
- Records that are wrongly classified will have their weights increased.
- Records that are classified correctly will have their weights decreased.
- Example 4 is hard to classify: its weight is increased, so it is more likely to be chosen again in subsequent rounds.

Slide 44: Example: AdaBoost
- Base classifiers: C1, C2, ..., CT.
- Error rate of classifier Ci (with record weights w_j summing to 1): ε_i = Σ_j w_j δ(C_i(x_j) ≠ y_j).
- Importance of a classifier: α_i = (1/2) ln((1 − ε_i) / ε_i).

Slide 45: Example: AdaBoost
- Weight update: w_j^(i+1) = (w_j^(i) / Z_i) · exp(−α_i) if C_i classifies record j correctly, and (w_j^(i) / Z_i) · exp(+α_i) if it misclassifies record j, where Z_i is a normalization factor so that the weights sum to 1.
- If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/N and the resampling procedure is repeated.
- Classification: C*(x) = argmax_y Σ_i α_i δ(C_i(x) = y).
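A sketch of the bookkeeping in the formulas above, applied to a single round with four records and one misclassified record (as in the slide 43 example). The base classifier's predictions are assumed, and this is not a complete AdaBoost implementation.

```python
import math

def adaboost_round(weights, correct):
    """One round of AdaBoost weight bookkeeping.
    weights: current record weights (sum to 1); correct: bool per record,
    True if the base classifier got that record right.
    Assumes 0 < error rate < 0.5. Returns (alpha, new_weights)."""
    eps = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - eps) / eps)
    new_w = [w * math.exp(-alpha if ok else alpha)
             for w, ok in zip(weights, correct)]
    z = sum(new_w)                        # normalization factor Z
    return alpha, [w / z for w in new_w]

# Four records, equal initial weights; the base classifier misses record 4.
w0 = [0.25] * 4
alpha, w1 = adaboost_round(w0, [True, True, True, False])
print(round(alpha, 3))                    # importance of this classifier
print([round(w, 3) for w in w1])          # record 4's weight has increased
```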

Slide 46: Illustrating AdaBoost
- Data points for training, and the initial weights for each data point.

Slide 47: Illustrating AdaBoost

