
1 Data Mining Classification: Alternative Techniques
Lecture Notes for Chapter 5, Introduction to Data Mining, by Tan, Steinbach, Kumar

2 Instance-Based Classifiers
- Store the training records
- Use the training records to predict the class label of unseen cases

3 Instance-Based Classifiers
- Examples:
  - Rote-learner: memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly
  - Nearest neighbor: uses the k "closest" points (nearest neighbors) to perform classification

4 Nearest Neighbor Classifiers
- Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck
- (Figure: given a test record, compute its distance to the training records and choose the k "nearest" records)

5 Nearest-Neighbor Classifiers
- Requires three things:
  - The set of stored records
  - A distance metric to compute the distance between records
  - The value of k, the number of nearest neighbors to retrieve
- To classify an unknown record:
  - Compute its distance to the training records
  - Identify the k nearest neighbors
  - Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
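A minimal sketch of this procedure in Python; the toy records and the choice of k are illustrative assumptions, not from the slides:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two numeric records
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, test_record, k=3):
    # train: list of (attributes, class_label) pairs
    # 1. compute the distance from the test record to every training record
    distances = [(euclidean(x, test_record), label) for x, label in train]
    # 2. identify the k nearest neighbors
    neighbors = sorted(distances)[:k]
    # 3. majority vote over the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy example (illustrative data)
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]
print(knn_classify(train, (1.1, 0.9), k=3))  # -> "A"
```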

6 Definition of Nearest Neighbor
- The k-nearest neighbors of a record x are the data points that have the k smallest distances to x

7 Nearest Neighbor Classification
- Compute the distance between two points:
  - Euclidean distance: d(p, q) = sqrt( Σ_i (p_i − q_i)² )
- Determine the class from the nearest-neighbor list:
  - Take the majority vote of the class labels among the k nearest neighbors
  - Or weigh each vote according to distance, with weight factor w = 1/d²
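The distance-weighted vote can be sketched as a drop-in replacement for the majority-vote step in the earlier sketch (illustrative; it assumes the same list of (distance, label) pairs for the k nearest records):

```python
from collections import defaultdict

def weighted_vote(neighbors):
    # neighbors: list of (distance, class_label) pairs for the k nearest records
    scores = defaultdict(float)
    for d, label in neighbors:
        # weight factor w = 1 / d^2; guard against a zero distance (exact match)
        scores[label] += 1.0 / (d ** 2) if d > 0 else float("inf")
    return max(scores, key=scores.get)
```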

8 Nearest Neighbor Classification…
- Choosing the value of k:
  - If k is too small, the classifier is sensitive to noise points
  - If k is too large, the neighborhood may include points from other classes

9 Nearest Neighbor Classification…
- Scaling issues
  - Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
  - Example:
    - height of a person may vary from 1.5 m to 1.8 m
    - weight of a person may vary from 90 lb to 300 lb
    - income of a person may vary from $10K to $1M
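One common fix is min-max scaling of each attribute to [0, 1] before computing distances; a brief sketch (min-max scaling is one standard choice, not something the slide prescribes, and the records below are made up):

```python
def min_max_scale(records):
    # records: list of numeric attribute tuples; scale each attribute to [0, 1]
    cols = list(zip(*records))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(r, mins, maxs))
        for r in records
    ]

# Height (m), weight (lb), income ($): raw distances would be dominated by income
people = [(1.5, 90, 10_000), (1.8, 300, 1_000_000), (1.7, 150, 50_000)]
print(min_max_scale(people))
```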

10 Nearest Neighbor Classification…
- Problem with the Euclidean measure:
  - High-dimensional data: curse of dimensionality

11 Nearest Neighbor Classification…
- k-NN classifiers are lazy learners
  - They do not build models explicitly
  - Unlike eager learners such as decision tree induction and rule-based systems
  - Classifying unknown records is relatively expensive

12 Example: PEBLS
- PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)
  - Works with both continuous and nominal features
    - For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM)
  - Each record is assigned a weight factor
  - Number of nearest neighbors: k = 1
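As a rough illustration of MVDM (based on the textbook's formulation; the class-count table below is made up), the distance between two nominal values compares their class distributions:

```python
def mvdm(value_counts_1, value_counts_2):
    # value_counts_i: dict mapping class label -> number of records that have
    # that nominal value and that class; MVDM sums the absolute differences
    # of the per-class proportions of the two values
    classes = set(value_counts_1) | set(value_counts_2)
    n1 = sum(value_counts_1.values())
    n2 = sum(value_counts_2.values())
    return sum(abs(value_counts_1.get(c, 0) / n1 - value_counts_2.get(c, 0) / n2)
               for c in classes)

# Hypothetical counts for values "Single" vs "Married" against classes Yes/No
print(mvdm({"Yes": 2, "No": 2}, {"Yes": 0, "No": 4}))  # -> 1.0
```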

13 Bayes Classifier
- A probabilistic framework for solving classification problems
- Conditional probability: P(C | A) = P(A, C) / P(A),  P(A | C) = P(A, C) / P(C)
- Bayes theorem: P(C | A) = P(A | C) P(C) / P(A)

14 Example of Bayes Theorem
- Given:
  - A doctor knows that meningitis causes stiff neck 50% of the time
  - The prior probability of any patient having meningitis is 1/50,000
  - The prior probability of any patient having a stiff neck is 1/20
- If a patient has a stiff neck, what is the probability he/she has meningitis?
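Applying Bayes theorem to these numbers (the worked answer is not in the slide text, but follows directly from the given probabilities):

```latex
P(M \mid S) = \frac{P(S \mid M)\,P(M)}{P(S)}
            = \frac{0.5 \times 1/50{,}000}{1/20}
            = 0.0002
```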

15 Bayesian Classifiers
- Consider each attribute and class label as random variables
- Given a record with attributes (A1, A2, …, An):
  - The goal is to predict class C
  - Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)
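Combining this goal with the Bayes theorem from the previous slide (an equivalent restatement, not an additional slide):

```latex
\hat{C} = \arg\max_{C} P(C \mid A_1, \dots, A_n)
        = \arg\max_{C} \frac{P(A_1, \dots, A_n \mid C)\,P(C)}{P(A_1, \dots, A_n)}
        = \arg\max_{C} P(A_1, \dots, A_n \mid C)\,P(C)
```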

16 Artificial Neural Networks (ANN)
- Output Y is 1 if at least two of the three inputs are equal to 1

17 Artificial Neural Networks (ANN)

18 Artificial Neural Networks (ANN)
- The model is an assembly of inter-connected nodes and weighted links
- The output node sums up its input values according to the weights of its links
- The output is compared against some threshold t
- Perceptron model: Y = I( Σ_i w_i X_i − t > 0 ), or equivalently Y = sign( Σ_i w_i X_i − t )
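A minimal sketch of the perceptron model above, using the sign form; the weights and threshold below are illustrative values chosen so the unit reproduces the "at least two of three inputs are 1" example from the earlier slide:

```python
def perceptron(inputs, weights, t):
    # weighted sum of the inputs minus the threshold, then take the sign
    s = sum(w * x for w, x in zip(weights, inputs)) - t
    return 1 if s > 0 else -1

# Weights of 0.3 each and threshold 0.4 fire only when >= 2 of the 3 inputs are 1
for x in [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 0)]:
    print(x, perceptron(x, (0.3, 0.3, 0.3), 0.4))
```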

19 General Structure of ANN
- Training an ANN means learning the weights of the neurons

20 Algorithm for Learning ANN
- Initialize the weights (w0, w1, …, wk)
- Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples
  - Objective function: E = Σ_i [ Y_i − f(w, X_i) ]²
  - Find the weights w_i that minimize the objective function
    - e.g., the backpropagation algorithm
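A rough sketch of the weight-learning loop for this objective, using gradient descent on a single linear unit rather than a full multi-layer network with backpropagation (the learning rate, epoch count, and toy data are assumptions for illustration):

```python
def train_linear_unit(X, Y, lr=0.1, epochs=100):
    # X: list of input tuples, Y: list of target outputs
    # f(w, x) = w0 + w1*x1 + ... (a single linear unit, no hidden layers)
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            err = y - pred
            # gradient-descent update that reduces E = sum (y - f(w, x))^2
            w[0] += lr * err
            for i, xi in enumerate(x):
                w[i + 1] += lr * err * xi
    return w

# Fit the "at least two of three inputs are 1" example (illustrative data)
X = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
Y = [0, 0, 1, 1, 1, 1]
print(train_linear_unit(X, Y))
```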

21 Support Vector Machines
- Find a linear hyperplane (decision boundary) that will separate the data

22 Support Vector Machines
- One possible solution

23 Support Vector Machines
- Another possible solution

24 Support Vector Machines
- Other possible solutions

25 Support Vector Machines
- Which one is better, B1 or B2?
- How do you define "better"?

26 Support Vector Machines
- Find the hyperplane that maximizes the margin, so B1 is better than B2
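As an illustration of the maximum-margin idea, a linear SVM can be fit with scikit-learn (an assumed tool choice; the toy data below is made up):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (illustrative data)
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear SVM finds the separating hyperplane with the largest margin;
# a large C keeps the margin "hard" for separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)
print(clf.coef_, clf.intercept_)      # parameters of the hyperplane w.x + b = 0
print(clf.support_vectors_)           # the records that define the margin
print(clf.predict([[2, 2], [6, 5]]))  # -> [0, 1]
```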

27 Support Vector Machines

28 Support Vector Machines
- What if the problem is not linearly separable?

29 Nonlinear Support Vector Machines
- What if the decision boundary is not linear?

30 Nonlinear Support Vector Machines
- Transform the data into a higher-dimensional space
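A small sketch of the idea: 1-D data that is not linearly separable becomes separable after mapping each point x to (x, x²). The data and the particular mapping are illustrative, not from the slides:

```python
# Points near the origin are class 0, points far from it are class 1;
# no single threshold on x separates them
x = [-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0]
y = [1, 1, 0, 0, 0, 1, 1]

# Map to a higher-dimensional space: x -> (x, x^2)
mapped = [(v, v * v) for v in x]

# In the new space the line x2 = 2 (i.e., x^2 = 2) separates the classes
print([1 if x2 > 2 else 0 for _, x2 in mapped])  # matches y
```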

31 Ensemble Methods
- Construct a set of classifiers from the training data
- Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers

32 General Idea

33 Why does it work?
- Suppose there are 25 base classifiers
  - Each classifier has error rate ε = 0.35
  - Assume the classifiers are independent
  - Probability that the ensemble classifier makes a wrong prediction (i.e., at least 13 of the 25 are wrong):
    Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06
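The number can be checked directly; a quick sketch of the binomial computation implied above:

```python
from math import comb

eps, n = 0.35, 25
# Majority vote is wrong when at least 13 of the 25 independent classifiers are wrong
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(p_wrong, 3))  # ~0.06, far below the individual error rate of 0.35
```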

34 Examples of Ensemble Methods
- How to generate an ensemble of classifiers?
  - Bagging
  - Boosting

35 Ensemble Methods
- Ensemble methods
  - Use a combination of models to increase accuracy
  - Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*
- Popular ensemble methods
  - Bagging: averaging the prediction over a collection of classifiers
  - Boosting: weighted vote with a collection of classifiers
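A compact sketch of bagging (bootstrap sampling plus a majority vote); the decision-tree base learner, toy data, and k = 10 are assumed choices, not anything the slide specifies:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, k=10, seed=0):
    # Train k models, each on a bootstrap sample (drawn with replacement)
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    # Aggregate by majority vote over the k models
    votes = Counter(int(m.predict([x])[0]) for m in models)
    return votes.most_common(1)[0][0]

# Toy data (illustrative)
X = np.array([[0], [1], [2], [3], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
models = bagging_fit(X, y)
print(bagging_predict(models, [2]), bagging_predict(models, [9]))  # -> 0 1
```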

