DATA MINING: Introductory and Advanced Topics
Part II - Classification
Margaret H. Dunham, Department of Computer Science and Engineering, Southern Methodist University
Companion slides for the text by Dr. M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2002. © Prentice Hall.

Classification Outline
Goal: provide an overview of the classification problem and introduce some of the basic algorithms.
Topics: classification problem overview, regression, similarity measures, Bayesian classification, decision trees, rules, neural networks.

Classification Problem
Given a database D = {t1, t2, …, tn} and a set of classes C = {C1, …, Cm}, the classification problem is to define a mapping f: D → C where each ti is assigned to one class. This mapping actually divides D into equivalence classes. Prediction is similar, but may be viewed as having an infinite number of classes.

Classification Examples
Teachers classify students' grades as A, B, C, D, or F. Identify mushrooms as poisonous or edible. Predict when a river will flood. Identify individuals who are credit risks. Speech recognition. Pattern recognition.

Classification Example: Grading
If x >= 90 then grade = A. If 80 <= x < 90 then grade = B. If 70 <= x < 80 then grade = C. If 60 <= x < 70 then grade = D. If x < 60 then grade = F.
(Slide figure: a decision tree that tests x against 90, 80, 70, and 60 in turn, with leaves A, B, C, D, and F.)
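A minimal sketch of this rule set as code (plain Python; the function name is illustrative only):

    def grade(x):
        # Classify a numeric score into a letter grade using the rules above.
        if x >= 90:
            return "A"
        elif x >= 80:
            return "B"
        elif x >= 70:
            return "C"
        elif x >= 60:
            return "D"
        else:
            return "F"

    print(grade(85))  # B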

Classification Example: Letter Recognition
View letters as constructed from five components. (Slide figure: the five components and the letters A through F built from them.)

Classification Techniques
Approach: (1) create a specific model by evaluating training data (or using domain experts' knowledge); (2) apply the model to new data. Classes must be predefined. Most common techniques use DTs, NNs, or are based on distances or statistical methods.

Defining Classes
(Slide figure with two panels: partitioning based and distance based.)

Issues in Classification
Missing data: ignore, or replace with an assumed value.
Measuring performance: classification accuracy on test data, confusion matrix, OC curve.

Height Example Data
(Data table shown on slide: the 15 training tuples with gender and height used in the running examples.)

Classification Performance
(Slide figure: the four outcomes true positive, true negative, false positive, and false negative.)

Confusion Matrix Example
Using the height data example, with Output1 as the correct assignment and Output2 as the actual assignment. (Matrix shown on slide.)

Operating Characteristic Curve

Regression
Assume the data fit a predefined function. Determine the best values for the regression coefficients c0, c1, …, cn. Assume an error term ε: y = c0 + c1x1 + … + cnxn + ε. Estimate the error using the mean squared error for the training set (the average of the squared differences between actual and predicted values over all training tuples).
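A minimal least-squares sketch for the one-variable case (plain Python; the data values are made up for illustration):

    def simple_linear_regression(xs, ys):
        # Closed-form least squares for y = c0 + c1*x.
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        c1 = sxy / sxx
        c0 = mean_y - c1 * mean_x
        return c0, c1

    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.1, 3.9, 6.2, 7.8]
    c0, c1 = simple_linear_regression(xs, ys)
    mse = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
    print(c0, c1, mse)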

Linear Regression: Poor Fit
(Slide figure: a linear fit that matches the data poorly.)

Classification Using Regression
Division: use the regression function to divide the area into regions.
Prediction: use the regression function to predict a class membership function; the input includes the desired class.

Division (slide figure)

Prediction (slide figure)

Classification Using Distance
Place items in the class they are "closest" to; this requires determining the distance between an item and a class. Classes may be represented by a centroid (central value), a medoid (representative point), or individual points. Algorithm: KNN.

K Nearest Neighbor (KNN)
The training set includes the class labels. Examine the K items nearest to the item to be classified. The new item is placed in the class with the largest number of these close items. Complexity: O(n) for each tuple to be classified.

KNN Algorithm
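The algorithm itself appears as an image on the slide; below is a minimal KNN sketch under the usual assumptions (Euclidean distance, majority vote; plain Python, illustrative names and data):

    import math
    from collections import Counter

    def knn_classify(training, new_item, k):
        # training: list of (feature_vector, class_label) pairs
        def dist(a, b):
            return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
        # Sort the training tuples by distance to the new item and keep the K closest.
        nearest = sorted(training, key=lambda t: dist(t[0], new_item))[:k]
        # Place the new item in the class that occurs most among those K neighbors.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    data = [((1.6,), "Short"), ((1.9,), "Medium"), ((2.2,), "Tall")]
    print(knn_classify(data, (1.85,), 1))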

DT Classification
Partitioning based: divide the search space into rectangular regions; a tuple is placed into a class based on the region within which it falls. DT approaches differ in how the tree is built (DT induction). Internal nodes are associated with attributes, and arcs with values for those attributes. Algorithms: ID3, C4.5, CART.

Decision Tree
Given: D = {t1, …, tn} where ti = ⟨ti1, …, tih⟩; the database schema contains attributes {A1, A2, …, Ah}; classes C = {C1, …, Cm}.
A decision (or classification) tree is a tree associated with D such that each internal node is labeled with an attribute Ai, each arc is labeled with a predicate that can be applied to the attribute at its parent, and each leaf node is labeled with a class Cj.

DT Induction

DT Splits Area
(Slide figure: the search space split on Gender (M/F) and on Height into rectangular regions.)

Comparing DTs
(Slide figure: a balanced tree versus a deep tree.)

DT Issues
Choosing splitting attributes, ordering of splitting attributes, splits, tree structure, stopping criteria, training data, pruning.

Decision tree induction is often based on information theory.

Information

DT Induction
When all the marbles in the bowl are mixed up, little information is given. When the marbles in the bowl are all from one class and those in the other two classes are on either side, more information is given. Use this approach with DT induction!

Information/Entropy
Given probabilities p1, p2, …, ps whose sum is 1, entropy is defined as H(p1, p2, …, ps) = Σi pi log(1/pi). Entropy measures the amount of randomness, surprise, or uncertainty. Information is maximized when entropy is minimized.
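A small helper that computes this definition (plain Python; base-10 logs are assumed here because they match the numeric examples that follow):

    import math

    def entropy(probs, base=10):
        # H(p1, ..., ps) = sum of p * log(1/p); terms with p = 0 contribute nothing.
        return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

    print(round(entropy([4/15, 8/15, 3/15]), 3))  # about 0.438, the starting entropy used below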

ID3
Creates the tree using information-theory concepts and tries to reduce the expected number of comparisons. ID3 chooses the split attribute with the highest information gain: Gain(D, S) = H(D) − Σi P(Di) H(Di), where the split S divides D into subsets D1, …, Ds.

Entropy
(Slide figure: plots of log(1/p), p log(1/p), and H(p, 1-p) as functions of p.)

ID3 Example (Output1)
Starting-state entropy (base-10 logs): 4/15 log(15/4) + 8/15 log(15/8) + 3/15 log(15/3) = 0.4384.
Gain using gender: female entropy 3/9 log(9/3) + 6/9 log(9/6) = 0.2764; male entropy 1/6 log(6/1) + 2/6 log(6/2) + 3/6 log(6/3) = 0.4392; weighted sum (9/15)(0.2764) + (6/15)(0.4392) = 0.3415; gain = 0.4384 − 0.3415 = 0.0969.
Gain using height: 0.4384 − (2/15)(0.301) = 0.3983.
Choose height as the first splitting attribute.
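A short sketch that reproduces the gender numbers above (plain Python; it assumes the class counts shown on the slide: 4/8/3 overall, 3/6/0 of the 9 females, 1/2/3 of the 6 males):

    import math

    def H(ps):
        # Base-10 entropy, as used in this example.
        return sum(p * math.log(1 / p, 10) for p in ps if p > 0)

    start = H([4/15, 8/15, 3/15])                                     # about 0.438
    weighted = (9/15) * H([3/9, 6/9]) + (6/15) * H([1/6, 2/6, 3/6])   # about 0.342
    print(round(start - weighted, 3))                                 # gain using gender, about 0.097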

C4.5
ID3 favors attributes with a large number of divisions. C4.5 is an improved version of ID3 that handles missing data, continuous data, pruning, and rules, and chooses splits using the GainRatio: GainRatio(D, S) = Gain(D, S) / H(|D1|/|D|, …, |Ds|/|D|).

CART
Creates a binary tree and uses entropy. Formula to choose the split point s for node t: Φ(s|t) = 2 PL PR Σj |P(Cj|tL) − P(Cj|tR)|, where PL and PR are the probabilities that a tuple in the training set will be on the left or right side of the tree.

CART Example
At the start there are six choices for the split point:
P(Gender) = 2(6/15)(9/15)(2/15 + 4/15 + 3/15) = 0.224
P(1.6) = 0
P(1.7) = 2(2/15)(13/15)(0 + 8/15 + 3/15) = 0.169
P(1.8) = 2(5/15)(10/15)(4/15 + 6/15 + 3/15) = 0.385
P(1.9) = 2(9/15)(6/15)(4/15 + 2/15 + 3/15) = 0.288
P(2.0) = 2(12/15)(3/15)(4/15 + 8/15 + 3/15) = 0.32
Split at 1.8, the largest value.
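A sketch of the split-goodness computation exactly as the expressions are written here (plain Python; the per-class probability differences are supplied directly, as on the slide):

    def cart_goodness(p_left, p_right, class_diffs):
        # Phi(s|t) = 2 * PL * PR * sum of per-class probability differences.
        return 2 * p_left * p_right * sum(class_diffs)

    # Split at height 2.0: 12 tuples go left, 3 go right.
    print(round(cart_goodness(12/15, 3/15, [4/15, 8/15, 3/15]), 2))  # 0.32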

DT Advantages/Disadvantages
Advantages: easy to understand; easy to generate rules.
Disadvantages: may suffer from overfitting; classifies by rectangular partitioning; does not easily handle nonnumeric data; trees can be quite large, so pruning is necessary.

Rules
Perform classification using if-then rules. A classification rule is r = ⟨a, c⟩, with antecedent a and consequent c. Rules may be generated from other techniques (DT, NN) or generated directly. Direct algorithms: 1R, PRISM.

Generating Rules from DTs
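The slide's figure is not reproduced here. As a hedged sketch, one common way to read rules off a decision tree is to emit one rule per root-to-leaf path (plain Python; the tree representation and values are illustrative):

    def tree_to_rules(node, conditions=()):
        # node is either a class label (leaf) or a pair (attribute, {arc_value: subtree}).
        if not isinstance(node, tuple):
            yield (conditions, node)            # (antecedent, consequent)
            return
        attribute, branches = node
        for value, subtree in branches.items():
            yield from tree_to_rules(subtree, conditions + ((attribute, value),))

    tree = ("Height", {"<1.7": "Short", "1.7-1.95": "Medium", ">1.95": "Tall"})
    for antecedent, consequent in tree_to_rules(tree):
        print(antecedent, "->", consequent)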

Generating Rules Example

1R Algorithm
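The algorithm is shown on the slide as pseudocode; below is a minimal 1R sketch under the usual description (for each attribute, map each value to its most frequent class and keep the attribute with the fewest errors; plain Python, toy data):

    from collections import Counter

    def one_r(rows, attributes, label):
        # rows: list of dicts; returns (best_attribute, {value: class} rule table).
        best = None
        for attr in attributes:
            rule, errors = {}, 0
            for value in set(r[attr] for r in rows):
                counts = Counter(r[label] for r in rows if r[attr] == value)
                majority, hits = counts.most_common(1)[0]
                rule[value] = majority
                errors += sum(counts.values()) - hits
            if best is None or errors < best[0]:
                best = (errors, attr, rule)
        return best[1], best[2]

    rows = [{"Gender": "F", "Class": "Medium"}, {"Gender": "M", "Class": "Tall"},
            {"Gender": "F", "Class": "Medium"}, {"Gender": "M", "Class": "Medium"}]
    print(one_r(rows, ["Gender"], "Class"))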

1R Example

PRISM Algorithm

PRISM Example

Decision Tree vs. Rules
A tree has an implied order in which splitting is performed and is created by looking at all classes. Rules have no ordering of predicates, and only one class needs to be examined to generate its rules.

NN
Typical NN structure for classification: one output node per class; the output value is the class membership function value. Supervised learning: for each tuple in the training set, propagate it through the NN and adjust the weights on the edges to improve future classification. Algorithms: propagation, backpropagation, gradient descent.

NN Issues
Number of source nodes, number of hidden layers, training data, number of sinks, interconnections, weights, activation functions, learning technique, and when to stop learning.

Decision Tree vs. Neural Network

Propagation
(Slide figure: a tuple's values flow from the input nodes through the network to the output.)

NN Propagation Algorithm
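The algorithm appears as an image on the slide; here is a minimal forward-propagation sketch for one fully connected layer (plain Python; the sigmoid activation and the weights are assumptions for illustration, since the slide does not show them):

    import math

    def propagate_layer(inputs, weights, biases):
        # Each output node sums its weighted inputs, then applies a sigmoid activation.
        outputs = []
        for w_row, b in zip(weights, biases):
            s = sum(w * x for w, x in zip(w_row, inputs)) + b
            outputs.append(1.0 / (1.0 + math.exp(-s)))
        return outputs

    # Two inputs feeding two output nodes (made-up weights).
    print(propagate_layer([0.5, 0.2], [[0.4, -0.6], [0.1, 0.9]], [0.0, -0.3]))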

Example Propagation
(Worked example shown on slide.)

NN Learning
Adjust weights to perform better with the associated test data. Supervised: use feedback from knowledge of the correct classification. Unsupervised: no knowledge of the correct classification is needed.

NN Supervised Learning

Supervised Learning
Possible error values, assuming the output from node i is yi but should be di, include |di − yi| and (di − yi)^2. Change the weights on the arcs based on the estimated error.

NN Backpropagation
Propagate changes to weights backward from the output layer to the input layer. Delta rule: Δwij = c xij (dj − yj). Gradient descent: a technique to modify the weights in the graph.
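A one-step sketch of the delta-rule update as written above (plain Python; the learning constant c and the example numbers are illustrative):

    def delta_rule_update(weights, inputs, desired, actual, c=0.1):
        # w_ij <- w_ij + c * x_ij * (d_j - y_j), for a single output node j.
        return [w + c * x * (desired - actual) for w, x in zip(weights, inputs)]

    print(delta_rule_update([0.4, -0.2], [1.0, 0.5], desired=1.0, actual=0.7))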

Backpropagation Error

Backpropagation Algorithm

Gradient Descent

Gradient Descent Algorithm
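The algorithm itself is shown as an image; as a hedged sketch, a generic gradient-descent loop looks like this (plain Python; the one-parameter squared error being minimized is purely illustrative):

    def gradient_descent(gradient, w0, rate=0.1, steps=100):
        # Repeatedly step opposite the gradient to reduce the error.
        w = w0
        for _ in range(steps):
            w -= rate * gradient(w)
        return w

    # Minimize E(w) = (w - 3)^2, whose gradient is 2*(w - 3).
    print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))  # approaches 3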

Output Layer Learning

Hidden Layer Learning

Types of NNs
Different NN structures are used for different problems: perceptron, self-organizing feature map, radial basis function network.

Perceptron
The perceptron is one of the simplest NNs; it has no hidden layers.

Perceptron Example
Suppose the summation is S = 3x1 + 2x2 − 6 and the activation is: if S > 0 then 1 else 0.
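The same example written as code (plain Python):

    def perceptron(x1, x2):
        # Summation S = 3*x1 + 2*x2 - 6, followed by the step activation.
        s = 3 * x1 + 2 * x2 - 6
        return 1 if s > 0 else 0

    print(perceptron(1, 2))  # S = 1 > 0, so output 1
    print(perceptron(1, 1))  # S = -1, so output 0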

Self-Organizing Feature Map (SOFM)
Competitive unsupervised learning, based on observing how neurons work in the brain: firing impacts the firing of nearby neurons; neurons far apart inhibit each other; neurons have specific nonoverlapping tasks. Example: Kohonen network.

Kohonen Network

Kohonen Network
The competitive layer is viewed as a 2D grid. Similarity between a competitive node and the input is defined on the input X = ⟨x1, …, xh⟩ and the node's weight vector ⟨w1, …, wh⟩, based on the dot product. The competitive node most similar to the input "wins," and the winning node's weights (as well as surrounding nodes' weights) are increased.
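A minimal winner-take-all sketch of this step (plain Python; the learning rate and the restriction of the update to the winner alone are simplifying assumptions, since the slide also updates surrounding nodes):

    def kohonen_step(x, weight_vectors, rate=0.2):
        # Winner = node whose weight vector has the largest dot product with x.
        dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
        winner = max(range(len(weight_vectors)), key=lambda i: dot(x, weight_vectors[i]))
        # Move the winning node's weights toward the input.
        weight_vectors[winner] = [w + rate * (xi - w)
                                  for w, xi in zip(weight_vectors[winner], x)]
        return winner

    weights = [[0.1, 0.9], [0.8, 0.2]]
    print(kohonen_step([1.0, 0.0], weights), weights)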

Radial Basis Function Network
The RBF function has a Gaussian shape. RBF networks have three layers: a hidden layer with Gaussian activation functions and an output layer with linear activation functions.

Radial Basis Function Network

Part II - Classification© Prentice Hall71 NN Advantages/Disadvantages Advantages: Advantages: –Can continue the learning process even after the training set has been applied. –Can easily be parallelized. Disadvantages: Disadvantages: –Difficult to understand. –May suffer from overfitting. –Structure of graph must be determined apriori. –Input attribute values must be numeric. –Verification of correct functions of the NN may be difficult to perform.