Chapter 6 Classification and Prediction Dr. Bernard Chen Ph.D. University of Central Arkansas.


Outline Classification Introduction Decision Tree Classifier Accuracy Measures

Classification and Prediction Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends For example: Decide whether bank loan applicants are “safe” or “risky” Predict whether a customer will buy a new computer Analyze cancer data to predict which of three specific treatments should be applied

Classification Classification is a Two-Step Process Learning step: a model is constructed from the training set, based on the values (class labels) of a classifying attribute Prediction step: the model is used to predict categorical class labels (discrete or nominal) for new data

Learning step: Model Construction [Figure: the training data is fed to a classification algorithm, which produces the classifier (model), e.g. the rule IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’]

Learning step Model construction: describing a set of predetermined classes Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute The set of tuples used for model construction is the training set The model is represented as classification rules, decision trees, or mathematical formulae

Prediction step: Using the Model in Prediction [Figure: the classifier is first evaluated on testing data, then applied to unseen data, e.g. (Jeff, Professor, 4) → Tenured?]

Prediction step Estimate the accuracy of the model The known label of each test sample is compared with the classified result from the model The accuracy rate is the percentage of test set samples that are correctly classified by the model The test set is independent of the training set; otherwise over-fitting will occur

N-fold Cross-validation In order to avoid the over-fitting problem, n-fold cross-validation is usually used For example, 7-fold cross-validation: Divide the whole training dataset into 7 equal parts Take the first part away and train the model on the remaining 6 portions After the model is trained, feed the first part in as the testing dataset and obtain the accuracy Repeat steps two and three, but take the second part away, and so on…
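The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; `train_fn` and `accuracy_fn` are hypothetical caller-supplied helpers standing in for whatever model is being evaluated:

```python
import random

def n_fold_cross_validation(data, labels, train_fn, accuracy_fn, n=7):
    """Estimate model accuracy by n-fold cross-validation (sketch)."""
    indices = list(range(len(data)))
    random.shuffle(indices)
    folds = [indices[i::n] for i in range(n)]          # n roughly equal parts
    scores = []
    for i in range(n):
        held_out = set(folds[i])
        train = [(data[j], labels[j]) for j in indices if j not in held_out]
        test = [(data[j], labels[j]) for j in folds[i]]
        model = train_fn(train)                        # train on the other n-1 folds
        scores.append(accuracy_fn(model, test))        # evaluate on the held-out fold
    return sum(scores) / n
```

Each tuple serves as test data exactly once, so the averaged score uses the whole dataset without ever testing a model on its own training tuples.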

Supervised learning vs. Unsupervised learning Because the class label of each training tuple is provided, this step is also known as supervised learning It contrasts with unsupervised learning (or clustering), in which the class label of each training tuple is unknown

Issues: Data Preparation Data cleaning Preprocess data in order to reduce noise and handle missing values Relevance analysis (feature selection) Remove the irrelevant or redundant attributes Data transformation Generalize and/or normalize data

Issues: Evaluating Classification Methods Accuracy Speed time to construct the model (training time) time to use the model (classification/prediction time) Robustness: handling noise and missing values Scalability: efficiency in disk-resident databases Interpretability

Outline Classification Introduction Decision Tree Classifier Accuracy Measures

Decision Tree Decision tree induction is the learning of decision trees from class-labeled training tuples A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute Each branch represents an outcome of the test Each leaf node holds a class label

Decision Tree Example

Decision Tree Algorithm Basic algorithm (a greedy algorithm) The tree is constructed in a top-down recursive divide-and-conquer manner At the start, all the training examples are at the root Attributes are categorical (if continuous-valued, they are discretized in advance) Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
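The basic algorithm can be sketched as a short recursive function. This is a minimal illustration of the greedy, top-down procedure (no pruning, categorical attributes only), and the helper names are my own, not from the slides:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information needed to classify a tuple with this label distribution."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest information gain (the greedy step)."""
    def gain(a):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[a], []).append(y)
        expected = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        return entropy(labels) - expected
    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """Top-down recursive divide-and-conquer tree construction."""
    if len(set(labels)) == 1:               # pure node: make a leaf
        return labels[0]
    if not attributes:                      # no tests left: majority vote
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, labels, attributes)
    tree = {a: {}}
    for value in set(row[a] for row in rows):   # partition on each outcome
        idx = [i for i, row in enumerate(rows) if row[a] == value]
        tree[a][value] = build_tree([rows[i] for i in idx],
                                    [labels[i] for i in idx],
                                    [b for b in attributes if b != a])
    return tree
```

On the six-tuple tenured example used later in these slides, this sketch splits first on "years" (whose partitions are all pure, so its gain is maximal).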

Attribute Selection Measure: Information Gain (ID3/C4.5) Select the attribute with the highest information gain Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D|/|D| Expected information (entropy) needed to classify a tuple in D: Info(D) = − Σi pi log2(pi)
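The entropy formula can be checked numerically on the 14-tuple buys_computer data used in the worked example below (9 yes, 5 no):

```python
from math import log2

def info(*counts):
    """Expected information I(c1, ..., ck) for a class distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# 9 "yes" and 5 "no" tuples in D
print(f"{info(9, 5):.3f}")  # 0.940
```

A pure partition such as I(4, 0) has entropy 0, and a 50/50 split such as I(3, 3) has entropy 1, the maximum for two classes.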

Attribute Selection Measure: Information Gain (ID3/C4.5) Information needed (after using A to split D into v partitions) to classify D: InfoA(D) = Σj (|Dj|/|D|) × Info(Dj) Information gained by branching on attribute A: Gain(A) = Info(D) − InfoA(D)

Decision Tree

5/14 I(2,3) means “age <= 30” has 5 out of 14 samples, with 2 yes’s and 3 no’s. I(2,3) = −2/5 * log2(2/5) − 3/5 * log2(3/5)


Decision Tree Info age (D) = 5/14 I(2,3) + 4/14 I(4,0) + 5/14 I(3,2) = 5/14 * (0.971) + 4/14 * (0) + 5/14 * (0.971) = 0.694 To compute I(2,3), type in -2/5*log2(2/5)-3/5*log2(3/5)
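The slide's arithmetic can be reproduced directly from the (yes, no) counts of the three age partitions (<=30: 2/3, 31..40: 4/0, >40: 3/2):

```python
from math import log2

def info(*counts):
    """Expected information I(c1, ..., ck) for a class distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

splits = [(2, 3), (4, 0), (3, 2)]                    # (yes, no) per age partition
n = sum(sum(s) for s in splits)                      # 14 tuples in total
info_age = sum(sum(s) / n * info(*s) for s in splits)
gain_age = info(9, 5) - info_age                     # Gain(age) = Info(D) - Info_age(D)
print(f"{info_age:.3f}")  # 0.694
```

Note gain_age evaluates to about 0.2467; the slide's 0.246 comes from subtracting the already-rounded values 0.940 − 0.694.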

Decision Tree Gain(age) = Info(D) − Info age (D) = 0.940 − 0.694 = 0.246 Similarly, we can compute Gain(income)=0.029 Gain(student)=0.151 Gain(credit_rating)=0.048 Since “age” obtains the highest information gain, we partition the tree on age:

Decision Tree Info income (D) = 4/14 I(3,1) + 6/14 I(4,2) + 4/14 I(2,2) = 4/14 * (0.811) + 6/14 * (0.918) + 4/14 * (1) = 0.911 Gain(income) = 0.940 − 0.911 = 0.029

Decision Tree

Another Decision Tree Example

Decision Tree Example Info(Tenured) = I(3,3) = −3/6 log2(3/6) − 3/6 log2(3/6) = 1

Decision Tree Example Info RANK (Tenured) = 3/6 I(1,2) + 2/6 I(1,1) + 1/6 I(1,0) = 3/6 * (0.918) + 2/6 * (1) + 1/6 * (0) = 0.792 3/6 I(1,2) means “Assistant Prof” has 3 out of 6 samples, with 1 yes and 2 no’s. 2/6 I(1,1) means “Associate Prof” has 2 out of 6 samples, with 1 yes and 1 no. 1/6 I(1,0) means “Professor” has 1 out of 6 samples, with 1 yes and 0 no’s.

Decision Tree Example Info YEARS (Tenured) = 1/6 I(1,0) + 2/6 I(0,2) + 1/6 I(0,1) + 2/6 I(2,0) = 0 1/6 I(1,0) means “years=2” has 1 out of 6 samples, with 1 yes and 0 no’s. 2/6 I(0,2) means “years=3” has 2 out of 6 samples, with 0 yes’s and 2 no’s. 1/6 I(0,1) means “years=6” has 1 out of 6 samples, with 0 yes’s and 1 no. 2/6 I(2,0) means “years=7” has 2 out of 6 samples, with 2 yes’s and 0 no’s.

Group Practice Example

Outline Classification Introduction Decision Tree Classifier Accuracy Measures

Confusion matrix:
classes                      (Predict) buy = yes   (Predict) buy = no   total
(Real) buy computer = yes    6954                  46                   7000
(Real) buy computer = no     412                   2588                 3000
total                        7366                  2634                 10000

Classifier Accuracy Measures Alternative accuracy measures (e.g., for cancer diagnosis) sensitivity = t-pos/pos = 6954/7000 specificity = t-neg/neg = 2588/3000 precision = t-pos/(t-pos + f-pos) = 6954/7366 accuracy = (t-pos + t-neg)/(pos + neg) = (6954 + 2588)/10000 = 95.42%
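These measures follow directly from the four confusion-matrix cells (t-pos = 6954, f-neg = 46, f-pos = 412, t-neg = 2588):

```python
# Confusion-matrix measures for the buys_computer example
t_pos, f_neg = 6954, 46      # actual "yes": 7000
f_pos, t_neg = 412, 2588     # actual "no": 3000

pos, neg = t_pos + f_neg, f_pos + t_neg
sensitivity = t_pos / pos                    # 6954/7000 ~ 0.9934
specificity = t_neg / neg                    # 2588/3000 ~ 0.8627
precision = t_pos / (t_pos + f_pos)          # 6954/7366 ~ 0.9441
accuracy = (t_pos + t_neg) / (pos + neg)     # 9542/10000 = 0.9542
print(f"{accuracy:.4f}")  # 0.9542
```

Accuracy alone can mislead when classes are imbalanced, which is why sensitivity and specificity are reported separately: here the classifier is noticeably weaker on the minority "no" class.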