CIS 335 Data Mining Classification Part I

CIS 335 what is a model? it can be a set of statistics or rules, a tree, a neural net, a linear model, etc. how do we build the model? what assumptions does it make?

CIS 335 what are applications of classification?

CIS 335 Labels: the goal is to predict the class of an unlabeled instance. what are examples of classes? how many labels can each have? is it feasible to get labeled instances? the class label is discrete and unordered - why? numeric prediction is done by regression

CIS 335 sets: training set, validation set, test set; cross-validation (n-fold)
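A minimal sketch of how n-fold cross-validation partitions the data, in Python; the fold-assignment logic and the toy dataset are illustrative assumptions, not taken from the slides.

# minimal sketch: splitting a dataset into n folds for cross-validation
def n_fold_splits(instances, n=5):
    """Yield (train, test) pairs; each instance appears in exactly one test fold."""
    folds = [instances[i::n] for i in range(n)]       # round-robin assignment
    for i in range(n):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))                                # stand-in for 10 labeled instances
for train, test in n_fold_splits(data, n=5):
    print(len(train), "train /", len(test), "test")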

CIS 335 definitions: instances, tuples, records, samples, rows, ...; attributes, features, variables, ...

CIS 335 two-step process: learning (induction), then predicting. how does this relate to your own decision-making process?

CIS 335 data mining: supervised - there is a class and labeled instances are available (classification, anomaly detection); unsupervised - no class (clustering, association analysis)

CIS 335 mapping function: y = f(X). X is the instance, f is the model learned from the training data, y is the class. sometimes there are several "discriminators": f1, f2, f3 - one for each class

CIS 335 overfitting: the model describes the training data too accurately and doesn't do very well on new instances. imagine a classifier that predicts student success based on g-number; one that generalizes can be better. sometimes post-processing can improve generalization. how do you overfit?

CIS 335 accuracy = number of correct predictions / total predictions. for the confusion matrix it is 98/115 = .85. what about the one below?

         a    b    c
  a     21    3    1
  b      5   45    2
  c      7    4   27
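A quick check in Python, assuming the 3-class matrix reads as reconstructed above; the diagonal counts the correct predictions whichever axis is taken as "actual".

# accuracy = correct / total, from the 3-class confusion matrix above
matrix = [
    [21, 3, 1],
    [5, 45, 2],
    [7, 4, 27],
]
correct = sum(matrix[i][i] for i in range(len(matrix)))   # diagonal = correct predictions
total = sum(sum(row) for row in matrix)
print(correct, "/", total, "=", round(correct / total, 2))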

CIS 335 decision trees: the model is the tree itself. each branch is a test and the leaves are labels. to classify an instance, trace the path through the tree. where have you seen decision trees? [tree figure: tests on old (y/n) and male (y/n) with leaves uncle, aunt, cousin]
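A minimal sketch of tracing a path through that small tree as nested ifs; the split order (old first, then male) is a guess from the figure residue above.

# hypothetical nested-if version of the small relative-classifier tree
def classify_relative(old, male):
    if old == "y":
        if male == "y":
            return "uncle"
        else:
            return "aunt"
    else:
        return "cousin"

print(classify_relative("y", "n"))   # -> aunt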

CIS 335 what are they good for? classifying, of course; giving a description of the data (exploratory); the tree form is intuitive; simple and fast

CIS 335 induction: ID3 -> C4.5 and CART were early decision tree learners (J48 is Weka's C4.5 implementation). input is instances with attributes and labels; output is the tree

CIS 335 Goal: pure leaves. use splits to isolate the classes; each split makes the leaves more pure. [tree figure: splits on color and size separating tang, lemon, and orange] example data:

color    size    fruit
orange   small   tang
yellow   small   lemon
yellow   small   lemon
orange   small   tang
orange   small   orange
         large   orange
         large   orange

CIS 335 Measuring Purity [figure: a binary split on attr x with y/n branches, each leaf holding a mix of + and - instances]. gini is a common metric: gini = 1 - sum over classes of (class proportion)^2. for the left leaf, gini = 1 - (p+^2 + p-^2) = .32; for the right leaf it is .42. for the entire split, use the weighted sum, weighting each leaf by its share of the instances: gini(split) = .32*.33 + .42*.67 = .38
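A short Python sketch of the gini arithmetic; the leaf counts are assumptions chosen to reproduce the .32 and .42 leaf values, since the actual counts in the figure were lost.

# gini for each leaf and the weighted gini of the whole split
def gini(counts):
    """gini = 1 - sum of squared class proportions in a leaf."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

left = [4, 1]     # e.g. 4 "+" and 1 "-"  -> 1 - (0.8^2 + 0.2^2) = 0.32
right = [3, 7]    # e.g. 3 "+" and 7 "-"  -> 1 - (0.3^2 + 0.7^2) = 0.42

n = sum(left) + sum(right)
split = gini(left) * sum(left) / n + gini(right) * sum(right) / n   # weighted sum
print(round(gini(left), 2), round(gini(right), 2), round(split, 3))  # the slide reports ~.38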

CIS 335 Expanding the tree: nodes that are not very pure can be further split on another attribute. the process can continue until all nodes are pure or a threshold is met. [figure: the attr x split expanded by a further split on attr y]

CIS 335 Numeric attributes and other splits: choose a good threshold – one that produces the lowest gini; evaluate all possible splits. multiway splits are also possible, e.g. marital status: S, D, M. [figure: a split on attr z into <10 and ≥10 branches with + and - leaves]
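A minimal sketch of picking a numeric threshold by lowest weighted gini; the values and labels are made up, chosen so the best split lands at 10 like the attr z figure.

# try a split between each pair of adjacent distinct values; keep the lowest weighted gini
def gini(labels):
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        g = gini(left) * len(left) / len(pairs) + gini(right) * len(right) / len(pairs)
        best = min(best, (g, t))
    return best

print(best_threshold([3, 5, 8, 12, 15, 20], ["+", "+", "+", "-", "-", "-"]))  # -> (0.0, 10.0)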

CIS 335 Greedy Algorithms – example: the traveling salesman problem (TSP)

CIS 335 greedy algorithm: look through each attribute, calculate the result of the split using gini or another measure, and select the attribute/split with the best result. the split can be discrete, a continuous value, or binary with splitting sets (careful about ordinal attributes)
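A small sketch of the greedy attribute choice itself, loosely based on the fruit table a few slides back (only its complete rows are used; the dictionary encoding is an illustrative assumption, not from the slides).

# compute the weighted gini of splitting on each attribute and keep the best
records = [
    {"color": "orange", "size": "small", "fruit": "tang"},
    {"color": "yellow", "size": "small", "fruit": "lemon"},
    {"color": "yellow", "size": "small", "fruit": "lemon"},
    {"color": "orange", "size": "small", "fruit": "tang"},
    {"color": "orange", "size": "small", "fruit": "orange"},
]

def gini(labels):
    return 1.0 - sum((labels.count(c) / len(labels)) ** 2 for c in set(labels))

def split_gini(records, attr, target="fruit"):
    total = len(records)
    g = 0.0
    for v in {r[attr] for r in records}:
        group = [r[target] for r in records if r[attr] == v]
        g += gini(group) * len(group) / total
    return g

best = min(["color", "size"], key=lambda a: split_gini(records, a))
print(best)   # color separates lemon from tang/orange, so it scores lower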

CIS 335 selection measures based on purity: information gain, gain ratio, gini

CIS 335 pruning (postprocessing): subtrees can be removed if the purity is "good enough"; sometimes subtrees can be repeated or replicated

CIS 335 Bayes classifier: based on Bayes' theorem; good accuracy and speed; assumes iid data and independence of attributes

CIS 335 Probability

teen   m/f   buy
 y      f     y
 n      m     y
 n      f     y
 n      f     y
 y      m     y
 n      f     y
 n      m     n
 y      m     n
 y      m     n
 n      f     n
 y      f     n
 y      m     n
 y      f     n
 y      m     n

counts: how many total records _______ how many teens _______ how many female _______ how many buy _______
what is the probability: of teens → p(teen=y) ______ of males → p(gender=male) _____ of buying → p(buy=y) _______

CIS 335 Conditional Probability (same purchase table as above). of those that bought: how many teens _____ how many male _____. p(teen=y | buy=y) is the probability of being a teen given that you bought. what is the conditional probability p(teen | buy) ______ and p(female | not buy) ______

CIS 335 Conditional Probability, cont. (same purchase table as above). formula: p(x|y) = p(x,y) / p(y). let x be the event that the customer is a teen and y be the event that they buy. what is p(x,y)? _______ what is p(x)? _______ what is p(x|y)? ________
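A quick check of these quantities in Python, using the purchase table reconstructed above; x is "customer is a teen" and y is "customer buys".

# marginal, joint, and conditional probabilities counted straight from the table
rows = [("y","f","y"), ("n","m","y"), ("n","f","y"), ("n","f","y"), ("y","m","y"),
        ("n","f","y"), ("n","m","n"), ("y","m","n"), ("y","m","n"), ("n","f","n"),
        ("y","f","n"), ("y","m","n"), ("y","f","n"), ("y","m","n")]

n = len(rows)
p_x = sum(1 for teen, _, _ in rows if teen == "y") / n                      # p(teen)
p_y = sum(1 for _, _, buy in rows if buy == "y") / n                        # p(buy)
p_xy = sum(1 for teen, _, buy in rows if teen == "y" and buy == "y") / n    # p(teen, buy)
print(p_x, p_y, p_xy, p_xy / p_y)    # p(x), p(y), p(x,y), p(x|y) = p(x,y)/p(y)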

CIS 335 Bayes formula derivation: p(x,y) is the same as p(y,x). according to the definition of conditional probability, p(x,y) = p(x|y) p(y) and so p(y,x) = p(y|x) p(x), and thus p(x|y) p(y) = p(y|x) p(x); rearranging, we have p(x|y) = p(y|x) p(x) / p(y)

CIS 335 Bayes theorem: variables: X is an instance, C is the class. we want to know p(C0 | X), the probability that the class is 0 given the evidence X: p(C0 | X) = p(X | C0) p(C0) / p(X). here p(C0 | X) is the posterior probability, p(C0) is the prior, p(X) is the evidence, and p(X | C0) is the likelihood

CIS 335 Calculating the posterior directly (same purchase table as above): p(buy | teen) = 2/8, p(buy | male) = 2/7. this can be done easily for one attribute. for p(buy | not teen, male) there are only two matching instances; think about it for 100 attributes – the data is just not available

CIS 335 example: we want to predict whether or not you will have a good day based on whether you have had breakfast and whether the sun is shining. let X = {x1, x2} be an instance, where x1 is breakfast (Y/N) and x2 is sunshine (Y/N). C is the class: 0 = bad day, 1 = good day

CIS 335 Naive Bayes: p(C0 | x1,x2) = p(x1,x2 | C0) p(C0) / p(X). the problem is that p(x1,x2 | C0) is complex; simplify by assuming the attribute values are independent of each other, so that p(x1,x2 | C0) = p(x1 | C0) p(x2 | C0)

CIS 335 collecting data for discrete attributes: p(C0) = number of bad days / number of days. p(x1=1 | C0) is the fraction of bad days on which you had breakfast; p(x2=1 | C0) is the fraction of bad days on which the sun was shining; p(x1=0 | C0) is the fraction of bad days on which you didn't have breakfast; p(x2=0 | C0) is the fraction of bad days on which the sun wasn't shining. p(x1=0, x2=0) is the fraction of all days on which you didn't have breakfast and the sun wasn't shining
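A minimal naive Bayes sketch for the good-day example; the list of days is made up for illustration, and p(X) is dropped as the next slide explains.

# each day is (breakfast, sunshine, good_day); counts give the needed probabilities
days = [(1, 1, 1), (1, 1, 1), (0, 1, 1), (1, 0, 1),
        (0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0)]

def prob(cond, items):
    items = list(items)
    return sum(1 for it in items if cond(it)) / len(items)

def posterior_score(c, x1, x2):
    """p(x1|C) * p(x2|C) * p(C); p(X) is dropped since it is the same for every class."""
    in_class = [d for d in days if d[2] == c]
    return (prob(lambda d: d[0] == x1, in_class) *
            prob(lambda d: d[1] == x2, in_class) *
            len(in_class) / len(days))

x1, x2 = 1, 1                     # had breakfast, sun shining
scores = {c: posterior_score(c, x1, x2) for c in (0, 1)}
print(scores, "->", max(scores, key=scores.get))   # predicts class 1 (good day)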

CIS 335 another simplification: we do not have to calculate p(X), since it is the same for all posteriors regardless of class: if p(X | C0) p(C0) > p(X | C1) p(C1) then p(C0 | X) > p(C1 | X)

CIS 335 collecting data for continuous attributes: same general idea as for the discrete attributes. separate all values of an attribute for a particular class, calculate the mean and s.d., and use these to calculate the probability (density) for a particular value
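A small sketch for continuous attributes; the slide only says to use the mean and s.d., so the normal (Gaussian) density below is an assumption about the intended distribution, and the temperatures are made up.

# estimate mean/variance from the class's values, then evaluate the normal density at x
import math

def gaussian_likelihood(x, values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

good_day_temps = [68, 72, 75, 70, 74]                # temperatures recorded on good days
print(gaussian_likelihood(71, good_day_temps))       # density used as p(temp=71 | good day)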

CIS 335 comparing results – the confusion matrix:

             predicted
actual      yes     no
  yes        TP     FN
  no         FP     TN

precision = TP / (TP + FP)
recall = TP / (TP + FN)
accuracy = (TP + TN) / (TP + FN + FP + TN)
f1 metric = 2 * prec * recall / (prec + recall)

CIS 335 example:

             predicted
actual      yes     no
  yes        95      3
  no         14     87

precision = 95 / 109 = 0.87
recall = 95 / 98 = 0.97
accuracy = 182 / 199 = 0.91
f1 = 0.92
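The same numbers can be checked directly from the four cells of the matrix:

# metrics computed from TP, FN, FP, TN of the example above
tp, fn = 95, 3       # actual yes
fp, tn = 14, 87      # actual no

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fn + fp + tn)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 2), round(recall, 2), round(accuracy, 2), round(f1, 2))
# -> 0.87 0.97 0.91 0.92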