Tree methods: Dependent variable is categorical

Slides:



Advertisements
Similar presentations
The Software Infrastructure for Electronic Commerce Databases and Data Mining Lecture 4: An Introduction To Data Mining (II) Johannes Gehrke
Advertisements

Chapter 7 Classification and Regression Trees
Data Mining Lecture 9.
Random Forest Predrag Radenković 3237/10
Data Mining Techniques: Classification. Classification What is Classification? –Classifying tuples in a database –In training set E each tuple consists.
IT 433 Data Warehousing and Data Mining
Decision Tree Approach in Data Mining
Classification with Multiple Decision Trees
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Part I Introduction to Data Mining by Tan,
Classification: Definition Given a collection of records (training set ) –Each record contains a set of attributes, one of the attributes is the class.
Decision Tree.
Application of Decision Tree: Bankruptcy Prediction 2004/05/07.
A Quick Overview By Munir Winkel. What do you know about: 1) decision trees 2) random forests? How could they be used?
Chapter 7 – Classification and Regression Trees
Chapter 7 – Classification and Regression Trees
ICS 421 Spring 2010 Data Mining 2 Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 4/8/20101Lipyeow Lim.
Data Quality Class 9. Rule Discovery Decision and Classification Trees Association Rules.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Classification Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair.
Basic Data Mining Techniques Chapter Decision Trees.
Tree-based methods, neutral networks
Basic Data Mining Techniques
Classification and Regression trees: CART BOOSTING AND BAGGING RANDOM FOREST Data Mining Trees: ARF.
Example of a Decision Tree categorical continuous class Splitting Attributes Refund Yes No NO MarSt Single, Divorced Married TaxInc NO < 80K > 80K.
Introduction to Data Mining Data mining is a rapidly growing field of business analytics focused on better understanding of characteristics and.
Decision Tree Models in Data Mining
DATA MINING : CLASSIFICATION. Classification : Definition  Classification is a supervised learning.  Uses training sets which has correct answers (class.
Decision Trees.
1 Data Mining Lecture 3: Decision Trees. 2 Classification: Definition l Given a collection of records (training set ) –Each record contains a set of attributes,
Chapter 9 – Classification and Regression Trees
Rasch trees: A new method for detecting differential item functioning in the Rasch model Carolin Strobl Julia Kopf Achim Zeileis.
Ronny Kohavi, ICML 1998 What is Data mining? Research Question Find Data Internal Databases Data Warehouses Internet Online databases Data Collection.
K Nearest Neighbors Classifier & Decision Trees
Jeff Howbert Introduction to Machine Learning Winter Regression Linear Regression.
Comparing Univariate and Multivariate Decision Trees Olcay Taner Yıldız Ethem Alpaydın Department of Computer Engineering Bogazici University
Decision Tree Learning Debapriyo Majumdar Data Mining – Fall 2014 Indian Statistical Institute Kolkata August 25, 2014.
Business Intelligence and Decision Modeling Week 9 Customer Profiling Decision Trees (Part 2) CHAID CRT.
Classification and Prediction Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot Readings: Chapter 6 – Han and Kamber.
Multivariate Dyadic Regression Trees for Sparse Learning Problems Xi Chen Machine Learning Department Carnegie Mellon University (joint work with Han Liu)
CS690L Data Mining: Classification
1 Data Mining dr Iwona Schab Decision Trees. 2 Method of classification Recursive procedure which (progressively) divides sets of n units into groups.
Copyright © 2010 SAS Institute Inc. All rights reserved. Decision Trees Using SAS Sylvain Tremblay SAS Canada – Education SAS Halifax Regional User Group.
Jeff Howbert Introduction to Machine Learning Winter Regression Linear Regression Regression Trees.
MACHINE LEARNING 10 Decision Trees. Motivation  Parametric Estimation  Assume model for class probability or regression  Estimate parameters from all.
Feature (Gene) Selection MethodsSample Classification Methods Gene filtering: Variance (SD/Mean) Principal Component Analysis Regression using variable.
Decision Trees Example of a Decision Tree categorical continuous class Refund MarSt TaxInc YES NO YesNo Married Single, Divorced < 80K> 80K Splitting.
1 Illustration of the Classification Task: Learning Algorithm Model.
Basic Data Mining Techniques Chapter 3-A. 3.1 Decision Trees.
Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 3 Basic Data Mining Techniques Jason C. H. Chen, Ph.D. Professor of MIS School of Business.
Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.
Chapter 26: Data Mining Part 2 Prepared by Assoc. Professor Bela Stantic.
BY International School of Engineering {We Are Applied Engineering} Disclaimer: Some of the Images and content have been taken from multiple online sources.
Classification Tree Interaction Detection. Use of decision trees Segmentation Stratification Prediction Data reduction and variable screening Interaction.
Bump Hunting The objective PRIM algorithm Beam search References: Feelders, A.J. (2002). Rule induction by bump hunting. In J. Meij (Ed.), Dealing with.
DATA MINING TECHNIQUES (DECISION TREES ) Presented by: Shweta Ghate MIT College OF Engineering.
By N.Gopinath AP/CSE.  A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each.
DECISION TREE INDUCTION CLASSIFICATION AND PREDICTION What is classification? what is prediction? Issues for classification and prediction. What is decision.
10. Decision Trees and Markov Chains for Gene Finding.
Illustrating Classification Task
Basic Technical Concepts in Machine Learning
Classification with Gene Expression Data
Teori Keputusan (Decision Theory)
Decision Trees (suggested time: 30 min)
Ch9: Decision Trees 9.1 Introduction A decision tree:
Chapter 6 Classification and Prediction
Classification by Decision Tree Induction
12/2/2018.
Data Mining – Chapter 3 Classification
Lecture 05: Decision Trees
Topic 12 An Introduction to Tree Models
Presentation transcript:

Tree methods: Dependent variable is categorical   Classification    trees (e.g., CART, C5, Firm, Tree) Decision Trees Decision Rules Tree methods: Dependent variable is numeric Regression Trees Credit Card? Car? Reject Age<30 Approve Y N

Classification trees Function f(X,Y) Tree form of f(X,Y) Y 4 1 2 3 X 1 3 4 X Y Y<4 X<3 Y<2 Function f(X,Y) Tree form of f(X,Y)

Classification tree for the cancer groups using 10 principal components of the top 100 cancer genes. The classification rule produces zero mistakes in the training set and five mistakes in the testing set.

Tree methods: Dependent variable is categorical   Classification    trees (e.g., CART, C5, Firm, Tree) Decision Trees Decision Rules Tree methods: Dependent variable is numeric Regression Trees Credit Card? Car? Reject Age<30 Approve Y N

Classification & Regression Trees Function f(X,Y) Tree form of f(X,Y) 5 2 3 4 X Y Y<4 X<3 2 Y<2 3 5 Classification & Regression Trees Fit a tree model to data. Recursive Partitioning Algorithm. At each node we perform a split: we chose a variable X and a value t that minimizes a criteria. The split: L = {X < t} ; R = { X t}

Regression Tree for log(Sales) HIP95 < 40.5 [Ave: 1.074, Effect: -0.76 ] HIP96 < 16.5 [Ave: 0.775, Effect: -0.298 ] RBEDS < 59 [Ave: 0.659, Effect: -0.117 ] HIP95 < 0.5 [Ave: 1.09, Effect: +0.431 ] -> 1.09 HIP95 >= 0.5 [Ave: 0.551, Effect: -0.108 ] KNEE96 < 3.5 [Ave: 0.375, Effect: -0.175 ] -> 0.375 KNEE96 >= 3.5 [Ave: 0.99, Effect: +0.439 ] -> 0.99 RBEDS >= 59 [Ave: 1.948, Effect: +1.173 ] -> 1.948 HIP96 >= 16.5 [Ave: 1.569, Effect: +0.495 ] FEMUR96 < 27.5 [Ave: 1.201, Effect: -0.368 ] -> 1.201 FEMUR96 >= 27.5 [Ave: 1.784, Effect: +0.215 ] -> 1.784 HIP95 >= 40.5 [Ave: 2.969, Effect: +1.136 ] KNEE95 < 77.5 [Ave: 2.493, Effect: -0.475 ] BEDS < 217.5 [Ave: 2.128, Effect: -0.365 ] -> 2.128 BEDS >= 217.5 [Ave: 2.841, Effect: +0.348 ] OUTV < 53937.5 [Ave: 3.108, Effect: +0.267 ] -> 3.108 OUTV >= 53937.5 [Ave: 2.438, Effect: -0.404 ] -> 2.438 KNEE95 >= 77.5 [Ave: 3.625, Effect: +0.656 ] SIR < 9451 [Ave: 3.213, Effect: -0.412 ] -> 3.213 SIR >= 9451 [Ave: 3.979, Effect: +0.354 ] -> 3.979

For regression trees two criteria functions are: For classification trees: criteria functions

Regression Tree

Smart and Mars (see doc) Other methods: Smart and Mars (see doc) Regularized Discriminant Analysis Bayesian Discriminant Analysis Flexible Discriminant Analysis.

Classification tree: > data(tissue) > x <- f.pca(f.toarray(tissue))$scores[,1:4] > x= data.frame(x,gr=gr) > library(rpart) > tr =rpart(factor(gr)~., data=x) n= 41   node), split, n, loss, yval, (yprob) * denotes terminal node 1) root 41 22 3 (0.26829268 0.26829268 0.46341463) 2) PC3< -0.9359889 23 12 1 (0.47826087 0.47826087 0.04347826) 4) PC2< -1.154355 12 1 1 (0.91666667 0.00000000 0.08333333) * 5) PC2>=-1.154355 11 0 2 (0.00000000 1.00000000 0.00000000) * 3) PC3>=-0.9359889 18 0 3 (0.00000000 0.00000000 1.00000000) * > plot(tr) > text(tr) >