MACHINE LEARNING 10 Decision Trees
(Slides based on E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press, V1.1)

Motivation
- Parametric estimation
  - Assume a model for the class probabilities or the regression function
  - Estimate its parameters from all of the training data
- Non-parametric estimation
  - Find the "similar"/"close" data points to the query
  - Fit a local model using only these points
  - Costly: requires computing the distance to all training data

Motivation
- Pre-split the training data into regions, using a small number of simple rules organized in a hierarchical manner
- Decision trees
  - Internal decision nodes carry a splitting rule
  - Terminal leaves carry class labels (classification) or values (regression)

Tree Uses Nodes and Leaves  [figure]

Decision Trees
- Start with univariate decision trees
  - Each node tests only a single input feature
- Prefer smaller decision trees
  - Less memory for the representation
  - Less computation to classify a new instance
- Want a small generalization error

Decision and Leaf Nodes
- A decision node implements a simple test function f_m(x)
  - Output: the labels of its branches
  - f_m(x) is a discriminant in the d-dimensional input space
- A complex discriminant is broken down into a hierarchy of simple decisions
- A leaf node describes a region of the d-dimensional space that shares the same output
  - Classification: a class label
  - Regression: a numeric value

Classification Trees
- What is a good split function? Use an impurity measure
- Assume N_m training samples reach node m, of which N_m^i belong to class C_i
- Estimate the class probabilities at node m as p_m^i = P̂(C_i | x, m) = N_m^i / N_m
- Node m is pure if, for every class, p_m^i is either 0 or 1
- An impurity measure must also quantify the values in between

Entropy
- Measures the amount of uncertainty; for two classes it ranges from 0 to 1 bit
- Example with two classes:
  - If p_1 = p_2 = 0.5, the entropy is 1, the maximum uncertainty
  - If p_1 = 1 and p_2 = 0, the entropy is 0, no uncertainty
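For reference, the entropy impurity of node m with K classes takes the standard form

$$ \mathcal{I}_m = -\sum_{i=1}^{K} p_m^i \log_2 p_m^i, \qquad p_m^i = \frac{N_m^i}{N_m}, $$

with the convention $0 \log_2 0 \equiv 0$.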

Entropy  [figure]

Best Split
- If a node is impure, it must be split further
- There are several candidate splits (one per coordinate); choose the best one
- Minimize the impurity (uncertainty) after the split
- Stop splitting when the impurity is small enough
  - A stop-impurity threshold of zero => a complex tree with large variance
  - A larger stop-impurity threshold => smaller trees, but larger bias

Best Split
- Impurity after the split: N_mj of the N_m samples take branch j, and N_mj^i of them belong to class C_i, giving

  $$ \mathcal{I}'_m = -\sum_j \frac{N_{mj}}{N_m} \sum_i p_{mj}^i \log_2 p_{mj}^i, \qquad p_{mj}^i = \frac{N_{mj}^i}{N_{mj}} $$

- Find the variable and the split that minimize the impurity after the split
  - among all variables
  - and, for numeric variables, among all split positions

ID3 and Classification and Regression Trees (CART)  [figure]
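A minimal Python sketch of greedy, entropy-based tree induction in this spirit, assuming numeric features and a simple stopping rule (the names grow_tree and best_split, the dict-based tree representation, and the threshold values are illustrative choices, not taken from the slides):

```python
import numpy as np

def entropy(y):
    """Entropy (base 2) of the class labels in y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def majority_leaf(y):
    """Leaf node labeled with the majority class of y."""
    values, counts = np.unique(y, return_counts=True)
    return {"leaf": values[np.argmax(counts)]}

def best_split(X, y):
    """Search all features and all split positions; return the
    (impurity, feature, threshold) minimizing the weighted entropy."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:            # candidate thresholds
            left = X[:, j] <= t
            w = left.mean()
            imp = w * entropy(y[left]) + (1 - w) * entropy(y[~left])
            if best is None or imp < best[0]:
                best = (imp, j, t)
    return best

def grow_tree(X, y, min_impurity=0.1, min_samples=5):
    """Recursively grow a binary classification tree."""
    if len(y) < min_samples or entropy(y) <= min_impurity:
        return majority_leaf(y)                      # stop: node pure enough or too few samples
    split = best_split(X, y)
    if split is None:                                # no valid split (constant features)
        return majority_leaf(y)
    _, j, t = split
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left":  grow_tree(X[left],  y[left],  min_impurity, min_samples),
            "right": grow_tree(X[~left], y[~left], min_impurity, min_samples)}

def predict(node, x):
    """Follow the decision nodes from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```

Usage would be `tree = grow_tree(X_train, y_train)` followed by `predict(tree, x)` for a new instance x.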

Regression Trees
- A leaf node carries a value, not a class label
- Needs a different impurity measure
- Use the average (mean squared) error of the values reaching the node

Regression Trees
- After splitting, the error is computed separately over the samples taking each branch (see the formulas below)
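Written out (standard definitions, with $b_m(x^t) = 1$ iff $x^t$ reaches node m and $g_m$ the mean of the values $r^t$ reaching it), the node error and the after-split error are

$$ E_m = \frac{1}{N_m}\sum_t \big(r^t - g_m\big)^2\, b_m(x^t), \qquad g_m = \frac{\sum_t b_m(x^t)\, r^t}{\sum_t b_m(x^t)}, $$

$$ E'_m = \frac{1}{N_m}\sum_j \sum_t \big(r^t - g_{mj}\big)^2\, b_{mj}(x^t), \qquad g_{mj} = \frac{\sum_t b_{mj}(x^t)\, r^t}{\sum_t b_{mj}(x^t)}. $$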

Example  [figure]

Pruning Trees
- If the number of instances reaching a node is small (e.g. fewer than 5% of the training data), don't split further, regardless of the impurity
- Remove subtrees for better generalization
  - Prepruning: early stopping during growth
  - Postpruning: grow the whole tree, then prune subtrees
    - Set aside a pruning set
    - Make sure that pruning does not significantly increase the error on it
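Continuing the sketch above, reduced-error postpruning on a held-out pruning set might look as follows (illustrative only; `prune` reuses the dict-based tree and the `predict` function from the earlier sketch):

```python
import numpy as np

def prune(node, X_val, y_val):
    """Reduced-error postpruning: replace a subtree by a leaf whenever the
    leaf does not increase the error on the pruning (validation) set."""
    if "leaf" in node or len(y_val) == 0:
        return node
    # prune the children first (bottom-up)
    left = X_val[:, node["feature"]] <= node["threshold"]
    node["left"] = prune(node["left"], X_val[left], y_val[left])
    node["right"] = prune(node["right"], X_val[~left], y_val[~left])
    # compare the subtree with a single majority leaf on the pruning data
    subtree_err = np.mean([predict(node, x) != y for x, y in zip(X_val, y_val)])
    values, counts = np.unique(y_val, return_counts=True)
    majority = values[np.argmax(counts)]
    leaf_err = np.mean(y_val != majority)
    if leaf_err <= subtree_err:
        return {"leaf": majority}                    # pruning does not hurt: keep the leaf
    return node
```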

Decision Trees and Feature Extraction
- A univariate tree uses only some of the variables
  - Some variables may not get used at all
- Features closer to the root have greater importance

Interpretability
- The conditions are simple to understand
- Each path from the root to a leaf is one conjunction of tests
- The set of all paths can be written as a set of IF-THEN rules
  - Together they form a rule base
- The percentage of training data covered by a rule is its rule support
- A tool for knowledge extraction
  - The extracted rules can be verified by experts

Rule Extraction from Trees
- C4.5Rules (Quinlan, 1993)  [figure]
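As an illustration, the rule base read off a small univariate tree could look like the function below; the attributes `age` and `income`, the thresholds, and the class names are all invented for the example:

```python
def rule_base(x):
    """Hypothetical IF-THEN rules extracted from a univariate decision tree.
    Each branch corresponds to one root-to-leaf path."""
    if x["age"] > 40 and x["income"] > 30000:        # R1
        return "low-risk"
    if x["age"] > 40 and x["income"] <= 30000:       # R2
        return "high-risk"
    if x["age"] <= 40 and x["income"] > 50000:       # R3
        return "low-risk"
    return "high-risk"                               # R4: the remaining path
```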

Rule Induction
- Learn rules directly from the data
- Decision-tree induction is breadth-first rule construction; rule induction is depth-first
- Learn the rules one by one
  - A rule is a conjunction of conditions
  - Add conditions one at a time, chosen by some criterion (e.g. entropy)
  - Remove the samples covered by the finished rule from the training data, then learn the next rule

Ripper Algorithm
- Assume two classes (K = 2): positive and negative examples
- Add rules to explain the positive examples; everything not covered by a rule is classified as negative
- Foil: conditions are added to a rule so as to maximize the information gain
- A simplified sequential-covering sketch is given below
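A heavily simplified sequential-covering sketch in this spirit (not Ripper itself: there is no rule pruning or MDL-based stopping, and rule precision stands in for Foil's information gain; all names are ours):

```python
import numpy as np

def covers(rule, X):
    """Boolean mask of the rows satisfying every (feature, op, threshold) condition."""
    mask = np.ones(len(X), dtype=bool)
    for j, op, t in rule:
        mask &= (X[:, j] <= t) if op == "<=" else (X[:, j] > t)
    return mask

def precision(rule, X, y):
    """Fraction of covered examples that are positive (0 if nothing is covered)."""
    c = covers(rule, X)
    return y[c].mean() if c.any() else 0.0

def grow_rule(X, y, max_conditions=3):
    """Greedily add the condition that most increases the rule's precision
    on the positive class (a simple stand-in for Foil's information gain)."""
    rule = []
    for _ in range(max_conditions):
        best, best_prec = None, precision(rule, X, y)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for op in ("<=", ">"):
                    cand = rule + [(j, op, t)]
                    prec = precision(cand, X, y)
                    if prec > best_prec:
                        best, best_prec = cand, prec
        if best is None:
            break
        rule = best
    return rule

def learn_rules(X, y, max_rules=10):
    """Sequential covering: learn one rule at a time for the positive class
    (y == 1), then remove the examples the rule covers."""
    X, y = X.copy(), y.copy()
    rules = []
    while y.sum() > 0 and len(rules) < max_rules:
        rule = grow_rule(X, y)
        if not rule:                                  # no useful condition found
            break
        rules.append(rule)
        covered = covers(rule, X)
        X, y = X[~covered], y[~covered]               # only uncovered examples remain
    return rules
```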

Multivariate Trees  [figure]
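In a multivariate tree, the test at a decision node uses all of the input variables at once rather than a single coordinate; the most common choice is a linear (hyperplane) split of the form

$$ f_m(\mathbf{x}) : \quad \mathbf{w}_m^{T}\mathbf{x} + w_{m0} > 0 . $$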