
Published by Felicity Brummitt. Modified over 2 years ago.

1
Non-Metric Methods: Decision Trees
Alexandros Potamianos
Dept of ECE, Tech. Univ. of Crete
Fall 2004-2005

2
Decision Trees
Motivation:
Some discrete features have no obvious notion of similarity or ordering (nominal data), e.g., book type, shape, sound type.
Taxonomies (i.e., trees with is-a relationships) are the oldest form of classification.

3
Decision Trees: Definition
Decision trees are classifiers that classify samples by asking a set of questions hierarchically (a tree of questions).
Example questions:
  is color red?
  is x < 0.5?
Terminology: root, leaf, node, arc, branch, parent, children, branching factor, depth
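A minimal sketch of "questions asked hierarchically", assuming a tree stored as nested nodes. The `Node` class, the `classify` helper, and the toy tree with labels `class A`, `class B`, `class C` are hypothetical and only reuse the two example questions from the slide.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """Internal node: asks one question; each answer labels an arc to a child."""
    question: Callable[[dict], object]  # maps a sample to an answer
    children: dict                      # answer -> child Node or leaf label (str)


def classify(tree, sample):
    """Walk from the root to a leaf, following the arc that matches each answer."""
    while isinstance(tree, Node):
        tree = tree.children[tree.question(sample)]
    return tree  # a leaf is just a class label


# Hypothetical toy tree: root asks "is color red?", its "no" child asks "is x < 0.5?"
root = Node(
    question=lambda s: s["color"] == "red",
    children={
        True: "class A",
        False: Node(
            question=lambda s: s["x"] < 0.5,
            children={True: "class B", False: "class C"},
        ),
    },
)

label = classify(root, {"color": "green", "x": 0.2})  # follows no, then yes
```

Here the root has branching factor 2 and the tree has depth 2; a multiway question would simply have more entries in `children`.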

4
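Fruit classifier
Figure (decision tree): root question Color? with branches green, yellow, red. Lower questions: Size?, Shape?, Size?, Taste?. Branch labels: big, med; round, thin; big, small, med; big, small, med; sweet, sour.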

5
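Fruit classification
Figure (decision tree): root question Color? with branches green, yellow, red. Lower questions: Size?, Shape?, Size?, Taste?. Branch labels: big, med; round, thin; big, small, med; big, small, med; sweet, sour. Leaf shown: CHERRY.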

9
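Fruit classifier
Figure (decision tree): root question Color? with branches green, yellow, red. Lower questions: Size?, Shape?, Size?, Taste?. Branch labels: big, med; round, thin; big, small, med; big, small, med; sweet, sour. Leaf labels: watermelon, grape, grapefruit, cherry, grape.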

10
Binary Trees
Binary trees: each parent node has exactly two child nodes (branching factor = 2).
Any tree can be represented as a binary tree by changing the set of questions and increasing the tree depth,
e.g., the three-way question Color? (green, yellow, red) becomes the two yes/no questions Color = green? and Color = yellow?
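The conversion in the Color? example can be sketched as follows. This is an illustrative helper, not part of the slides: `to_binary`, `depth`, and the placeholder subtrees `G`, `Y`, `R` are assumptions, and a binary node is represented as a tuple `(question, yes_subtree, no_subtree)`.

```python
def to_binary(feature, branches):
    """Turn a multiway split {value: subtree} into a chain of yes/no questions."""
    values = list(branches)
    # The last value needs no question: it is what remains after every "no".
    tree = branches[values[-1]]
    for value in reversed(values[:-1]):
        tree = (f"{feature} = {value}?", branches[value], tree)
    return tree


def depth(tree):
    """Depth of a binary tree built from (question, yes, no) tuples."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))


# Placeholder subtrees stand in for whatever hangs below each color branch.
multiway = {"green": "G", "yellow": "Y", "red": "R"}
binary = to_binary("Color", multiway)
# binary == ("Color = green?", "G", ("Color = yellow?", "Y", "R"))
```

The single three-way question (depth 1) becomes two binary questions (depth 2), which is the depth increase the slide mentions.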

11
Decision Trees: Problems
1. List of questions (features): all possible questions are considered.
2. Which questions to ask first (best split): the questions that split the data best (reduce impurity the most at each node) are asked first.
3. Stopping criteria (pruning criteria): stop when further splits don’t reduce impurity.
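The three steps above can be sketched as a greedy recursive procedure. This is a minimal illustration under stated assumptions, not the lecture's algorithm: features are nominal with string values, every question has the binary form "feature = value?", impurity is entropy, and leaves are labeled by majority vote. The function names and the toy fruit data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy impurity of a non-empty list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def grow(samples, labels):
    """Return a leaf label or an internal node (feature, value, yes_tree, no_tree)."""
    # 1. Candidate questions: every (feature, value) pair seen at this node.
    questions = sorted({(f, v) for s in samples for f, v in s.items()})
    # 2. Score each question by the total impurity of its two child nodes.
    scored = []
    for feature, value in questions:
        yes = [y for s, y in zip(samples, labels) if s[feature] == value]
        no = [y for s, y in zip(samples, labels) if s[feature] != value]
        if yes and no:  # ignore questions that do not separate anything
            total = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
            scored.append((total, (feature, value)))
    # 3. Stop when no question reduces impurity; label the leaf by majority vote.
    if not scored or min(scored)[0] >= entropy(labels) - 1e-12:
        return Counter(labels).most_common(1)[0][0]
    _, (feature, value) = min(scored)
    yes_side = [(s, y) for s, y in zip(samples, labels) if s[feature] == value]
    no_side = [(s, y) for s, y in zip(samples, labels) if s[feature] != value]
    return (feature, value,
            grow([s for s, _ in yes_side], [y for _, y in yes_side]),
            grow([s for s, _ in no_side], [y for _, y in no_side]))


def classify(tree, sample):
    """Follow yes/no answers down to a leaf label."""
    while isinstance(tree, tuple):
        feature, value, yes_tree, no_tree = tree
        tree = yes_tree if sample[feature] == value else no_tree
    return tree


# Hypothetical toy training set.
samples = [
    {"color": "red", "size": "small"},
    {"color": "green", "size": "big"},
    {"color": "green", "size": "small"},
    {"color": "red", "size": "small"},
]
labels = ["cherry", "watermelon", "grape", "cherry"]
tree = grow(samples, labels)
```

On this toy data the color question is asked first because it leaves less total impurity than either size question; ties between equally good questions are broken alphabetically, which is an arbitrary choice of this sketch.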

12
Best Split example
Two-class problem with 100 examples from each of the classes w1 and w2.
Three binary questions Q1, Q2 and Q3 split the data as follows, with (w1, w2) counts per node:
1. Node 1: (50,50), Node 2: (50,50)
2. Node 1: (100,0), Node 2: (0,100)
3. Node 1: (80,0), Node 2: (20,100)

13
Impurity Measures
Impurity measures how mixed the classes at a node are; a node is pure (zero impurity) if it contains training examples from a single class.
Impurity measures:
Entropy impurity: $i(N) = -\sum_i P(w_i) \log_2 P(w_i)$
Variance impurity (two-class): $i(N) = P(w_1) P(w_2)$
Gini impurity: $i(N) = 1 - \sum_i P^2(w_i)$
Misclassification impurity: $i(N) = 1 - \max_i P(w_i)$
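The four measures can be written directly from per-class counts at a node. This is a sketch that assumes a non-empty node described by a list of counts, one per class; the function names are illustrative.

```python
import math


def probs(counts):
    """Class probabilities P(w_i) estimated from per-class counts at a node."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy_impurity(counts):
    # Terms with P(w_i) = 0 contribute nothing and are skipped.
    return sum(-p * math.log2(p) for p in probs(counts) if p > 0)


def variance_impurity(counts):
    p1, p2 = probs(counts)  # defined for the two-class case only
    return p1 * p2


def gini_impurity(counts):
    return 1 - sum(p * p for p in probs(counts))


def misclassification_impurity(counts):
    return 1 - max(probs(counts))


mixed = [50, 50]   # maximally impure two-class node
pure = [100, 0]    # pure node
```

All four measures are 0 for `pure` and reach their two-class maximum for `mixed` (1.0 for entropy, 0.25 for variance, 0.5 for Gini and misclassification). In the two-class case the Gini impurity equals twice the variance impurity, since $1 - P^2(w_1) - P^2(w_2) = 2 P(w_1) P(w_2)$.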

14
Total Impurity
Total impurity at depth 0: $i(\text{depth}=0) = i(N)$
Total impurity at depth 1: $i(\text{depth}=1) = p(N_L)\, i(N_L) + p(N_R)\, i(N_R)$
Figure: node N at depth 0 with yes/no branches to child nodes $N_L$ and $N_R$ at depth 1.

15
Impurity Example
Node 1: (80,0), Node 2: (20,100)
$i(\text{node 1}) = 0$
$i(\text{node 2}) = -\frac{20}{120}\log_2\frac{20}{120} - \frac{100}{120}\log_2\frac{100}{120} = 0.65$
$P(\text{node 1}) = 80/200 = 0.4$
$P(\text{node 2}) = 120/200 = 0.6$
$i(\text{total}) = P(\text{node 1})\, i(\text{node 1}) + P(\text{node 2})\, i(\text{node 2}) = 0.4 \cdot 0 + 0.6 \cdot 0.65 = 0.39$
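The same arithmetic can be checked in a few lines and extended to the other two splits from the best-split example. This sketch assumes entropy impurity and that the three listed splits correspond to Q1, Q2 and Q3 in order; the names `splits` and `total_impurity` are illustrative.

```python
import math


def entropy(counts):
    """Entropy impurity from per-class counts; empty classes are skipped."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


def total_impurity(nodes):
    """Impurity of each child node, weighted by the fraction of samples it holds."""
    n = sum(sum(counts) for counts in nodes)
    return sum(sum(counts) / n * entropy(counts) for counts in nodes)


splits = {
    "Q1": [(50, 50), (50, 50)],
    "Q2": [(100, 0), (0, 100)],
    "Q3": [(80, 0), (20, 100)],
}
scores = {name: total_impurity(nodes) for name, nodes in splits.items()}
# scores: Q1 -> 1.0, Q2 -> 0.0, Q3 -> about 0.39
best = min(scores, key=scores.get)  # "Q2"
```

Q1 leaves the impurity at its maximum, Q2 separates the classes perfectly, and Q3 falls in between, so Q2 would be asked first.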

16
Continuous Example For continuous features: questions are of the type x
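One common way to handle a continuous feature, sketched here under assumptions: questions are taken to have the threshold form x < t (as in the earlier example question "is x < 0.5?"), candidate thresholds are midpoints between consecutive distinct values, and impurity is entropy. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy impurity of a non-empty list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (t, total_impurity) for the best question of the form x < t."""
    values = sorted(set(xs))
    best = None
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        total = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if best is None or total < best[1]:
            best = (t, total)
    return best


# Hypothetical one-dimensional data: w1 on the left, w2 on the right.
xs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
ys = ["w1", "w1", "w1", "w2", "w2", "w2"]
t, impurity = best_threshold(xs, ys)
```

For this toy data the chosen threshold lies midway between 0.4 and 0.6 and both child nodes are pure.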

17
Summary
Decision trees are useful categorical classification tools, especially for nominal (non-metric) data.
CART creates trees that minimize impurity on the training set at each node.
Decision region shape
CART is a useful tool for feature selection.
