DECISION TREES
Decision trees One possible representation for hypotheses
Choosing an attribute Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative". Which is a better choice? Patrons.
Using information theory Implement Choose-Attribute in the DTL algorithm based on information content, measured by entropy. Entropy is a measure of the uncertainty of a random variable: more uncertainty leads to higher entropy; more knowledge leads to lower entropy.
Entropy
Entropy Examples
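For a class distribution p_1, ..., p_n, the quantity on these slides is H = -sum_i p_i log2 p_i. A minimal sketch in Python (the function name is my own):

```python
import math

def entropy(probs):
    """H = -sum_i p_i * log2(p_i); terms with p = 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0  (fair coin: maximum uncertainty for 2 outcomes)
print(entropy([1.0]))        # 0.0  (certain outcome: no uncertainty)
print(entropy([0.9, 0.1]))   # ~0.469 (more knowledge, lower entropy)
```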
Information Gain Measures the reduction in entropy achieved by a split. Choose the split that achieves the greatest reduction (maximizes information gain). Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
Information Gain Example Consider the attributes Patrons and Type: Patrons has the highest information gain of all attributes and so is chosen by the DTL algorithm as the root.
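Information gain is the parent's entropy minus the size-weighted entropy of the child groups. A sketch reproducing the comparison (I assume the standard 12-example restaurant breakdown: Patrons = None is all negative, Some is all positive, Full is 2 positive / 4 negative, while Type splits every value 50/50; helper names are illustrative):

```python
from collections import Counter
import math

def entropy_of(labels):
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy_of(g) for g in groups)
    return entropy_of(labels) - remainder

parent = ['+'] * 6 + ['-'] * 6               # 6 positive, 6 negative examples
patrons = [['-', '-'],                       # None: all negative
           ['+', '+', '+', '+'],             # Some: all positive
           ['+', '+', '-', '-', '-', '-']]   # Full: mixed
type_attr = [['+', '-'], ['+', '-'],         # every Type value is 50/50
             ['+', '+', '-', '-'], ['+', '+', '-', '-']]

print(round(information_gain(parent, patrons), 3))        # 0.541 bits
print(abs(information_gain(parent, type_attr)) < 1e-9)    # True: Type gains nothing
```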
Learned Restaurant Tree Decision tree learned from the 12 examples: substantially simpler than the full tree. Raining and Reservation were not needed to classify all the data.
Stopping Criteria Stop expanding a node when all the records belong to the same class, or when all the records have similar attribute values.
Overfitting Overfitting results in decision trees that are more complex than necessary. Training error does not provide a good estimate of how well the tree will perform on previously unseen records (a test set is needed).
How to Address Overfitting 1 …
How to Address Overfitting 2 …
How to Address Overfitting… Is the early stopping rule strictly better than pruning (i.e., generating the full tree and then cutting it)?
Remaining Challenges… Continuous values: need to be split into discrete categories. Sort all values, then consider split points between two examples in sorted order that have different classifications. Missing values: affect how an example is classified, information-gain calculations, and the test-set error rate. Pretend that the example has all possible values for the missing attribute, weighted by each value's frequency among the examples in the current node.
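The split-point rule for continuous values can be sketched directly (the data and function name are hypothetical; using the midpoint between the two examples is one common convention):

```python
def candidate_splits(values, labels):
    """Sort examples by attribute value; propose a threshold midway between
    consecutive examples whose class labels differ."""
    pairs = sorted(zip(values, labels))
    splits = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            splits.append((v1 + v2) / 2)
    return splits

# Hypothetical continuous attribute with +/- class labels:
print(candidate_splits([48, 60, 70, 80, 90], ['-', '-', '+', '+', '-']))
# [65.0, 85.0] -- only boundaries where the class changes are considered
```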
Summary Advantages of decision trees: inexpensive to construct; extremely fast at classifying unknown records; easy to interpret for small-sized trees; accuracy comparable to other classification techniques on many simple data sets. Learning performance = prediction accuracy measured on a test set.
K-NEAREST NEIGHBORS
K-Nearest Neighbors What value do we assign to the green sample?
K-Nearest Neighbors k = 1 vs. k = 3
Decision Regions for 1-NN
K-Nearest Neighbors
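A minimal k-NN classifier over 2-D points, assuming Euclidean distance and majority vote (the toy training set is hypothetical):

```python
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), 'red'), ((1, 2), 'red'), ((2, 1), 'red'),
         ((5, 5), 'blue'), ((6, 5), 'blue')]
print(knn_classify(train, (1.5, 1.5), k=1))  # red
print(knn_classify(train, (5.5, 5.0), k=3))  # blue
```

Sorting the whole training set makes each query O(n log n); the quadtree and kd-tree structures below exist precisely to avoid this linear scan.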
Weighting the Distance to Remove Irrelevant Features
Nearest Neighbors Search
Quadtree
Quadtree Construction Input: point set P. while some cell C contains more than one point do: split cell C.
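The construction loop above can be sketched recursively (the cell bounds and nested-dict representation are my own choices; points are assumed distinct, since duplicates would recurse forever):

```python
def build_quadtree(points, x0, y0, x1, y1):
    # Split the cell (x0, y0)-(x1, y1) into four quadrants until each
    # cell holds at most one point.
    if len(points) <= 1:
        return {'points': points}
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quads = {'nw': [], 'ne': [], 'sw': [], 'se': []}
    for (x, y) in points:
        quads[('n' if y >= ym else 's') + ('e' if x >= xm else 'w')].append((x, y))
    return {
        'split': (xm, ym),
        'nw': build_quadtree(quads['nw'], x0, ym, xm, y1),
        'ne': build_quadtree(quads['ne'], xm, ym, x1, y1),
        'sw': build_quadtree(quads['sw'], x0, y0, xm, ym),
        'se': build_quadtree(quads['se'], xm, y0, x1, ym),
    }

tree = build_quadtree([(10, 10), (90, 90), (60, 60)], 0, 0, 100, 100)
print(tree['sw'])   # {'points': [(10, 10)]} -- that cell needed no further split
```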
Nearest Neighbor Search
Quadtree – Query Descend the tree by comparing the query point against each node's split coordinates (X1, Y1), following the quadrant (P ≥ X1 or P < X1, P ≥ Y1 or P < Y1) that contains it. In many cases this works.
Quadtree – Pitfall 1 In some cases it doesn't: there could be points in adjacent cells that are closer than the points in the query's own cell.
Quadtree – Pitfall 2 Could result in query time exponential in the number of dimensions.
Quadtree Simple data structure. Versatile, easy to implement. Often space and time inefficient.
kd-trees (k-dimensional trees) Main ideas: one-dimensional splits; instead of splitting in the middle, choose the split "carefully" (many variations); nearest neighbor queries are the same as for quadtrees.
2-dimensional kd-trees Algorithm: choose the x or y coordinate (alternating between them); choose the median of that coordinate, which defines a horizontal or vertical split line; recurse on both sides until only one point is left, which is stored as a leaf. We get a binary tree of size O(n), construction time O(n log n), and depth O(log n).
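A sketch of this construction (re-sorting at every level, as done here, costs O(n log² n); the O(n log n) bound on the slide requires linear-time median selection or pre-sorting):

```python
def build_kdtree(points, depth=0):
    # Alternate between x (axis 0) and y (axis 1); split at the median
    # point, recursing until a single point remains in a leaf.
    if len(points) == 1:
        return {'leaf': points[0]}
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'axis': axis,
            'split': points[mid][axis],     # coordinate of the split line
            'left': build_kdtree(points[:mid], depth + 1),
            'right': build_kdtree(points[mid:], depth + 1)}

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(tree['split'])   # 7 -- the root splits on x at the median x-coordinate
```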
Nearest Neighbor with KD Trees We traverse the tree looking for the nearest neighbor of the query point.
Nearest Neighbor with KD Trees Examine nearby points first: explore the branch of the tree that is closest to the query point first.
Nearest Neighbor with KD Trees When we reach a leaf node: compute the distance to each point in the node.
Nearest Neighbor with KD Trees Then we can backtrack and try the other branch at each node visited.
Nearest Neighbor with KD Trees Each time a new closest point is found, we can update the distance bound.
Nearest Neighbor with KD Trees Using the distance bound and the bounds of the data below each node, we can prune parts of the tree that could NOT include the nearest neighbor.
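The steps above (descend into the near branch first, score the leaf, backtrack, prune with the distance bound) can be sketched end to end; the tree builder is a minimal median-split version and all names are illustrative:

```python
import math

def build(points, depth=0):
    # Minimal kd-tree: alternate x/y, split at the median, one point per leaf.
    if len(points) == 1:
        return {'leaf': points[0]}
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'axis': axis, 'split': points[mid][axis],
            'left': build(points[:mid], depth + 1),
            'right': build(points[mid:], depth + 1)}

def nearest(node, query, best=None):
    """Explore the branch containing the query first; visit the far branch
    only if the split line is closer than the best distance found so far."""
    if 'leaf' in node:
        p = node['leaf']
        if best is None or math.dist(p, query) < math.dist(best, query):
            return p
        return best
    axis, split = node['axis'], node['split']
    if query[axis] < split:
        near, far = node['left'], node['right']
    else:
        near, far = node['right'], node['left']
    best = nearest(near, query, best)
    # Prune: every point in `far` is at least |query[axis] - split| away,
    # so skip it unless the split line lies within the current bound.
    if best is None or abs(query[axis] - split) < math.dist(best, query):
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (8, 3)))   # (7, 2)
```

The pruning test is conservative, so the search is exact; in favorable cases it visits only a few branches, but in the worst case it can still examine most of the tree.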
Summary of K-Nearest Neighbors