Presentation transcript:

1 Decision Trees

2 OutlookTemp (  F) Humidity (%) Windy?Class sunny7570true play sunny8090true don’t play sunny85 false don’t play sunny7295false don’t play sunny6970false play overcast7290true play overcast8378false play overcast6465true play overcast8175false play rain7180true don’t play rain6570true don’t play rain7580false play rain6880false play rain7096false play 14 cases: 9 Play 5 DP 5 cases: 2 Play 3 DP 4 cases: 4 Play 0 DP 5 cases: 3 Play 2 DP Outlook= sunny overcastrain 2 cases: 2 Play 0 DP 3 cases: 0 Play 3 DP 2 cases: 0 Play 2 DP 3 cases: 3 Play 0 DP Humidity  75% >75% Windy= false true

3 Basic Tree Building Algorithm

MakeTree(TrainingData D):
    Partition(D)

Partition(Data D):
    if all points in D are in the same class:
        return
    evaluate splits for each attribute A
    use the best split found to partition D into D1, D2, ..., Dn
    for each Di:
        Partition(Di)
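To make the recursion concrete, here is a minimal, self-contained Python sketch of the MakeTree/Partition scheme, assuming categorical attributes and using entropy-based splitting (introduced on the next slide) as the "evaluate splits" step. The names (make_tree, partition, entropy) are illustrative, not taken from any of the specific algorithms discussed later.

# A minimal sketch of the MakeTree / Partition recursion above, assuming
# categorical attributes and entropy-based splitting; illustrative names only.
from collections import Counter
from math import log2

def entropy(rows):
    """H of the class distribution in `rows`; the class label is the last column."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def partition(rows, attributes):
    """Return ("leaf", label) or ("split", attr_index, {value: subtree})."""
    classes = Counter(r[-1] for r in rows)
    if len(classes) == 1 or not attributes:        # pure node, or nothing left to split on
        return ("leaf", classes.most_common(1)[0][0])

    def split_by(a):
        groups = {}
        for r in rows:
            groups.setdefault(r[a], []).append(r)
        return groups

    # Evaluate splits for each attribute: the best split minimizes the weighted
    # entropy of the resulting partitions (i.e. maximizes information gain).
    def weighted_entropy(a):
        return sum(len(g) / len(rows) * entropy(g) for g in split_by(a).values())

    best = min(attributes, key=weighted_entropy)
    remaining = [a for a in attributes if a != best]
    return ("split", best, {v: partition(g, remaining) for v, g in split_by(best).items()})

def make_tree(training_data):
    return partition(training_data, list(range(len(training_data[0]) - 1)))

# Example: a few records with categorical attributes (Outlook, Windy) and a class.
data = [("sunny", "true", "Play"), ("sunny", "true", "Don't Play"),
        ("overcast", "false", "Play"), ("rain", "true", "Don't Play"),
        ("rain", "false", "Play")]
print(make_tree(data))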

4 ID3, CART

ID3
- Uses information gain to determine the best split:
  gain = H(D) − Σ_{i=1..n} P(D_i) H(D_i)
  where H(p_1, p_2, ..., p_m) = − Σ_{i=1..m} p_i log p_i
- Like the game of 20 questions: which attribute is better to ask about first, "Is it a living thing?" or "Is it a duster?"

CART
- Creates only two children for each node.
- Goodness of a split (Φ):
  Φ = 2 P(D_1) P(D_2) Σ_{j=1..m} | P(C_j | D_1) − P(C_j | D_2) |
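As a hedged illustration of both measures, the following Python sketch computes entropy, ID3's information gain and CART's goodness Φ from per-class counts; the function names are invented for this example, and the demo numbers come from the weather data on slide 2 (9 Play vs. 5 Don't Play at the root).

# A sketch of the two split measures on this slide, computed from class counts.
from math import log2

def H(*probs):
    """Entropy H(p1, ..., pm) = -sum_i p_i log2 p_i, with 0 log 0 taken as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

def class_entropy(counts):
    """Entropy of a node given its per-class counts."""
    total = sum(counts)
    return H(*[c / total for c in counts])

def information_gain(parent_counts, child_counts):
    """ID3: gain = H(D) - sum_i P(D_i) H(D_i)."""
    n = sum(parent_counts)
    return class_entropy(parent_counts) - sum(
        sum(c) / n * class_entropy(c) for c in child_counts)

def cart_goodness(d1_counts, d2_counts):
    """CART: phi = 2 P(D1) P(D2) sum_j |P(C_j|D1) - P(C_j|D2)| for a binary split."""
    n1, n2 = sum(d1_counts), sum(d2_counts)
    n = n1 + n2
    return 2 * (n1 / n) * (n2 / n) * sum(
        abs(a / n1 - b / n2) for a, b in zip(d1_counts, d2_counts))

# Root node: 9 Play / 5 Don't Play. Splitting on Outlook gives children
# sunny (2 Play, 3 DP), overcast (4, 0) and rain (3, 2).
print(information_gain((9, 5), [(2, 3), (4, 0), (3, 2)]))   # ~0.247 bits
# Binary split "Humidity <= 80": D1 = (7 Play, 2 DP), D2 = (2 Play, 3 DP).
print(cart_goodness((7, 2), (2, 3)))                        # ~0.347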

5 Shannon's Entropy

An experiment has several possible outcomes.
Suppose that in N experiments each outcome occurs M times; then there are N/M distinct, equally likely outcomes, and representing one of them takes log(N/M) bits.
- This generalizes even when the outcomes are not equally frequent.
- Reason: for an outcome j that occurs M times, there are N/M equi-probable events, only one of which corresponds to j.
Since p_i = M/N, the information content of outcome i is −log p_i.
So the expected information content is H = −Σ_i p_i log p_i.
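As a concrete check, using the class counts from slide 2 (9 Play, 5 Don't Play out of 14 cases): H(9/14, 5/14) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) ≈ 0.940 bits, i.e. on average about 0.94 bits are needed to record the class of a case before any attribute has been tested.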

6 C4.5, C5.0

Handling missing data
- During tree building, records with missing values are ignored.
- During classification, the missing value is predicted from the attribute values of other records.

Continuous data
- Continuous attributes are divided into ranges based on the attribute values found in the training data.

Pruning
- Prepruning.
- Postpruning: replace a subtree with a leaf if the error rate does not change much.

Rules (a tree can be converted into a set of rules).

Goodness of split: gain ratio
- Plain gain favours attributes with many values, which leads to over-fitting.
- gain-ratio = gain / H(P(D_1), P(D_2), ..., P(D_n))

C5.0
- Commercial version of C4.5.
- Incorporates boosting and other proprietary techniques.
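A small, self-contained Python sketch of the gain-ratio correction (not C4.5's actual code); the names are invented, and the demo reuses the Outlook split from slide 2.

# A sketch of gain-ratio = gain / H(P(D1), ..., P(Dn)), from per-class counts.
from math import log2

def H(*probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def gain_ratio(parent_counts, child_counts):
    n = sum(parent_counts)
    def node_h(counts):
        t = sum(counts)
        return H(*[c / t for c in counts])
    gain = node_h(parent_counts) - sum(sum(c) / n * node_h(c) for c in child_counts)
    split_info = H(*[sum(c) / n for c in child_counts])   # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0

# Outlook: gain ~0.247 bits, split information H(5/14, 4/14, 5/14) ~1.577 bits.
print(gain_ratio((9, 5), [(2, 3), (4, 0), (3, 2)]))        # ~0.156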

7 SLIQ

Motivations:
- Previous algorithms consider only memory-resident data.
- In determining the entropy of a split on a non-categorical attribute, the attribute values have to be sorted.

8 SLIQ — Data structure: class list and attribute lists

Training data: the 14-case weather table from slide 2.

The class list (one entry per training record):

Index  Class       Leaf
1      Play        N1
2      Don't Play  N1
3      Don't Play  N1
4      Don't Play  N1
5      Play        N1
6      Play        N1
7      Play        N1
8      Play        N1
9      Play        N1
10     Don't Play  N1
11     Don't Play  N1
12     Play        N1
13     Play        N1
14     Play        N1

The attribute list for Humidity (values sorted in ascending order, each paired with its class list index):

Humidity (%)  Class List Index
65            8
70            1
70            5
70            11
75            9
78            7
80            10
80            12
80            13
85            3
90            2
90            6
95            4
96            14

9 SLIQ — Data structure: class list and attribute lists (continued)

The Humidity attribute list is scanned sequentially. For each entry, the class list index gives the record's class and current leaf, and the class histogram of that leaf is updated.

Histogram at node N1 after the first entry (value 65, class list index 8, class Play):

Value  Class  Frequency
<= 65  Play   1

10 SLIQ — Data structure: class list and attribute lists (continued)

Histogram at node N1 before reading the second entry (value 70, class list index 1, class Play):
  <= 65  Play  1
After:
  <= 65  Play  1
  <= 70  Play  2

11 SLIQ — Data structure: class list and attribute lists (continued)

Histogram at node N1 before reading the third entry (value 70, class list index 5, class Play):
  <= 65  Play  1
  <= 70  Play  2
After:
  <= 65  Play  1
  <= 70  Play  3

12 SLIQ — Data structure: class list and attribute lists (continued)

Histogram at node N1 after the whole Humidity attribute list has been scanned:

Value  Class       Frequency
<= 65  Play        1
<= 70  Play        3
<= 70  Don't Play  1
  ...
<= 96  Play        9
<= 96  Don't Play  5

The entropies of the various split points can be calculated from these figures. The next attribute list is then scanned.
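The single pass that builds these histograms can be sketched as follows; this is a simplification of the real SLIQ data structures, and names such as scan_attribute_list are illustrative only.

# A simplified sketch of the SLIQ scan over one pre-sorted attribute list.
# `attribute_list` holds (value, class_list_index) pairs and `class_list` maps
# an index to (class, leaf).
from collections import defaultdict

def scan_attribute_list(attribute_list, class_list):
    """For each leaf, record the cumulative class histogram at every list entry."""
    below = defaultdict(lambda: defaultdict(int))   # below[leaf][cls] = count so far
    candidates = defaultdict(list)                  # leaf -> [(value, histogram snapshot)]
    for value, idx in attribute_list:               # a single sequential pass
        cls, leaf = class_list[idx]
        below[leaf][cls] += 1
        candidates[leaf].append((value, dict(below[leaf])))
    return candidates

# The first few Humidity entries from the slides reproduce the tables above:
class_list = {8: ("Play", "N1"), 1: ("Play", "N1"), 5: ("Play", "N1"), 11: ("Don't Play", "N1")}
attr_list = [(65, 8), (70, 1), (70, 5), (70, 11)]
for value, hist in scan_attribute_list(attr_list, class_list)["N1"]:
    print("<=", value, hist)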

13 SLIQ — Data structure: class list and attribute lists (continued)

Assume the split point "Humidity <= 80%" is the best one among all possible splits of all attributes. Node N1 is split into N2 (Humidity <= 80%) and N3 (Humidity > 80%).

The Humidity attribute list is scanned again to update the class list.

14 SLIQ — Data structure: class list and attribute lists (continued)

The first attribute-list entry (value 65, class list index 8) falls in the "<= 80%" branch, so entry 8 of the class list is updated: its leaf changes from N1 to N2. All other entries still point to N1.

15 SLIQ — Data structure: class list and attribute lists (continued)

The second entry (value 70, class list index 1) also falls in the "<= 80%" branch, so entry 1 of the class list is updated to leaf N2 as well. Entries 1 and 8 now point to N2; the rest still point to N1.

16 SLIQ — Data structure: class list and attribute lists (continued)

The class list after the whole Humidity attribute list has been processed:

Index  Class       Leaf
1      Play        N2
2      Don't Play  N3
3      Don't Play  N3
4      Don't Play  N3
5      Play        N2
6      Play        N3
7      Play        N2
8      Play        N2
9      Play        N2
10     Don't Play  N2
11     Don't Play  N2
12     Play        N2
13     Play        N2
14     Play        N3

17 SLIQ — Data structure: class list and attribute lists (continued)

With the class list updated, the next scan of the Humidity attribute list accumulates a separate histogram for each new node:

Histogram at node N2:
  Value  Class       Frequency
  <= 65  Play        1
    ...
  <= 80  Play        7
  <= 80  Don't Play  2

Histogram at node N3:
  Value  Class       Frequency
  <= 85  Don't Play  1
    ...
  <= 96  Play        2
  <= 96  Don't Play  3
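The second pass described on slides 13-16, in which the winning attribute list is re-scanned to update the leaf pointers in the class list, can be sketched like this (again a simplification with illustrative names):

# A simplified sketch of the class-list update pass.
def update_class_list(attribute_list, class_list, threshold, left_leaf, right_leaf):
    """Send each record to left_leaf if value <= threshold, else to right_leaf."""
    for value, idx in attribute_list:
        cls, _ = class_list[idx]
        class_list[idx] = (cls, left_leaf if value <= threshold else right_leaf)

# With the split "Humidity <= 80%": records 8 and 1 go to N2, record 3 to N3.
class_list = {8: ("Play", "N1"), 1: ("Play", "N1"), 3: ("Don't Play", "N1")}
attr_list = [(65, 8), (70, 1), (85, 3)]
update_class_list(attr_list, class_list, threshold=80, left_leaf="N2", right_leaf="N3")
print(class_list)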

18 SLIQ

Motivations (review):
- Previous algorithms consider only memory-resident data.
  - At any time, only the class list and one attribute list need to be in memory.
  - A whole new layer of the tree (rather than the children of a single node) is created with at most two scans of each attribute list.
- In determining the entropy of a split on a non-categorical attribute, the attribute values have to be sorted.
  - Presorting: each attribute list is sorted only once.

19 SPRINT

Motivations:
- The class list in SLIQ has to reside in memory, which is still a bottleneck for scalability.
- Can the decision tree building process be carried out by multiple machines in parallel?
- Frequent lookups of the central class list would produce a lot of network communication in the parallel case.

20 SPRINT: Proposed Solutions

Eliminate the class list:
1. Class labels are distributed to (stored directly in) each attribute list.
   => Redundant data, but the memory-residence and network-communication bottlenecks are removed.
2. Each tree node keeps its own set of attribute lists.
   => No need to look up node information in a central structure.

Parallelization:
- Each machine is assigned a partition of every attribute list; the machines are ordered so that the combined lists of non-categorical attributes remain sorted.
- Each machine produces its local histograms in parallel; the combined histograms can be used to find the best splits (see the sketch below).
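A rough Python illustration of the last point: each machine builds class histograms for its own partition of an attribute list, and summing them gives the global counts needed to evaluate a candidate split. This assumes a simple two-way numeric split and invented names; it is not the SPRINT implementation.

# A rough illustration of combining per-machine class histograms.
from collections import Counter

def local_histograms(partition, threshold):
    """One machine's class counts below / above a candidate threshold."""
    below, above = Counter(), Counter()
    for value, cls in partition:          # class label is stored with each value
        (below if value <= threshold else above)[cls] += 1
    return below, above

def combined_histograms(partitions, threshold):
    """Sum the local histograms; this is all the split evaluation needs."""
    below, above = Counter(), Counter()
    for part in partitions:
        b, a = local_histograms(part, threshold)
        below += b
        above += a
    return below, above

# Two machines, each holding a sorted piece of the Humidity attribute list:
machine1 = [(65, "Play"), (70, "Play"), (70, "Don't Play")]
machine2 = [(85, "Don't Play"), (90, "Don't Play"), (96, "Play")]
print(combined_histograms([machine1, machine2], threshold=80))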