Data Mining: Practical Machine Learning Tools and Techniques
Chapter 6.1: Decision Trees
Rodney Nielsen. Many / most of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques.

Industrial-Strength Algorithms
For an algorithm to be useful in a wide range of real-world applications it must:
● Permit both nominal and numeric attributes
● Allow missing values
● Be robust in the presence of noise
● Be able to approximate arbitrary concept descriptions (at least in principle)
Basic schemes need to be extended to fulfill these requirements

Decision Trees
Extending ID3:
● to permit numeric attributes: straightforward
● to deal sensibly with missing values: trickier
● to be robust to noisy data: requires a pruning mechanism
End result: C4.5 (Quinlan)
● Best-known and (probably) most widely-used learning algorithm
● Commercial successor: C5.0

Numeric Attributes
Standard method: binary splits
● E.g. temp < 45
Unlike nominal attributes, a numeric attribute has many possible split points
Solution is a straightforward extension (see the sketch below):
● Evaluate info gain (or other measure) for every possible split point of the attribute
● Choose the “best” split point
● Info gain for the best split point is the info gain for the attribute
Computationally more demanding
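The following is a minimal Python sketch (not from the slides) of the split-point search described above. It assumes candidate splits are placed halfway between adjacent distinct sorted values and scores each candidate by the weighted entropy of the two subsets, which is equivalent to maximizing information gain; the function and variable names are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Information, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Evaluate every candidate binary split point of a numeric attribute.

    Candidates lie halfway between adjacent distinct sorted values; the split
    minimizing the weighted entropy of the two subsets (i.e., maximizing
    information gain) is returned as (weighted_info, split_point).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no candidate split between equal values
        split = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        info = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        best = min(best, (info, split))
    return best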

Weather Data (again!)
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

Weather Data (again!)
Rules over the nominal attributes:
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes

The same rules when temperature and humidity are numeric:
If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes

Nominal data (excerpt):
Outlook    Temperature  Humidity  Windy  Play
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         Normal    False  Yes
…          …            …         …      …

Numeric data (excerpt):
Outlook    Temperature  Humidity  Windy  Play
Sunny      85           85        False  No
Sunny      80           90        True   No
Overcast   83           86        False  Yes
Rainy      70           96        False  Yes
Rainy      68           80        False  Yes
Rainy      65           70        True   No
…          …            …         …      …

Example
Split on the temperature attribute, e.g. at 71.5:
● temperature < 71.5: yes/4, no/2
● temperature ≥ 71.5: yes/5, no/3
Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
Place split points halfway between values
Can evaluate all split points in one pass!
(Class labels in order of increasing temperature: Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No)
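As a quick check of the arithmetic on this slide, here is a short Python snippet (not from the slides) that recomputes the weighted information of the two subsets produced by splitting at 71.5:

```python
import math

def info(counts):
    """Entropy in bits of a class-count distribution, e.g. [yes, no]."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

# temperature < 71.5 gives yes/4, no/2; temperature >= 71.5 gives yes/5, no/3
weighted = 6 / 14 * info([4, 2]) + 8 / 14 * info([5, 3])
print(round(weighted, 3))  # 0.939 bits
```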

Can Avoid Repeated Sorting
Sort instances by the values of the numeric attribute
● Time complexity for sorting: O(n log n)
Does this have to be repeated at each node of the tree?
No! The sort order for the children can be derived from the sort order for the parent
● Time complexity of derivation: O(n) per tree level
Drawback: need to create and store a list of sorted indices for each numeric attribute
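A minimal sketch of the O(n) derivation mentioned above, assuming the parent node stores the instance indices already sorted by the numeric attribute (the names and data layout are illustrative assumptions):

```python
def partition_sorted_indices(sorted_idx, goes_left):
    """Derive each child's sort order from the parent's in a single O(n) pass.

    sorted_idx: instance indices at this node, sorted by a numeric attribute
    goes_left:  goes_left[i] is True if instance i is routed to the left child
    A stable pass over the parent's order preserves sortedness in the children,
    so no re-sorting is needed at the child nodes.
    """
    left, right = [], []
    for i in sorted_idx:
        (left if goes_left[i] else right).append(i)
    return left, right

# Example: parent's order by temperature, instances routed by some other test
print(partition_sorted_indices([3, 0, 2, 1], {0: True, 1: False, 2: True, 3: False}))
# ([0, 2], [3, 1])
```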

Binary vs. Multiway Splits
Splitting (multi-way) on a nominal attribute exhausts all information in that attribute
● A nominal attribute is tested in the tree at most once on any path
Not so for binary splits on numeric attributes!
● A numeric attribute may be tested several times along a path in the tree
Disadvantage: tree is hard to read
Possible remedies, if necessary:
● Pre-discretize numeric attributes, or
● Use multi-way splits instead of binary ones

Missing Values
Split instances with missing values into pieces (see the sketch below)
● A piece going down a branch receives a weight proportional to the popularity of the branch
● Weights sum to 1
Info gain works with fractional instances
● Use sums of weights instead of counts
During classification (testing), split the test instance into pieces in the same way
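Below is a hedged Python sketch of the fractional-instance idea (this is not C4.5's actual code; the data layout and helper names are assumptions). Instances with a known value follow their branch with full weight; instances with a missing value are split into one piece per branch, weighted by that branch's share of the known-value weight:

```python
def distribute(instances, attribute, branch_of):
    """Split weighted instances among branches, fanning out missing values.

    instances: list of (instance_dict, weight) pairs
    branch_of: maps a known attribute value to a branch name
    Returns {branch_name: [(instance, weight), ...]}; the pieces of a
    missing-value instance carry weights that sum to its original weight.
    """
    branches = {}
    known = [(inst, w) for inst, w in instances if inst.get(attribute) is not None]
    for inst, w in known:
        branches.setdefault(branch_of(inst[attribute]), []).append((inst, w))
    # Branch popularity, measured on the instances whose value is known
    popularity = {name: sum(w for _, w in pieces) for name, pieces in branches.items()}
    total = sum(popularity.values())
    for inst, w in instances:
        if inst.get(attribute) is None:
            for name, pieces in branches.items():
                pieces.append((inst, w * popularity[name] / total))
    return branches
```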

Pruning
Prevent over-fitting to noise in the data: “prune” the decision tree
Two strategies:
● Post-pruning: take a fully-grown decision tree and discard unreliable parts
● Pre-pruning: stop growing a branch when information becomes unreliable
Post-pruning is preferred in practice; pre-pruning can “stop early”
● Consider scenarios similar to an exclusive-OR

Pre-pruning
Based on a statistical significance test
● Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node
Most popular test: chi-squared test (see the sketch below)
ID3 used the chi-squared test in addition to information gain
● Only statistically significant attributes were allowed to be selected by the information gain procedure
What level of significance (i.e., what p-value)?
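As an illustration of the kind of test described above (a sketch, not ID3's or C4.5's code), one could apply SciPy's chi-squared test of independence to the contingency table of attribute values versus classes at a node; the significance level alpha is exactly the open choice the slide asks about:

```python
from scipy.stats import chi2_contingency

def significant_association(contingency, alpha=0.05):
    """Pre-pruning check: is the attribute/class association significant?

    contingency: 2-D table of counts, rows = attribute values, cols = classes
    alpha:       chosen significance level (the p-value threshold is a design choice)
    """
    chi2, p, dof, expected = chi2_contingency(contingency)
    return p < alpha

# Example: an attribute with two values vs. a yes/no class
print(significant_association([[8, 1], [2, 7]]))  # True: keep splitting on it
```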

Early Stopping
Pre-pruning may stop the growth process prematurely: early stopping
Classic example: the XOR/parity problem (attributes a and b, class = a XOR b; see the sketch below)
● No individual attribute exhibits any significant association with the class
● The structure is only visible in the fully expanded tree
● Pre-pruning won't expand the root node
But: XOR-type problems are rare in practice
And pre-pruning is faster than post-pruning
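The XOR effect is easy to reproduce. The short Python sketch below (assuming two binary attributes a and b with class = a XOR b) shows that each attribute on its own has zero information gain, which is why a pre-pruning test would refuse to expand the root:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# XOR data: (a, b) -> class
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
classes = [c for _, c in data]
for attr, name in ((0, "a"), (1, "b")):
    subsets = [[c for (x, c) in data if x[attr] == v] for v in (0, 1)]
    remainder = sum(len(s) / len(data) * entropy(s) for s in subsets)
    print(f"info gain of {name}: {entropy(classes) - remainder}")  # 0.0 for both
```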

Post-pruning
First, build the full tree; then, prune it
● A fully-grown tree shows all attribute interactions
Problem: some subtrees might be due to chance effects
Two pruning operations:
● Subtree replacement
● Subtree raising
Possible strategies:
● Error estimation
● Significance testing
● MDL principle

Subtree Replacement
Bottom-up
Consider replacing a subtree only after considering all its subtrees

Subtree Raising
Delete the node
Redistribute the instances
Slower than subtree replacement

Estimating Error Rates
Prune only if pruning does not increase the estimated error
Error on the training data is NOT a useful estimator (it would result in essentially no pruning)
One option: use a hold-out set for pruning (“reduced-error pruning”)
C4.5's method:
● Derive a confidence interval from the training data
● Use a heuristic limit, derived from this, for pruning
● Standard Bernoulli-process-based method
● Shaky statistical assumptions (because it is based on the training data)

C4.5's Method
Error estimate for a subtree is the weighted sum of the error estimates for all its leaves
Error estimate for a node:
● Standard z-value confidence interval, as in calculating a confidence interval for the overall accuracy/error rate
● C4.5 default: c = 25% => z = 0.69 (approximately normal distribution)
Based on:
● The error on the training data
● The number of instances covered by the leaf

Example
f = actual error on the training data at the leaf
ê = estimated error on unseen (test) data
Three leaves: f = 0.33, ê = 0.47; f = 0.5, ê = 0.72; f = 0.33, ê = 0.47
Parent node: f = 5/14, ê = 0.46
Combined leaf estimate, using ratios 6:2:6 => 0.47*6/14 + 0.72*2/14 + 0.47*6/14 = 0.51
ê of the parent is less than 0.51, so prune!
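To make the two preceding slides concrete, here is a hedged Python sketch of the pessimistic error estimate C4.5 uses: the standard upper confidence limit for a Bernoulli error rate, as described in the book. The exact rounded value of the parent's estimate may differ slightly from the slide depending on rounding, but the comparison and the pruning decision come out the same:

```python
import math

def pessimistic_error(f, n, z=0.69):
    """Upper confidence limit on a leaf's true error rate (C4.5-style).

    f: observed error rate on the training data at the leaf
    n: number of (possibly weighted) training instances the leaf covers
    z: z-value; 0.69 corresponds to C4.5's default 25% confidence level
    """
    return (f + z * z / (2 * n)
            + z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))) / (1 + z * z / n)

# Leaves covering 6, 2 and 6 instances with 2, 1 and 2 training errors
leaves = [(2 / 6, 6), (1 / 2, 2), (2 / 6, 6)]
combined = sum(n * pessimistic_error(f, n) for f, n in leaves) / 14
parent = pessimistic_error(5 / 14, 14)
print(round(combined, 2))   # ~0.51, matching the slide
print(parent < combined)    # True: replacing the subtree by a single leaf looks better
```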

Cost-complexity Pruning
C4.5's post-pruning often does not prune enough:
● Tree size continues to grow when more instances are added, even if performance on independent data does not improve
● It is, however, very fast and popular in practice
It can be worthwhile in some cases to strive for a more compact tree
● At the expense of more computational effort
● Cost-complexity pruning method from the CART (Classification and Regression Trees) learning system

Algorithmic Complexity
C4.5 tree induction complexity (according to the book); makes a number of assumptions that Big O should not make
Assume:
● m attributes
● n training instances
● tree depth O(log n)
Then:
● Building a tree: O(m n log n)
● Subtree replacement: O(n)
● Subtree raising: O(n (log n)²)
● Total cost: O(m n log n) + O(n (log n)²)

From Trees to Rules
Simple method: one rule for each leaf (see the sketch below)
C4.5rules: greedily prunes conditions from each rule if this reduces its estimated error
● Can produce duplicate rules
● Check for this at the end
Then:
● Look at each class in turn
● Consider the rules for that class
● Find a “good” subset (guided by MDL)
Then rank the subsets to avoid conflicts
Finally, remove rules (greedily) if this decreases error on held-out development data
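A minimal sketch of the "one rule per leaf" step, under an assumed tree representation (the fields is_leaf, prediction, attribute and children are hypothetical, not C4.5's data structures):

```python
def rules_from_tree(node, conditions=()):
    """Yield one (conditions, class) rule per leaf by walking root-to-leaf paths."""
    if node.is_leaf:
        yield conditions, node.prediction
        return
    for value, child in node.children.items():
        # Extend the rule's antecedent with this node's test and recurse into the branch
        yield from rules_from_tree(child, conditions + ((node.attribute, value),))
```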

C4.5: Choices and Options
C4.5 has two parameters:
● Confidence value (default 25%): lower values incur heavier pruning
● Minimum number of instances in the two most popular branches (default 2)
C4.5rules is slow for large and noisy datasets
The commercial version, C5.0rules, uses a different technique
● Much faster and a bit more accurate

Discussion
Top-down induction of decision trees: the most extensively studied method of machine learning used in data mining
Different criteria for attribute/test selection rarely make much difference
Different pruning methods mainly change the size of the resulting pruned tree
C4.5 builds univariate decision trees
● One variable is tested per node in the decision tree
Some systems can build multivariate trees (e.g. CART)