1
Issues in Decision-Tree Learning: Avoiding Overfitting through Pruning
Decision Trees: Definition, Mechanism, Splitting Functions
Issues in Decision-Tree Learning: Avoiding overfitting through pruning; Numeric and missing attributes
2
Example of a Decision Tree
Example: learning to classify stars. [Decision-tree figure: the root tests Luminosity (<= T1 vs. > T1); one branch tests Mass (<= T2 vs. > T2); the three leaves are Type A, Type B, and Type C.]
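A minimal sketch of how such a tree classifies a star, written as nested conditionals in Python; the threshold values and the assignment of classes to branches are assumptions for illustration, since the slide only gives the split structure.

    # Hypothetical thresholds; the slide only gives the split structure.
    T1 = 10.0   # luminosity cut point (assumed)
    T2 = 2.5    # mass cut point (assumed)

    def classify_star(luminosity: float, mass: float) -> str:
        """Classify a star by walking the decision tree from the root."""
        if luminosity <= T1:      # root test on luminosity
            return "Type C"       # assumed leaf for the <= T1 branch
        if mass <= T2:            # second test on mass for luminous stars
            return "Type B"
        return "Type A"

    print(classify_star(luminosity=15.0, mass=1.8))  # -> "Type B" under these assumptions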
3
Short vs Long Hypotheses
The top-down, greedy approach we described for constructing decision trees embodies a preference for short hypotheses over long hypotheses. Why is this the right thing to do? Occam's Razor: prefer the simplest hypothesis that fits the data. The principle dates back to William of Occam (c. 1320) and has been the subject of a great debate in the philosophy of science.
4
Issues in Decision Tree Learning
Practical issues in building a decision tree include the following: How deep should the tree be? How do we handle continuous attributes? What is a good splitting function? What happens when attribute values are missing? How do we improve computational efficiency?
5
How deep should the tree be? Overfitting the Data
A tree overfits the data if we let it grow deep enough that it begins to capture "aberrations" in the data, which harm its predictive power on unseen examples. [Figure: examples plotted over the attributes humidity and size with cut points t2 and t3; a few examples, possibly just noise, force the tree to grow larger to capture them.]
6
Overfitting the Data: Definition
Assume a hypothesis space H. We say a hypothesis h in H overfits a dataset D if there is another hypothesis h' in H such that h has better classification accuracy than h' on the training data D but worse classification accuracy than h' on unseen (test) data. [Figure: accuracy vs. size of the tree; accuracy on the training data keeps increasing, while accuracy on the testing data eventually drops, marking the onset of overfitting.]
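As a hedged illustration of this definition (not part of the original slides), the following scikit-learn sketch grows trees of increasing depth on a noisy synthetic dataset: training accuracy keeps climbing while held-out accuracy eventually stops improving or drops. The dataset and parameters are arbitrary.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic, noisy dataset (arbitrary choice for illustration).
    X, y = make_classification(n_samples=600, n_features=10, flip_y=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

    for depth in (1, 2, 4, 8, 16, None):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
        print(f"depth={depth}: train acc={tree.score(X_tr, y_tr):.2f}, "
              f"test acc={tree.score(X_te, y_te):.2f}")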
7
Causes of Overfitting the Data
What causes a hypothesis to overfit the data? 1) Random errors or noise: examples have incorrect class labels or incorrect attribute values. 2) Coincidental patterns: by chance, examples seem to deviate from a pattern because the sample is small. Overfitting is a serious problem that can cause severe performance degradation.
8
Solutions for Overfitting the Data
There are two main classes of solutions: 1) Stop growing the tree early, before it begins to overfit the data. In practice this is hard to implement because it is not clear what a good stopping point is. 2) Grow the tree until the algorithm stops, even if overfitting shows up, and then prune the tree as a post-processing step. This method has found great popularity in the machine learning community.
9
Decision Tree Pruning
1) Grow the tree to fit the training data. 2) Prune the tree to avoid overfitting the data.
10
Methods to Validate the New Tree
Training and Validation Set Approach: 1) Divide dataset D into a training set TR and a validation set TE. 2) Build a decision tree on TR. 3) Test pruned trees on TE to decide on the best final tree. [Figure: dataset D split into training set TR and validation set TE.]
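A sketch of this workflow with scikit-learn. Note that scikit-learn does not implement reduced-error pruning; its cost-complexity pruning (the ccp_alpha parameter) is used here only as a stand-in for "build on TR, then pick the pruned tree that scores best on TE". Dataset choice is arbitrary.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)

    # Candidate amounts of pruning for a tree grown on TR.
    alphas = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

    # Pick the pruned tree with the best accuracy on the validation set TE.
    best = max(
        (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr) for a in alphas),
        key=lambda t: t.score(X_te, y_te),
    )
    print("validation accuracy of chosen tree:", best.score(X_te, y_te))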
11
Training and Validation
Dataset D is divided into a training set TR (normally 2/3 of D) and a validation set TE (normally 1/3 of D). There are two pruning approaches: Reduced Error Pruning and Rule Post-Pruning.
12
Reduced Error Pruning
Main idea: 1) Consider all internal nodes in the tree. 2) For each node, check whether removing it (along with the subtree below it) and assigning it the most common class harms accuracy on the validation set. 3) Pick the node n* whose removal yields the best validation performance and prune its subtree. 4) Go back to step 2 until no more improvements are possible.
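A compact sketch of this loop over a hand-rolled tree representation; the dict-based node layout ("attr", "branches", "majority") and the stopping rule (keep pruning as long as validation accuracy does not drop) are assumptions, not something given on the slide.

    import copy

    # A node is either a class label (leaf) or a dict of the form
    #   {"attr": attribute_index, "branches": {value: subtree}, "majority": most common class at the node}

    def predict(node, x):
        while isinstance(node, dict):
            node = node["branches"][x[node["attr"]]]
        return node

    def accuracy(tree, data):
        return sum(predict(tree, x) == y for x, y in data) / len(data)

    def internal_nodes(node, path=()):
        """Yield the branch-value path leading to every internal (dict) node."""
        if isinstance(node, dict):
            yield path
            for value, child in node["branches"].items():
                yield from internal_nodes(child, path + (value,))

    def pruned_copy(tree, path):
        """Copy the tree, replacing the node at `path` by its most common class."""
        new = copy.deepcopy(tree)
        if not path:                       # pruning the root collapses the whole tree
            return new["majority"]
        node = new
        for value in path[:-1]:
            node = node["branches"][value]
        node["branches"][path[-1]] = node["branches"][path[-1]]["majority"]
        return new

    def reduced_error_pruning(tree, validation):
        """Repeatedly prune the subtree whose removal best preserves validation accuracy."""
        current_acc = accuracy(tree, validation)
        while isinstance(tree, dict):
            candidates = [pruned_copy(tree, p) for p in internal_nodes(tree)]
            best = max(candidates, key=lambda t: accuracy(t, validation))
            if accuracy(best, validation) < current_acc:   # pruning would hurt: stop
                return tree
            tree, current_acc = best, accuracy(best, validation)
        return tree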
13
Possible trees after pruning:
Example. [Figure: the original tree and the possible trees obtained after pruning one internal node.]
14
Possible trees after 2nd pruning:
Example. [Figure: the pruned tree and the possible trees obtained after a second round of pruning.]
15
Process continues until no improvement is observed
Example. The process continues until no improvement is observed on the validation set; at that point we stop pruning the tree. [Figure: accuracy on the validation data vs. size of the tree.]
16
Reduced Error Pruning: Disadvantages
Disadvantage: if the original dataset is small, setting examples aside for validation may leave too few examples for training. [Figure: splitting a small dataset D leaves both the training set TR and the validation set TE too small.]
17
Rule Post-Pruning
Main idea: 1) Convert the tree into a set of rules (one rule per path from the root to a leaf). 2) Prune every single rule by removing redundant conditions. 3) Sort the rules by accuracy.
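A rough sketch of these three steps, assuming a rule is a (conditions, class) pair where each condition is an (attribute_index, required_value) test; the greedy criterion for dropping a condition (keep the drop if the rule's validation accuracy does not fall) is an assumption in the spirit of the slide.

    def rule_accuracy(conditions, label, data):
        """Accuracy of a single rule on the validation examples it covers."""
        covered = [(x, y) for x, y in data if all(x[a] == v for a, v in conditions)]
        if not covered:
            return 0.0
        return sum(y == label for _, y in covered) / len(covered)

    def prune_rule(conditions, label, validation):
        """Greedily drop conditions as long as the rule's validation accuracy does not decrease."""
        conditions = list(conditions)
        improved = True
        while improved and conditions:
            improved = False
            base = rule_accuracy(conditions, label, validation)
            for i in range(len(conditions)):
                trial = conditions[:i] + conditions[i + 1:]
                if rule_accuracy(trial, label, validation) >= base:
                    conditions, improved = trial, True
                    break
        return conditions

    def rule_post_prune(rules, validation):
        """1) rules come from the tree, 2) prune each rule, 3) sort by validation accuracy."""
        pruned = [(prune_rule(c, lbl, validation), lbl) for c, lbl in rules]
        return sorted(pruned, key=lambda r: rule_accuracy(r[0], r[1], validation), reverse=True)

    # Example rules read off a tree over boolean attributes x1, x2, x3 (indices 0, 1, 2).
    rules = [([(0, 0), (1, 0)], "A"), ([(0, 0), (1, 1)], "B"),
             ([(0, 1), (2, 0)], "A"), ([(0, 1), (2, 1)], "C")]
    validation = [((0, 0, 1), "A"), ((0, 1, 0), "B"), ((1, 0, 0), "A"), ((1, 1, 1), "C")]
    print(rule_post_prune(rules, validation))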
18
Possible rules after pruning (based on validation set):
Example. [Figure: original tree with root test x1; the x1 = 0 branch tests x2 (leaves for classes A and B), and the x1 = 1 branch tests x3 (leaves for classes A and C).] Rules read off the tree: ~x1 & ~x2 -> Class A; ~x1 & x2 -> Class B; x1 & ~x3 -> Class A; x1 & x3 -> Class C. Possible rules after pruning (based on the validation set): the first and third rules each drop one of their two conditions, leaving a single test that predicts Class A, while ~x1 & x2 -> Class B and x1 & x3 -> Class C are kept.
19
Advantages of Rule Post-Pruning
The rule language is more expressive. It improves interpretability. Pruning is more flexible, since each rule can be pruned independently of the others. In practice this method yields high predictive accuracy.
20
Issues in Decision-Tree Learning: Avoiding Overfitting through Pruning
Decision Trees: Definition, Mechanism, Splitting Functions
Issues in Decision-Tree Learning: Avoiding overfitting through pruning; Numeric and missing attributes
21
Discretizing Continuous Attributes
Example: attribute temperature. 1) Sort all values of the attribute in the training set. 2) Consider only those cut points where the class changes. 3) Choose the cut point that maximizes information gain. [Figure: sorted temperature values with candidate cut points marked.]
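A small sketch of this procedure for a single continuous attribute, using entropy-based information gain; the function names and the toy temperature data are made up for illustration.

    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0

    def best_cut_point(values, labels):
        """Return the threshold on a continuous attribute that maximizes information gain."""
        pairs = sorted(zip(values, labels))
        base = entropy(labels)
        best_gain, best_t = -1.0, None
        for i in range(1, len(pairs)):
            if pairs[i - 1][1] == pairs[i][1]:       # only consider boundaries where the class changes
                continue
            t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint as the candidate cut
            left = [y for v, y in pairs if v <= t]
            right = [y for v, y in pairs if v > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
            if gain > best_gain:
                best_gain, best_t = gain, t
        return best_t, best_gain

    temperature = [40, 48, 60, 72, 80, 90]           # toy data for illustration
    play =        ["No", "No", "Yes", "Yes", "Yes", "No"]
    print(best_cut_point(temperature, play))          # -> cut at 54.0 on these toy data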
22
Claude Shannon (1916–2001). Founded information theory in 1948 with his paper "A Mathematical Theory of Communication." Awarded the Alfred Noble Prize of the American Institute of Electrical Engineers for his master's thesis. Worked at MIT and Bell Labs. Met Alan Turing, Marvin Minsky, John von Neumann, and Albert Einstein. Creator of the "Ultimate Machine."
23
Missing Attribute Values
Example: X = (luminosity > T1, mass = ?). Suppose we are at a node n in the decision tree that tests the missing attribute. Different approaches: 1) Assign the most common value of the attribute among the examples at node n. 2) Assign the most common value among the examples at n that have the same classification as X. 3) Assign a probability to each value of the attribute based on the frequencies of those values at node n, and propagate the corresponding fraction of the example down each branch.
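A sketch of the third approach (fractional propagation), extending the dict-based node layout used in the pruning sketch above with per-branch training counts stored at each node; that bookkeeping is an assumption about how the fractions are combined.

    from collections import Counter, defaultdict

    def class_distribution(node, x):
        """Return P(class) for example x, splitting fractional weight over branches
        whenever the tested attribute value is missing (None)."""
        if not isinstance(node, dict):            # leaf: all weight goes to its class
            return {node: 1.0}
        value = x[node["attr"]]
        if value is not None:                     # attribute observed: follow one branch
            return class_distribution(node["branches"][value], x)
        dist = defaultdict(float)                 # attribute missing: weight branches by
        counts = node["counts"]                   # their training frequency at this node
        total = sum(counts.values())
        for v, child in node["branches"].items():
            for label, p in class_distribution(child, x).items():
                dist[label] += (counts[v] / total) * p
        return dict(dist)

    # Tiny tree: root tests attribute 0 (luminosity > T1?); the True branch tests attribute 1 (mass).
    tree = {"attr": 0, "counts": Counter({True: 6, False: 4}),
            "branches": {False: "Type C",
                         True: {"attr": 1, "counts": Counter({"low": 2, "high": 4}),
                                "branches": {"low": "Type B", "high": "Type A"}}}}
    print(class_distribution(tree, (True, None)))  # mass missing -> {'Type B': 1/3, 'Type A': 2/3}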
24
Summary
Decision-tree induction is a popular approach to classification that yields an interpretable output hypothesis. The hypothesis space is very powerful: all possible DNF formulas. We prefer shorter trees over larger ones. Overfitting is an important issue in decision-tree induction; methods such as reduced-error pruning and rule post-pruning exist to avoid it. Techniques also exist to deal with continuous attributes and missing attribute values.