CS Fall 2016 (© Jude Shavlik), Lecture 6, Week 4

1 CS 540 - Fall 2016 (© Jude Shavlik), Lecture 6, Week 4
CS 540 Fall 2015 (Shavlik) 11/16/2018
Today’s Topics
- HW1 due 11:55pm
- HW2 out (due in 16 days)
- Dealing with Noise
- Overfitting (the key issue in all of ML)
- A ‘Greedy’ Algorithm for Pruning D-Trees
- Generating IF-THEN Rules from D-Trees
- Rule Pruning
9/27/15 CS Fall 2016 (© Jude Shavlik), Lecture 6, Week 4

2 Don’t Share Variables in Recursive Methods!
Recall that in ID3 we compute newFeatures = oldFeatures – chosenFeature
- Do NOT use newFeatures = oldFeatures.remove(chosenFeature), since that changes oldFeatures itself
- Instead make a FRESH COPY of oldFeatures
- Remember: once the ‘left recursion’ is done, you still need to do the ‘right recursion’
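
The pitfall above can be illustrated with a minimal Python sketch. The two functions below are hypothetical stand-ins for the recursive step of ID3 (they always "choose" the first feature and recurse into a left and a right branch); they are not the course's code. Each records which features remain visible to every branch.

```python
def visit_shared(features, seen):
    """BUGGY: every recursive call mutates the one shared list."""
    if not features:
        return
    chosen = features[0]
    features.remove(chosen)  # in-place removal: the caller's list changes too
    for branch in ("left", "right"):
        seen.append((branch, chosen, list(features)))
        visit_shared(features, seen)


def visit_copied(features, seen):
    """Each call builds a fresh list, so sibling branches see the same features."""
    if not features:
        return
    chosen = features[0]
    remaining = [f for f in features if f != chosen]  # fresh copy
    for branch in ("left", "right"):
        seen.append((branch, chosen, list(remaining)))
        visit_copied(remaining, seen)
```

With the features ["color", "size"], the shared version reaches the right branch of "color" with an empty feature list, because the left recursion already removed "size". The copying version still offers "size" to the right branch and leaves the caller's list untouched.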

3 Noise: Major Issue in ML
Worst Case of Noise
- + and - at the same point in feature space
Causes of Noise
- Too few features (“hidden variables”) or too few possible values
- Incorrectly reported/measured/judged feature values
- Mis-classified instances

4 Noise - Major Issue in ML (cont.)
Overfitting: producing an ‘awkward’ concept because of a few ‘noisy’ points
[Figure: + and - examples in feature space, annotated “Bad performance on future ex’s?” and “Better performance?”]

5 Overfitting Viewed in Terms of Function-Fitting
(can exactly fit N points with an N - 1 degree polynomial)
[Figure: f(x) plotted against x, annotated “Overfitting?”, “Underfitting?”, and “Model”; Data = Red Line + Some Noise]
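
The claim that N points can be fit exactly by a polynomial of degree N - 1 can be checked with Lagrange interpolation. The sketch below is illustrative only: the function name and the four data points (the line y = x with noise of +1 and -1 at two points) are assumptions, not taken from the slide.

```python
from fractions import Fraction


def lagrange_fit(points):
    """Return the polynomial of degree <= N-1 through N points with distinct x values."""
    def poly(x):
        x = Fraction(x)
        total = Fraction(0)
        for i, (xi, yi) in enumerate(points):
            term = Fraction(yi)
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - Fraction(xj)) / (Fraction(xi) - Fraction(xj))
            total += term
        return total
    return poly


# Assumed data: the true function is y = x; two of the four points are noisy.
NOISY_POINTS = [(0, 0), (1, 2), (2, 1), (3, 3)]
cubic = lagrange_fit(NOISY_POINTS)
```

The cubic reproduces all four training points exactly, yet at x = 4 it returns 14 while the underlying line gives 4. Fitting the noise perfectly costs accuracy on unseen inputs.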

6 Definition of Overfitting
Assuming a test set large enough to be representative, concept C overfits the training data if there exists a simpler concept S such that
- Training set accuracy of C > Training set accuracy of S
but
- Test set accuracy of C < Test set accuracy of S

7 Remember!
- It is easy to learn/fit the training data
- What’s hard is generalizing well to future (‘test set’) data!
- Overfitting avoidance (reduction, really) is the key issue in ML
- Easy to think ‘spurious correlations’ are meaningful signals

8 See a Pattern?
- The first 10 digits of Pi: [digits not preserved in this transcript]
- What comes next in Pi? 3 (already used)
- After that? 5
- “35” rounds to “4” (in fractional part of number)
- “4” has since been added!
- Picture taken (by me) June 2015 in Lambeau Field Atrium, Green Bay, WI
- Presumably a ‘spurious correlation’

9 Can One Underfit?
- Sure, if not fully fitting the training set
  Eg, just return the majority category (+ or -) in the trainset as the learned model
- But also if there is not enough data to illustrate important distinctions
  Eg, color may be important, but all examples seen are red, so there is no reason to include color and make a more complex model

10 Overfitting + Noise
Using the strict definition of overfitting presented earlier, is it possible to overfit noise-free data?
(Remember: overfitting is the key ML issue, not just a decision-tree topic)

11 Example of Overfitting Noise-Free Data
Let
- Correct concept = A ∧ B
- Feature C be true 50% of the time, for both + and – examples
- Prob(pos example) = 0.66
Training set
  +: A B C D E,  A B C ¬D E,  A B C D ¬E
  -: A ¬B ¬C D ¬E,  ¬A B ¬C ¬D E

12 Example (concluded)
Tree                          Trainset Accuracy   TestSet Accuracy
Full (C: T → +, F → -)        100%                50%
Pruned (single leaf: +)       60%                 66%
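
The accuracies in this example can be recomputed with a short Python sketch. The training set and the two probabilities come from the example; the function names are hypothetical, and the test-set figures are expected accuracies under the stated assumption that C is true half the time independently of the class.

```python
# Training set from the example: (feature values, label); 1 = true, 0 = false
TRAIN = [
    ({"A": 1, "B": 1, "C": 1, "D": 1, "E": 1}, "+"),
    ({"A": 1, "B": 1, "C": 1, "D": 0, "E": 1}, "+"),
    ({"A": 1, "B": 1, "C": 1, "D": 1, "E": 0}, "+"),
    ({"A": 1, "B": 0, "C": 0, "D": 1, "E": 0}, "-"),
    ({"A": 0, "B": 1, "C": 0, "D": 0, "E": 1}, "-"),
]
P_POS = 0.66  # Prob(pos example)
P_C = 0.5     # C is true half the time, for both classes


def full_tree(example):
    """Tree that splits on C: true -> +, false -> -."""
    return "+" if example["C"] else "-"


def pruned_tree(example):
    """Pruned tree: a single leaf labeled with the majority class."""
    return "+"


def train_accuracy(model):
    return sum(model(x) == y for x, y in TRAIN) / len(TRAIN)


def expected_test_accuracy(model):
    """Expected accuracy when the class and C are drawn independently."""
    total = 0.0
    for label, p_label in (("+", P_POS), ("-", 1 - P_POS)):
        for c, p_c in ((1, P_C), (0, 1 - P_C)):
            if model({"C": c}) == label:
                total += p_label * p_c
    return total


def overfits(c_train, c_test, s_train, s_test):
    """Definition of overfitting: C beats S on train but loses on test."""
    return c_train > s_train and c_test < s_test
```

The full tree scores 100% on the training set but only about 50% in expectation on new data, while the pruned tree scores 60% and about 66%. The full tree therefore overfits by the definition given earlier, even though no training label is noisy.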

13 ID3 & Noisy Data
- To avoid overfitting, could allow splitting to stop before all ex’s are of one class
  - Early stopping was Quinlan’s original idea
  - Stop if further splitting is not justified by a statistical test (just skim the text’s material on the χ² test)
- But post-pruning is now seen as better
  - More robust to weaknesses of greedy algo’s (eg, post-pruning benefits from seeing the full tree; a node may look bad when building the tree, but not in hindsight)

14 ID3 & Noisy Data (cont.)
Recap: Build the complete tree, then use some ‘spare’ (tuning) examples to decide which parts of the tree can be pruned. This is called Reduced [tuneset] Error Pruning.

15 ID3 & Noisy Data (cont.)
[Figure: decision tree with candidate subtrees marked “discard?” and the question “Better tuneset accuracy?”]
- See which dropped subtree leads to the highest tune-set accuracy
- Repeat (ie, another greedy algo)

16 Greedily Pruning D-Trees
Sample (Hill Climbing) Search Space
[Figure: search space of pruned trees labeled with tuning-set accuracies; values shown: 68%, 63%, 87%, 89% (marked best), 79%, 63%, 76%]
- Stop here since the node’s best child is not an improvement
- Note that in pruning we’re reversing the greedy tree-building process

17 Greedily Pruning D-trees - Pseudocode
1. Run ID3 to fully fit the TRAIN’ set; measure accuracy on TUNE
2. Consider all subtrees where ONE interior node is removed and replaced by a leaf
   - label the leaf with the majority category in the pruned subtree
3. IF progress on TUNE, choose the best subtree; ELSE (ie, if no improvement) quit
4. Go to 2
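
The pseudocode above might be sketched in Python as follows. The Node class, its field names, and the helper functions are assumptions made for illustration; the tree is taken as already built by ID3, with each node storing the majority training label of the examples that reached it. "Progress" is read here as a strict improvement in tuning-set accuracy.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    majority: str                  # majority TRAIN' label among examples reaching this node
    feature: Optional[str] = None  # None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def predict(node, example):
    while node.feature is not None:
        child = node.children.get(example.get(node.feature))
        if child is None:          # unseen feature value: fall back to the majority label
            return node.majority
        node = child
    return node.majority


def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)


def interior_nodes(node):
    if node.feature is None:
        return []
    found = [node]
    for child in node.children.values():
        found.extend(interior_nodes(child))
    return found


def greedy_prune(tree, tune):
    """Prune `tree` in place; return (tree, tuning-set accuracy)."""
    best_acc = accuracy(tree, tune)                  # step 1: score the full tree
    while True:
        best_node = None
        for node in interior_nodes(tree):            # step 2: try every single cut
            saved = (node.feature, node.children)
            node.feature, node.children = None, {}   # node now acts as a majority leaf
            acc = accuracy(tree, tune)
            node.feature, node.children = saved      # undo the trial cut
            if acc > best_acc:
                best_acc, best_node = acc, node
        if best_node is None:                        # step 3: no improvement, quit
            return tree, best_acc
        best_node.feature, best_node.children = None, {}  # keep best cut, then repeat
```

Each pass evaluates every single-node cut on the tuning set and commits only the best one, so the search is hill climbing: a committed cut is never revisited.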

18 Train/Tune/Test Accuracies
(same sort of curves for other tuned param’s in other algo’s)
[Figure: Accuracy (up to 100%) vs. Amount of Pruning, with Train, Tune, and Test curves; marked points: “Ideal tree to choose” and “Chosen pruned tree”]

19 The General Tradeoff in Greedy Algorithms (more later)
Efficiency vs. Optimality
[Figure: Initial Tree with root R and nodes A, B, C, D, E, F]
Assume
- True Best Cuts: discard C’s & F’s subtrees
- Single Best Cut: discard B’s subtrees (irrevocable)
Greedy Search: Powerful, General-Purpose, Trick-of-the-Trade

20 Generating IF-THEN Rules from Trees
- Antecedent: conjunction of all decisions leading to a terminal node
- Consequent: label of the terminal node
COLOR ?
  Green → -
  Blue → +
  Red → SIZE ?
    Big → +
    Small → -

21 Generating Rules (cont)
Previous slide’s tree generates these rules
- If Color=Green → Output = -
- If Color=Blue → Output = +
- If Color=Red and Size=Big → +
- If Color=Red and Size=Small → -
Note
1. Can ‘clean up’ the rule set (next slide)
2. Decision trees learn disjunctive concepts

22 Rule Post-Pruning (Another Greedy Algorithm)
1. Induce a decision tree
2. Convert to rules (see earlier slide)
3. Consider dropping any one rule antecedent (ie, ‘precondition’)
4. Delete the one that improves tuning set accuracy the most
5. Repeat as long as progress is being made
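
A minimal Python sketch of steps 3 to 5 follows. Several choices here are assumptions rather than part of the slide: rules are (antecedent, label) pairs, the first matching rule wins when rules overlap, a default label is used when no rule fires, and "progress" means a strict gain in tuning-set accuracy.

```python
def classify(rules, example, default):
    """Assumed conflict resolution: the first rule whose tests all hold wins."""
    for antecedent, label in rules:
        if all(example.get(feature) == value for feature, value in antecedent):
            return label
    return default


def accuracy(rules, data, default):
    return sum(classify(rules, x, default) == y for x, y in data) / len(data)


def post_prune(rules, tune, default):
    """Greedily drop single antecedents while tuning-set accuracy improves."""
    rules = [(list(antecedent), label) for antecedent, label in rules]  # work on a copy
    best_acc = accuracy(rules, tune, default)
    while True:
        best_rules = None
        for i, (antecedent, label) in enumerate(rules):
            for j in range(len(antecedent)):
                shorter = antecedent[:j] + antecedent[j + 1:]
                trial = rules[:i] + [(shorter, label)] + rules[i + 1:]
                acc = accuracy(trial, tune, default)
                if acc > best_acc:
                    best_acc, best_rules = acc, trial
        if best_rules is None:      # no single deletion helps: stop
            return rules, best_acc
        rules = best_rules


# Rules from the Color/Size tree, plus an assumed tuning set with an unseen color.
RULES = [
    ([("Color", "Green")], "-"),
    ([("Color", "Blue")], "+"),
    ([("Color", "Red"), ("Size", "Big")], "+"),
    ([("Color", "Red"), ("Size", "Small")], "-"),
]
TUNE = [
    ({"Color": "Green", "Size": "Big"}, "-"),
    ({"Color": "Blue", "Size": "Big"}, "+"),
    ({"Color": "Red", "Size": "Big"}, "+"),
    ({"Color": "Red", "Size": "Small"}, "-"),
    ({"Color": "Yellow", "Size": "Big"}, "+"),
    ({"Color": "Yellow", "Size": "Small"}, "-"),
]
```

On this made-up tuning set, post_prune(RULES, TUNE, default="-") drops Color=Red from the third rule, leaving "If Size=Big then +", while the fourth rule keeps both of its tests. An antecedent is pruned from one rule but retained in another, which pruning the tree itself cannot do.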

23 Rule Post-Pruning (cont)
But note that the final rules might overlap one another, so we need a ‘conflict resolution’ scheme
Advantages
- Allows an intermediate node to be pruned from some rules but retained in others
- Can correct poor early decisions in tree construction
- Final concept more understandable
Also applicable to ML algo’s that directly learn rules (eg, ILP, MLNs)

24 Training with Noisy Data
If we can clean up the training data, should we do so?
- No (assuming one can’t clean up the testing data when the learned concept will be used)
- Better to train with the same type of data as will be experienced when the result of learning is put into use
- Recall the story in which hadBankruptcy was the best indicator of a “good candidate for credit card”!

25 Aside: A Rose by Any Other Name …
Tuning sets are also called
- Pruning sets (in d-tree algorithms)
- Validation sets (in general), but sometimes in the literature (eg, stats community) AI’s test sets are called validation sets (and AI’s tuning sets are called test sets!)
