
1 Today’s Topics
– HW1 due 11:55 pm today (no later than next Tuesday)
– HW2 out, due in two weeks
– Next week we’ll discuss the make-up midterm
– Be sure to check your @wisc.edu email! Forward it to your work email?
– When is 100 < 99? (unrelated to AI)
– Unstable algorithms (mentioned on a slide last week)
– D-tree wrapup
– What ‘space’ does ID3 search? (transition to a new AI topic: SEARCH)
9/29/15, CS 540 - Fall 2015 (Shavlik©), Lecture 8, Week 4

2 Unstable Algorithms
– An idea from the stats community
– An ML algorithm is unstable if small changes to the training set can lead to large changes in the learned model
– D-trees are unstable, since one different example can change the root
– k-NN is stable, since the impact of each example is local
– Ensembles work best with unstable algorithms
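Below is a minimal sketch (my own illustration, not part of the lecture, using scikit-learn and a synthetic dataset) of one way to see this empirically: retrain each learner on training sets that differ by a single example and count how many test-set predictions flip. The tree’s predictions generally flip more often than k-NN’s.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, y_train, X_test = X[:150], y[:150], X[150:]

def prediction_flip_rate(make_model, n_trials=20):
    # Baseline predictions from the full training set
    base = make_model().fit(X_train, y_train).predict(X_test)
    rng = np.random.default_rng(0)
    flips = []
    for _ in range(n_trials):
        # "Small change to the trainset": drop one random training example
        drop = rng.integers(len(X_train))
        keep = np.arange(len(X_train)) != drop
        pred = make_model().fit(X_train[keep], y_train[keep]).predict(X_test)
        flips.append(np.mean(pred != base))   # fraction of test predictions that changed
    return np.mean(flips)

print("d-tree flip rate:", prediction_flip_rate(DecisionTreeClassifier))
print("k-NN   flip rate:", prediction_flip_rate(lambda: KNeighborsClassifier(n_neighbors=5)))
```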

3 ID3 Recap: Questions Addressed
How closely should we fit the training data?
– Completely, then prune
– Use tuning sets to score candidates
– Learn forests, and there is no need to prune! Why?
How do we judge features?
– Use info theory (Shannon); see the sketch after this slide
What if a feature has many values?
– Convert it to Boolean-valued features
D-trees can also handle missing feature values (but we won’t cover this for d-trees)
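As a refresher on the information-theoretic scoring, here is a minimal sketch of the Shannon entropy and information-gain calculation ID3 uses to judge features (assuming Boolean features and a toy dataset of my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H(S) = -sum_c p(c) * log2 p(c)
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

def info_gain(examples, labels, feature_index):
    # Gain(S, F) = H(S) - sum_v |S_v|/|S| * H(S_v)
    n = len(labels)
    remainder = 0.0
    for value in set(ex[feature_index] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[feature_index] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Toy usage: feature 0 separates the classes perfectly, feature 1 does not.
X = [(True, True), (True, False), (False, True), (False, False)]
y = ['+', '+', '-', '-']
print(info_gain(X, y, 0), info_gain(X, y, 1))   # ~1.0 vs 0.0
```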

4 ID3 Recap (cont.)
What if some features cost more to evaluate (eg, CAT scan vs temperature)?
– Use an ad-hoc correction factor
Best way to use d-trees in an ensemble?
– Random forests often perform quite well
Batch vs. incremental (aka online) learning?
– ID3 is basically a ‘batch’ approach
– Incremental variants exist, but since ID3 is so fast, why not simply rerun it ‘from scratch’ whenever a mistake is made? (A sketch of that idea follows.)
[Slide figure, captioned “Looks like a d-tree!”]
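A minimal sketch (my own illustration, using scikit-learn’s DecisionTreeClassifier as a stand-in for ID3) of that ‘rerun from scratch whenever a mistake is made’ idea:

```python
from sklearn.tree import DecisionTreeClassifier

class RetrainOnMistake:
    def __init__(self):
        self.X, self.y = [], []
        self.model = None

    def predict(self, x):
        # Before any tree has been built, just guess a default class.
        if self.model is None:
            return 0
        return self.model.predict([x])[0]

    def observe(self, x, label):
        # Online loop: predict, remember the example, and rebuild the tree only on a mistake.
        wrong = self.predict(x) != label
        self.X.append(x)
        self.y.append(label)
        if wrong:
            self.model = DecisionTreeClassifier().fit(self.X, self.y)
        return wrong

# Toy usage on a stream of (features, label) pairs
learner = RetrainOnMistake()
for x, label in [([0, 0], 0), ([1, 1], 1), ([1, 0], 1), ([0, 1], 0)]:
    print(x, "mistake" if learner.observe(x, label) else "correct")
```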

5 ID3 Recap (cont.)
What about real-valued outputs?
– Could learn a linear approximation for various regions of the feature space, eg a tree that splits on f4 and stores a linear model such as f1 + 2·f2 or 3·f1 - f2 at each leaf (sketched below)
How rich is our language for describing examples?
– Limited to fixed-length feature vectors (but they are surprisingly effective)
[Slide figures: a small tree with linear models at its leaves, and a Venn diagram]
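A minimal sketch (my own illustration) of ‘a linear approximation for various regions of the feature space’: one split on a feature, with a separate linear model fit in each region. Real model-tree learners choose the splits greedily; the coefficients below just echo the ones drawn on the slide.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
# Piecewise-linear target: a different linear function on each side of x0 = 0
y = np.where(X[:, 0] < 0, X[:, 0] + 2 * X[:, 1], 3 * X[:, 0] - X[:, 1])

left, right = X[:, 0] < 0, X[:, 0] >= 0
model_left = LinearRegression().fit(X[left], y[left])
model_right = LinearRegression().fit(X[right], y[right])

def predict(x):
    # One split (x0 < 0 ?), with a linear model at each 'leaf'
    m = model_left if x[0] < 0 else model_right
    return m.predict([x])[0]

print(predict([-0.5, 0.2]), predict([0.5, 0.2]))
```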

6 Summary of ID3
Strengths
– Good technique for learning models from ‘real world’ (eg, noisy) data
– Fast, simple, and robust
– Potentially considers the complete hypothesis space
– Successfully applied to many real-world tasks
– Results (trees or rules) are human-comprehensible
– One of the most widely used techniques in data mining

7 Summary of ID3 (cont.)
Weaknesses
– Requires fixed-length feature vectors
– Only makes axis-parallel (univariate) splits
– Not designed to make probabilistic predictions
– Non-incremental
– Hill-climbing algorithm (poor early decisions can be disastrous)
However, extensions exist

8 A Sample Search Tree
(So we can use another search method besides hill climbing, the ‘greedy’ algorithm.)
– Nodes are PARTIALLY COMPLETE D-TREES
– Expand the ‘left-most’ (shown in yellow on the slide) question mark (?) of the current node
– All possible trees can be generated (given the thresholds ‘implied’ by the real values in the train set)
[Slide figure: the start node is a lone ‘?’; operators ‘Add F1’ … ‘Add FN’ and ‘Create leaf node (+ / -)’ produce its children; assuming F2 scores best, that child is expanded next]
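A minimal sketch (my own illustration, assuming Boolean features and representing a partial tree as either '?', a class leaf, or a (feature, left_subtree, right_subtree) tuple) of this slide’s operator: generate all successors of a node by filling its left-most ‘?’ with a leaf or a new feature test.

```python
FEATURES = ['F1', 'F2', 'F3']   # hypothetical Boolean features
LEAVES = ['+', '-']

def successors(tree):
    """All trees obtained by filling the left-most '?' in `tree` (the search operator)."""
    if tree == '?':
        # Either turn the open spot into a leaf, or add a feature test with two new '?'s
        return LEAVES + [(f, '?', '?') for f in FEATURES]
    if isinstance(tree, tuple):
        feature, left, right = tree
        left_succ = successors(left)
        if left_succ:                                  # found a '?' in the left subtree
            return [(feature, s, right) for s in left_succ]
        return [(feature, left, s) for s in successors(right)]
    return []                                          # a finished leaf: nothing to expand

print(successors('?'))                # children of the start node (a lone '?')
print(successors(('F2', '?', '?')))   # children after assuming F2 scored best at the root
```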

9 Viewing ID3 as a Search Algorithm
– Search Space
– Operators
– Search Strategy
– Heuristic Function
– Start Node
– Goal Node

10 Viewing ID3 as a Search Algorithm
– Search Space: the space of all decision trees constructible using the current feature set
– Operators: add a node (ie, grow the tree)
– Search Strategy: hill climbing
– Heuristic Function: information gain (other d-tree algos use similar ‘purity measures’)
– Start Node: an isolated leaf node marked ‘?’
– Goal Node: a tree that separates all the training data (‘post-pruning’ may be done later to reduce overfitting)
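The rows above instantiate a generic hill-climbing loop; here is a minimal sketch of that loop (my own illustration, with a toy numeric usage rather than d-trees). For ID3, the start node is the lone ‘?’ leaf, the successor function grows the tree by one node, the heuristic is information gain of the chosen split, and the goal test checks whether the tree separates the training data.

```python
def hill_climb(start, successors, heuristic, is_goal):
    node = start
    while not is_goal(node):
        candidates = successors(node)          # apply the operators
        if not candidates:
            break                              # nothing left to expand
        node = max(candidates, key=heuristic)  # greedily keep only the best child
    return node

# Toy usage: climb toward x = 3 by moving +/- 1 at each step.
print(hill_climb(0,
                 lambda x: [x - 1, x + 1],
                 lambda x: -(x - 3) ** 2,
                 lambda x: x == 3))
```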

11 What We’ve Covered So Far
– Supervised ML algorithms
  – Instance-based (kNN)
  – Logic-based (ID3, decision stumps)
  – Ensembles (random forests, bagging, boosting)
– Train/tune/test sets, N-fold cross validation
– Feature space, (greedily) searching hypothesis spaces
– Parameter tuning (‘model selection’), feature selection (info gain)
– Dealing w/ real-valued and hierarchical features
– Overfitting reduction, Occam’s razor
– Fixed-length feature vectors, graph/logic-based reps of examples
– Understandability of learned models, “generalizing not memorizing”
– Briefly: missing feature values, stability (to small changes in training sets)
(The slide groups these under two column headings: ‘Algo’s’ and ‘Methodology Issues’.)

