Machine Learning III Decision Tree Induction

Slides:

Advertisements

Similar presentations

1 Machine Learning: Lecture 3 Decision Tree Learning (Based on Chapter 3 of Mitchell T.., Machine Learning, 1997)

Advertisements

Decision Trees Decision tree representation ID3 learning algorithm

CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.

1er. Escuela Red ProTIC - Tandil, de Abril, Decision Tree Learning 3.1 Introduction –Method for approximation of discrete-valued target functions.

Classification Algorithms

RIPPER Fast Effective Rule Induction

Decision Tree Algorithm (C4.5)

ICS320-Foundations of Adaptive and Learning Systems

Classification Techniques: Decision Tree Learning

Jesse Davis Machine Learning Jesse Davis

Machine Learning II Decision Tree Induction CSE 473.

Decision Trees. DEFINE: Set X of Instances (of n-tuples x = ) –E.g., days decribed by attributes (or features): Sky, Temp, Humidity, Wind, Water, Forecast.

Part 7.3 Decision Trees Decision tree representation ID3 learning algorithm Entropy, information gain Overfitting.

Decision tree LING 572 Fei Xia 1/10/06. Outline Basic concepts Main issues Advanced topics.

Decision Tree Learning Learning Decision Trees (Mitchell 1997, Russell & Norvig 2003) –Decision tree induction is a simple but powerful learning paradigm.

Induction of Decision Trees

Machine Learning CSE 473. © Daniel S. Weld Topics Agency Problem Spaces Search Knowledge Representation Reinforcement Learning InferencePlanning.

Machine Learning Lecture 10 Decision Trees G53MLE Machine Learning Dr Guoping Qiu1.

Decision Tree Learning

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

Decision tree learning

By Wang Rui State Key Lab of CAD&CG

Machine Learning Chapter 3. Decision Tree Learning

Artificial Intelligence 7. Decision trees

Mohammad Ali Keyvanrad

For Wednesday No new reading Homework: –Chapter 18, exercises 3, 4, 7.

Decision tree learning Maria Simi, 2010/2011 Inductive inference with decision trees  Decision Trees is one of the most widely used and practical methods.

Machine Learning Lecture 10 Decision Tree Learning 1.

CpSc 810: Machine Learning Decision Tree Learning.

Learning from Observations Chapter 18 Through

Decision-Tree Induction & Decision-Rule Induction

For Wednesday No reading Homework: –Chapter 18, exercise 6.

CS690L Data Mining: Classification

For Monday No new reading Homework: –Chapter 18, exercises 3 and 4.

CS 8751 ML & KDDDecision Trees1 Decision tree representation ID3 learning algorithm Entropy, Information gain Overfitting.

Machine Learning II Decision Tree Induction CSE 573.

CS 5751 Machine Learning Chapter 3 Decision Tree Learning1 Decision Trees Decision tree representation ID3 learning algorithm Entropy, Information gain.

Decision Trees Binary output – easily extendible to multiple output classes. Takes a set of attributes for a given situation or object and outputs a yes/no.

1 Decision Tree Learning Original slides by Raymond J. Mooney University of Texas at Austin.

Decision Trees, Part 1 Reading: Textbook, Chapter 6.

Decision Tree Learning

Decision Tree Learning Presented by Ping Zhang Nov. 26th, 2007.

Outline Logistics Review Machine Learning –Induction of Decision Trees (7.2) –Version Spaces & Candidate Elimination –PAC Learning Theory (7.1) –Ensembles.

Machine Learning Recitation 8 Oct 21, 2009 Oznur Tastan.

Outline Decision tree representation ID3 learning algorithm Entropy, Information gain Issues in decision tree learning 2.

Learning From Observations Inductive Learning Decision Trees Ensembles.

Review of Decision Tree Learning Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.

CSE573 Autumn /09/98 Machine Learning Administrative –Last topic: Decision Tree Learning Reading: 5.1, 5.4 Last time –finished NLP sample system’s.

CSE573 Autumn /11/98 Machine Learning Administrative –Finish this topic –The rest of the time is yours –Final exam Tuesday, Mar. 17, 2:30-4:20.

Learning from Observations

Machine Learning Inductive Learning and Decision Trees

CS 9633 Machine Learning Decision Tree Learning

Decision Tree Learning

Decision trees (concept learnig)

Machine Learning Lecture 2: Decision Tree Learning.

Introduce to machine learning

Classification Algorithms

Decision Tree Learning

Artificial Intelligence

Knowledge Representation

Data Science Algorithms: The Basic Methods

Introduction to Machine Learning Algorithms in Bioinformatics: Part II

Decision Tree Saed Sayad 9/21/2018.

Machine Learning Chapter 3. Decision Tree Learning

Machine Learning: Lecture 3

Decision Trees Decision tree representation ID3 learning algorithm

Machine Learning Chapter 3. Decision Tree Learning

Decision Trees.

Learning from Observations

A task of induction to find patterns

Presentation transcript:

Machine Learning III Decision Tree Induction CSE 473

Machine Learning Outline Function approximation Bias Supervised learning Classifiers & concept learning Version Spaces (restriction bias) Decision-trees induction (pref bias) Overfitting Ensembles of classifiers © Daniel S. Weld

Two Strategies for ML Restriction bias: use prior knowledge to specify a restricted hypothesis space. Version space algorithm over conjunctions. Preference bias: use a broad hypothesis space, but impose an ordering on the hypotheses. Decision trees. © Daniel S. Weld

Decision Trees Convenient Representation Expressive Developed with learning in mind Deterministic Expressive Equivalent to propositional DNF Handles discrete and continuous parameters Simple learning algorithm Handles noise well Classify as follows Constructive (build DT by adding nodes) Eager Batch (but incremental versions exist) © Daniel S. Weld

Concept Learning E.g. Learn concept “Edible mushroom” Target Function has two values: T or F Represent concepts as decision trees Use hill climbing search Thru space of decision trees Start with simple concept Refine it into a complex concept as needed © Daniel S. Weld

Experience: “Good day for tennis” Day Outlook Temp Humid Wind PlayTennis? d1 s h h w n d2 s h h s n d3 o h h w y d4 r m h w y d5 r c n w y d6 r c n s y d7 o c n s y d8 s m h w n d9 s c n w y d10 r m n w y d11 s m n s y d12 o m h s y d13 o h n w y d14 r m h s n © Daniel S. Weld

Decision Tree Representation Good day for tennis? Leaves = classification Arcs = choice of value for parent attribute Outlook Sunny Overcast Rain Humidity Wind Yes Strong Normal High Weak Yes No No Yes Decision tree is equivalent to logic in disjunctive normal form G-Day  (Sunny  Normal)  Overcast  (Rain  Weak) © Daniel S. Weld

© Daniel S. Weld

© Daniel S. Weld

© Daniel S. Weld

DT Learning as Search Nodes Operators Initial node Heuristic? Goal? Decision Trees Tree Refinement: Sprouting the tree Smallest tree possible: a single leaf Information Gain Best tree possible (???) © Daniel S. Weld

What is the Simplest Tree? Day Outlook Temp Humid Wind Play? d1 s h h w n d2 s h h s n d3 o h h w y d4 r m h w y d5 r c n w y d6 r c n s y d7 o c n s y d8 s m h w n d9 s c n w y d10 r m n w y d11 s m n s y d12 o m h s y d13 o h n w y d14 r m h s n What is the Simplest Tree? How good? [10+, 4-] Means: correct on 10 examples incorrect on 4 examples © Daniel S. Weld

Which attribute should we use to split? Successors Yes Humid Wind Outlook Temp Which attribute should we use to split? © Daniel S. Weld

To be decided: How to choose best attribute? Information gain Entropy (disorder) When to stop growing tree? © Daniel S. Weld

Disorder is bad Homogeneity is good No Better Good © Daniel S. Weld

1.0 0.5 Entropy P of as % .00 .50 1.00 © Daniel S. Weld

Entropy (disorder) is bad Homogeneity is good Let S be a set of examples Entropy(S) = -P log2(P) - N log2(N) where P is proportion of pos example and N is proportion of neg examples and 0 log 0 == 0 Example: S has 10 pos and 4 neg Entropy([10+, 4-]) = -(10/14) log2(10/14) - (4/14)log2(4/14) = 0.863 © Daniel S. Weld

 Information Gain Measure of expected reduction in entropy Resulting from splitting along an attribute  v  Values(A) Gain(S,A) = Entropy(S) - (|Sv| / |S|) Entropy(Sv) Where Entropy(S) = -P log2(P) - N log2(N) © Daniel S. Weld

Gain of Splitting on Wind Day Wind Tennis? d1 weak n d2 s n d3 weak yes d4 weak yes d5 weak yes d6 s yes d7 s yes d8 weak n d9 weak yes d10 weak yes d11 s yes d12 s yes d13 weak yes d14 s n Values(wind)=weak, strong S = [10+, 4-] Sweak = [6+, 2-] Ss = [3+, 3-] Gain(S, wind) = Entropy(S) - (|Sv| / |S|) Entropy(Sv) = Entropy(S) - 8/14 Entropy(Sweak) - 6/14 Entropy(Ss) = 0.863 - (8/14) 0.811 - (6/14) 1.00 = -0.029  v  {weak, s} © Daniel S. Weld

Gain of Split on Humidity Day Outlook Temp Humid Wind Play? d1 s h h w n d2 s h h s n d3 o h h w y d4 r m h w y d5 r c n w y d6 r c n s y d7 o c n s y d8 s m h w n d9 s c n w y d10 r m n w y d11 s m n s y d12 o m h s y d13 o h n w y d14 r m h s n Gain(S,A) = Entropy(S) - (|Sv| / |S|) Entropy(Sv) Where Entropy(S) = -P log2(P) - N log2(N)  v  Values(A) © Daniel S. Weld

Is… Entropy([4+,3-]) = .985 Entropy([7,0]) = Gain = 0.863- .985/2 = 0.375 © Daniel S. Weld

Evaluating Attributes Yes Humid Wind Gain(S,Wind) =-0.029 Gain(S,Humid) =0.375 Outlook Temp Gain(S,Temp) =0.029 Gain(S,Outlook) =0.401 Value is actually different than this, but ignore this detail © Daniel S. Weld

Resulting Tree …. Good day for tennis? Outlook Sunny Overcast Rain No [2+, 3-] Yes [4+] No [2+, 3-] © Daniel S. Weld

Recurse! Outlook Sunny Day Temp Humid Wind Tennis? d1 h h weak n d2 h h s n d8 m h weak n d9 c n weak yes d11 m n s yes © Daniel S. Weld

One Step Later… Outlook Sunny Overcast Rain Humidity Yes [4+] No [2+, 3-] Normal High Yes [2+] No [3-] © Daniel S. Weld

Decision Tree Algorithm BuildTree(TraingData) Split(TrainingData) Split(D) If (all points in D are of the same class) Then Return For each attribute A Evaluate splits on attribute A Use best split to partition D into D1, D2 Split (D1) Split (D2) © Daniel S. Weld

Movie Recommendation Features? Rambo Matrix Rambo 2 © Daniel S. Weld

Issues Content vs. Social Non-Boolean Attributes Missing Data Scaling up © Daniel S. Weld

Missing Data 1 Don’t use this instance for learning? Day Temp Humid Wind Tennis? d1 h h weak n d2 h h s n d8 m h weak n d9 c weak yes d11 m n s yes Don’t use this instance for learning? Assign attribute … most common value at node, or most common value, … given classification © Daniel S. Weld

Fractional Values 75% h and 25% n Use in gain calculations [0.75+, 3-] Day Temp Humid Wind Tennis? d1 h h weak n d2 h h s n d8 m h weak n d9 c weak yes d11 m n s yes [1.25+, 0-] 75% h and 25% n Use in gain calculations Further subdivide if other missing attributes Same approach to classify test ex with missing attr Classification is most probable classification Summing over leaves where it got divided © Daniel S. Weld

Non-Boolean Features Features with multiple discrete values Construct a multi-way split Test for one value vs. all of the others? Group values into two disjoint subsets? Real-valued Features Discretize? Consider a threshold split using observed values? © Daniel S. Weld

Attributes with many values So many values that it Divides examples into tiny sets Each set likely uniform  high info gain But poor predictor… Need to penalize these attributes © Daniel S. Weld

One approach: Gain ratio SplitInfo  entropy of S wrt values of A (Contrast with entropy of S wrt target value) attribs with many uniformly distrib values e.g. if A splits S uniformly into n sets SplitInformation = log2(n)… = 1 for Boolean © Daniel S. Weld

Machine Learning Outline Supervised learning Overfitting What is the problem? Reduced error pruning Ensembles of classifiers © Daniel S. Weld

Overfitting On training data Accuracy On test data 0.9 0.8 0.7 0.6 Number of Nodes in Decision tree © Daniel S. Weld

Overfitting 2 Figure from w.w.cohen © Daniel S. Weld

Overfitting… DT is overfit when exists another DT’ and DT has smaller error on training examples, but DT has bigger error on test examples Causes of overfitting Noisy data, or Training set is too small © Daniel S. Weld

© Daniel S. Weld

© Daniel S. Weld

Effect of Reduced-Error Pruning Cut tree back to here © Daniel S. Weld

Machine Learning Outline Supervised learning Overfitting Ensembles of classifiers Bagging Cross-validated committees Boosting © Daniel S. Weld