In the Name of God
Machine Learning: Decision Tree
Mohammad Ali Keyvanrad
Thanks to: Tom Mitchell (Carnegie Mellon University), Rich Caruana (Cornell University)
1393-1394 (Spring)

Outline Decision tree representation ID3 learning algorithm Entropy, Information gain Issues in decision tree learning

Outline Decision tree representation ID3 learning algorithm Entropy, Information gain Issues in decision tree learning

Decision Tree for Play Tennis

Decision Trees
internal node = attribute test
branch = attribute value
leaf node = classification

Decision tree representation
In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances. Disjunction: OR; conjunction: AND.
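
For example, assuming the standard PlayTennis tree from Mitchell (presumably the tree pictured on the "Decision Tree for Play Tennis" slide above: Outlook at the root, Humidity under Sunny, Wind under Rain), the tree corresponds to the expression (Outlook = Sunny AND Humidity = Normal) OR (Outlook = Overcast) OR (Outlook = Rain AND Wind = Weak). A minimal Python sketch of that reading; attribute and function names are my own:

```python
def play_tennis(outlook, humidity, wind):
    """Classify one day by walking the tree as nested attribute tests."""
    if outlook == "Sunny":
        return humidity == "Normal"   # leaf: Yes iff Humidity = Normal
    elif outlook == "Overcast":
        return True                   # leaf: always Yes
    else:                             # Outlook = Rain
        return wind == "Weak"         # leaf: Yes iff Wind = Weak

# The same classifier written directly as a disjunction of conjunctions:
def play_tennis_dnf(outlook, humidity, wind):
    return ((outlook == "Sunny" and humidity == "Normal")
            or (outlook == "Overcast")
            or (outlook == "Rain" and wind == "Weak"))
```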

Appropriate Problems for Decision Tree Learning
Instances are represented by attribute-value pairs
The target function has discrete output values
Disjunctive descriptions may be required
The training data may contain errors
The training data may contain missing attribute values
Examples: medical diagnosis

Outline Decision tree representation ID3 learning algorithm Entropy, Information gain Issues in decision tree learning

Top-Down Induction of Decision Trees
Main loop:
  find the “best” attribute test to install at the root
  split the data on the root test
  find the “best” attribute tests to install at each new node
  split the data on the new tests
  repeat until the training examples are perfectly classified
Which attribute is best?

ID3 (algorithm pseudocode, presented as figures on the slides)

Outline Decision tree representation ID3 learning algorithm Entropy, Information gain Issues in decision tree learning

Entropy
Entropy measures the impurity of 𝑆 (equivalently, it is a measure of the uncertainty in 𝑆).
𝑆 is a sample of training examples
𝑝⊕ is the proportion of positive examples in 𝑆
𝑝⊝ is the proportion of negative examples in 𝑆
𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑆) = −𝑝⊕ log₂ 𝑝⊕ − 𝑝⊝ log₂ 𝑝⊝

Entropy
𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑆) = expected number of bits needed to encode the class (⊕ or ⊖) of a randomly drawn member of 𝑆 (under the optimal, shortest-length code)
Why? Information theory: an optimal-length code assigns −log₂(𝑝) bits to a message having probability 𝑝
Example string: 𝛼₀ 𝛼₀ 𝛼₁ 𝛼₀ 𝛼₀ 𝛼₁ 𝛼₂ 𝛼₃ (symbol probabilities 1/2, 1/4, 1/8, 1/8), so 𝐸𝑛𝑡𝑟𝑜𝑝𝑦 = 1/2·1 + 1/4·2 + 1/8·3 + 1/8·3 = 1.75 bits
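
A minimal Python sketch of this computation (function and variable names are my own; the 1.75-bit string above serves as a check):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy, in bits, of a sequence of class labels (or symbols)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Check against the example string: four a0, two a1, one a2, one a3 -> 1.75 bits
print(entropy(["a0", "a0", "a1", "a0", "a0", "a1", "a2", "a3"]))  # 1.75
```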

Information Gain
Expected reduction in entropy due to splitting 𝑆 on an attribute 𝐴:
𝐺𝑎𝑖𝑛(𝑆, 𝐴) = 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑆) − Σ_{𝑣 ∈ 𝑉𝑎𝑙𝑢𝑒𝑠(𝐴)} (|𝑆_𝑣| / |𝑆|) · 𝐸𝑛𝑡𝑟𝑜𝑝𝑦(𝑆_𝑣)
𝑉𝑎𝑙𝑢𝑒𝑠(𝐴) is the set of all possible values for attribute 𝐴
𝑆_𝑣 is the subset of 𝑆 for which attribute 𝐴 has value 𝑣
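
A matching Python sketch, reusing the entropy function from the previous snippet; examples are assumed to be dicts mapping attribute names to values, with the class label stored under a target key (these conventions are my own):

```python
def information_gain(examples, attribute, target):
    """Gain(S, A): entropy of the labels minus the weighted entropy after splitting on A."""
    labels = [ex[target] for ex in examples]
    values = set(ex[attribute] for ex in examples)
    remainder = 0.0
    for v in values:
        subset = [ex[target] for ex in examples if ex[attribute] == v]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return entropy(labels) - remainder
```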

Training Examples

Selecting the Next Attribute Which Attribute is the best classifier?
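
For the 14 PlayTennis training examples in Mitchell's book (which these slides appear to follow), the gains come out to roughly Gain(S, Outlook) ≈ 0.246, Gain(S, Humidity) ≈ 0.151, Gain(S, Wind) ≈ 0.048, and Gain(S, Temperature) ≈ 0.029, so Outlook is selected as the root test.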

Hypothesis Space Search by ID3 The hypothesis space searched by ID3 is the set of possible decision trees. ID3 performs a simple-to-complex, hill-climbing search through this hypothesis space.
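
Putting the pieces together, this greedy, simple-to-complex search can be sketched as the following recursion. It is a minimal sketch, not the exact pseudocode from the ID3 slides: it reuses the entropy and information_gain sketches above, and the dict-based tree and majority-vote leaves are my own simplifications.

```python
from collections import Counter

def id3(examples, target, attributes):
    """Greedily grow a decision tree; returns a label (leaf) or a nested dict (test node)."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:            # pure node: all examples share one label
        return labels[0]
    if not attributes:                   # no tests left: majority-vote leaf
        return Counter(labels).most_common(1)[0][0]
    # install the attribute test with the highest information gain
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, target, remaining)
    return tree
```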

Outline Decision tree representation ID3 learning algorithm Entropy, Information gain Issues in decision tree learning

Overfitting
ID3 grows each branch of the tree just deeply enough to perfectly classify the training examples.
Difficulties: noise in the data; small data sets.
Consider adding the noisy training example #15: Sunny, Hot, Normal, Strong, PlayTennis=No
Effect? ID3 constructs a more complex tree.

Overfitting
Consider the error of hypothesis ℎ over
  the training data: 𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑎𝑖𝑛(ℎ)
  the entire distribution 𝐷 of data: 𝑒𝑟𝑟𝑜𝑟_𝐷(ℎ)
Hypothesis ℎ ∈ 𝐻 overfits the training data if there is an alternative hypothesis ℎ′ ∈ 𝐻 such that
𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑎𝑖𝑛(ℎ) < 𝑒𝑟𝑟𝑜𝑟_𝑡𝑟𝑎𝑖𝑛(ℎ′) and 𝑒𝑟𝑟𝑜𝑟_𝐷(ℎ) > 𝑒𝑟𝑟𝑜𝑟_𝐷(ℎ′)

Overfitting in Decision Tree Learning

Avoiding overfitting
How can we avoid overfitting?
  Stop growing the tree before it reaches the point where it perfectly classifies the training data (more direct)
  Grow the full tree, then post-prune (more successful in practice)
How to select the “best” tree?
  Measure performance over the training data
  Measure performance over a separate validation set
  MDL (Minimum Description Length): minimize 𝑠𝑖𝑧𝑒(𝑡𝑟𝑒𝑒) + 𝑠𝑖𝑧𝑒(𝑚𝑖𝑠𝑐𝑙𝑎𝑠𝑠𝑖𝑓𝑖𝑐𝑎𝑡𝑖𝑜𝑛𝑠(𝑡𝑟𝑒𝑒))

Reduced-Error Pruning
Split the data into a training set and a validation set.
Do until further pruning is harmful (decreases accuracy of the tree over the validation set):
  Evaluate the impact on the validation set of pruning each possible node (plus those below it)
  Greedily remove the one that most improves validation-set accuracy
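
A minimal sketch of this loop. The helpers candidate_prunings(tree) and accuracy(tree, data) are hypothetical and passed in as parameters; candidate_prunings would yield copies of the tree with one internal node replaced by its majority-class leaf.

```python
def reduced_error_prune(tree, validation_set, candidate_prunings, accuracy):
    """Greedily prune while it does not hurt accuracy on the held-out validation set."""
    best_accuracy = accuracy(tree, validation_set)
    while True:
        # Evaluate every single-node pruning of the current tree on the validation set
        candidates = [(accuracy(pruned, validation_set), pruned)
                      for pruned in candidate_prunings(tree)]
        if not candidates:
            return tree
        acc, pruned = max(candidates, key=lambda pair: pair[0])
        if acc < best_accuracy:                # further pruning is harmful: stop
            return tree
        tree, best_accuracy = pruned, acc      # keep the pruning that helps (or ties) most
```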

Effect of Reduced-Error Pruning

Rule Post-Pruning
Each attribute test along the path from the root to a leaf becomes a rule antecedent (precondition).
Method:
  Convert the tree to an equivalent set of rules
  Prune each rule independently of the others: remove any antecedent whose removal does not worsen the rule's estimated accuracy
  Sort the final rules into the desired sequence for use
Perhaps the most frequently used method (e.g., C4.5)
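
A minimal sketch of the per-rule pruning step. A rule is represented here as an (antecedents, conclusion) pair, and estimate_accuracy is a hypothetical scoring helper passed in by the caller (C4.5 uses a pessimistic estimate computed on the training data).

```python
def prune_rule(antecedents, conclusion, estimate_accuracy):
    """Drop antecedents one at a time as long as estimated accuracy does not get worse."""
    antecedents = list(antecedents)
    best = estimate_accuracy(antecedents, conclusion)
    improved = True
    while improved and antecedents:
        improved = False
        for i in range(len(antecedents)):
            candidate = antecedents[:i] + antecedents[i + 1:]  # rule with one precondition removed
            score = estimate_accuracy(candidate, conclusion)
            if score >= best:                                  # removal does not worsen accuracy
                antecedents, best, improved = candidate, score, True
                break
    return antecedents, conclusion
```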

Converting A Tree to Rules

Rule Post-Pruning
Main advantages of converting the decision tree to rules:
  The pruning decision regarding an attribute test can be made differently for each path. If the tree itself were pruned, the only two choices would be to remove the decision node completely or to retain it in its original form.
  Converting to rules removes the distinction between attribute tests that occur near the root of the tree and those that occur near the leaves.
  Converting to rules improves readability: rules are often easier for people to understand.

Continuous-Valued Attributes
Partition the continuous attribute values into a discrete set of intervals. Candidate thresholds are placed midway between adjacent values at which the target classification changes; these candidate thresholds can then be evaluated by computing the information gain associated with each.
Example (Temperature): (48+60)/2 = 54 and (80+90)/2 = 85, giving the candidate tests 𝑇𝑒𝑚𝑝𝑒𝑟𝑎𝑡𝑢𝑟𝑒 > 54 and 𝑇𝑒𝑚𝑝𝑒𝑟𝑎𝑡𝑢𝑟𝑒 > 85.
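
A small sketch of generating such candidate thresholds (midpoints between adjacent sorted values whose labels differ); function names are my own, and the data row is assumed to match Mitchell's Temperature example:

```python
def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, y1), (v2, y2) in zip(pairs, pairs[1:]):
        if y1 != y2 and v1 != v2:
            thresholds.append((v1 + v2) / 2)
    return thresholds

# Temperature example from the slide: the candidates are 54.0 and 85.0
temps  = [40, 48, 60, 72, 80, 90]
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(candidate_thresholds(temps, labels))  # [54.0, 85.0]
```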

Unknown Attribute Values
What if some examples are missing values of attribute 𝐴? Use the training example anyway, and sort it through the tree:
  If node 𝑛 tests 𝐴, assign the most common value of 𝐴 among the other examples sorted to node 𝑛, or
  assign the most common value of 𝐴 among the other examples sorted to node 𝑛 that have the same target value.

Unknown Attribute Values
Assign a probability 𝑝ᵢ to each possible value 𝑣ᵢ of 𝐴 and assign fraction 𝑝ᵢ of the example to each descendant in the tree; these fractional examples are used for the purpose of computing information gain.
Classify new examples in the same fashion (summing the weights of the instance fragments classified in different ways at the leaf nodes).
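
A minimal sketch of the fractional-split idea at one node. Weighted examples are (weight, example) pairs, and the value probabilities 𝑝ᵢ are estimated from the examples whose value is known; all names here are my own.

```python
from collections import Counter, defaultdict

def split_with_missing(weighted_examples, attribute):
    """Split (weight, example) pairs on an attribute, distributing missing values fractionally."""
    known = [(w, ex) for w, ex in weighted_examples if ex.get(attribute) is not None]
    missing = [(w, ex) for w, ex in weighted_examples if ex.get(attribute) is None]
    total = sum(w for w, _ in known)
    weight_by_value = Counter()                 # weight carried by each observed value v_i
    for w, ex in known:
        weight_by_value[ex[attribute]] += w
    branches = defaultdict(list)
    for w, ex in known:
        branches[ex[attribute]].append((w, ex))
    for w, ex in missing:
        for value, value_weight in weight_by_value.items():
            # fraction p_i = value_weight / total of the example goes down each branch
            branches[value].append((w * value_weight / total, ex))
    return dict(branches)
```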

Attributes with Costs
Consider medical diagnosis, where “Blood Test” has a cost of $150. How can we learn a consistent tree with low expected cost?
One approach: replace the gain by a cost-sensitive criterion.
  Tan and Schlimmer: 𝐺𝑎𝑖𝑛²(𝑆, 𝐴) / 𝐶𝑜𝑠𝑡(𝐴)
  Nunez: (2^𝐺𝑎𝑖𝑛(𝑆,𝐴) − 1) / (𝐶𝑜𝑠𝑡(𝐴) + 1)^𝑤, where 𝑤 ∈ [0, 1] determines the importance of cost.
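
Both criteria are simple to express on top of the information_gain sketch above. This is a minimal sketch; the gain and cost numbers in the usage lines are made-up illustrations.

```python
def tan_schlimmer_score(gain, cost):
    """Gain^2(S, A) / Cost(A)."""
    return gain ** 2 / cost

def nunez_score(gain, cost, w):
    """(2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1] weighting the cost."""
    return (2 ** gain - 1) / (cost + 1) ** w

# Illustrative only: an attribute with gain 0.25 bits and a $150 cost
print(tan_schlimmer_score(0.25, 150.0))
print(nunez_score(0.25, 150.0, w=0.5))
```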