Exercises: Decision Trees


Exercise 1

In decision tree learning, the information gain criterion helps us select the best attribute to split the data at every node. Information gain is briefly described as

Gain(S, A) = Entropy(S) − (average weighted entropy induced by A),

that is, Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v), where the sum runs over the values v of attribute A and S_v is the subset of S for which A takes value v.

Notice that the first term, Entropy(S), is the same for all attributes. If that is the case, why do we need it? Why not simply choose the attribute that minimizes the second term (the average weighted entropy induced by A)?
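For concreteness, here is a minimal Python sketch of the two quantities in the formula above. The function names (`entropy`, `information_gain`) and the toy data are illustrative, not part of the original exercise; this is a sketch of the standard definition, not a full decision tree learner.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    base = entropy(labels)          # first term: Entropy(S)
    total = len(labels)
    weighted = 0.0                  # second term: average weighted entropy
    for v in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == v]
        weighted += (len(subset) / total) * entropy(subset)
    return base - weighted

# Toy usage: a perfect split yields a gain of 1.0 bit.
examples = [{"Outlook": "Sunny"}, {"Outlook": "Sunny"},
            {"Outlook": "Rain"}, {"Outlook": "Rain"}]
labels = ["No", "No", "Yes", "Yes"]
print(information_gain(examples, labels, "Outlook"))  # 1.0
```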

Exercise 2

In decision tree learning we assign one attribute to each internal node of the tree, normally by choosing the attribute with the maximum value of a certain quality metric (e.g., information gain or gain ratio). Assume you have only binary (Boolean) attributes and that you have been asked to modify the decision tree mechanism by assigning two attributes instead of one to each internal node. Each pair of attributes is joined by the logical AND operator. For example, if we have three attributes A1, A2, and A3, the candidates for a tree node are A1&A2, A1&A3, or A2&A3. Answer the following questions: How many branches would come out of each internal node? Can we use information gain or gain ratio to choose the best pair of attributes (i.e., conjunction of attributes)? Explain.


Question Part 1

The first term is important because it indicates how much we reduce the entropy by splitting the data. If we used only the second term we would miss relevant information: if the difference between the first and second terms is very small, then it is not worth splitting the data any further.
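As a sketch of this point, a learner that keeps Entropy(S) can apply a simple pre-pruning rule: stop splitting when even the best attainable gain is negligible. The threshold value and the helper name below are hypothetical, and the snippet reuses the `information_gain` function from the Exercise 1 sketch.

```python
MIN_GAIN = 0.01  # hypothetical threshold; tune per task

def best_split_or_none(examples, labels, attributes):
    """Return the best attribute, or None if no split reduces entropy enough."""
    if not attributes:
        return None
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    # Without Entropy(S) we could still rank attributes, but we could not
    # tell that the best split is barely better than not splitting at all.
    if information_gain(examples, labels, best) < MIN_GAIN:
        return None
    return best
```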

Question Part 2

How many branches would come out of each internal node? Answer: two, since the conjunction of two Boolean attributes is itself a Boolean test (true or false).

Can we use information gain or gain ratio to choose the best pair of attributes (i.e., conjunction of attributes)? Explain. Answer: Yes. Each conjunction stands as a new feature with two values, so both metrics remain perfectly valid in this setting.
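To make the answer concrete, the sketch below derives the conjunction A1 AND A2 as a new Boolean feature and scores it with the same `information_gain` function from the Exercise 1 sketch. The attribute names and toy data are placeholders, not from the original slides.

```python
def conjoin(examples, a1, a2):
    """Replace the pair (a1, a2) with a single Boolean feature 'a1&a2'."""
    name = f"{a1}&{a2}"
    return [{name: ex[a1] and ex[a2]} for ex in examples], name

# The conjunction is True or False, so the node still has exactly two
# branches, and information gain (or gain ratio) applies to it just like
# to any other binary attribute.
examples = [{"A1": True,  "A2": True},
            {"A1": True,  "A2": False},
            {"A1": False, "A2": True},
            {"A1": False, "A2": False}]
labels = ["Yes", "No", "No", "No"]
conjoined, name = conjoin(examples, "A1", "A2")
print(information_gain(conjoined, labels, name))  # gain of the A1&A2 test
```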