Decision Tree Learning - ID3

Outline
- Decision tree examples
- The ID3 algorithm
- Occam's Razor
- Top-down induction of decision trees
- Information theory
- Gain from a property P

Training Examples

Day  Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No

Decision Tree for PlayTennis

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Decision Tree for PlayTennis
- Each internal node tests an attribute (e.g. Outlook, Humidity)
- Each branch corresponds to an attribute value (e.g. Sunny, High, Normal)
- Each leaf node assigns a classification (Yes or No)

Decision Tree for PlayTennis: classifying a new instance

Outlook  Temperature  Humidity  Wind  PlayTennis
Sunny    Hot          High      Weak  ?

Following the tree: Outlook = Sunny, so Humidity is tested; Humidity = High leads to the leaf No.
The instance is classified as PlayTennis = No.
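To make the representation concrete, the PlayTennis tree can be written down as a nested dictionary and traversed to classify an instance. A minimal Python sketch (the dictionary layout and the classify helper are illustrative, not part of the original slides):

```python
# PlayTennis decision tree as nested dictionaries:
# an internal node maps an attribute name to {value: subtree}; a leaf is "Yes" or "No".
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No",  "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind":     {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, instance):
    """Walk from the root to a leaf, following the branch that matches the instance."""
    while isinstance(node, dict):
        attribute = next(iter(node))              # attribute tested at this node
        node = node[attribute][instance[attribute]]
    return node

# The unseen example from the slide: Sunny, Hot, High, Weak  ->  prints "No"
print(classify(tree, {"Outlook": "Sunny", "Temperature": "Hot",
                      "Humidity": "High", "Wind": "Weak"}))
```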

Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak

Outlook
  Sunny -> Wind
    Strong -> No
    Weak -> Yes
  Overcast -> No
  Rain -> No

Decision Tree for Disjunction: Outlook=Sunny ∨ Wind=Weak

Outlook
  Sunny -> Yes
  Overcast -> Wind
    Strong -> No
    Weak -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Decision Tree for XOR: Outlook=Sunny XOR Wind=Weak

Outlook
  Sunny -> Wind
    Strong -> Yes
    Weak -> No
  Overcast -> Wind
    Strong -> No
    Weak -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Decision Tree as a Disjunction of Conjunctions

Decision trees represent disjunctions of conjunctions:

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

(Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)

When to Consider Decision Trees
- Instances describable by attribute-value pairs
- Target function is discrete-valued
- Disjunctive hypothesis may be required
- Possibly noisy training data
- Missing attribute values

Examples: medical diagnosis, credit risk analysis, object classification for a robot manipulator (Tan 1993)

Decision tree for credit risk assessment

The decision tree represents the classification given in the table: it can classify all the objects in the table. Each internal node represents a test on some property, and each possible value of that property corresponds to a branch of the tree. An individual of unknown type may be classified by traversing this tree.

Decision tree for credit risk assessment: in classifying any given instance, the tree does not use all the properties in the table. If a person has a good credit history and low debt, for example, we ignore her collateral and income and classify her as a low risk. In spite of omitting certain tests, the tree classifies all examples in the table.

In general, the size of the tree needed to classify a given set of examples varies with the order in which the properties (attributes) are tested. Given a set of training instances and a number of different decision trees that correctly classify them, we may ask which tree has the greatest likelihood of correctly classifying unseen instances of the population.

This is a simplified decision tree for credit risk assessment. It classifies all examples in the table correctly.

The ID3 algorithm assumes that a good decision tree is the simplest decision tree. Heuristic: prefer simplicity and avoid unnecessary assumptions. This principle is known as Occam's Razor.

Occam's Razor was first articulated by the medieval logician William of Occam in 1324. Born in the village of Ockham in Surrey (England) about 1285, he is believed to have died in a convent in Munich in 1349, a victim of the Black Death. "It is vain to do with more what can be done with less." We should always accept the simplest answer that correctly fits our data: here, the smallest decision tree that correctly classifies all given examples.

The simplest decision tree that covers all examples should be the least likely to include unnecessary constraints

ID3 selects a property to test at the current node of the tree and uses this test to partition the set of examples. The algorithm then recursively constructs a subtree for each partition. This continues until all members of a partition are in the same class; that class then becomes a leaf node of the tree.

Because the order of tests is critical to constructing a simple tree, ID3 relies heavily on its criterion for selecting the test at the root of each subtree.

Top-Down Induction of Decision Trees (ID3)
1. A ← the "best" decision attribute for the next node
2. Assign A as the decision attribute (= property) for the node
3. For each value of A, create a new descendant
4. Sort the training examples to the leaf nodes according to the attribute value of the branch
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes
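These steps translate almost directly into code. A minimal recursive sketch, assuming each example is a dictionary of attribute values plus a target label; choose_best is a stand-in for the attribute-selection function, which later slides define via information gain:

```python
from collections import Counter

def id3(examples, attributes, target, choose_best):
    """Recursive ID3 sketch: grow a tree until each partition is pure or attributes run out."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # all examples in the same class -> leaf
        return labels[0]
    if not attributes:                        # no tests left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = choose_best(examples, attributes, target)     # e.g. highest information gain
    tree = {best: {}}
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target, choose_best)
    return tree
```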

ID3 constructs the tree for credit risk assessment as follows. Beginning with the full table of examples, ID3 selects income as the root property, using its function for selecting the "best" property (attribute). The examples are then divided into partitions, listed by their numbers in the table.

ID3 applies the method recursively to each partition. The partition {1,4,7,11} consists entirely of high-risk individuals, so a leaf node is created for it. ID3 selects the credit history property as the root of the subtree for the partition {2,3,12,14}; credit history further divides this partition into {2,3}, {14} and {12}. ID3 implements a form of hill climbing in the space of all possible trees.

Hypothesis Space Search by ID3 (two classes: + and -)
[Figure: partial decision trees over attributes A1, A2, A3, A4, grown one test at a time, with training examples labelled + and -.]

Information Theory: information theory (Shannon 1948) provides a measure of the information content of a message. We may think of a message as an instance in a universe of possible messages; the act of transmitting a message is the same as selecting one of these possible messages. The information content of a message is defined in terms of both the size of this universe and the frequency with which each possible message occurs.

The importance of the number of possible messages is evident in an example from gambling: compare a message correctly predicting the outcome of a spin of a roulette wheel with one predicting the outcome of a toss of an honest coin. The message concerning the roulette wheel is more valuable to us, because the outcome it reports is far less probable.

Shannon formalized these intuitions. Given a universe of messages M = {m1, m2, ..., mn} and a probability p(mi) for the occurrence of each message, the information content (also called entropy) of M is given by:

I(M) = Σi -p(mi) log2 p(mi)

Information content of a message telling the outcome of the flip of an honest coin:

I(coin toss) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit

However, if the coin has been rigged to come up heads 75 percent of the time:

I(coin toss) = -(3/4) log2(3/4) - (1/4) log2(1/4) ≈ 0.811 bits
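Both coin values can be checked numerically. A small sketch, assuming an entropy helper that takes a list of class probabilities (the function name is illustrative):

```python
import math

def entropy(probabilities):
    """Shannon information content in bits: -sum of p * log2(p), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))    # fair coin   -> 1.0 bit
print(entropy([0.75, 0.25]))  # rigged coin -> about 0.811 bits
```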

We may think of a decision tree as conveying information about the classification of examples in the decision table; the information content of the tree is computed from the probabilities of the different classifications.

The credit history loan table has the following class probabilities: p(risk is high) = 6/14, p(risk is moderate) = 3/14, p(risk is low) = 5/14. The total information content of the table is therefore I(credit_table) = -(6/14) log2(6/14) - (3/14) log2(3/14) - (5/14) log2(5/14) ≈ 1.531 bits.
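Reusing the entropy helper sketched above, this value can be checked directly:

```python
# Total information content of the credit risk table (high, moderate, low risk classes)
print(entropy([6/14, 3/14, 5/14]))   # about 1.531 bits
```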

For a given test, the information gain provided by making that test at the root of the current tree is equal to the total information of the table minus the amount of information needed to complete the classification after performing the test. The amount of information needed to complete the tree is defined as the weighted average of the information content of each subtree.

The amount of information needed to complete the tree is the weighted average of the information content of each subtree, weighted by the percentage of the examples present in that subtree. Let C be a set of training instances. If a property P (for example income) has n values, C will be divided into the subsets {C1, C2, ..., Cn}. The expected information needed to complete the tree after making P the root is:

E(P) = Σi (|Ci| / |C|) · I(Ci)

The gain from the property P is computed by subtracting the expected information needed to complete the tree, E(P), from the total information:

gain(P) = I(C) - E(P)
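A hedged sketch of E(P) and gain(P), building on the entropy helper above and on the same dictionary-of-attributes example format used in the ID3 sketch (the function names are illustrative):

```python
def expected_information(examples, attribute, target):
    """E(P): entropy of each partition induced by the attribute, weighted by partition size."""
    total = len(examples)
    expected = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        probs = [subset.count(label) / len(subset) for label in set(subset)]
        expected += len(subset) / total * entropy(probs)
    return expected

def gain(examples, attribute, target):
    """gain(P) = I(C) - E(P): information gained by putting the attribute test at the root."""
    labels = [ex[target] for ex in examples]
    probs = [labels.count(label) / len(labels) for label in set(labels)]
    return entropy(probs) - expected_information(examples, attribute, target)
```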

In the credit history loan table we make income the property tested at the root. This makes the division into C1 = {1,4,7,11}, C2 = {2,3,12,14}, C3 = {5,6,8,9,10,13}.

gain(income) = I(credit_table) - E(income) = 0.967 bits
gain(credit history) = 0.266 bits
gain(debt) = 0.581 bits
gain(collateral) = 0.756 bits
Income yields the largest gain, which is why ID3 tests it at the root of the tree.

Training Examples (the PlayTennis table shown earlier, repeated in the original slides)
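As a cross-check, the gain helper sketched above can be applied to the PlayTennis table. The data below is transcribed from the standard version of that table (Mitchell 1997), since the transcript preserves only part of it; the printed gains explain why Outlook is tested at the root of the PlayTennis tree:

```python
# PlayTennis examples D1..D14, transcribed from the standard version of the table above.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
columns = ("Outlook", "Temperature", "Humidity", "Wind", "PlayTennis")
examples = [dict(zip(columns, row)) for row in rows]

for attribute in ("Outlook", "Temperature", "Humidity", "Wind"):
    print(attribute, round(gain(examples, attribute, "PlayTennis"), 3))
# Prints roughly: Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048 --
# Outlook has the highest gain, so it becomes the root of the PlayTennis tree.
```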

Recap: decision tree examples, the ID3 algorithm, Occam's Razor, top-down induction of decision trees, information theory, gain from a property P.

Enhancements to basic decision tree induction: the C4.5 and CART algorithms.