Decision Trees (suggested time: 30 min)


Decision Trees (suggested time: 30 min)
Outline:
- Definition
- Mechanism
- Splitting Functions
- Issues in Decision-Tree Learning (if time permits): avoiding overfitting through pruning; numeric and missing attributes
- Applications to Security

Illustration — Example: Learning to Identify Spam
[Figure: a decision tree. The root asks "Is the sender unknown?"; the No branch leads to Not Spam; the Yes branch tests "Number of Recipients", where < N leads to Not Spam and ≥ N leads to Spam.]

Definition
A decision-tree learning algorithm approximates a target concept using a tree representation, where each internal node corresponds to an attribute and each terminal node corresponds to a class. There are two types of nodes:
- Internal node: splits into different branches according to the different values the corresponding attribute can take. Example: Number of recipients <= N or Number of recipients > N.
- Terminal node: decides the class assigned to the example.

Classifying Examples
X = (Unknown Sender, Number of recipients > N)
[Figure: the spam tree. The sender is unknown (Yes branch) and the number of recipients is ≥ N, so the assigned class is Spam.]
Example 1 — a basic spam filter. We decide whether a message is spam based on two tests:
1. Is the sender unknown? A categorical variable with yes/no values.
2. Number of recipients. A numerical variable over the natural numbers.
Of course this does not give the complete picture, but it is enough to filter some messages (and probably produce some false positives).
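The small tree above can be written directly as nested tests. A minimal Python sketch, assuming a hypothetical threshold N = 10 (the slides leave N unspecified):

```python
# Hypothetical recipient threshold; the slides leave N unspecified.
N = 10

def classify(sender_unknown, num_recipients):
    """Walk the spam tree: the root tests the sender, the Yes branch tests recipients."""
    if sender_unknown:               # internal node: Is the sender unknown?
        if num_recipients >= N:      # internal node: Number of Recipients
            return "Spam"            # terminal node
        return "Not Spam"
    return "Not Spam"                # known sender: never spam in this toy tree

# X = (Unknown Sender, Number of recipients > N) is assigned the class Spam:
print(classify(True, 25))   # Spam
```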

Appropriate Problems for Decision Trees
- Attributes may be numeric or nominal.
- The target function takes on a discrete number of values.
- The data may have errors.
- Some examples may have missing attribute values.

Decision Trees
- Definition
- Mechanism
- Splitting Functions
- Issues in Decision-Tree Learning: avoiding overfitting through pruning; numeric and missing attributes

Historical Information
Ross Quinlan – "Induction of Decision Trees." Machine Learning 1: 81-106, 1986 (over 8,000 citations).

Historical Information
Leo Breiman – CART (Classification and Regression Trees), 1984.

Mechanism
There are different ways to construct trees from data. We will concentrate on the top-down, greedy search approach. Basic idea:
1. Choose the best attribute a* to place at the root of the tree.
2. Separate the training set D into subsets {D1, D2, ..., Dk}, where each subset Di contains the examples having the same value for a*.
3. Recursively apply the algorithm to each new subset until all examples have the same class or there are few of them.

Illustration — Example 2: Intrusion Detection in Networks
[Figure: a scatter plot of examples over Destination Port (threshold P1) and Duration (thresholds D2 and D3). Class A: Attack; Class B: Benign.]
The attributes chosen are the Destination Port of the message and the Duration of the message:
- Destination Port has two values: > P1 or <= P1.
- Duration has three values: > D2; <= D2 and > D3; <= D3.
Other features of such attacks can be found in: Brugger, S. Terry. "Data mining methods for network intrusion detection." University of California at Davis (2004).

Illustration
Suppose we choose Destination Port as the best attribute:
[Figure: a partial tree. The branch Destination Port > P1 contains only attacks and is labeled class A; the branch Destination Port <= P1 is still undecided (?).]

Illustration
Suppose we choose Duration as the next best attribute:
[Figure: the completed tree. Destination Port > P1 leads to class A; on the <= P1 branch, Duration is tested with thresholds D2 and D3, and each of the branches (> D2; > D3 and <= D2; <= D3) is assigned a class, A or B.]

Formal Mechanism
Create a root for the tree.
If all examples are of the same class, or the number of examples is below a threshold, return that class.
If no attributes are available, return the majority class.
Let a* be the best attribute.
For each possible value v of a*:
- Add a branch below a* labeled "a* = v".
- Let Sv be the subset of examples where attribute a* = v.
- Recursively apply the algorithm to Sv.
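The steps above can be sketched as a recursive Python function. This is an illustrative toy implementation (not Quinlan's actual code); `best_attribute` uses the entropy-based scoring defined on the following slides:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(examples, attributes):
    """Pick the attribute leaving the least weighted entropy after the split
    (equivalently, the one with the highest information gain)."""
    def remainder(a):
        total = 0.0
        for v in {x[a] for x, _ in examples}:
            sub = [y for x, y in examples if x[a] == v]
            total += len(sub) / len(examples) * entropy(sub)
        return total
    return min(attributes, key=remainder)

def build_tree(examples, attributes, min_examples=1):
    """examples: list of (attribute_dict, class) pairs. Returns a nested-dict tree."""
    labels = [y for _, y in examples]
    # Stop: one class left, too few examples, or no attributes -> majority-class leaf.
    if len(set(labels)) == 1 or len(examples) <= min_examples or not attributes:
        return Counter(labels).most_common(1)[0][0]
    a_star = best_attribute(examples, attributes)
    tree = {a_star: {}}
    for v in {x[a_star] for x, _ in examples}:          # one branch per value of a*
        subset = [(x, y) for x, y in examples if x[a_star] == v]
        rest = [a for a in attributes if a != a_star]
        tree[a_star][v] = build_tree(subset, rest, min_examples)
    return tree
```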

Splitting Functions
Which attribute is best to split the data? Let us recall some definitions from information theory. The uncertainty, or entropy, associated with a random variable X is defined as
H(X) = - Σi pi log2 pi
where the logarithm is in base 2. This is the "average amount of information or entropy of a finite complete probability scheme" (An Introduction to Information Theory, F. Reza).

Consider a complete probability scheme with two events A and B (example: flipping a biased coin):
- P(A) = 1/256, P(B) = 255/256: H(X) = 0.0369 bit
- P(A) = 1/2, P(B) = 1/2: H(X) = 1 bit
- P(A) = 7/16, P(B) = 9/16: H(X) = 0.989 bit
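These values can be checked with a few lines of Python (`H` here is the two-event entropy as a function of P(A)):

```python
from math import log2

def H(p):
    """Entropy in bits of a complete scheme with two events: P(A) = p, P(B) = 1 - p."""
    if p in (0.0, 1.0):
        return 0.0          # a certain outcome carries no uncertainty
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(round(H(1 / 256), 4))  # 0.0369 -- heavily biased coin: little uncertainty
print(round(H(1 / 2), 4))    # 1.0    -- fair coin: maximum uncertainty
print(round(H(7 / 16), 4))   # 0.9887 -- nearly fair: close to 1 bit
```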

Entropy is a concave (downward-curving) function.
[Figure: plot of H for a two-event scheme as a function of P(A); the curve reaches its maximum of 1 bit at P(A) = 0.5 and falls to 0 at P(A) = 0 and P(A) = 1.]

Illustration (revisited)
[Figure: the same scatter plot of attacks (A) and benign examples (B) over Destination Port and Duration.]
Attributes: Destination Port and Duration.
- Destination Port has two values: > P1 or <= P1.
- Duration has three values: > D2; <= D2 and > D3; <= D3.

Splitting Based on Entropy
Destination Port divides the sample in two:
S1 = {6A, 0B}, S2 = {3A, 5B}
H(S1) = 0
H(S2) = -(3/8) log2(3/8) - (5/8) log2(5/8) ≈ 0.954

Splitting Based on Entropy
Duration divides the sample in three:
S1 = {2A, 2B}, S2 = {5A, 0B}, S3 = {2A, 3B}
H(S1) = 1
H(S2) = 0
H(S3) = -(2/5) log2(2/5) - (3/5) log2(3/5) ≈ 0.971

Information Gain
IG(A) = H(S) - Σv (|Sv|/|S|) H(Sv)
H(S) is the entropy of the whole set of examples S; H(Sv) is the entropy of the subset of examples for which attribute A takes value v; the sum runs over all possible values of A.
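Applied to the running example (the full sample S has 9 attacks and 5 benign examples), the formula gives the gain of each candidate split. A sketch, representing each subset by its per-class counts:

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a sample given its per-class counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

H_S = entropy([9, 5])   # entropy of the full sample: 9 As, 5 Bs

def info_gain(subsets):
    """IG of a split, where each subset is a list of per-class counts."""
    n = sum(sum(s) for s in subsets)
    return H_S - sum(sum(s) / n * entropy(s) for s in subsets)

# Destination Port: S1 = {6A, 0B}, S2 = {3A, 5B}
print(round(info_gain([[6, 0], [3, 5]]), 3))          # 0.395
# Duration: S1 = {2A, 2B}, S2 = {5A, 0B}, S3 = {2A, 3B}
print(round(info_gain([[2, 2], [5, 0], [2, 3]]), 3))  # 0.308
```

By this criterion Destination Port would be chosen as the better split.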

Components of IG(A)
H(S) = -(9/14) log2(9/14) - (5/14) log2(5/14)
For the Destination Port split:
H(S1) = 0
H(S2) = -(3/8) log2(3/8) - (5/8) log2(5/8)
|S1|/|S| = 6/14, |S2|/|S| = 8/14


Gain Ratio
Define the entropy of the attribute itself:
H(A) = - Σj pj log2 pj
where pj is the probability that attribute A takes value vj. Then
GainRatio(A) = IG(A) / H(A)
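On the same example, a sketch of GainRatio(A), with each subset again given as per-class counts (helper names are illustrative):

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a sample given its per-class counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def gain_ratio(subsets):
    """GainRatio(A) = IG(A) / H(A) for a split into per-class count lists."""
    sizes = [sum(s) for s in subsets]
    n = sum(sizes)
    totals = [sum(col) for col in zip(*subsets)]   # class counts of the full sample S
    ig = entropy(totals) - sum(sz / n * entropy(s) for sz, s in zip(sizes, subsets))
    return ig / entropy(sizes)                     # entropy(sizes) is H(A)

# Destination Port: S1 = {6A, 0B}, S2 = {3A, 5B}
print(round(gain_ratio([[6, 0], [3, 5]]), 3))          # 0.401
# Duration: S1 = {2A, 2B}, S2 = {5A, 0B}, S3 = {2A, 3B}
print(round(gain_ratio([[2, 2], [5, 0], [2, 3]]), 3))  # 0.195
```

Note how the denominator H(A) penalizes Duration's three-way split more heavily than information gain alone does.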

Gain Ratio
For the Destination Port attribute:
H(Destination Port) = -(6/14) log2(6/14) - (8/14) log2(8/14)
where |S1|/|S| = 6/14 and |S2|/|S| = 8/14.

Security Applications
Decision trees have been used in:
- Intrusion detection [> 11 papers]
- Online dynamic security assessment [He et al. ISGT 12]
- Password checking [Bergadano et al. CCS 97]
- Database inference [Chang, Moskowitz NSPW 98]
- Analyzing malware [Ravula et al. KDIR 11]