Decision Trees (suggested time: 30 min)

Presentation on theme: "Decision Trees (suggested time: 30 min)"— Presentation transcript:

1 Decision Trees (suggested time: 30 min)
Definition
Mechanism
Splitting functions
Issues in decision-tree learning (if time permits): avoiding overfitting through pruning; numeric and missing attributes
Applications to security
What is machine learning?

2 Example: Learning to identify Spam
A small decision tree for spam filtering. The root asks "Is the sender unknown?": if No, the message is classified Not Spam; if Yes, a second test on the number of recipients decides the class: < N leads to Not Spam, ≥ N leads to Spam.

3 Definition
A decision-tree learning algorithm approximates a target concept using a tree representation, where each internal node corresponds to an attribute and each terminal node corresponds to a class. There are two types of nodes:
Internal node: splits into different branches according to the different values the corresponding attribute can take. Example: Number of recipients <= N or Number of recipients > N.
Terminal node: decides the class assigned to the example.
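The two node types can be sketched as a pair of tiny Python classes (the class and field names here are illustrative, not from the slides):

```python
class InternalNode:
    """Splits into branches according to the values of its attribute."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # attribute tested at this node
        self.branches = branches     # dict: attribute value -> child node

class TerminalNode:
    """Decides the class assigned to any example that reaches it."""
    def __init__(self, label):
        self.label = label

# The spam tree's root as an internal node with two branches:
root = InternalNode("sender_unknown",
                    {"No": TerminalNode("Not Spam"),
                     "Yes": InternalNode("recipients_at_least_N",
                                         {"No": TerminalNode("Not Spam"),
                                          "Yes": TerminalNode("Spam")})})
```

An internal node carries one branch per attribute value; a terminal node carries only a class label.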

4 Classifying Examples
To classify X = (Unknown Sender, Number of recipients > N), follow the tree from the root: the sender is unknown, so take the Yes branch to the recipients test; the number of recipients is ≥ N, so the assigned class is Spam.
Notes: this basic spam filter decides whether a message is spam using two tests. "Is the sender unknown?" is a categorical variable with yes/no values; "Number of recipients" is a numerical variable over the naturals. Of course this does not give the complete picture, but it is enough to filter some messages (and probably produce some false positives).
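The classification walk above can be written as a short function. This is one plausible reading of the slide's tree; the threshold N is left symbolic on the slides, so the value 10 below is an arbitrary stand-in:

```python
N = 10  # assumed threshold; the slides keep N symbolic

def classify(example):
    """example: dict with 'sender_unknown' (bool) and 'recipients' (int)."""
    if not example["sender_unknown"]:
        return "Not Spam"            # known sender -> Not Spam
    if example["recipients"] >= N:   # unknown sender, many recipients
        return "Spam"
    return "Not Spam"                # unknown sender, few recipients

# X = (Unknown Sender, Number of recipients > N)
x = {"sender_unknown": True, "recipients": 25}
print(classify(x))  # -> Spam
```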

5 Appropriate Problems for Decision Trees
Attributes may be numeric or nominal.
The target function takes on a discrete number of values.
The data may have errors.
Some examples may have missing attribute values.

6 Issues in Decision-Tree Learning
Outline (revisited): Definition; Mechanism; Splitting functions; Issues in decision-tree learning: avoiding overfitting through pruning, numeric and missing attributes.

7 Historical Information
Ross Quinlan – Induction of Decision Trees. Machine Learning 1:81-106, 1986 (over 8 thousand citations).

8 Historical Information
Leo Breiman – CART (Classification and Regression Trees), 1984.

9 Mechanism
There are different ways to construct trees from data. We will concentrate on the top-down, greedy search approach. Basic idea:
1. Choose the best attribute a* to place at the root of the tree.
2. Separate the training set D into subsets {D1, D2, ..., Dk}, where each subset Di contains the examples sharing the same value of a*.
3. Recursively apply the algorithm to each new subset until all examples have the same class or there are too few of them to split further.

10 Attributes: Destination Port and Duration
[Figure: scatter plot of examples over Duration (thresholds D2, D3) and Destination Port (threshold P1); Class A: Attack, Class B: Benign.]
Example 2 – Intrusion detection in networks. The attributes chosen are the destination port of the message and the duration of the message. Other features of such attacks can be found in: Brugger, S. Terry, "Data mining methods for network intrusion detection," University of California at Davis (2004).
Destination Port has two values: > P1 or <= P1. Duration has three ranges: > D2; <= D2 and > D3; <= D3.

11 Suppose we choose Destination Port
The root now splits on Destination Port into the branches > P1 and <= P1. One branch is pure (all class A, Attack) and becomes a terminal node labeled A; the other still mixes both classes and is left as "?", to be split further. [Figure: the plot from the previous slide with the P1 boundary drawn in; Class A: Attack, Class B: Benign.]

12 Suppose we choose Duration as the next best attribute:
Splitting the remaining "?" branch on Duration yields three sub-branches (> D2; > D3 and <= D2; <= D3), each of which becomes a terminal node labeled A or B. [Figure: the completed tree and the plot partitioned by P1, D2, and D3; Class A: Attack, Class B: Benign.]

13 Formal Mechanism
Create a root for the tree.
If all examples are of the same class, or the number of examples is below a threshold, return that class.
If no attributes are available, return the majority class.
Let a* be the best attribute.
For each possible value v of a*:
  Add a branch below a* labeled "a* = v".
  Let Sv be the subset of examples where a* = v.
  Recursively apply the algorithm to Sv.
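The steps above can be sketched in Python. This is a sketch, not a full ID3 implementation: attribute selection is passed in as a function (`choose_best`), since the splitting criteria (information gain, gain ratio) are only defined on later slides; the toy selector used below simply takes the first attribute.

```python
from collections import Counter

def build_tree(examples, attributes, choose_best, min_examples=2):
    """Sketch of the slide's algorithm.
    examples: list of (features, label) pairs, features a dict;
    choose_best(examples, attributes) picks the best attribute a*."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: all same class, too few examples, or no attributes left.
    if len(set(labels)) == 1 or len(examples) < min_examples or not attributes:
        return majority
    a = choose_best(examples, attributes)            # best attribute a*
    rest = [b for b in attributes if b != a]
    tree = {}
    for v in sorted({x[a] for x, _ in examples}):    # branch "a* = v"
        subset = [(x, y) for x, y in examples if x[a] == v]   # Sv
        tree[(a, v)] = build_tree(subset, rest, choose_best, min_examples)
    return tree

# Toy run with the simplest possible selector (first attribute):
data = [({"port": "high"}, "A"), ({"port": "high"}, "A"),
        ({"port": "low"}, "B"), ({"port": "low"}, "B")]
print(build_tree(data, ["port"], lambda exs, attrs: attrs[0]))
# -> {('port', 'high'): 'A', ('port', 'low'): 'B'}
```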

14 What attribute is the best to split the data?
Let us recall some definitions from information theory. The uncertainty, or entropy, associated with a random variable X is defined as
H(X) = - Σi pi log2 pi,
where the logarithm is in base 2. This is the "average amount of information or entropy of a finite complete probability scheme" (Introduction to Information Theory, F. Reza).

15 There are two possible complete events A and B
(Example: flipping a biased coin.)
P(A) = 1/256, P(B) = 255/256: H(X) ≈ 0.037 bit
P(A) = 1/2, P(B) = 1/2: H(X) = 1 bit
P(A) = 7/16, P(B) = 9/16: H(X) ≈ 0.989 bit
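The three coin entropies can be checked directly from the definition:

```python
import math

def entropy(probs):
    """H(X) = -sum(p * log2 p); terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(round(entropy([1/256, 255/256]), 3))  # 0.037 bit
print(entropy([1/2, 1/2]))                  # 1.0 bit
print(round(entropy([7/16, 9/16]), 3))      # 0.989 bit
```

A very biased coin is almost certain, so a flip carries almost no information; a fair coin carries the full 1 bit.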

16 Entropy is a function concave downward.
[Figure: binary entropy H(p) versus p; H reaches its maximum of 1 bit at p = 0.5 and falls to 0 at p = 0 and p = 1.]

17 Attributes: Destination Port and Duration
[Figure repeated: the Duration × Destination Port plot with thresholds P1, D2, D3; Class A: Attack, Class B: Benign.]
Destination Port has two values: > P1 or <= P1. Duration has three ranges: > D2; <= D2 and > D3; <= D3.

18 Splitting based on Entropy
Destination Port divides the sample in two:
S1 = {6A, 0B}, S2 = {3A, 5B}
H(S1) = 0
H(S2) = -(3/8)log2(3/8) - (5/8)log2(5/8)

19 Splitting based on Entropy
Duration divides the sample in three:
S1 = {2A, 2B}, S2 = {5A, 0B}, S3 = {2A, 3B}
H(S1) = 1
H(S2) = 0
H(S3) = -(2/5)log2(2/5) - (3/5)log2(3/5)

20 Information Gain
IG(A) = H(S) - Σv (|Sv|/|S|) H(Sv)
H(S) is the entropy of the full set of examples; H(Sv) is the entropy of the subset Sv obtained by partitioning S on value v of attribute A; the sum runs over all possible values v of A.

21 Components of IG(Destination Port)
H(S) = -(9/14)log2(9/14) - (5/14)log2(5/14)
H(S1) = 0
H(S2) = -(3/8)log2(3/8) - (5/8)log2(5/8)
|S1|/|S| = 6/14, |S2|/|S| = 8/14
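Combining these components gives the information gain of Destination Port on this sample:

```python
import math

def H(*counts):
    """Entropy of a sample given its per-class counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c:
            h -= c / total * math.log2(c / total)
    return h

# S = {9A, 5B}; Destination Port splits S into S1 = {6A, 0B}, S2 = {3A, 5B}
IG = H(9, 5) - (6/14) * H(6, 0) - (8/14) * H(3, 5)
print(round(IG, 3))  # 0.395
```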

23 Gain Ratio
Let us define the entropy of the attribute itself:
H(A) = - Σj pj log2 pj,
where pj is the probability that attribute A takes value Vj. Then
GainRatio(A) = IG(A) / H(A)

24 Gain Ratio (continued)
For A = Destination Port, the branch proportions are |S1|/|S| = 6/14 and |S2|/|S| = 8/14, so the attribute's own entropy is
H(Destination Port) = -(6/14)log2(6/14) - (8/14)log2(8/14)

25 Security Applications
Decision trees have been used in:
Intrusion detection [> 11 papers]
Online dynamic security assessment [He et al., ISGT 12]
Password checking [Bergadano et al., CCS 97]
Database inference [Chang, Moskowitz, NSPW 98]
Analyzing malware [Ravula et al., KDIR 11]

