3-1 Decision Tree Learning Kelby Lee

3-2 Overview
• What is a Decision Tree
• ID3
• REP
• IREP
• RIPPER
• Application

3-3 What is a Decision Tree

3-4 What is a Decision Tree
• Select the attribute that best classifies the examples
• Top down: start with a concept that represents all examples
• Greedy algorithm: at each node, select the attribute that classifies the maximum number of examples
• Does not backtrack
• ID3

3-5 ID3 Algorithm
ID3(Examples, Target_attribute, Attributes)
• Create a Root node for the tree
• If all Examples are positive, return the single-node tree Root with label = +
• If all Examples are negative, return the single-node tree Root with label = -
• If Attributes is empty, return the single-node tree Root with label = the most common value of Target_attribute in Examples

3-6 ID3 Algorithm
• Otherwise:
  A ← Best_Attribute(Attributes, Examples)
  Root ← A
  For each value vi of A:
    - Add a new tree branch for A = vi
    - Let Examples_vi be the subset of Examples with value vi for A
    - If Examples_vi is empty, add a leaf node with label = the most common value of Target_attribute in Examples
    - Otherwise, add the subtree ID3(Examples_vi, Target_attribute, Attributes - {A})
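
To make the recursion concrete, here is a minimal sketch in Python (not the presenter's code). It assumes training examples are dictionaries mapping attribute names to values, that the target attribute is nominal, and that Best_Attribute is the entropy-based information gain introduced on the next slides; the values argument lists the possible values of each attribute so that empty branches can be handled as in the pseudocode.

import math
from collections import Counter

def entropy(examples, target):
    # Entropy of the target labels over a list of example dicts.
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    # Expected reduction in entropy from splitting on one attribute.
    total = len(examples)
    remainder = 0.0
    for value in set(e[attribute] for e in examples):
        subset = [e for e in examples if e[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attributes, values):
    # values maps each attribute name to its list of possible values.
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:          # all examples positive or all negative
        return labels[0]
    if not attributes:                 # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for v in values[best]:
        subset = [e for e in examples if e[best] == v]
        if not subset:                 # empty branch: majority label of Examples
            tree[best][v] = Counter(labels).most_common(1)[0][0]
        else:
            tree[best][v] = id3(subset, target,
                                [a for a in attributes if a != best], values)
    return tree

# Hypothetical toy data; attribute names are invented for illustration.
data = [
    {"outlook": "sunny",    "windy": "false", "play": "no"},
    {"outlook": "overcast", "windy": "true",  "play": "yes"},
    {"outlook": "sunny",    "windy": "true",  "play": "no"},
    {"outlook": "rain",     "windy": "false", "play": "yes"},
]
vals = {"outlook": ["sunny", "overcast", "rain"], "windy": ["false", "true"]}
print(id3(data, "play", ["outlook", "windy"], vals))

The toy call returns a nested dictionary of the form {attribute: {value: subtree-or-label}}.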

3-7 Selecting the Best Attribute
• Measure each candidate attribute by its information gain
• Information gain measures how well a given attribute separates the training examples according to their target classification
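
The slide names information gain without writing it out; for the two-class case the standard definitions (as in Mitchell, Machine Learning, Chapter 3) are:

\mathrm{Entropy}(S) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-}

\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)

where p_+ and p_- are the proportions of positive and negative examples in S, and S_v is the subset of S for which attribute A has value v.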

3-8 Information Gain
Example: four training examples {E1+, E2+, E3-, E4-}.
• Splitting on att1 gives the subsets {E1+, E2+} and {E3-, E4-}
• Splitting on att2 gives the subsets {E1+, E3-} and {E2+, E4-}
• Scores: att1 = 1, att2 = 0.5, so att1 is the better split
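
For the att1 split above, both child subsets are pure, so a worked calculation with the definitions from the previous slide gives:

\mathrm{Entropy}(S) = -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1, \qquad \mathrm{Gain}(S, \mathrm{att1}) = 1 - \tfrac{2}{4}\cdot 0 - \tfrac{2}{4}\cdot 0 = 1

which matches the score of 1 shown for att1.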

3-9 Tree Pruning
• Trees that fit the training data too closely overfit
• Pruning simplifies the tree
• In most cases pruning improves accuracy

3-10 REP
• Reduced Error Pruning
• Deletes single conditions or single rules
• Improves accuracy on noisy data
• O(n^4) on large data sets

3-11 IREP
• Incremental Reduced Error Pruning
• Grows and prunes one rule at a time, then removes all examples covered by that rule
• Stops when no positive examples remain or when pruning produces an unacceptably high error rate

3-12 IREP Algorithm
PROCEDURE IREP(Pos, Neg)
BEGIN
  Ruleset := ∅
  WHILE Pos ≠ ∅ DO
    /* Grow and Prune a New Rule */
    split (Pos, Neg) into (GrowPos, GrowNeg) and (PrunePos, PruneNeg)
    Rule := GrowRule(GrowPos, GrowNeg)
    Rule := PruneRule(Rule, PrunePos, PruneNeg)

3-13 IREP Algorithm
    IF the error rate of Rule on (PrunePos, PruneNeg) exceeds 50% THEN
      RETURN Ruleset
    ELSE
      Add Rule to Ruleset
      Remove examples covered by Rule from (Pos, Neg)
    ENDIF
  ENDWHILE
  RETURN Ruleset
END
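
As a rough illustration of the loop above, a Python skeleton follows. It is a sketch, not Cohen's implementation: rules are assumed to be lists of (attribute, value) equality tests, GrowRule and PruneRule are left as caller-supplied stubs, the grow/prune split fraction is a parameter (2/3 is a common choice), and the 50% stopping test is interpreted here as the fraction of covered pruning examples that are negative.

import random

def covers(rule, example):
    # A rule is a list of (attribute, value) tests; all must hold.
    return all(example.get(a) == v for a, v in rule)

def rule_error(rule, prune_pos, prune_neg):
    # Assumed error measure: share of covered pruning examples that are negative.
    p = sum(covers(rule, e) for e in prune_pos)
    n = sum(covers(rule, e) for e in prune_neg)
    return 1.0 if p + n == 0 else n / (p + n)   # a rule covering nothing is rejected

def irep(pos, neg, grow_rule, prune_rule, grow_fraction=2 / 3):
    # Skeleton of the IREP loop; grow_rule and prune_rule are supplied by the caller.
    ruleset = []
    pos, neg = list(pos), list(neg)
    while pos:
        # Split the remaining data into growing and pruning sets.
        random.shuffle(pos)
        random.shuffle(neg)
        gp, gn = int(len(pos) * grow_fraction), int(len(neg) * grow_fraction)
        grow_pos, prune_pos = pos[:gp], pos[gp:]
        grow_neg, prune_neg = neg[:gn], neg[gn:]
        rule = grow_rule(grow_pos, grow_neg)
        rule = prune_rule(rule, prune_pos, prune_neg)
        # Stop when the pruned rule is no better than chance on the pruning data.
        if rule_error(rule, prune_pos, prune_neg) > 0.5:
            return ruleset
        ruleset.append(rule)
        # Remove all examples (positive and negative) covered by the new rule.
        pos = [e for e in pos if not covers(rule, e)]
        neg = [e for e in neg if not covers(rule, e)]
    return ruleset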

3-14 RIPPER
• Repeated grow-and-simplify produces quite different rule sets than REP
• Repeatedly prunes the rule set to minimize its error
• Repeated Incremental Pruning to Produce Error Reduction (RIPPER)

3-15 RIPPER Algorithm
PROCEDURE RIPPERk(Pos, Neg)
BEGIN
  Ruleset := IREP(Pos, Neg)
  REPEAT k TIMES
    Ruleset := Optimize(Ruleset, Pos, Neg)
    UncovPos := Pos \ {examples covered by Ruleset}
    UncovNeg := Neg \ {examples covered by Ruleset}
    Ruleset := Ruleset ∪ IREP(UncovPos, UncovNeg)
  ENDREPEAT
  RETURN Ruleset
END

3-16 Optimization Function
FUNCTION Optimize(Ruleset, Pos, Neg)
BEGIN
  FOR each rule r ∈ Ruleset DO
    split (Pos, Neg) into (GrowPos, GrowNeg) and (PrunePos, PruneNeg)
    /* Compute a replacement for r, grown from scratch */
    r' := GrowRule(GrowPos, GrowNeg)
    r' := PruneRule(r', PrunePos, PruneNeg),
          with pruning guided by the error of Ruleset \ {r} ∪ {r'}

3-17 Optimization Function
    /* Compute a revision of r, grown by adding conditions to r */
    r'' := GrowRule(GrowPos, GrowNeg) starting from r
    r'' := PruneRule(r'', PrunePos, PruneNeg),
          with pruning guided by the error of Ruleset \ {r} ∪ {r''}
    Replace r in Ruleset with the best of r, r', r'',
          guided by the description length of Compress(Ruleset \ {r} ∪ {x})
  ENDFOR
  RETURN Ruleset
END

3-18 RIPPER Data
3,6.0E+00,6.0E+00,4.0E+00,none,35,empl_contr, E+00,14,false,9,gnr,true,full,true,full,good.
2,4.5E+00,4.0E+00, E+00,none,40,empl_contr, E+00,4,false,10,gnr,true,half,true,full,good.
3,5.0E+00,5.0E+00,5.0E+00,none,40,empl_contr, E+00, E+00,false,12,avg,true,half,true,half,good.
2,4.6E+00,4.6E+00, E+00,tcf,38,empl_contr, E+00, E+00,false, E+01,ba,true,half,true,half,good.

3-19 RIPPER Names file
good,bad.
dur: continuous.
wage1: continuous.
wage2: continuous.
wage3: continuous.
cola: none, tcf, tc.
hours: continuous.
pension: none, ret_allw, empl_contr.
stby_pay: continuous.
shift_diff: continuous.
educ_allw: false, true.
holidays: continuous.
vacation: ba, avg, gnr.
lngtrm_disabil: false, true.
dntl_ins: none, half, full.
bereavement: false, true.
empl_hplan: none, half, full.

3-20 RIPPER Output
Final hypothesis is:
bad :- wage1<=2.8 (14/3).
bad :- lngtrm_disabil=false (5/0).
default good (34/1).
===================== summary ==================
Train error rate: 7.02% +/- 3.41% (57 datapoints) <<
Hypothesis size: 2 rules, 4 conditions
Learning time: 0.01 sec

3-21 RIPPER Hypothesis
bad 14 3 IF wage1 <= 2.8.
bad 5 0 IF lngtrm_disabil = false.
good 34 1 IF ..
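
Read as an ordered rule list with a default, the hypothesis says: classify a contract as bad if wage1 <= 2.8, otherwise as bad if lngtrm_disabil is false, and otherwise as good. A small illustrative Python snippet; the record below is hypothetical and uses field names from the names file above:

def classify_contract(record):
    # Apply the learned rules in order; the empty-condition rule is the default.
    if record["wage1"] <= 2.8:
        return "bad"
    if record["lngtrm_disabil"] == "false":
        return "bad"
    return "good"

# Hypothetical example record (values invented for illustration).
example = {"wage1": 4.5, "lngtrm_disabil": "true"}
print(classify_contract(example))  # -> "good"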

3-22 IDS
• Intrusion Detection System

3-23 IDS
• Use data mining to detect anomalies
• Better than pattern matching, since it may detect previously unseen attacks

3-24 RIPPER IDS data
86, , ,2698, ,22,6,17,40,2096, ,14054,normal.
87, , ,22,192p168p0p120,2698,6,16,40,58387, ,46725,normal.
 , , ,80, ,2703,6,16,40,58400, ,46738,anomaly.
12, , ,80, ,2703,6,16,1500,58400, ,45277,anomaly.

3-25 RIPPER IDS names
normal,anomaly.
recID: ignore.
timestamp: symbolic.
sourceIP: set.
sourcePORT: symbolic.
destIP: set.
destPORT: symbolic.
protocol: symbolic.
flags: symbolic.
length: symbolic.
winsize: symbolic.
ack: symbolic.
checksum: symbolic.

3-26 RIPPER Output
Final hypothesis is:
anomaly :- sourcePORT='80' (33/0).
anomaly :- destPORT='80' (35/0).
anomaly :- ack=' e+07' (3/0).
anomaly :- ack=' e+07' (2/0).
default normal (87/0).
================= summary =====================
Train error rate: 0.00% +/- 0.00% (160 datapoints) <<
Hypothesis size: 4 rules, 8 conditions
Learning time: 0.01 sec

3-27 RIPPER Output
anomaly 33 0 IF sourcePORT = 80.
anomaly 35 0 IF destPORT = 80.
anomaly 3 0 IF ack = e+07.
anomaly 2 0 IF ack = e+07.
normal 87 0 IF ..

3-28 IDS Output

3-29 IDS Output

3-30 Conclusion
• What is a Decision Tree
• ID3
• REP
• IREP
• RIPPER
• Application