# Decision Tree Learning (Kelby Lee)


3-1 Decision Tree Learning Kelby Lee

3-2 Overview
- What is a Decision Tree
- ID3
- REP
- IREP
- RIPPER
- Application

3-3 What is a Decision Tree

3-4 What is a Decision Tree
- Select the best attribute that classifies the examples
- Top-down: start with a concept that represents all examples
- Greedy algorithm: select the attribute that classifies the maximum number of examples
- Does not backtrack
- ID3

3-5 ID3 Algorithm
- ID3(Examples, Target_attribute, Attributes)
- Create a Root node for the tree
- If all Examples are positive, return the single-node tree Root with label = +
- If all Examples are negative, return the single-node tree Root with label = -
- If Attributes is empty, return the single-node tree Root with label = most common value of Target_attribute in Examples

3-6 ID3 Algorithm
- Otherwise:
  - A ← Best_Attribute(Attributes, Examples)
  - Root ← A
  - For each value v_i of A:
    - Add a new tree branch for A = v_i
    - Let Examples_vi be the subset of Examples that have value v_i for A
    - If Examples_vi is empty, add a leaf node with label = most common value of Target_attribute in Examples
    - Otherwise, add a new subtree: ID3(Examples_vi, Target_attribute, Attributes - {A})
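The recursion on slides 3-5 and 3-6 can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: examples are assumed to be dicts, `Best_Attribute` is assumed to be entropy-based information gain, and the names `entropy`, `information_gain`, and `id3` are illustrative. The sketch only branches on attribute values that actually occur in the examples, so the empty-subset case of the pseudocode never arises here.

```python
from collections import Counter
from math import log2


def entropy(examples, target):
    """Shannon entropy of the target labels in `examples`."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())


def information_gain(examples, attr, target):
    """Entropy reduction obtained by splitting `examples` on `attr`."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder


def id3(examples, target, attributes):
    """Return a label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:      # all examples share one label
        return labels[0]
    if not attributes:             # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({e[best] for e in examples}):
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = id3(subset, target, remaining)
    return tree


# Toy data (hypothetical): att1 separates the classes, att2 does not.
examples = [
    {"att1": "a", "att2": "x", "label": "+"},
    {"att1": "a", "att2": "y", "label": "+"},
    {"att1": "b", "att2": "x", "label": "-"},
    {"att1": "b", "att2": "y", "label": "-"},
]
tree = id3(examples, "label", ["att1", "att2"])
print(tree)  # {'att1': {'a': '+', 'b': '-'}}
```

Because the algorithm is greedy and never backtracks, the tree it returns depends entirely on the attribute chosen at each node.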

3-7 Selecting Best Attribute
- New property of an attribute: Information Gain
- Information Gain measures how well a given attribute separates the training examples according to their target classification

3-8 Information Gain
- Splitting {E1+, E2+, E3-, E4-} on att1 gives {E1+, E2+} and {E3-, E4-}
- Splitting {E1+, E2+, E3-, E4-} on att2 gives {E1+, E3-} and {E2+, E4-}
- att1 = 1, att2 = 0.5
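As a worked check of slide 3-8, the two splits can be scored with the usual entropy-based definition of information gain. The slides do not give the formula, so the definition used here is an assumption taken from the standard ID3 formulation. With S = {E1+, E2+, E3-, E4-}:

$$
\begin{aligned}
\mathrm{Entropy}(S) &= -\tfrac{2}{4}\log_2\tfrac{2}{4} - \tfrac{2}{4}\log_2\tfrac{2}{4} = 1 \\
\mathrm{Gain}(S, \mathit{att1}) &= 1 - \left(\tfrac{2}{4}\cdot 0 + \tfrac{2}{4}\cdot 0\right) = 1 \\
\mathrm{Gain}(S, \mathit{att2}) &= 1 - \left(\tfrac{2}{4}\cdot 1 + \tfrac{2}{4}\cdot 1\right) = 0
\end{aligned}
$$

Under this definition att1 scores 1, matching the slide, while att2 scores 0 rather than the 0.5 shown. The slide's 0.5 may instead be the fraction of examples that a majority vote in each att2 branch would classify correctly (2 of 4). Either way, att1 is the better split.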

3-9 Tree Pruning
- Overfit and simplify
- Simplify the tree
- In most cases it improves accuracy

3-10 REP
- Reduced Error Pruning
- Deletes single conditions or single rules
- Improves on noisy data
- O(n^4) on large data sets

3-11 IREP
- Incremental Reduced Error Pruning
- Produces one rule at a time and eliminates all examples covered by that rule
- Stops when no positive examples remain or when pruning produces a rule with unacceptable error

3-12 IREP Algorithm

    PROCEDURE IREP(Pos, Neg)
    BEGIN
      Ruleset := ∅
      WHILE Pos != ∅ DO
        /* Grow and Prune a New Rule */
        split (Pos, Neg) into (GrowPos, GrowNeg) and (PrunePos, PruneNeg)
        Rule := GrowRule(GrowPos, GrowNeg)
        Rule := PruneRule(Rule, PrunePos, PruneNeg)

3-13 IREP Algorithm

        IF error rate of Rule on (PrunePos, PruneNeg) exceeds 50% THEN
          RETURN Ruleset
        ELSE
          Add Rule to Ruleset
          Remove examples covered by Rule from (Pos, Neg)
        ENDIF
      ENDWHILE
      RETURN Ruleset
    END
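The IREP loop on slides 3-12 and 3-13 can be sketched in Python. The slides do not say how `GrowRule` and `PruneRule` work internally, so everything inside them here is an assumption: rules are lists of attribute = value tests, growing uses FOIL-style gain, pruning keeps the prefix that maximizes (p - n) / (p + n) on the pruning set, and the data is split roughly 2/3 for growing and 1/3 for pruning. Treat it as an illustration, not the original implementation.

```python
import random
from math import log2


def covers(rule, example):
    """A rule is a list of (attribute, value) equality tests; [] covers everything."""
    return all(example[a] == v for a, v in rule)


def grow_rule(pos, neg, attributes):
    """Greedily add the test with the highest FOIL gain until no negatives remain."""
    rule = []
    while neg:
        p0, n0 = len(pos), len(neg)
        best, best_gain = None, 0.0
        for cond in sorted({(a, e[a]) for e in pos for a in attributes}):
            p = sum(covers([cond], e) for e in pos)
            n = sum(covers([cond], e) for e in neg)
            gain = p * (log2(p / (p + n)) - log2(p0 / (p0 + n0)))
            if gain > best_gain:
                best, best_gain = cond, gain
        if best is None:  # no test improves on the current rule
            break
        rule.append(best)
        pos = [e for e in pos if covers([best], e)]
        neg = [e for e in neg if covers([best], e)]
    return rule


def rule_value(rule, pos, neg):
    """Pruning metric (p - n) / (p + n); -1 if the rule covers nothing."""
    p = sum(covers(rule, e) for e in pos)
    n = sum(covers(rule, e) for e in neg)
    return (p - n) / (p + n) if p + n else -1.0


def prune_rule(rule, prune_pos, prune_neg):
    """Drop a final sequence of tests, keeping the best-scoring prefix."""
    prefixes = [rule[:k] for k in range(len(rule), 0, -1)]
    return max(prefixes, key=lambda r: rule_value(r, prune_pos, prune_neg), default=rule)


def irep(pos, neg, attributes, seed=0):
    rng = random.Random(seed)
    pos, neg, ruleset = list(pos), list(neg), []
    while pos:
        # Split into a growing set (about 2/3) and a pruning set (about 1/3).
        rng.shuffle(pos)
        rng.shuffle(neg)
        cp, cn = (2 * len(pos) + 2) // 3, (2 * len(neg) + 2) // 3
        rule = grow_rule(pos[:cp], neg[:cn], attributes)
        rule = prune_rule(rule, pos[cp:], neg[cn:])
        p = sum(covers(rule, e) for e in pos[cp:])
        n = sum(covers(rule, e) for e in neg[cn:])
        if n > p:  # error rate on the pruning set exceeds 50%
            return ruleset
        ruleset.append(rule)
        pos = [e for e in pos if not covers(rule, e)]
        neg = [e for e in neg if not covers(rule, e)]
    return ruleset


# Toy data (hypothetical): colour decides the class, size is noise.
attributes = ["color", "size"]
pos = [{"color": "red", "size": s} for s in ("s", "l", "s", "l", "s", "l")]
neg = [{"color": "blue", "size": s} for s in ("s", "l", "s", "l", "s", "l")]
ruleset = irep(pos, neg, attributes)
print(ruleset)  # [[('color', 'red')]]
```

Each accepted rule covers at least one remaining positive example, so `pos` shrinks on every pass and the loop terminates.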

3-14 RIPPER
- Repeated grow-and-simplify produces quite different results than REP
- Repeatedly prune the rule set to minimize the error
- Repeated Incremental Pruning to Produce Error Reduction (RIPPER)

3-15 RIPPER Algorithm

    PROCEDURE RIPPERk(Pos, Neg)
    BEGIN
      Ruleset := IREP(Pos, Neg)
      REPEAT k TIMES
        Ruleset := Optimize(Ruleset, Pos, Neg)
        UncovPos := Pos \ {data covered by Ruleset}
        UncovNeg := Neg \ {data covered by Ruleset}
        Ruleset := Ruleset ∪ IREP(UncovPos, UncovNeg)
      ENDREPEAT
      RETURN Ruleset
    END
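The outer loop of RIPPERk is short enough to sketch directly. In the sketch below, `irep`, `optimize`, and `covers` are passed in as parameters, and the toy stand-ins used in the demonstration are hypothetical placeholders that only show the control flow. They are not real rule learners.

```python
def ripper_k(pos, neg, k, irep, optimize, covers):
    """Outer loop of RIPPERk; the three callables are supplied by the caller."""
    ruleset = irep(pos, neg)
    for _ in range(k):
        ruleset = optimize(ruleset, pos, neg)
        # Examples that no rule in the current rule set covers.
        uncov_pos = [e for e in pos if not any(covers(r, e) for r in ruleset)]
        uncov_neg = [e for e in neg if not any(covers(r, e) for r in ruleset)]
        # Learn additional rules for whatever is still uncovered.
        ruleset = ruleset + irep(uncov_pos, uncov_neg)
    return ruleset


# Toy stand-ins (hypothetical): examples are integers and a "rule" is a
# single integer that covers only itself.
def toy_irep(pos, neg):
    return [min(pos)] if pos else []


def toy_optimize(ruleset, pos, neg):
    return ruleset  # no-op placeholder for the Optimize step


def toy_covers(rule, example):
    return rule == example


rules = ripper_k([1, 2, 3], [10], k=2,
                 irep=toy_irep, optimize=toy_optimize, covers=toy_covers)
print(rules)  # [1, 2, 3]
```

With these stand-ins each pass picks up one more uncovered positive example, which mirrors how the real loop adds rules for data the optimized rule set no longer covers.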

3-16 Optimization Function

    FUNCTION Optimize(Ruleset, Pos, Neg)
    BEGIN
      FOR each rule r ∈ Ruleset DO
        split (Pos, Neg) into (GrowPos, GrowNeg) and (PrunePos, PruneNeg)
        /* Compute Replacement for r */
        r' := GrowRule(GrowPos, GrowNeg)
        r' := PruneRule(r', PrunePos, PruneNeg)
              guided by error of Ruleset \ {r} ∪ {r'}

3-17 Optimization Function

        /* Compute Revision of r (grown from r rather than from the empty rule) */
        r'' := GrowRule(GrowPos, GrowNeg)
        r'' := PruneRule(r'', PrunePos, PruneNeg)
               guided by error of Ruleset \ {r} ∪ {r''}
        Replace r in Ruleset with best of r, r', r''
               guided by description length of Compress(Ruleset \ {r} ∪ {x})
      ENDFOR
      RETURN Ruleset
    END

3-18 RIPPER Data

    3,6.0E+00,6.0E+00,4.0E+00,none,35,empl_contr,7.444444444444445E+00,14,false,9,gnr,true,full,true,full,good.
    2,4.5E+00,4.0E+00,3.913333333333334E+00,none,40,empl_contr,7.444444444444445E+00,4,false,10,gnr,true,half,true,full,good.
    3,5.0E+00,5.0E+00,5.0E+00,none,40,empl_contr,7.444444444444445E+00,4.870967741935484E+00,false,12,avg,true,half,true,half,good.
    2,4.6E+00,4.6E+00,3.913333333333334E+00,tcf,38,empl_contr,7.444444444444445E+00,4.870967741935484E+00,false,1.109433962264151E+01,ba,true,half,true,half,good.

3-19 RIPPER Names file

    good,bad.
    dur:continuous.
    wage1:continuous.
    wage2:continuous.
    wage3:continuous.
    cola:none, tcf, tc.
    hours:continuous.
    pension:none, ret_allw, empl_contr.
    stby_pay:continuous.
    shift_diff:continuous.
    educ_allw:false, true.
    holidays:continuous.
    vacation:ba, avg, gnr.
    lngtrm_disabil:false, true.
    dntl_ins:none, half, full.
    bereavement:false, true.
    empl_hplan:none, half, full.
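To make the relationship between the names file and the data file concrete, here is a small Python sketch that reads both. The parsing rules are inferred only from the samples on slides 3-18 and 3-19: entries end with a period, the first entry lists the classes, and the class label is the last field of each record. They may not cover the full file format RIPPER accepts, and `parse_names` and `parse_record` are illustrative names.

```python
NAMES = """\
good,bad.
dur:continuous.
wage1:continuous.
wage2:continuous.
wage3:continuous.
cola:none, tcf, tc.
hours:continuous.
pension:none, ret_allw, empl_contr.
stby_pay:continuous.
shift_diff:continuous.
educ_allw:false, true.
holidays:continuous.
vacation:ba, avg, gnr.
lngtrm_disabil:false, true.
dntl_ins:none, half, full.
bereavement:false, true.
empl_hplan:none, half, full.
"""

RECORD = ("3,6.0E+00,6.0E+00,4.0E+00,none,35,empl_contr,"
          "7.444444444444445E+00,14,false,9,gnr,true,full,true,full,good.")


def parse_names(text):
    """Return (classes, attributes); attributes maps name -> type or list of values."""
    entries = [ln.strip().rstrip(".") for ln in text.splitlines() if ln.strip()]
    classes = [c.strip() for c in entries[0].split(",")]
    attributes = {}
    for entry in entries[1:]:
        name, spec = entry.split(":", 1)
        values = [v.strip() for v in spec.split(",")]
        # A single token such as "continuous" is a type keyword, not a value list.
        attributes[name.strip()] = values[0] if len(values) == 1 else values
    return classes, attributes


def parse_record(line, attributes):
    """Pair each field with its attribute; the last field is the class label."""
    *values, label = [f.strip() for f in line.strip().rstrip(".").split(",")]
    record = {}
    for (name, spec), raw in zip(attributes.items(), values):
        record[name] = float(raw) if spec == "continuous" else raw
    return record, label


classes, attributes = parse_names(NAMES)
record, label = parse_record(RECORD, attributes)
print(classes)                        # ['good', 'bad']
print(record["wage1"], record["pension"], label)
```

The record has 16 attribute fields plus the class, which lines up one-to-one with the 16 attribute entries in the names file.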

3-20 RIPPER Output

    Final hypothesis is:
    bad :- wage1<=2.8 (14/3).
    bad :- lngtrm_disabil=false (5/0).
    default good (34/1).
    =====================summary==================
    Train error rate: 7.02% +/- 3.41% (57 datapoints) <<
    Hypothesis size: 2 rules, 4 conditions
    Learning time: 0.01 sec

3-21 RIPPER Hypothesis

    bad 14 3 IF wage1 <= 2.8.
    bad 5 0 IF lngtrm_disabil = false.
    good 34 1 IF..
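The hypothesis reads as an ordered rule list: the first rule whose condition holds decides the class, and the default applies when none fires. The Python sketch below encodes the two rules from the slide by hand (the `classify` helper and the dict-based record layout are assumptions for illustration). It also checks the reading that each pair of counts means correctly / incorrectly covered examples, which is consistent with the reported train error.

```python
# Rules from the slide, in order; each is (class, condition on a record).
RULES = [
    ("bad", lambda r: r["wage1"] <= 2.8),
    ("bad", lambda r: r["lngtrm_disabil"] == "false"),
]
DEFAULT = "good"


def classify(record):
    """Return the class of the first rule that fires, else the default class."""
    for label, condition in RULES:
        if condition(record):
            return label
    return DEFAULT


# Coverage counts from the slide, read as (correct, incorrect) per rule,
# with the default rule last. This reading is an assumption.
COVERAGE = [(14, 3), (5, 0), (34, 1)]
total = sum(c + w for c, w in COVERAGE)
errors = sum(w for _, w in COVERAGE)
train_error = errors / total

print(classify({"wage1": 6.0, "lngtrm_disabil": "true"}))   # good
print(classify({"wage1": 2.0, "lngtrm_disabil": "true"}))   # bad
print(classify({"wage1": 4.5, "lngtrm_disabil": "false"}))  # bad
print(f"{total} datapoints, train error {train_error:.2%}")  # 57 datapoints, train error 7.02%
```

The counts sum to the 57 datapoints in the summary, and 4 misclassified examples out of 57 gives the 7.02% shown there.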

3-22 IDS
- Intrusion Detection System

3-23 IDS
- Use data mining to detect anomalies
- Better than pattern matching, since it may be possible to detect previously undiscovered attacks

3-24 RIPPER IDS data

    86,543520084,192168000120,2698,192168000190,22,6,17,40,2096,158723779,14054,normal.
    87,543520084,192168000190,22,192p168p0p120,2698,6,16,40,58387,39130843,46725,normal.
    ...
    11,543520084,192168000190,80,192168000120,2703,6,16,40,58400,39162494,46738,anomaly.
    12,543520084,192168000190,80,192168000120,2703,6,16,1500,58400,39162494,45277,anomaly.

3-25 RIPPER IDS names

    normal,anomaly.
    recID: ignore.
    timestamp: symbolic.
    sourceIP: set.
    sourcePORT: symbolic.
    destIP: set.
    destPORT: symbolic.
    protocol: symbolic.
    flags: symbolic.
    length: symbolic.
    winsize: symbolic.
    ack: symbolic.
    checksum: symbolic.

3-26 RIPPER Output

    Final hypothesis is:
    anomaly :- sourcePORT='80' (33/0).
    anomaly :- destPORT='80' (35/0).
    anomaly :- ack='7.01238e+07' (3/0).
    anomaly :- ack='7.03859e+07' (2/0).
    default normal (87/0).
    =================summary=====================
    Train error rate: 0.00% +/- 0.00% (160 datapoints) <<
    Hypothesis size: 4 rules, 8 conditions
    Learning time: 0.01 sec

3-27 RIPPER Output

    anomaly 33 0 IF sourcePORT = 80.
    anomaly 35 0 IF destPORT = 80.
    anomaly 3 0 IF ack = 7.01238e+07.
    anomaly 2 0 IF ack = 7.03859e+07.
    normal 87 0 IF..

3-28 IDS Output

3-29 IDS Output

3-30 Conclusion
- What is a Decision Tree
- ID3
- REP
- IREP
- RIPPER
- Application

