
1 Using Data Mining to Discover Signatures in Network-Based Intrusion Detection
Hong Han, Xian-Liang Lu, Li-Yong Ren
Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002
Speaker: Li-Chin Huang

2 Outline
1. Introduction
2. SigSniffer – system architecture
3. Signature mining
4. Conclusion
5. Comment

3 Introduction – Intrusion Detection System (IDS)
An intrusion is any set of actions that attempt to compromise the integrity, confidentiality, or availability of a resource. (R. Heady, G. Luger et al., 1990)

4 Introduction – Intrusion Detection System (IDS)
Two kinds of analysis approaches in IDS:
- Misuse detection: uses patterns of well-known attacks or weak spots of the system to identify intrusions; it is unable to detect new, previously unseen intrusions.
- Anomaly detection: establishes normal usage patterns using statistical measures on system features (e.g., CPU and I/O usage); intuition and experience are relied upon in selecting the system features.

5 Introduction – example
A rule of Snort:
alert tcp any 110 -> $HOME_NET any (msg:"LURHQ-01-Virus-Possible Incoming QAZ Worm"; content:"|71 61 7a 77 73 78 2e 68 73 71|";)
The rule matches a TCP packet coming from port 110 that contains the string (signature) 71 61 7a 77 73 78 2e 68 73 71, which indicates that the QAZ worm is attempting to penetrate.
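As a minimal illustration of what the content match in such a rule does (not Snort's actual implementation), the following Python sketch scans a packet payload for the byte signature; the payload value is a made-up example.

```python
# Minimal sketch of a content-signature match, analogous to the Snort
# rule's content:"|71 61 7a 77 73 78 2e 68 73 71|" option.
SIGNATURE = bytes.fromhex("71617a7773782e687371")  # decodes to "qazwsx.hsq"

def matches_signature(payload: bytes, signature: bytes = SIGNATURE) -> bool:
    """Return True if the signature byte string occurs anywhere in the payload."""
    return signature in payload

# Hypothetical payload captured from TCP port 110 (illustrative only).
payload = b"RETR 1\r\n...attachment name: qazwsx.hsq..."
print(matches_signature(payload))  # True -> the rule would alert
```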

6 The structure of SigSniffer
SigSniffer has five parts:
- Packet Sensor: captures packets for the Signature Miner.
- Signature Miner: finds candidate signatures in the captured packets.
- Signature Set: presents the candidate signatures to analysts for further analysis.
- Associated Signature Miner: continues to mine the candidate signatures and finds associations among them.
- Rule Set: the associations of signatures are candidate rules that are sent to the Rule Set.
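A rough sketch of how these components could be chained, purely for illustration: the component names follow the slide, but the packet data and the mining logic are simplified stand-ins, not the paper's algorithms.

```python
# Illustrative dataflow for the SigSniffer components.
from collections import Counter
from itertools import combinations

def packet_sensor():
    """Stand-in for packet capture: returns raw payloads (made-up data)."""
    return [b"abc.hsq login", b"abc.hsq passwd", b"hello world"]

def signature_miner(packets, length=7, min_sup=2):
    """Toy frequent-substring miner (stand-in for Signature Apriori)."""
    counts = Counter(p[i:i + length] for p in packets
                     for i in range(len(p) - length + 1))
    return {s for s, c in counts.items() if c >= min_sup}

def associated_signature_miner(packets, signatures, min_sup=2):
    """Toy association step: pairs of signatures that co-occur often."""
    pairs = Counter(frozenset(pair)
                    for p in packets
                    for pair in combinations([s for s in signatures if s in p], 2))
    return [set(pair) for pair, c in pairs.items() if c >= min_sup]

packets = packet_sensor()
signature_set = signature_miner(packets)                       # shown to analysts
rule_set = associated_signature_miner(packets, signature_set)  # candidate rules
print(signature_set, rule_set)
```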

7 The structure of SigSniffer

8 Experiment
- Get two sets of training data:
  1. Abnormal packets: one set contains the outgoing packets of the attack tool.
  2. Normal packets.
- Every signature generated by Signature Apriori is used as an attribute (see Table 1).
- The sample records are classified into two classes: attack or non-attack.
- ID3 is used to classify the possible signatures.
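To make the "signatures as attributes" idea concrete, here is a small sketch (my own illustration, not code from the paper) that turns packet payloads and candidate signatures into Table-1-style records; the payloads and signature strings are made up.

```python
# Build Table-1-style training records: one boolean attribute per candidate
# signature, plus the is_attack label.
candidate_signatures = [b"qazwsx.hsq", b"cmd.exe", b"GET /index"]

def to_record(payload: bytes, is_attack: bool) -> dict:
    record = {f"Signature{i+1}": (sig in payload)
              for i, sig in enumerate(candidate_signatures)}
    record["is_attack"] = is_attack
    return record

abnormal = [b"...qazwsx.hsq...cmd.exe..."]   # packets from the attack tool
normal = [b"GET /index.html HTTP/1.0"]       # ordinary traffic

training = ([to_record(p, True) for p in abnormal] +
            [to_record(p, False) for p in normal])
for row in training:
    print(row)
```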

9 Experiments and results

10 The results of detection by Snort

11 Conclusion
- The paper presents an algorithm, Signature Apriori (SA).
- It uses the signatures found in packet content.

12 Comment
The bottleneck of Apriori is candidate generation:
- Huge candidate sets: 10^4 frequent 1-itemsets will generate about 10^7 candidate 2-itemsets; to discover a frequent pattern of size 100, e.g. {a_1, a_2, ..., a_100}, one needs to generate 2^100 ≈ 10^30 candidates.
- Multiple scans of the database: Apriori needs (n + 1) scans, where n is the length of the longest pattern.
Alternatives that address this bottleneck: FP-tree, DHP, and Inverted Hashing and Pruning (IHP).
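For reference, the candidate counts quoted above follow from simple combinatorics (my own addition; the exact pair count is about 5 × 10^7, i.e. on the order the slide quotes):

```latex
% Candidate 2-itemsets generated from 10^4 frequent 1-itemsets
\binom{10^4}{2} = \frac{10^4\,(10^4 - 1)}{2} \approx 5 \times 10^{7}
% Candidate subsets examined to reach a single frequent pattern of size 100
2^{100} \approx 1.27 \times 10^{30}
```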

13 The main difference between Apriori and Signature Apriori
1. Apriori: a transaction is an unordered combination of actions, e.g. (PasswordFails, ExecutionDenied).
2. Signature Apriori: a transaction is a permutation of actions, so order matters and (PasswordFails, ExecutionDenied) and (ExecutionDenied, PasswordFails) are distinct; see the example below.
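A tiny Python illustration of the distinction (my own example, not from the paper): unordered pairs collapse the two orderings into one candidate, while ordered pairs keep both.

```python
from itertools import combinations, permutations

actions = ["PasswordFails", "ExecutionDenied"]

# Apriori-style candidates: order does not matter.
print(list(combinations(actions, 2)))
# [('PasswordFails', 'ExecutionDenied')]

# Signature-Apriori-style candidates: order matters (signatures are sequences).
print(list(permutations(actions, 2)))
# [('PasswordFails', 'ExecutionDenied'), ('ExecutionDenied', 'PasswordFails')]
```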

14 Input: Packet 1, Packet 2, Packet 3, ..., Packet n.
Step 1: find a set M(1, sup).
Step 2: C_i = NewCandidateSig (generate the new candidate signatures).
Step 3: find all M(i, sup), i = 1, 2, ..., up to the maximum signature length.
Computing the sets:
(1) From M(1, sup) to M(2, sup): e.g. {a, b, c} -> {ab, ba, ac, ca, bc, cb}.
(2) M(i, sup) for i > 1: e.g. M(3, 0.8) = {'hel', 'elw', 'mhe', 'ooo', 'ddd'} yields the length-4 candidates {helw, mhel} by joining overlapping signatures ('hel' + 'elw' -> 'helw', 'mhe' + 'hel' -> 'mhel'); see the sketch below.
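The following is a minimal sketch of that join step as I read it from the example (overlapping length-k strings by their last/first k-1 characters to form length-(k+1) candidates); it is an interpretation, not the paper's exact NewCandidateSig procedure.

```python
# Generate length-(k+1) candidate signatures by joining length-k signatures
# whose (k-1)-character suffix/prefix overlap, reproducing the slide's example:
# {'hel', 'elw', 'mhe', 'ooo', 'ddd'} -> {'helw', 'mhel'}.
def new_candidate_sigs(frequent_k: set[str]) -> set[str]:
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            if a != b and a[1:] == b[:-1]:   # suffix of a == prefix of b
                candidates.add(a + b[-1])    # extend a by b's last character
    return candidates

m3 = {"hel", "elw", "mhe", "ooo", "ddd"}
print(new_candidate_sigs(m3))  # {'helw', 'mhel'}
```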

15 Algorithm Signature Apriori


18 Intrusion Detection System (IDS)
Purpose: to effectively detect intrusions against hosts coming from internal and external networks.
Classification: intrusion detection systems are generally divided into host-based and network-based types.
- A host-based IDS integrates tightly with the operating system and applications on the host server, so it can detect many attack patterns that a network-based IDS cannot (e.g., web-page defacement, tampering with the OS kernel).
- A network-based IDS monitors connection states on the network and the contents of transmitted packets.

19 Introduction – Anomaly detection
Difficulties:
- Intuition and experience are relied upon.
- Unable to detect any future intrusions.

20 Sample of training data with signatures as attributes
Table 1. Sample of training data with signatures as attributes. Columns: RID, Signature1, Signature2, Signature3, Signature4, Class: is_attack. [The individual cell values are not preserved in the transcript.]

21 The Apriori Algorithm – Example
[Figure: the Apriori algorithm run on an example database D (columns SID, ActionSet); scanning D produces candidate sets C1, C2, C3 and frequent sets L1, L2, L3.]

22 Mining Association Rules – Example
Database D (columns SID, Action); min. support 50%, min. confidence 50%.
Frequent itemsets and their support:
{A1} 2/4 = 50%, {A2} 3/4 = 75%, {A3} 3/4 = 75%, {A5} 3/4 = 75%,
{A1, A3} 2/4 = 50%, {A2, A3} 2/4 = 50%, {A2, A5} 3/4 = 75%, {A3, A5} 2/4 = 50%,
{A2, A3, A5} 2/4 = 50%.
For rule A1 => A3: support = support({A1, A3}) = 50%; confidence = support({A1, A3}) / support({A1}) = 50% / 50% = 100%.
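A short Python sketch of these support/confidence computations. The four transactions below are my own reconstruction chosen to reproduce the listed supports, since the database D itself is not preserved in the transcript.

```python
# Support and confidence for association rules over a 4-transaction database
# consistent with the supports listed on the slide.
D = [
    {"A1", "A3", "A4"},
    {"A2", "A3", "A5"},
    {"A1", "A2", "A3", "A5"},
    {"A2", "A5"},
]

def support(itemset: set) -> float:
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in D) / len(D)

def confidence(antecedent: set, consequent: set) -> float:
    """support(antecedent U consequent) / support(antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"A1", "A3"}))             # 0.5 -> 50%
print(confidence({"A1"}, {"A3"}))        # 1.0 -> 100%
print(confidence({"A2", "A3"}, {"A5"}))  # 1.0
```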

23 ID3 (example)
[Figure: a small decision tree that tests Signature1, Signature2, and Signature3 with yes/no branches to decide is_attack.]

24 Training data set (ID3 example)
This follows an example from Quinlan's ID3. Columns: PasswordFails, SessionCPU, SessionOutput, ProgramResourceExhaustion, Is_attack. [The table rows are not preserved in the transcript.]

25 Information Gain (ID3)
Assume there are two classes, P (is_attack = yes) and N (is_attack = no). Let the set of examples S contain p elements of class P and n elements of class N. The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as
I(p, n) = -(p / (p + n)) log2(p / (p + n)) - (n / (p + n)) log2(n / (p + n))

26 Information Gain (ID3)
If an attribute A partitions the set of examples S into subsets S1, S2, ..., Sv, where Si contains pi examples of class P and ni examples of class N, the expected information (entropy) of A is
E(A) = sum over i of ((pi + ni) / (p + n)) * I(pi, ni)
and the information gained by branching on A is
Gain(A) = I(p, n) - E(A)
[Figure: S split into subsets S1..S4 by the candidate attributes PasswordFails, SessionCPU, SessionOutput, and ProgramResourceExhaustion.]

27 Information Gain (ID3) – Example
[Figure: the training table (PasswordFails, SessionCPU, SessionOutput, ProgramResourceExhaustion, Is_attack) alongside the split of S into subsets S1..S4, repeated from the previous slides.]

28 Information Gain (ID3) – Example
Class positive: Is_attack = "yes"; class negative: Is_attack = "no".
Compute the entropy for PasswordFails:
PasswordFails | pi (Is_attack = yes) | ni (Is_attack = no) | I(pi, ni)
<= 30         | 2                    | 3                   | 0.971
31..40        | 4                    | 0                   | 0
> 40          | 3                    | 2                   | 0.971
total         | 9                    | 5                   |
I(p, n) = I(9, 5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
E(PasswordFails) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.693
Gain(PasswordFails) = I(p, n) - E(PasswordFails) = 0.940 - 0.693 = 0.247
Similarly, Gain(SessionCPU) = 0.029, Gain(SessionOutput) = 0.151, Gain(ProgramResourceExhaustion) = 0.048.
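A small Python check of these numbers (my own sketch; it simply recomputes I, E, and Gain from the counts in the table above):

```python
from math import log2

def info(p: int, n: int) -> float:
    """I(p, n): expected information to classify an example as P or N."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            result -= (count / total) * log2(count / total)
    return result

# Counts (pi, ni) per PasswordFails value, taken from the table above.
splits = {"<=30": (2, 3), "31..40": (4, 0), ">40": (3, 2)}
p, n = 9, 5

e = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in splits.values())
print(round(info(p, n), 3))      # 0.940
print(round(e, 3))               # 0.694 (the slide rounds this to 0.693)
print(round(info(p, n) - e, 3))  # 0.247
```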

29 Output: a decision tree for "is_attack"
Gain(PasswordFails) = 0.247, Gain(SessionOutput) = 0.151, Gain(ProgramResourceExhaustion) = 0.048, Gain(SessionCPU) = 0.029.
[Figure: PasswordFails, having the highest gain, becomes the root, with branches <=30, 31..40, and >40; each branch carries the subset of training records that falls into it.]

30 Output: a decision tree for "is_attack"
[Figure: construction continues — under each PasswordFails branch, the records are split on SessionOutput (no/yes), the attribute with the next-highest gain.]

31 Output: a decision tree for "is_attack"
[Figure: construction continues — the SessionOutput branches are further split on ProgramResourceExhaustion (no/yes).]

32 Output: a decision tree for "is_attack"
[Figure: the completed tree — after splitting on PasswordFails, SessionOutput, and ProgramResourceExhaustion, the remaining branches test SessionCPU and end in Is_attack = yes/no leaves.]

33 Extracting classification rules from trees
Each path from the root to a leaf yields one rule. For example:
If PasswordFails = "<=30" and SessionOutput = "no" and ProgramResourceExhaustion = "no" and SessionCPU = "high" then Is_attack = "no".
[Figure: the completed decision tree, with the path corresponding to this rule highlighted.]

34 FP-Tree (frequent-pattern growth)
Transaction database (SID: List_of_Action):
100: I1, I2, I5
200: I2, I4
300: I2, I3
400: I1, I2, I4
500: I1, I3
600: I2, I3
700: I1, I3
800: I1, I2, I3, I5
900: I1, I2, I3
Step 1: compute L, ordering items by descending support count: L = [I2:7, I1:6, I3:6, I4:2, I5:2].
Step 2: reorder each List_of_Action according to L (e.g., 100 becomes I2, I1, I5; 400 becomes I2, I1, I4).
Step 3: construct the FP-tree by inserting the reordered transactions one by one under the null root.
[Figure: the tree after inserting SID 100 (null -> I2:1 -> I1:1 -> I5:1) and SID 200 (I2:2, with a new child I4:1).]

35 FP-Tree (frequent-pattern growth)
[Figure: FP-tree construction continues, showing the tree after inserting each remaining reordered transaction (SID 300 through 900) one at a time; the completed tree appears on the next slide.]

36 FP-Tree (frequent-pattern growth)
Header table (Item ID, support count, node-link): I2:7, I1:6, I3:6, I4:2, I5:2.
[Figure: the completed FP-tree, with the header-table node-links threading through the nodes of each item.]
Mining the tree from its conditional pattern bases:
Item | Conditional pattern base        | Frequent patterns generated
I5   | {(I2 I1:1), (I2 I1 I3:1)}       | I2 I5:2, I1 I5:2, I2 I1 I5:2
I4   | {(I2 I1:1), (I2:1)}             | I2 I4:2
I3   | {(I2 I1:2), (I2:2), (I1:2)}     | I2 I3:4, I1 I3:4, I2 I1 I3:2
I1   | {(I2:4)}                        | I2 I1:4
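As a rough illustration of the last step (my own sketch, not the paper's or a textbook implementation): given an item's conditional pattern base, count itemsets over the prefix paths, keep the frequent ones, and append the item. The brute-force enumeration below stands in for building and mining the conditional FP-tree, and it reproduces the patterns listed for I5.

```python
from itertools import combinations
from collections import Counter

# Conditional pattern base for I5, taken from the table above:
# each entry is (prefix path, count).
cond_base_I5 = [(("I2", "I1"), 1), (("I2", "I1", "I3"), 1)]
min_sup = 2

def patterns_from_cond_base(item, cond_base, min_sup):
    """Enumerate itemsets over the prefix paths, keep the frequent ones,
    and append the base item to each."""
    support = Counter()
    for path, count in cond_base:
        for r in range(1, len(path) + 1):
            for subset in combinations(path, r):
                support[frozenset(subset)] += count
    return {tuple(sorted(s)) + (item,): c
            for s, c in support.items() if c >= min_sup}

print(patterns_from_cond_base("I5", cond_base_I5, min_sup))
# {('I2', 'I5'): 2, ('I1', 'I5'): 2, ('I1', 'I2', 'I5'): 2}
```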

