Real-Time Intrusion Detection Systems. Sandeep Kotagiri, Graduate Student, CACS. April 11th, 2006.


1 Real-Time Intrusion Detection Systems
Sandeep Kotagiri, Graduate Student, CACS
April 11th, 2006

2 Papers Presented
- ADMIT: Anomaly-based Data Mining for Intrusions
  - K. Sequeira, M. Zaki
  - ACM SIGKDD, 2002
- Integrated Access Control and Intrusion Detection for Web Servers
  - Tatyana Ryutov, Clifford Neuman, Dongho Kim, and Li Zhou
  - IEEE Transactions on Parallel & Distributed Systems, September 2003
- The Specification and Enforcement of Advanced Security Policies
  - Tatyana Ryutov and Clifford Neuman
  - IEEE Proceedings of the Third International Workshop on Policies for Distributed Systems and Networks, 2002

3 ADMIT: Anomaly-based Data Mining for Intrusions
- According to the 2000 Computer Security Institute/FBI computer crime study, 85% of the 538 companies surveyed reported an intrusion or exploit of their corporate data, and 64% suffered a loss.
- Features of a good IDS
- ADMIT: real-time IDS with host-based data collection and processing
- Problem: differentiate between masqueraders and the true users of a computer terminal
- How: augment password authentication with ADMIT
- What does ADMIT do? It is terminal-resident, monitors terminal usage for each user, creates a user profile, and verifies new data against it.

4 Overview of ADMIT
- Types of IDS: signature-based and anomaly-based
- Data sources: network-level data, system call-level data, user command-level data
- The user profile for intrusion detection is built through clustering
- Observation: the distribution of test points over clusters changes significantly at the time of an attack, which indicates anomalous behavior
- ADMIT is a user-profile dependent, temporal sequence clustering based, real-time intrusion detection system with host-based data collection and processing.
- Advantages of using clustering
  - Model scaling
  - Reduction of noise through cluster support
  - Analyzing cluster centers only, and thus significant data reduction
  - Intra-cluster similarity threshold and alarms (Type A and Type B)

5 ADMIT Architecture
- 2 main stages: training and testing
- Capturing user data
  - Unix shell command data is captured via the (t)csh mechanism
  - Recognizer parses user history data and emits it as tokens
  - Session: all data between logging on and logging off (*SOF* and *EOF*)

6 Parsing user data into tokens
- An example session: *SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*
- Conversion to tokens: T = {t_i : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.
- <n> gives the number of arguments (n) of a command
- vi t1.txt is tokenized as vi <1>, and vi t3.txt t5.txt t6.txt as vi <3>
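The tokenizing step can be sketched as follows. This is a minimal illustration, not the Recognizer from the paper: the function name `tokenize_session` and the rule it applies (keep the command name and its flags, append the count of remaining arguments as `<n>`) are assumptions chosen to match the example session.

```python
def tokenize_session(session: str) -> list[str]:
    """Turn a ';'-separated shell session into command-level tokens."""
    tokens = []
    for command in session.split(";"):
        words = command.split()
        # Skip empty fragments and the *SOF* / *EOF* session markers.
        if not words or words[0] in ("*SOF*", "*EOF*"):
            continue
        name, rest = words[0], words[1:]
        # Assumed rule: words starting with '-' are flags, the rest are arguments.
        flags = [w for w in rest if w.startswith("-")]
        n_args = len(rest) - len(flags)
        token = " ".join([name] + flags)
        if n_args:
            token += f" <{n_args}>"
        tokens.append(token)
    return tokens


session = ("*SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; "
           "rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*")
tokens = tokenize_session(session)
```

Under this rule the file names disappear from the tokens, so two edits of different files count as the same behavior.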

7 Familiarizing with terms used
- A sequence s, of specified length l, is a list of tokens occurring contiguously in the same session of audit data, i.e., s ∈ T^l, where T is the token alphabet.
- A cluster c is a collection of sequences of user-initiated command data such that all its sequences are very similar to the others within it, using some similarity measure Sim(), but different from those in other clusters. If c = {s0, s1, s2, ..., s(n-1)} is a cluster with n sequences, then the cluster center is s_c [equation not captured in the transcript].
- A profile p is the set of clusters of sequences of user-initiated command data whose centers characterize the user behavior. Thus, for user u, the profile is p_u [equation not captured in the transcript], where r and r' are the intra-cluster and inter-cluster similarity thresholds and Sim(s1, s2) is the similarity between two sequences.

8 Flow of Control in ADMIT

9 Similarity Measure Sim(s1, s2)
- Two sequences: s1 = {vi, ps -eaf, vi, ls -a} and s2 = {vi, ls -a, rm -i, vi}
- MCP (Match Count with Polynomial bound): counts the number of slots in which the two sequences have identical tokens
  - MCP for the above example is 1
- MCE (Match Count with Exponential bound): a variant of MCP in which the score doubles for each matching slot
- MCAP/MCAE (Match Count with Adjacency reward and Polynomial/Exponential bound): a variant of MCP/MCE in which adjacent matches are rewarded
- LCS (Longest Common Subsequence): the length of the longest subsequence of tokens that the two sequences have in common
  - It is 2 for the above sequences
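Two of these measures are easy to check by hand, so here is a short sketch of them. The function names `mcp` and `lcs` are illustrative; MCP is implemented as a plain slot-by-slot match count and LCS with the standard dynamic-programming recurrence. The other variants are left out because the slide does not give their exact formulas.

```python
def mcp(a, b):
    """Match count: number of slots where both sequences hold the same token."""
    return sum(1 for x, y in zip(a, b) if x == y)


def lcs(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s1 = ["vi", "ps -eaf", "vi", "ls -a"]
s2 = ["vi", "ls -a", "rm -i", "vi"]
print(mcp(s1, s2), lcs(s1, s2))  # prints "1 2", the values given on the slide
```

MCP is sensitive to position, so shifting a sequence by one command can drop the score to zero, while LCS tolerates insertions and deletions.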

10 ADMIT Algorithms
- Data Training
  - Data pre-processing
  - Clustering user sequences
  - Cluster refinement
    - Merge clusters
    - Split clusters
- Online Testing
  - Real-time data pre-processing
  - Similarity search within profile
  - Sequence rating
  - Sequence classification

11 Data Training: Data Pre-processing
- Audit data: *SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*
- FeatureSelector parses, cleans and tokenizes the audit data within each session specified by the ProfileManager: T = {t_i : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.
- FeatureSelector creates sequences of length l. E.g., if l = 4, the set of user sequences is S = {s_i : 0 ≤ i ≤ |T| - l}, where
  - s0 = {ls -l, vi <1>, ps -eaf, vi <1>}
  - s1 = {vi <1>, ps -eaf, vi <1>, ls -a <1>}
  - s2 = {ps -eaf, vi <1>, ls -a <1>, rm -i <1>}
  - s3 = {vi <1>, ls -a <1>, rm -i <1>, vi <2>}
  - s4 = {ls -a <1>, rm -i <1>, vi <2>, ps -ef}
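The sequence-creation step amounts to a sliding window of length l over the token list. The sketch below assumes tokens that carry an argument count such as `vi <1>`; `make_sequences` is a hypothetical helper name, not the FeatureSelector itself.

```python
def make_sequences(tokens, length):
    """Return every run of `length` contiguous tokens, in order."""
    return [tokens[i:i + length] for i in range(len(tokens) - length + 1)]


tokens = ["ls -l", "vi <1>", "ps -eaf", "vi <1>",
          "ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef"]
sequences = make_sequences(tokens, 4)
for i, seq in enumerate(sequences):
    print(f"s{i} = {seq}")
```

Eight tokens and l = 4 give 8 - 4 + 1 = 5 overlapping sequences, s0 through s4.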

12 Data Training – Clustering User Sequences

13 Example: with r = 3
- Initially S_u = S_u^a = {s0, s1, s2, s3, s4} and p_u = S_u^c = ∅.
- Say the new center is s0. For all remaining sequences in S_u - S_u^c, where S_u^c = {s0}, we compute the similarity to the new center s0.
- Using LCS as the similarity metric, we get Sim(s1, s0) = 3, since vi, ps -eaf, vi is their LCS. Similarly, we get Sim(s2, s0) = 2, Sim(s3, s0) = 1, and Sim(s4, s0) = 0.
- Since s1 passes the threshold, we add it to the new cluster to get c_new = {s0, s1}. Therefore the new S_u^a = {s2, s3, s4}.
- Repeating the while loop, we get the profile p_u = {c0 = {s0, s1}, c1 = {s2}, c2 = {s3, s4}}.
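A rough sketch of this clustering loop is shown below. Two things are assumptions: the tokens carry an argument count (`vi <1>`, `vi <2>`), under which the similarity values above hold, and each new center is the unassigned sequence least similar to the existing centers. The slide does not state the center selection rule, so `dynamic_clustering` here is an illustration and not the paper's algorithm.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def dynamic_clustering(seqs, r, sim=lcs):
    """Group sequence indices into clusters using intra-cluster threshold r."""
    unassigned = list(range(len(seqs)))
    centers, clusters = [], []
    while unassigned:
        # Assumed rule: next center is the sequence farthest from all centers.
        center = min(unassigned, key=lambda i: max(
            (sim(seqs[i], seqs[c]) for c in centers), default=0))
        members = [center] + [i for i in unassigned
                              if i != center and sim(seqs[i], seqs[center]) >= r]
        centers.append(center)
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters


seqs = [
    ["ls -l", "vi <1>", "ps -eaf", "vi <1>"],        # s0
    ["vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"],    # s1
    ["ps -eaf", "vi <1>", "ls -a <1>", "rm -i <1>"], # s2
    ["vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>"],  # s3
    ["ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef"],  # s4
]
clusters = dynamic_clustering(seqs, r=3)
```

With r = 3 this yields the same partition as the example, {s0, s1}, {s2} and {s3, s4}, although the clusters are found in a different order.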

14 Data Training: Cluster Refinement
- Purpose of cluster refinement
  - Setting the intra-cluster similarity threshold r may require experimentation
  - A cluster may have a lot in common with another
  - Larger sub-clusters may exist within clusters
- Algorithms

15 Data Training: Cluster Refinement Example
- From above, p_u = {c0, c1, c2} and r' = 2.
- Using LCS, Sim(c0, c1) = Sim(s0, s2) = 2.
- In this case, the two clusters should be merged to get c0 = {s0, s1, s2}. Now c1 is deleted from the profile, and the center of c0 becomes s1.
- For clusters that have high support, SplitClusters calls DynamicClustering to re-cluster them into smaller, higher-density clusters.
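The merge step can be illustrated with a small sketch. It assumes tokens with argument counts (`vi <1>`), and it assumes the cluster center is the member with the highest total similarity to the other members. The transcript does not preserve the center formula, so `cluster_center` is a plausible stand-in, not the authors' definition.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def cluster_center(cluster, seqs, sim=lcs):
    """Assumed center: the member most similar, in total, to the other members."""
    return max(cluster, key=lambda i: sum(
        sim(seqs[i], seqs[j]) for j in cluster if j != i))


seqs = {
    0: ["ls -l", "vi <1>", "ps -eaf", "vi <1>"],        # s0, center of c0
    1: ["vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"],    # s1
    2: ["ps -eaf", "vi <1>", "ls -a <1>", "rm -i <1>"], # s2, center of c1
}
c0, c1 = [0, 1], [2]
r_prime = 2  # inter-cluster similarity threshold r'

# Merge when the two centers are at least r' similar.
should_merge = lcs(seqs[0], seqs[2]) >= r_prime
merged = c0 + c1 if should_merge else c0
new_center = cluster_center(merged, seqs)
```

Here the total similarities are 5 for s0, 6 for s1 and 5 for s2, so s1 becomes the center of the merged cluster, as in the example.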

16 Online Testing: Real-Time Data Pre-processing
- Testing must happen online, as the user sequences are produced
- Example session: *SOF* ; vi t4.txt ; vi t4.txt ; vi t4.txt ; ls -a /home/* ; rm -i /home/turbo/tmp/* ; ls -a /home/* ; vi t2.txt t4.txt ; ps -ef ;
- Right padding is done in the absence of complete sequences
- Tokenizing: T' = {t'_i : 0 ≤ i < 8}, where t'0 = vi <1>, t'1 = vi <1>, t'2 = vi <1>, t'3 = ls -a <1>, t'4 = rm -i <1>, t'5 = ls -a <1>, t'6 = vi <2>, and t'7 = ps -ef.
- For l = 4, S' = {s'_i : 0 ≤ i ≤ |T'| - l}
  - s'0 = {vi <1>, vi <1>, vi <1>, ls -a <1>}
  - s'1 = {vi <1>, vi <1>, ls -a <1>, rm -i <1>}
  - s'2 = {vi <1>, ls -a <1>, rm -i <1>, ls -a <1>}
  - s'3 = {ls -a <1>, rm -i <1>, ls -a <1>, vi <2>}
  - s'4 = {rm -i <1>, ls -a <1>, vi <2>, ps -ef}

17 Online Testing: Profile Search
- For each sequence s'_i, find the most similar cluster in p_u
- The similarity between a sequence s'_i and a profile p_u is Sim(s'_i, p_u) = max over c_j ∈ p_u of Sim(s'_i, s_cj), where s_cj is the center of cluster c_j
- Example: p_u = {c0 = {s0, s*1, s2}, c1 = {s*3, s4}} (cluster centers are indicated with '*'). Then Sim(s'0, p_u) = max(Sim(s'0, s_c0), Sim(s'0, s_c1)) = max(Sim(s'0, s1), Sim(s'0, s3)) = max(3, 2) = 3.
- Similarly, Sim(s'1, p_u) = 3, Sim(s'2, p_u) = 3, Sim(s'3, p_u) = 3, and Sim(s'4, p_u) = 2.
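The profile search can be reproduced with a few lines of code. The sketch assumes LCS as the similarity measure and tokens that carry an argument count (`vi <1>`, `vi <2>`); the name `profile_similarity` is illustrative.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def profile_similarity(seq, centers, sim=lcs):
    """Sim(s', p_u): best similarity between seq and any cluster center."""
    return max(sim(seq, center) for center in centers)


# Cluster centers of the profile: s1 for c0 and s3 for c1.
centers = [
    ["vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"],
    ["vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>"],
]
test_sequences = [
    ["vi <1>", "vi <1>", "vi <1>", "ls -a <1>"],        # s'0
    ["vi <1>", "vi <1>", "ls -a <1>", "rm -i <1>"],     # s'1
    ["vi <1>", "ls -a <1>", "rm -i <1>", "ls -a <1>"],  # s'2
    ["ls -a <1>", "rm -i <1>", "ls -a <1>", "vi <2>"],  # s'3
    ["rm -i <1>", "ls -a <1>", "vi <2>", "ps -ef"],     # s'4
]
similarities = [profile_similarity(s, centers) for s in test_sequences]
print(similarities)  # prints [3, 3, 3, 3, 2]
```

Only the cluster centers are compared, which is the data reduction that clustering buys: the cost per test sequence grows with the number of clusters, not with the size of the training set.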

18 Online Testing: Sequence Rating
- Audit data is noisy, which leads to high false positive rates
- Past sequences are used to test whether the present sequence is noise or a true change in profile
- LAST_n
  - Arithmetic mean of the similarity of the last n sequences
  - For the five new sequences, using this rating metric with n = 3, we get the following ratings: R0 = R1 = R2 = R3 = 3, and R4 = 8/3 = 2.67
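As a quick check of the LAST_n numbers, here is a small sketch. It assumes that the first n - 1 ratings average over however many similarities are available so far, which the slide does not spell out; `last_n_ratings` is a hypothetical name.

```python
def last_n_ratings(similarities, n):
    """Rating R_j = mean of the last n similarity values up to and including j."""
    ratings = []
    for j in range(len(similarities)):
        window = similarities[max(0, j - n + 1): j + 1]
        ratings.append(sum(window) / len(window))
    return ratings


similarities = [3, 3, 3, 3, 2]  # Sim(s'_j, p_u) for the five test sequences
ratings = last_n_ratings(similarities, n=3)
```

The last rating averages 3, 3 and 2, which gives 8/3, so a single low similarity is damped rather than reported at face value.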

19 Online Testing: Sequence Rating
- WEIGHTED
  - The weighted mean of the last rating and the current sequence's similarity
  - The rating R_j for the j-th sequence is calculated as R_j = α * Sim(s'_j, p_u) + (1 - α) * R_(j-1), where R_0 = Sim(s'0, p_u)
  - For example, if α = 0.33, then R0 = R1 = R2 = R3 = 3, and R4 = 2.66
- DECAYED_WEIGHTS
  - A variant of WEIGHTED in which α is varied according to the sequence number
  - The rating R_j for the j-th sequence is calculated as [equation not captured in the transcript]
  - E.g., if y = 4100 and z = 7500, then R0 = R1 = R2 = R3 = 3, and R4 = 2.66
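The WEIGHTED metric is an exponentially weighted moving average, sketched below. The function name `weighted_ratings` is illustrative. With α = 0.33 exactly, the last rating evaluates to about 2.67, which is close to the 2.66 quoted on the slide; the small gap presumably comes from rounding.

```python
def weighted_ratings(similarities, alpha):
    """R_0 = first similarity; R_j = alpha * Sim_j + (1 - alpha) * R_(j-1)."""
    ratings = [float(similarities[0])]
    for sim in similarities[1:]:
        ratings.append(alpha * sim + (1 - alpha) * ratings[-1])
    return ratings


similarities = [3, 3, 3, 3, 2]  # Sim(s'_j, p_u) for the five test sequences
ratings = weighted_ratings(similarities, alpha=0.33)
```

A larger α makes the rating react faster to the current sequence, while a smaller α gives more weight to the user's recent history.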

20 Online Testing: Prediction (Normal vs. Anomaly)
- Normal, i.e. true user; anomaly, i.e. possible masquerader
- Based upon the sequence rating R_j for sequence s'_j
- Normal sequences
  - T_ACCEPT is the lower accept threshold
  - If the user sequence rating > T_ACCEPT, then normal user
  - E.g., with T_ACCEPT = 2.7 and the WEIGHTED rating metric (α = 0.33), no alarm is raised for s'0, since R0 = 3 > 2.7. Similarly, s'1, s'2 and s'3 are all normal.
  - Normal sequences are assigned to the nearest profile cluster, e.g., c0 = {s0, s*1, s2, s'0, s'1} and c1 = {s*3, s4, s'2, s'3}
  - Cluster centers are recalculated

21 Online Testing: Prediction (Normal vs. Anomaly)
- Anomalous sequences
  - Sequences that fail the T_ACCEPT test
  - E.g., for s'4, R4 = 2.66 < 2.7, so a Type A alarm is raised
- Reasons
  - Noise (typing errors)
  - Concept drift (change of project)
  - Anomalous sequence
- The larger the number of anomalous sequences in near succession, the more suspicious the identity of the user
- Cluster the anomalous sequences to get a better estimate of behavioral change
- Type B alarm if the cluster size crosses a certain threshold T_cluster
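The two alarm levels can be sketched with a simple threshold check. This is a deliberate simplification: all anomalous sequences are counted as one group instead of being clustered by similarity, and the names `classify`, `t_accept` and `t_cluster` are hypothetical.

```python
def classify(ratings, t_accept, t_cluster):
    """Label each rating as 'normal', 'type A' or 'type B'."""
    anomalous_count = 0
    labels = []
    for rating in ratings:
        if rating > t_accept:
            labels.append("normal")
            continue
        # The sequence failed the accept test, so it joins the anomalous group.
        anomalous_count += 1
        # Simplification: one anomalous group stands in for the anomalous clusters.
        labels.append("type B" if anomalous_count > t_cluster else "type A")
    return labels


labels = classify([3, 3, 3, 3, 2.66], t_accept=2.7, t_cluster=2)
burst = classify([2.0, 2.1, 2.2], t_accept=2.7, t_cluster=2)
```

In the first call only the last sequence raises a Type A alarm. In the second call a run of three low ratings escalates to Type B once the anomalous group grows past `t_cluster`.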

22 Incremental Clustering Algorithm
- Initially p_u = {c0, c1}, S''_a = ∅ and S_u^c = {s1, s3}
- Since R4 = 2.66 < 2.7, s'_i = s'4
- Assign s'4 to S''_a, and p_u = p_u ∪ {c2 = {s'4}}
- After testing, p_u becomes p_u = {c0 = {s0, s*1, s2, s'0, s'1}, c1 = {s*3, s4, s'2, s'3}, c2 = {s'4}}

23 Results
- The system achieves a detection rate of approximately 80% and a false positive rate of approximately 15%
- The security analyst only needs to go through the anomalous clusters instead of vast amounts of audit data

24 Integrated Access Control and Intrusion Detection for Web Servers
- Problems faced by web servers
  - Stealing and destroying data
  - Denying user access
  - Changing website content to embarrass organizations
  - Subverting web servers through vulnerable CGI scripts
  - Denial of Service (DoS) attacks
- Traditional access control systems were not designed to detect attacks and adjust their behavior to take corrective action
- Separate components such as firewalls, IDSs and code integrity checkers do not fully address a web server's security needs
- This approach supports access control policies extended with the capability to identify intrusions and respond to them in real time

25 Generic Application Level Intrusion Detection Framework

26 Generic Authorization and Access Control API
- Supports fine-grained access control and application-level intrusion detection and response
- Evaluates HTTP requests and determines whether the requests are allowed and whether they represent a threat according to a policy
- Provides a general-purpose execution environment in which EACLs are evaluated
- Policy enforcement in 3 phases
  - Before the requested operation starts (is the operation authorized?)
  - During execution of the authorized operation (detect malicious behavior during execution)
  - After the operation completes (logging and notification of whether the operation succeeded or failed)
- Responds to suspected intrusions in real time, before they cause damage
- Can be easily integrated with different applications
  - Apache Web server, SOCKS5, sshd, and FreeS/WAN IPsec for Linux

27 Policy Representation: EACL
- EACL: Extended Access Control List
  - Simple policy language designed to describe user-level authorization policy
  - An EACL is associated with an object to be protected
    - Specifies negative and positive access rights on the object
    - Also has an optional set of associated conditions
- Types of conditions
  - Pre-conditions: what must be true in order to grant the request
  - Request-result conditions: must be activated whether the request is granted or denied
  - Mid-conditions: what must be true during the execution of the requested operation
  - Post-conditions: what must happen after the completion of the operation
- An EACL entry consists of a positive or negative access right and four condition blocks: a set of pre-conditions ……

28 EACL Syntax
An EACL is specified according to the following format:
  eacl ::= {eacl_entry}
  eacl_entry ::= pos_access_right conditions | neg_access_right conditions
  pos_access_right ::= "pos_access_right" def_auth value
  neg_access_right ::= "neg_access_right" def_auth value
  conditions ::= pre_conds mid_conds rr_conds post_conds
  pre_conds ::= {condition}
  mid_conds ::= {condition}
  rr_conds ::= {condition}
  post_conds ::= {condition}
  condition ::= cond_type def_auth value
  cond_type ::= alphanumeric_string
  def_auth ::= alphanumeric_string
  value ::= alphanumeric_string
where
- cond_type: type of condition
- def_auth: authority responsible for defining the value within cond_type
- value: value of the condition

29 EACL Example: Access to host
  # EACL entry 1
  neg_access_right test host_login
  pre_cond_access_id KerberosV.5 tom@ORGB.EDU
  # EACL entry 2
  pos_access_right test host_login
  pre_cond_location IPsec
  pre_cond_access_id X509 "/C=US/O=Trusted/ partnerB"
  pre_cond_threshold local <3 failures/day/failed_log/
  rr_cond_update_log local on:failure/failed_log/info:userID
  mid_cond_duration local ≤ 8hrs
  # EACL entry 3
  pos_access_right test host_login
  pre_cond_location IPsec
  pre_cond_access_id KerberosV.5 partnerb@ORGB.EDU
  pre_cond_threshold local <3 failures/day/failed_log/
  rr_cond_update_log local on:failure/failed_log/info:userID
  mid_cond_duration local ≤ 8hrs
  # EACL entry 4
  pos_access_right test host_check_status
  pre_cond_location IPsec
  # EACL entry 5
  pos_access_right test host_shut_down
  pre_cond_access_id KerberosV.5 trusted@ORGA.EDU
  rr_cond_audit local on:success/info:userID
  post_cond_notify local email/to:sysadmin/on:failure
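To make the entry structure concrete, the sketch below parses EACL text of this shape into Python objects. It is not the GAA-API parser: the classes `Condition` and `EaclEntry`, the function `parse_eacl`, and the line-oriented layout (one access right or condition per line, `#` starting a comment) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    block: str      # "pre", "mid", "rr" or "post"
    cond_type: str  # e.g. "access_id"
    def_auth: str   # authority that defines the value
    value: str


@dataclass
class EaclEntry:
    positive: bool  # True for pos_access_right, False for neg_access_right
    def_auth: str
    value: str
    conditions: list = field(default_factory=list)


def parse_eacl(text):
    """Parse EACL text, assuming one access right or condition per line."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 2)
        keyword = parts[0]
        if keyword in ("pos_access_right", "neg_access_right"):
            entries.append(EaclEntry(keyword.startswith("pos"), parts[1], parts[2]))
        else:
            # Condition keywords look like <block>_cond_<cond_type>.
            block, _, cond_type = keyword.partition("_cond_")
            value = parts[2] if len(parts) > 2 else ""
            entries[-1].conditions.append(
                Condition(block, cond_type, parts[1], value))
    return entries


policy = """
# EACL entry 1
neg_access_right test host_login
pre_cond_access_id KerberosV.5 tom@ORGB.EDU
# EACL entry 5
pos_access_right test host_shut_down
pre_cond_access_id KerberosV.5 trusted@ORGA.EDU
rr_cond_audit local on:success/info:userID
post_cond_notify local email/to:sysadmin/on:failure
"""
entries = parse_eacl(policy)
```

Grouping conditions by their block prefix mirrors the four condition blocks of an entry, so an evaluator could run the `pre` conditions before the operation and the `post` conditions after it.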

30 EACL Policy Composition and Modules in GAA
- Policy composition
  - Process of relating separately specified policies
  - System-wide policy and local policy (merged)
  - The system-wide policy specifies a composition mode that describes how local policies are to be composed with it
    - Expand: disjunction of rights
    - Narrow: conjunction of rights
    - Stop: local policies are ignored
- GAA modules
  - Access control
  - Detector
  - Countermeasure handler
  - Security database
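The three composition modes can be read as boolean combinators. The sketch below reduces each policy decision to a single grant/deny flag, which is a simplification of what a real policy evaluation returns; `compose` and its argument names are hypothetical.

```python
def compose(mode, system_grants, local_grants):
    """Combine a system-wide and a local decision under a composition mode."""
    if mode == "expand":
        # Disjunction: a right granted by either policy is granted.
        return system_grants or local_grants
    if mode == "narrow":
        # Conjunction: both policies must grant the right.
        return system_grants and local_grants
    if mode == "stop":
        # The local policy is ignored.
        return system_grants
    raise ValueError(f"unknown composition mode: {mode}")


print(compose("narrow", True, False))  # prints False
```

Narrow is the restrictive choice: the system-wide policy can veto anything a local policy would allow.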

31 GAA-API and IDS Interaction
- "GAA-API to IDS" interaction
  - Ill-formed access requests
  - Access requests with abnormal parameters
  - Denied access
  - Exceeding threshold
  - Incidents and suspicious application behavior
  - Legitimate activity (creating and updating user profiles)
- "IDS to GAA-API" interaction
  - Can be used for updating policies and adjusting policy values such as thresholds, times and locations

32 GAA-API and Apache Integration
- Apache access control: .htaccess file
  Order Deny,Allow
  Deny from All
  Allow from 10.0.0.0/255.0.0.0
  AuthType Basic
  AuthUserFile /usr/local/apache2/.htpasswd-isi-staff
  Require valid-user
  Satisfy All
- Access request --> check access control policies
- Outputs: HTTP_OK, HTTP_DECLINED, HTTP_AUTHREQUIRED

33 GAA-API to Enhance the Access Control of Apache Server
- Apache server does not support fine-grained policies such as
  - Which users or user groups, from which locations, are allowed access
  - Other conditions such as time, threat level and system load
- GAA-Apache access control
  - Makes use of system-wide and local policy and configuration files
  - 3 status values are returned to describe the policy enforcement process
    - Authorization status S_a indicates whether the request is authorized (GAA_YES), not authorized (GAA_NO) or uncertain (GAA_MAYBE)
    - Mid-condition enforcement status S_m indicates the status of mid-conditions
    - Post-condition enforcement status S_p indicates the status of post-conditions
  - Policy evaluation happens in four phases, as in the figure
- S_a to Apache format
  - GAA_YES -> HTTP_OK
  - GAA_NO -> HTTP_DECLINED
  - GAA_MAYBE -> HTTP_AUTHREQUIRED
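The translation of the authorization status into an Apache result is a three-way lookup. In the sketch below the status names come from the slide, while the enum `GaaStatus`, the table `APACHE_RESULT` and the function `to_apache` are illustrative names, and the Apache results are kept as plain strings and not as real server constants.

```python
from enum import Enum


class GaaStatus(Enum):
    YES = "GAA_YES"      # request is authorized
    NO = "GAA_NO"        # request is not authorized
    MAYBE = "GAA_MAYBE"  # uncertain, more information is needed


# Mapping of the authorization status S_a to the Apache result named on the slide.
APACHE_RESULT = {
    GaaStatus.YES: "HTTP_OK",
    GaaStatus.NO: "HTTP_DECLINED",
    GaaStatus.MAYBE: "HTTP_AUTHREQUIRED",
}


def to_apache(status: GaaStatus) -> str:
    """Translate an authorization status into the Apache result name."""
    return APACHE_RESULT[status]


print(to_apache(GaaStatus.MAYBE))  # prints HTTP_AUTHREQUIRED
```

The uncertain case is the interesting one: it asks Apache to request credentials, after which the policy can be evaluated again with the user identity known.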

34 Examples
- When the system threat level is higher than low, lock down the system and require user authentication for all accesses within the network
  System-wide policy:
    eacl_mode 1 # composition mode narrow
    # EACL entry 1
    neg_access_right * *
    pre_cond_system_threat_level local =high
  Local policy:
    # EACL entry 1
    pos_access_right apache *
    pre_cond_system_threat_level local >low
    pre_cond_accessID_USER apache *
- Prevention of penetration and/or surveillance attacks by detecting CGI script abuse
  System-wide policy:
    eacl_mode 1 # composition mode narrow
    # EACL entry 1
    neg_access_right * *
    pre_cond_accessID_GROUP local BadGuys
  Local policy:
    # EACL entry 1
    neg_access_right apache *
    pre_cond_regex gnu "'*phf*' 'test-cgi*'"
    rr_cond_notify local on:failure/email/sysadmin/info:CGIexploit
    rr_cond_update_log local on:failure/BadGuys/info:IP
    # EACL entry 2
    pos_access_right apache *

35 Conclusions
- Traditional access control mechanisms have little ability to support or respond to the detection of attacks.
- A generic authorization framework has been developed that supports security policies which can detect attempted and actual security breaches and actively respond by modifying security policies dynamically.
- The GAA-API implementation is available at
