Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI)


1 Service Discrimination and Audit File Reduction for Effective Intrusion Detection by Fernando Godínez (ITESM) In collaboration with Dieter Hutter (DFKI) and Raúl Monroy (ITESM)

2 Introduction Intrusion detection often amounts to detecting a known pattern of computer misuse, or a deviation from expected user behaviour. Current Intrusion Detection Systems (IDSs) are easily overwhelmed by the amount of information they have to analyse.

3 Goal To make intrusion detection more tractable, we compact the information to be analysed without losing key information. To make it scalable, we suggest structuring intrusion detection as a collection of sensors, each of which is specialised to a particular service.

4 Agenda
• IDS Architecture
• System Call Normalization Using Rough Sets
• Session Folding Using N-Gram Models
• Intrusion Detection With Reduced Audit Files
• Service Selection Using Hidden Markov Models
• Conclusions

5 IDS Architecture The architecture is novel in two respects:
• It incorporates a service discriminator that separates input sequences according to the service they belong to.
• It incorporates both a misuse detector (MIDS) and an anomaly detector (AIDS), probabilistically combining their output.

6 IDS Architecture

7 The attribute filter module removes redundant information using a filter based on Rough Set Theory (RST). The session folding module takes an input session and substitutes common subsequences with a fresh tag to reduce session length.

8 IDS Architecture The service selection module serves as a discriminator that uses hidden Markov models (HMMs) to calculate the probability that a given session belongs to a certain service. The misuse detection module uses HMMs and word networks to detect similarities between a session and known attacks.

9 IDS Architecture The anomaly detection module relies on probabilistic context-free grammars to create normal behaviour profiles and test for anomalies. By combining the misuse and anomaly IDSs, the false positive ratio is reduced because of the misuse confirmation, and unseen attacks are detected because of the anomaly detection.

10 System Call Normalization Using Rough Sets We used RST to find the chief attributes that ought to be considered for session analysis. The reduced attribute set is minimal with respect to information content. We ran a number of separate analyses, each of which considers a session segment, and then collected the associated reducts.

11 System Call Normalization Using Rough Sets To find the minimum common reduct, we performed a statistical analysis that removes the attributes that appeared least frequently. Our reduct extraction methodology closely follows the one outlined by Komorowski. The log files originally have 51 attributes.

12 System Call Normalization Using Rough Sets Using a frequency-based discriminator, we constructed a minimum common reduct (MCR). The largest reduct in the original set has 15 attributes and our minimum common reduct has 18 attributes. Relative to the 51 original attributes, this is a reduction of roughly 65%.
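The frequency-based construction of a common reduct can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the `min_share` threshold, the function name and the attribute names are all hypothetical, and the slides do not say which cutoff was used.

```python
from collections import Counter


def minimum_common_reduct(reducts, min_share=0.5):
    """Keep the attributes that occur in at least `min_share` of the reducts.

    `reducts` is a list of attribute sets, one per analysed session segment.
    `min_share` is a hypothetical cutoff; the slides do not state one.
    """
    counts = Counter(attr for reduct in reducts for attr in set(reduct))
    cutoff = min_share * len(reducts)
    return {attr for attr, n in counts.items() if n >= cutoff}


# Invented reducts from three session segments (attribute names are made up).
reducts = [
    {"syscall", "uid", "return_value"},
    {"syscall", "uid", "path"},
    {"syscall", "pid", "return_value"},
]
mcr = minimum_common_reduct(reducts, min_share=0.5)
print(sorted(mcr))
```

Attributes that show up in only one segment's reduct ("path" and "pid" here) are dropped, while the frequently recurring ones are kept, which is why the common reduct can end up larger than any single reduct.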

13 System Call Normalization Using Rough Sets The validation methodology relies on so-called association patterns. Given both a reduct and a log file, the corresponding association patterns are extracted. The association patterns are then compared against another log file to compute how well they cover the information in that file. The more information about the system a reduct captures, the higher the matching ratio of its association patterns.

14 Session Folding Using N-Gram Models N-gram extraction consists of applying a blind, exhaustive procedure. As a result, we obtained the n-grams that occur most frequently in the training sessions. If an n-gram is present in the training log files, an occurrence frequency is assigned to it; if it is not present, an occurrence probability is assigned instead.
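A minimal sketch of exhaustive n-gram counting over training sessions is given below. It covers only the frequency part; the probability assigned to unseen n-grams is not modelled. The function name, the range of n and the sample system calls are assumptions made for illustration.

```python
from collections import Counter


def count_ngrams(sessions, n_min=2, n_max=4):
    """Count every contiguous n-gram (n_min <= n <= n_max) in all sessions."""
    counts = Counter()
    for session in sessions:
        for n in range(n_min, n_max + 1):
            for i in range(len(session) - n + 1):
                counts[tuple(session[i:i + n])] += 1
    return counts


# Invented training sessions, each a list of system call names.
sessions = [
    ["open", "read", "close", "open", "read", "close"],
    ["open", "read", "write", "close"],
]
counts = count_ngrams(sessions, n_min=2, n_max=3)
print(counts.most_common(3))
```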

15 Session Folding Using N-Gram Models Using both n-gram occurrence frequency and probability, we estimated a reduction ratio, later used as a priority Pr for every selected n-gram. This priority is used to resolve overlaps when making an n-gram substitution. By substituting the n-grams with the higher ratio first, we guarantee that, even if n-grams overlap, only those that provide maximum reduction are used.
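The priority-driven substitution can be illustrated with a greedy sketch. The priority values, the tag format `<T0>`, `<T1>`, ... and the function name are hypothetical; the slides do not give the formula for Pr, so the priorities are simply supplied as input here.

```python
def fold_session(session, priorities):
    """Replace n-grams with fresh tags, highest priority first.

    `priorities` maps an n-gram (tuple of system calls) to its priority.
    Once a span is replaced by a tag it can no longer match a
    lower-priority n-gram, which resolves overlaps in favour of the
    n-gram with the higher priority.
    """
    folded = list(session)
    ordered = sorted(priorities, key=priorities.get, reverse=True)
    for tag_id, ngram in enumerate(ordered):
        n, out, i = len(ngram), [], 0
        while i < len(folded):
            if tuple(folded[i:i + n]) == ngram:
                out.append(f"<T{tag_id}>")  # fresh tag for this n-gram
                i += n
            else:
                out.append(folded[i])
                i += 1
        folded = out
    return folded


# Invented session and priorities; the two n-grams overlap on "read close".
session = ["open", "read", "close", "open", "read", "close", "exit"]
priorities = {("open", "read", "close"): 4.0, ("read", "close"): 2.0}
folded = fold_session(session, priorities)
print(folded)
```

In this example the seven-call session folds to three symbols, and the lower-priority n-gram is never applied because every occurrence of it is already covered by the higher-priority one.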

16 Session Folding Using N-Gram Models We used three-dimensional histograms to analyse the number of n-grams with a frequency close to a multiple of the number of sessions in the training data. Based on this frequency analysis, we identified the n-grams with a desired frequency. The extracted n-grams provide an average reduction of 74% within the training sessions.

17 Intrusion Detection With Reduced Audit Files We used HMMs and word networks as our misuse IDS. The attacks used during the tests are part of the 1998 and 1999 DARPA repositories. We used 20 instances of each attack to train the HMMs. We tested against 800 telnet, 1000 smtp, 50 ftp and 150 finger sessions.

18 Intrusion Detection With Reduced Audit Files The detection ratio for non-reduced data is about 92%, with a 6% false positive ratio. Using reduced data, we obtained a 94% detection ratio and a false positive ratio of 7%. The difference in false positives was found in short attacks such as eject. Most of the false positives were normal sessions labelled as one of these short attacks.

19 Service Selection Using Hidden Markov Models Every service, such as telnet, ftp, or smtp, has a distinctive session header. With plain n-gram matching, variations of similar n-grams describing the headers would be regarded as different. By using hidden Markov models (HMMs), we group such similar n-grams into a family.

20 Service Selection Using Hidden Markov Models The main advantages of using a service discriminator are:
• Flexibility: the configuration of the monitored service can change, e.g. to a different port number. Service selection is not affected, since we analyse behaviour and not a specific configuration.
• Efficiency: the search space is reduced whether we perform misuse or anomaly detection (e.g. only attacks for the selected service are verified).

21 Service Selection Using Hidden Markov Models
• Scalability: to monitor a new service, we only need to add a discriminator for that service and train the misuse and anomaly detection modules accordingly.
By using service discrimination, our IDS searches a reduced space when looking for an intrusion. These characteristics have been of interest for IDSs since Denning's paper in 1985, and have been pointed out by Forrest et al., Zamboni et al., and Mell et al.

22 Service Selection Using Hidden Markov Models The analysis is applied to the sequence of system calls preceding the fork call, which usually marks the beginning of the user interaction process. With our service discrimination method, the service has already been selected by the time a fork is reached.
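Isolating the session header can be sketched in a few lines. The function name is hypothetical, and returning the whole session when no fork is seen is an assumption of this sketch, not something the slides specify.

```python
def session_header(syscalls, marker="fork"):
    """Return the system calls that precede the first `marker` call."""
    try:
        return syscalls[:syscalls.index(marker)]
    except ValueError:
        # Assumption: with no fork in the trace, treat the whole session as header.
        return list(syscalls)


# Invented trace; real headers depend on the service and the audit format.
print(session_header(["open", "ioctl", "fork", "read", "write"]))
```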

23 Service Selection Using Hidden Markov Models We used an n-gram frequency analysis to identify the sequence of system calls prior to the fork call. We then gathered all the different header files for each service and used them to train an HMM for each service. Table 1 shows the percentage of correctly discriminated sessions.
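Once one HMM per service is available, discrimination amounts to scoring a header under each model and picking the most likely one. The sketch below uses the standard scaled forward algorithm with hand-set parameters; training is omitted, and the model parameters, the `floor` smoothing value for unseen system calls, and all names are hypothetical illustrations rather than the authors' models.

```python
import math


def log_likelihood(obs, start, trans, emit, floor=1e-6):
    """Scaled forward algorithm: log P(obs | HMM) for a non-empty sequence.

    start[s]    : initial probability of state s
    trans[r][s] : probability of moving from state r to state s
    emit[s]     : dict mapping a symbol to its emission probability in state s
    floor       : assumed small probability for symbols a state never emits
    """
    states = range(len(start))
    alpha = [start[s] * emit[s].get(obs[0], floor) for s in states]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in states)
                * emit[s].get(obs[t], floor)
                for s in states
            ]
        scale = sum(alpha)  # rescale to avoid underflow on long headers
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p


def select_service(header, models):
    """Return the name of the service whose HMM scores the header highest."""
    return max(models, key=lambda name: log_likelihood(header, *models[name]))


# Two invented 2-state models; the system call names are illustrative only.
trans = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "telnet": ([1.0, 0.0], trans,
               [{"open": 0.7, "ioctl": 0.3}, {"read": 0.6, "ioctl": 0.4}]),
    "ftp": ([1.0, 0.0], trans,
            [{"open": 0.5, "stat": 0.5}, {"chdir": 0.6, "read": 0.4}]),
}
print(select_service(["open", "ioctl", "read"], models))
```

Because each service is scored by its own model, monitoring an additional service only requires adding one more entry to `models`.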

24 Service Selection Using Hidden Markov Models The first column of Table 1 gives the service-specific HMM used to discriminate the sessions listed in columns 2-5. Entries off the diagonal are false positives.

Table 1
HMMs     telnet   smtp   ftp    finger
telnet   100%     2%     1%     0%
smtp     1%       100%   2%     0%
ftp      0%       1%     100%   0%
finger   0%       0%     0%     100%

25 Conclusions We have been able to show how two techniques, rough sets for attribute reduction and n-gram models for session folding, can be used to reduce the size of the audit files. The attribute reduction method provides a 64% reduction of the original attributes. The session folding method provides a 74% reduction in session length.

26 Conclusions By using the service discrimination module, we are able to scale IDSs by adding any number of services. There is added flexibility because a service only needs to keep the same header behaviour in order to be selected. The detection module also benefits from the service discrimination by reducing the search space for both misuse and anomaly detection.

